Recursive Blocked Algorithms for Dense Matrix Computations

Erik Elmroth1, Fred Gustavson1, Isak Jonsson2, and Bo Kågström2

Abstract

Today block-based algorithms are the standard in state-of-the-art software for dense linear algebra computations. These algorithms perform most of their computations in calls to high-performance level 3 BLAS. The level 3 BLAS obtain most of their performance by data rearrangements and the use of high-performance atomic kernel routines. The current state-of-the-art BLAS are designed to exploit the architecture design while maintaining the functionality of the BLAS and thereby guarantee high performance and portability of dense linear algebra codes. The level 3 factorization algorithms typifed by LAPACK make repeated calls to the level 3 BLAS with matrix operands equal to submatrices of fixed size. This results in multiple data copying on operands that are related. How can this be challenged? An answer is to change the data structure of the BLAS input matrices to reflect this relationship, thereby removing any data copying from the BLAS. The obvious choice is to store matrices as blocks, i.e., submatrices of block-partioned matrices, instead of the standard column-major (Fortran) or row-major (C) orderings.

  Recursion is a key concept for matching an algorithm and its data structure. A recursive algorithm leads to automatic blocking which is variable and "squarish". This layered and variable blocking allow for good data locality, which makes it possible to approach peak performance on today's memory tiered processors.

  In this contribution, we give an overview of our research in recursive blocked algorithms and data formats for dense matrix computations, including level 3 BLAS, dense matrix factorizations for full storage and packed storage, and triangular Sylvestertype matrix equations. One important application of triangular matrix equations is condition estimation and error bounds for different eigenspace problems.

  This research was conducted using the resources of the High Performance Computing Center North (HPC2N) and UNI-C.


  1. IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, U.S.A.
    E-mail: gustav@watson.ibm.com
  2. Department of Computing Science and HPC2N, Umeå University, SE-901 87 Umeå, Sweden
    E-mail: {isak,isak,bokg}@cs.umu.se