Towards Peak Performance on Hierarchical Memory Architectures -
New Recursive Blocked Data Formats and BLAS's

Fred Gustavson¹, Isak Jonsson², Bo Kågström², and Per Ling²

Abstract

In 1998, a first version of the GEMM-based Level 3 BLAS for superscalar-type processors was announced. This version was adopted by many organizations, including the ATLAS and PHiPAC projects. Here, we describe recent developments of the project that incorporate techniques for handling symmetric multiprocessing (SMP) and deep memory hierarchies efficiently. These techniques include recursive algorithms and recursive blocked data formats. Because they improve data locality, they lead to routines that adapt automatically to complex memory hierarchies and parallelize naturally.
  1. IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, U.S.A.
    E-mail: gustav@watson.ibm.com
  2. Department of Computing Science and HPC2N, Umeå University, SE-901 87 Umeå, Sweden
    E-mail: {isak,bokg,pol}@cs.umu.se