next up previous contents index CD CD Algorithms
Next: Determinants and Permanents Up: Numerical Problems Previous: Bandwidth Reduction

Matrix Multiplication



Input description: An tex2html_wrap_inline27002 matrix A and a tex2html_wrap_inline27004 matrix B.

Problem description: The tex2html_wrap_inline27006 matrix tex2html_wrap_inline27008 .

Discussion: Although matrix multiplication is an important problem in linear algebra, its main significance for combinatorial algorithms   is its equivalence to a variety of other problems, such as transitive closure and reduction, solving linear systems, and matrix inversion. Thus a faster algorithm for matrix multiplication implies faster algorithms for all of these problems. Matrix multiplication arises in its own right in computing the results of such coordinate transformations as scaling, rotation, and translation for robotics and computer graphics.        

The straightforward algorithm to compute the product of tex2html_wrap_inline27010 matrix A and tex2html_wrap_inline27012 matrix B runs in O(x y z) time and is tough to beat in practice:

for i=1 to x do

for j = 1 to z


In two multiplying bandwidth-b matrices, where all nonzero elements of A and B lie within b elements of the main diagonals,   a speedup to O(x b z) is possible, since zero elements will not contribute to the product.

Asymptotically faster algorithms for matrix multiplication exist, based on clever divide-and-conquer recurrences. However, these prove difficult to program and require very large matrices to beat the trivial algorithm.   In particular, some empirical results show that Strassen's tex2html_wrap_inline27024 algorithm is unlikely to beat the straightforward algorithm for tex2html_wrap_inline27026 , and it is less numerically stable to boot. Other studies have been more encouraging, claiming that the crossover point is as low as tex2html_wrap_inline27028 . Still, I consider it unlikely that you will speed up any serious application by implementing Strassen's algorithm.

There is a better way to save computation when you are multiplying a chain of more than two matrices together.    Recall that multiplying an tex2html_wrap_inline27030 matrix by a tex2html_wrap_inline27032 matrix creates an tex2html_wrap_inline27034 matrix. Thus multiplying a chain of matrices from left to right might create large intermediate matrices, each taking a lot of time to compute. Matrix multiplication is not commutative, but it is associative, so we can parenthesize the chain in whatever manner we deem best without changing the final product.    A standard dynamic programming algorithm can be used to construct the optimal parenthesization. Whether it pays to do this optimization will depend upon whether your matrices are large enough or your chain is multiplied often enough to justify it. Note that we are optimizing over the sizes of the dimensions in the chain, not the actual matrices themselves. If all your matrices are the same dimensions, you are out of luck, since no such optimization is possible.

Matrix multiplication has a particularly interesting interpretation in counting the number of paths between two vertices in a graph.     Let A be the adjacency matrix of a graph G, meaning A[i,j] = 1 if there is an edge between i and j. Otherwise, A[i,j] = 0.   Now consider the square of this matrix, tex2html_wrap_inline27040 . If tex2html_wrap_inline27042 , this means that there must be a k such that A[i,k]=A[k,j]=1, so i to k to j is a path of length 2 in G. More generally, tex2html_wrap_inline27046 counts the number of paths of length exactly k between i and j. This count includes nonsimple paths, where vertices are repeated, such as i to k to i.

Implementations: The quick and dirty tex2html_wrap_inline27048 algorithm will be your best bet unless your matrices are very large. For example, [CLR90] suggests that n>45 before you have a hope of winning. Experimental results suggest that n > 100 is more realistic [CR76], with Bailey [BLS91] finding a crossover point of n=128 for Cray systems. Strassen's algorithm is difficult to implement efficiently because of the data structures required to maintain the array partitions.     That said, an implementation of Strassen's algorithm in Mathematica by Stan Wagon is offered ``without promise of efficiency'' on the algorithm repository WWW site.

The linear algebra library of choice is LAPACK, a descendant of LINPACK [DMBS79], which includes several routines for matrix multiplication. These Fortran codes are available from Netlib as discussed in Section gif.     

Algorithm 601 [McN83] of the Collected Algorithms of the ACM is a sparse matrix package written in Fortran that includes routines to multiply any combination of sparse and dense matrices.     See Section gif for details.

XTango (see Section gif) is an algorithm animation system for UNIX and X-windows   that includes an animation of the tex2html_wrap_inline27056 matrix multiplication algorithm.   A C++, tex2html_wrap_inline27058 implementation of matrix multiplication is embedded in LEDA (see Section gif).   

Notes: Winograd's algorithm for fast matrix multiplication reduces the number of multiplications by a factor of two over the straightforward algorithm. It is implementable, although the additional bookkeeping required makes it doubtful whether it is a win.   Expositions on Winograd's algorithm [Win68] include [CLR90, Man89, Win80].

In my opinion, the history of theoretical algorithm design began when Strassen published his tex2html_wrap_inline27060 -time matrix multiplication algorithm.   For the first time, improving an algorithm in the asymptotic sense became a respected goal in its own right. Good expositions on Strassen's algorithm [Str69] include [Baa88, CLR90, Cra94]. Progressive improvements to Strassen's algorithm have gotten progressively less practical. The current best result for matrix multiplication is Coppersmith and Winograd's [CW87] tex2html_wrap_inline27062 algorithm, while the conjecture is that tex2html_wrap_inline27064 suffices.

The interest in the squares of graphs goes beyond counting paths.   Fleischner [Fle74] proved that the square of any biconnected graph has a Hamiltonian cycle.       See [LS95] for results on finding the square roots of graphs, i.e. finding A given tex2html_wrap_inline27066 .

The problem of Boolean matrix multiplication can be reduced to that of general matrix multiplication [CLR90]. The four-Russians algorithm for Boolean matrix multiplication [ADKF70] uses preprocessing to construct all subsets of tex2html_wrap_inline27068 rows for fast retreival in performing the actual multiplication, yielding a complexity of tex2html_wrap_inline27070 .   Additional preprocessing can improve this to tex2html_wrap_inline27072 [Ryt85]. An exposition on the four-Russians algorithm, including this speedup, appears in [Man89].  

Good expositions of the matrix-chain algorithm include [Baa88, CLR90], where it is a standard example of dynamic programming.  

Related Problems: Solving linear equations (see page gif), shortest path (see page gif).    

next up previous contents index CD CD Algorithms
Next: Determinants and Permanents Up: Numerical Problems Previous: Bandwidth Reduction

Mon Jun 2 23:33:50 EDT 1997