Mathematical truth vs. computing reality
Matrix multiplication sits at the core of many computationally expensive pieces of software. Huge matrix multiplications notably occur in artificial intelligence and in the Finite Element Method, which is used for a variety of applications such as simulating the deformation of solid structures. Another lesser-known but equally important example is the Google Matrix: this matrix is huge, yet multiplying it quickly with any given list of vectors is essential to Google. This is why matrix multiplication has long driven the advance of computing power, first on CPUs and now on GPUs.
In this post, I would like to show how to accelerate some matrix multiplications in the case where at least 3 matrices with different shapes are involved. This is known as the "Matrix Chain Multiplication" problem.
Let A, B and C be 3 matrices, and assume we need to compute A.B.C. Mathematics tells us that we could either compute A.B first and then multiply the result by C, which we could write (A.B).C, or the other way around: compute B.C first and then multiply A by the result of that sub-multiplication, in other words A.(B.C). Those two methods are mathematically equivalent, since matrix multiplication is an associative operation. So it looks like there is no difference in implementing one or the other.
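The equivalence is easy to check numerically. Here is a minimal sketch using NumPy (the matrix shapes are the ones used in the example below; any compatible shapes would do):

```python
import numpy as np

# Illustrative only: both parenthesizations give the same result,
# up to floating-point rounding.
rng = np.random.default_rng(0)
A = rng.random((1, 3))
B = rng.random((3, 2))
C = rng.random((2, 4))

left = (A @ B) @ C   # compute A.B first, then multiply by C
right = A @ (B @ C)  # compute B.C first, then multiply A by it

assert np.allclose(left, right)
```

Note the comparison uses `np.allclose` rather than exact equality: the two orders accumulate rounding errors differently, so the results agree only up to floating-point precision.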
However, it is sometimes worth considering this from the point of view of the computer. For instance, assume the following sizes for the matrices:
- A has size 1x3
- B has size 3x2
- C has size 2x4
Just as a reminder, computing the product of two matrices of respective sizes RxM and MxC requires RxCxM scalar multiplications.
Then, computing (A.B).C would require (1x2x3) + (1x4x2) = 6 + 8 = 14 multiplications.
The second way of doing it would require (3x4x2) + (1x4x3) = 24 + 12 = 36 multiplications! That is more than twice the number of multiplications of the first alternative!
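The arithmetic above can be checked with a few lines of Python. The helper name `mult_count` is mine, not a library function; it just encodes the RxCxM formula from the reminder:

```python
def mult_count(r, m, c):
    """Scalar multiplications needed for an (r x m) @ (m x c) product."""
    return r * c * m

# A is 1x3, B is 3x2, C is 2x4.
left_cost = mult_count(1, 3, 2) + mult_count(1, 2, 4)   # (A.B).C: 6 + 8 = 14
right_cost = mult_count(3, 2, 4) + mult_count(1, 3, 4)  # A.(B.C): 24 + 12 = 36
print(left_cost, right_cost)  # prints: 14 36
```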
Therefore, I encourage you to take your time and choose the best order of matrix multiplication in your algorithms.
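For longer chains, trying every order by hand becomes tedious, and the classic dynamic-programming solution to the matrix chain multiplication problem can find the cheapest cost automatically. Here is a sketch (the function name and the memoized recursion are my own phrasing of that well-known algorithm, not code from this post):

```python
from functools import lru_cache

def min_multiplications(dims):
    """Minimal number of scalar multiplications for a matrix chain
    where the i-th matrix has shape dims[i] x dims[i+1]."""
    @lru_cache(maxsize=None)
    def cost(i, j):
        # Cheapest way to multiply matrices i..j together.
        if i == j:
            return 0
        # Try every split point k: (i..k) times (k+1..j),
        # plus the cost of multiplying the two resulting matrices.
        return min(
            cost(i, k) + cost(k + 1, j) + dims[i] * dims[k + 1] * dims[j + 1]
            for k in range(i, j)
        )
    return cost(0, len(dims) - 2)

# A: 1x3, B: 3x2, C: 2x4  ->  dims = [1, 3, 2, 4]
print(min_multiplications([1, 3, 2, 4]))  # prints: 14
```

On our example it recovers 14, the cost of the (A.B).C order.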
Another benefit of doing fewer multiplications is that it can also improve the numerical stability of your computations, a subject I leave for a later post.