
Distributed Computation Paper

I once worked on the distributed computation of matrix multiplication.

[PlutoCharon20] deals with distributed matrix-matrix multiplication (DMM), where the workers might straggle or crash. By matrix-matrix multiplication we mean computing $C = A \times B$, where $A, B$ are huge matrices with compatible dimensions. (One should distinguish this from matrix-vector multiplication, as the latter does not have a fast version.) By distributed we mean that there are several workers, each computing a single scalar product $A_{ij} \times B_{jk}$. By straggling and crashing we mean that some workers might not respond in time, or might simply hang indefinitely.
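To make the setup concrete, here is a minimal sketch in Python with NumPy (my own illustration, not code from the paper; the names `worker_task` and `distributed_matmul` are hypothetical): each worker is responsible for exactly one scalar product, and the master sums the results over $j$ to assemble each entry of $C$.

```python
import numpy as np

def worker_task(A, B, i, j, k):
    # One worker's entire job: a single scalar product.
    return A[i, j] * B[j, k]

def distributed_matmul(A, B):
    m, n = A.shape
    n2, p = B.shape
    assert n == n2
    C = np.zeros((m, p))
    for i in range(m):
        for k in range(p):
            for j in range(n):  # n workers contribute to each C[i, k]
                C[i, k] += worker_task(A, B, i, j, k)
    return C

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
assert np.allclose(distributed_matmul(A, B), A @ B)
```

If any one of these workers stalls, the corresponding entry of $C$ stays unknown until it responds.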

Straggling and crashing are real issues in the real world: spontaneously, the network may become busy, a CPU may overheat, or a circuit board may be hit by cosmic radiation and never recover. This slows down the overall computation because you have to wait for the last worker to report the product $A_{ij} B_{jk}$ it is responsible for.

To compensate, you can hire more workers than necessary and ask them to do redundant computations. One way to create redundancy is to draw random vectors $g, h$ and ask extra workers to compute $(gA) \times (Bh)$ on top of the usual routine that computes $A \times B$. Once this is done, the associativity identity $(gA)(Bh) = gCh$ gives you parity checks that help recover the missing entries of $C$. Recovering missing entries from parity checks is usually faster than waiting for the straggling workers to come back, so you can actually save time by paying for more CPU time.
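Here is a small numerical sketch of the parity checks in action (again my own illustration, not the paper's decoder, and simplified to a single erased entry): the extra workers' output $P = (gA)(Bh)$ equals $gCh$, and one check suffices to solve for one missing entry of $C$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))
B = rng.random((4, 4))
g = rng.random((2, 4))   # random parity rows
h = rng.random((4, 2))   # random parity columns

C = A @ B                # what the usual workers compute
P = (g @ A) @ (B @ h)    # what the extra workers compute

# Associativity: P equals g @ C @ h.
assert np.allclose(P, g @ C @ h)

# Suppose the worker for C[1, 2] straggled, so that entry is missing.
C_lossy = C.copy()
C_lossy[1, 2] = 0.0

# Parity check P[r, s] = sum_{i,k} g[r, i] * C[i, k] * h[k, s];
# everything except C[1, 2] is known, so solve for it.
r, s = 0, 0
residual = P[r, s] - g[r] @ C_lossy @ h[:, s]
recovered = residual / (g[r, 1] * h[2, s])
assert np.isclose(recovered, C[1, 2])
```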

The contribution of [PlutoCharon20] is three-fold.

The smallest working example of the Charon construction takes $g, A, B, h$ to be $2 \times 4$, $4 \times 4$, $4 \times 4$, and $4 \times 2$, respectively. Computing $A \times B$ costs 49 workers, and computing $(gA) \times (Bh)$ costs 14 workers. Together you need 63 workers, one fewer than the 64 the naive algorithm uses. So you employ fewer workers, yet you can recover from four erasures with high probability.
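A plausible accounting for these numbers (my reading; the counts above do not spell it out): Strassen's algorithm multiplies two $2 \times 2$ block matrices with 7 block products, so a $4 \times 4$ product costs $7^2 = 49$ multiplications after two recursive levels; the $2 \times 4$ by $4 \times 2$ product $(gA)(Bh)$ splits into two $2 \times 2$ block products, costing $2 \cdot 7 = 14$; and the naive count is $4 \cdot 4 \cdot 4 = 64$. Hence

$$49 + 14 = 63 < 64 = 4^3.$$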