===============================================
Goal: To multiply (2x2) matrices efficiently.
===============================================

Requirements: Implementation shall minimize the
              required computational work.

System Behavior
===============================================

Algorithm computes:

      c_11, c_12, c_21, and c_22.

      c_ij = sum {k = 1,2} a_ik * b_kj
      i = 1 and 2, j = 1 and 2.

System Structure
===============================================

How will we solve the problem:

Algorithm 1 : Compute Row/Column Dot products
===============================================

      c_11 = a_11 * b_11 + a_12 * b_21
      c_21 = a_21 * b_11 + a_22 * b_21
      c_12 = a_11 * b_12 + a_12 * b_22
      c_22 = a_21 * b_12 + a_22 * b_22

Algorithm 2 : Idea for Fast Multiplication
===============================================

How about:

      z1 = ( a_11 + a_22 ) * (b_11 + b_22)
      z2 = ( a_21 + a_22 ) * b_11
      z3 = a_11 * ( b_12 - b_22 )
      z4 = (a_11 + a_12) * b_22
      z5 = a_22 * ( b_21 - b_11 )
      z6 = ( a_21 - a_11 ) * ( b_11 + b_12 )
      z7 = ( a_12 - a_22 ) * ( b_21 + b_22 )

      c_11 = z1 + z5 - z4 + z7
      c_12 = z3 + z4
      c_21 = z2 + z5
      c_22 = z1 - z2 + z3 + z6

Evaluation and Ranking
===============================================

Algorithm 1 : 8 multiplies +  4 additions.
Algorithm 2 : 7 multiplies + 18 addition/subtractions.

Historically, matrix multiplies have been much much more
expensive than additions/subtractions. Suppose:

    work(  1 matrix multiply ) ==
    work( 20 matrix addition/subtractions )

Cost (algorithm 1) = 8*20 + 4  = 164.
Cost (algorithm 2) = 7*20 + 18 = 158.

Choose algorithm 2!!!!