A Sparse Matrix Family to Confound All Solvers

August 29, 2022

Suppose a sparse matrix \( A \in \mathbb{R}^{m \times m} \) has \( n_z \) nonzeros. Trivially, storing \( A \) takes \( O(n_z) \) memory, and a matrix-vector product \( Ax \) takes only \( O(n_z) \) time. But what about the inverse of \( A \)? It has proven extremely challenging to practically bound the memory and time complexity of \( A^{-1} x \) based only on the sparsity pattern and numerical information available from \( A \). Instead, we have bounds for specific algorithms for computing \( A^{-1} x \), and each of these algorithms works only for a very specific family of matrices. A non-exhaustive sampling:

| Sparse matrix family | \(A^{-1}x\) time complexity | \(A^{-1}x\) memory complexity | Algorithm |
| --- | --- | --- | --- |
| Banded matrices | \( O(n_z) \) | \( O(n_z) \) | Banded LU factorization, data-sparse decompositions |
| M-matrices | \( O(\log(m)\, n_z) \) | \( O(\log(m)\, n_z) \) | Algebraic multigrid |
| Positive definite matrices | Provably halts | \( O(n_z) \) | Most Krylov iterative methods (unpreconditioned) |
| Diagonally dominant matrices | Provably halts | \( O(n_z) \) | Jacobi iteration, Jacobi-preconditioned Krylov methods |
| Sparse w/ no dense rows | Graph-theoretic bounds | Graph-theoretic bounds | Multifrontal LU, Cholesky, or QR |
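The banded row of the table is easy to see concretely. As a sketch (the tridiagonal matrix, its size, and the SuperLU options below are my own illustrative choices), factoring a banded matrix under the natural ordering keeps the LU factors inside the band, so factor storage stays \( O(n_z) \):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Tridiagonal (banded) matrix: 4 on the diagonal, -1 on the off-diagonals
m = 1000
A = sp.diags([np.full(m - 1, -1.0), np.full(m, 4.0), np.full(m - 1, -1.0)],
             [-1, 0, 1], format="csc")

# Natural ordering so SuperLU does not permute columns out of the band
lu = spla.splu(A, permc_spec="NATURAL")

# The L and U factors together hold only a small multiple of nnz(A) entries
nnz_factors = lu.L.nnz + lu.U.nnz
print(nnz_factors, A.nnz)

# Solves then cost O(n_z) per right-hand side
b = np.ones(m)
x = lu.solve(b)
```

For a general sparse matrix, by contrast, LU fill-in can be far larger than \( n_z \), which is exactly why the table is organized around special structure.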

This list could go on for pages, because researchers are always discovering new properties they can exploit to make solvers more efficient in their specific domain. What does not exist, however, is a general algorithm that accepts any sparse matrix \(A\) and precomputes a data structure representing \( A^{-1} \) with provably non-trivial bounds on memory and time complexity. (A trivial bound in this context would be simply to convert the sparse matrix to a dense matrix and use dense linear algebra.)

I will not solve that problem in this blog post, but I will solve a related one: given an algorithm for computing \( A^{-1} x \) which allegedly has nontrivial time and memory complexity bounds, find a family of counterexample sparse matrices \( A \) on which the bounds fail. Surprisingly, this receives very little attention despite how useful it could be for validating solver concepts and understanding the limitations of any given solver algorithm. Just as solver algorithms exploit specific matrix properties, work on solver counterexamples has largely focused on matrices from very narrow families, such as those arising in partial differential equations, structural mechanics, and so on.

I prefer to think about the linear algebraic properties that determine an algorithm's success or failure, rather than the domain a matrix came from. So I gathered a set of core properties of a sparse matrix that seem highly critical to the success of any solver, and built a matrix family in which each property can be mixed in or turned off. This allows for rapidly exploring the contours of a solver algorithm and seeing where it fails and why.

The family and an explanation of its parameters

The simple function below computes a random sparse matrix \(A = LDL^T + nsym \cdot S\), where \(L\) is a unit lower triangular matrix, \(D\) is a diagonal matrix whose diagonal entries are \(-1\) or \(1\), and \(S\) is an antisymmetric matrix. It accepts the following parameters for adjusting the sparsity and numerical properties of \(A\):

  1. m - size of \(A\)
  2. rho - probability that a diagonal entry of \(D\) is \(1\)
  3. nsym - scaling factor for \(S\) - determines how non-symmetric \(A\) becomes
  4. maxL - the largest possible value for an entry in \(L\) (besides its diagonal)
  5. bands - any list of unique integers from \(1,\dots,m-1\) - these determine the sparsity pattern of \(L\)
  6. rng - a numpy rng

These parameters have the following impacts:

  1. m - Changes the problem size
  2. rho - Any 0.0 < rho < 1.0 makes the system indefinite
  3. nsym - Determines nonsymmetry
  4. maxL - Impacts the condition number \( \kappa(A) \); bigger maxL results in a larger condition number
  5. bands - More bands mean more nonzeros

import numpy as np
import scipy.sparse as sp

def family(m, rho, nsym, maxL, bands, rng):
    # Make diagonal matrix with entries -1 and 1 (1 with probability rho)
    D = sp.diags(np.where(rng.uniform(size=m) < rho, 1.0, -1.0))
    # Make unit lower triangular factor of symmetric part of A
    L = sp.eye(m) + sp.diags([rng.uniform(0.0, maxL, size=m - b) for b in bands],
                             [-b for b in bands], shape=(m, m))
    # Make lower triangular part of antisymmetric part of A
    T = sp.diags([rng.uniform(0.0, 2.0, size=m - b) for b in bands],
                 [-b for b in bands], shape=(m, m))
    # Antisymmetrize: S = T - T^T
    S = T - T.T
    return L @ D @ L.T + nsym * S

Example usage, with some illustrative parameter choices:

rng = np.random.default_rng(0)
A = family(1000, 0.5, 1e-2, 10.0, [1, 5, 100], rng)
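As a quick sanity check on the construction, here is a dense sketch with small, illustrative parameters of my own choosing (maxL is kept small so the condition number stays tame). With rho = 1 and nsym = 0 the result should be symmetric positive definite, and by Sylvester's law of inertia any rho < 1 makes it indefinite, with exactly as many negative eigenvalues as \(-1\) entries in \(D\):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 30
# Dense sketch of the same construction; entries of L are kept small (<= 0.1)
# so the condition number stays tame and the eigenvalue checks are reliable
L = np.eye(m) + np.tril(rng.uniform(0.0, 0.1, (m, m)), k=-1)

# rho = 1, nsym = 0: A = L I L^T, which should be symmetric positive definite
A_spd = L @ L.T
assert np.allclose(A_spd, A_spd.T)
assert np.all(np.linalg.eigvalsh(A_spd) > 0)

# rho = 0.5: by Sylvester's law of inertia, A = L D L^T has exactly as many
# negative eigenvalues as D has -1 entries, so the system is indefinite
d = np.where(rng.uniform(size=m) < 0.5, 1.0, -1.0)
A_indef = L @ np.diag(d) @ L.T
eigs = np.linalg.eigvalsh(A_indef)
assert (eigs > 0).sum() == (d > 0).sum()
assert (eigs < 0).sum() == (d < 0).sum()
```

The inertia argument is also why rho controls definiteness independently of everything else: \(L\) is invertible, so the congruence \(L D L^T\) preserves the eigenvalue signs of \(D\).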

So how do we use this family? Let me summarize what each of these parameters will do to a candidate solver:

  1. m - Use this to test a solver's dependence on problem size
  2. rho - Many iterative solvers fail on indefinite systems; rho=0.5 is a "worst case scenario" for many solvers
  3. nsym - High nonsymmetry can break solvers which achieve speed and memory savings by forgoing pivoting
  4. maxL - Some solvers work only within a certain regime of condition numbers (but do have reasonable expectations here; conditioning is not usually a solvable problem)
  5. bands - Many recent solvers rely on data-sparse formats to achieve their memory bounds; evenly spaced bands, including some large offsets, tend to break these solvers
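
To see the rho knob in action, here is a sketch that runs SciPy's unpreconditioned CG on one definite and one indefinite draw. It inlines a simplified, symmetric-only version of the family (nsym = 0, modest maxL), and the parameter values are my own illustrative choices:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(1)
m, bands = 500, [1, 7]

def sample(rho):
    # Simplified inline version of the family with nsym = 0 and modest maxL
    d = np.where(rng.uniform(size=m) < rho, 1.0, -1.0)
    L = sp.eye(m) + sp.diags(
        [rng.uniform(0.0, 0.5, size=m - b) for b in bands],
        [-b for b in bands], shape=(m, m))
    return (L @ sp.diags(d) @ L.T).tocsc()

b = np.ones(m)

# rho = 1.0: symmetric positive definite, so CG should converge (info == 0)
A_spd = sample(1.0)
x, info = spla.cg(A_spd, b, maxiter=1000)
print("rho=1.0: info =", info,
      "residual =", np.linalg.norm(A_spd @ x - b))

# rho = 0.5: indefinite, which violates CG's assumptions
A_ind = sample(0.5)
y, info_ind = spla.cg(A_ind, b, maxiter=1000)
print("rho=0.5: info =", info_ind,
      "residual =", np.linalg.norm(A_ind @ y - b))
```

The same scan pattern works for any knob: fix the others, sweep one parameter, and record where the solver's iteration count, accuracy, or memory use departs from its advertised bound.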

Once you have found ranges for these inputs that cause a solver algorithm to fail to live up to its promises, you can generate infinitely many such matrices. This could be helpful to prove a point, but more importantly it lets you keep pushing on the limits of what is possible for a general sparse solver algorithm. If you find a single algorithm which works extremely well on every single matrix this function outputs (within reasonable condition number ranges), then please brag about it and tag me on Twitter or send me an email. I would be thrilled to see what you came up with.