A Sparse Matrix Family to Confound All Solvers

August 29, 2022

Suppose a sparse matrix \( A \in \mathbb{R}^{m \times m} \) has \( n_z \) nonzeros. Trivially, storing \( A \) takes \( O(n_z) \) memory, and a matrix-vector product \( Ax \) takes only \( O(n_z) \) time. But what about the inverse of \( A \)? It has proven extremely challenging to practically bound the memory and time complexity of \( A^{-1} x \) based only on the sparsity pattern and numerical information available from \( A \). Instead, we have bounds for specific algorithms for computing \( A^{-1} x \), and each of these algorithms works only for a very specific family of matrices. A non-exhaustive sampling:

| Sparse matrix family | \(A^{-1}x\) time complexity | \(A^{-1}x\) memory complexity | Algorithm |
| --- | --- | --- | --- |
| Banded matrices | \( O(n_z) \) | \( O(n_z) \) | Banded LU factorization, data-sparse decompositions |
| M-matrices | \( O(\log(m)\, n_z) \) | \( O(\log(m)\, n_z) \) | Algebraic multigrid |
| Positive definite matrices | Provably halts | \( O(n_z) \) | Most Krylov iterative methods (unpreconditioned) |
| Diagonally dominant matrices | Provably halts | \( O(n_z) \) | Jacobi iteration, Jacobi-preconditioned Krylov methods |
| Sparse w/ no dense rows | Graph-theoretic bounds | Graph-theoretic bounds | Multifrontal LU, Cholesky, or QR |
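The banded row of the table is easy to see concretely. As a sketch (the tridiagonal matrix, its size, and the SuperLU options below are my own illustrative choices), factoring a banded matrix under the natural ordering keeps the LU factors inside the band, so factor storage stays \( O(n_z) \):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Tridiagonal (banded) matrix: 4 on the diagonal, -1 on the off-diagonals
m = 1000
A = sp.diags([np.full(m - 1, -1.0), np.full(m, 4.0), np.full(m - 1, -1.0)],
             [-1, 0, 1], format="csc")

# Natural ordering so SuperLU does not permute columns out of the band
lu = spla.splu(A, permc_spec="NATURAL")

# The L and U factors together hold only a small multiple of nnz(A) entries
nnz_factors = lu.L.nnz + lu.U.nnz
print(nnz_factors, A.nnz)

# Solves then cost O(n_z) per right-hand side
b = np.ones(m)
x = lu.solve(b)
```

For a general sparse matrix, by contrast, LU fill-in can be far larger than \( n_z \), which is exactly why the table is organized around special structure.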

This list could go on for pages, because researchers are always discovering new properties they can exploit to make solvers more efficient in their specific domain. What does not exist, however, is a general algorithm that accepts any sparse matrix \(A\) and precomputes a data structure representing \( A^{-1} \) with provably non-trivial bounds on memory and time complexity. (A trivial bound in this context would be simply to convert the sparse matrix to a dense matrix and use dense linear algebra.)

I will not solve that problem in this blog post, but I will solve a related one: given an algorithm for computing \( A^{-1} x \) which allegedly has nontrivial time and memory complexity bounds, find a family of counterexample sparse matrices \( A \) on which the bounds fail. Surprisingly, this receives very little attention despite how useful it could be for validating solver concepts and understanding the limitations of any given solver algorithm. Just as solver algorithms exploit specific matrix properties, work on solver counterexamples has largely focused on matrices from very narrow families, such as those arising in partial differential equations, structural mechanics, and so on.

I prefer to think about the linear algebraic properties that determine an algorithm's success or failure, rather than the domain a matrix came from. So I gathered a set of core properties of a sparse matrix that seem highly critical to the success of any solver, and built a matrix family in which each property can be mixed in or turned off. This allows for rapidly exploring the contours of a solver algorithm and seeing where it fails and why.

The family and an explanation of its parameters

The simple function below computes a random sparse matrix \(A = LDL^T + nsym \cdot S\), where \(L\) is a unit lower triangular matrix, \(D\) is a diagonal matrix whose diagonal entries are \(-1\) or \(1\), and \(S\) is an antisymmetric matrix. It accepts the following parameters for adjusting the sparsity and numerical properties of \(A\):

  1. m - size of \(A\)
  2. rho - probability that a diagonal entry of \(D\) is \(1\)
  3. nsym - scaling factor for \(S\) - determines how non-symmetric \(A\) becomes
  4. maxL - the largest possible value for an entry in \(L\) (besides its diagonal)
  5. bands - any list of unique integers from \(1,\dots,m-1\) - these determine the sparsity pattern of \(L\)
  6. rng - a numpy rng

These parameters have the following impacts:

  1. m - Changes the problem size
  2. rho - Any 0.0 < rho < 1.0 makes the system indefinite
  3. nsym - Determines nonsymmetry
  4. maxL - Impacts the condition number \( \kappa(A) \); bigger maxL results in a larger condition number
  5. bands - More bands mean more nonzeros

import numpy as np
import scipy.sparse as sp

def family(m, rho, nsym, maxL, bands, rng):
    # Make diagonal matrix with entries -1 and 1 (1 with probability rho)
    D = sp.diags(np.where(rng.uniform(size=m) < rho, 1.0, -1.0))
    # Make unit lower triangular factor of symmetric part of A
    L = sp.eye(m) + sp.diags([rng.uniform(0.0, maxL, size=m - b) for b in bands],
                             [-b for b in bands], shape=(m, m))
    # Make lower triangular part of antisymmetric part of A
    T = sp.diags([rng.uniform(0.0, 2.0, size=m - b) for b in bands],
                 [-b for b in bands], shape=(m, m))
    # Antisymmetrize: S = T - T^T
    S = T - T.T
    return L @ D @ L.T + nsym * S

Example usage, with some illustrative parameter choices:

rng = np.random.default_rng(0)
A = family(1000, 0.5, 1e-2, 10.0, [1, 5, 100], rng)
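As a quick sanity check on the construction, here is a dense sketch with small, illustrative parameters of my own choosing (maxL is kept small so the condition number stays tame). With rho = 1 and nsym = 0 the result should be symmetric positive definite, and by Sylvester's law of inertia any rho < 1 makes it indefinite, with exactly as many negative eigenvalues as \(-1\) entries in \(D\):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 30
# Dense sketch of the same construction; entries of L are kept small (<= 0.1)
# so the condition number stays tame and the eigenvalue checks are reliable
L = np.eye(m) + np.tril(rng.uniform(0.0, 0.1, (m, m)), k=-1)

# rho = 1, nsym = 0: A = L I L^T, which should be symmetric positive definite
A_spd = L @ L.T
assert np.allclose(A_spd, A_spd.T)
assert np.all(np.linalg.eigvalsh(A_spd) > 0)

# rho = 0.5: by Sylvester's law of inertia, A = L D L^T has exactly as many
# negative eigenvalues as D has -1 entries, so the system is indefinite
d = np.where(rng.uniform(size=m) < 0.5, 1.0, -1.0)
A_indef = L @ np.diag(d) @ L.T
eigs = np.linalg.eigvalsh(A_indef)
assert (eigs > 0).sum() == (d > 0).sum()
assert (eigs < 0).sum() == (d < 0).sum()
```

The inertia argument is also why rho controls definiteness independently of everything else: \(L\) is invertible, so the congruence \(L D L^T\) preserves the eigenvalue signs of \(D\).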

So how do we use this family? Let me summarize what each of these parameters will do to a candidate solver:

  1. m - Use this to test a solver's dependence on problem size
  2. rho - Many iterative solvers fail on indefinite systems; rho=0.5 is a "worst case scenario" for many solvers
  3. nsym - High nonsymmetry can break solvers which achieve speed and memory savings by forgoing pivoting
  4. maxL - Some solvers work only within a certain regime of condition numbers (but do have reasonable expectations here; conditioning is not usually a solvable problem)
  5. bands - Many recent solvers rely on data-sparse formats to achieve their memory bounds; evenly spaced bands, including some large offsets, tend to break these solvers
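
To see the rho knob in action, here is a sketch that runs SciPy's unpreconditioned CG on one definite and one indefinite draw. It inlines a simplified, symmetric-only version of the family (nsym = 0, modest maxL), and the parameter values are my own illustrative choices:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(1)
m, bands = 500, [1, 7]

def sample(rho):
    # Simplified inline version of the family with nsym = 0 and modest maxL
    d = np.where(rng.uniform(size=m) < rho, 1.0, -1.0)
    L = sp.eye(m) + sp.diags(
        [rng.uniform(0.0, 0.5, size=m - b) for b in bands],
        [-b for b in bands], shape=(m, m))
    return (L @ sp.diags(d) @ L.T).tocsc()

b = np.ones(m)

# rho = 1.0: symmetric positive definite, so CG should converge (info == 0)
A_spd = sample(1.0)
x, info = spla.cg(A_spd, b, maxiter=1000)
print("rho=1.0: info =", info,
      "residual =", np.linalg.norm(A_spd @ x - b))

# rho = 0.5: indefinite, which violates CG's assumptions
A_ind = sample(0.5)
y, info_ind = spla.cg(A_ind, b, maxiter=1000)
print("rho=0.5: info =", info_ind,
      "residual =", np.linalg.norm(A_ind @ y - b))
```

The same scan pattern works for any knob: fix the others, sweep one parameter, and record where the solver's iteration count, accuracy, or memory use departs from its advertised bound.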

Once you have found ranges for these inputs that cause a solver algorithm to fail to live up to its promises, you can generate infinitely many such matrices. This could be helpful to prove a point, but more importantly it lets you keep pushing on the limits of what is possible for a general sparse solver algorithm. If you find a single algorithm which works extremely well on every single matrix this function outputs (within reasonable condition number ranges), then please brag about it and tag me on Twitter or send me an email. I would be thrilled to see what you came up with.