Warning: this is preliminary exploratory work. I outlined a research program around sparse QR factorization, and I aim to answer its questions in small steps. The results I show here are not definitive and only apply to small special cases, but they suggest there is value in continuing on to harder and larger cases. Here I show that we can compress a QR factorization by exploiting a common property of matrix factorizations (not just QR): when you factorize a sparse matrix, the resulting factors often have low-rank (and thus compressible) off-diagonal blocks. I did not invent or discover this property; see this GitHub repo and the references therein, as well as a significant amount of work by researcher Jack Poulson, for a wide variety of practical applications of this principle to LU and Cholesky factorizations. To my knowledge, however, it has not been deeply explored for the QR factorization of sparse matrices. I do that here for a family of sparse matrices and show reasonable approximation quality. I also show that a nested dissection ordering of the matrix results in better compressibility with this block-low-rank technique.
Given a matrix \( \mathbf{A} \), its QR factorization consists of an orthogonal matrix \( \mathbf{Q} \) and an upper triangular matrix \( \mathbf{R} \) such that
\[ \mathbf{A} = \mathbf{Q}\mathbf{R} \]
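As a quick illustration (separate from the experiments below), a dense QR factorization can be computed with numpy; this is just a minimal sketch verifying the defining properties on a small random matrix, not the code used later in the post.

    import numpy as np

    A = np.random.randn(6, 6)
    Q, R = np.linalg.qr(A)                   # Q orthogonal, R upper triangular
    print(np.allclose(Q @ R, A))             # A = QR
    print(np.allclose(Q.T @ Q, np.eye(6)))   # Q^T Q = I (orthogonality)
    print(np.allclose(R, np.triu(R)))        # R is upper triangular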
Ordinarily one does not store the full matrix \(\mathbf{Q}\), especially with a sparse input matrix \( \mathbf{A} \). Instead, \(\mathbf{Q}\) is typically stored as a sequence of Householder reflectors, which retain some sparsity and thus save memory. Similarly, the matrix \( \mathbf{R} \) inherits sparsity from \( \mathbf{A} \). I propose here that we can achieve memory savings by forming low-rank approximations of the off-diagonal blocks of both \( \mathbf{R} \) and \( \mathbf{Q} \), where \( \mathbf{Q} \) is stored not as Householder reflectors but as the full matrix. In the diagram below I show off-diagonal blocks in yellow and diagonal blocks in black. When \( \mathbf{A} \) is sparse the yellow blocks are often numerically low-rank, as their SVD reveals. I explain next how I test whether these off-diagonal blocks can indeed be meaningfully compressed.
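To make that test concrete, here is a small hedged sketch of the kind of check involved: take an off-diagonal block of a factor, look at its singular values, and count how many exceed a tolerance. The block shown in the usage comment is hypothetical; any off-diagonal block of \( \mathbf{Q} \) or \( \mathbf{R} \) could be passed in.

    import numpy as np

    def numerical_rank(block, tol=1e-10):
        # Singular values below tol (relative to the largest) are treated as zero.
        s = np.linalg.svd(block, compute_uv=False)
        if s.size == 0 or s[0] == 0.0:
            return 0
        return int(np.sum(s > tol * s[0]))

    # e.g. numerical_rank(Q[0:16, 16:32]) for a 16x16 off-diagonal block of Q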
I construct a family of matrices based on a simple 2D grid and stencil calculation. I sample randomly from this family, and for each sample I form the QR factorization, loop through all off-diagonal blocks of the factors \( \mathbf{Q},\mathbf{R} \), and compress them with a fixed-rank SVD. I compute the resulting approximation error, the loss of orthogonality of the approximated \( \mathbf{Q} \), and the factorization error. I then do the same thing after performing a nested dissection reordering of \( \mathbf{A} \). The key parameters of this experiment are the grid dimensions \( n_x, n_y \), the stencil coefficients, the approximation rank \( k \), the block size \( b \), and whether or not nested dissection is applied.
The family of matrices is generated with the following Python code
import numpy as np

def make_matrix(nx, ny, coeffs):
    # Dense matrix for a 5-point stencil on an nx-by-ny grid.
    m = nx*ny
    A = np.zeros((m, m))
    def id(ix, iy):
        # Map 2D grid coordinates to a linear index.
        return ix + nx*iy
    for ix in range(0, nx):
        for iy in range(0, ny):
            A[id(ix, iy), id(ix, iy)] = coeffs[0]
            if ix > 0:
                A[id(ix-1, iy), id(ix, iy)] = coeffs[1]
            if ix < nx-1:
                A[id(ix+1, iy), id(ix, iy)] = coeffs[2]
            if iy > 0:
                A[id(ix, iy-1), id(ix, iy)] = coeffs[3]
            if iy < ny-1:
                A[id(ix, iy+1), id(ix, iy)] = coeffs[4]
    return A
The family of matrices defined by the function make_matrix is fully determined by the parameters nx, ny, and coeffs.
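For example, a single random member of the family might be drawn like this. This is a hedged sketch: the particular distribution over coeffs is an assumption on my part, not necessarily the one used to generate the results below.

    # Draw one random sample from the family on a 16x16 grid.
    rng = np.random.default_rng()
    coeffs = rng.standard_normal(5)      # hypothetical choice of stencil coefficients
    A = make_matrix(16, 16, coeffs)      # 256 x 256 sparse-structured matrix
    Q, R = np.linalg.qr(A)               # factors to be compressed block-by-block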
I form the block-low-rank approximation with the following two functions (one for \( \mathbf{Q} \) and another for \( \mathbf{R} \)).
import numpy.linalg as la

def block_approx(Q, rank, blocksize):
    k = rank
    b = blocksize
    m, _ = Q.shape
    P = Q.copy()
    for i in range(0, m, b):
        for j in range(0, m, b):
            begi = i
            endi = min(m, i+b)
            begj = j
            endj = min(m, j+b)
            if i != j:
                # Extract off-diagonal block
                Qb = Q[np.ix_(range(begi, endi), range(begj, endj))]
                # Compute SVD
                U, s, VT = la.svd(Qb)
                S = np.diag(s)
                # Form rank-k approximation
                Lk = U[:, 0:k] @ S[0:k, 0:k] @ VT[0:k, :]
                # Replace corresponding block with the approximation
                P[begi:endi, begj:endj] = Lk
    return P
The variant below is for the \( \mathbf{R} \) factor, which is upper triangular and thus has all zeros below the main diagonal, so only blocks strictly above the diagonal are compressed.
def block_approx_triangular(Q, rank, blocksize):
    k = rank
    b = blocksize
    m, _ = Q.shape
    P = Q.copy()
    for i in range(0, m, b):
        for j in range(0, m, b):
            begi = i
            endi = min(m, i+b)
            begj = j
            endj = min(m, j+b)
            if j > i:
                # Extract off-diagonal block strictly above the diagonal
                Qb = Q[np.ix_(range(begi, endi), range(begj, endj))]
                # Compute SVD
                U, s, VT = la.svd(Qb)
                S = np.diag(s)
                # Form rank-k approximation
                Lk = U[:, 0:k] @ S[0:k, 0:k] @ VT[0:k, :]
                # Replace corresponding block with the approximation
                P[begi:endi, begj:endj] = Lk
    return P
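The quantities reported below can be computed per sample roughly as follows. This is a hedged sketch assuming A, Q, R and the parameters k (rank) and b (block size) are already defined as above; the exact norms and normalizations used for the histograms are assumptions on my part.

    # Compress the factors of one sample and measure the errors.
    Pq = block_approx(Q, k, b)              # approximated Q
    Pr = block_approx_triangular(R, k, b)   # approximated R

    approx_err_Q = la.norm(Q - Pq) / la.norm(Q)              # approximation error for Q
    approx_err_R = la.norm(R - Pr) / la.norm(R)              # approximation error for R
    ortho_loss   = la.norm(Pq.T @ Pq - np.eye(Q.shape[0]))   # loss of orthogonality
    factor_err   = la.norm(Pq @ Pr - A) / la.norm(A)         # factorization error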
For all of these results I draw 1000 random samples from the sparse matrix family defined by the Python code above and perform the indicated calculation both with and without a nested dissection permutation. I show the results as histograms so that, in addition to seeing whether the approximations work, we can also see the impact of the nested dissection ordering.
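For reference, one simple way to build a nested dissection permutation for this regular grid is recursive bisection, ordering each separator after the two halves it divides. This is a hedged sketch of that idea, not necessarily the exact ordering routine used for the experiments; the grid size 16 matches the parameters below.

    def nested_dissection(cells, nx):
        # cells: list of (ix, iy) grid coordinates; nx: grid width used for linear indexing.
        if len(cells) <= 4:
            return [ix + nx*iy for (ix, iy) in cells]
        xs = [c[0] for c in cells]
        ys = [c[1] for c in cells]
        if max(xs) - min(xs) >= max(ys) - min(ys):
            cut = (max(xs) + min(xs)) // 2
            left  = [c for c in cells if c[0] <  cut]
            sep   = [c for c in cells if c[0] == cut]
            right = [c for c in cells if c[0] >  cut]
        else:
            cut = (max(ys) + min(ys)) // 2
            left  = [c for c in cells if c[1] <  cut]
            sep   = [c for c in cells if c[1] == cut]
            right = [c for c in cells if c[1] >  cut]
        # Order the two halves first, then the separator last, recursively.
        return (nested_dissection(left, nx) + nested_dissection(right, nx)
                + [ix + nx*iy for (ix, iy) in sep])

    perm = nested_dissection([(ix, iy) for iy in range(16) for ix in range(16)], 16)
    A_nd = A[np.ix_(perm, perm)]   # symmetrically permuted matrix; factorize A_nd instead of A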
I use the following notation in the figure labels:
These results show how well the block-low-rank approximation works for the \( \mathbf{Q} \) factor, both with and without nested dissection.
Parameters | Result |
---|---|
\(k=1,n_x=n_y=16,b=16\) | |
\(k=4,n_x=n_y=16,b=16\) | |
\(k=8,n_x=n_y=16,b=16\) | |
These show how well the block-low-rank approximation works for the \( \mathbf{R} \) factor, both with and without nested dissection.
Parameters | Result |
---|---|
\(k=1,n_x=n_y=16,b=16\) | |
\(k=4,n_x=n_y=16,b=16\) | |
\(k=8,n_x=n_y=16,b=16\) | |
These show how well the block-low-rank approximation maintains orthogonality of the \( \mathbf{Q} \) factor, both with and without nested dissection.
Parameters | Result |
---|---|
\(k=1,n_x=n_y=16,b=16\) | |
\(k=4,n_x=n_y=16,b=16\) | |
\(k=8,n_x=n_y=16,b=16\) | |
These show how well the block-low-rank approximation retains the factorization \( \mathbf{Q}\mathbf{R}=\mathbf{A} \).
Parameters | Result |
---|---|
\(k=1,n_x=n_y=16,b=16\) | |
\(k=4,n_x=n_y=16,b=16\) | |
\(k=8,n_x=n_y=16,b=16\) | |
The simple family of sparse matrices from which \( \mathbf{A} \) is drawn appears to permit block-low-rank approximations to its factors. These approximations naturally improve as the rank increases, but the nested dissection reordering also appears to improve both the reliability and the quality of the approximations.
I was curious whether block-low-rank approximation could be applied to the QR factorization of a sparse matrix, but I wasn’t interested in this for its own sake. If the approximation works, which it appears to, can it be used as a preconditioner for a Krylov method? That will be the topic of my next post.
The block-low-rank approximation does appear to work, but this was a very early exploration of the idea. I believe the quality of the approximation could be significantly improved by directly exploiting the tree-like structure one gets from nested dissection, rather than simply choosing an arbitrary block size (I chose \(b=16\) in all of the experiments above). It is frequently the case that off-diagonal blocks indexed by nested-dissection separators have especially good low-rank properties, but that may not be true for the QR factorization.