# Compressing the Sparse QR Factorization with Nested Dissection and Block Low-Rank Approximation

February 03, 2021

Warning: this is preliminary, exploratory work. I have outlined a research program around sparse QR factorization and aim to answer its questions in small steps. The results shown here are not definitive and apply only to small special cases, but they suggest there is value in continuing on to harder and larger cases. I show that we can compress a QR factorization by exploiting a property common to matrix factorizations (not just QR): when you factorize a sparse matrix, the resulting factors tend to have low-rank (and thus compressible) off-diagonal blocks. I did not invent or discover this property; see this GitHub repo and the references therein, as well as a significant body of work by researcher Jack Poulson applying this principle to LU and Cholesky factorizations in a wide variety of practical settings. To my knowledge, however, it has not been deeply explored for the QR factorization of sparse matrices. I do that here for a family of sparse matrices and show reasonable approximation quality. I also show that a nested dissection ordering of the matrix improves compressibility under this block low-rank technique.

# Block low-rank approximation of QR factors

Given a matrix $$\mathbf{A}$$, its QR factorization consists of an orthogonal matrix $$\mathbf{Q}$$ and an upper triangular matrix $$\mathbf{R}$$ such that

$$\mathbf{A} = \mathbf{Q}\mathbf{R}$$

Ordinarily one does not store the full matrix $$\mathbf{Q}$$, especially with a sparse input matrix $$\mathbf{A}$$. It is typically stored instead as a sequence of Householder reflectors, which retains some sparsity and thus saves memory. Similarly, the matrix $$\mathbf{R}$$ inherits sparsity from the sparsity of $$\mathbf{A}$$. I propose here that we can achieve memory savings by forming low-rank approximations of the off-diagonal blocks of both $$\mathbf{R}$$ and $$\mathbf{Q}$$, where $$\mathbf{Q}$$ is stored not as Householder reflectors but as the full matrix. In the diagram below, I show off-diagonal blocks in yellow and diagonal blocks in black. When $$\mathbf{A}$$ is sparse, the yellow blocks are often low-rank, as revealed for example by their SVD. I explain next how I test whether these off-diagonal blocks can indeed be meaningfully compressed.
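As a quick sanity check of the premise, here is a small standalone example (my own illustration, separate from the experiments below): factorize a sparse banded matrix and truncate the SVD of one off-diagonal block of $$\mathbf{Q}$$. By the Eckart-Young theorem, the Frobenius error of a rank-$$k$$ truncation equals the norm of the discarded singular values, so the block is compressible whenever its singular values decay quickly.

```python
import numpy as np

# Illustrative example: a random tridiagonal (sparse, banded) matrix.
rng = np.random.default_rng(0)
m, b, k = 64, 16, 4
A = (np.diag(rng.uniform(1.0, 2.0, m))
     + np.diag(rng.uniform(0.1, 0.5, m - 1), 1)
     + np.diag(rng.uniform(0.1, 0.5, m - 1), -1))
Q, R = np.linalg.qr(A)

block = Q[0:b, b:2*b]                       # one off-diagonal block of Q
U, s, VT = np.linalg.svd(block)
Lk = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]  # rank-k truncation

# Truncation error equals the norm of the discarded singular values,
# so fast singular-value decay means good compression.
err = np.linalg.norm(block - Lk)
print(err, np.sqrt(np.sum(s[k:] ** 2)))
```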

# Experiment Setup

I construct a family of matrices based on a simple 2D grid and stencil calculation. I draw random samples from this family; for each sample I form the QR factorization, then loop through all off-diagonal blocks of the factors $$\mathbf{Q},\mathbf{R}$$ and compress them with a fixed-rank SVD. I compute the resulting approximation error, the loss of orthogonality in the approximated $$\mathbf{Q}$$, and the factorization error. I then repeat the process after performing a nested dissection reordering of $$\mathbf{A}$$. The key parameters of this experiment are as follows:

$$\begin{aligned} n_x &: \text{ First extent of 2D grid } \\ n_y &: \text{ Second extent of 2D grid } \\ m=n_x n_y &: \text{ Total number of unknowns (number of columns of A) } \\ b &: \text{ Block size } \\ k &: \text{ Desired rank for SVD compression } \end{aligned}$$

The family of matrices is generated with the following Python code

```python
import numpy as np

def make_matrix(nx, ny, coeffs):
    m = nx * ny
    A = np.zeros((m, m))

    def id(ix, iy):
        return ix + nx * iy

    for ix in range(0, nx):
        for iy in range(0, ny):
            A[id(ix, iy), id(ix, iy)] = coeffs[0]
            if ix > 0:
                A[id(ix - 1, iy), id(ix, iy)] = coeffs[1]
            if ix < nx - 1:
                A[id(ix + 1, iy), id(ix, iy)] = coeffs[2]
            if iy > 0:
                A[id(ix, iy - 1), id(ix, iy)] = coeffs[3]
            if iy < ny - 1:
                A[id(ix, iy + 1), id(ix, iy)] = coeffs[4]
    return A
```


The family of matrices defined by the function make_matrix is fully determined by the parameters nx, ny, and coeffs.
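As an aside, the same family can be written with Kronecker products, which makes the 2D tensor structure explicit. This is my own reformulation, and since the post does not state the coefficient distribution used for sampling, the random draw below is only illustrative.

```python
import numpy as np

def make_matrix_kron(nx, ny, coeffs):
    # Equivalent to make_matrix above: x-coupling acts within each grid row,
    # y-coupling acts between rows, under the ordering id(ix, iy) = ix + nx*iy.
    Ix, Iy = np.eye(nx), np.eye(ny)
    Tx = (coeffs[0] * Ix
          + coeffs[1] * np.diag(np.ones(nx - 1), 1)
          + coeffs[2] * np.diag(np.ones(nx - 1), -1))
    Sy = (coeffs[3] * np.diag(np.ones(ny - 1), 1)
          + coeffs[4] * np.diag(np.ones(ny - 1), -1))
    return np.kron(Iy, Tx) + np.kron(Sy, Ix)

# Illustrative random draw (hypothetical distribution, not the one used in
# the experiments): dominant diagonal, smaller stencil coefficients.
rng = np.random.default_rng(0)
coeffs = np.concatenate(([4.0], rng.uniform(-1.0, 1.0, 4)))
A = make_matrix_kron(16, 16, coeffs)
```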

I form the block low-rank approximation with the following two functions (one for $$\mathbf{Q}$$ and another for $$\mathbf{R}$$).

```python
import numpy as np
from numpy import linalg as la

def block_approx(Q, rank, blocksize):
    k = rank
    b = blocksize
    m, _ = Q.shape
    P = Q.copy()
    for i in range(0, m, b):
        for j in range(0, m, b):
            begi, endi = i, min(m, i + b)
            begj, endj = j, min(m, j + b)
            if i != j:
                # extract the off-diagonal block
                Qb = Q[np.ix_(range(begi, endi), range(begj, endj))]
                # compute its SVD
                U, s, VT = la.svd(Qb)
                S = np.diag(s)
                # form the rank-k approximation
                Lk = U[:, 0:k] @ S[0:k, 0:k] @ VT[0:k, :]
                # replace the corresponding block of Q with the approximation
                P[begi:endi, begj:endj] = Lk
    return P

# For the R factor, which is upper triangular and thus has all zeros below
# the main diagonal, only the blocks strictly above the diagonal are compressed.
def block_approx_triangular(Q, rank, blocksize):
    k = rank
    b = blocksize
    m, _ = Q.shape
    P = Q.copy()
    for i in range(0, m, b):
        for j in range(0, m, b):
            begi, endi = i, min(m, i + b)
            begj, endj = j, min(m, j + b)
            if j > i:
                # extract the off-diagonal block
                Qb = Q[np.ix_(range(begi, endi), range(begj, endj))]
                # compute its SVD
                U, s, VT = la.svd(Qb)
                S = np.diag(s)
                # form the rank-k approximation
                Lk = U[:, 0:k] @ S[0:k, 0:k] @ VT[0:k, :]
                # replace the corresponding block with the approximation
                P[begi:endi, begj:endj] = Lk
    return P
```
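The experiments below report approximation error, loss of orthogonality, and factorization error. The exact norms are not stated in the text, so the sketch below is my assumption (Frobenius norm throughout):

```python
import numpy as np

# Sketch of the three kinds of quantities reported in the results
# (Frobenius norm assumed; the exact norms used are an assumption here):
#   - relative approximation errors for Q and R,
#   - loss of orthogonality of the approximate Q,
#   - relative factorization error.
def blr_errors(A, Q, R, P, Rhat):
    errQ = np.linalg.norm(Q - P) / np.linalg.norm(Q)
    errR = np.linalg.norm(R - Rhat) / np.linalg.norm(R)
    orth = np.linalg.norm(P.T @ P - np.eye(P.shape[0]))  # zero iff P is orthogonal
    fact = np.linalg.norm(P @ Rhat - A) / np.linalg.norm(A)
    return errQ, errR, orth, fact
```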


# Results

For all of these results I draw 1000 random samples from the sparse-matrix family defined above and perform the indicated calculation both with and without the nested dissection permutation. I show the results as histograms so that, in addition to seeing whether the approximations work, we can also see the impact of the nested dissection ordering.
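For reference, the nested dissection permutation for a 2D grid can be sketched as below. This is my own minimal recursive implementation (a production code would use a library such as METIS); it recursively splits the grid with a separator line and orders separator vertices last.

```python
import numpy as np

def nested_dissection_order(nx, ny):
    """Recursive nested dissection ordering for an nx-by-ny grid with
    unknown id(ix, iy) = ix + nx*iy. Separator vertices come last."""
    def order(ix0, ix1, iy0, iy1):
        w, h = ix1 - ix0, iy1 - iy0
        if w <= 0 or h <= 0:
            return []
        if w <= 2 and h <= 2:   # small box: stop recursing
            return [ix + nx * iy for iy in range(iy0, iy1)
                                 for ix in range(ix0, ix1)]
        if w >= h:              # split along x with a vertical separator
            mid = (ix0 + ix1) // 2
            sep = [mid + nx * iy for iy in range(iy0, iy1)]
            return (order(ix0, mid, iy0, iy1)
                    + order(mid + 1, ix1, iy0, iy1) + sep)
        else:                   # split along y with a horizontal separator
            mid = (iy0 + iy1) // 2
            sep = [ix + nx * mid for ix in range(ix0, ix1)]
            return (order(ix0, ix1, iy0, mid)
                    + order(ix0, ix1, mid + 1, iy1) + sep)
    return order(0, nx, 0, ny)

# The permutation is applied symmetrically before factorizing:
# perm = nested_dissection_order(nx, ny); Ap = A[np.ix_(perm, perm)]
```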

I use the following notation in the figure labels:

$$\begin{aligned} n_x &: \text{ First extent of 2D grid } \\ n_y &: \text{ Second extent of 2D grid } \\ m=n_x n_y &: \text{ Total number of unknowns (number of columns of A) } \\ b &: \text{ Block size } \\ k &: \text{ Desired rank for SVD compression } \\ \mathbf{A} &: \text{ A matrix from the sparse-matrix family described earlier} \\ \mathbf{R} &: \text{ Exact R factor from the QR factorization} \\ \mathbf{Q} &: \text{ Exact Q factor from the QR factorization} \\ \mathbf{\hat{R}} \approx \mathbf{R} &: \text{ Block low-rank approximation of R} \\ \mathbf{P} \approx \mathbf{Q} &: \text{ Block low-rank approximation of Q} \end{aligned}$$

## Approximation error for Q

These results show how well the block low-rank approximation works for the $$\mathbf{Q}$$ factor, both with and without nested dissection.

| Parameters | Result |
| --- | --- |
| $$k=1,n_x=n_y=16,b=16$$ | (histogram figure) |
| $$k=4,n_x=n_y=16,b=16$$ | (histogram figure) |
| $$k=8,n_x=n_y=16,b=16$$ | (histogram figure) |

## Approximation error for R

These show how well the block low-rank approximation works for the $$\mathbf{R}$$ factor, both with and without nested dissection.

| Parameters | Result |
| --- | --- |
| $$k=1,n_x=n_y=16,b=16$$ | (histogram figure) |
| $$k=4,n_x=n_y=16,b=16$$ | (histogram figure) |
| $$k=8,n_x=n_y=16,b=16$$ | (histogram figure) |

## Loss of orthogonality in approximated Q

These show how well the block low-rank approximation maintains orthogonality of the $$\mathbf{Q}$$ factor, both with and without nested dissection.

| Parameters | Result |
| --- | --- |
| $$k=1,n_x=n_y=16,b=16$$ | (histogram figure) |
| $$k=4,n_x=n_y=16,b=16$$ | (histogram figure) |
| $$k=8,n_x=n_y=16,b=16$$ | (histogram figure) |

## Factorization error

These show how well the block low-rank approximation preserves the factorization $$\mathbf{A}=\mathbf{Q}\mathbf{R}$$, i.e., how close $$\mathbf{P}\mathbf{\hat{R}}$$ is to $$\mathbf{A}$$.

| Parameters | Result |
| --- | --- |
| $$k=1,n_x=n_y=16,b=16$$ | (histogram figure) |
| $$k=4,n_x=n_y=16,b=16$$ | (histogram figure) |
| $$k=8,n_x=n_y=16,b=16$$ | (histogram figure) |

# Conclusion

The simple family of sparse matrices from which $$\mathbf{A}$$ is drawn appears to permit block low-rank approximations to its QR factors. These approximations improve as the rank increases (naturally), and the nested dissection reordering also appears to improve both the reliability and the quality of the approximations.

## Next Steps: using as preconditioner

I was curious whether block low-rank approximation could be applied to the QR factorization of a sparse matrix, but I wasn't interested in this for its own sake. If the approximation works, which it appears to, can it be used as a preconditioner for a Krylov method? That will be the topic of my next post.
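The idea can be sketched briefly (my own sketch, not results from the next post): if $$\mathbf{P}\mathbf{\hat{R}} \approx \mathbf{A}$$, then $$\mathbf{r} \mapsto \mathbf{\hat{R}}^{-1}\mathbf{P}^T\mathbf{r}$$ approximates $$\mathbf{A}^{-1}$$ and can serve as a preconditioner. Below, a plain preconditioned Richardson iteration stands in for a Krylov method, and the exact factors stand in for the approximate ones.

```python
import numpy as np

# Preconditioned Richardson iteration: converges when ||I - M^{-1}A|| < 1,
# i.e., when apply_Minv is a good enough approximation of A^{-1}.
def richardson(A, b, apply_Minv, iters=50):
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x + apply_Minv(b - A @ x)
    return x

# Demo with the exact QR factors; the approximate factors P, Rhat from
# block_approx / block_approx_triangular would be substituted in practice.
rng = np.random.default_rng(1)
A = np.diag(rng.uniform(2.0, 3.0, 32)) + 0.1 * rng.standard_normal((32, 32))
b = rng.standard_normal(32)
Q, R = np.linalg.qr(A)
x = richardson(A, b, lambda r: np.linalg.solve(R, Q.T @ r))
```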

## Looking ahead: Using nested dissection directly

The block low-rank approximation does appear to work, but this was a very early exploration of the idea. I believe the quality of the approximation could be significantly improved by directly exploiting the tree-like structure one gets from nested dissection, rather than simply choosing an arbitrary block size (I chose $$b=16$$ in all experiments above). It is frequently the case that off-diagonal blocks indexed by the separators from nested dissection have particularly favorable low-rank properties, but that may not carry over to the QR factorization.