Warning: this is *preliminary exploratory* work. I have outlined a research program around sparse QR factorization, and I aim to answer its questions in small steps. The results I show here are not definitive and apply only to small special cases, but they suggest value in continuing on to harder and larger cases. Here I show that we can compress a QR factorization by exploiting a property common to matrix factorizations (not just QR): when you factorize a sparse matrix, the resulting factors have low-rank (and thus compressible) off-diagonal blocks. I did not invent or discover this property; see this GitHub repo and the references therein, as well as a significant body of work by researcher Jack Poulson applying this principle to LU and Cholesky factorizations in a wide variety of practical settings. To my knowledge, however, it has not been deeply explored for the QR factorization of sparse matrices. I do that here for a family of sparse matrices and show reasonable approximation quality. I also show that nested dissection ordering of the matrix improves compressibility under this block-low-rank technique.

Given a matrix \( \mathbf{A} \), its QR factorization consists of an orthogonal matrix \( \mathbf{Q} \) and an upper triangular matrix \( \mathbf{R} \) such that

\[ \mathbf{A} = \mathbf{Q}\mathbf{R} \]

Ordinarily one does not store the full matrix \(\mathbf{Q}\), *especially* with a sparse input matrix \( \mathbf{A} \). It is typically stored as a sequence of Householder reflectors, which retain some sparsity and thus save memory. Similarly, the matrix \( \mathbf{R} \) inherits sparsity from \( \mathbf{A} \). I propose here that we can achieve memory savings through low-rank approximations of the off-diagonal blocks of both \( \mathbf{R} \) and \( \mathbf{Q} \), where \( \mathbf{Q} \) is stored *not* as Householder reflectors but as the *full* matrix.

In the diagram below I show off-diagonal blocks in yellow and diagonal blocks in black. When \( \mathbf{A} \) is sparse the yellow blocks are often low-rank, as can be verified by SVD. I explain next how I test whether these off-diagonal blocks can indeed be meaningfully compressed.
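To see the low-rank phenomenon in miniature, here is a quick numerical check (my own toy example, not one of the experiments below): QR-factorize a small banded matrix and inspect the singular values of one off-diagonal block of \( \mathbf{Q} \).

```python
import numpy as np

# QR-factorize a small banded (tridiagonal) matrix and look at the
# singular values of one off-diagonal block of Q.
rng = np.random.default_rng(0)
m, b = 64, 16
A = (np.diag(2.0 + rng.random(m))
     + np.diag(rng.random(m - 1), k=1)
     + np.diag(rng.random(m - 1), k=-1))
Q, R = np.linalg.qr(A)
# singular values of the off-diagonal block Q[0:16, 16:32]
s = np.linalg.svd(Q[0:b, b:2 * b], compute_uv=False)
# for a banded A like this one the trailing singular values are tiny,
# so the block can be truncated aggressively with little loss
print(s)
```

The rapid decay of these singular values is exactly what makes the fixed-rank SVD truncation used below worthwhile.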

I construct a family of matrices based on a simple 2D grid and stencil calculation. I sample randomly from this family, and for each sample I form the QR factorization, loop through all off-diagonal blocks of the factors \( \mathbf{Q},\mathbf{R} \), and compress them with a fixed-rank SVD. I compute the resulting approximation error, the loss of orthogonality of the approximated \( \mathbf{Q} \), and the factorization error. I then repeat the procedure after performing a nested dissection reordering of \( \mathbf{A} \). The key parameters of this experiment are as follows:

$$ \begin{aligned} n_x &: \text{ First extent of 2D grid } \\ n_y &: \text{ Second extent of 2D grid } \\ m=n_x n_y &: \text{ Total number of unknowns (number of columns for A) } \\ b &: \text{ Block size } \\ k &: \text{ Desired rank for SVD compression } \end{aligned} $$

The family of matrices is generated with the following Python code

```python
import numpy as np

def make_matrix(nx, ny, coeffs):
    m = nx * ny
    A = np.zeros((m, m))
    # map 2D grid coordinates to a linear index
    def idx(ix, iy):
        return ix + nx * iy
    for ix in range(nx):
        for iy in range(ny):
            A[idx(ix, iy), idx(ix, iy)] = coeffs[0]
            if ix > 0:
                A[idx(ix - 1, iy), idx(ix, iy)] = coeffs[1]
            if ix < nx - 1:
                A[idx(ix + 1, iy), idx(ix, iy)] = coeffs[2]
            if iy > 0:
                A[idx(ix, iy - 1), idx(ix, iy)] = coeffs[3]
            if iy < ny - 1:
                A[idx(ix, iy + 1), idx(ix, iy)] = coeffs[4]
    return A
```
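As a sanity check (my own, not from the original experiments), the same family can be assembled from Kronecker products, which makes the block-tridiagonal structure of the stencil matrix explicit:

```python
import numpy as np

# Equivalent construction of the stencil matrix via Kronecker products.
# coeffs[0..2] build the within-row (x-direction) tridiagonal block B;
# coeffs[3..4] couple adjacent grid rows (y-direction).
def make_matrix_kron(nx, ny, coeffs):
    B = (coeffs[0] * np.eye(nx)
         + coeffs[1] * np.eye(nx, k=1)
         + coeffs[2] * np.eye(nx, k=-1))
    A = (np.kron(np.eye(ny), B)
         + coeffs[3] * np.kron(np.eye(ny, k=1), np.eye(nx))
         + coeffs[4] * np.kron(np.eye(ny, k=-1), np.eye(nx)))
    return A

A = make_matrix_kron(3, 4, [4.0, -1.0, -1.0, -1.0, -1.0])
print(A.shape)  # (12, 12)
```

This view also explains where the low-rank off-diagonal blocks come from: the coupling between grid rows is carried by identity blocks scaled by a single coefficient.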

The family of matrices defined by the function `make_matrix` is fully parameterized by `nx`, `ny`, and `coeffs`.

I form the block-low-rank approximation with the following two functions (one for \( \mathbf{Q} \) and another for \( \mathbf{R} \)).

```python
import numpy as np
import scipy.linalg as la

def block_approx(Q, rank, blocksize):
    k = rank
    b = blocksize
    m, _ = Q.shape
    P = Q.copy()
    for i in range(0, m, b):
        for j in range(0, m, b):
            begi, endi = i, min(m, i + b)
            begj, endj = j, min(m, j + b)
            if i != j:
                # extract off-diagonal block
                Qb = Q[begi:endi, begj:endj]
                # compute SVD
                U, s, VT = la.svd(Qb)
                S = np.diag(s)
                # form low-rank approximation
                Lk = U[:, 0:k] @ S[0:k, 0:k] @ VT[0:k, :]
                # replace corresponding block in Q with approximation
                P[begi:endi, begj:endj] = Lk
    return P

# For the R factor, which is upper triangular and thus has all zeros below
# the main diagonal, only blocks strictly above the diagonal are compressed.
def block_approx_triangular(R, rank, blocksize):
    k = rank
    b = blocksize
    m, _ = R.shape
    P = R.copy()
    for i in range(0, m, b):
        for j in range(0, m, b):
            begi, endi = i, min(m, i + b)
            begj, endj = j, min(m, j + b)
            if j > i:
                # extract off-diagonal block
                Rb = R[begi:endi, begj:endj]
                # compute SVD
                U, s, VT = la.svd(Rb)
                S = np.diag(s)
                # form low-rank approximation
                Lk = U[:, 0:k] @ S[0:k, 0:k] @ VT[0:k, :]
                # replace corresponding block in R with approximation
                P[begi:endi, begj:endj] = Lk
    return P
```
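Putting the pieces together, one trial of the experiment looks roughly like this (a compact, equivalent re-implementation of `block_approx`/`block_approx_triangular`; the matrix here is a toy banded example rather than a sample from the grid family):

```python
import numpy as np
import numpy.linalg as la

def compress(M, k, b, upper_only=False):
    """Replace each off-diagonal b-by-b block of M with a rank-k truncated SVD."""
    m = M.shape[0]
    P = M.copy()
    for i in range(0, m, b):
        for j in range(0, m, b):
            off_diagonal = (j > i) if upper_only else (i != j)
            if off_diagonal:
                U, s, VT = la.svd(M[i:i + b, j:j + b])
                P[i:i + b, j:j + b] = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]
    return P

rng = np.random.default_rng(1)
m, b, k = 64, 16, 4
A = (np.diag(4.0 + rng.random(m))
     + np.diag(rng.random(m - 1), 1)
     + np.diag(rng.random(m - 1), -1))
Q, R = la.qr(A)
P = compress(Q, k, b)                       # approximate Q
Rhat = compress(R, k, b, upper_only=True)   # approximate R
print(la.norm(P - Q))                       # approximation error for Q
print(la.norm(P.T @ P - np.eye(m)))         # loss of orthogonality
print(la.norm(P @ Rhat - A) / la.norm(A))   # relative factorization error
```

The three printed quantities are exactly the metrics collected in the histograms below.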

For all of these results I draw `1000` random samples from the sparse-matrix family defined above, then perform the indicated calculation both with and without the nested dissection permutation. I show the results as histograms so that, in addition to seeing whether the approximations work, we can see the impact of the nested dissection ordering.
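For reference, a nested dissection ordering for this grid can be sketched with simple geometric recursive bisection (this is my own minimal version; the experiments do not depend on which nested dissection implementation is used):

```python
import numpy as np

# Geometric nested dissection for an nx-by-ny grid: recursively split the
# longer dimension, ordering each half first and the separator line last.
def nested_dissection_order(nx, ny):
    def recurse(xs, ys):
        # base case: the subgrid is too small to split further
        if len(xs) <= 2 and len(ys) <= 2:
            return [(ix, iy) for iy in ys for ix in xs]
        if len(xs) >= len(ys):
            mid = len(xs) // 2
            halves = recurse(xs[:mid], ys) + recurse(xs[mid + 1:], ys)
            sep = [(xs[mid], iy) for iy in ys]
        else:
            mid = len(ys) // 2
            halves = recurse(xs, ys[:mid]) + recurse(xs, ys[mid + 1:])
            sep = [(ix, ys[mid]) for ix in xs]
        return halves + sep  # separator is ordered last
    order = recurse(list(range(nx)), list(range(ny)))
    return np.array([ix + nx * iy for ix, iy in order])

perm = nested_dissection_order(16, 16)
# apply the permutation symmetrically before factorizing:
# A_nd = A[np.ix_(perm, perm)]
```

Ordering separators last is what concentrates the coupling between subdomains into a few rows and columns, which is the intuition for why the reordered factors compress better.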

I use the following notation in the figure labels:

$$ \begin{aligned} n_x &: \text{ First extent of 2D grid } \\ n_y &: \text{ Second extent of 2D grid } \\ m=n_x n_y &: \text{ Total number of unknowns (number of columns for A) } \\ b &: \text{ Block size } \\ k &: \text{ Desired rank for SVD compression } \\ \mathbf{A} &: \text{ A matrix from the sparse-matrix family described earlier} \\ \mathbf{R} &: \text{ exact R factor from QR factorization} \\ \mathbf{Q} &: \text{ exact Q factor from QR factorization} \\ \mathbf{\hat{R}} \approx \mathbf{R} &: \text{Block-low-rank approximate R} \\ \mathbf{P} \approx \mathbf{Q} &: \text{Block-low-rank approximate Q} \end{aligned} $$

These results show how well the block-low-rank approximation works for the \( \mathbf{Q} \) factor, both with and without nested dissection:

| Parameters | Result |
|---|---|
| \(k=1,n_x=n_y=16,b=16\) | |
| \(k=4,n_x=n_y=16,b=16\) | |
| \(k=8,n_x=n_y=16,b=16\) | |

These show how well the block-low-rank approximation works for the \( \mathbf{R} \) factor, both with and without nested dissection:

| Parameters | Result |
|---|---|
| \(k=1,n_x=n_y=16,b=16\) | |
| \(k=4,n_x=n_y=16,b=16\) | |
| \(k=8,n_x=n_y=16,b=16\) | |

These show how well the block-low-rank approximation maintains orthogonality of the \( \mathbf{Q} \) factor, both with and without nested dissection:

| Parameters | Result |
|---|---|
| \(k=1,n_x=n_y=16,b=16\) | |
| \(k=4,n_x=n_y=16,b=16\) | |
| \(k=8,n_x=n_y=16,b=16\) | |

These show how well the block-low-rank approximation preserves the factorization \( \mathbf{Q}\mathbf{R}=\mathbf{A} \):

| Parameters | Result |
|---|---|
| \(k=1,n_x=n_y=16,b=16\) | |
| \(k=4,n_x=n_y=16,b=16\) | |
| \(k=8,n_x=n_y=16,b=16\) | |

The simple family of sparse matrices that generates \( \mathbf{A} \) appears to permit block-low-rank approximations of its factors. The approximations improve as the rank increases (naturally), but the nested dissection reordering also appears to improve both the reliability and the quality of these approximations.

I was curious whether block-low-rank approximation could be applied to the QR factorization of a sparse matrix, but I wasn't interested in this for its own sake. If the approximation works, which it appears to, can it be used as a preconditioner for a Krylov method? That will be the topic of my next post.

The block-low-rank approximation does appear to work, but this was a very early exploration of the idea. I believe the quality of the approximation could be improved significantly by directly exploiting the tree-like structure one gets from nested dissection, rather than simply choosing an arbitrary block size (I chose \(b=16\) in all the experiments above). It is frequently the case that off-diagonal blocks indexed by nested-dissection separators have the best low-rank properties, but that may not hold for the QR factorization.