Compressing the Sparse QR Factorization with Nested Dissection and Block Low-Rank Approximation

February 03, 2021

Warning: this is preliminary exploratory work. I have outlined a research program around sparse QR factorization and aim to answer its questions in small steps. The results shown here are not definitive and apply only to small special cases, but they suggest it is worth continuing on to harder and larger cases. I show that we can compress the QR factorization by exploiting a property common to many matrix factorizations (not just QR): when you factorize a sparse matrix, the resulting factors have low-rank (and thus compressible) off-diagonal blocks. I did not invent or discover this property; see this GitHub repo and references therein, as well as a significant body of work by researcher Jack Poulson applying this principle to LU and Cholesky factorizations in a wide variety of practical applications. To my knowledge, however, it has not been deeply explored for the QR factorization of sparse matrices. I do that here for a family of sparse matrices and show reasonable approximation quality. I also show that a nested dissection ordering of the matrix improves compressibility under this block low-rank technique.

Block low-rank approximation of QR factors

Given a matrix \( \mathbf{A} \), its QR factorization consists of an upper triangular matrix \( \mathbf{R} \) and an orthogonal matrix \( \mathbf{Q} \) such that

\[ \mathbf{A} = \mathbf{Q}\mathbf{R} \]

Ordinarily one does not store the full matrix \(\mathbf{Q}\), especially when the input matrix \( \mathbf{A} \) is sparse. Instead, \(\mathbf{Q}\) is typically stored as a sequence of Householder reflectors, which retain some sparsity and thus save memory. Similarly, the matrix \( \mathbf{R} \) inherits sparsity from \( \mathbf{A} \). I propose here that we can achieve memory savings by forming low-rank approximations of the off-diagonal blocks of both \( \mathbf{R} \) and \( \mathbf{Q} \), where \( \mathbf{Q} \) is stored as a full matrix rather than as Householder reflectors. In the diagram below

[diagram: block partition of a factor]

I show the off-diagonal blocks in yellow and the diagonal blocks in black. When \( \mathbf{A} \) is sparse, the yellow blocks are often numerically low-rank, as revealed by their singular value decay. I explain next how I test whether these off-diagonal blocks can indeed be meaningfully compressed.
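Before describing that test, it is worth noting why compressing a \(b \times b\) off-diagonal block with a rank-\(k\) truncated SVD saves memory in the first place. Keeping only the leading \(k\) singular triplets gives

$$ \begin{aligned} \mathbf{B} \approx \mathbf{U}_k \mathbf{\Sigma}_k \mathbf{V}_k^T, \qquad \mathbf{U}_k \in \mathbb{R}^{b \times k},\ \mathbf{\Sigma}_k \in \mathbb{R}^{k \times k} \text{ (diagonal)},\ \mathbf{V}_k \in \mathbb{R}^{b \times k} \end{aligned} $$

Stored in this factored form, the block costs roughly \(2bk + k\) numbers instead of \(b^2\), so the compression pays off whenever \(k\) is well below \(b/2\). (The experiments below write the truncated blocks back densely and measure only approximation quality, not actual memory use.)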

Experiment Setup

I construct a family of matrices based on a simple 2D grid and stencil calculation. I sample randomly from this family, and for each sample I form the QR factorization, loop through all off-diagonal blocks of the factors \( \mathbf{Q},\mathbf{R} \), and compress them with a fixed-rank SVD. I compute the resulting approximation error, the loss of orthogonality of the approximated \( \mathbf{Q} \), and the factorization error. I then repeat the whole procedure after performing a nested dissection reordering of \( \mathbf{A} \). The key parameters of this experiment are as follows:

$$ \begin{aligned} n_x &: \text{ First extent of 2D grid } \\ n_y &: \text{ Second extent of 2D grid } \\ m=n_x n_y &: \text{ Total number of unknowns (number of columns for A) } \\ b &: \text{ Block size } \\ k &: \text{ Desired rank for SVD compression } \end{aligned} $$

The family of matrices is generated with the following Python code

import numpy as np

def make_matrix(nx,ny,coeffs):
    #Build a dense matrix for a 5-point stencil on an nx-by-ny grid;
    #each column has at most five nonzeros.
    m=nx*ny
    A=np.zeros((m,m))
    def id(ix,iy):
        #Map 2D grid coordinates to a flat unknown index
        return ix + nx*iy

    for ix in range(0,nx):
        for iy in range(0,ny):
            A[id(ix,iy),id(ix,iy)]=coeffs[0]
            if ix>0:
                A[id(ix-1,iy),id(ix,iy)]=coeffs[1]
            if ix<nx-1:
                A[id(ix+1,iy),id(ix,iy)]=coeffs[2]
            if iy>0:
                A[id(ix,iy-1),id(ix,iy)]=coeffs[3]
            if iy<ny-1:
                A[id(ix,iy+1),id(ix,iy)]=coeffs[4]
    return A

The family of matrices produced by make_matrix is fully determined by the parameters nx, ny, and coeffs. Each column of \( \mathbf{A} \) has at most five nonzeros (the diagonal entry plus its four grid neighbors), so \( \mathbf{A} \) is sparse; since \(n_x,n_y\) are fixed in the experiments, the random samples amount to random choices of coeffs.
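The nested dissection permutation itself is not listed in the post. As a rough illustration of what such a reordering looks like on this grid, here is a minimal sketch (an assumption on my part, not necessarily the ordering used for the results below): it recursively bisects the grid, ordering the two halves first and the separating grid line last, and uses the same ix + nx*iy indexing as make_matrix.

def nested_dissection_order(nx,ny):
    #Minimal sketch of a nested dissection ordering of an nx-by-ny grid:
    #recursively order the two halves of the grid first and the separator
    #(a single grid line) last.
    def order(xs,ys):
        if len(xs)<=2 and len(ys)<=2:
            #small subdomain: keep natural ordering
            return [(ix,iy) for iy in ys for ix in xs]
        if len(xs)>=len(ys):
            #vertical separator at the middle x coordinate
            mid=xs[len(xs)//2]
            left=order([ix for ix in xs if ix<mid],ys)
            right=order([ix for ix in xs if ix>mid],ys)
            sep=[(mid,iy) for iy in ys]
        else:
            #horizontal separator at the middle y coordinate
            mid=ys[len(ys)//2]
            left=order(xs,[iy for iy in ys if iy<mid])
            right=order(xs,[iy for iy in ys if iy>mid])
            sep=[(ix,mid) for ix in xs]
        return left+right+sep
    coords=order(list(range(nx)),list(range(ny)))
    #flat indices matching id(ix,iy)=ix+nx*iy in make_matrix
    return [ix+nx*iy for (ix,iy) in coords]

#Hypothetical usage, given a sample A from make_matrix:
#perm=nested_dissection_order(16,16)
#A_nd=A[np.ix_(perm,perm)]   #symmetric reordering of A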

I form the block-low-rank approximation with the following two functions (one for \( \mathbf{Q} \) and another for \( \mathbf{R} \)).

import numpy.linalg as la

def block_approx(Q,rank,blocksize):
    k=rank
    b=blocksize
    m,_=Q.shape
    P=Q.copy()
    for i in range(0,m,b):
        for j in range(0,m,b):
            begi=i
            endi=min(m,i+b)
            begj=j
            endj=min(m,j+b)
            if i!=j:
                #extract off-diagonal block
                Qb=Q[np.ix_(range(begi,endi),range(begj,endj))]
                #compute SVD
                U,s,VT=la.svd(Qb)
                S=np.diag(s)
                #Form rank-k approximation from the leading singular triplets
                Lk=U[:,0:k] @ S[0:k,0:k] @ VT[0:k,:]
                #Replace corresponding block in Q with approximation
                P[begi:endi,begj:endj]=Lk
    return P
#For the R factor, which is upper triangular and thus has all zeros below the
#main diagonal, only the blocks strictly above the block diagonal are compressed
def block_approx_triangular(Q,rank,blocksize):
    k=rank
    b=blocksize
    m,_=Q.shape
    P=Q.copy()
    for i in range(0,m,b):
        for j in range(0,m,b):
            begi=i
            endi=min(m,i+b)
            begj=j
            endj=min(m,j+b)
            if j>i:
                #extract off-diagonal block
                Qb=Q[np.ix_(range(begi,endi),range(begj,endj))]
                #compute SVD
                U,s,VT=la.svd(Qb)
                S=np.diag(s)
                #Form rank-k approximation from the leading singular triplets
                Lk=U[:,0:k] @ S[0:k,0:k] @ VT[0:k,:]
                #Replace corresponding block in R with approximation
                P[begi:endi,begj:endj]=Lk
    return P
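To tie the pieces together, here is a sketch of one trial of the experiment. The standard-normal draw of the stencil coefficients is an assumption (the post does not state the sampling distribution), and numpy.linalg.qr stands in for whatever QR routine was actually used.

#Sketch of one trial: build a sample, factorize, and compress both factors
nx=ny=16; b=16; k=4
coeffs=np.random.randn(5)            #assumed coefficient distribution
A=make_matrix(nx,ny,coeffs)
Q,R=np.linalg.qr(A)                  #exact QR factorization
P=block_approx(Q,k,b)                #block-low-rank approximation of Q
Rhat=block_approx_triangular(R,k,b)  #block-low-rank approximation of R

For the nested dissection variant, the sample would first be reordered, e.g. A = A[np.ix_(perm,perm)] with the permutation sketched earlier, before computing the QR factorization.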

Results

For all of these results I draw 1000 random samples from the sparse matrix family defined above and perform the indicated calculation both with and without the nested dissection permutation. I show results as histograms so that, in addition to seeing whether the approximations work, we can also see the impact of the nested dissection ordering.

I use the following notation in the figure labels:

$$ \begin{aligned} n_x &: \text{ First extent of 2D grid } \\ n_y &: \text{ Second extent of 2D grid } \\ m=n_x n_y &: \text{ Total number of unknowns (number of columns for A) } \\ b &: \text{ Block size } \\ k &: \text{ Desired rank for SVD compression } \\ \mathbf{A} &: \text{ A matrix from the sparse-matrix family described earlier} \\ \mathbf{R} &: \text{ exact R factor from QR factorization} \\ \mathbf{Q} &: \text{ exact Q factor from QR factorization} \\ \mathbf{\hat{R}} \approx \mathbf{R} &: \text{Block-low-rank approximate R} \\ \mathbf{P} \approx \mathbf{Q} &: \text{Block-low-rank approximate Q} \end{aligned} $$
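The exact error measures behind the histograms are not spelled out in the post; relative Frobenius-norm quantities of the following form are one natural reading of the section titles below (an assumption on my part):

$$ \begin{aligned} \text{approximation error for } \mathbf{Q} &: \|\mathbf{Q}-\mathbf{P}\|_F / \|\mathbf{Q}\|_F \\ \text{approximation error for } \mathbf{R} &: \|\mathbf{R}-\mathbf{\hat{R}}\|_F / \|\mathbf{R}\|_F \\ \text{loss of orthogonality} &: \|\mathbf{P}^T\mathbf{P}-\mathbf{I}\|_F \\ \text{factorization error} &: \|\mathbf{P}\mathbf{\hat{R}}-\mathbf{A}\|_F / \|\mathbf{A}\|_F \end{aligned} $$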

Approximation error for Q

These results show how well the block low-rank approximation works for the \( \mathbf{Q} \) factor, both with and without nested dissection.

Parameters | Result
\(k=1, n_x=n_y=16, b=16\) | [histogram: result1]
\(k=4, n_x=n_y=16, b=16\) | [histogram: result2]
\(k=8, n_x=n_y=16, b=16\) | [histogram: result3]

Approximation error for R

These show how well the block low-rank approximation works for the \( \mathbf{R} \) factor, both with and without nested dissection.

Parameters | Result
\(k=1, n_x=n_y=16, b=16\) | [histogram: result1]
\(k=4, n_x=n_y=16, b=16\) | [histogram: result2]
\(k=8, n_x=n_y=16, b=16\) | [histogram: result3]

Loss of orthogonality in approximated Q

These show how well the block low-rank approximation preserves the orthogonality of the \( \mathbf{Q} \) factor, both with and without nested dissection.

Parameters | Result
\(k=1, n_x=n_y=16, b=16\) | [histogram: result1]
\(k=4, n_x=n_y=16, b=16\) | [histogram: result2]
\(k=8, n_x=n_y=16, b=16\) | [histogram: result3]

Factorization error

These show how well the block low-rank approximation retains the factorization \( \mathbf{Q}\mathbf{R}=\mathbf{A} \), both with and without nested dissection.

Parameters | Result
\(k=1, n_x=n_y=16, b=16\) | [histogram: result1]
\(k=4, n_x=n_y=16, b=16\) | [histogram: result2]
\(k=8, n_x=n_y=16, b=16\) | [histogram: result3]

Conclusion

The simple family of sparse matrices that generates \( \mathbf{A} \) appears to permit block-low-rank approximations of its QR factors. These approximations improve as the rank increases (naturally), and the nested dissection reordering appears to improve both the reliability and the quality of the approximations.

Next Steps: using as preconditioner

I was curious whether block-low-rank approximation could be applied to the QR factorization of a sparse matrix, but I wasn't interested in this for its own sake. If the approximation works, which it appears to, can it be used as a preconditioner for a Krylov method? That will be the topic of my next post.

Looking ahead: Using nested dissection directly

The block-low-rank approximation does appear to work, but this was a very early exploration of the idea. I believe the quality of the approximation could be significantly improved by directly exploiting the tree-like structure one gets from nested dissection, rather than simply choosing an arbitrary block size (I chose \(b=16\) in all of the above experiments). It is frequently the case that off-diagonal blocks indexed by the separators from nested dissection have particularly favorable low-rank properties, but that may not carry over to the QR factorization.