Interesting Sparse Nonsymmetric+Definite Linear Solver

April 12, 2022

I present a new-to-me algorithm for solving a sparse system of linear equations where the sparse matrix is positive definite (but not necessarily symmetric). This solver has some interesting properties:

  1. A factorization step using only Cholesky (no pivoting necessary)
  2. An iteration step using only a normal matrix (no nonnormality effects on iteration)
  3. Provably optimal Krylov convergence rate in step (2)
  4. Low memory overhead of resulting iteration

Achieving (1) through (4) doesn’t guarantee a practical solver, but these properties are interesting to me because nonsymmetric solvers frequently involve the following obstacles to robust and fast convergence:

  1. Pivoting for numerical stability (e.g. incomplete LU factorization)
  2. Iterating with a nonnormal matrix (no provable yet practical convergence rates)
  3. Suboptimal Krylov methods (e.g. restarted GMRES with a low restart value, BiCGSTAB)
  4. Unbounded memory requirements (e.g. restarted GMRES stagnates and requires ever larger restart values)

I take a simple approach: split the matrix into its symmetric and skew-symmetric parts, form the Cholesky factorization of the symmetric part, and apply the inverses of its factors on the left and right. I prove that the system preconditioned in this way has a statically knowable spectral distribution in the complex plane, which means we may choose optimal parameters for the Chebyshev iteration Krylov method targeting this distribution. Since the parameters are optimal, the algorithm is guaranteed to converge. In this way I rely on the pivoting-free nature of the Cholesky factorization to factorize the symmetric part of the matrix, and on the optimality properties of Chebyshev iteration to deliver convergence.

I first describe the mathematical derivation of this approach, including a proof of the spectral properties of the preconditioned operator, and derive the optimal Chebyshev parameters for the preconditioned system. I then show numerical results confirming that this works, and finish with some conclusions about the utility of this approach. I think this method largely has theoretical rather than practical value.


Statement of Problem

I wish to solve a sparse linear system of the form

\begin{align} Ax = b \end{align}

where \(A\) is a nonsymmetric sparse matrix that is definite. In this context I take definite to mean that \( \frac{1}{2}(A + A^T) \) has only positive real eigenvalues. For example, \(\begin{pmatrix} 2 & 2 \\ 0 & 2 \end{pmatrix}\) is definite in this sense: its symmetric part \(\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\) has eigenvalues 1 and 3. I also assume \(A\) is real and do not consider the complex number case here.

Transforming into Shifted Skew Form

One can transform any matrix into a sum of a symmetric (hermitian) and a skew-symmetric matrix as follows:

\begin{align} A &= \frac{1}{2} (A + A^T) + \frac{1}{2} (A - A^T) \\ &\equiv H + S \end{align}

The \(H\) and \(S\) above are meant to denote the hermitian (symmetric, since \(A\) is real) and skew parts respectively.

Since \(A\) is definite, \(H\) is symmetric positive definite and therefore has a Cholesky factorization

\begin{align} H = L L^T \end{align}

One can compute this with sparse \(H\) using e.g. CHOLMOD.
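
For example, with scikit-sparse (used again in the full algorithm below) the factorization and a solve look like the following. This is a self-contained sketch with a toy SPD matrix standing in for \(H\):

import numpy as np
import scipy.sparse as sp
from sksparse.cholmod import cholesky

#Toy SPD matrix: a 1D Laplacian, stored in CSC format as CHOLMOD expects
n=1000
H=sp.diags([-1.0,2.0,-1.0],[-1,0,1],shape=(n,n)).tocsc()
factor=cholesky(H)              #CHOLMOD also picks a fill-reducing ordering
y=factor.solve_A(np.ones(n))    #solves H y = rhs using the factorization
print(np.linalg.norm(H@y-np.ones(n)))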

If we left and right precondition \(A\) with this factor we get

\begin{align} L^{-1} A L^{-T} = L^{-1} (H + S) L^{-T} = L^{-1} L L^T L^{-T} + L^{-1} S L^{-T} = I + L^{-1} S L^{-T} \end{align}

This transforms \(A\) into shifted skew form: since \(S\) is skew-symmetric, so is \( L^{-1} S L^{-T} \). To solve \(Ax=b\) we can therefore solve \((I + L^{-1} S L^{-T})\, y = L^{-1} b\) and recover \(x = L^{-T} y\).
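
As a quick dense sanity check of this identity (a NumPy-only sketch; the random test matrix and the size of the diagonal shift are my own choices):

import numpy as np

rng=np.random.default_rng(0)
m=50
#Diagonal shift chosen large enough that the symmetric part is positive definite
A=rng.standard_normal((m,m))+20.0*np.eye(m)
H=0.5*(A+A.T)
S=0.5*(A-A.T)
L=np.linalg.cholesky(H)
Linv=np.linalg.inv(L)   #explicit inverse is fine for a small dense check
M=Linv@A@Linv.T
K=M-np.eye(m)
print(np.allclose(K,Linv@S@Linv.T))   #K is the preconditioned skew part: True
print(np.allclose(K.T,-K))            #and it is skew-symmetric: True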

Spectral Properties of Shifted Skew Form

A shifted-skew matrix \(I+S\) (writing \(S\) from here on for the preconditioned skew part \(L^{-1} S L^{-T}\), which is again skew-symmetric) is normal, which means the performance of Krylov methods depends solely on its spectrum. We also know the eigenvalues of \(I+S\) take the form \(1 + \alpha i\), since the eigenvalues of a real skew-symmetric matrix are purely imaginary. Furthermore, because the matrix is real, the eigenvalues come in conjugate pairs: if \(1 + \alpha i\) is an eigenvalue of \(I+S\) then so is \(1 - \alpha i\). In particular, if we know the spectral radius \(\rho\) of \(S\) (easily computed by power iteration) then we know the vertical segment \(\{1 + ti : |t| \le \rho\}\) in which all eigenvalues of \(I+S\) lie.
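
To see why \(I+S\) is normal, note that \(S^T = -S\) gives

\begin{align} (I+S)(I+S)^T = (I+S)(I-S) = I - S^2 = (I-S)(I+S) = (I+S)^T(I+S) \end{align}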

Optimal Chebyshev Algorithm for Known Spectrum

The Chebyshev method for solving a system of linear equations is a Krylov method like GMRES or Conjugate Gradients, but instead of building up a basis through an explicit or implicit orthogonalization procedure it uses prior knowledge of the input matrix's spectrum. This knowledge generally must be very accurate or else the procedure converges very slowly, and since computing eigenvalues is harder than solving a linear system, this limits the usefulness of Chebyshev iteration in general. However, in light of the above comments on the spectrum of \(I + S\), we can achieve very accurate spectral estimates using simple power iteration.
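
As a sketch of how such an estimate might be computed (the helper below is my own illustration, not code from this post): plain power iteration applied directly to a skew-symmetric \(S\) does not settle, because the dominant eigenvalues form a conjugate pair \(\pm i\rho\), but power iteration on \(S^T S = -S^2\), which is symmetric positive semidefinite with largest eigenvalue \(\rho^2\), converges as usual:

import numpy as np

def skew_spectral_radius(apply_S,m,iters=100,seed=0):
    #Power iteration on S^T S = -S^2; apply_S computes v -> S v
    rng=np.random.default_rng(seed)
    v=rng.standard_normal(m)
    v/=np.linalg.norm(v)
    lam=0.0
    for _ in range(iters):
        w=-apply_S(apply_S(v))   #w = S^T S v, since S^T = -S
        lam=v@w                  #Rayleigh quotient: estimate of rho(S)^2
        nw=np.linalg.norm(w)
        if nw==0.0:              #v happens to lie in the nullspace of S
            break
        v=w/nw
    return np.sqrt(max(lam,0.0))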

I give simple Python code below for Chebyshev iteration

import numpy as np

def chebyshev(m,A,b,alpha,c,maxiter=100,verbose=True,tol=1e-10):
    #m:     problem size
    #A:     callable computing the matrix-vector product A(v)
    #alpha: center of the ellipse enclosing the spectrum of the matrix
    #c:     focal distance of the ellipse (foci at alpha +/- c)
    nu=-alpha/c
    beta=0.5*c * (1.0/nu)
    gamma=-alpha

    #Promote to a dtype wide enough to hold the (possibly complex) parameters
    dtype=np.result_type(b,alpha,c)
    x=np.zeros(m,dtype=dtype)
    xm1=np.zeros(m,dtype=dtype)   #x_{n-1}
    xp1=np.zeros(m,dtype=dtype)   #x_{n+1}

    r=b.astype(dtype)             #residual for the initial guess x_0 = 0
    rm1=np.zeros(m,dtype=dtype)
    rp1=np.zeros(m,dtype=dtype)

    for n in range(0,maxiter):
        #Update the three-term recurrence coefficients of the scaled
        #Chebyshev polynomials for the ellipse (alpha,c)
        if n>=2:
            beta=((0.5*c)**2)*(1.0/gamma)
        if n>=1:
            gamma=-(alpha+beta)

        if n==0:
            xp1=-(r+alpha*x)/gamma
            rp1=(A(r)-alpha*r)/gamma
        else:
            xp1=-(r+alpha*x+beta*xm1)/gamma
            rp1=(A(r)-alpha*r-beta*rm1)/gamma
        if verbose:
            print(f"iter={n},   residual={np.linalg.norm(rp1)}")

        xm1[:]=x
        rm1[:]=r
        x[:]=xp1
        r[:]=rp1
        if np.linalg.norm(r)<tol:
            break

    return x

The meaning of the parameters alpha, c is that the eigenvalues of the input matrix \(A\) must all lie in the ellipse with center \(\alpha\) and foci \(\alpha \pm c\). Smaller ellipses result in faster Chebyshev iteration, and if the ellipse is as small as possible then you actually achieve the optimal Krylov convergence rate, due to the optimality property of Chebyshev polynomials.
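
For the degenerate ellipse relevant here, the segment joining the foci \(\alpha \pm c\) with \(\alpha = 1\) and \(c = i\omega\), standard Chebyshev convergence theory (if I state the classical result correctly; treat this as a sketch rather than a verified bound) gives an asymptotic residual reduction factor per iteration of

\begin{align} \rho_{\text{cheb}} = \frac{|c|}{\left|\alpha + \sqrt{\alpha^2 - c^2}\right|} = \frac{\omega}{1 + \sqrt{1+\omega^2}} < 1 \end{align}

which is below one for every \(\omega\), so the iteration always converges, though more slowly as the skew part grows. This is consistent with the numerical results below.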

Note that for simplicity I kept the parameters alpha, c complex, but in the shifted-skew case alpha is pure-real and c is pure-imaginary, and the recurrence coefficients all come out real. This makes it possible to work entirely with pure-real arrays (avoiding the doubled memory cost of storing complex residual and iterate vectors).
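
Concretely, with \(\alpha\) real and \(c = i\omega\), the coefficients that actually multiply vectors in the recurrence are real:

\begin{align} \beta = \frac{c}{2\nu} = -\frac{c^2}{2\alpha} = \frac{\omega^2}{2\alpha}, \qquad \gamma = -(\alpha + \beta) \end{align}

and the later updates \(\beta = (c/2)^2/\gamma = -\omega^2/(4\gamma)\) stay real as well. The intermediate \(\nu = -\alpha/c\) is purely imaginary but never touches a vector, so every update can be carried out in real arithmetic.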

Fully Specified Algorithm

I give Python code below for solving a system \(Ax=b\) using the above ideas.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from sksparse.cholmod import cholesky

def symm_chebyshev(A,b):
    m=A.shape[0]
    #Split into symmetric and skew-symmetric parts (CSC format for CHOLMOD)
    H=(0.5*(A+A.T)).tocsc()
    S=(0.5*(A-A.T)).tocsc()
    #Compute sparse Cholesky factorization of the symmetric part
    L=cholesky(H)
    #Reorder b, H, S according to the fill-reducing ordering from CHOLMOD
    #(A itself is no longer needed after the split, so it is not reordered)
    P=L.P()
    b=b[P]
    H=H[P[:, np.newaxis], P[np.newaxis, :]]
    S=S[P[:, np.newaxis], P[np.newaxis, :]]
    #Convenience functions applying inv(L) and inv(L^T)
    def invL(x):
        return L.solve_L(x,use_LDLt_decomposition=False)
    def invLT(x):
        return L.solve_Lt(x,use_LDLt_decomposition=False)
    #Preconditioned skew-symmetric operator inv(L)*S*inv(L^T);
    #"neval" keeps track of how many times we evaluate it, as it is
    #the expensive step
    neval=0
    def skew_factor(x):
        nonlocal neval
        neval+=1
        return invL(S@invLT(x))
    #Estimate the extreme (purely imaginary) eigenvalues of the skew factor
    es,_=spla.eigs(spla.LinearOperator((m,m),matvec=skew_factor),k=6,tol=1e-10)
    eig_neval=neval
    #Ellipse parameters for Chebyshev iteration: center 1, foci 1 +/- i*rho
    alpha=1.0+0.0j
    c=max(np.imag(es))*1.0j
    #Left-precondition b
    b=invL(b)
    #Full preconditioned operator I + inv(L)*S*inv(L^T). The generic
    #Chebyshev code above iterates with complex arrays while the CHOLMOD
    #solves expect real vectors, so apply the (real) operator to the real
    #and imaginary parts separately; the pure-real specialization discussed
    #above would avoid this and halve the memory cost.
    def op(v):
        if not np.iscomplexobj(v):
            return v+skew_factor(v)
        return (v+skew_factor(np.ascontiguousarray(v.real))
                 +1j*skew_factor(np.ascontiguousarray(v.imag)))
    #Now solve by Chebyshev iteration
    neval=0
    x=chebyshev(m,op,b,alpha,c,maxiter=5000,tol=1e-14,verbose=False)
    cheb_neval=neval
    print(f"operator evaluations: eigenvalues={eig_neval}, chebyshev={cheb_neval}")
    #Apply the right preconditioner and undo the reordering; the imaginary
    #part of x is zero up to roundoff since the recurrence coefficients are real
    x=np.ascontiguousarray(np.real(x))
    x[P]=invLT(x)
    return x
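
A minimal smoke test might look like the following. This is my own construction (a random sparse matrix shifted by a multiple of the identity so that its symmetric part is positive definite), not the random systems used for the results below:

import numpy as np
import scipy.sparse as sp

rng=np.random.default_rng(0)
m=500
#Random sparse nonsymmetric matrix; the diagonal shift makes (A+A^T)/2
#positive definite, so A is "definite" in the sense used above
R=sp.random(m,m,density=0.01,random_state=0)
A=(R+10.0*sp.identity(m)).tocsc()
b=rng.standard_normal(m)

x=symm_chebyshev(A,b)
print("relative residual:",np.linalg.norm(A@x-b)/np.linalg.norm(b))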

Results

I have not yet extensively tested this algorithm or compared it against alternatives, but I did confirm that it works on some challenging nonsymmetric systems which I generated randomly. Convergence of the Chebyshev method is generally rapid when the skew-symmetric part is small compared to the symmetric part, like below:

iter=0,   residual=0.0001340813560623113
iter=1,   residual=9.845853216344927e-06
iter=2,   residual=1.013103495865824e-11
iter=3,   residual=2.5020940334707195e-13
iter=4,   residual=4.275895432223994e-19

Convergence can take a little longer when the skew-symmetric part is large compared to the symmetric part:

iter=0,   residual=4.186754323212551
iter=1,   residual=210.66587568527942
iter=2,   residual=3.474588981192882
iter=3,   residual=167.0642639225858
iter=4,   residual=3.04771078119865
iter=5,   residual=120.54588772351465
iter=6,   residual=2.581557296578431
iter=7,   residual=82.71211926014004
...
...
iter=164,   residual=1.6864217663823996e-13
iter=165,   residual=3.096346201216635e-13
iter=166,   residual=1.1169909961323464e-13
iter=167,   residual=2.0317413138929607e-13
iter=168,   residual=7.397292905249339e-14
iter=169,   residual=1.3331742365813827e-13
iter=170,   residual=4.898175152676594e-14
iter=171,   residual=8.747942227362621e-14
iter=172,   residual=3.24292721522201e-14
iter=173,   residual=5.740166656201045e-14
iter=174,   residual=2.146760662263503e-14
iter=175,   residual=3.766550578885808e-14
iter=176,   residual=1.4209256729454957e-14
iter=177,   residual=2.4715157991933956e-14
iter=178,   residual=9.403862049980998e-15

Thoughts

This algorithm is interesting because it avoids all the pitfalls of nonsymmetry and nonnormality. The Cholesky factorization, together with left and right preconditioning by the Cholesky factor, transforms the system into shifted-skew form even if the underlying matrix \(A\) is highly nonsymmetric and nonnormal. Furthermore, the highly predictable spectrum of a shifted-skew matrix leads to an optimal Chebyshev algorithm for solving systems involving it.

I am interested to see whether there could be utility in using incomplete Cholesky factorizations to construct preconditioners for nonsymmetric systems using the above ideas. I have not tested this idea yet, however.