I present a new-to-me algorithm for solving a sparse system of linear equations where the sparse matrix is positive definite (but not necessarily symmetric). This solver has some interesting properties:

- A factorization step using only Cholesky (no pivoting necessary)
- An iteration step using only a normal matrix (no nonnormality effects on iteration)
- Provably optimal Krylov convergence rate in step (2)
- Low memory overhead of resulting iteration

Achieving (1) through (4) doesnâ€™t guarantee a practical solver but they are interesting to me because nonsymmetric solvers frequently require the following for robust and fast convergence:

- Pivoting for numerical stability (e.g. incomplete LU factorization)
- Iterating with nonormal matrix (no provable
*and*practical convergence rates) - Suboptimal Krylov methods (e.g. restarted GMRES with low restart value,bicgstab)
- Unbounded memory requirements (e.g. restarted GMRES stagnates and requires ever larger restart values)

I take a simple approach by splitting a matrix into its symmetric and anti-symmetric parts. I form the Cholesky factorization of the symmetric part and apply the inverse of its factors to the left and right. I prove that the system preconditioned in this way has a statically knowable spectral distribution in the complex plane, which means we may choose optimal parameters to the Chebyshev iteration Krylov method which targets this distribution. Since I choose optimal parameters the algorithm will converge. In this way I rely on the pivoting-free nature of the Cholesky factorization to factorize the symmetric part of a matrix, and the optimality properties of Chebyshev iteration to deliver convergence.

I first describe the mathematical derivation of this approach including proof of the spectral properties of the preconditioned operator. I derive the optimal Chebyshev parameters for this preconditioned system. I show numerical results confirming this does work. I then finish with some conclusions about the utility of this approach. I think this method largely has theoretical rather than practical value.

\(\newcommand{\norm}[1]{\left\lVert#1\right\rVert}\)

I wish to solve a sparse linear system of the form

\begin{align} Ax = b \end{align}

where \(A\) is a nonsymmetric sparse matrix that is *definite*. In this context I take *definite* to mean that \( \frac{1}{2}(A + A^T) \) has only positive real eigenvalues. I also assume \(A\) is *real* and do not consider the complex number case here.

One can transform any matrix into a sum of a symmetric (hermitian) and a skew-symmetric matrix as follows:

\begin{align} A &= \frac{1}{2} * (A + A^T) + \frac{1}{2} * (A - A^T) \\ &\equiv H + S \end{align}

the \(H\) and \(S\) above are mean to represent *hermitian* and *skew* respectively.

Thus since \(H\) is positive definite it has a Cholesky Factorization

\begin{align} H = L L ^T \end{align}

One can compute this with sparse \(H\) using e.g. CHOLMOD.

If we left and right precondition \(A\) with this factor we get

\begin{align} L^{-1} A L^{-T} = I + L^{-1} S L^{-T} \end{align}

this transforms \(A\) into a *shifted skew form* i.e. since \(S\) is skew-symmetric, so is \( L^{-1} S L^{-T} \).

A shifted-skew matrix \(I+S\) is *normal* which means the performance of Krylov methods will depend solely on its spectrum. We also know the eigenvalues of \(I+S\) take the form \(1 + \alpha i\). Furthermore since the matrix \(A\) is real we know that the eigenvalues come in conjugate pairs, so that if \(1 + \alpha i\) is an eigenvalue of \(I+S\) then so is \(1 - \alpha i\). In particular if we know the spectral radius of \( I + S \) (easily computed by power iteration) then we know the interval in which all eigenvalues of \(I+S\) lie.

The Chebyshev method for solving a system of linear equations is a Krylov method like GMRES or Conjugate Gradients but instead of building up a basis through an explicit or implicit orthogonalization procedure it uses prior knowledge of the input matrix spectrum. This knowledge generally must be *very accurate* or else the procedure will converge very slowly, and since computing eigenvalues is harder than solving a linear system this limits the usefulness of Chebyshev iteration *in general*. However in light of above comments on the spectrum of \(I + S\) we can achieve *very accurate* spectral estimates using simply power iteration.

I give simple Python code below for Chebyshev iteration

```
def chebyshev(m,A,b,alpha,c,maxiter=100,verbose=True,tol=1e-10):
nu=-alpha/c
beta=0.5*c * (1.0/nu)
gamma=-alpha
x=np.zeros(m)
xm1=np.zeros(m)
xp1=np.zeros(m)
r=b.copy()
rm1=np.zeros(m)
rp1=np.zeros(m)
for n in range(0,maxiter):
if n>=2:
beta=((0.5*c)**2)*(1.0/gamma)
if n>=1:
gamma=-(alpha+beta)
if n==0:
xp1=-(r+alpha*x)/gamma
rp1=(A(r)-alpha*r)/gamma
else:
xp1=-(r+alpha*x+beta*xm1)/gamma
rp1=(A(r)-alpha*r-beta*rm1)/gamma
if verbose:
print(f"iter={n}, residual={np.linalg.norm(rp1)}")
xm1[:]=x
rm1[:]=r
x[:]=xp1
r[:]=rp1
if np.linalg.norm(r)<tol:
break
return x
```

The meaning of the parameters `alpha,c`

is that the eigenvalues of the input matrix \(A\) must all lie in the ellipse defined by complex numbers `alpha,c`

. Smaller ellipses result in faster Chebyshev iteration, and if the ellipse is as small as possible then you actually achieve *optimal* Krylov convergence rate due to the optimality property of Chebyshev polynomials.

Note that for simplicity I kept the parameters `alpha,c`

complex but in the shifted-skew case they are either pure-real or pure-imaginary and this makes it possible to deal with only pure-real arrays (to avoid duplicating memory costs for storing residual and successive iteration vectors).

I give Python code below for solving a system \(Ax=b\) using the above ideas.

```
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from sksparse.cholmod import cholesky
def symm_chebyshev(A,b):
#Split into hermitian and skew-hermitian parts
H=0.5*(A+A.T)
S=0.5*(A-A.T)
#Compute cholesky factorization of hermitian part
L=cholesky(H)
#Reorder original matrices according to fill-reduce ordering from cholesky
P=L.P()
b=b[P]
#A=A[P[:, np.newaxis], P[np.newaxis, :]]
H=H[P[:, np.newaxis], P[np.newaxis, :]]
S=S[P[:, np.newaxis], P[np.newaxis, :]]
#Convenience functions
def invL(x):
return L.solve_L(x,use_LDLt_decomposition=False)
def invLT(x):
return L.solve_Lt(x,use_LDLt_decomposition=False)
#Get maximum eigenvalue for the skew-symmetric matrix inv(L)*S*inv(L^T)
#"neval" below to keep track of how many times we evaluate this
#skew symmetric part as it's expensive
neval=0
def skew_factor(x):
nonlocal neval
neval+=1
y=invLT(x)
return invL(S@y)
es,_=spla.eigs(spla.LinearOperator((m,m),matvec=skew_factor),k=6,tol=1e-10)
eig_neval=neval
#Define ellipse parameters for Chebyshev iteration
alpha=1.0+0.0j
print(es)
c=max(np.imag(es))*1.0j
#Left-precondition b
b=invL(b)
#Now solve by chebyshev iteration
neval=0
x=chebyshev.chebyshev_pureimagc(m,lambda x: x + skew_factor(x),b,alpha,c,maxiter=5000,tol=1e-14,verbose=False)
#Apply right-preconditioner
x[P]=invLT(x)
cheb_neval=neval
return x
```

I did not yet extensively test this algorithm or compare against alternatives but I did confirm it works on some challenging nonsymmetric systems which I generated randomly. Convergence of the Chebyshev method is generally rapid when the skew-symmetric part is not very large compared to the symmetric part - like below:

```
iter=0, residual=0.0001340813560623113
iter=1, residual=9.845853216344927e-06
iter=2, residual=1.013103495865824e-11
iter=3, residual=2.5020940334707195e-13
iter=4, residual=4.275895432223994e-19
```

convergence can take a little longer when the skew-symmetric part is large compared to the symmetric part:

```
iter=0, residual=4.186754323212551
iter=1, residual=210.66587568527942
iter=2, residual=3.474588981192882
iter=3, residual=167.0642639225858
iter=4, residual=3.04771078119865
iter=5, residual=120.54588772351465
iter=6, residual=2.581557296578431
iter=7, residual=82.71211926014004
...
...
iter=164, residual=1.6864217663823996e-13
iter=165, residual=3.096346201216635e-13
iter=166, residual=1.1169909961323464e-13
iter=167, residual=2.0317413138929607e-13
iter=168, residual=7.397292905249339e-14
iter=169, residual=1.3331742365813827e-13
iter=170, residual=4.898175152676594e-14
iter=171, residual=8.747942227362621e-14
iter=172, residual=3.24292721522201e-14
iter=173, residual=5.740166656201045e-14
iter=174, residual=2.146760662263503e-14
iter=175, residual=3.766550578885808e-14
iter=176, residual=1.4209256729454957e-14
iter=177, residual=2.4715157991933956e-14
iter=178, residual=9.403862049980998e-15
```

This algorithm is interesting because it avoids all the pitfalls of nonsymmetry and nonnormality. The Cholesky factorization and left,right preconditioning by the Cholesky factor transforms the system into shifted-skew form even if the underlying matrix \(A\) is highly nonsymmetric and nonnormal furthermore the highly predictable spectrum of a shifted-skew matrix led to an optimal Chebysehv algorithm to solve systems involving it.

I am interested to see if there could be utility in using *incomplete* cholesky factorizations to construct preconditioners to nonsymmetric systems using the above ideas. I have not tested this idea yet however.