Note on Quantizing Deflation Subspaces

March 28, 2026

Here I achieve a roughly 4X reduction in the size of a deflation basis without degrading the convergence behavior that basis provides. In principle this enables faster deflated solves with very little downside. Deflation methods let you precompute and store a deflation basis for a matrix A and use that stored basis to accelerate convergence on subsequent iterative solves involving A. It is an explicit tradeoff: we accept an upfront computational cost (the subspace computation, usually an eigensolver) as well as additional memory footprint and memory traffic on subsequent solves, since every iteration of a deflation method must read the deflation vectors at least once.

I demonstrate my methodology and results below.

Methodology and Results

Using my Krylov-bits library I formed indefinite sparse matrices of the following form (varying mx and my):

import numpy as np
import scipy.sparse as spla

rng = np.random.default_rng()

mx = 32
my = 32
m = mx * my
bands = [0, 1, mx]
A = spla.diags([rng.uniform(-1, 1, size=m) for _ in bands], bands, shape=(m, m))
# Symmetrize
A = A + A.T

In each case I precomputed a deflation subspace whose dimension k is 5% of the problem size n:

mx=my    n=mx*my    Deflation subspace size k
32       1024       51
64       4096       205
128      16384      819
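The post does not show how the subspace itself is computed inside Krylov-bits. As a hedged sketch (an assumption on my part, not necessarily what Krylov-bits does), a typical choice for a symmetric indefinite A is the eigenvectors associated with the k smallest-magnitude eigenvalues, found via shift-invert Lanczos around zero; the function name `deflation_basis` here is mine:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as splin

def deflation_basis(A, k):
    """Return an (m, k) orthonormal basis of eigenvectors of the
    symmetric matrix A whose eigenvalues are closest to zero.

    Shift-invert around sigma=0 targets the near-null eigenvalues
    that slow MINRES down on indefinite problems.
    """
    vals, vecs = splin.eigsh(A, k=k, sigma=0.0, which="LM")
    return vecs
```

Shift-invert requires a sparse factorization of A, which is the upfront cost the tradeoff in the introduction refers to.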

Next I solved a linear system using MINRES three ways: without deflation, with deflation, and with deflation after unsetting mantissa bits of the basis. I record iteration counts and relative residuals. All arithmetic is fp64, so with 4 kept mantissa bits we achieve an effective 4X compression relative to the originally computed deflation basis because

fp64 = 1 sign bit + 11 exponent bits + 52 mantissa bits
packed_deflation_format = 1 sign bit + 11 exponent bits + 4 mantissa bits
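Unsetting the low mantissa bits is a few lines with NumPy's bit-level view of fp64. This is a sketch of the operation described above (the function name `truncate_mantissa` is mine); it clears the low 52 - keep_bits mantissa bits while leaving the sign and exponent fields untouched:

```python
import numpy as np

def truncate_mantissa(x, keep_bits):
    """Zero out all but the top `keep_bits` mantissa bits of fp64 values.

    The sign (1 bit) and exponent (11 bits) are preserved; the low
    52 - keep_bits mantissa bits are cleared, i.e. truncation toward
    zero in magnitude rather than round-to-nearest.
    """
    bits = np.asarray(x, dtype=np.float64).view(np.uint64)
    mask = np.uint64(~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF)
    return (bits & mask).view(np.float64)
```

With keep_bits=4 the relative perturbation of each entry is below 2**-4, which is evidently small enough not to disturb the span of the deflation subspace in these experiments.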

I do not actually pack the above bits into a 16-bit integer to realize the memory savings; I will experiment with that in a later post. These results should therefore be viewed as a simulated benefit.
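For what it's worth, the packing deferred to that later post is mechanically simple: the 16 surviving bits are exactly the top 16 bits of each fp64 word. A hedged sketch (not code that was run for these results; the names `pack16`/`unpack16` are mine):

```python
import numpy as np

def pack16(x):
    """Keep the top 16 bits of each fp64 value (sign, 11 exponent bits,
    4 mantissa bits) and store them as uint16 -- a 4X size reduction."""
    bits = np.asarray(x, dtype=np.float64).view(np.uint64)
    return (bits >> np.uint64(48)).astype(np.uint16)

def unpack16(p):
    """Expand packed uint16 values back to fp64 with zeroed low mantissa bits."""
    bits = np.asarray(p, dtype=np.uint64) << np.uint64(48)
    return bits.view(np.float64)
```

In a real solver the unpacking would happen on the fly as the basis is streamed through each iteration, trading a shift per entry for a 4X cut in memory traffic.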

Small example (m=1024)

Run                                  Kept mantissa bits  MINRES iterations  Relative residual  Relative error
Plain MINRES                         52                  1490               6.088e-07          3.113e-03
Deflated MINRES, unquantized basis   52                  254                3.592e-07          2.775e-06
Deflated MINRES, quantized basis     4                   260                3.671e-07          3.387e-06
Deflated MINRES, quantized basis     1                   372                9.465e-07          2.120e-04

Larger Example (m=4096)

Run                                  Kept mantissa bits  MINRES iterations  Relative residual  Relative error
Plain MINRES                         52                  6285               1.296e-06          7.321e-04
Deflated MINRES, unquantized basis   52                  300                6.520e-07          6.050e-06
Deflated MINRES, quantized basis     4                   300                6.622e-07          6.158e-06

Largest Example (m=16384)

Run                                  Kept mantissa bits  MINRES iterations  Relative residual  Relative error
Plain MINRES                         52                  9658               1.939e-06          8.364e-03
Deflated MINRES, unquantized basis   52                  326                1.525e-06          1.374e-05
Deflated MINRES, quantized basis     4                   327                1.410e-06          1.243e-05

Conclusion

I precomputed a deflation basis in FP64, then post hoc quantized that basis by truncating mantissa bits while preserving the FP64 sign and exponent fields. Keeping only 4 mantissa bits corresponds to an effective 16-bit representation per basis entry, or about a 4x reduction in storage relative to FP64. Across these experiments, this caused little to no degradation in the convergence of the subsequent deflated iterative solve. In principle, those 16 bits could be packed into an integer representation to reduce memory traffic and basis storage. Additional savings may be possible by compressing exponent information as well, for example with shared exponents over spatial blocks if the basis exhibits local correlation.
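To make the shared-exponent idea concrete, here is a hedged sketch of one possible block scheme (entirely hypothetical, not something tested in this post): each contiguous block of entries stores small integer mantissas against a single power-of-two scale derived from the block's largest magnitude. The function name `block_shared_exponent` is mine.

```python
import numpy as np

def block_shared_exponent(x, block=32, mant_bits=8):
    """Simulate quantization with one shared exponent per contiguous block.

    Each block stores signed integer mantissas scaled by a power of two
    covering the block's largest magnitude; this returns the dequantized
    values so the induced error can be inspected.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block
    xp = np.pad(x, (0, pad)).reshape(-1, block)
    # One exponent per block: frexp gives max|v| = f * 2**e with f in [0.5, 1)
    exps = np.frexp(np.max(np.abs(xp), axis=1, keepdims=True))[1]
    scale = np.ldexp(1.0, exps - (mant_bits - 1))
    q = np.round(xp / scale).astype(np.int32)   # small-integer mantissas
    return (q * scale).reshape(-1)[: len(x)]    # dequantized values
```

If the basis vectors are smooth over the grid, entries within a spatial block share magnitude, so the per-block exponent costs little accuracy while shrinking the 11 exponent bits per entry to a handful per block.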