Model Reduction for a Parameterized Linear System

August 23, 2021

Here I demonstrate model reduction on a parameterized system of linear equations, compressing the system from its original size of 16384 rows and columns to a mere 20 rows and columns while still recovering useful accuracy. Model reduction takes an otherwise expensive problem, like solving a large sparse linear system, and compresses it into a much smaller and cheaper version. It is most useful when the expensive problem must be solved repeatedly for many different parameter values (e.g. in a Monte Carlo simulation).

\(\newcommand{\norm}[1]{\left\lVert#1\right\rVert}\)

The parameterized system of linear equations

I define the parameterized system \( \mathbf{A}(\mathbf{\mu})\mathbf{x} = \mathbf{b} \) as follows:

\begin{align} (\mathbf{A}(\mathbf{\mu})\mathbf{z}) _ {i,j} = 4( \mu _ 0 + \mu _ 1 x _ {i,j} + \mu _ 2 y _ {i,j} + \mu _ 3 x _ {i,j} y _ {i,j} + \mu _ 4 x _ {i,j}^2 + \mu _ 5 y _ {i,j}^2 ) z _ {i,j} - z _ {i-1,j} - z _ {i+1,j} - z _ {i,j-1} - z _ {i,j+1} \end{align}

for \( i=0,\ldots,m_x-1\), \(j=0,\ldots, m_y-1 \), where \( (x _ {i,j}, y _ {i,j}) \) are equally spaced samples of the unit square \( [0,1] \times [0,1] \). The parameters \( \mu _ 0, \ldots, \mu _ 5 \) all lie in the interval \( [-1,-0.01] \).

The Python code for assemble below will generate this matrix:

import numpy as np
import scipy.sparse as sp
#1D Laplacian (second-difference) matrix: -2 on the diagonal, 1 on the off-diagonals
def lap1d(m):
    return sp.lil_matrix(sp.diags((np.ones(m-1),-2.0*np.ones(m),np.ones(m-1)),(-1,0,1),shape=(m,m)))
#2D Laplacian matrix constructed as Kronecker products of the 1D Laplacian
def lap2d(mx,my):
    Ix=sp.eye(mx)
    Iy=sp.eye(my)
    Lx=lap1d(mx)
    Ly=lap1d(my)
    return sp.kron(Ix,Ly)+sp.kron(Lx,Iy)
#Assemble the parameterized system: 2D Laplacian plus parameter-dependent diagonal
def assemble(mx,my,mus):
    assert(len(mus)==6)
    m=mx*my
    A=lap2d(mx,my)
    #Equally spaced samples of the unit square, flattened to match the matrix ordering
    xs=np.linspace(0.0,1.0,mx)
    ys=np.linspace(0.0,1.0,my)
    xs,ys=np.meshgrid(xs,ys)
    xs=xs.flatten()
    ys=ys.flatten()
    #Quadratic polynomial in (x,y) with the six parameters as coefficients
    fs=mus[0]+mus[1]*xs+mus[2]*ys+mus[3]*xs*ys+mus[4]*xs*xs+mus[5]*ys*ys
    D=sp.diags(fs,0,shape=(m,m))
    return sp.csc_matrix(A+D)
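
As a quick sanity check, the assembled system can be factorized and solved directly for a single parameter value. The sketch below is illustrative only: mus_example is an arbitrary parameter vector inside the allowed interval, and the right-hand side is a placeholder since the actual \( \mathbf{b} \) is constructed further down.

import scipy.sparse.linalg as spla
#Illustrative parameter choice; each entry must lie in [-1,-0.01]
mus_example=np.array([-0.5,-0.3,-0.7,-0.2,-0.9,-0.1])
A=assemble(128,128,mus_example)
#Sparse LU factorization of the full 16384 x 16384 system
luA=spla.splu(A)
#Solve against a placeholder right-hand side of all ones
x=luA.solve(np.ones(128*128))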

I do not parameterize the right-hand-side vector \( \mathbf{b} \), so it is constant in the experiments that follow. It is generated by the following Python code:

#Grid dimensions used throughout (mx = my = 128, as stated below)
mx,my=128,128
#Create 2D grid of equally spaced points
xs,ys=np.meshgrid(np.linspace(0,20,mx),np.linspace(0,20,my))
#Generate a reasonably smooth function on this grid
b=np.sin(xs)+np.cos(ys)*np.cos(ys)
#Flatten the grid values into a vector
b=b.flatten()

For experiments below I take \(m _ x = m _ y = 128\) leading to a linear system size of \(m=m _ x m _ y = 16384 \) rows and columns. The system is sparse, however.
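
The sparsity is easy to confirm directly: each row of \( \mathbf{A}(\mathbf{\mu}) \) couples a grid point only to itself and its four neighbours, so there are at most five nonzeros per row. A minimal check, reusing the illustrative mus_example from the sketch above:

A=assemble(128,128,mus_example)
#At most five nonzeros per row: the point itself plus its four neighbours
print(A.shape,A.nnz,A.nnz/A.shape[0])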

Overview of Model Reduction

There are many ways to construct a reduced model; I take the simplest approach here. The basic idea is that if the matrix \( \mathbf{A}(\mathbf{\mu}) \) in \( \mathbf{A}(\mathbf{\mu}) \mathbf{x} = \mathbf{b} \) is an \( m \times m \) matrix, then we try to find an \( m \times k \) matrix \( \mathbf{Q} \) with orthonormal columns (these columns are often called “basis vectors”) and \( k \) much smaller than \( m \), such that when solving the reduced \( k \times k \) system

\[ \mathbf{Q}^T \mathbf{A} (\mathbf{\mu} ) \mathbf{Q} \mathbf {x} _ h = \mathbf{Q}^T \mathbf{b} \]

we can reconstruct a non-reduced solution \( \mathbf{Q} \mathbf{x} _ h \) with prescribed error tolerance \( \epsilon \) such that

\[ \norm{ \mathbf{x} - \mathbf{Q} \mathbf{x} _ h} < \epsilon \]

holds independently of the parameter \( \mathbf{\mu} \).

In other words: instead of solving the large system involving \( \mathbf{A} ( \mathbf{\mu} ) \) every time the parameter \( \mathbf{\mu} \) changes, we solve the much smaller reduced system, ideally resulting in computational savings.
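
Concretely, a single reduced solve looks like the following sketch. Here \( \mathbf{Q} \) is assumed to be given (how to build it is the subject of the next section), and solve_reduced is a helper name of my own choosing.

import numpy as np
import scipy.linalg as la
def solve_reduced(Q,A,b):
    #Galerkin projection onto the column space of Q: a small k x k system
    Ah=Q.T@(A@Q)
    bh=Q.T@b
    #Solve the small dense reduced system
    xh=la.solve(Ah,bh)
    #Lift the reduced solution back to the full m-dimensional space
    return Q@xh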

Choosing the Basis by Greedy Algorithm

A very common way to choose the basis vectors is a greedy algorithm. We start with a single solution of the parameterized system for one choice of the parameter \( \mathbf{\mu} \). For each subsequent addition to the basis we search for the choice of \( \mathbf{\mu} \) that maximizes the residual of the reduced solution; the corresponding full-system solution is added to the set of basis vectors, and the set is QR factorized so that the basis vectors remain orthogonal to each other.

In math we can define the basis vectors and corresponding parameter choices \( \mathbf{\mu} \) inductively as follows:

\begin{align} \mathbf{\mu} _ 0 &= \text{ any choice } \\ \mathbf{\mu} _ {i+1} &= \text{argmax} _ {\mathbf{\mu}} \norm {\mathbf{A}(\mathbf{\mu}) \mathbf{Q} _ i \mathbf{x} _ h (\mathbf{\mu}) - \mathbf{b}} \\ \mathbf{d} _ i &= \mathbf{A}(\mathbf{\mu} _ i)^{-1}\mathbf{b} \end{align}

where \( \mathbf{Q} _ i \) is the result of QR factorizing the vectors \(\mathbf{d} _ 0, \ldots, \mathbf{d} _ i \) and \( \mathbf{x} _ h = \mathbf{x} _ h(\mathbf{\mu}) \) solves the reduced system \( \mathbf{Q} _ i^T \mathbf{A}(\mathbf{\mu}) \mathbf{Q} _ i \mathbf{x} _ h = \mathbf{Q} _ i^T \mathbf{b} \) built from the basis vectors available at iteration \( i \).

I use the NAG derivative-free nonlinear optimizer (e04jdc) to solve the optimization problem specified above. I chose a derivative-free optimizer (DFO) because the gradient of the residual can depend in a complex nonlinear way on the parameter \( \mathbf{\mu} \). Instead of deriving the derivative by hand or spending time on algorithmic differentiation, I chose a solver that does not require a derivative (the trade-off is that these optimizers can be slower without that extra derivative information).

I give the code for generating the basis below:

import numpy as np
import scipy.linalg as la
import scipy.sparse.linalg as spla
from numpy.random import default_rng
from naginterfaces.library.opt import handle_solve_dfno,handle_init,handle_set_simplebounds,handle_set_nlnobj
from naginterfaces.library.opt import handle_opt_set

#Number of parameters, full system size and random number generator
nmu=6
m=mx*my
rng=default_rng()

#Start off with some random parameters
k=5
mus=rng.uniform(-1.0,-0.01,(nmu,k))

#How many more basis vectors to compute by greedy search
nmore=50
#Solve for these parameters and orthogonalize
D=np.zeros((m,k+nmore))
for i in range(0,k):
    A=assemble(mx,my,mus[:,i])
    luA=spla.splu(A)
    D[:,i]=luA.solve(b)
Q,_=la.qr(D[:,0:k],mode='economic')



#Now actually do the greedy search 
rs=np.zeros(nmore)
for j in range(0,nmore):
    #Set up NAG DFO solver
    handle=handle_init(nmu)
    bl=np.zeros(nmu)
    bu=np.zeros(nmu)
    bl[:]=-1.0
    bu[:]=-0.01
    handle_set_simplebounds(handle,bl,bu)
    handle_set_nlnobj(handle,list(range(1,nmu+1)))
    handle_opt_set(handle,"Print Level = 0")
    handle_opt_set(handle,"DFO Maximum Slow Steps = 200")


    def objfun(xs,inform,data=None):
        #Assemble matrix associated with parameters
        A=assemble(mx,my,xs)
        #Project to reduced system
        Ah=Q.T @ (A @ Q)
        #Project RHS to reduced space
        bh = Q.T @ b
        #Solve reduced system
        xh = la.solve(Ah,bh)
        #Project solution back to full space
        x = Q @ xh
        #Compute residual
        r = b - A@x
        #Maximize the residual, so take negative, square for better smoothness
        obj = np.linalg.norm(r)
        obj = -obj*obj
        return (obj,1)
    mu,rinfo,_ = handle_solve_dfno(handle,-np.ones(nmu),objfun=objfun)
    rs[j]=rinfo[0]

    A=assemble(mx,my,mu)
    luA=spla.splu(A)
    D[:,k+j]=luA.solve(b)
    #Q now stores the basis vectors from this current iteration
    Q,_=la.qr(D[:,0:k+j+1],mode='economic')

I validate the above code in a few ways in the following sections:

  1. I compute histograms of errors after randomly sampling the parameter space and solving with the reduced model
  2. I plot a few select reduced-system solutions alongside their corresponding full-system (exact) solutions
  3. I plot the greedy algorithm residuals to show they are generally decreasing

Results for 20 basis functions

Greedy Algorithm Residuals

Each step of the greedy algorithm produces a residual, namely the largest residual the numerical optimizer was able to find. The optimization problem is not necessarily convex, so the optimizer isn't guaranteed to find the best possible residual at each iteration, all the more so since I used a derivative-free variant.
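
Since the code above stores the optimizer's objective value, which is the negative of the squared residual norm, in the array rs, the residual magnitudes can be recovered and plotted with a few lines (assuming matplotlib; the plot styling here is my own choice):

import matplotlib.pyplot as plt
#rs[j] holds the negative squared residual norm found at greedy step j
residuals=np.sqrt(-rs)
plt.semilogy(residuals,marker='o')
plt.xlabel('Greedy iteration')
plt.ylabel('Maximized residual norm')
plt.show()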

Error Histogram

Next I sample the parameters \( \mathbf{\mu} \in [-1.0,-0.01]^6 \) with 1000 samples from a uniform distribution and compute the relative error between the reduced-model solution \( \mathbf{Q} \mathbf{x} _ h \) and the exact solution \( \mathbf{x} \) for each of these parameters. I take the logarithm of the relative error, so the magnitude of the number on the x-axis may be roughly interpreted as the number of correct digits.
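
The sampling loop behind this histogram can be sketched as follows; the variable names and the use of matplotlib are my own, while the grid sizes, Q, b and rng come from the code above.

import matplotlib.pyplot as plt
nsamples=1000
errs=np.zeros(nsamples)
for s in range(nsamples):
    mu=rng.uniform(-1.0,-0.01,nmu)
    A=assemble(mx,my,mu)
    #Exact solution of the full sparse system
    x=spla.splu(A).solve(b)
    #Reduced solution lifted back to the full space
    xr=Q@la.solve(Q.T@(A@Q),Q.T@b)
    errs[s]=np.linalg.norm(x-xr)/np.linalg.norm(x)
#Histogram of log relative errors
plt.hist(np.log10(errs),bins=30)
plt.xlabel('log10(relative error)')
plt.show()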

Solution Plots

Below I plot a few solutions of the reduced model alongside the corresponding exact solutions.

Results for 50 basis functions

Next I increase the number of basis functions from 20 to 50 to confirm that more basis functions result in a more accurate reduced model.

Greedy Algorithm Residuals

Each step of the greedy algorithm produces a residual, namely the largest residual the numerical optimizer was able to find. The optimization problem is not necessarily convex, so the optimizer isn't guaranteed to find the best possible residual at each iteration, all the more so since I used a derivative-free variant.

Error Histogram

Next I sample the parameters \( \mathbf{\mu} \in [-1.0,-0.01]^6 \) with 1000 samples from a uniform distribution and compute the relative error between the reduced-model solution \( \mathbf{Q} \mathbf{x} _ h \) and the exact solution \( \mathbf{x} \) for each of these parameters. I take the logarithm of the relative error, so the magnitude of the number on the x-axis may be roughly interpreted as the number of correct digits.

Solution Plots

Below I plot a few solutions of the reduced model alongside the corresponding exact solutions.

Comparing Greedy basis to Random basis

Does the extra sophistication of the greedy algorithm for choosing the basis actually help? Below I plot the error histogram for a basis chosen by simply sampling the parameter space for \( \mathbf{\mu} \) at random, overlaid on the greedy-basis histogram, with both bases containing the same number of vectors.
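
The random basis for this comparison is built by solving the full system at randomly drawn parameters and orthogonalizing the solutions, with no greedy search. A minimal sketch, with variable names of my own:

#Build a basis of the same size purely from random parameter samples
nrand=Q.shape[1]
Drand=np.zeros((m,nrand))
for i in range(nrand):
    mu=rng.uniform(-1.0,-0.01,nmu)
    A=assemble(mx,my,mu)
    Drand[:,i]=spla.splu(A).solve(b)
#Orthogonalize the random snapshots to obtain the random basis
Qrand,_=la.qr(Drand,mode='economic')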

The greedy algorithm does appear to offer a significant improvement in accuracy over the more naive random sampling approach.

Conclusion

Using a greedy algorithm driven by a derivative-free optimizer to choose basis functions for a reduced-order model allowed me to approximate a sparse linear system with over 10,000 rows and columns by a system with only 25 rows and columns (the reduced model). Increasing the number of basis functions so that the reduced model had 55 rows and columns improved accuracy further. A derivative-free optimizer might not be necessary if I used something like algorithmic differentiation to handle the potentially complex dependence of the derivative on the parameter \( \mathbf{\mu} \).