Abstract Numerical Linear Algebra

October 14, 2016

Not surprisingly given its name, Linear Algebra gives us tools to analyze functions which are linear. Computational linear algebra, however, customarily requires a matrix representation of the linear functions under study, usually with floating point entries. A number of algorithms explicitly require this, such as the LU and Cholesky factorizations, which are important tools for solving systems of linear equations. This somewhat breaks away from the “interface” of linearity, however: we are no longer using only the fact that the function is linear, but are instead working with one particular kind of linear function, namely a matrix. Algorithms do exist which use only the underlying vector space structure and the linearity of the function on that vector space. These are often called “matrix-free” algorithms; Richardson iteration, GMRES, BiCGSTAB, and Arnoldi iteration are examples, and they are specifically constructed to work without knowledge of a matrix representation. This points to a gap between theory and computation. On the theoretical side, usually little more than linearity is required to prove interesting and useful results about a linear function; there is a very powerful spectral theory for bounded, self-adjoint, and normal operators, for example. Yet when we turn to computation, more information is demanded. I decided to try to bridge this gap using a programming language called OCaml. I want to see whether, by following the arguments made on the theoretical side, I can reproduce some elements of spectral theory for linear operators in a way that is also computational - i.e. gives us an algorithm for computing these quantities. This would open the algorithms of computational linear algebra to linear functions which cannot be easily represented by a matrix. I have been moderately successful already, so I will document the early results here.

This is ongoing work and you can track the progress on GitHub.

Defining Linear Spaces in OCaml

Before starting, I’ll state right away some design choices I made. I decided to have all vector spaces be explicitly over the complex field rather than over some more general, possibly non-complex field. The reason for this is that it prevents me from being lazy and assuming that inner products commute, and it also contains the real-only case just fine as a special case. OCaml has its own complex numbers module, but it appears to require the underlying number representation to be a floating point number. I did not want this restriction in case I wanted to try more general numbers such as arbitrary precision or rational numbers. The module type I defined for complex numbers is as follows:

module type ComplexNumber = sig  
  (*Type for real part*)
  type ret
  (*Type for imaginary part*)
  type imt
  (*Type for complex number*)
  type t
  (*Make a complex number out of real and imaginary parts*)
  val mk : ret -> imt -> t
  (*Real part of complex number*)
  val re : t -> ret
  (*Imaginary part of complex number*)
  val im : t -> imt
  (*Complex conjugate*)
  val conj : t -> t
  (*Multiplication*)
  val mul : t -> t -> t
  (*Addition*)
  val add : t -> t -> t
  (*Multiplicative inverse*)
  val inv : t -> t
  (*Additive inverse*)
  val neg : t -> t
  (*Square root*)
  val sqrt : t -> t
  (*Additive identity*)
  val zero : t
  (*Multiplicative identity*)
  val one  : t
  (* "to_string" convenience function*)
  val show : t -> string
end;;
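
To make this concrete, here is a minimal sketch of a floating-point instance of this signature. The post later refers to a module called FloatComplex; the version in the repository may differ, but an implementation along these lines satisfies the signature:

module FloatComplex : ComplexNumber
  with type ret = float and type imt = float = struct
  type ret = float
  type imt = float
  type t = { re : float; im : float }
  let mk r i  = { re = r; im = i }
  let re z    = z.re
  let im z    = z.im
  let conj z  = { z with im = -. z.im }
  let mul a b = { re = a.re *. b.re -. a.im *. b.im;
                  im = a.re *. b.im +. a.im *. b.re }
  let add a b = { re = a.re +. b.re; im = a.im +. b.im }
  let inv z   =
    let d = z.re *. z.re +. z.im *. z.im in
    { re = z.re /. d; im = -. z.im /. d }
  let neg z   = { re = -. z.re; im = -. z.im }
  (*Principal square root via polar form. This is not 'let rec',
    so 'sqrt' on the right-hand side is the float sqrt from the
    standard library.*)
  let sqrt z  =
    let m  = sqrt (sqrt (z.re *. z.re +. z.im *. z.im)) in
    let th = 0.5 *. atan2 z.im z.re in
    { re = m *. cos th; im = m *. sin th }
  let zero    = { re = 0.0; im = 0.0 }
  let one     = { re = 1.0; im = 0.0 }
  let show z  = string_of_float z.re ^ " + " ^ string_of_float z.im ^ "i"
end;;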

With the field defined, I now define a “Hilbert space.” Hilbert spaces are often associated with non-constructive arguments because of their completeness property, but I chose this structure because most interesting computational linear algebra algorithms are iterative in nature and therefore only make sense in a setting where convergence is meaningful. I chose the term to reinforce that this is not just an arbitrary vector space, but one in which we will want to do real computations iteratively.

module type HilbertSpace = sig
  (*Vector type*)
  type vect
  (*Complex number type*)
  type ct 
  (*Null vector*)
  val nullvector : vect
  (*Basis functions. basis i = i-th basis function*)
  val basis      : int -> vect
  (*Scalar multiplication: constant times a vector*)
  val scalarmul  : ct -> vect -> vect
  (*Vector addition*)
  val       add  : vect -> vect -> vect
  (*Inner product*)
  val innerprod  : vect -> vect -> ct
  (*Vector Norm*)
  val norm       : vect -> ct
  (*Convenience printing function*)
  val show       : vect -> string
end;;

One interesting thing to note here is that I made the existence of a basis (which the user must now supply) part of the definition of the Hilbert space. If we were working non-constructively then the axiom of choice would guarantee the existence of a basis and this would not be necessary. Since I want to enforce computability I require the user of this module to supply a basis. Note that I do not require the basis to be a list of vectors, which would imply that it is finite. It is rather a function from integers to basis vectors, which keeps open the possibility that the Hilbert space is not finite-dimensional.
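
As a concrete illustration (and assuming the FloatComplex sketch from above), here is a minimal instance: length-3 complex vectors with the standard inner product. The module name CVec3 is mine, purely for illustration:

module CVec3 : HilbertSpace with type ct = FloatComplex.t = struct
  type ct = FloatComplex.t
  (*Length-3 complex vectors*)
  type vect = ct array
  let nullvector = Array.make 3 FloatComplex.zero
  (*basis i = i-th standard unit vector, valid for i = 0, 1, 2*)
  let basis i =
    let v = Array.make 3 FloatComplex.zero in
    v.(i) <- FloatComplex.one;
    v
  let scalarmul c v = Array.map (FloatComplex.mul c) v
  let add u v = Array.map2 FloatComplex.add u v
  (*Linear in the first argument, conjugate-linear in the second*)
  let innerprod u v =
    let terms =
      Array.map2 (fun a b -> FloatComplex.mul a (FloatComplex.conj b)) u v in
    Array.fold_left FloatComplex.add FloatComplex.zero terms
  let norm u = FloatComplex.sqrt (innerprod u u)
  let show u =
    String.concat "; " (Array.to_list (Array.map FloatComplex.show u))
end;;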

Finally I defined another module which is designed to extend the complex number and Hilbert space modules by building on their interfaces to implement the computational algorithms. For now it has the placeholder name “Orthogonalizable,” so named because the first thing I did was implement the Gram-Schmidt algorithm.

module type Orthogonalizable = sig
  (*Type of complex numbers*)
  type ct
  (*Type of vectors*)
  type vect
  (*Orthogonalize two vectors*)
  val orthogonalize2 : vect -> vect -> vect list
  (*Orthogonalize array of vectors*)
  val orthogonalize  : vect array -> vect array
  (*Compute adjoint of linear operator*)
  val adj            : int -> (vect -> vect) -> vect -> vect
  (*Compute operator norm of linear operator*)
  val opnorm         : int -> int -> (vect -> vect) -> ct
end;;

This is immediately followed by a functor:

module MakeOrthogonalizable (C : ComplexNumber) (H : HilbertSpace with type ct = C.t)
  : Orthogonalizable with type vect = H.vect with type ct = C.t = struct
  (*implementations using Hilbert space and complex number functions*)
end;;
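
Applying the functor is then a one-liner. With the sketches above, an instance over the 3-dimensional space looks like this; the polynomial example at the end of the post uses a module called MyOrth built the same way over a polynomial Hilbert space:

module MyOrth3 = MakeOrthogonalizable (FloatComplex) (CVec3);;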

The list of functions in the module signature “Orthogonalizable” will grow, but for now I will describe how I computed the following:

Orthogonalizing Vectors

Given a linearly independent list of vectors it is possible to produce another list of vectors which span the same space but are orthonormal. This is a well-known fact, and it is the easiest algorithm to implement abstractly because it requires only the underlying inner product. The procedure for doing so is the Gram-Schmidt algorithm (the code below is the “modified” variant, which is numerically more stable), and I implemented it inside the MakeOrthogonalizable functor:

  let orthogonalize xs =
    let normalize u = H.scalarmul (C.inv (H.norm u)) u in
    let m = Array.length xs in
    (*Copy the input so the original array is not mutated*)
    let qs = Array.copy xs in
    for k = 0 to (m-1) do
      let w = ref xs.(k) in
      (*Subtract off the components along the already-orthonormalized vectors*)
      for j = 0 to (k-1) do
        let rjk = H.innerprod (!w) qs.(j) in
        w := H.add (!w) (H.scalarmul (C.neg rjk) qs.(j));
      done;
      qs.(k) <- normalize !w
    done;
    qs
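
As a quick illustration using the CVec3 instance sketched earlier, orthonormalizing \(e_0\) and \(e_0 + e_1\) should return \(e_0\) and \(e_1\) up to rounding. This snippet is mine, not from the repository:

let () =
  let xs = [| CVec3.basis 0;
              CVec3.add (CVec3.basis 0) (CVec3.basis 1) |] in
  let qs = MyOrth3.orthogonalize xs in
  Array.iter (fun q -> print_endline (CVec3.show q)) qs;;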

Despite its relative simplicity, this procedure has extremely wide-reaching consequences. Orthogonalizing vectors not only makes computing matrix-free adjoints possible (discussed next), it is also the foundation of virtually every eigenvalue algorithm and quite a few linear system algorithms as well. Next I show how it enables the computation of an operator’s adjoint.

Computing Adjoints

Most linear algebra users know the adjoint as the transpose of a matrix, but there is also a very general definition of an adjoint, as follows:

Suppose that \(H\) is a Hilbert space with inner product \((\cdot,\cdot)\) and suppose further that \(\mathbf{A}\) is a linear operator from \(H\) into \(H\). Then a linear operator \(\mathbf{B}\) is an adjoint of \(\mathbf{A}\) if

\[ (\mathbf{A}u, v) = (u,\mathbf{B}v) \]

holds for all \(u,v\in H\). In many cases this adjoint is unique, in which case we denote it \(\mathbf{A}^*\).

Existence of the adjoint follows from the Riesz representation theorem, which is usually another non-constructive result - but in almost all cases the following construction implements the adjoint:

Suppose that \(n\) is an integer and that \(e_1,\ldots,e_n\) are orthonormal vectors in the Hilbert space \(H\), and suppose further that \(y\) lies in their span. Expanding \(\mathbf{A}^*y\) in this orthonormal system and applying the defining identity to each coefficient, \((\mathbf{A}^*y,e_i) = (y,\mathbf{A}e_i)\), gives

\[ \mathbf{A}^*y = \sum_{i=1}^n(y,\mathbf{A}e_i)e_i \]

and this I have done in OCaml:

(*n: highest basis index to use; basis vectors 0 through n are orthonormalized*)
(*a: linear operator*)
(*u: input vector*)
  let adj (n:int) (a:vect->vect) (u:vect) = 
    let bs     = Array.make (n+1) (H.nullvector) in
    for i = 0 to n do
      bs.(i) <- H.basis i;
    done;
    let obs = orthogonalize bs in
    let l v   = H.innerprod (a v) u in
    let ls    = Array.map l obs in
    let cls   = Array.map C.conj ls in
    let ccls  = Array.map2 H.scalarmul cls obs in
    let z     = Array.fold_left H.add H.nullvector ccls in
    z
  ;;
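
As a sanity check of adj, again over the CVec3 sketch: for a toy linear operator \(\mathbf{A}\) (my own, purely for illustration) the defining identity \((\mathbf{A}u,v) = (u,\mathbf{A}^*v)\) should hold up to rounding:

(*Toy linear operator: A v = 2 (v,e0) e1 + (v,e1) e2*)
let a v =
  let c0 = CVec3.innerprod v (CVec3.basis 0) in
  let c1 = CVec3.innerprod v (CVec3.basis 1) in
  CVec3.add
    (CVec3.scalarmul (FloatComplex.mul (FloatComplex.mk 2.0 0.0) c0)
       (CVec3.basis 1))
    (CVec3.scalarmul c1 (CVec3.basis 2));;

let () =
  let u = CVec3.add (CVec3.basis 0)
            (CVec3.scalarmul (FloatComplex.mk 0.0 1.0) (CVec3.basis 1)) in
  let v = CVec3.add (CVec3.basis 1) (CVec3.basis 2) in
  (*Both lines should print the same number, 2. + 1.i*)
  print_endline (FloatComplex.show (CVec3.innerprod (a u) v));
  print_endline (FloatComplex.show (CVec3.innerprod u (MyOrth3.adj 2 a v)));;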

This finally gives us a path to computing the operator norm of a linear operator, using power iteration.

The Operator Norm

Power iteration as normally used does not require a matrix representation of a linear operator, but for the operator norm it is applied not to the operator itself but to the adjoint composed with the operator: \(\|\mathbf{A}\|\) is the square root of the largest eigenvalue of \(\mathbf{A}^*\mathbf{A}\). In code the adjoint is usually obtained via a matrix transpose, which does require a matrix representation. Now that we have the matrix-free adjoint, power iteration can also give us the operator norm in a matrix-free fashion:

  let opnorm maxit n a =
    let normalize u = H.scalarmul (C.inv (H.norm u)) u in
    (*Start from a vector with components in every basis direction*)
    let m = ref H.nullvector in
    for i = 0 to n do
      m := H.add (!m) (H.basis i)
    done;
    m := normalize !m;
    (*Power iterate on b = adjoint of A composed with A; the largest
      eigenvalue of b is the squared operator norm of A*)
    let adja = adj n a in
    let b u = adja (a u) in
    for j = 0 to maxit do
      m := normalize (b !m)
    done;
    C.sqrt (H.norm (b !m))
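
Continuing the toy example: the operator above sends \(e_0\) to \(2e_1\), \(e_1\) to \(e_2\), and \(e_2\) to \(0\), so its operator norm is exactly 2, and the estimate below should print approximately 2. + 0.i:

let () =
  print_endline (FloatComplex.show (MyOrth3.opnorm 50 2 a));;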

There are many more things to implement, but this is a work in progress. I will show some early results on a nontrivial case though.

A Nontrivial Example

One common task in practice is computing the derivative of a polynomial. The derivative is a well-known linear operator, but it is not always convenient to represent polynomials as a vector or list of floating point numbers. Take for example the following OCaml code, which represents a polynomial as a syntax tree and computes its derivative via recursion and the product rule:

type poly  = 
    Val of FloatComplex.t
  | Var
  | Add of (poly*poly)
  | Mul of (poly*poly)
;;


let rec diff p = 
  match p with
    Val z       -> Val (FloatComplex.mk 0.0 0.0) (*Derivative of constant is zero*)
  | Var         -> Val (FloatComplex.mk 1.0 0.0) (*Derivative of variable is one*)
  | Add (p1,p2) -> Add ((diff p1),(diff p2))     (*Derivative of sum is sum of derivatives*)
  | Mul (p1,p2) -> Add ((Mul ((diff p1),p2)),(Mul (p1,(diff p2)))) (*Product rule*)

Let’s suppose further that the inner product is defined as:

\[ (P_1,P_2) = \int_{-1}^1 P_1(x) \overline{P_2(x)} dx \]

which again can be very easily implemented for polynomials represented as above.
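
Here is one way it could go. This is my sketch rather than the repository’s code: a recursive evaluator plus a composite Simpson rule approximates the integral (an exact quadrature such as Gauss-Legendre of sufficient order would be the more careful choice), and the monomial basis function is included for completeness. Packaging these into a HilbertSpace instance over poly and applying the functor would yield the MyOrth module used below.

(*Evaluate a polynomial syntax tree at a real point x*)
let rec eval p x =
  match p with
    Val z       -> z
  | Var         -> FloatComplex.mk x 0.0
  | Add (p1,p2) -> FloatComplex.add (eval p1 x) (eval p2 x)
  | Mul (p1,p2) -> FloatComplex.mul (eval p1 x) (eval p2 x)
;;

(*basis i = x^i as a syntax tree*)
let rec basis i =
  if i = 0 then Val (FloatComplex.mk 1.0 0.0) else Mul (Var, basis (i-1))
;;

(*(P1,P2): integrate P1(x) times conj(P2(x)) over [-1,1] by composite Simpson*)
let innerprod p1 p2 =
  let m = 128 in
  let h = 2.0 /. float_of_int (2*m) in
  let f x = FloatComplex.mul (eval p1 x) (FloatComplex.conj (eval p2 x)) in
  let acc = ref FloatComplex.zero in
  for i = 0 to 2*m do
    let x = -1.0 +. h *. float_of_int i in
    let w = if i = 0 || i = 2*m then 1.0
            else if i mod 2 = 1 then 4.0
            else 2.0 in
    acc := FloatComplex.add !acc
             (FloatComplex.mul (FloatComplex.mk (w *. h /. 3.0) 0.0) (f x))
  done;
  !acc
;;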

This example is particularly helpful to me because there is already a paper by Sevtap Özışık et al. which computes operator norms of this derivative operator on polynomial spaces, so I can check my code against the values in the paper.

I ran opnorm with the number of basis functions in the Riesz representation construction equal to the polynomial order in Sevtap’s paper, using 100 power iterations for each order. This was done in OCaml as

let () =
  for i = 1 to 10 do
    let ndiff = MyOrth.opnorm 100 i diff in
    (*Squaring here because Sevtap's paper takes the square root*)
    print_endline (FloatComplex.show (FloatComplex.mul ndiff ndiff));
  done
;;

which gives the results

3. + 0.i
15. + 0.i
42.531225624 + 0.i
95.0587828773 + 0.i
184.726234465 + 0.i
326.150767117 + 0.i
536.374221131 + 0.i
834.861502547 + 0.i
1243.50419935 + 0.i
1786.62294702 + 0.i

These can be verified against the constant \(C_N\) shown in Table 1 of Sevtap’s paper.

Looking Ahead

So far I have figured out how to compute the adjoint of an operator in a matrix-free fashion, and then use it to compute the operator norm. I illustrated this by correctly computing the operator norm of the polynomial derivative operator, even though that operator was written in a highly non-matrix style by recursion over a syntax tree.

To extend these results to a fully featured spectral theory I will need to implement the QR factorization of a list of vectors. This is already almost done by my existing orthogonalization function; I currently throw away the resulting R factor, which could instead be returned. With this factorization in hand it is possible to compute the whole spectrum of many suitably behaved operators via QR iteration, again completely without inputting any matrices to the routine. I am already investigating this path, but I will need to make some choices about how to handle multiplicities in the spectrum in a general way.

Once I have the QR iteration code in hand I can use it to compute an abstract SVD factorization of an input linear operator. The SVD is a very well-behaved factorization across the whole space of linear operators, which is not true for spectral decompositions.

At this point the path to a “computable spectral theory” without matrices is fairly clear, but how properties like normality and self-adjointness interact with the above algorithms will need some careful investigation, hopefully leading to more interesting code and blog posts.