Visualizing Convergence of Krylov Eigensolvers

April 14, 2024

I show a simple illustration of Krylov solvers converging on eigenvectors for a very simple case: a diagonal matrix. A diagonal matrix is a trivial example in the sense that we can read off its eigenvalues (the diagonal entries) and eigenvectors (the columns of the identity matrix), but for Krylov methods these matrices pose the same difficulties as any other symmetric matrix. Any symmetric matrix can be decomposed as \( A = V \Lambda V^T \) with orthonormal \( V \) and diagonal \( \Lambda \), so any polynomial of \( A \) satisfies \( p(A) = V p(\Lambda) V^T \) and is fully described by the same polynomial applied to \( \Lambda \). Since the defining feature of Krylov methods is that they build polynomials in \( A \), we can reproduce the vast majority of Krylov method behavior simply by experimenting with diagonal matrices.
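
As a quick sanity check of this argument, here is a minimal sketch verifying \( p(A) = V p(\Lambda) V^T \) numerically; the matrix size and the polynomial \( p(x) = 2x^3 - x + 3 \) are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric matrix and its eigendecomposition A = V diag(lam) V^T
n = 16
X = rng.standard_normal((n, n))
A = (X + X.T) / 2
lam, V = np.linalg.eigh(A)

# An arbitrary polynomial p(x) = 2x^3 - x + 3, applied as a matrix polynomial
p_A = 2 * np.linalg.matrix_power(A, 3) - A + 3 * np.eye(n)

# The same polynomial applied entrywise to the eigenvalues
p_lam = 2 * lam**3 - lam + 3

# p(A) and V p(Lambda) V^T agree to rounding error
print(np.allclose(p_A, (V * p_lam) @ V.T))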

The Krylov Methods

I show three Krylov eigensolvers.

  1. QR Iteration
  2. QR Iteration with Ritz vectors
  3. Arnoldi Iteration

QR Iteration

This algorithm starts with an orthonormal \(V\), applies \( A \) to it, re-orthonormalizes the result with a QR factorization, and repeats the process.

import numpy as np
import scipy.linalg as la
from typing import Callable, Optional

def power(
    A: np.ndarray, 
    V: np.ndarray, 
    maxiter: int = 20, 
    callback: Optional[Callable[[np.ndarray], None]] = None
) -> np.ndarray:
    """
    Power iteration-like method for refining an orthonormal basis.

    This algorithm starts with an initial orthonormal matrix `V`, applies `A` to it,
    then performs a QR factorization, repeating this process for `maxiter` iterations.

    Args:
        A (np.ndarray): The matrix to be applied iteratively.
        V (np.ndarray): Initial orthonormal basis.
        maxiter (int, optional): Number of iterations to run. Default is 20.
        callback (Optional[Callable[[np.ndarray], None]], optional): 
            A function that is called with `V` at each iteration. Default is None.

    Returns:
        np.ndarray: The final orthonormal matrix after iterations.
    """
    
    # Ensure V is an orthonormal basis initially
    V, _ = la.qr(V, mode="economic")
    
    # Call the callback with the initial V if provided
    if callback:
        callback(V)
    
    # Iterative process
    for _ in range(maxiter):
        AV = A @ V  # Apply matrix A to V
        V, _ = la.qr(AV, mode="economic")  # Re-orthonormalize using QR factorization
        
        # Call the callback function at each iteration if provided
        if callback:
            callback(V)
    
    return V
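
As a quick usage sketch (the matrix, subspace size, and iteration count here are illustrative placeholders, not the experiment reported below), the callback can be used to record every iterate:

# Illustrative example: refine a 3-dimensional subspace of a small diagonal matrix
A = np.diag(np.linspace(0.5, 1.0, 16))
V0 = np.random.rand(16, 3)      # any starting block; it is orthonormalized inside

history = []                    # the callback stores each basis as it is produced
V_final = power(A, V0, maxiter=10, callback=history.append)

print(len(history))             # 11: the initial basis plus one per iteration
print(V_final.shape)            # (16, 3)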


QR Iteration with Ritz vectors

By doing a little extra work on each iteration of the QR iteration we can compute Ritz vectors, which are the closest approximations to eigenvectors of \( A \) available from the subspace spanned by \( V \).
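
Concretely, given an orthonormal \( V \), the Rayleigh–Ritz step forms the small projected matrix and diagonalizes it,

$$ H = V^T A V, \qquad H = W \Theta W^T, $$

and the columns of \( V W \) (the Ritz vectors) together with the diagonal entries of \( \Theta \) (the Ritz values) are the approximate eigenpairs extracted from the range of \( V \).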

import numpy as np
import scipy.linalg as la
from typing import Callable, Optional

def power_ritz(
    A: np.ndarray, 
    V: np.ndarray, 
    maxiter: int = 20, 
    callback: Optional[Callable[[np.ndarray], None]] = None
) -> np.ndarray:
    """
    Power iteration with Ritz vector refinement.

    This algorithm iteratively applies `A` to an orthonormal basis `V`, 
    performs QR factorization, and refines the basis using the eigenvectors 
    of the projected matrix `V^T A V`.

    Args:
        A (np.ndarray): The matrix to be applied iteratively.
        V (np.ndarray): Initial orthonormal basis.
        maxiter (int, optional): Number of iterations to run. Default is 20.
        callback (Optional[Callable[[np.ndarray], None]], optional): 
            A function that is called with `V` at each iteration. Default is None.

    Returns:
        np.ndarray: The final orthonormal matrix after iterations.
    """
    
    # Ensure V is an orthonormal basis initially
    V, _ = la.qr(V, mode="economic")

    # Call the callback with the initial V if provided
    if callback:
        callback(V)
    
    # Iterative process
    for _ in range(maxiter):
        AV = A @ V  # Apply matrix A to V
        V, _ = la.qr(AV, mode="economic")  # Re-orthonormalize using QR factorization
        
        # Compute the Rayleigh quotient in the updated basis: VTAV = V^T A V
        # (note: reusing AV here would give V_new^T A V_old, which is not the
        # symmetric projected matrix we want to pass to eigh)
        VTAV = V.T @ (A @ V)
        
        # Solve for eigenvalues and eigenvectors of VTAV
        eigenvalues, W = la.eigh(VTAV)
        
        # Transform V using the Ritz vectors
        V = V @ W
        
        # Call the callback function at each iteration if provided
        if callback:
            callback(V)
    
    return V

Arnoldi Iteration

The next algorithm breaks from the previous ones by not driving all vectors in \( V \) at the same time. This economizes on evaluations of \( Ax \), but it creates a data dependency between successive evaluations, making it impossible to batch them. The payoff is a higher convergence rate.

This algorithm also orthogonalizes between steps, but it does so through the Arnoldi factorization instead of the QR factorization: \( A V_k = V_{k+1} H \), where \( V_{k+1} = [V_k, v] \) is orthonormal and \( H \) is a \( (k+1) \times k \) upper Hessenberg matrix.
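
Read column by column, this factorization is exactly the recurrence implemented below: each new direction \( A v_j \) is orthogonalized against all previous basis vectors, and the coefficients are collected in \( H \),

$$ A v_j = \sum_{i=1}^{j} h_{ij}\, v_i + h_{j+1,j}\, v_{j+1}, \qquad h_{j+1,j} = \| f_j \|, $$

where \( f_j \) is the residual of \( A v_j \) after projecting out \( v_1, \dots, v_j \).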

import numpy as np
import scipy.linalg as la
from typing import Callable, Optional

def arnoldi(
    A: np.ndarray, 
    v: np.ndarray, 
    k: int, 
    maxiter: int, 
    callback: Optional[Callable[[np.ndarray], None]] = None
) -> None:
    """
    Arnoldi iteration for Krylov subspace approximation.

    This algorithm constructs an orthonormal basis of the Krylov subspace using 
    the Arnoldi process and computes the Hessenberg matrix `H`. It refines the 
    basis using eigenvectors of `H` at each iteration.

    Args:
        A (np.ndarray): The matrix for which the Krylov subspace is constructed.
        v (np.ndarray): The initial vector for Krylov subspace iteration.
        k (int): The number of Arnoldi iterations to perform per cycle.
        maxiter (int): The number of restart cycles to run.
        callback (Optional[Callable[[np.ndarray], None]], optional): 
            A function that receives the orthonormal basis at each iteration. Default is None.
    
    Returns:
        None: The function does not return a value but may call `callback` at each step.
    """

    # Shorthand for the Euclidean norm
    norm = np.linalg.norm
    
    m = v.size  # Dimension of the vector space
    V = np.zeros((m, k + 1))  # Orthonormal basis matrix
    H = np.zeros((k + 1, k))  # Upper Hessenberg matrix

    # Normalize initial vector
    V[:, 0] = v / norm(v)

    # Call the callback function with the initial basis if provided
    if callback:
        callback(V[:, :k])

    nevals = 0  # Number of completed restart cycles
    while nevals < maxiter:
        # Re-normalize the starting vector at the beginning of each cycle
        V[:, 0] = v / norm(v)

        for j in range(k):
            w = A @ V[:, j]  # Apply A to the current basis vector
            h = V[:, :j + 1].T @ w  # Project w onto existing basis in V
            
            # Compute residual f after projection
            f = w - V[:, :j + 1] @ h
            # "DGKS" correction
            s = V[:, :j + 1].T @ f
            f -= V[:, :j + 1] @ s
            h += s
            
            beta = norm(f)  # Compute norm of f
            H[:j + 1, j] = h  # Store coefficients in H
            H[j + 1, j] = beta  # Subdiagonal value
            
            V[:, j + 1] = f / beta  # Normalize and store new basis vector

        nevals += 1  # One restart cycle completed

        # Compute eigenvalues and eigenvectors of the projected matrix H[:k, :k]
        # (symmetric A makes this block tridiagonal, so eigh is appropriate)
        eigenvalues, W = la.eigh(H[:k, :k])

        # Call the callback function with the transformed basis if provided
        if callback:
            callback(V[:, :k] @ W)

        # Restart: use the sum of the current Ritz vectors as the next starting vector
        v = V[:, :k] @ W @ np.ones(k)

The matrix and results

The matrix I use contains positive real diagonal values, evenly spaced from \( r > 0 \) up to \( 1 \).

$$ D = \begin{bmatrix} r & 0 & 0 & \cdots & 0 \\ 0 & r + \frac{1-r}{n-1} & 0 & \cdots & 0 \\ 0 & 0 & r + 2\frac{1-r}{n-1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} $$

For visualization purposes I use a small matrix with just 32 rows, and I set each eigensolver to compute \(k=5\) eigenvectors.

All algorithms were initialized with a random vector with uniformly distributed entries.
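
Putting this together, here is a sketch of the kind of driver that produces the results below. Only \( n = 32 \) and \( k = 5 \) come from the setup above; the value \( r = 0.1 \), the random seed, the iteration counts, and the snapshot bookkeeping are illustrative assumptions of mine.

# Hypothetical driver: build the diagonal test matrix and run all three solvers,
# recording the computed eigenvectors at every iteration via the callback.
# r, the seed, and the iteration counts are assumed values for illustration.
n, k, r = 32, 5, 0.1
D = np.diag(np.linspace(r, 1.0, n))

rng = np.random.default_rng(1234)
v0 = rng.uniform(size=n)          # random starting vector with uniform entries
V0 = rng.uniform(size=(n, k))     # random starting block for the QR-based methods

def recorder(store):
    # Copy each basis so later updates inside the solver
    # do not alter the stored snapshots.
    return lambda V: store.append(V.copy())

snapshots = {"power": [], "power_ritz": [], "arnoldi": []}
power(D, V0, maxiter=200, callback=recorder(snapshots["power"]))
power_ritz(D, V0, maxiter=200, callback=recorder(snapshots["power_ritz"]))
arnoldi(D, v0, k, maxiter=200, callback=recorder(snapshots["arnoldi"]))

# snapshots[name][i] holds the (n, k) block of computed eigenvectors of solver
# `name` after i iterations; these are the bases visualized below.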

In all cases I count an “iteration” as one full sweep over the matrix \( V \), even though this is not a perfectly fair comparison between the Arnoldi and QR iterations.

Finally, to visualize convergence I show the computed eigenvectors at selected iterations. Since the matrix is diagonal, we know its eigenvectors are just columns of the identity matrix, so each computed eigenvector should converge to having just one nonzero entry. None of these algorithms attempt any kind of shift, deflation, or careful restarting that would allow us to select specific eigenvalues, so the only thing they will do is converge to some set of eigenvectors.

[Figure: the computed eigenvectors of each solver at iterations 0, 25, 50, 100, and 200.]

Observations

The power/QR iterations converge to the largest eigenvalues, as expected. The Ritz vector version of the algorithm seems to have trouble converging in the same way, although it does zero out the expected portion of each vector. I’m not entirely sure why it does not converge to an identity column; it should do so no slower than the original algorithm.

The most interesting result for me was the Arnoldi iteration, which managed to pick out a wide range of eigenvalues rather than just the largest ones. Generally we expect the Arnoldi iteration to excel at computing the largest eigenvalues of a matrix, but most existing implementations use “implicit restarting” to steer the iteration, which mine does not; I simply let it converge on its own and do not attempt to guide it to a specific part of the spectrum through deflation techniques.