Compute matrix functions without materializing large matrices¶

Sometimes, we need to compute matrix exponentials, log-determinants, or similar functions of matrices, but our matrices are too big to use functions from scipy.linalg or jax.scipy.linalg. However, matrix-free linear algebra scales to even the largest of matrices. Here is how to use Matfree to compute functions of large matrices.

In [1]:

Copied!

import functools
import functools

In [2]:

Copied!

import jax
import jax

In [3]:

Copied!

from matfree import decomp, funm
from matfree import decomp, funm

In [4]:

Copied!

n = 7  # imagine n = 10^5 or larger
n = 7  # imagine n = 10^5 or larger

In [5]:

Copied!

key = jax.random.PRNGKey(1)
key, subkey = jax.random.split(key, num=2)
large_matrix = jax.random.normal(subkey, shape=(n, n))
key = jax.random.PRNGKey(1)
key, subkey = jax.random.split(key, num=2)
large_matrix = jax.random.normal(subkey, shape=(n, n))

The expected value is computed with jax.scipy.linalg.

In [6]:

Copied!





key, subkey = jax.random.split(key, num=2)
vector = jax.random.normal(subkey, shape=(n,))
expected = jax.scipy.linalg.expm(large_matrix) @ vector
print(expected)
key, subkey = jax.random.split(key, num=2)
vector = jax.random.normal(subkey, shape=(n,))
expected = jax.scipy.linalg.expm(large_matrix) @ vector
print(expected)

[  3.7240484 -10.36642    10.205571   -4.89606    -4.2590394 -19.160347
 -10.71674  ]

Instead of using jax.scipy.linalg, we can use matrix-vector products in combination with the Arnoldi iteration to approximate the matrix-function-vector product.

In [7]:

Copied!

def large_matvec(v):
    """Evaluate a matrix-vector product."""
    return large_matrix @ v
def large_matvec(v):
    """Evaluate a matrix-vector product."""
    return large_matrix @ v

In [8]:

Copied!





num_matvecs = 5
arnoldi = decomp.hessenberg(num_matvecs, reortho="full")
dense_funm = funm.dense_funm_pade_exp()
matfun_vec = funm.funm_arnoldi(dense_funm, arnoldi)
received = matfun_vec(large_matvec, vector)
print(received)
num_matvecs = 5
arnoldi = decomp.hessenberg(num_matvecs, reortho="full")
dense_funm = funm.dense_funm_pade_exp()
matfun_vec = funm.funm_arnoldi(dense_funm, arnoldi)
received = matfun_vec(large_matvec, vector)
print(received)

[  3.872463   -9.765792   10.424392   -5.4151964  -4.4425015 -19.06055
 -10.98663  ]

The matrix-function vector product can be combined with all usual JAX transformations. For example, after fixing the matvec-function as the first argument, we can vectorize the matrix function with jax.vmap and compile it with jax.jit.

In [9]:

Copied!





matfun_vec = functools.partial(matfun_vec, large_matvec)
key, subkey = jax.random.split(key, num=2)
vector_batch = jax.random.normal(subkey, shape=(5, n))  # a batch of 5 vectors
received = jax.jit(jax.vmap(matfun_vec))(vector_batch)
print(received.shape)
matfun_vec = functools.partial(matfun_vec, large_matvec)
key, subkey = jax.random.split(key, num=2)
vector_batch = jax.random.normal(subkey, shape=(5, n))  # a batch of 5 vectors
received = jax.jit(jax.vmap(matfun_vec))(vector_batch)
print(received.shape)

(5, 7)

Talking about function transformations: we can also reverse-mode-differentiate the matrix functions efficiently.

In [10]:

Copied!

jac = jax.jacrev(matfun_vec)(vector)
print(jac)
jac = jax.jacrev(matfun_vec)(vector)
print(jac)

[[ 0.8971387  -0.02348417 -0.68017864 -1.7201867   0.2719905   1.6036301
  -0.11475027]
 [-3.3453507   3.0401456  -2.3894792   3.1318164   5.752718    2.9370975
   2.9685035 ]
 [ 3.0593503  -3.5127664   2.103809   -1.3352948  -4.219494   -2.619301
  -4.5912113 ]
 [-2.1280344   0.62076676 -1.8225577   2.5717204   2.2454839   0.36229575
   1.7297204 ]
 [-1.7773658   1.3572413  -2.3025558   1.2953174   3.2735302   2.3572636
   2.6441016 ]
 [-5.8010306   5.587347   -3.9821255   4.158319    9.9962635   6.182465
   8.151907  ]
 [-2.4675639   3.2210712  -2.5758793   3.1784682   5.0637727   1.243409
   3.6850948 ]]

Under the hood, reverse-mode derivatives of Arnoldi- and Lanczos-based matrix functions use the fast algorithm for gradients of the Lanczos and Arnoldi iterations from this paper. Please consider citing it if you use reverse-mode derivatives functions of matrices (a BibTex is here).