7  Appendix: Notes on Gaussian Approximation

Mathematically, computing the mutual information between trajectories in the Gaussian framework is relatively simple. From Shannon’s formula for the entropy of a multivariate Gaussian distribution it follows that the MI is given by

$$I(S, X) = \frac{1}{2} \ln \left( \frac{|\Sigma_{ss}|\,|\Sigma_{xx}|}{|Z|} \right)\quad \mathrm{[nats]}\,. \qquad(7.1)$$

See [1,2] for details. This is a straightforward formula involving the logarithm of determinants of symmetric matrices.

However, computationally evaluating this formula correctly and efficiently requires some thought. The matrices $\Sigma_{ss}$ and $\Sigma_{xx}$ have dimensions $N\times N$, and the matrix $Z$ has dimensions $2N \times 2N$, where $N$ is the trajectory length. Thus, for long trajectories, the involved matrices can become very large. Computing the determinant of a large matrix is computationally non-trivial and potentially numerically unstable. Note that the generic algorithm for computing the determinant of an $N \times N$ matrix scales as $N^3$, so doubling the size of the matrix makes the computation take approximately 8 times as long. Moreover, the determinant of a large matrix may underflow to zero or overflow to infinity when stored in floating-point representation, which can lead to significant numerical accuracy issues. In practice this further limits the maximal size of covariance matrices that one can use.
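As a baseline, Equation 7.1 can be evaluated directly with NumPy's `slogdet`, which returns the log-determinant without ever forming the (possibly overflowing or underflowing) determinant itself. The following is a minimal sketch assuming the covariance blocks are already available as arrays; the function name is ours.

```python
import numpy as np

def gaussian_mi(sigma_ss, sigma_xx, sigma_sx):
    """MI (nats) between jointly Gaussian vectors s and x, Eq. 7.1."""
    z = np.block([[sigma_ss, sigma_sx],
                  [sigma_sx.T, sigma_xx]])
    # slogdet returns (sign, ln|det|); the sign is +1 for positive definite input
    _, logdet_ss = np.linalg.slogdet(sigma_ss)
    _, logdet_xx = np.linalg.slogdet(sigma_xx)
    _, logdet_z = np.linalg.slogdet(z)
    return 0.5 * (logdet_ss + logdet_xx - logdet_z)
```

For independent $\mathbfit{s}$ and $\mathbfit{x}$ the off-diagonal block vanishes and the MI is zero, which makes a convenient sanity check.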

Fortunately, we can leverage the special structure of the Gaussian covariance matrices to simplify the computation of determinants significantly. All involved matrices are symmetric and positive definite, and using a clever trick we can speed up the computation of the determinant of such a matrix. But even then, the cost still scales as $n^3$ with matrix size. With one additional assumption we can do much better: if the system under consideration is in steady state, the matrices have Toeplitz structure. We will see that this structure allows us to construct an $n \log n$-scaling algorithm that approximates the determinant very well for large $n$. Lastly, to deal with numerical accuracy issues, we will discuss how to compute the log-determinant of a matrix directly. This is dramatically more accurate than first computing the determinant and then taking the logarithm.

7.1 Structure of Covariance Matrices in the Gaussian Framework

We consider a Gaussian system with stochastic input $\mathbfit{s}$ and output $\mathbfit{x}$. Both input and output are vectors representing trajectories $s(t)$ and $x(t)$ sampled at discrete times $t_0,\ldots,t_{d-1}$. Hence, $d \in \mathbb{N}$ is the dimensionality of the vector space of $\mathbfit{s}$ and $\mathbfit{x}$. The joint probability density of $\mathbfit{s}$ and $\mathbfit{x}$ is given by

$$\mathrm{P}(\mathbfit{s}, \mathbfit{x}) = \frac{1}{(2\pi)^{d}\,|Z|^{1/2}} \exp\left[ -\frac{1}{2} \left( \begin{array}{cc} \mathbfit{s}^T & \mathbfit{x}^T \end{array} \right) Z^{-1} \left( \begin{array}{c} \mathbfit{s} \\ \mathbfit{x} \end{array} \right) \right] \qquad(7.2)$$

which is the PDF of a $2d$-dimensional multivariate Gaussian distribution. $Z$ is a $2d \times 2d$ matrix with block form

$$Z = \left( \begin{array}{cc} \Sigma_{ss} & \Sigma_{sx} \\ \Sigma_{xs} & \Sigma_{xx} \end{array} \right)\,. \qquad(7.3)$$

The individual block entries of $Z$ correspond to the (cross-)covariance matrices of input and output. Specifically, $\Sigma_{\alpha\beta}$ is a $d\times d$ matrix with entries

$$\Sigma^{ij}_{\alpha\beta} = \langle \alpha(t_i)\, \beta(t_j) \rangle \,. \qquad(7.4)$$

We can make a few observations concerning the form of these matrices. It follows immediately from the definition that both $\Sigma_{ss}$ and $\Sigma_{xx}$ are symmetric, positive definite matrices. $\Sigma_{sx}$ and $\Sigma_{xs}$ are generally not symmetric, but they are transposes of each other, i.e., $\Sigma^T_{sx} = \Sigma_{xs}$. It follows from these two observations that $Z$ itself is symmetric. Furthermore, we usually deal with systems in steady state, in which case the matrices have even more structure. When a system is in steady state, the correlation functions are (time-)translation invariant. That means the correlation of observations at $t_i$ and $t_j$ only depends on the difference $t_j-t_i$, i.e., we can find functions $C_{\alpha\beta}(t)$ such that $C_{\alpha\beta}(t_j-t_i) = \langle \alpha(t_i) \beta(t_j) \rangle$. It follows that the matrix element $\Sigma^{ij}_{\alpha\beta}$ only depends on the difference $j-i$. We can thus write the matrix elements as $\Sigma^{ij}_{\alpha\beta} = c^{j-i}_{\alpha\beta}$ where $c^{i}_{\alpha\beta} = C_{\alpha\beta}(t_i-t_0)$ are the components of a vector $\mathbfit{c}$ that fully specifies the matrix. A matrix with this structure is called a Toeplitz matrix. We can use this structure to simplify the computation of determinants significantly, as described below. But before we explain how to compute determinants efficiently, we discuss one further issue regarding the matrix $Z$ itself.
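In code, a steady-state covariance block is fully determined by its first column and can be assembled by plain index arithmetic. The sketch below uses a hypothetical exponentially decaying correlation function $C(t) = e^{-|t|/\tau}$ with illustrative parameter values; the names are ours.

```python
import numpy as np

# Hypothetical steady-state autocorrelation C(t) = exp(-|t|/tau),
# sampled at d equally spaced times (illustrative numbers only)
tau, dt, d = 1.0, 0.1, 6
c = np.exp(-np.arange(d) * dt / tau)   # c[i] = C(t_i - t_0)

# Sigma[i, j] = c[|j - i|]: constant along every diagonal, i.e. Toeplitz
idx = np.arange(d)
sigma = c[np.abs(idx[:, None] - idx[None, :])]
```

The same construction is available as `scipy.linalg.toeplitz`; here we stick to NumPy indexing to make the defining property $\Sigma^{ij} = c^{|j-i|}$ explicit.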

Note that unlike the matrices $\Sigma_{\alpha\beta}$, $Z$ itself does not have Toeplitz structure in general. This means the method described below is not directly applicable to the computation of $|Z|$. Yet, it seems we need $|Z|$ to compute the MI using Equation 7.1. It turns out that this is not necessary: one can make use of the fact that $Z$ is composed of Toeplitz blocks to avoid computing its determinant. At the end of the following section we show how to adapt the formula described in Ref. [3] for the mutual information rate to derive an efficient approximation of it in terms of the discrete Fourier transform.

7.2 Efficient Evaluation of Determinants

For a symmetric positive definite matrix $\Sigma\in\mathbb{R}^{d\times d}$ the determinant can be computed faster than for a general matrix. We exploit the fact that such a matrix can always be decomposed as

$$\Sigma = L L^T \qquad(7.5)$$

where $L\in\mathbb{R}^{d\times d}$ is a lower triangular matrix with positive diagonal entries. This factorization is called the Cholesky decomposition, and efficient routines for computing it are available in most numerical linear algebra packages.

Now we make use of two basic properties of the determinant. First, the determinant of a product of matrices is equal to the product of the individual matrices’ determinants. Second, the determinant of the transpose is equal to the determinant of the original matrix. Combining these two facts, we see that $|\Sigma|=|L|\,|L^T| = |L|^2$. Now, we remember that $L$ is a lower triangular matrix. It is easily checked that the determinant of a lower triangular matrix is given by the product of its diagonal entries $\ell_0,\ldots,\ell_{d-1}$ where $\ell_k = L^{kk}$. Thus, we find two important identities for symmetric positive definite matrices:

$$\begin{aligned} |\Sigma| &= \prod^{d-1}_{i=0} \ell^2_i\,, \\ \ln |\Sigma| &= 2 \sum^{d-1}_{i=0} \ln\ell_i\,. \end{aligned} \qquad(7.6)$$

The second form is especially useful for large matrices. There, computing the log-determinant is numerically preferable to computing the determinant itself.
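The log-determinant identity in Equation 7.6 is a one-liner with NumPy; this is a minimal sketch with a function name of our choosing.

```python
import numpy as np

def logdet_spd(sigma):
    """ln|Sigma| for a symmetric positive definite matrix, via Eq. 7.6."""
    L = np.linalg.cholesky(sigma)              # Sigma = L @ L.T
    return 2.0 * np.sum(np.log(np.diag(L)))    # ln|Sigma| = 2 * sum ln l_i
```

As a side benefit, `np.linalg.cholesky` raises a `LinAlgError` when its argument is not positive definite, which doubles as a cheap validity check on a covariance matrix.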

We can also exploit the structure of symmetric Toeplitz matrices to further simplify the computation of the determinant. However, here we won’t obtain an exact result but only an asymptotic approximation. This will nonetheless be very useful for large matrices where even the Cholesky factorization can be difficult to obtain.

We will investigate the asymptotic behavior of the eigenvalues of Toeplitz matrices. First, we recall the defining property of a symmetric Toeplitz matrix. Let $T\in\mathbb{R}^{d\times d}$ be a symmetric Toeplitz matrix. Then we can write $T^{ij} = t^{|j-i|}$ where $\mathbfit{t}\in\mathbb{R}^d$ is a vector that fully specifies the matrix. We now ask what happens in the asymptotic limit where the vector $\mathbfit{t}$ becomes very high dimensional. Thus, suppose we have an infinite sequence of real numbers $(t_k \mid k=\ldots,-2,-1,0,1,2,\ldots)$. From this sequence we can construct a sequence of Toeplitz matrices $T_n\in\mathbb{R}^{n\times n}$ for $n=1,2,\ldots$ as follows. For any $n\in\mathbb{N}$, we define the matrix elements of $T_n$ as

$$T^{ij}_n = t_{i-j}\quad\text{for } i,j=0,1,\ldots,n-1 \,. \qquad(7.7)$$

The definition of $T_n$ thus only requires the $2n-1$ elements $t_{-(n-1)},\ldots,t_0,\ldots,t_{n-1}$ from the sequence. Note that for a symmetric Toeplitz matrix the sequence satisfies $t_k = t_{-k}$.

Under mild conditions (e.g., absolute summability of the sequence) there exists a continuous $2\pi$-periodic function $f(\omega)$ that is defined through the discrete-time Fourier transform

$$f(\omega) = \sum^\infty_{k=-\infty} t_k e^{-i\omega k} \,. \qquad(7.8)$$

Conversely, the $t_k$ can be recovered from $f$ using the inverse transform

$$t_k = \frac{1}{2\pi} \int^{2\pi}_0 d\omega\ f(\omega) e^{i\omega k} \,. \qquad(7.9)$$

Thus the sequence $(t_k)$ determines the function $f$ and vice versa. Moreover, this means that the entire sequence of Toeplitz matrices is completely specified by the continuous function $f$.

Now we are ready to state one of the most important theorems about Toeplitz matrices: Szegő’s theorem. Let $\tau_{n,1},\ldots,\tau_{n,n}$ be the eigenvalues of the matrix $T_n$. Szegő’s theorem states that

$$\lim_{n\to\infty} \frac{1}{n}\sum^{n}_{i=1} F(\tau_{n,i}) = \frac{1}{2\pi} \int^{2\pi}_0 d\omega\ F(f(\omega)) \qquad(7.10)$$

for any continuous function $F$. This result is very useful for finding the asymptotic determinant. In particular, note that the determinant is the product of the eigenvalues, and hence $\ln |T_n| = \sum^{n}_{i=1}\ln \tau_{n,i}$. Thus, it is not hard to show that

$$\lim_{n\to\infty} \ln \frac{|T_n|}{|T_{n-1}|} = \frac{1}{2\pi} \int^{2\pi}_0 d\omega\ \ln f(\omega) \qquad(7.11)$$

which we can leverage to compute information rates. In particular, this formula can be used for analytical computation of entropy rates of Gaussian processes  [1].
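Szegő’s theorem is easy to probe numerically. Taking $F = \ln$ and the illustrative sequence $t_k = \rho^{|k|}$, the right-hand side of Equation 7.10 evaluates to $\ln(1-\rho^2)$, and the eigenvalue average approaches this value as $n$ grows. A small sketch, with numbers chosen purely for illustration:

```python
import numpy as np

n, rho = 300, 0.5
idx = np.arange(n)
T = rho ** np.abs(idx[:, None] - idx[None, :])   # symmetric Toeplitz T_n

# Left-hand side of Szego's theorem with F = ln (eigenvalue average)
lhs = np.mean(np.log(np.linalg.eigvalsh(T)))

# Right-hand side: (1/2pi) int ln f(w) dw = ln(1 - rho^2) for this sequence
rhs = np.log(1 - rho**2)
```

For this particular sequence the determinant is known in closed form, $|T_n| = (1-\rho^2)^{n-1}$, so the gap between the two sides shrinks like $1/n$.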

We will instead use this theorem to find an approximate way to compute the determinant of a large Toeplitz matrix. We can approximate the integral by a Riemann sum

$$\lim_{n\to\infty} \frac{\ln |T_n|}{n} \approx \frac{1}{N}\sum^{N-1}_{m=0} \ln f\left( \omega_m \right) \qquad(7.12)$$

where $\omega_m = 2\pi m/N$ for some large $N$. Now, inserting Equation 7.8 we find

$$\lim_{n\to\infty} \frac{\ln |T_n|}{n} \approx \frac{1}{N} \sum^{N-1}_{m=0} \ln \sum^{\infty}_{k=-\infty} t_k \exp\left(-\frac{i 2\pi k m}{N} \right)\,. \qquad(7.13)$$

In practice we cannot perform the infinite sum over $k$, so we must truncate the series. We truncate it such that $t_k=0$ for $|k|>\lfloor N/2 \rfloor$. Then, the sum over $k$ becomes a discrete Fourier transform (DFT). We recall that the DFT of the sequence $t_k$ for $k=-\lfloor N/2 \rfloor,\ldots,0,\ldots,\lfloor N/2 \rfloor$ is another sequence $(\lambda_m)$, defined by

$$\lambda_m = \sum^{\lfloor N/2 \rfloor}_{k=-\lfloor N/2 \rfloor} t_k \exp\left( -\frac{i 2\pi k m}{N} \right) \qquad(7.14)$$

for $m=0,\ldots,N-1$. Hence, we have derived the approximation

$$\lim_{n\to\infty} \frac{\ln |T_n|}{n} \approx \frac{1}{N} \sum^{N-1}_{m=0} \ln \lambda_m\,. \qquad(7.15)$$

The DFT coefficients $\lambda_m$ represent the inner sum in Equation 7.13, truncated to $k=-\lfloor N/2 \rfloor,\ldots,0,\ldots,\lfloor N/2 \rfloor$. In principle we want to choose $N$ as large as possible. However, given a matrix $T_n$ we only know the values $t_{-(n-1)},\ldots,t_0,\ldots,t_{n-1}$. Thus, the maximum value of $N$ that we can use is $N=2n$, with the unknown boundary coefficients $t_{\pm n}$ set to zero.

This means our approximation for the log-determinant of $T_n$ for large $n$ can be written as

$$\ln |T_n| \approx \frac{1}{2} \sum^{2n-1}_{m=0} \ln \lambda_{m} \,. \qquad(7.16)$$

It is easy to verify that this approximation becomes consistent with Equation 7.13 as $n\to\infty$. The formula is also very efficient to evaluate on modern computers: the sequence $\lambda_0,\ldots,\lambda_{2n-1}$ can be computed from the sequence of $t_k$ via the FFT algorithm, efficient implementations of which are widely available.
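Equation 7.16 translates into a few lines of NumPy: embed the known coefficients $t_0,\ldots,t_{n-1}$ into a length-$2n$ sequence ordered as the DFT expects (non-negative lags first, then the negative lags wrapped around), take an FFT, and sum the logarithms. A sketch, assuming the symbol $f(\omega)$ is strictly positive so that all $\lambda_m$ come out positive; the function name is ours.

```python
import numpy as np

def logdet_toeplitz(t):
    """Approximate ln|T_n| of a symmetric Toeplitz matrix with first
    column t = (t_0, ..., t_{n-1}), via Eq. 7.16 and a length-2n FFT."""
    n = len(t)
    # DFT ordering: t_0..t_{n-1}, then t_{-n} = 0, then t_{-(n-1)}..t_{-1}
    a = np.concatenate([t, [0.0], t[:0:-1]])
    lam = np.fft.fft(a).real        # real by the (circular) symmetry of a
    return 0.5 * np.sum(np.log(lam))
```

Because the FFT costs $O(n \log n)$, this scales far better than the $O(n^3)$ Cholesky route, typically at the price of an $O(1/n)$ relative error in the log-determinant.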

Finally, we discuss how to evaluate the mutual information rate, defined as $R(S, X) = \lim_{n\to\infty} I(S[t_0:t_n],X[t_0:t_n])/n$, using Szegő’s theorem. Munakata and Kamiyabu [3] used Szegő’s theorem to derive the formula

$$R(S, X) = -\frac{1}{4\pi} \int^{2\pi}_0 d\omega\ \ln\left( 1 - \frac{|f_{sx}(\omega)|^2}{f_{ss}(\omega) f_{xx}(\omega)} \right) \qquad(7.17)$$

where the functions $f_{\alpha\beta}(\omega) = \sum^\infty_{k=-\infty} t_{\alpha\beta,k} e^{-i\omega k}$ are the discrete-time Fourier transforms of the covariance sequences of the matrices $\Sigma_{\alpha\beta}$ in the limit $n\to\infty$. As above, we approximate this formula using the discrete Fourier transform. We define DFT sequences for the matrices $\Sigma_{\alpha\beta}$ as $\lambda_{\alpha\beta,m} = \sum^{n-1}_{k=-(n-1)} t_{\alpha\beta,k} \exp(-i \omega_m k)$ with $\omega_m = 2\pi m / N$ and $N = 2n$. This results in the estimate for the mutual information rate

$$R(S, X) \approx -\frac{1}{4n} \sum^{2n-1}_{m=0} \ln \left( 1 - \frac{|\lambda_{sx,m}|^2}{\lambda_{ss,m} \lambda_{xx,m}} \right) \,. \qquad(7.18)$$
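The rate estimate can be assembled from the same FFT machinery, with the Riemann-sum normalization chosen so that the white-noise limit reproduces $-\tfrac{1}{2}\ln(1-\rho^2)$ nats per sample. The sketch below assumes, purely for simplicity, that the cross-covariance sequence is symmetric in $k$; for an asymmetric cross-covariance the negative-lag half of the DFT input would be filled with the negative-lag coefficients instead of a mirrored copy. All names are ours.

```python
import numpy as np

def spectrum(t):
    """Length-2n DFT (cf. Eq. 7.14) of a symmetric sequence t_0..t_{n-1}."""
    return np.fft.fft(np.concatenate([t, [0.0], t[:0:-1]])).real

def mi_rate(c_ss, c_xx, c_sx):
    """Approximate mutual information rate in nats per sample."""
    n = len(c_ss)
    lam_ss, lam_xx, lam_sx = spectrum(c_ss), spectrum(c_xx), spectrum(c_sx)
    coherence = np.abs(lam_sx) ** 2 / (lam_ss * lam_xx)
    return -np.sum(np.log(1.0 - coherence)) / (4.0 * n)
```

As a sanity check, for jointly white signals with per-sample correlation $\rho$ (all three covariance sequences zero except at lag 0) the estimate reduces to $-\tfrac{1}{2}\ln(1-\rho^2)$ exactly.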

More information about asymptotics and other results for Toeplitz matrices can be found in the review of Gray [4].

References

[1]
F. Tostevin and P. R. ten Wolde, Mutual Information between Input and Output Trajectories of Biochemical Networks, Physical Review Letters 102, 218101 (2009).
[2]
F. Tostevin and P. R. ten Wolde, Mutual Information in Time-Varying Biochemical Systems, Physical Review E 81, 061917 (2010).
[3]
T. Munakata and M. Kamiyabu, Stochastic resonance in the FitzHugh-Nagumo model from a dynamic mutual information point of view, The European Physical Journal B - Condensed Matter and Complex Systems 53, 239 (2006).
[4]
R. M. Gray, Toeplitz and Circulant Matrices: A Review, Foundations and Trends® in Communications and Information Theory 2, 155 (2005).