5 The Accuracy of the Gaussian Approximation
This chapter was written by Age Tjalma and Manuel Reinhardt as shared first authors, in collaboration with Anne-Lena Moor (MPI-CBG Dresden), and Pieter Rein ten Wolde.
Efficient information processing is crucial for both living organisms and engineered systems. The mutual information rate, a core concept of information theory, quantifies the amount of information shared between the trajectories of input and output signals, and allows the quantification of information flow in dynamic systems. A common approach for estimating the mutual information rate is the Gaussian approximation, which assumes that the input and output trajectories follow Gaussian statistics. However, this method is limited to linear systems, and its accuracy in nonlinear or discrete systems remains unclear. In this work, we assess the accuracy of the Gaussian approximation for non-Gaussian systems by leveraging Path Weight Sampling (PWS), a recent technique for exactly computing the mutual information rate. In two case studies, we examine the limitations of the Gaussian approximation. First, we focus on discrete linear systems and demonstrate that, even when the system’s statistics are nearly Gaussian, the Gaussian approximation fails to accurately estimate the mutual information rate. Second, we explore a continuous diffusive system with a nonlinear transfer function, revealing significant deviations between the Gaussian approximation and the exact mutual information rate as nonlinearity increases. Our results provide a quantitative evaluation of the Gaussian approximation’s performance across different stochastic models and highlight when more computationally intensive methods, such as PWS, are necessary.
5.1 Introduction
For the functioning of both living and engineered systems it is paramount that they collect and process information effectively. Increasingly, it has become evident that beyond instantaneous properties, the dynamic features of an input signal or system output often encode valuable information [1–6]. Prime examples in biology include bacterial chemotaxis, which responds to temporal changes in concentration [7], the transcription factor NF-κB, which encodes information about input signals in its dynamic response [8], and neuronal information processing, where information is encoded in the sequence and timing of spikes [9]. Beyond biology, dynamic input signals are critical for various sensing systems, such as those used in automated factories or self-driving cars.
To understand and evaluate the performance, potential improvements, and limitations of these systems in processing information, we need appropriate metrics that capture their full information processing capability. Information theory, introduced by Shannon [10], provides the most general mathematical framework for such metrics. The mutual information and mutual information rate measure how much one random variable reduces uncertainty about another, quantified in bits. It is relatively straightforward to quantify the information shared between scalar properties of the input and output, as has been done in various forms [11–19]. However, capturing all information in the dynamical properties of the input and the output is much more challenging. To do so, one must consider the information encoded in the time-varying trajectories of the variables of interest. Yet, due to the high dimensionality of the trajectory space, computing the mutual information between such trajectories is notoriously difficult.
A major advancement in this area has been the Gaussian approximation of the mutual information rate [1,2], based on the assumption of input and output trajectories following jointly Gaussian statistics. This assumption makes it possible to compute the mutual information rate directly from the two-point correlation functions of the input and output. It is thus straightforward to apply the Gaussian approximation to experimental data. Moreover, given a mechanistic model of the underlying dynamics, the Gaussian approximation can be used to derive analytical expressions for the information rate [1–3]. Crucially however, the assumption of Gaussian statistics restricts the method to linear systems, as Gaussian statistics can only arise in such systems [20].
Understanding when the Gaussian approximation is accurate is critical because many real-world systems, such as biological and engineered sensory systems, exhibit nonlinear dynamics. This includes features such as bimodality, discrete jumps, or heavy tails, all of which deviate from purely Gaussian dynamics. Such non-Gaussian behavior typically results from intrinsic nonlinearities in the system, but determining the degree of a system’s deviation from linearity is difficult [21–23], and the extent to which the approximation loses accuracy in nonlinear systems is unclear. Thus, although the Gaussian approximation offers a computationally simple framework to estimate information transmission, it remains an open question under what conditions this approximation is sufficiently accurate.
Until recently, addressing this question has been hard because there was no reliable benchmark for the exact information rate. Without a method to compute the true information rate of a non-Gaussian system, it is impossible to rigorously assess the accuracy of the Gaussian approximation. This gap was filled by the development of two independent methods [4,5] for computing the information rate accurately even in systems that significantly deviate from Gaussian behavior. Here we leverage one of these methods: Path Weight Sampling (PWS) [5], an exact Monte Carlo technique for computing the mutual information rate in a wide range of stochastic models.
Using PWS, we can directly evaluate the accuracy of the Gaussian approximation in models that exhibit explicit non-Gaussian features, and study the approximation’s robustness in typical applications.
In this article, we investigate the accuracy of the approximate Gaussian information rate through two case studies. The first focuses on Markov jump processes, where the statistics are non-Gaussian due to the discrete nature of the processes. Perhaps surprisingly, the Gaussian approximation fails to accurately estimate the mutual information rate in this case, even when the statistics are nearly Gaussian [4,5]. We show that a recently developed reaction-based “discrete approximation” by Moor and Zechner [4] is much more accurate. This suggests that the Gaussian approximation fails because it cannot distinguish the individual reaction events.
The second case study examines a continuous diffusive process with a nonlinear transfer function. We demonstrate how intrinsic nonlinearity can cause significant deviations between the Gaussian approximation and the true mutual information rate. By varying the degree of nonlinearity as well as the system’s response timescale, we provide a comprehensive quantitative understanding of the Gaussian approximation’s limitations in nonlinear systems. Additionally, we show that for such systems, the Gaussian approximation differs significantly when derived from empirical correlation functions compared to when it is analytically obtained from the nonlinear model, highlighting that the correct application of the approximation is important.
Our work translates into concrete recommendations on when to use which method for the computation of the information rate. It therefore enables researchers to more confidently determine when a simpler approximate method is sufficient, or when a more sophisticated method like PWS [5] or the method developed by Moor and Zechner [4] should be used.
5.2 Methods
5.2.1 The mutual information rate
The mutual information between two random variables $s$ and $x$ is defined as
$$ I(s; x) = \left\langle \ln \frac{\mathrm{P}(s, x)}{\mathrm{P}(s)\,\mathrm{P}(x)} \right\rangle_{\mathrm{P}(s, x)}, \tag{5.1} $$
or, equivalently, using Shannon entropies,
$$ I(s; x) = H(x) - H(x \mid s) = H(s) - H(s \mid x). \tag{5.2} $$
In the context of a noisy communication channel, $s$ and $x$ represent the messages at the sending and receiving end, respectively. Then, $I(s; x)$ is the amount of information about $s$ that is communicated when only $x$ is received. If $s$ can be perfectly reconstructed from $x$, then $I(s; x) = H(s)$. On the contrary, if $s$ and $x$ are independent, $I(s; x) = 0$. The mutual information thus is always non-negative and quantifies the degree of statistical dependence between two random variables.
For systems that continuously transmit information over time, this concept must be extended to trajectories $\boldsymbol{s}_T = \{s_t\}_{0 \le t \le T}$ and $\boldsymbol{x}_T = \{x_t\}_{0 \le t \le T}$. The mutual information between trajectories is defined analogously as
$$ I(\boldsymbol{s}_T; \boldsymbol{x}_T) = \left\langle \ln \frac{\mathrm{P}(\boldsymbol{s}_T, \boldsymbol{x}_T)}{\mathrm{P}(\boldsymbol{s}_T)\,\mathrm{P}(\boldsymbol{x}_T)} \right\rangle, \tag{5.3} $$
where the expected value is taken with respect to the full joint probability $\mathrm{P}(\boldsymbol{s}_T, \boldsymbol{x}_T)$ of both trajectories. This quantity can be interpreted as the total information that is communicated over the time interval $[0, T]$.
Note that the total amount of information communicated over the time interval $[0, T]$ is not directly related to the instantaneous mutual information $I(s_t; x_t)$ at any instant $t$. This is because auto-correlations within the input or output sequences reduce the amount of new information transmitted in subsequent measurements. Moreover, information can be encoded in temporal features of the trajectories, which cannot be captured by an instantaneous information measure. Therefore, as previously pointed out [24,25], the instantaneous mutual information for any given $t$ does not provide a meaningful measure of information transmission. To correctly quantify the amount of information transmitted per unit time we must consider entire trajectories.
For that reason, the mutual information rate is defined via the trajectory mutual information. Let the input and output of a system be given by two continuous-time stochastic processes $\mathcal{S}$ and $\mathcal{X}$. Then, the mutual information rate between $\mathcal{S}$ and $\mathcal{X}$ is
$$ R(\mathcal{S}; \mathcal{X}) = \lim_{T \to \infty} \frac{I(\boldsymbol{s}_T; \boldsymbol{x}_T)}{T} \tag{5.4} $$
and quantifies the amount of information that can reliably be transmitted per unit time. The mutual information rate therefore represents an excellent performance measure for information processing systems.
In summary, the mutual information rate is the crucial performance metric for stochastic information processing systems. However, its information-theoretic definition does not translate into an obvious scheme for computing it. As a result, various methods have been developed to compute or approximate the mutual information rate.
5.2.2 Gaussian approximation
One way to significantly simplify the computation of the information rate is to assume that the input and output trajectories obey stationary Gaussian statistics. Under this assumption Equation 5.3 simplifies to
$$ I(\boldsymbol{s}_T; \boldsymbol{x}_T) = \frac{1}{2} \ln \frac{\det Z_{ss}\, \det Z_{xx}}{\det Z}, \tag{5.5} $$
where $\det Z_{ss}$ and $\det Z_{xx}$ are the determinants of the covariance matrices of the respective discretized trajectories $\boldsymbol{s}_T$ and $\boldsymbol{x}_T$, and
$$ Z = \begin{pmatrix} Z_{ss} & Z_{sx} \\ Z_{xs} & Z_{xx} \end{pmatrix} \tag{5.6} $$
is the covariance matrix of their joint distribution.
In the limit that the trajectory length $T = N\,\delta t$, with discretization time step $\delta t$, becomes infinitely long ($N \to \infty$) and continuous ($\delta t \to 0$), the information rate as defined in Equation 5.4 can be expressed in terms of the power spectral densities, or power spectra, of the processes $\mathcal{S}$ and $\mathcal{X}$ [1,2]:
$$ R(\mathcal{S}; \mathcal{X}) = -\frac{1}{4\pi} \int_{-\infty}^{\infty} \mathrm{d}\omega \, \ln\!\left[1 - \frac{|P_{sx}(\omega)|^2}{P_{ss}(\omega)\,P_{xx}(\omega)}\right]. \tag{5.7} $$
Here, $P_{ss}(\omega)$ and $P_{xx}(\omega)$ respectively are the power spectra of trajectories generated by $\mathcal{S}$ and $\mathcal{X}$, and $P_{sx}(\omega)$ is their cross-spectrum. The fraction
$$ C(\omega) = \frac{|P_{sx}(\omega)|^2}{P_{ss}(\omega)\,P_{xx}(\omega)} \tag{5.8} $$
is known as the coherence, describing the distribution of power transfer between $\mathcal{S}$ and $\mathcal{X}$ over the frequency $\omega$.
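To make Equation 5.7 concrete, here is a minimal Python sketch (our own illustration, with hypothetical parameter values) that integrates the spectral formula numerically for an Ornstein-Uhlenbeck input driving a first-order linear response, a combination whose Lorentzian spectra are derived in Section 5.5.1. For these parameters the closed-form result of Equation 5.14 below gives $(\lambda/2)(\sqrt{2}-1) \approx 0.207$, which the quadrature should reproduce:

```python
import numpy as np

def gaussian_rate(omega, P_ss, P_xx, P_sx_abs2):
    """Numerical quadrature of Eq. 5.7 on a uniform frequency grid."""
    coherence = P_sx_abs2 / (P_ss * P_xx)
    d_omega = omega[1] - omega[0]
    return -np.sum(np.log1p(-coherence)) * d_omega / (4.0 * np.pi)

# Illustrative example with hypothetical parameters: an Ornstein-Uhlenbeck input
# (Lorentzian spectrum) driving a first-order linear response.
lam, mu, rho, sbar = 1.0, 1.0, 1.0, 100.0
omega = np.linspace(-500.0, 500.0, 1_000_001)
P_ss = 2.0 * lam * sbar / (lam**2 + omega**2)
D_x = 2.0 * rho * sbar                              # intrinsic output noise strength
P_xx = (rho**2 * P_ss + D_x) / (mu**2 + omega**2)
P_sx_abs2 = rho**2 * P_ss**2 / (mu**2 + omega**2)   # |cross-spectrum|^2
print(gaussian_rate(omega, P_ss, P_xx, P_sx_abs2))  # ~0.207 nats per unit time
```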
For systems that are neither Gaussian nor linear, there are two ways to still obtain an approximate Gaussian information rate. The first is to directly measure two-point correlation functions from data or simulations, and use these to retrieve the power spectra in Equation 5.7. The second is to use van Kampen’s linear noise approximation (LNA) [26] and approximate the dynamics of the system to first order around a fixed point, see also Section 5.5.1. In this work, we will analyze both of these methods.
5.2.3 Path Weight Sampling for diffusive systems
To evaluate the accuracy of the Gaussian information rate for non-Gaussian systems, an exact method for determining the true information rate is required. Recently, a method called Path Weight Sampling (PWS) was developed, which computes the exact mutual information rate using Monte Carlo techniques without relying on approximations [5].
In Ref. [5], PWS was introduced as a computational framework for calculating the mutual information rate in systems governed by master equations. Master equations provide an exact stochastic description of continuous-time processes with discrete state-spaces, commonly used in models ranging from biochemical signaling networks to population dynamics. However, many systems are not described by discrete state spaces and instead require a stochastic description based on diffusion processes or other stochastic models. Fortunately, PWS is not restricted to systems described by master equations and can be extended to a variety of stochastic models.
In general, PWS can be applied to any system that meets the following conditions: (i) sampling from the input distribution $\mathrm{P}(\boldsymbol{s})$ is straightforward, (ii) sampling from the conditional output distribution $\mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s})$ is straightforward, and (iii) the logarithm of the conditional probability density $\mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s})$, referred to as the path weight, can be evaluated efficiently. For any stochastic model that satisfies these three criteria, the PWS computation proceeds similarly to systems governed by master equations.
Briefly, PWS computes the trajectory mutual information using a Monte Carlo estimate of Equation 5.3,
$$ I(\boldsymbol{s}; \boldsymbol{x}) \approx \frac{1}{N} \sum_{i=1}^{N} \ln \frac{\mathrm{P}(\boldsymbol{x}^{(i)} \mid \boldsymbol{s}^{(i)})}{\mathrm{P}(\boldsymbol{x}^{(i)})}, \tag{5.9} $$
where the $\boldsymbol{s}^{(i)}$ are independently drawn from $\mathrm{P}(\boldsymbol{s})$, and each $\boldsymbol{x}^{(i)}$ is drawn from $\mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s}^{(i)})$. As $N \to \infty$, this expression converges to the mutual information $I(\boldsymbol{s}; \boldsymbol{x})$. In Equation 5.9, the term $\mathrm{P}(\boldsymbol{x}^{(i)} \mid \boldsymbol{s}^{(i)})$ can be evaluated directly (per criterion iii), but the marginal probability $\mathrm{P}(\boldsymbol{x}^{(i)})$ has to be computed separately for each output trajectory $\boldsymbol{x}^{(i)}$. Typically, this has to be done numerically via marginalization, i.e., by computing the path integral
$$ \mathrm{P}(\boldsymbol{x}) = \int \mathcal{D}\boldsymbol{s} \; \mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s}) \, \mathrm{P}(\boldsymbol{s}) \tag{5.10} $$
using Monte Carlo techniques. Evaluating the marginalization integral efficiently is essential for computing the mutual information using PWS and is discussed in detail in Ref. [5]. In summary, PWS is a generic framework that can be used beyond systems defined by a master equation as long as a suitable generative model satisfying the three conditions above is available.
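The estimator has a nested structure: an outer average over sampled trajectory pairs (Equation 5.9) and an inner Monte Carlo marginalization (Equation 5.10). The sketch below illustrates this structure on a deliberately simple toy channel for which all three criteria hold trivially; it is our own illustration with a scalar additive Gaussian channel, not the trajectory-based PWS implementation of Ref. [5], and its output can be checked against the exact mutual information:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_s, sigma_n = 1.0, 0.5      # toy input and channel-noise scales (illustrative)

def sample_s(n):           return rng.normal(0.0, sigma_s, n)            # criterion (i)
def sample_x_given_s(s):   return s + rng.normal(0.0, sigma_n, s.shape)  # criterion (ii)
def log_p_x_given_s(x, s):                                               # criterion (iii)
    return -0.5 * ((x - s) / sigma_n) ** 2 - np.log(sigma_n * np.sqrt(2.0 * np.pi))

N, M = 20_000, 512               # outer samples (Eq. 5.9), marginalization samples
s = sample_s(N)
x = sample_x_given_s(s)
log_cond = log_p_x_given_s(x, s)
s_fresh = sample_s(M)            # brute-force Monte Carlo marginal, Eq. 5.10
log_marg = np.array(
    [np.logaddexp.reduce(log_p_x_given_s(xi, s_fresh)) - np.log(M) for xi in x]
)
mi = np.mean(log_cond - log_marg)
print(mi, 0.5 * np.log(1.0 + sigma_s**2 / sigma_n**2))  # estimate vs. exact value
```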
For this study, we extended PWS to compute the mutual information rate for systems with diffusive dynamics, described by Langevin equations. For such systems, the aforementioned conditions are inherently fulfilled and PWS can be applied. Specifically, in a Langevin system, both the input and the output are stochastic processes given by the solution to a stochastic differential equation (SDE). Using stochastic integration schemes like the Euler-Maruyama method, we can straightforwardly generate realizations $\boldsymbol{s}$ and $\boldsymbol{x}$ from the corresponding stochastic process. These realizations are naturally time-discretized with the integration time step $\Delta t$. For a time-discretized trajectory $\boldsymbol{x} = (x_0, x_1, \ldots, x_N)$, the path weight is—up to a Gaussian normalization constant—given by the Onsager-Machlup action [27],
$$ \ln \mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s}) = \mathrm{const} - \sum_{i=0}^{N-1} \frac{\left[\Delta x_i - A(x_i, s_i)\,\Delta t\right]^2}{2\,B(x_i, s_i)^2\,\Delta t}, \tag{5.11} $$
where we used $\Delta x_i = x_{i+1} - x_i$, $A(x, s)$ is the deterministic drift, and $B(x, s)$ represents the white noise amplitude. This expression captures the likelihood of a particular trajectory, given the stochastic dynamics of the system, and serves as the path weight in the PWS computation.
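In code, evaluating the path weight reduces to a single pass over the discretized trajectory. The following sketch (our own, with a hypothetical linear drift and a constant noise amplitude) computes the log-weight of Equation 5.11 under the Euler-Maruyama discretization:

```python
import numpy as np

def log_path_weight(x, s, dt, drift, noise_amp):
    """Onsager-Machlup log-weight ln P(x|s) of Eq. 5.11 (up to the Gaussian
    normalization constant) for a trajectory discretized with time step dt,
    where dx = drift(x, s) dt + noise_amp(x, s) dW."""
    dx = np.diff(x)
    a = drift(x[:-1], s[:-1])
    b = noise_amp(x[:-1], s[:-1])
    return -np.sum((dx - a * dt) ** 2 / (2.0 * b**2 * dt))

# Example with a hypothetical linear drift and constant noise amplitude:
rho, mu, D = 1.0, 1.0, 2.0
w = log_path_weight(
    x=np.array([1.00, 1.05, 0.97, 1.01]),
    s=np.ones(4),
    dt=0.01,
    drift=lambda x, s: rho * s - mu * x,
    noise_amp=lambda x, s: np.sqrt(D) * np.ones_like(x),
)
```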
5.3 Case Studies
To investigate the conditions under which the Gaussian approximation deviates from the exact mutual information rate, we conducted two case studies. In both studies we compare the Gaussian approximation against the exact mutual information rate, computed via PWS. In the first case study we focus on a discrete linear system which is inspired by minimal motifs of cellular signaling.
5.3.1 Discrete reaction system
We consider a simple linear reaction system of two species, $S$ and $X$, whose dynamics are governed by 4 reactions,
$$ \emptyset \xrightarrow{\kappa} S, \qquad S \xrightarrow{\lambda} \emptyset, \qquad S \xrightarrow{\rho} S + X, \qquad X \xrightarrow{\mu} \emptyset. \tag{5.12} $$
The reaction system is linear because each reaction has at most one reactant. The trajectories of $S$ and $X$ are correlated because the production rate of $X$ depends on the copy number of $S$, and therefore information is transferred from $S$ to $X$. This set of reactions can be interpreted as a simple motif for gene expression where $S$ is a transcription factor and $X$ represents the expressed protein. In steady state, the mean copy numbers are given by $\bar{s} = \kappa/\lambda$ and $\bar{x} = \rho\bar{s}/\mu$.
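Exact sample paths of this reaction system can be generated with the standard Gillespie stochastic simulation algorithm. The sketch below (our own illustration; the rate constants are hypothetical) produces the piecewise-constant copy-number trajectories on which an exact method like PWS operates:

```python
import numpy as np

rng = np.random.default_rng(1)
kappa, lam, rho, mu = 50.0, 1.0, 10.0, 10.0    # hypothetical rate constants

def gillespie(t_end, s=50, x=50):
    """Exact stochastic simulation of the four reactions in Eq. 5.12."""
    t, times, states = 0.0, [0.0], [(s, x)]
    stoich = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # birth S, decay S, birth X, decay X
    while t < t_end:
        props = np.array([kappa, lam * s, rho * s, mu * x])
        total = props.sum()
        t += rng.exponential(1.0 / total)
        ds, dx = stoich[rng.choice(4, p=props / total)]
        s, x = s + ds, x + dx
        times.append(t)
        states.append((s, x))
    return np.array(times), np.array(states)

times, states = gillespie(t_end=100.0)   # states[:, 0] = s(t), states[:, 1] = x(t)
```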
The exact stochastic dynamics of this reaction system can be expressed by the chemical master equation [26]. This equation describes the time-evolution of the discrete probability distribution over the possible copy numbers of species and , capturing the noise from the chemical reaction events. From this description we can obtain the mutual information rate from to without approximations using PWS [5].
While the chemical master equation is an exact representation of the reaction system, for large copy numbers the stochastic dynamics are well-approximated by a linearized model around the steady state. The resulting Langevin equations can be systematically derived from the master equation using the LNA, which yields
$$ \frac{\mathrm{d}s}{\mathrm{d}t} = \kappa - \lambda s + \xi_s(t), \qquad \frac{\mathrm{d}x}{\mathrm{d}t} = \rho s - \mu x + \xi_x(t), \tag{5.13} $$
where $s$ and $x$ are continuous variables representing the copy numbers of $S$ and $X$, and $\xi_s(t)$ and $\xi_x(t)$ are independent delta-correlated white noise terms with $\langle \xi_s(t)\,\xi_s(t') \rangle = 2\lambda\bar{s}\,\delta(t-t')$ and $\langle \xi_x(t)\,\xi_x(t') \rangle = 2\mu\bar{x}\,\delta(t-t')$, see Section 5.5.1.
The Gaussian approximation of the mutual information rate is derived from the LNA description. Using this framework, Tostevin and ten Wolde [1] computed an analytical expression for the mutual information rate of the motif in units of nats per unit time:
$$ R_{\mathrm{Gauss}} = \frac{\lambda}{2}\left(\sqrt{1 + \frac{\rho}{\lambda}} - 1\right). \tag{5.14} $$
More recently, Moor and Zechner [4] have derived a different expression for the mutual information rate of this reaction system by analytically approximating the relevant filtering equation, which is derived from the master equation, thus recognizing the discreteness of molecules. This approach explicitly differentiates the contributions of individual reactions to the noise amplitude of each component, while the LNA lumps their contributions together. As we will discuss in more detail below, accounting for the noise from each reaction separately better captures the information transmitted via discrete systems, making this “discrete approximation” more accurate than the Gaussian approximation for this case study. Nevertheless, the result is still based on an approximation that is only accurate for large copy numbers. The expression for the mutual information rate in the discrete approximation appears remarkably similar to the expression obtained using the Gaussian framework:
$$ R_{\mathrm{discrete}} = \frac{\lambda}{2}\left(\sqrt{1 + \frac{2\rho}{\lambda}} - 1\right). \tag{5.15} $$
Note that this equation differs from Equation 5.14 only by the additional factor 2 inside the square root.
The natural—but incorrect—expectation is that for large copy numbers both approximations converge to the true mutual information rate. However, the difference between Equations 5.14 and 5.15 already reveals that the two approximations do not converge. Indeed, previous work shows that even in the limit of infinite copy numbers, the Gaussian approximation only yields a lower bound to the information rate, which is not tight [4,5].
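The size of this gap is easy to quantify by evaluating both closed-form expressions. A small sketch (our own) shows that for $\rho/\lambda \gg 1$ the ratio of the two rates approaches $\sqrt{2}$, so the Gaussian bound indeed never becomes tight:

```python
import numpy as np

lam = 1.0

def rate_gauss(rho):     # Eq. 5.14
    return 0.5 * lam * (np.sqrt(1.0 + rho / lam) - 1.0)

def rate_discrete(rho):  # Eq. 5.15
    return 0.5 * lam * (np.sqrt(1.0 + 2.0 * rho / lam) - 1.0)

for rho in (0.1, 1.0, 10.0, 100.0):
    print(f"rho/lam = {rho:6.1f}:  Gaussian {rate_gauss(rho):8.4f}"
          f"   discrete {rate_discrete(rho):8.4f}")
# For rho >> lam the ratio approaches sqrt(2); the Gaussian bound is never tight.
```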
We compare both approximations against exact PWS simulations for different parameters. In Figure 5.1a, we vary the mean copy number of the readout, $\bar{x}$, by varying its synthesis rate $\rho$ and compute the mutual information rate using both approximations as well as PWS, while keeping the input copy number $\bar{s}$ constant. We observe that the Gaussian approximation via the LNA [Equation 5.14] consistently underestimates the mutual information rate. This confirms that even when $\bar{s}$ and $\bar{x}$ are large, the Gaussian approximation only yields a lower bound to the information rate of the discrete linear system. In contrast, the discrete approximation [Equation 5.15] coincides with the true mutual information rate obtained from PWS simulations over the whole range of output copy numbers $\bar{x}$, even for very small $\bar{x}$.
In Figure 5.1b, instead of varying the copy number of the output, we vary the copy number of the input by varying the production rate $\kappa$. Note that both the Gaussian approximation and the discrete approximation are independent of $\bar{s}$. Yet, we observe that the true mutual information rate is not. For sufficiently large input copy numbers the discrete approximation coincides with the true information rate while the Gaussian information rate remains only a lower bound. Thus, the discrete approximation is highly accurate for large $\bar{s}$. For small $\bar{s}$, where the input only switches between a few discrete copy-number levels, we find that the mutual information rate deviates from both the LNA and the discrete approximation. Surprisingly, we find an optimal value of $\bar{s}$ for which the mutual information rate is maximized and exceeds both approximations. This implies that at low input copy numbers, the system is able to extract additional information from the discrete input trajectories, which is not accounted for by either of the approximations.
In all cases, we found that the Gaussian approximation deviates significantly from the true information rate for this discrete system. Seemingly paradoxically, the Gaussian approximation based on the LNA does not converge to the true information rate at high copy numbers, even though the LNA approximates the stochastic dynamics extremely well in this regime. In contrast, the discrete approximation from Moor and Zechner [4] does not suffer from this issue. It has been shown that, generally, the Gaussian approximation is a lower bound on the discrete approximation [4], prompting the question of which features of the discrete trajectories are not captured by the Gaussian approximation.¹
5.3.2 Nonlinear continuous system
Next, we study a nonlinear variant of the reaction system above. In contrast to the previous case study, we deliberately avoid using discrete dynamics, as we already observed that the Gaussian approximation is generally inaccurate in such systems. Instead, we focus solely on continuous Langevin dynamics to explore how an explicitly nonlinear input-output mapping affects the accuracy of the inherently linear Gaussian approximation. We hypothesize that the accuracy of the Gaussian approximation will deteriorate as the degree of nonlinearity increases. To test this hypothesis, we analyze a simple Langevin system with adjustable nonlinearity.
The system is defined by two coupled Langevin equations, one that describes the input, and one that describes the output. The stochastic dynamics of the input $s$ are given by Equation 5.13. The output dynamics of $x$ are given by
$$ \frac{\mathrm{d}x}{\mathrm{d}t} = \rho f(s) - \mu x + \xi_x(t), \tag{5.16} $$
with the Hill function
$$ f(s) = \frac{s^n}{s^n + K^n}. \tag{5.17} $$
This function serves as a tunable nonlinearity with Hill coefficient $n$. For small $n$, the Hill function is a shallow, nearly linear mapping, while for large $n$ it becomes sigmoidal and highly nonlinear. As $n \to \infty$, $f(s)$ approaches the unit step function centered at $s = K$. The so-called static input-output relation specifies the mean output for a given input signal and is given by $\bar{x}(\bar{s}) = \rho f(\bar{s})/\mu$. The gain of this system is then defined as the slope of this relation at $\bar{s}$, i.e.,
$$ g \equiv \frac{\mathrm{d}\bar{x}}{\mathrm{d}\bar{s}} = \frac{\rho}{\mu}\left.\frac{\partial f}{\partial s}\right|_{\bar{s}}, \tag{5.18} $$
as derived in Section 5.5.1.3. Importantly, for $K = \bar{s}$, the gain reduces to $g = n\bar{x}/(2\bar{s})$ and is thus directly proportional to the Hill coefficient $n$, i.e., the gain is directly coupled to the degree of nonlinearity.
Figure 5.2 shows how, on average, the output at a given time depends on the input at that same time (solid colored curves). This is the so-called dynamical input-output relation of a system [29]. The solid black curve represents the static input-output relation. While the static input-output relation is purely determined by the instantaneous function $f(s)$, the dynamical input-output relation depends not only on this function, but also on the timescale $1/\mu$ of the response. The reason is that as the output responds more slowly to the input, the temporal input fluctuations are increasingly averaged out. Therefore, the response of the output becomes shallower for increasing $1/\mu$. Moreover, slower systems with shallower responses respond approximately linearly to the input (Figure 5.2).
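Trajectories of this kind can be generated by direct Euler-Maruyama integration of Equations 5.13 and 5.16. The sketch below is our own illustration with hypothetical parameters; for simplicity it uses constant, steady-state noise amplitudes, whereas the noise model in the chapter's simulations may differ in detail:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical parameters for Eqs. 5.13 (input) and 5.16-5.17 (output).
kappa, lam = 100.0, 1.0        # input birth and decay rates (s_bar = 100)
rho, mu = 10.0, 1.0            # output production scale and decay rate (mu = 1/tau)
n = 4.0                        # Hill coefficient (degree of nonlinearity)
K = kappa / lam                # threshold tuned to K = s_bar

def hill(s):
    return s**n / (s**n + K**n)

def simulate(T=200.0, dt=1e-3):
    """Euler-Maruyama integration of the coupled input-output Langevin equations."""
    steps = int(T / dt)
    s = np.empty(steps); x = np.empty(steps)
    s[0] = kappa / lam                      # input steady state
    x[0] = rho * 0.5 / mu                   # output steady state (f(s_bar) = 1/2)
    amp_s = np.sqrt(2.0 * kappa * dt)       # input noise, strength 2*lam*s_bar
    amp_x = np.sqrt(2.0 * mu * x[0] * dt)   # output noise, strength 2*mu*x_bar
    for i in range(steps - 1):
        s[i+1] = s[i] + (kappa - lam * s[i]) * dt + amp_s * rng.normal()
        x[i+1] = x[i] + (rho * hill(s[i]) - mu * x[i]) * dt + amp_x * rng.normal()
    return s, x

s_traj, x_traj = simulate()
```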
While we can directly compute the mutual information rate of this nonlinear model using the Langevin extension of PWS, the Gaussian approximation can only be applied to linear systems. Therefore, to obtain the mutual information rate in the Gaussian approximation, we have to linearize the system. There are two approaches for linearizing the stochastic dynamics of this nonlinear system, which result in different information estimates.
The first approach is to linearize Equation 5.16 analytically via the LNA, as shown in Section 5.5.1.2. Within this approach we can obtain an analytical expression for the information rate (see Section 5.5.1.3),
$$ R_{\mathrm{LNA}} = \frac{\lambda}{2}\left(\sqrt{1 + \frac{g^2 \mu \bar{s}}{\lambda \bar{x}}} - 1\right). \tag{5.19} $$
This LNA-based approach also yields a linearized dynamic input-output relation, shown as dashed lines in Figure 5.2.
We observe that the linearized input-output relation closely matches the slope of the true nonlinear dynamical input-output relation at , but overall it does not correspond to a (least-squares) linear fit of the nonlinear dynamical input-output relation. For all values of , the linearized input-output relation has a slope greater than or equal to the slope of the dynamical input-output relation. Empirically, the LNA thus seems to over-estimate the dynamical gain of the system. The reason may be that the LNA approximates the static input-relation (Figure 5.2 black curve), and estimates the linearized dynamical input-output relation based on this static approximation only.
The second “empirical Gaussian” approach to linearize the nonlinear system potentially avoids these issues. In this approach, we first numerically generate trajectories from the stochastic Equations 5.13 and 5.16 and use digital signal processing techniques to estimate the mutual information rate from the trajectories. We numerically estimate the (cross) power spectra of input and response using Welch’s method [32, Ch. 11]. From the estimated spectral densities we compute the coherence
$$ \hat{C}(\omega) = \frac{|\hat{P}_{sx}(\omega)|^2}{\hat{P}_{ss}(\omega)\,\hat{P}_{xx}(\omega)}, \tag{5.20} $$
which we use to obtain the Gaussian approximation of the mutual information rate directly using Equation 5.7.
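In practice this empirical pipeline amounts to a few lines of digital signal processing. The following sketch (our own illustration) estimates the spectra with scipy's Welch and cross-spectral-density routines from stationary trajectories such as those generated in the sketch above, and integrates the estimated coherence over positive frequencies, exploiting the symmetry of Equation 5.7:

```python
import numpy as np
from scipy.signal import welch, csd
from scipy.integrate import trapezoid

dt = 1e-3                      # sampling interval of the simulated trajectories
fs = 1.0 / dt                  # sampling frequency
nper = 2**14                   # Welch segment length (trades bias against variance)

# s_traj, x_traj: stationary trajectories, e.g. from the Euler-Maruyama sketch above.
f, P_ss = welch(s_traj, fs=fs, nperseg=nper)
_, P_xx = welch(x_traj, fs=fs, nperseg=nper)
_, P_sx = csd(s_traj, x_traj, fs=fs, nperseg=nper)

coherence = np.abs(P_sx) ** 2 / (P_ss * P_xx)          # Eq. 5.20
# Equation 5.7 restricted to positive frequencies: because the spectra are
# symmetric in omega, R = -int_0^inf df ln(1 - C(f)) in ordinary frequency units.
rate = -trapezoid(np.log1p(-coherence), f)
print(f"empirical Gaussian rate: {rate:.4f} nats per unit time")
```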
The empirical power spectra characterize the linear response of a system, but not in the same way as the LNA. While for linear systems the power spectra obtained via the LNA match the empirical power spectra [33], for a nonlinear system, the empirical power spectra and the coherence can differ from the corresponding LNA calculations. The two linearization approaches are thus not equivalent. We tested the accuracy of the Gaussian mutual information rate estimates using both linearization approaches to elucidate the differences between these approaches.
Figure 5.3 displays the mutual information rate obtained via the two linearized approximations as well as the exact PWS result. We vary the gain $g$ and the response timescale $1/\mu$, both of which significantly affect the shape of the dynamical input-output relation. As expected, a larger gain or a faster response leads to an increase in the mutual information rate. At large gain, the information rate naturally saturates as $f(s)$ approaches a step function. The saturation effect is clearly seen in the PWS results, and is found to be even more pronounced in the empirical Gaussian approximation. The LNA-based Gaussian approximation, however, shows no saturation. This highlights that the LNA linearizes the system at the level of the input-output mapping, which results in an approximation that is unaffected by the sigmoidal shape of $f(s)$. In contrast, the empirical approximation is affected by nonlinear saturation effects because it is computed directly from simulated trajectories. We thus see that both approximations yield substantially different results at large gain.
In Figure 5.4 we compare the absolute deviation between the approximations and the PWS result. For small gain we see that both approximations are accurate, which is not surprising since the nonlinearity is very weak in this regime. Strikingly, for large gain, the LNA-based approximation always overestimates the mutual information rate while the empirical Gaussian approximation always underestimates it. In both cases the systematic error decreases as the response timescale becomes slower. This reflects the fact that for slow responders, the dynamic input-output relation is more linear (Figure 5.2) than for fast responders.
Additionally, we computed the relative deviation, see Figure 5.5 in Section 5.5.2. We find that in terms of relative error the curves for different response timescales largely overlap. In terms of relative approximation error, the gain, rather than response timescale, is the primary factor affecting the accuracy of the Gaussian approximation.
5.4 Discussion
We investigated the accuracy of the Gaussian approximation for the mutual information rate in two case studies, each highlighting a scenario where the approximation may be inaccurate. We were able to reliably quantify the inaccuracy in each case by computing the “ground truth” mutual information rate for these scenarios using a recently developed exact Monte Carlo technique called PWS [5].
We first considered linear discrete systems, which are relevant in biology due to the discrete nature of biochemical signaling networks. In our example, the Gaussian approximation cannot capture the full information rate, but only yields a lower bound. We show that a discrete approximation, developed by Moor and Zechner [4], is able to correctly estimate the mutual information rate of the network over a wide range of parameters. Since the Gaussian approximation captures the second moments of the discrete system, this finding demonstrates that a discrete system can transmit significantly more information than what would be inferred from its second moments alone. This perhaps surprising fact has been observed before [4,5,34] and it hinges on the use of a discrete reaction-based readout. As demonstrated in unpublished work [28], the increased mutual information rate found for a discrete readout stems from the ability to unambiguously distinguish individual reaction events in the readout’s trajectory. However, it remains an open question whether biological (or other) signaling systems can effectively harness this additional information encoded in the discrete trajectories. For systems that cannot distinguish individual reaction events in downstream processing, the Gaussian framework might still accurately quantify the “accessible information”.
A notable new observation in our first case study is the deviation between the discrete approximation of the mutual information rate derived by Moor and Zechner [4] and the exact result obtained using PWS [5] for inputs with low copy number $\bar{s}$. In the discrete approximation, the mutual information rate is independent of the input copy number $\bar{s}$, but the PWS simulations show that at low copy numbers there is an optimal $\bar{s}$ which maximizes the mutual information rate. This surprising finding suggests that the information rate in discrete systems can be increased by reducing the copy number of the input sufficiently, such that it only switches between a few discrete input levels. Notably, in the reverse case—low output copy number $\bar{x}$ but large $\bar{s}$—the discrete approximation always remains accurate. We leave a precise characterization of this finding for future work.
The second example focused on a continuous but nonlinear system, where we demonstrated that the accuracy of the Gaussian approximation depends on the linearization method. Linearizing the underlying system dynamics directly via the LNA leads to an overestimation of the information rate, while estimating the system’s correlation functions empirically from data underestimates it. Regardless of the method, the Gaussian approximation is more accurate, in terms of absolute deviation from the true information rate, when the gain of the system is small and its response is slow compared to the timescale of the input fluctuations.
The result of our second case study—that the empirical Gaussian mutual information rate underestimates the true rate—is consistent with theoretical expectations. As shown by Mitra and Stark [23], and highlighted in Ref. [2], an empirical Gaussian estimate of the mutual information between a Gaussian input signal $\mathcal{S}$ and a non-Gaussian output $\mathcal{X}$ provides a lower bound on the channel capacity (subject to a power constraint on $\mathcal{S}$). Specifically, they show that $I(\mathcal{S}; \mathcal{X}) \geq I(\mathcal{S}_G; \mathcal{X}_G)$, where $(\mathcal{S}_G, \mathcal{X}_G)$ is a jointly Gaussian pair with the same covariance matrix as $(\mathcal{S}, \mathcal{X})$. For purely Gaussian systems like $(\mathcal{S}_G, \mathcal{X}_G)$, the mutual information calculated using Equation 5.7 is exact and equal to the channel capacity. However, for systems that have a Gaussian input but are otherwise non-Gaussian, the mutual information is greater than or equal to that of the corresponding Gaussian model with matching second moments, as evidenced in Figure 5.4. In general, the empirical Gaussian approximation yields a lower bound on the mutual information of the nonlinear system with a Gaussian input signal, as well as a lower bound on the channel capacity of the nonlinear system.²
We can distill several concrete recommendations for the computation of the information rate from our analysis. For linear discrete systems, the Gaussian approximation yields a lower bound on the true information rate which may accurately quantify the information available to systems that cannot distinguish individual discrete events. Alternatively, the reaction-based discrete approximation by Moor and Zechner [4] is highly accurate, even when the copy number of the output is extremely small. However, when the copy number of the input becomes small, both approximations break down and one must use an exact method. Exact methods for obtaining the information rate of arbitrary stochastic reaction-based systems are PWS [5] or brute-force numerical integration of the stochastic filtering equation, as shown in [4]. For nonlinear continuous systems with small gain one can safely use the Gaussian approximation, either based on a linearization of the underlying dynamics or on empirically estimated correlation functions. Moreover, when the slowest input timescale is more than an order of magnitude faster than the response timescale, the nonlinear response of the system is “averaged out” by the quick input fluctuations, and the Gaussian approximation yields accurate results. In this case, using a Gaussian approximation based on empirical correlation functions yields the most accurate result, and provides a rigorous lower bound for the mutual information. Finally, if the system is both highly nonlinear and has a fast response with respect to the input, one must resort to an exact method like PWS. We hope that our results will guide future research in determining the appropriate method for computing the mutual information rate.
Overall, our results clarify the domain of applicability of the Gaussian approximation to the information rate of non-Gaussian systems, and thereby increase its practical usefulness. The Gaussian approximation remains a useful method that can be applied directly and straightforwardly to experimental data. Here, we have quantified the prerequisites for safely using this approach. Moreover, we elucidate how an empirical Gaussian approximation constitutes a lower bound on the true information rate for systems with a sufficiently large input copy number.
5.5 Supplementary Information
5.5.1 Gaussian approximation
Here we derive the analytical expressions for the Gaussian information rate of the networks considered in the main text. To this end we first discuss the dynamics of the input signal and its power spectrum. Then, we perform a linear approximation of the dynamics of the readout species $X$ and derive the approximate Gaussian information rate between $S$ and $X$ for the nonlinear network. Finally, we derive the Gaussian information rate of the linear network from our expression of the Gaussian information rate of the nonlinear network.
5.5.1.1 Signal
The input signal is generated by a birth-death process,
$$ \emptyset \xrightarrow{\kappa} S, \qquad S \xrightarrow{\lambda} \emptyset. \tag{5.21} $$
Its dynamics in Langevin form are
$$ \frac{\mathrm{d}s}{\mathrm{d}t} = \kappa - \lambda s + \xi_s(t), \tag{5.22} $$
yielding the steady state signal concentration $\bar{s} = \kappa/\lambda$. The independent Gaussian white noise process $\xi_s(t)$ summarizes all reactions that contribute to fluctuations in $s$. The strength of the noise term in steady state is
$$ \langle \xi_s(t)\,\xi_s(t') \rangle = (\kappa + \lambda\bar{s})\,\delta(t - t') = 2\lambda\bar{s}\,\delta(t - t'). \tag{5.23} $$
The power spectral density, or power spectrum, of a stationary process is defined as $P_{ss}(\omega) = \langle |\hat{s}(\omega)|^2 \rangle$, where $\hat{s}(\omega)$ denotes the Fourier transform of $s(t)$. The power spectrum of a signal obeying Equation 5.22 is thus given by
$$ P_{ss}(\omega) = \frac{2\lambda\bar{s}}{\lambda^2 + \omega^2}. \tag{5.24} $$
5.5.1.2 Linear approximation
We now consider the readout $X$, which is produced via a nonlinear activation function $f(s)$:
$$ \emptyset \xrightarrow{\rho f(s)} X, \qquad X \xrightarrow{\mu} \emptyset. \tag{5.25} $$
We define the activation level to be a Hill function,
$$ f(s) = \frac{s^n}{s^n + K^n}. \tag{5.26} $$
Such a dependency, in which $K$ sets the concentration of $s$ at which the activation is half-maximal and $n$ sets the steepness, can for example arise from cooperativity between the signal molecules in activating the synthesis of $X$.
We have for the dynamics of $x$ in Langevin form
$$ \frac{\mathrm{d}x}{\mathrm{d}t} = \rho f(s) - \mu x + \xi_x(t), \tag{5.27} $$
with $f(s)$ given by Equation 5.26. The steady state concentration of $x$ is given by $\bar{x} = \rho\bar{f}/\mu$, where we have defined the steady state activation level $\bar{f} \equiv f(\bar{s})$. It is useful to determine the static gain of the network, which is defined as the change in the steady state of the output upon a change in the steady state of the signal:
$$ g \equiv \frac{\mathrm{d}\bar{x}}{\mathrm{d}\bar{s}} = \frac{\rho}{\mu}\left.\frac{\partial f}{\partial s}\right|_{\bar{s}} = \frac{\rho_1}{\mu}, \tag{5.28} $$
where we have defined the approximate linear activation rate
$$ \rho_1 \equiv \rho \left.\frac{\partial f}{\partial s}\right|_{\bar{s}} = \rho\,\frac{n\bar{f}\,(1 - \bar{f})}{\bar{s}}, \tag{5.29} $$
and the steady state of the activation level is given by
$$ \bar{f} = \frac{\bar{s}^n}{\bar{s}^n + K^n}. \tag{5.30} $$
Generally, we assume that $K = \bar{s}$, which entails that in steady state the network is tuned to $\bar{f} = 1/2$.
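As a quick symbolic consistency check (our own, using sympy), one can verify that for $K = \bar{s}$ the static gain of Equation 5.28 reduces to $g = n\bar{x}/(2\bar{s})$, the form quoted in the main text:

```python
import sympy as sp

s, K, n, rho, mu = sp.symbols("s K n rho mu", positive=True)

f = s**n / (s**n + K**n)          # Hill activation, Eq. 5.26
xbar = rho * f / mu               # steady-state output
gain = sp.diff(xbar, s)           # static gain g = d(xbar)/d(sbar), Eq. 5.28

# At the tuned operating point K = sbar the output is xbar = rho/(2*mu),
# and the gain should reduce to n*xbar/(2*sbar):
gain_at_K = gain.subs(K, s)
expected = n * (rho / (2 * mu)) / (2 * s)
print(sp.simplify(gain_at_K - expected))   # -> 0
```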
To compute the Gaussian information rate we approximate the dynamics of $x$ to first order around $\bar{x}$ via the classical linear noise approximation [26]. Within this approximation the dynamics of the deviations $\delta s = s - \bar{s}$ and $\delta x = x - \bar{x}$ are
$$ \frac{\mathrm{d}\,\delta x}{\mathrm{d}t} = \rho_1\,\delta s - \mu\,\delta x + \xi_x(t), \tag{5.31} $$
with the synthesis rate $\rho_1$ given by Equation 5.29.
In the linear noise approximation the noise strength is a constant given by the noise strength at steady state,
$$ \langle \xi_x(t)\,\xi_x(t') \rangle = (\rho\bar{f} + \mu\bar{x})\,\delta(t - t') = 2\mu\bar{x}\,\delta(t - t'). \tag{5.32} $$
5.5.1.3 Information rate
Following Tostevin and ten Wolde [1,2], we can express the Gaussian information rate as follows,
$$ R(\mathcal{S}; \mathcal{X}) = \frac{1}{4\pi} \int_{-\infty}^{\infty} \mathrm{d}\omega \, \ln\!\left[1 + \frac{g(\omega)^2\,P_{ss}(\omega)}{N(\omega)}\right], \tag{5.33} $$
where $g(\omega)$ is the frequency dependent gain and $N(\omega)$ is the frequency dependent noise of the output process $\mathcal{X}$. If the intrinsic noise of the network is not correlated with the process that drives the signal, the power spectrum of the network output obeys the spectral addition rule [35]. In this case the frequency dependent gain and noise can be identified directly from the power spectrum of the output, because it takes the following form:
$$ P_{xx}(\omega) = g(\omega)^2\,P_{ss}(\omega) + N(\omega). \tag{5.34} $$
For a species obeying Equation 5.31, we have
$$ g(\omega)^2 = \frac{\rho_1^2}{\mu^2 + \omega^2}, \qquad N(\omega) = \frac{2\mu\bar{x}}{\mu^2 + \omega^2}. \tag{5.35} $$
The Wiener-Khinchin theorem states that the power spectrum of a stochastic process and its auto-correlation function are a Fourier transform pair. We thus obtain the variance of the readout by substituting the frequency dependent gain and noise [Equation 5.35] and the power spectrum of the signal [Equation 5.24] in Equation 5.34 and taking the inverse Fourier transform at lag $\tau = 0$,
$$ \sigma_x^2 = g\,\tilde{g}\,\sigma_s^2 + \bar{x}, \tag{5.36} $$
where the signal variance equals its mean, $\sigma_s^2 = \bar{s}$, and the mean readout concentration $\bar{x}$ sets the intrinsic noise. We further have the static gain $g$ given by Equation 5.28, and have defined the dynamical gain
$$ \tilde{g} \equiv g\,\frac{\mu}{\lambda + \mu}, \tag{5.37} $$
which is the slope of the mapping from the time-varying signal value $s_t$ to the time-varying readout $x_t$; $\tilde{g} = \mathrm{cov}(s_t, x_t)/\sigma_s^2$ for Gaussian systems [2,19,29].
To solve the integral in Equation 5.33 we exploit that
$$ \int_{-\infty}^{\infty} \mathrm{d}\omega\, \ln\frac{\omega^2 + a^2}{\omega^2 + b^2} = 2\pi\,(a - b). \tag{5.38} $$
Substituting the frequency dependent gain and noise given in Equation 5.35 and the signal power spectrum of Equation 5.24 in Equation 5.33 and using Equation 5.38 we obtain the information rate,
$$ R = \frac{\lambda}{2}\left(\sqrt{1 + \frac{g^2 \mu \bar{s}}{\lambda \bar{x}}} - 1\right), \tag{5.39} $$
where we used the noise strengths given in Equation 5.23 and Equation 5.32, the static gain of Equation 5.28, and the synthesis rate of Equation 5.29.
5.5.1.4 Linear network
To disambiguate differences in the information rate caused by the linear approximation of our nonlinear reaction network on one hand and the Gaussian approximation of the underlying jump process on the other, we consider the information rate of a linear network. Any difference between the exact information rate and the Gaussian information rate must then be a result of the Gaussian approximation. To this end we use the same input signal [Equation 5.22], but we consider a linear activation of the readout, i.e.,
$$ f(s) = s, \tag{5.40} $$
such that the Langevin dynamics of $x$ are
$$ \frac{\mathrm{d}x}{\mathrm{d}t} = \rho s - \mu x + \xi_x(t), \tag{5.41} $$
which yields the steady state concentration $\bar{x} = \rho\bar{s}/\mu$. For this linear readout, the static gain is simply set by the ratio of the steady states of the input and the output, $g = \bar{x}/\bar{s} = \rho/\mu$. We can then obtain the information rate of this linear system by substituting its static gain in Equation 5.39, which yields
$$ R = \frac{\lambda}{2}\left(\sqrt{1 + \frac{\rho}{\lambda}} - 1\right). \tag{5.42} $$
5.5.2 Relative deviation of the Gaussian approximation for a nonlinear system
Due to the definition of the mutual information, an absolute difference in information maps to a relative difference in the reduction of uncertainty. For this reason, Figure 5.4 in the main text (Section 5.3.2) focuses on the absolute deviation between the Gaussian approximation and the true mutual information. We compared two variants of the Gaussian approximation, the LNA-based approximation and the empirical Gaussian approximation. We found that in both cases the absolute deviation decreases with slower response timescales, reflecting the more linear input-output relationship in slow-responding systems.
However, the relative deviation of the Gaussian information rate from the true rate also offers valuable insights, which we explore here. In Figure 5.5 we compare the relative deviation between the Gaussian approximation and the exact mutual information computed using PWS. We find that the relative deviation increases as the system gain increases, indicating that the Gaussian approximation also becomes relatively less accurate for larger gains. As already discussed above, the empirical Gaussian method consistently underestimates the true information rate, while the LNA-based approximation overestimates it.
Interestingly, we also observe that for the LNA approximation, at fast timescales the result is slightly more accurate, whereas the empirical Gaussian estimate is more accurate at slow timescales. We initially expected that in both cases slow timescales would yield better agreement with PWS, as the input-output dynamics are more linear for slow timescales, and thus better approximated by the Gaussian model. The fact that this is not the case for the LNA approximation is intriguing, indicating the need for further investigation into the interplay between timescales, system nonlinearity, and the LNA.
¹ In still unpublished work together with Anne-Lena Moor and Christoph Zechner [28], we found that the root cause for the deviations of the Gaussian approximation lies in how the LNA approximates the reaction noise in the chemical master equation. While the dynamics of the chemical master equation give rise to discrete sample paths, i.e., piece-wise constant trajectories connected by instantaneous discontinuous jumps, the LNA approximation yields continuous stochastic trajectories. Our results imply that a discrete sample path of $X$ carries more information about $S$ than the corresponding continuous sample path of $x$ would carry about $s$ in the LNA. In the collaborative effort [28] we found that this is ultimately due to the fact that in the discrete system, each reaction event is unambiguously recorded in the trajectories, and thus different reactions modifying the same species can be distinguished. In contrast, in the continuous LNA description, all reactions that modify $X$ contribute to the noise term $\xi_x(t)$ in Equation 5.13, but their contributions are lumped together and therefore cannot be distinguished from an observed $x$-trajectory. Specifically, note that for the motif studied here, only the production reaction conveys information. The decay reaction of the output does not carry information on the input fluctuations, since its propensity is independent of the input. Yet, it contributes to the overall fluctuations in the output. The Gaussian approximation only considers the total fluctuations in the output, while the discrete approximation correctly distinguishes between the fluctuations induced by production events and decay events. Therefore, the Gaussian approximation consistently underestimates the true information transmission, whereas the discrete approximation does not incur this systematic error. This subtle point is reflected in the difference between Equations 5.14 and 5.15.

² Note that this argument does not apply to the linear noise approximation (LNA). The bound specifically requires the Gaussian model to use the covariance of the full, original system. When the system is first linearized using the LNA, the resulting linear model does not retain the same covariance as the original nonlinear system. As a result, the mutual information rate calculated with the LNA is generally not a lower (nor an upper) bound on the true mutual information rate.

References