5 The Accuracy of the Gaussian Approximation
This chapter was written by Age Tjalma and Manuel Reinhardt as shared first authors, in collaboration with Anne-Lena Moor (MPI-CBG Dresden), and Pieter Rein ten Wolde.
Efficient information processing is crucial for both living organisms and engineered systems. The mutual information rate, a core concept of information theory, quantifies the amount of information shared between the trajectories of input and output signals, and allows the quantification of information flow in dynamic systems. A common approach for estimating the mutual information rate is the Gaussian approximation, which assumes that the input and output trajectories follow Gaussian statistics. However, this method is limited to linear systems, and its accuracy in nonlinear or discrete systems remains unclear. In this work, we assess the accuracy of the Gaussian approximation for non-Gaussian systems by leveraging Path Weight Sampling (PWS), a recent technique for exactly computing the mutual information rate. In two case studies, we examine the limitations of the Gaussian approximation. First, we focus on discrete linear systems and demonstrate that, even when the system’s statistics are nearly Gaussian, the Gaussian approximation fails to accurately estimate the mutual information rate. Second, we explore a continuous diffusive system with a nonlinear transfer function, revealing significant deviations between the Gaussian approximation and the exact mutual information rate as nonlinearity increases. Our results provide a quantitative evaluation of the Gaussian approximation’s performance across different stochastic models and highlight when more computationally intensive methods, such as PWS, are necessary.
5.1 Introduction
For the functioning of both living and engineered systems it is paramount that they collect and process information effectively. Increasingly, it has become evident that beyond instantaneous properties, the dynamic features of an input signal or system output often encode valuable information [1–6]. Prime examples in biology include bacterial chemotaxis, which responds to temporal changes in concentration [7], the transcription factor NF-κB, which encodes information about input signals in its dynamic response [8], and neuronal information processing, where information is encoded in the sequence and timing of spikes [9]. Beyond biology, dynamic input signals are critical for various sensing systems, such as those used in automated factories or self-driving cars.
To understand and evaluate the performance, potential improvements, and limitations of these systems in processing information, we need appropriate metrics that capture their full information processing capability. Information theory, introduced by Shannon [10], provides the most general mathematical framework for such metrics. The mutual information and mutual information rate measure how much one random variable reduces uncertainty about another, quantified in bits. It is relatively straightforward to quantify the information shared between scalar properties of the input and output, as has been done in various forms [11–19]. However, capturing all information in the dynamical properties of the input and the output is much more challenging. To do so, one must consider the information encoded in the time-varying trajectories of the variables of interest. Yet, due to the high dimensionality of the trajectory space, computing the mutual information between such trajectories is notoriously difficult.
A major advancement in this area has been the Gaussian approximation of the mutual information rate [1,2], based on the assumption of input and output trajectories following jointly Gaussian statistics. This assumption makes it possible to compute the mutual information rate directly from the two-point correlation functions of the input and output. It is thus straightforward to apply the Gaussian approximation to experimental data. Moreover, given a mechanistic model of the underlying dynamics, the Gaussian approximation can be used to derive analytical expressions for the information rate [1–3]. Crucially however, the assumption of Gaussian statistics restricts the method to linear systems, as Gaussian statistics can only arise in such systems [20].
Understanding when the Gaussian approximation is accurate is critical because many real-world systems, such as biological and engineered sensory systems, exhibit nonlinear dynamics. This includes features such as bimodality, discrete jumps, or heavy tails, all of which deviate from purely Gaussian dynamics. Such non-Gaussian behavior typically results from intrinsic nonlinearities in the system, but determining the degree of a system’s deviation from linearity is difficult [21–23], and the extent to which the approximation loses accuracy in nonlinear systems is unclear. Thus, although the Gaussian approximation offers a computationally simple framework to estimate information transmission, it remains an open question under what conditions this approximation is sufficiently accurate.
Until recently, addressing this question has been hard because there was no reliable benchmark for the exact information rate. Without a method to compute the true information rate of a non-Gaussian system, it is impossible to rigorously assess the accuracy of the Gaussian approximation. This gap was filled by the development of two independent methods [4,5] for computing the information rate accurately even in systems that significantly deviate from Gaussian behavior. Here we leverage one of these methods: Path Weight Sampling (PWS) [5], an exact Monte Carlo technique for computing the mutual information rate in a wide range of stochastic models.
Using PWS, we can directly evaluate the accuracy of the Gaussian approximation in models that exhibit explicit non-Gaussian features, and study the approximation’s robustness in typical applications.
In this article, we investigate the accuracy of the approximate Gaussian information rate through two case studies. The first focuses on Markov jump processes, where the statistics are non-Gaussian due to the discrete nature of the processes. Perhaps surprisingly, the Gaussian approximation fails to accurately estimate the mutual information rate in this case, even when the statistics are nearly Gaussian [4,5]. We show that a recently developed reaction-based “discrete approximation” by Moor and Zechner [4] is much more accurate. This suggests that the Gaussian approximation fails because it cannot distinguish the individual reaction events.
The second case study examines a continuous diffusive process with a nonlinear transfer function. We demonstrate how intrinsic nonlinearity can cause significant deviations between the Gaussian approximation and the true mutual information rate. By varying the degree of nonlinearity as well as the system’s response timescale, we provide a comprehensive quantitative understanding of the Gaussian approximation’s limitations in nonlinear systems. Additionally, we show that for such systems, the Gaussian approximation differs significantly when derived from empirical correlation functions compared to when it is analytically obtained from the nonlinear model, highlighting that the correct application of the approximation is important.
Our work translates into concrete recommendations on when to use which method for the computation of the information rate. It therefore enables researchers to more confidently determine when a simpler approximate method is sufficient, or when a more sophisticated method like PWS [5] or the method developed by Moor and Zechner [4] should be used.
5.2 Methods
5.2.1 The mutual information rate
The mutual information between two random variables $s$ and $x$ is defined as
$$ I(s; x) = \left\langle \ln \frac{\mathrm{P}(s, x)}{\mathrm{P}(s)\,\mathrm{P}(x)} \right\rangle_{\mathrm{P}(s, x)}, \tag{5.1} $$
or, equivalently, using Shannon entropies,
$$ I(s; x) = H(x) - H(x \mid s) = H(s) - H(s \mid x). \tag{5.2} $$
In the context of a noisy communication channel, $s$ and $x$ represent the messages at the sending and receiving end, respectively. Then, $I(s; x)$ is the amount of information about $s$ that is communicated when only $x$ is received. If $s$ can be perfectly reconstructed from $x$, then $I(s; x) = H(s)$. On the contrary, if $s$ and $x$ are independent, $I(s; x) = 0$. The mutual information thus is always non-negative and quantifies the degree of statistical dependence between two random variables.
For systems that continuously transmit information over time, this concept must be extended to trajectories $\boldsymbol{s}_T = \{s_t\}_{0 \le t \le T}$ and $\boldsymbol{x}_T = \{x_t\}_{0 \le t \le T}$. The mutual information between trajectories is defined analogously as
$$ I(\boldsymbol{s}_T; \boldsymbol{x}_T) = \left\langle \ln \frac{\mathrm{P}(\boldsymbol{s}_T, \boldsymbol{x}_T)}{\mathrm{P}(\boldsymbol{s}_T)\,\mathrm{P}(\boldsymbol{x}_T)} \right\rangle, \tag{5.3} $$
where the expected value is taken with respect to the full joint probability $\mathrm{P}(\boldsymbol{s}_T, \boldsymbol{x}_T)$ of both trajectories. This quantity can be interpreted as the total information that is communicated over the time interval $[0, T]$.
Note that the total amount of information communicated over the time interval $[0, T]$ is not directly related to the instantaneous mutual information $I(s_t; x_t)$ at any instant $t$. This is because auto-correlations within the input or output sequences reduce the amount of new information transmitted in subsequent measurements. Moreover, information can be encoded in temporal features of the trajectories, which cannot be captured by an instantaneous information measure. Therefore, as previously pointed out [24,25], the instantaneous mutual information for any given $t$ does not provide a meaningful measure of information transmission. To correctly quantify the amount of information transmitted per unit time we must consider entire trajectories.
For that reason, the mutual information rate is defined via the trajectory mutual information. Let the input and output of a system be given by two continuous-time stochastic processes $\mathcal{S}$ and $\mathcal{X}$. Then, the mutual information rate between $\mathcal{S}$ and $\mathcal{X}$ is
$$ R(\mathcal{S}; \mathcal{X}) = \lim_{T \to \infty} \frac{I(\boldsymbol{s}_T; \boldsymbol{x}_T)}{T} \tag{5.4} $$
and quantifies the amount of information that can reliably be transmitted per unit time. The mutual information rate therefore represents an excellent performance measure for information processing systems.
In summary, the mutual information rate is the crucial performance metric for stochastic information processing systems. However, its information-theoretic definition does not translate into an obvious scheme for computing it. As a result, various methods have been developed to compute or approximate the mutual information rate.
5.2.2 Gaussian approximation
One way to significantly simplify the computation of the information rate is to assume that the input and output trajectories obey stationary Gaussian statistics. Under this assumption Equation 5.3 simplifies to
$$ I(\boldsymbol{s}_T; \boldsymbol{x}_T) = \frac{1}{2} \ln \frac{\det Z_{ss}\, \det Z_{xx}}{\det Z}, \tag{5.5} $$
where $\det Z_{ss}$ and $\det Z_{xx}$ are the determinants of the covariance matrices of the respective discretized trajectories $\boldsymbol{s}_T$ and $\boldsymbol{x}_T$, and
$$ Z = \begin{pmatrix} Z_{ss} & Z_{sx} \\ Z_{xs} & Z_{xx} \end{pmatrix} \tag{5.6} $$
is the covariance matrix of their joint distribution.
In the limit that the trajectory length $T = N\,\delta t$, with discretization time step $\delta t$, becomes infinitely long ($N \to \infty$) and continuous ($\delta t \to 0$), the information rate as defined in Equation 5.4 can be expressed in terms of the power spectral densities, or power spectra, of the processes $\mathcal{S}$ and $\mathcal{X}$ [1,2]:
$$ R(\mathcal{S}; \mathcal{X}) = -\frac{1}{4\pi} \int_{-\infty}^{\infty} \mathrm{d}\omega \, \ln\!\left[1 - \frac{|P_{sx}(\omega)|^2}{P_{ss}(\omega)\,P_{xx}(\omega)}\right]. \tag{5.7} $$
Here, $P_{ss}(\omega)$ and $P_{xx}(\omega)$ respectively are the power spectra of trajectories generated by $\mathcal{S}$ and $\mathcal{X}$, and $P_{sx}(\omega)$ is their cross-spectrum. The fraction
$$ C(\omega) = \frac{|P_{sx}(\omega)|^2}{P_{ss}(\omega)\,P_{xx}(\omega)} \tag{5.8} $$
is known as the coherence, describing the distribution of power transfer between $\mathcal{S}$ and $\mathcal{X}$ over the frequency $\omega$.
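To make Equation 5.7 concrete, here is a minimal Python sketch (our own illustration, with hypothetical parameter values) that integrates the spectral formula numerically for an Ornstein-Uhlenbeck input driving a first-order linear response, a combination whose Lorentzian spectra are derived in Section 5.5.1. For these parameters the closed-form result of Equation 5.14 below gives $(\lambda/2)(\sqrt{2}-1) \approx 0.207$, which the quadrature should reproduce:

```python
import numpy as np

def gaussian_rate(omega, P_ss, P_xx, P_sx_abs2):
    """Numerical quadrature of Eq. 5.7 on a uniform frequency grid."""
    coherence = P_sx_abs2 / (P_ss * P_xx)
    d_omega = omega[1] - omega[0]
    return -np.sum(np.log1p(-coherence)) * d_omega / (4.0 * np.pi)

# Illustrative example with hypothetical parameters: an Ornstein-Uhlenbeck input
# (Lorentzian spectrum) driving a first-order linear response.
lam, mu, rho, sbar = 1.0, 1.0, 1.0, 100.0
omega = np.linspace(-500.0, 500.0, 1_000_001)
P_ss = 2.0 * lam * sbar / (lam**2 + omega**2)
D_x = 2.0 * rho * sbar                              # intrinsic output noise strength
P_xx = (rho**2 * P_ss + D_x) / (mu**2 + omega**2)
P_sx_abs2 = rho**2 * P_ss**2 / (mu**2 + omega**2)   # |cross-spectrum|^2
print(gaussian_rate(omega, P_ss, P_xx, P_sx_abs2))  # ~0.207 nats per unit time
```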
For systems that are neither Gaussian nor linear, there are two ways to still obtain an approximate Gaussian information rate. The first is to directly measure two-point correlation functions from data or simulations, and use these to retrieve the power spectra in Equation 5.7. The second is to use van Kampen’s linear noise approximation (LNA) [26] and approximate the dynamics of the system to first order around a fixed point, see also Section 5.5.1. In this work, we will analyze both of these methods.
5.2.3 Path Weight Sampling for diffusive systems
To evaluate the accuracy of the Gaussian information rate for non-Gaussian systems, an exact method for determining the true information rate is required. Recently, a method called Path Weight Sampling (PWS) was developed, which computes the exact mutual information rate using Monte Carlo techniques without relying on approximations [5].
In Ref. [5], PWS was introduced as a computational framework for calculating the mutual information rate in systems governed by master equations. Master equations provide an exact stochastic description of continuous-time processes with discrete state-spaces, commonly used in models ranging from biochemical signaling networks to population dynamics. However, many systems are not described by discrete state spaces and instead require a stochastic description based on diffusion processes or other stochastic models. Fortunately, PWS is not restricted to systems described by master equations and can be extended to a variety of stochastic models.
In general, PWS can be applied to any system that meets the following conditions: (i) sampling from the input distribution $\mathrm{P}(\boldsymbol{s})$ is straightforward, (ii) sampling from the conditional output distribution $\mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s})$ is straightforward, and (iii) the logarithm of the conditional probability density $\mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s})$, referred to as the path weight, can be evaluated efficiently. For any stochastic model that satisfies these three criteria, the PWS computation proceeds similarly to systems governed by master equations.
Briefly, PWS computes the trajectory mutual information using a Monte Carlo estimate of Equation 5.3,
$$ I(\boldsymbol{s}; \boldsymbol{x}) \approx \frac{1}{N} \sum_{i=1}^{N} \ln \frac{\mathrm{P}(\boldsymbol{x}^{(i)} \mid \boldsymbol{s}^{(i)})}{\mathrm{P}(\boldsymbol{x}^{(i)})}, \tag{5.9} $$
where the $\boldsymbol{s}^{(i)}$ are independently drawn from $\mathrm{P}(\boldsymbol{s})$, and each $\boldsymbol{x}^{(i)}$ is drawn from $\mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s}^{(i)})$. As $N \to \infty$, this expression converges to the mutual information $I(\boldsymbol{s}; \boldsymbol{x})$. In Equation 5.9, the term $\mathrm{P}(\boldsymbol{x}^{(i)} \mid \boldsymbol{s}^{(i)})$ can be evaluated directly (per criterion iii), but the marginal probability $\mathrm{P}(\boldsymbol{x}^{(i)})$ has to be computed separately for each output trajectory $\boldsymbol{x}^{(i)}$. Typically, this has to be done numerically via marginalization, i.e., by computing the path integral
$$ \mathrm{P}(\boldsymbol{x}) = \int \mathcal{D}\boldsymbol{s} \; \mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s}) \, \mathrm{P}(\boldsymbol{s}) \tag{5.10} $$
using Monte Carlo techniques. Evaluating the marginalization integral efficiently is essential for computing the mutual information using PWS and is discussed in detail in Ref. [5]. In summary, PWS is a generic framework that can be used beyond systems defined by a master equation as long as a suitable generative model satisfying the three conditions above is available.
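The estimator has a nested structure: an outer average over sampled trajectory pairs (Equation 5.9) and an inner Monte Carlo marginalization (Equation 5.10). The sketch below illustrates this structure on a deliberately simple toy channel for which all three criteria hold trivially; it is our own illustration with a scalar additive Gaussian channel, not the trajectory-based PWS implementation of Ref. [5], and its output can be checked against the exact mutual information:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_s, sigma_n = 1.0, 0.5      # toy input and channel-noise scales (illustrative)

def sample_s(n):           return rng.normal(0.0, sigma_s, n)            # criterion (i)
def sample_x_given_s(s):   return s + rng.normal(0.0, sigma_n, s.shape)  # criterion (ii)
def log_p_x_given_s(x, s):                                               # criterion (iii)
    return -0.5 * ((x - s) / sigma_n) ** 2 - np.log(sigma_n * np.sqrt(2.0 * np.pi))

N, M = 20_000, 512               # outer samples (Eq. 5.9), marginalization samples
s = sample_s(N)
x = sample_x_given_s(s)
log_cond = log_p_x_given_s(x, s)
s_fresh = sample_s(M)            # brute-force Monte Carlo marginal, Eq. 5.10
log_marg = np.array(
    [np.logaddexp.reduce(log_p_x_given_s(xi, s_fresh)) - np.log(M) for xi in x]
)
mi = np.mean(log_cond - log_marg)
print(mi, 0.5 * np.log(1.0 + sigma_s**2 / sigma_n**2))  # estimate vs. exact value
```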
For this study, we extended PWS to compute the mutual information rate for systems with diffusive dynamics, described by Langevin equations. For such systems, the aforementioned conditions are inherently fulfilled and PWS can be applied. Specifically, in a Langevin system, both the input and the output are stochastic processes given by the solution to a stochastic differential equation (SDE). Using stochastic integration schemes like the Euler-Maruyama method, we can straightforwardly generate realizations $\boldsymbol{s}$ and $\boldsymbol{x}$ from the corresponding stochastic process. These realizations are naturally time-discretized with the integration time step $\Delta t$. For a time-discretized trajectory $\boldsymbol{x} = (x_0, x_1, \ldots, x_N)$, the path weight is—up to a Gaussian normalization constant—given by the Onsager-Machlup action [27],
$$ \ln \mathrm{P}(\boldsymbol{x} \mid \boldsymbol{s}) = \mathrm{const} - \sum_{i=0}^{N-1} \frac{\left[\Delta x_i - A(x_i, s_i)\,\Delta t\right]^2}{2\,B(x_i, s_i)^2\,\Delta t}, \tag{5.11} $$
where we used $\Delta x_i = x_{i+1} - x_i$, $A(x, s)$ is the deterministic drift, and $B(x, s)$ represents the white noise amplitude. This expression captures the likelihood of a particular trajectory, given the stochastic dynamics of the system, and serves as the path weight in the PWS computation.
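In code, evaluating the path weight reduces to a single pass over the discretized trajectory. The following sketch (our own, with a hypothetical linear drift and a constant noise amplitude) computes the log-weight of Equation 5.11 under the Euler-Maruyama discretization:

```python
import numpy as np

def log_path_weight(x, s, dt, drift, noise_amp):
    """Onsager-Machlup log-weight ln P(x|s) of Eq. 5.11 (up to the Gaussian
    normalization constant) for a trajectory discretized with time step dt,
    where dx = drift(x, s) dt + noise_amp(x, s) dW."""
    dx = np.diff(x)
    a = drift(x[:-1], s[:-1])
    b = noise_amp(x[:-1], s[:-1])
    return -np.sum((dx - a * dt) ** 2 / (2.0 * b**2 * dt))

# Example with a hypothetical linear drift and constant noise amplitude:
rho, mu, D = 1.0, 1.0, 2.0
w = log_path_weight(
    x=np.array([1.00, 1.05, 0.97, 1.01]),
    s=np.ones(4),
    dt=0.01,
    drift=lambda x, s: rho * s - mu * x,
    noise_amp=lambda x, s: np.sqrt(D) * np.ones_like(x),
)
```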
5.3 Case Studies
To investigate the conditions under which the Gaussian approximation deviates from the exact mutual information rate, we conducted two case studies. In both studies we compare the Gaussian approximation against the exact mutual information rate, computed via PWS. In the first case study we focus on a discrete linear system which is inspired by minimal motifs of cellular signaling.
5.3.1 Discrete reaction system
We consider a simple linear reaction system of two species, $S$ and $X$, whose dynamics are governed by 4 reactions,
$$ \emptyset \xrightarrow{\kappa} S, \qquad S \xrightarrow{\lambda} \emptyset, \qquad S \xrightarrow{\rho} S + X, \qquad X \xrightarrow{\mu} \emptyset. \tag{5.12} $$
The reaction system is linear because each reaction has at most one reactant. The trajectories of $S$ and $X$ are correlated because the production rate of $X$ depends on the copy number of $S$, and therefore information is transferred from $S$ to $X$. This set of reactions can be interpreted as a simple motif for gene expression where $S$ is a transcription factor and $X$ represents the expressed protein. In steady state, the mean copy numbers are given by $\bar{s} = \kappa/\lambda$ and $\bar{x} = \rho\bar{s}/\mu$.
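Exact sample paths of this reaction system can be generated with the standard Gillespie stochastic simulation algorithm. The sketch below (our own illustration; the rate constants are hypothetical) produces the piecewise-constant copy-number trajectories on which an exact method like PWS operates:

```python
import numpy as np

rng = np.random.default_rng(1)
kappa, lam, rho, mu = 50.0, 1.0, 10.0, 10.0    # hypothetical rate constants

def gillespie(t_end, s=50, x=50):
    """Exact stochastic simulation of the four reactions in Eq. 5.12."""
    t, times, states = 0.0, [0.0], [(s, x)]
    stoich = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # birth S, decay S, birth X, decay X
    while t < t_end:
        props = np.array([kappa, lam * s, rho * s, mu * x])
        total = props.sum()
        t += rng.exponential(1.0 / total)
        ds, dx = stoich[rng.choice(4, p=props / total)]
        s, x = s + ds, x + dx
        times.append(t)
        states.append((s, x))
    return np.array(times), np.array(states)

times, states = gillespie(t_end=100.0)   # states[:, 0] = s(t), states[:, 1] = x(t)
```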
The exact stochastic dynamics of this reaction system can be expressed by the chemical master equation [26]. This equation describes the time-evolution of the discrete probability distribution over the possible copy numbers of species and , capturing the noise from the chemical reaction events. From this description we can obtain the mutual information rate from to without approximations using PWS [5].
While the chemical master equation is an exact representation of the reaction system, for large copy numbers the stochastic dynamics are well-approximated by a linearized model around the steady state. The resulting Langevin equations can be systematically derived from the master equation using the LNA, which yields
$$ \frac{\mathrm{d}s}{\mathrm{d}t} = \kappa - \lambda s + \xi_s(t), \qquad \frac{\mathrm{d}x}{\mathrm{d}t} = \rho s - \mu x + \xi_x(t), \tag{5.13} $$
where $s$ and $x$ are continuous variables representing the copy numbers of $S$ and $X$, and $\xi_s(t)$ and $\xi_x(t)$ are independent delta-correlated white noise terms with $\langle \xi_s(t)\,\xi_s(t') \rangle = 2\lambda\bar{s}\,\delta(t-t')$ and $\langle \xi_x(t)\,\xi_x(t') \rangle = 2\mu\bar{x}\,\delta(t-t')$, see Section 5.5.1.
The Gaussian approximation of the mutual information rate is derived from the LNA description. Using this framework, Tostevin and ten Wolde [1] computed an analytical expression for the mutual information rate of the motif in units of nats per unit time:
$$ R_{\mathrm{Gauss}} = \frac{\lambda}{2}\left(\sqrt{1 + \frac{\rho}{\lambda}} - 1\right). \tag{5.14} $$
More recently, Moor and Zechner [4] have derived a different expression for the mutual information rate of this reaction system by analytically approximating the relevant filtering equation, which is derived from the master equation, thus recognizing the discreteness of molecules. This approach explicitly differentiates the contributions of individual reactions to the noise amplitude of each component, while the LNA lumps their contributions together. As we will discuss in more detail below, accounting for the noise from each reaction separately better captures the information transmitted via discrete systems, making this “discrete approximation” more accurate than the Gaussian approximation for this case study. Nevertheless, the result is still based on an approximation that is only accurate for large copy numbers. The expression for the mutual information rate in the discrete approximation appears remarkably similar to the expression obtained using the Gaussian framework:
$$ R_{\mathrm{discrete}} = \frac{\lambda}{2}\left(\sqrt{1 + \frac{2\rho}{\lambda}} - 1\right). \tag{5.15} $$
Note that this equation differs from Equation 5.14 only by the additional factor 2 inside the square root.
The natural—but incorrect—expectation is that for large copy numbers both approximations converge to the true mutual information rate. However, the difference between Equations 5.14 and 5.15 already reveals that the two approximations do not converge. Indeed, previous work shows that even in the limit of infinite copy numbers, the Gaussian approximation only yields a lower bound to the information rate, which is not tight [4,5].
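The size of this gap is easy to quantify by evaluating both closed-form expressions. A small sketch (our own) shows that for $\rho/\lambda \gg 1$ the ratio of the two rates approaches $\sqrt{2}$, so the Gaussian bound indeed never becomes tight:

```python
import numpy as np

lam = 1.0

def rate_gauss(rho):     # Eq. 5.14
    return 0.5 * lam * (np.sqrt(1.0 + rho / lam) - 1.0)

def rate_discrete(rho):  # Eq. 5.15
    return 0.5 * lam * (np.sqrt(1.0 + 2.0 * rho / lam) - 1.0)

for rho in (0.1, 1.0, 10.0, 100.0):
    print(f"rho/lam = {rho:6.1f}:  Gaussian {rate_gauss(rho):8.4f}"
          f"   discrete {rate_discrete(rho):8.4f}")
# For rho >> lam the ratio approaches sqrt(2); the Gaussian bound is never tight.
```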
We compare both approximations against exact PWS simulations for different parameters. In Figure 5.1a, we vary the mean copy number of the readout, $\bar{x}$, by varying its synthesis rate $\rho$ and compute the mutual information rate using both approximations as well as PWS, while keeping the input copy number $\bar{s}$ constant. We observe that the Gaussian approximation via the LNA [Equation 5.14] consistently underestimates the mutual information rate. This confirms that even when $\bar{s}$ and $\bar{x}$ are large, the Gaussian approximation only yields a lower bound to the information rate of the discrete linear system. In contrast, the discrete approximation [Equation 5.15] coincides with the true mutual information rate obtained from PWS simulations over the whole range of output copy numbers $\bar{x}$, even for very small $\bar{x}$.
In Figure 5.1b, instead of varying the copy number of the output, we vary the copy number of the input by varying the production rate $\kappa$. Note that both the Gaussian approximation and the discrete approximation are independent of $\bar{s}$. Yet, we observe that the true mutual information rate is not. For sufficiently large input copy numbers the discrete approximation coincides with the true information rate while the Gaussian information rate remains only a lower bound. Thus, the discrete approximation is highly accurate for large $\bar{s}$. For small $\bar{s}$, where the input only switches between a few discrete copy-number levels, we find that the mutual information rate deviates from both the LNA and the discrete approximation. Surprisingly, we find an optimal value of $\bar{s}$ for which the mutual information rate is maximized and exceeds both approximations. This implies that at low input copy numbers, the system is able to extract additional information from the discrete input trajectories, which is not accounted for by either of the approximations.
In all cases, we found that the Gaussian approximation deviates significantly from the true information rate for this discrete system. Seemingly paradoxically, the Gaussian approximation based on the LNA does not converge to the true information rate at high copy numbers, even though the LNA approximates the stochastic dynamics extremely well in this regime. In contrast, the discrete approximation from Moor and Zechner [4] does not suffer from this issue. It has been shown that, generally, the Gaussian approximation is a lower bound on the discrete approximation [4], prompting the question of which features of the discrete trajectories are not captured by the Gaussian approximation.¹
5.3.2 Nonlinear continuous system
Next, we study a nonlinear variant of the reaction system above. In contrast to the previous case study, we deliberately avoid using discrete dynamics, as we already observed that the Gaussian approximation is generally inaccurate in such systems. Instead, we focus solely on continuous Langevin dynamics to explore how an explicitly nonlinear input-output mapping affects the accuracy of the inherently linear Gaussian approximation. We hypothesize that the accuracy of the Gaussian approximation will deteriorate as the degree of nonlinearity increases. To test this hypothesis, we analyze a simple Langevin system with adjustable nonlinearity.
The system is defined by two coupled Langevin equations, one that describes the input, and one that describes the output. The stochastic dynamics of the input $s$ are given by Equation 5.13. The output dynamics of $x$ are given by
$$ \frac{\mathrm{d}x}{\mathrm{d}t} = \rho f(s) - \mu x + \xi_x(t), \tag{5.16} $$
with the Hill function
$$ f(s) = \frac{s^n}{s^n + K^n}. \tag{5.17} $$
This function serves as a tunable nonlinearity with Hill coefficient $n$. For small $n$, the Hill function is a shallow, nearly linear mapping, while for large $n$ it becomes sigmoidal and highly nonlinear. As $n \to \infty$, $f(s)$ approaches the unit step function centered at $s = K$. The so-called static input-output relation specifies the mean output for a given input signal and is given by $\bar{x}(\bar{s}) = \rho f(\bar{s})/\mu$. The gain of this system is then defined as the slope of this relation at $\bar{s}$, i.e.,
$$ g \equiv \frac{\mathrm{d}\bar{x}}{\mathrm{d}\bar{s}} = \frac{\rho}{\mu}\left.\frac{\partial f}{\partial s}\right|_{\bar{s}}, \tag{5.18} $$
as derived in Section 5.5.1.3. Importantly, for $K = \bar{s}$, the gain reduces to $g = n\bar{x}/(2\bar{s})$ and is thus directly proportional to the Hill coefficient $n$, i.e., the gain is directly coupled to the degree of nonlinearity.
Figure 5.2 shows how, on average, the output at a given time depends on the input at that same time (solid colored curves). This is the so-called dynamical input-output relation of a system [29]. The solid black curve represents the static input-output relation. While the static input-output relation is purely determined by the instantaneous function $f(s)$, the dynamical input-output relation depends not only on this function, but also on the timescale $1/\mu$ of the response. The reason is that as the output responds more slowly to the input, the temporal input fluctuations are increasingly averaged out. Therefore, the response of the output becomes shallower for increasing $1/\mu$. Moreover, slower systems with shallower responses respond approximately linearly to the input (Figure 5.2).
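Trajectories of this kind can be generated by direct Euler-Maruyama integration of Equations 5.13 and 5.16. The sketch below is our own illustration with hypothetical parameters; for simplicity it uses constant, steady-state noise amplitudes, whereas the noise model in the chapter's simulations may differ in detail:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical parameters for Eqs. 5.13 (input) and 5.16-5.17 (output).
kappa, lam = 100.0, 1.0        # input birth and decay rates (s_bar = 100)
rho, mu = 10.0, 1.0            # output production scale and decay rate (mu = 1/tau)
n = 4.0                        # Hill coefficient (degree of nonlinearity)
K = kappa / lam                # threshold tuned to K = s_bar

def hill(s):
    return s**n / (s**n + K**n)

def simulate(T=200.0, dt=1e-3):
    """Euler-Maruyama integration of the coupled input-output Langevin equations."""
    steps = int(T / dt)
    s = np.empty(steps); x = np.empty(steps)
    s[0] = kappa / lam                      # input steady state
    x[0] = rho * 0.5 / mu                   # output steady state (f(s_bar) = 1/2)
    amp_s = np.sqrt(2.0 * kappa * dt)       # input noise, strength 2*lam*s_bar
    amp_x = np.sqrt(2.0 * mu * x[0] * dt)   # output noise, strength 2*mu*x_bar
    for i in range(steps - 1):
        s[i+1] = s[i] + (kappa - lam * s[i]) * dt + amp_s * rng.normal()
        x[i+1] = x[i] + (rho * hill(s[i]) - mu * x[i]) * dt + amp_x * rng.normal()
    return s, x

s_traj, x_traj = simulate()
```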
While we can directly compute the mutual information rate of this nonlinear model using the Langevin extension of PWS, the Gaussian approximation can only be applied to linear systems. Therefore, to obtain the mutual information rate in the Gaussian approximation, we have to linearize the system. There are two approaches for linearizing the stochastic dynamics of this nonlinear system, which result in different information estimates.
The first approach is to linearize Equation 5.16 analytically via the LNA, as shown in Section 5.5.1.2. Within this approach we can obtain an analytical expression for the information rate (see Section 5.5.1.3),
$$ R_{\mathrm{LNA}} = \frac{\lambda}{2}\left(\sqrt{1 + \frac{g^2 \mu \bar{s}}{\lambda \bar{x}}} - 1\right). \tag{5.19} $$
This LNA-based approach also yields a linearized dynamic input-output relation, shown as dashed lines in Figure 5.2.
We observe that the linearized input-output relation closely matches the slope of the true nonlinear dynamical input-output relation at , but overall it does not correspond to a (least-squares) linear fit of the nonlinear dynamical input-output relation. For all values of , the linearized input-output relation has a slope greater than or equal to the slope of the dynamical input-output relation. Empirically, the LNA thus seems to over-estimate the dynamical gain of the system. The reason may be that the LNA approximates the static input-relation (Figure 5.2 black curve), and estimates the linearized dynamical input-output relation based on this static approximation only.
The second “empirical Gaussian” approach to linearize the nonlinear system potentially avoids these issues. In this approach, we first numerically generate trajectories from the stochastic Equations 5.13 and 5.16 and use digital signal processing techniques to estimate the mutual information rate from the trajectories. We numerically estimate the (cross) power spectra of input and response using Welch’s method [32, Ch. 11]. From the estimated spectral densities we compute the coherence
$$ \hat{C}(\omega) = \frac{|\hat{P}_{sx}(\omega)|^2}{\hat{P}_{ss}(\omega)\,\hat{P}_{xx}(\omega)}, \tag{5.20} $$
which we use to obtain the Gaussian approximation of the mutual information rate directly using Equation 5.7.
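In practice this empirical pipeline amounts to a few lines of digital signal processing. The following sketch (our own illustration) estimates the spectra with scipy's Welch and cross-spectral-density routines from stationary trajectories such as those generated in the sketch above, and integrates the estimated coherence over positive frequencies, exploiting the symmetry of Equation 5.7:

```python
import numpy as np
from scipy.signal import welch, csd
from scipy.integrate import trapezoid

dt = 1e-3                      # sampling interval of the simulated trajectories
fs = 1.0 / dt                  # sampling frequency
nper = 2**14                   # Welch segment length (trades bias against variance)

# s_traj, x_traj: stationary trajectories, e.g. from the Euler-Maruyama sketch above.
f, P_ss = welch(s_traj, fs=fs, nperseg=nper)
_, P_xx = welch(x_traj, fs=fs, nperseg=nper)
_, P_sx = csd(s_traj, x_traj, fs=fs, nperseg=nper)

coherence = np.abs(P_sx) ** 2 / (P_ss * P_xx)          # Eq. 5.20
# Equation 5.7 restricted to positive frequencies: because the spectra are
# symmetric in omega, R = -int_0^inf df ln(1 - C(f)) in ordinary frequency units.
rate = -trapezoid(np.log1p(-coherence), f)
print(f"empirical Gaussian rate: {rate:.4f} nats per unit time")
```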
The empirical power spectra characterize the linear response of a system, but not in the same way as the LNA. While for linear systems the power spectra obtained via the LNA match the empirical power spectra [33], for a nonlinear system, the empirical power spectra and the coherence can differ from the corresponding LNA calculations. The two linearization approaches are thus not equivalent. We tested the accuracy of the Gaussian mutual information rate estimates using both linearization approaches to elucidate the differences between these approaches.
Figure 5.3 displays the mutual information rate obtained via the two linearized approximations as well as the exact PWS result. We vary the gain $g$ and the response timescale $1/\mu$, both of which significantly affect the shape of the dynamical input-output relation. As expected, a larger gain or a faster response leads to an increase in the mutual information rate. At large gain, the information rate naturally saturates as $f(s)$ approaches a step function. The saturation effect is clearly seen in the PWS results, and is found to be even more pronounced in the empirical Gaussian approximation. The LNA-based Gaussian approximation, however, shows no saturation. This highlights that the LNA linearizes the system at the level of the input-output mapping, which results in an approximation that is unaffected by the sigmoidal shape of $f(s)$. In contrast, the empirical approximation is affected by nonlinear saturation effects because it is computed directly from simulated trajectories. We thus see that both approximations yield substantially different results at large gain.
In Figure 5.4 we compare the absolute deviation between the approximations and the PWS result. For small gain we see that both approximations are accurate, which is not surprising since the nonlinearity is very weak in this regime. Strikingly, for large gain, the LNA-based approximation always overestimates the mutual information rate while the empirical Gaussian approximation always underestimates it. In both cases the systematic error decreases as the response timescale becomes slower. This reflects the fact that for slow responders, the dynamic input-output relation is more linear (Figure 5.2) than for fast responders.
Additionally, we computed the relative deviation, see Figure 5.5 in Section 5.5.2. We find that in terms of relative error the curves for different response timescales largely overlap. In terms of relative approximation error, the gain, rather than response timescale, is the primary factor affecting the accuracy of the Gaussian approximation.
5.4 Discussion
We investigated the accuracy of the Gaussian approximation for the mutual information rate in two case studies, each highlighting a scenario where the approximation may be inaccurate. We were able to reliably quantify the inaccuracy in each case by computing the “ground truth” mutual information rate for these scenarios using a recently developed exact Monte Carlo technique called PWS [5].
We first considered linear discrete systems, which are relevant in biology due to the discrete nature of biochemical signaling networks. In our example, the Gaussian approximation cannot capture the full information rate, but only yields a lower bound. We show that a discrete approximation, developed by Moor and Zechner [4], is able to correctly estimate the mutual information rate of the network over a wide range of parameters. Since the Gaussian approximation captures the second moments of the discrete system, this finding demonstrates that a discrete system can transmit significantly more information than what would be inferred from its second moments alone. This perhaps surprising fact has been observed before [4,5,34] and it hinges on the use of a discrete reaction-based readout. As demonstrated in unpublished work [28], the increased mutual information rate found for a discrete readout stems from the ability to unambiguously distinguish individual reaction events in the readout’s trajectory. However, it remains an open question whether biological (or other) signaling systems can effectively harness this additional information encoded in the discrete trajectories. For systems that cannot distinguish individual reaction events in downstream processing, the Gaussian framework might still accurately quantify the “accessible information”.
A notable new observation in our first case study is the deviation between the discrete approximation of the mutual information rate derived by Moor and Zechner [4] and the exact result obtained using PWS [5] for inputs with low copy number $\bar{s}$. In the discrete approximation, the mutual information rate is independent of the input copy number $\bar{s}$, but the PWS simulations show that at low copy numbers there is an optimal $\bar{s}$ which maximizes the mutual information rate. This surprising finding suggests that the information rate in discrete systems can be increased by reducing the copy number of the input sufficiently, such that it only switches between a few discrete input levels. Notably, in the reverse case—low output copy number $\bar{x}$ but large $\bar{s}$—the discrete approximation always remains accurate. We leave a precise characterization of this finding for future work.
The second example focused on a continuous but nonlinear system, where we demonstrated that the accuracy of the Gaussian approximation depends on the linearization method. Linearizing the underlying system dynamics directly via the LNA leads to an overestimation of the information rate, while estimating the system’s correlation functions empirically from data underestimates it. Regardless of the method, the Gaussian approximation is more accurate, in terms of absolute deviation from the true information rate, when the gain of the system is small and its response is slow compared to the timescale of the input fluctuations.
The result of our second case study—that the empirical Gaussian mutual information rate underestimates the true rate—is consistent with theoretical expectations. As shown by Mitra and Stark [23], and highlighted in Ref. [2], an empirical Gaussian estimate of the mutual information between a Gaussian input signal $\mathcal{S}$ and a non-Gaussian output $\mathcal{X}$ provides a lower bound on the channel capacity (subject to a power constraint on $\mathcal{S}$). Specifically, they show that $I(\mathcal{S}; \mathcal{X}) \geq I(\mathcal{S}_G; \mathcal{X}_G)$, where $(\mathcal{S}_G, \mathcal{X}_G)$ is a jointly Gaussian pair with the same covariance matrix as $(\mathcal{S}, \mathcal{X})$. For purely Gaussian systems like $(\mathcal{S}_G, \mathcal{X}_G)$, the mutual information calculated using Equation 5.7 is exact and equal to the channel capacity. However, for systems that have a Gaussian input but are otherwise non-Gaussian, the mutual information is greater than or equal to that of the corresponding Gaussian model with matching second moments, as evidenced in Figure 5.4. In general, the empirical Gaussian approximation yields a lower bound on the mutual information of the nonlinear system with a Gaussian input signal, as well as a lower bound on the channel capacity of the nonlinear system.²
We can distill several concrete recommendations for the computation of the information rate from our analysis. For linear discrete systems, the Gaussian approximation yields a lower bound on the true information rate which may accurately quantify the information available to systems that cannot distinguish individual discrete events. Alternatively, the reaction-based discrete approximation by Moor and Zechner [4] is highly accurate, even when the copy number of the output is extremely small. However, when the copy number of the input becomes small, both approximations break down and one must use an exact method. Exact methods for obtaining the information rate of arbitrary stochastic reaction-based systems are PWS [5] or brute-force numerical integration of the stochastic filtering equation, as shown in [4]. For nonlinear continuous systems with small gain one can safely use the Gaussian approximation, either based on a linearization of the underlying dynamics or on empirically estimated correlation functions. Moreover, when the slowest input timescale is more than an order of magnitude faster than the response timescale, the nonlinear response of the system is “averaged out” by the quick input fluctuations, and the Gaussian approximation yields accurate results. In this case, using a Gaussian approximation based on empirical correlation functions yields the most accurate result, and provides a rigorous lower bound for the mutual information. Finally, if the system is both highly nonlinear and has a fast response with respect to the input, one must resort to an exact method like PWS. We hope that our results will guide future research in determining the appropriate method for computing the mutual information rate.
Overall, our results clarify the domain of applicability of the Gaussian approximation to the information rate of non-Gaussian systems, and thereby increase its practical usefulness. The Gaussian approximation remains a useful method that can be applied directly and straightforwardly to experimental data. Here, we have quantified the prerequisites for safely using this approach. Moreover, we elucidate how an empirical Gaussian approximation constitutes a lower bound on the true information rate for systems with a sufficiently large input copy number.
5.5 Supplementary Information
5.5.1 Gaussian approximation
Here we derive the analytical expressions for the Gaussian information rate of the networks considered in the main text. To this end we first discuss the dynamics of the input signal and its power spectrum. Then, we perform a linear approximation of the dynamics of the readout species $X$ and derive the approximate Gaussian information rate between $S$ and $X$ for the nonlinear network. Finally, we derive the Gaussian information rate of the linear network from our expression of the Gaussian information rate of the nonlinear network.
5.5.1.1 Signal
The input signal is generated by a birth-death process,
$$ \emptyset \xrightarrow{\kappa} S, \qquad S \xrightarrow{\lambda} \emptyset. \tag{5.21} $$
Its dynamics in Langevin form are
$$ \frac{\mathrm{d}s}{\mathrm{d}t} = \kappa - \lambda s + \xi_s(t), \tag{5.22} $$
yielding the steady state signal concentration $\bar{s} = \kappa/\lambda$. The independent Gaussian white noise process $\xi_s(t)$ summarizes all reactions that contribute to fluctuations in $s$. The strength of the noise term in steady state is
$$ \langle \xi_s(t)\,\xi_s(t') \rangle = (\kappa + \lambda\bar{s})\,\delta(t - t') = 2\lambda\bar{s}\,\delta(t - t'). \tag{5.23} $$
The power spectral density, or power spectrum, of a stationary process is defined as $P_{ss}(\omega) = \langle |\hat{s}(\omega)|^2 \rangle$, where $\hat{s}(\omega)$ denotes the Fourier transform of $s(t)$. The power spectrum of a signal obeying Equation 5.22 is thus given by
$$ P_{ss}(\omega) = \frac{2\lambda\bar{s}}{\lambda^2 + \omega^2}. \tag{5.24} $$
5.5.1.2 Linear approximation
We now consider the readout $X$, which is produced via a nonlinear activation function $f(s)$:
$$ \emptyset \xrightarrow{\rho f(s)} X, \qquad X \xrightarrow{\mu} \emptyset. \tag{5.25} $$
We define the activation level to be a Hill function,
$$ f(s) = \frac{s^n}{s^n + K^n}. \tag{5.26} $$
Such a dependency, in which $K$ sets the concentration of $s$ at which the activation is half-maximal and $n$ sets the steepness, can for example arise from cooperativity between the signal molecules in activating the synthesis of $X$.
We have for the dynamics of $x$ in Langevin form
$$ \frac{\mathrm{d}x}{\mathrm{d}t} = \rho f(s) - \mu x + \xi_x(t), \tag{5.27} $$
with $f(s)$ given by Equation 5.26. The steady state concentration of $x$ is given by $\bar{x} = \rho\bar{f}/\mu$, where we have defined the steady state activation level $\bar{f} \equiv f(\bar{s})$. It is useful to determine the static gain of the network, which is defined as the change in the steady state of the output upon a change in the steady state of the signal:
$$ g \equiv \frac{\mathrm{d}\bar{x}}{\mathrm{d}\bar{s}} = \frac{\rho}{\mu}\left.\frac{\partial f}{\partial s}\right|_{\bar{s}} = \frac{\rho_1}{\mu}, \tag{5.28} $$
where we have defined the approximate linear activation rate
$$ \rho_1 \equiv \rho \left.\frac{\partial f}{\partial s}\right|_{\bar{s}} = \rho\,\frac{n\bar{f}\,(1 - \bar{f})}{\bar{s}}, \tag{5.29} $$
and the steady state of the activation level is given by
$$ \bar{f} = \frac{\bar{s}^n}{\bar{s}^n + K^n}. \tag{5.30} $$
Generally, we assume that $K = \bar{s}$, which entails that in steady state the network is tuned to $\bar{f} = 1/2$.
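As a quick symbolic consistency check (our own, using sympy), one can verify that for $K = \bar{s}$ the static gain of Equation 5.28 reduces to $g = n\bar{x}/(2\bar{s})$, the form quoted in the main text:

```python
import sympy as sp

s, K, n, rho, mu = sp.symbols("s K n rho mu", positive=True)

f = s**n / (s**n + K**n)          # Hill activation, Eq. 5.26
xbar = rho * f / mu               # steady-state output
gain = sp.diff(xbar, s)           # static gain g = d(xbar)/d(sbar), Eq. 5.28

# At the tuned operating point K = sbar the output is xbar = rho/(2*mu),
# and the gain should reduce to n*xbar/(2*sbar):
gain_at_K = gain.subs(K, s)
expected = n * (rho / (2 * mu)) / (2 * s)
print(sp.simplify(gain_at_K - expected))   # -> 0
```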
To compute the Gaussian information rate we approximate the dynamics of $x$ to first order around $\bar{x}$ via the classical linear noise approximation [26]. Within this approximation the dynamics of the deviations $\delta s = s - \bar{s}$ and $\delta x = x - \bar{x}$ are
$$ \frac{\mathrm{d}\,\delta x}{\mathrm{d}t} = \rho_1\,\delta s - \mu\,\delta x + \xi_x(t), \tag{5.31} $$
with the synthesis rate $\rho_1$ given by Equation 5.29.
In the linear noise approximation the noise strength is a constant given by the noise strength at steady state,
$$ \langle \xi_x(t)\,\xi_x(t') \rangle = (\rho\bar{f} + \mu\bar{x})\,\delta(t - t') = 2\mu\bar{x}\,\delta(t - t'). \tag{5.32} $$
5.5.1.3 Information rate
Following Tostevin and ten Wolde [1,2], we can express the Gaussian information rate as follows,
$$ R(\mathcal{S}; \mathcal{X}) = \frac{1}{4\pi} \int_{-\infty}^{\infty} \mathrm{d}\omega \, \ln\!\left[1 + \frac{g(\omega)^2\,P_{ss}(\omega)}{N(\omega)}\right], \tag{5.33} $$
where $g(\omega)$ is the frequency dependent gain and $N(\omega)$ is the frequency dependent noise of the output process $\mathcal{X}$. If the intrinsic noise of the network is not correlated with the process that drives the signal, the power spectrum of the network output obeys the spectral addition rule [35]. In this case the frequency dependent gain and noise can be identified directly from the power spectrum of the output, because it takes the following form:
$$ P_{xx}(\omega) = g(\omega)^2\,P_{ss}(\omega) + N(\omega). \tag{5.34} $$
For a species obeying Equation 5.31, we have
$$ g(\omega)^2 = \frac{\rho_1^2}{\mu^2 + \omega^2}, \qquad N(\omega) = \frac{2\mu\bar{x}}{\mu^2 + \omega^2}. \tag{5.35} $$
The Wiener-Khinchin theorem states that the power spectrum of a stochastic process and its auto-correlation function are a Fourier transform pair. We thus obtain the variance of the readout by substituting the frequency dependent gain and noise [Equation 5.35] and the power spectrum of the signal [Equation 5.24] in Equation 5.34 and taking the inverse Fourier transform at lag $\tau = 0$,
$$ \sigma_x^2 = g\,\tilde{g}\,\sigma_s^2 + \bar{x}, \tag{5.36} $$
where the signal variance equals its mean, $\sigma_s^2 = \bar{s}$, and the mean readout concentration $\bar{x}$ sets the intrinsic noise. We further have the static gain $g$ given by Equation 5.28, and have defined the dynamical gain
$$ \tilde{g} \equiv g\,\frac{\mu}{\lambda + \mu}, \tag{5.37} $$
which is the slope of the mapping from the time-varying signal value $s_t$ to the time-varying readout $x_t$; $\tilde{g} = \mathrm{cov}(s_t, x_t)/\sigma_s^2$ for Gaussian systems [2,19,29].
To solve the integral in Equation 5.33 we exploit that
$$ \int_{-\infty}^{\infty} \mathrm{d}\omega\, \ln\frac{\omega^2 + a^2}{\omega^2 + b^2} = 2\pi\,(a - b). \tag{5.38} $$
Substituting the frequency dependent gain and noise given in Equation 5.35 and the signal power spectrum of Equation 5.24 in Equation 5.33 and using Equation 5.38 we obtain the information rate,
$$ R = \frac{\lambda}{2}\left(\sqrt{1 + \frac{g^2 \mu \bar{s}}{\lambda \bar{x}}} - 1\right), \tag{5.39} $$
where we used the noise strengths given in Equation 5.23 and Equation 5.32, the static gain of Equation 5.28, and the synthesis rate of Equation 5.29.
5.5.1.4 Linear network
To disambiguate differences in the information rate caused by the linear approximation of our nonlinear reaction network on one hand and the Gaussian approximation of the underlying jump process on the other, we consider the information rate of a linear network. Any difference between the exact information rate and the Gaussian information rate must then be a result of the Gaussian approximation. To this end we use the same input signal [Equation 5.22], but we consider a linear activation of the readout, i.e.,
$$ f(s) = s, \tag{5.40} $$
such that the Langevin dynamics of $x$ are
$$ \frac{\mathrm{d}x}{\mathrm{d}t} = \rho s - \mu x + \xi_x(t), \tag{5.41} $$
which yields the steady state concentration $\bar{x} = \rho\bar{s}/\mu$. For this linear readout, the static gain is simply set by the ratio of the steady states of the input and the output, $g = \bar{x}/\bar{s} = \rho/\mu$. We can then obtain the information rate of this linear system by substituting its static gain in Equation 5.39, which yields
$$ R = \frac{\lambda}{2}\left(\sqrt{1 + \frac{\rho}{\lambda}} - 1\right). \tag{5.42} $$
5.5.2 Relative deviation of the Gaussian approximation for a nonlinear system
Due to the definition of the mutual information, an absolute difference in information maps to a relative difference in the reduction of uncertainty. For this reason, Figure 5.4 in the main text (Section 5.3.2) focuses on the absolute deviation between the Gaussian approximation and the true mutual information. We compared two variants of the Gaussian approximation, the LNA-based approximation and the empirical Gaussian approximation. We found that in both cases the absolute deviation decreases with slower response timescales, reflecting the more linear input-output relationship in slow-responding systems.
However, the relative deviation of the Gaussian information rate from the true rate also offers valuable insights, which we explore here. In Figure 5.5 we compare the relative deviation between the Gaussian approximation and the exact mutual information computed using PWS. We find that the relative deviation increases as the system gain increases, indicating that the Gaussian approximation also becomes relatively less accurate for larger gains. As already discussed above, the empirical Gaussian method consistently underestimates the true information rate, while the LNA-based approximation overestimates it.
Interestingly, we also observe that for the LNA approximation, at fast timescales the result is slightly more accurate, whereas the empirical Gaussian estimate is more accurate at slow timescales. We initially expected that in both cases slow timescales would yield better agreement with PWS, as the input-output dynamics are more linear for slow timescales, and thus better approximated by the Gaussian model. The fact that this is not the case for the LNA approximation is intriguing, indicating the need for further investigation into the interplay between timescales, system nonlinearity, and the LNA.
¹ In still unpublished work together with Anne-Lena Moor and Christoph Zechner [28], we found that the root cause for the deviations of the Gaussian approximation lies in how the LNA approximates the reaction noise in the chemical master equation. While the dynamics of the chemical master equation give rise to discrete sample paths, i.e., piece-wise constant trajectories connected by instantaneous discontinuous jumps, the LNA approximation yields continuous stochastic trajectories. Our results imply that a discrete sample path of $X$ carries more information about $S$ than the corresponding continuous sample path of $x$ would carry about $s$ in the LNA. In the collaborative effort [28] we found that this is ultimately due to the fact that in the discrete system, each reaction event is unambiguously recorded in the trajectories, and thus different reactions modifying the same species can be distinguished. In contrast, in the continuous LNA description, all reactions that modify $X$ contribute to the noise term $\xi_x(t)$ in Equation 5.13, but their contributions are lumped together and therefore cannot be distinguished from an observed $x$-trajectory. Specifically, note that for the motif studied here, only the production reaction conveys information. The decay reaction of the output does not carry information on the input fluctuations, since its propensity is independent of the input. Yet, it contributes to the overall fluctuations in the output. The Gaussian approximation only considers the total fluctuations in the output, while the discrete approximation correctly distinguishes between the fluctuations induced by production events and decay events. Therefore, the Gaussian approximation consistently underestimates the true information transmission, whereas the discrete approximation does not incur this systematic error. This subtle point is reflected in the difference between Equations 5.14 and 5.15.

² Note that this argument does not apply to the linear noise approximation (LNA). The bound specifically requires the Gaussian model to use the covariance of the full, original system. When the system is first linearized using the LNA, the resulting linear model does not retain the same covariance as the original nonlinear system. As a result, the mutual information rate calculated with the LNA is generally not a lower (nor an upper) bound on the true mutual information rate.

References