2 Path Weight Sampling
The contents of this chapter have been published in Phys. Rev. X 13, 041017 (2023) [1].
Most natural and engineered information-processing systems transmit information via signals that vary in time. Computing the information transmission rate or the information encoded in the temporal characteristics of these signals requires the mutual information between the input and output signals as a function of time, i.e., between the input and output trajectories. Yet, this is notoriously difficult because of the high-dimensional nature of the trajectory space, and all existing techniques require approximations. We present an exact Monte Carlo technique called Path Weight Sampling (PWS) that, for the first time, makes it possible to compute the mutual information between input and output trajectories for any stochastic system that is described by a master equation. The principal idea is to use the master equation to evaluate the exact conditional probability of an individual output trajectory for a given input trajectory and average this via Monte Carlo sampling in trajectory space to obtain the mutual information. PWS also makes it possible to compute the mutual information between input and output trajectories for systems with hidden internal states as well as systems with feedback from output to input.
2.1 Introduction
Quantifying information transmission is vital for understanding and designing natural and engineered information-processing systems, ranging from biochemical and neural networks, to electronic circuits and optical systems [2–4]. Claude Shannon introduced the mutual information and the information rate as the central measures of Information Theory more than 70 years ago [5]. These measures quantify the fidelity by which a noisy system transmits information from its inputs to its outputs. Yet, computing these quantities exactly remains notoriously difficult, if not impossible. This is because the inputs and outputs are often not scalar values, but rather temporal trajectories.
Most, if not all, information-processing systems transmit signals that vary in time. The canonical measure for quantifying information transmission via time-varying signals is the mutual information rate [5–8]. It quantifies the speed at which distinct messages are transmitted through the system, and it depends not only on the accuracy of the input-output mapping but also on the correlations within the input and output signals. Computing the mutual information rate thus requires computing the mutual information between the input and output trajectories, not between their signal values at given time points. The rate at which this trajectory mutual information increases with the trajectory duration in the long-time limit defines the mutual information rate, see Equation 1.3. In the absence of feedback this rate also equals the multi-step transfer entropy [9,10].
More generally, useful information is often contained in the temporal dynamics of the signal. A prime example is bacterial chemotaxis, where the response does not depend on the current ligand concentration, but rather on whether it has changed in the recent past [11,12]. Moreover, the information from the input may be encoded in the temporal dynamics of the output [13–16]. Quantifying information encoded in these temporal features of the signals requires the mutual information not between two time points, i.e. the instantaneous mutual information, but rather between input and output trajectories [7].
Unfortunately, computing the mutual information between trajectories is exceptionally difficult. The conventional approach requires non-parametric distribution estimates of the input and output distributions, e.g. via histograms of data obtained through simulations or experiments [17–22]. These non-parametric distribution estimates are necessary because the mutual information cannot generally be computed from summary statistics like the mean or variance of the data alone. However, the high-dimensional nature of trajectories makes it infeasible to obtain enough empirical data to accurately estimate the required probability distributions. Moreover, this approach requires the discretization of time, which becomes problematic when the information is encoded in the precise timing of signal spikes, as, e.g., in neuronal systems [23]. Except for the simplest systems with a binary state space [22], the conventional approach to estimate the mutual information via histograms therefore cannot be transposed to trajectories.
Because there are currently no general schemes available to compute the mutual information between trajectories exactly, approximate methods or simplified models are typically used. While empirical distribution estimates can be avoided by employing the K-nearest-neighbors entropy estimator [24,25], this method depends on a choice of metric in trajectory space and can become unreliable for long trajectories [26]. Alternatively, decoding-based information estimates can be developed for trajectories [27], but these merely provide a lower bound on the mutual information, and it remains unclear how tight these lower bounds are [26,28,29]. Analytical results are available for simple systems [30], and for linear systems that obey Gaussian statistics, the mutual information between trajectories can be obtained from the covariance matrix [7]. However, many information processing systems are complex and non-linear, such that the Gaussian approximation does not hold and analytical solutions do not exist. A more promising approach to estimate the trajectory mutual information for chemical reaction networks has been developed by Duso and Zechner [31] and generalized in Ref. [32]. However, the scheme relies on a moment closure approximation and has so far only been applied to very simple networks, seemingly being difficult to extend to complex systems.
Here, we present Path Weight Sampling (PWS), an exact technique to compute the trajectory mutual information for any system described by a master equation. Master equations are widely used to model chemical reaction networks [33–36], biological population growth [37–39], economic processes [40,41], and a large variety of other systems [42,43], making our scheme of interest to a broad class of problems.
PWS is an exact Monte Carlo scheme, in the sense that it provides an unbiased statistical estimate of the trajectory mutual information. In PWS, the mutual information is computed as the difference between the marginal output entropy associated with the marginal distribution of the output trajectories, $P[\boldsymbol{x}]$, and the conditional output entropy associated with the output distribution conditioned on the input trajectory, $P[\boldsymbol{x}|\boldsymbol{s}]$. Our scheme is inspired by the observation of Cepeda-Humerez et al. [26] that the path likelihood, i.e., the probability $P[\boldsymbol{x}|s]$, can be computed exactly from the master equation for a static input signal $s$. This makes it possible to compute the mutual information between a discrete input and a time-varying output via a Monte Carlo averaging procedure over the likelihoods, rather than from an empirical estimate of the intractable high-dimensional probability distribution functions. The scheme of Cepeda-Humerez et al. [26] is, however, limited to discrete input signals that do not vary in time. Here we show that the path likelihood can also be computed for a dynamical input trajectory $\boldsymbol{s}$, which allows us to compute the conditional output entropy also for time-varying inputs. While this solves the problem in part, the marginal output entropy associated with $P[\boldsymbol{x}]$ cannot be computed with the approach of Cepeda-Humerez et al. [26], and thus requires a different scheme.
In Section 2.2 we show how, for time-varying input signals, the marginal probability $P[\boldsymbol{x}]$ can be obtained as a Monte Carlo average of $P[\boldsymbol{x}|\boldsymbol{s}]$ over a large number of input trajectories. We then use the Monte Carlo estimate for $P[\boldsymbol{x}]$ to compute the marginal output entropy.
In Section 2.3 we show that, surprisingly, our PWS methods additionally make it possible to compute the mutual information between input and output trajectories of systems with hidden internal states. Hidden states correspond, for example, to network components that merely relay, process, or transform the signal from the input to the output. Indeed, the downstream system typically responds to the information that is encoded in this output, and not to that encoded in the other internal system components. Most information processing systems contain such hidden states, and generally we want to integrate out these latent network components. In addition, we can generalize PWS to systems with feedback from the output to the input, as shown in Section 2.4.
2.2 Monte Carlo Estimate of the Mutual Information
In this section we present the fundamental ideas of PWS. These ideas lie at the heart of Direct PWS (DPWS) and also form the foundation of the other two more advanced PWS variants which will be explained in Chapter 3.
2.2.1 Statement of the Problem
All information processing systems repeatedly take an input value $s$ and produce a corresponding output $x$. Due to noise, the output produced for the same input can be different every time, such that the system samples outputs from the distribution $P(x|s)$. In the information-theoretic sense, the device's capabilities are fully specified by its output distributions $P(x|s)$ for all possible inputs. We consider the inputs as being distributed according to a probability density $P(s)$, such that the whole setup of signal and device is completely described by the joint probability density $P(s, x) = P(x|s)\, P(s)$.
When the conditional output distributions overlap with each other, information is lost because the input cannot always be inferred uniquely from the output (see Figure 2.1). The remaining information that the output carries about the signal on average is quantified by the mutual information between input and output.
Mathematically, the mutual information between a random variable $S$, representing the input, and a second random variable $X$, representing the output, is defined as
$$I(S;X) = \int ds \int dx\; P(s, x) \ln \frac{P(s, x)}{P(s)\, P(x)}, \tag{2.1}$$
where the marginal output distribution is given by $P(x) = \int ds\; P(x|s)\, P(s)$. The quantity $I(S;X)$ as defined above is a non-negative real number, representing the mutual information between $S$ and $X$ in nats. The integrals in Equation 2.1 run over all possible realizations of the random variables $S$ and $X$. In our case, $S$ and $X$ represent stochastic trajectories and so the integrals become path integrals.
In general, the mutual information can be decomposed into two terms, a conditional and a marginal entropy. Due to the symmetry of Equation 2.1 with respect to the exchange of $S$ and $X$, this decomposition can be written as
$$I(S;X) = H(S) - H(S|X) = H(X) - H(X|S).$$
The (marginal) input entropy $H(S)$ represents the total uncertainty about the input, and the conditional input entropy $H(S|X)$ describes the remaining uncertainty of the input after having observed the output. Thus, the mutual information naturally quantifies the reduction in uncertainty about the input through the observation of the output.
When analyzing data from experiments or simulations, however, the mutual information is generally estimated via $I(S;X) = H(X) - H(X|S)$. This is because simulation or experimental data generally provide information about the distribution of outputs for a given input, rather than vice versa. The accessible entropies are thus the marginal output entropy $H(X)$ and the conditional output entropy $H(X|S)$, which are defined as
$$H(X) = -\int dx\; P(x) \ln P(x), \qquad H(X|S) = -\int ds \int dx\; P(s, x) \ln P(x|s).$$
The conventional way of computing the mutual information involves generating many samples to obtain empirical distribution estimates for $P(x)$ and $P(x|s)$ via histograms. However, the number of samples needs to be substantially larger than the number of histogram bins to reduce the noise in the bin counts. Obtaining enough samples is effectively impossible for high-dimensional data, like signal trajectories. Moreover, any nonzero bin size leads to a systematic bias in the entropy estimates, even in one dimension [18]. These limitations of the conventional method make it impractical for high-dimensional data, highlighting the need for alternative approaches to accurately compute mutual information for trajectories.
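To make the sampling and binning problems concrete, the following minimal sketch (our own illustration, not part of PWS; the Gaussian example and all parameter values are illustrative choices) estimates the mutual information of a correlated Gaussian pair with a plug-in histogram estimator and compares it against the exact value $-\tfrac{1}{2}\ln(1-\rho^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def histogram_mi(s, x, bins):
    """Plug-in estimate of I(S;X) in nats from samples, via a 2D histogram."""
    counts, _, _ = np.histogram2d(s, x, bins=bins)
    p_sx = counts / counts.sum()
    p_s = p_sx.sum(axis=1, keepdims=True)   # marginal of the input
    p_x = p_sx.sum(axis=0, keepdims=True)   # marginal of the output
    mask = p_sx > 0
    return float(np.sum(p_sx[mask] * np.log(p_sx[mask] / (p_s * p_x)[mask])))

# Correlated Gaussian pair with known mutual information -0.5*ln(1 - rho^2).
rho = 0.8
n = 100_000
s = rng.standard_normal(n)
x = rho * s + np.sqrt(1 - rho**2) * rng.standard_normal(n)
exact = -0.5 * np.log(1 - rho**2)

coarse = histogram_mi(s, x, bins=16)    # close to the true value
fine = histogram_mi(s, x, bins=1000)    # strongly biased upward
```

Already for this two-dimensional toy problem, refining the bins with a fixed sample budget inflates the estimate; for trajectories, where each time point adds a dimension, the required sample size grows beyond reach.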
2.2.2 Direct PWS
The central idea of PWS is to compute probability densities for trajectories exactly, sidestepping the problem of having to estimate them via histograms. We exploit the fact that for systems described by a master equation, the conditional probability $P[\boldsymbol{x}|\boldsymbol{s}]$ of an output trajectory $\boldsymbol{x}$ for a given input trajectory $\boldsymbol{s}$ can be computed analytically. With this insight we can derive a procedure to compute the mutual information. Specifically, we will show that
for a system described by a master equation, the trajectory likelihood $P[\boldsymbol{x}|\boldsymbol{s}]$ is a quantity that can be computed on the fly in a stochastic simulation;
input trajectories can be generated from $P[\boldsymbol{s}]$, and output trajectories for a given input can be generated according to $P[\boldsymbol{x}|\boldsymbol{s}]$ using standard SSA (Gillespie) simulations;
by combining the two ideas above, we can derive a direct Monte Carlo estimate for the mutual information $I(S;X)$, as illustrated in Figure 2.2.
Note that we denote trajectories by bold symbols to distinguish them from scalar quantities.
Our technique is conceptually straightforward. Using Monte Carlo simulations we can compute averages over the configuration space of trajectories. Suppose we have a function $f$ that takes a trajectory $\boldsymbol{x}$ and produces a scalar value. The mean of $f$ with respect to the trajectory distribution $P[\boldsymbol{x}]$ is then
$$\langle f \rangle = \int \mathcal{D}[\boldsymbol{x}]\; P[\boldsymbol{x}]\, f(\boldsymbol{x}).$$
We write $\int \mathcal{D}[\boldsymbol{x}]$ to denote a path integral over all possible trajectories of a given duration. We estimate $\langle f \rangle$ by generating a large number $N$ of trajectories $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N$ from $P[\boldsymbol{x}]$ and evaluating the corresponding Monte Carlo average
$$\langle f \rangle \approx \frac{1}{N} \sum_{i=1}^{N} f(\boldsymbol{x}_i),$$
which converges to the true mean in the limit $N \to \infty$.
Specifically, we want to estimate the conditional and the marginal entropy to compute the mutual information. Let us imagine that we generate $N_s$ input trajectories from the distribution $P[\boldsymbol{s}]$. Next, for every input $\boldsymbol{s}_i$, we generate a set of $N_x$ outputs $\boldsymbol{x}_{i1}, \ldots, \boldsymbol{x}_{iN_x}$ from $P[\boldsymbol{x}|\boldsymbol{s}_i]$. Then, the Monte Carlo estimate for the conditional entropy is
$$H(X|S) \approx -\frac{1}{N_s N_x} \sum_{i=1}^{N_s} \sum_{j=1}^{N_x} \ln P[\boldsymbol{x}_{ij}|\boldsymbol{s}_i].$$
Secondly, for a given output $\boldsymbol{x}$ we generate $N_s$ inputs according to $P[\boldsymbol{s}]$; we can then obtain a Monte Carlo estimate for the marginal probability of the output trajectory $\boldsymbol{x}$:
$$P[\boldsymbol{x}] = \int \mathcal{D}[\boldsymbol{s}]\; P[\boldsymbol{x}|\boldsymbol{s}]\, P[\boldsymbol{s}] \approx \frac{1}{N_s} \sum_{j=1}^{N_s} P[\boldsymbol{x}|\boldsymbol{s}_j]. \tag{2.7}$$
The estimate for the marginal entropy is then given by
$$H(X) \approx -\frac{1}{N_x} \sum_{i=1}^{N_x} \ln \left[ \frac{1}{N_s} \sum_{j=1}^{N_s} P[\boldsymbol{x}_i|\boldsymbol{s}_j] \right]. \tag{2.8}$$
In the last step we inserted the result from Equation 2.7. In this estimate, the trajectories $\boldsymbol{x}_i$ are sampled from $P[\boldsymbol{x}]$, i.e., by first sampling $\boldsymbol{s}$ from $P[\boldsymbol{s}]$ and then $\boldsymbol{x}$ from $P[\boldsymbol{x}|\boldsymbol{s}]$. Finally, the mutual information is obtained by taking the entropy difference, i.e., $I(S;X) = H(X) - H(X|S)$.
While this is the main idea behind PWS, it is computationally advantageous to change the order of operations in the estimate. Specifically, computing the difference of two separately estimated averages leads to large statistical errors. We can obtain an improved estimate by reformulating the mutual information as a single average of differences:
$$I(S;X) = \left\langle \ln \frac{P[\boldsymbol{x}|\boldsymbol{s}]}{P[\boldsymbol{x}]} \right\rangle_{P[\boldsymbol{s},\boldsymbol{x}]} \approx \frac{1}{N} \sum_{i=1}^{N} \left[ \ln P[\boldsymbol{x}_i|\boldsymbol{s}_i] - \ln P[\boldsymbol{x}_i] \right]. \tag{2.9}$$
This equation applies to all variants of PWS. They differ, however, in the way $P[\boldsymbol{x}]$ is computed. In the brute-force version of PWS, called Direct PWS (DPWS), we use Equation 2.7 to evaluate the marginal probability $P[\boldsymbol{x}_i]$. DPWS indeed involves two nested Monte Carlo computations, in which $N$ pairs $(\boldsymbol{s}_i, \boldsymbol{x}_i)$ are generated, and for each output $\boldsymbol{x}_i$, $N_s$ input trajectories are generated from scratch to compute $P[\boldsymbol{x}_i]$. In Chapter 3, we present two additional variants of PWS where the brute-force estimate of the marginal probability is replaced by more elaborate schemes. That said, DPWS is a conceptually simple, straightforward-to-implement, and exact scheme to compute the mutual information.
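The nested structure of the DPWS estimator can be sketched on a deliberately simple toy model of our own construction (a binary input read out through T conditionally independent noisy observations, standing in for an output trajectory; all probabilities are illustrative choices). The closed-form likelihood below plays the role of Equation 2.14; a real implementation would use the Gillespie machinery of Section 2.2.3:

```python
import numpy as np

rng = np.random.default_rng(1)

P_S = np.array([0.5, 0.5])        # input distribution P(s)
P_ON = np.array([0.2, 0.8])       # P(x_t = 1 | s), hypothetical values
T = 10                            # observations per "trajectory"

def log_likelihood(x, s):
    """Exact log P[x|s] for T independent Bernoulli readouts."""
    k = x.sum()
    return k * np.log(P_ON[s]) + (T - k) * np.log(1.0 - P_ON[s])

def dpws_estimate(n_outer=2000, n_inner=500):
    acc = 0.0
    for _ in range(n_outer):
        s = rng.choice(2, p=P_S)              # s ~ P[s]
        x = rng.random(T) < P_ON[s]           # x ~ P[x|s]
        ll = np.array([log_likelihood(x, 0), log_likelihood(x, 1)])
        # inner Monte Carlo marginalization (Equation 2.7):
        # P[x] is estimated as a mean over fresh inputs s' ~ P[s]
        s_prime = rng.choice(2, size=n_inner, p=P_S)
        log_marginal = np.logaddexp.reduce(ll[s_prime]) - np.log(n_inner)
        acc += ll[s] - log_marginal           # summand of Equation 2.9
    return acc / n_outer
```

Because the model is so small, the estimate can be checked against the mutual information obtained by exhaustive enumeration; the log-sum-exp reduction mirrors the log-space evaluation advocated in Section 2.2.3.1.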
Having explained the core ideas of our technique above, we will continue this section with a review of the concepts of master equations necessary to implement PWS. First, in Section 2.2.3, we derive the formula for the conditional probability $P[\boldsymbol{x}|\boldsymbol{s}]$, which lies at the heart of our technique. In Sections 2.2.3 and 2.2.4, we discuss how trajectories are generated according to $P[\boldsymbol{x}|\boldsymbol{s}]$ and $P[\boldsymbol{s}]$, which are the remaining ingredients required for using DPWS. Then, in Chapter 3, we will present the two other variants of PWS that improve on DPWS.
2.2.3 Driven Markov Jump Process
In this chapter, we consider systems that can be modeled by a master equation and are driven by a stochastic signal. The master equation specifies the time evolution of the conditional probability distribution $P(x, t|x_0, t_0)$, which is the probability for the process to reach the discrete state $x$ at time $t$, given that it was at state $x_0$ at the previous time $t_0$. The state space is multi-dimensional if the system is made up of multiple components, and therefore $x$ and $x_0$ can be vectors rather than scalar values. Denoting the transition rate at time $t$ from state $x$ to another state $x'$ by $w_t(x, x')$, the master equation reads
$$\frac{\partial P(x, t)}{\partial t} = \sum_{x' \neq x} \left[ w_t(x', x)\, P(x', t) - w_t(x, x')\, P(x, t) \right], \tag{2.10}$$
where, for brevity, we suppress the dependence on the initial condition, i.e., $P(x, t) \equiv P(x, t|x_0, t_0)$. By defining $W_t(x', x) = w_t(x', x)$ for $x' \neq x$ and
$$W_t(x, x) = -\sum_{x' \neq x} w_t(x, x'), \tag{2.11}$$
the master equation simplifies to
$$\frac{\partial P(x, t)}{\partial t} = \sum_{x'} W_t(x', x)\, P(x', t). \tag{2.12}$$
Note that by definition the diagonal matrix element $W_t(x, x)$ is the negative exit rate from state $x$, i.e. the total rate at which probability flows away from state $x$.
Using the master equation we can compute the probability of any trajectory. A trajectory $\boldsymbol{x}$ is defined by a list of jump times $t_1 < t_2 < \cdots < t_n$, together with a sequence of system states $x_0, x_1, \ldots, x_n$. The trajectory starts at time $t_0$ in state $x_0$ and ends at time $T$ in state $x_n$, such that its duration is $T - t_0$. At each time $t_i$ (for $i = 1, \ldots, n$) the trajectory describes an instantaneous jump $x_{i-1} \to x_i$. The probability density of $\boldsymbol{x}$ is
$$P[\boldsymbol{x}] = P(x_0, t_0) \left[\prod_{i=1}^{n} w_{t_i}(x_{i-1}, x_i)\right] \left[\prod_{i=0}^{n} \exp\!\left(\int_{t_i}^{t_{i+1}} W_\tau(x_i, x_i)\, d\tau\right)\right], \tag{2.13}$$
a product of the probability of the initial state $P(x_0, t_0)$, the rates of the transitions $w_{t_i}(x_{i-1}, x_i)$, and the survival probabilities for the waiting times between jumps, given by $\exp\big(\int_{t_i}^{t_{i+1}} W_\tau(x_i, x_i)\, d\tau\big)$ for $i = 0, \ldots, n$, with $t_{n+1} \equiv T$.
2.2.3.1 Computing the Likelihood
To compute the likelihood or conditional probability $P[\boldsymbol{x}|\boldsymbol{s}]$ of an output trajectory $\boldsymbol{x}$ for a given input trajectory $\boldsymbol{s}$, we note that the input determines the time-dependent stochastic dynamics of the jump process. Indeed, the transition rates at time $t$, given by $w^{\boldsymbol{s}}_t(x, x')$, depend explicitly on the input at time $t$ and may even depend on the entire history of $\boldsymbol{s}$ prior to $t$.
In the common case that every input trajectory $\boldsymbol{s}$ leads to a unique transition rate matrix $W^{\boldsymbol{s}}_t$, i.e. the map $\boldsymbol{s} \mapsto W^{\boldsymbol{s}}$ is injective, the likelihood is directly given by Equation 2.13:
$$P[\boldsymbol{x}|\boldsymbol{s}] = P(x_0|s_0) \left[\prod_{i=1}^{n} w^{\boldsymbol{s}}_{t_i}(x_{i-1}, x_i)\right] \left[\prod_{i=0}^{n} \exp\!\left(\int_{t_i}^{t_{i+1}} W^{\boldsymbol{s}}_\tau(x_i, x_i)\, d\tau\right)\right], \tag{2.14}$$
where $P(x_0|s_0)$ is the probability of the initial state $x_0$ of the output given the initial state $s_0$ of the input.
The evaluation of the trajectory likelihood is at the heart of our Monte Carlo scheme. However, numerically computing a large product like Equation 2.14 very quickly reaches the limits of floating-point arithmetic, since the result is often either too large or too close to zero to be representable as a floating-point number. Thus, to avoid numerical issues, it is vital to perform the computations in log-space, i.e. to compute
$$\ln P[\boldsymbol{x}|\boldsymbol{s}] = \ln P(x_0|s_0) + \sum_{i=1}^{n} \ln w^{\boldsymbol{s}}_{t_i}(x_{i-1}, x_i) + \sum_{i=0}^{n} I_i, \tag{2.15}$$
where
$$I_i = \int_{t_i}^{t_{i+1}} W^{\boldsymbol{s}}_\tau(x_i, x_i)\, d\tau, \qquad t_{n+1} \equiv T. \tag{2.16}$$
The computation of the log-likelihood for given trajectories $\boldsymbol{s}$ and $\boldsymbol{x}$ according to Equations 2.15 and 2.16 proceeds as follows:
at the start of the trajectory we compute the log-probability of the initial condition, $\ln P(x_0|s_0)$,
for every jump in $\boldsymbol{x}$ (at time $t_i$) we compute the log jump propensity $\ln w^{\boldsymbol{s}}_{t_i}(x_{i-1}, x_i)$, and
for every interval of constant output value between two jumps of $\boldsymbol{x}$, we compute $I_i = \int_{t_i}^{t_{i+1}} W^{\boldsymbol{s}}_\tau(x_i, x_i)\, d\tau$. This integral can be performed using standard numerical methods such as the trapezoidal rule, which is also exact if $W^{\boldsymbol{s}}_t(x_i, x_i)$ is a piecewise linear function of $t$.
The sum of the three contributions above yields the exact log-likelihood as given in Equation 2.15.
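The three steps above can be sketched for a concrete, hypothetical model of our own choosing: a birth-death output driven by a piecewise-constant input, with birth propensity $k\,s(t)$ and decay propensity $\gamma n$ (the rate constants and the encoding of trajectories as jump times plus states are illustrative assumptions). Because both the input and the output state are piecewise constant here, the exit-rate integral of step 3 is evaluated exactly by splitting each interval at the input breakpoints:

```python
import numpy as np

K, GAMMA = 1.0, 0.1   # hypothetical rate constants

def input_value(s_times, s_vals, t):
    """Piecewise-constant input: s_vals[i] holds on [s_times[i], s_times[i+1])."""
    return s_vals[np.searchsorted(s_times, t, side="right") - 1]

def log_likelihood(s_times, s_vals, x_times, x_states, t_end, log_p0):
    # step 1: log-probability of the initial condition
    ll = log_p0
    # step 2: log propensity of every jump of the output trajectory
    for t, n_prev, n_new in zip(x_times, x_states[:-1], x_states[1:]):
        if n_new == n_prev + 1:
            ll += np.log(K * input_value(s_times, s_vals, t))   # birth
        else:
            ll += np.log(GAMMA * n_prev)                        # death
    # step 3: subtract the integrated exit rate; splitting at the input
    # breakpoints makes the integrand constant on each sub-interval,
    # so the integral is exact
    grid = np.union1d(s_times, np.concatenate(([0.0], x_times, [t_end])))
    grid = grid[(grid >= 0.0) & (grid <= t_end)]
    for a, b in zip(grid[:-1], grid[1:]):
        n = x_states[np.searchsorted(x_times, a, side="right")]
        s = input_value(s_times, s_vals, a)
        ll -= (K * s + GAMMA * n) * (b - a)
    return ll
```

For general time-varying inputs, step 3 would fall back to numerical quadrature such as the trapezoidal rule mentioned above.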
The algorithm to compute the log-likelihood is both efficient and straightforward to implement, being closely related to the standard Gillespie algorithm. The only term in Equation 2.15 that cannot be directly obtained from the master equation is $\ln P(x_0|s_0)$. This quantity depends on the initial distribution of the system of interest.
Our scheme can be applied to any system with a well-defined (non-equilibrium) initial distribution as specified by, e.g. the experimental setup. Most commonly though, one is interested in studying information transmission for systems in steady state. Then, the initial condition is the stationary distribution of the Markov process. Depending on the complexity of the system, this distribution can be found either analytically from the master equation [44,45] (possibly using simplifying approximations [46,47]), or computationally from stochastic simulations [48].
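For instance, for a hypothetical birth-death process with constant birth rate $k$ and decay rate $\gamma n$ (our own toy example), the stationary distribution needed for the initial-condition term is known to be Poisson with mean $k/\gamma$, so an estimate obtained from the time-weighted state occupancy of a long stochastic simulation can be checked against it:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical birth-death process: birth rate K, decay rate GAMMA * n.
# Its exact stationary distribution is Poisson with mean K / GAMMA.
K, GAMMA = 10.0, 1.0

def stationary_occupancy(t_end, n0=0):
    """Time-weighted state occupancy from a long Direct SSA run."""
    t, n = 0.0, n0
    weight = {}
    while t < t_end:
        total = K + GAMMA * n                 # total exit rate
        dt = min(rng.exponential(1.0 / total), t_end - t)
        weight[n] = weight.get(n, 0.0) + dt   # time spent in state n
        t += dt
        if t >= t_end:
            break
        # choose the reaction: birth with probability K/total, else death
        n += 1 if rng.random() < K / total else -1
    return {state: w / t_end for state, w in weight.items()}

pi = stationary_occupancy(t_end=5000.0)
mean_n = sum(state * w for state, w in pi.items())   # close to K/GAMMA
```

Time-averaging a single long run is appropriate here because the process is ergodic; alternatively, one could average the final states of many independent runs.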
2.2.3.2 Sampling from $P[\boldsymbol{x}|\boldsymbol{s}]$
Standard kinetic Monte Carlo simulations naturally produce exact samples of the probability distribution $P[\boldsymbol{x}|\boldsymbol{s}]$ as defined in Equation 2.14. That is, for any signal trajectory $\boldsymbol{s}$ and initial state $x_0$ drawn from $P(x_0|s_0)$, we can use the Stochastic Simulation Algorithm (SSA) or variants thereof to generate a corresponding trajectory $\boldsymbol{x}$. The SSA propagates the initial condition forward in time according to the transition rate matrix $W^{\boldsymbol{s}}_t$. In the standard Direct SSA [48] this is done by alternatingly sampling the waiting time before the next transition, and then selecting the actual transition.
The transition rates of a driven master equation are necessarily time-dependent, since they include the coupling of the jump process to the input trajectory $\boldsymbol{s}$, which itself varies in time. These time-varying transition rates need to be accounted for in the SSA. While most treatments of the SSA assume that the transition rates are constant in time, this restriction is easily lifted. Consider the step of the Direct SSA that generates the next transition time. For time-varying transition rates the distribution of the stochastic waiting time $\Delta t$ is characterized by the survival function
$$S(\Delta t) = \exp\!\left(\int_{t_i}^{t_i + \Delta t} W^{\boldsymbol{s}}_\tau(x_i, x_i)\, d\tau\right). \tag{2.17}$$
The waiting time can be sampled using inverse transform sampling, i.e. by generating a uniformly distributed random number $u \in (0, 1)$ and computing the waiting time as $\Delta t = S^{-1}(u)$. Numerically, computing the inverse of the survival function requires solving the equation
$$\ln u = \int_{t_i}^{t_i + \Delta t} W^{\boldsymbol{s}}_\tau(x_i, x_i)\, d\tau \tag{2.18}$$
for the waiting time $\Delta t$. Depending on the complexity of $W^{\boldsymbol{s}}_t$, this equation can either be solved analytically or numerically, e.g. using Newton's method. Hence, this method to generate stochastic trajectories is only truly exact if we can solve Equation 2.18 analytically. Additionally, in some cases more efficient variants of the SSA with time-dependent rates could be used [49,50].
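For the important special case of a piecewise-constant total exit rate, Equation 2.18 can be inverted exactly by accumulating the integral segment by segment. The sketch below assumes a strictly positive exit rate supplied as a callable (in the notation of Equation 2.18 the exit rate is $-W^{\boldsymbol{s}}_t(x_i, x_i)$); the function signature is our own illustrative choice:

```python
import numpy as np

def sample_waiting_time(exit_rate, breakpoints, t0, rng):
    """Draw the waiting time of Eq. 2.18 for a piecewise-constant exit rate.

    exit_rate(t) is the (strictly positive) total exit rate, assumed
    constant between successive entries of `breakpoints`.  We accumulate
    its integral segment by segment until it exceeds -ln(u), then invert
    the final segment analytically."""
    target = -np.log(rng.random())          # -ln u with u ~ Uniform(0, 1)
    t, acc = t0, 0.0
    for b in breakpoints[breakpoints > t0]:
        seg = exit_rate(t) * (b - t)
        if acc + seg >= target:
            return (t - t0) + (target - acc) / exit_rate(t)
        acc += seg
        t = b
    # after the last breakpoint the rate stays constant
    return (t - t0) + (target - acc) / exit_rate(t)
```

For a constant rate this reduces to ordinary exponential waiting times; when the rate changes at a breakpoint before the tentative firing time, the accumulated integral correctly carries over into the next segment.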
2.2.4 Input Statistics
For our mutual information estimate, we need to be able to draw samples from the input distribution $P[\boldsymbol{s}]$. Our algorithm poses no restrictions on $P[\boldsymbol{s}]$ other than the possibility to generate sample trajectories.
For example, the input signal may be described by a continuous-time jump process. One benefit is that it is possible to generate exact realizations of such a process (using the SSA) and to exactly compute the likelihood $P[\boldsymbol{x}|\boldsymbol{s}]$ using Equation 2.15. Specifically, the likelihood can be exactly evaluated because the transition rates for any input trajectory $\boldsymbol{s}$, while time-dependent, are piece-wise constant. This implies that the integral in Equation 2.15 can be evaluated analytically without approximations. Similarly, for piece-wise constant transition rates, the inverse function of Equation 2.18 can be evaluated directly, such that we can sample exact trajectories from the driven jump process. As a result, when both input and output are described by a master equation, PWS is a completely exact Monte Carlo scheme to compute the mutual information.
However, the techniques described here do not require the input signal to be described by a continuous-time jump process, or even to be Markovian. The input signal can be any stochastic process for which trajectories can be generated numerically. This includes continuous stochastic processes that are found as solutions to stochastic differential equations [51].
2.3 Integrating Out Internal Components
So far the output trajectory has been considered to correspond to the trajectory of the system in the full state space. Concomitantly, the method presented is a scheme for computing the mutual information between the input signal $\boldsymbol{s}$ and the trajectory $\boldsymbol{x}$, comprising the time evolution of all the components in the system, $\mathrm{X}_1, \ldots, \mathrm{X}_n$. Each component $\mathrm{X}_i$ itself has a corresponding trajectory $\boldsymbol{x}_i$, such that the full trajectory can be represented as a vector $\boldsymbol{x} = (\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n)$. It is indeed also the conditional probability $P[\boldsymbol{x}|\boldsymbol{s}]$ and the marginal probability $P[\boldsymbol{x}]$ of this vector in the full state space that can be directly computed from the master equation. In fact, it is this vector, which captures the states of all the components in the system, that carries the most information on the input signal $\boldsymbol{s}$, and thus has the largest mutual information. However, typically the downstream system cannot read out the states of all the components. Often, the downstream system reads out only a few components, or even just one component, the “output component” $\mathrm{X}_n$. The other components then mainly serve to transmit the information from the input to this readout $\mathrm{X}_n$. From the perspective of the downstream system, the other components are hidden. The natural quantity to measure the precision of information processing is then the mutual information between the input $\boldsymbol{s}$ and the output component's trajectory $\boldsymbol{x}_n$, not the full trajectory $\boldsymbol{x}$. The question then becomes how to compute $P[\boldsymbol{x}_n|\boldsymbol{s}]$ and $P[\boldsymbol{x}_n]$, from which the mutual information can be obtained. Here, we present a scheme to achieve this.
As an example, consider a chemical reaction network with species $\mathrm{X}_1, \ldots, \mathrm{X}_n$. Without loss of generality, we will assume that the $n$-th component is the output component. The other species are thus not part of the output, but only relay information from the input signal $\boldsymbol{s}$ to the output signal $\boldsymbol{x}_n$. To determine the mutual information we need $P[\boldsymbol{x}_n|\boldsymbol{s}]$, where $\boldsymbol{x}_n$ is the trajectory of only the readout component $\mathrm{X}_n$. However, from the master equation we can only obtain an expression for the full conditional probability $P[\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n|\boldsymbol{s}]$ of all components. To compute the value of $P[\boldsymbol{x}_n|\boldsymbol{s}]$, we must perform the marginalization integral
$$P[\boldsymbol{x}_n|\boldsymbol{s}] = \int \mathcal{D}[\boldsymbol{x}_1] \cdots \mathcal{D}[\boldsymbol{x}_{n-1}]\; P[\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n|\boldsymbol{s}]. \tag{2.19}$$
We can compute this integral using a Monte Carlo scheme as described below and use the resulting estimate for $P[\boldsymbol{x}_n|\boldsymbol{s}]$ to compute the mutual information using our technique presented in Section 2.2.2.
The marginalization of Equation 2.19 entails integrating out degrees of freedom from a known joint probability distribution. In Equation 2.7 we solved the analogous problem of obtaining the marginal probability $P[\boldsymbol{x}]$ by integrating out the input trajectories through the integral $\int \mathcal{D}[\boldsymbol{s}]\, P[\boldsymbol{x}|\boldsymbol{s}]\, P[\boldsymbol{s}]$. As described in Section 2.2.2, the integral from Equation 2.7 can be computed via a Monte Carlo estimate by sampling many input trajectories $\boldsymbol{s}_j$ from $P[\boldsymbol{s}]$ and taking the average of the corresponding conditional probabilities $P[\boldsymbol{x}|\boldsymbol{s}_j]$. We will show that in the case where there is no feedback from the readout component back to the other components, a completely analogous Monte Carlo estimate can be derived for Equation 2.19.
More specifically, we can evaluate Equation 2.19 via a direct Monte Carlo estimate under the condition that the stochastic dynamics of the other components are not influenced by $\mathrm{X}_n$ (i.e., no feedback from the readout). Using the identity
$$P[\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n|\boldsymbol{s}] = P[\boldsymbol{x}_n|\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}, \boldsymbol{s}]\; P[\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}|\boldsymbol{s}] \tag{2.20}$$
to rewrite the integrand in Equation 2.19, we are able to represent the conditional probability as an average over the readout component's trajectory probability
$$P[\boldsymbol{x}_n|\boldsymbol{s}] = \left\langle P[\boldsymbol{x}_n|\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}, \boldsymbol{s}] \right\rangle_{P[\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}|\boldsymbol{s}]}. \tag{2.21}$$
Thus, assuming that we can evaluate the conditional probability of the readout given all the other components, $P[\boldsymbol{x}_n|\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}, \boldsymbol{s}]$, we arrive at the estimate
$$P[\boldsymbol{x}_n|\boldsymbol{s}] \approx \frac{1}{N} \sum_{i=1}^{N} P[\boldsymbol{x}_n|\boldsymbol{x}_1^{(i)}, \ldots, \boldsymbol{x}_{n-1}^{(i)}, \boldsymbol{s}], \tag{2.22}$$
where the samples $(\boldsymbol{x}_1^{(i)}, \ldots, \boldsymbol{x}_{n-1}^{(i)})$ are drawn from $P[\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}|\boldsymbol{s}]$. Notice that the derivation of this Monte Carlo estimate is fully analogous to the estimate in Equation 2.7, but instead of integrating out the input trajectory $\boldsymbol{s}$ we integrate out the component trajectories $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}$.
To obtain $P[\boldsymbol{x}_n|\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}, \boldsymbol{s}]$ in Equation 2.21, Equation 2.22, we note that, in the absence of feedback, we can describe the stochastic dynamics of the readout component as a jump process with time-dependent transition rates whose time dependence arises from the trajectories of the other components and the input $\boldsymbol{s}$. In effect, this is a driven jump process for $\mathrm{X}_n$, driven by all upstream components and the input signal. Specifically, denoting by $\boldsymbol{z} = (\boldsymbol{s}, \boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1})$ the joint trajectory representing the history of all upstream components as well as the input signal, we can, as explained in Section 2.2.3, write the time-dependent transition rate matrix $W^{\boldsymbol{z}}_t$ for the stochastic dynamics of $\mathrm{X}_n$ and use Equation 2.14 to compute $P[\boldsymbol{x}_n|\boldsymbol{z}]$. Using Equation 2.22, this then allows us to compute $P[\boldsymbol{x}_n|\boldsymbol{s}]$.
Finally, to compute the mutual information, e.g. using the estimate in Equation 2.9, we additionally need to evaluate the marginal output probability $P[\boldsymbol{x}_n]$. This requires us to perform one additional integration over the space of input trajectories $\boldsymbol{s}$:
$$P[\boldsymbol{x}_n] = \int \mathcal{D}[\boldsymbol{s}]\, \mathcal{D}[\boldsymbol{x}_1] \cdots \mathcal{D}[\boldsymbol{x}_{n-1}]\; P[\boldsymbol{x}_n|\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}, \boldsymbol{s}]\; P[\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}|\boldsymbol{s}]\; P[\boldsymbol{s}]. \tag{2.23}$$
The corresponding Monte Carlo estimate is
$$P[\boldsymbol{x}_n] \approx \frac{1}{N} \sum_{i=1}^{N} P[\boldsymbol{x}_n|\boldsymbol{x}_1^{(i)}, \ldots, \boldsymbol{x}_{n-1}^{(i)}, \boldsymbol{s}_i], \tag{2.24}$$
where the input trajectories $\boldsymbol{s}_i$ follow $P[\boldsymbol{s}]$ and the intermediate components $\boldsymbol{x}_j^{(i)}$, for $i = 1, \ldots, N$ and $j = 1, \ldots, n-1$, follow $P[\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}|\boldsymbol{s}_i]$.
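The structure of this marginalization can be sketched on a deliberately reduced toy model of our own construction: a feed-forward cascade s → v → x of binary variables at a single time point, where v plays the role of the hidden intermediate components and x that of the readout (for trajectories the same structure applies with paths in place of states; all probability tables are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy feed-forward cascade s -> v -> x with binary states.
P_V = np.array([[0.9, 0.1],    # P(v | s = 0)
                [0.2, 0.8]])   # P(v | s = 1)
P_X = np.array([[0.7, 0.3],    # P(x | v = 0)
                [0.1, 0.9]])   # P(x | v = 1)

def marginal_likelihood(x, s, n_samples=100_000):
    """Estimate P(x|s) by averaging P(x|v) over v ~ P(v|s), cf. Eq. 2.22."""
    v = rng.choice(2, size=n_samples, p=P_V[s])   # no feedback from x to v
    return float(P_X[v, x].mean())
```

Because the model is discrete, the Monte Carlo estimate can be compared with the exact sum over the hidden variable; in a reaction network, sampling v corresponds to simulating the upstream components for a given input trajectory.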
In summary, the scheme to obtain $P[\boldsymbol{x}_n|\boldsymbol{s}]$ in the presence of hidden intermediate components is analogous to that used for computing $P[\boldsymbol{x}]$ from $P[\boldsymbol{x}|\boldsymbol{s}]$. In both cases, one needs to marginalize a distribution function by integrating out components. Indeed, the schemes presented here and in Section 2.2.2 are bona fide schemes to compute the mutual information between the input and either the trajectory of the output component, $\boldsymbol{x}_n$, or the full output $\boldsymbol{x}$. However, when the trajectories are sufficiently long or the stochastic dynamics are sufficiently complex, the free-energy schemes of Chapter 3 may be necessary to enhance the efficiency of computing the marginalized distribution, $P[\boldsymbol{x}_n|\boldsymbol{s}]$ or $P[\boldsymbol{x}_n]$.
2.4 Dealing with Feedback
In principle all physical information processing systems exhibit feedback. The physical interaction needed to measure the input signal necessarily affects the incoming signal, and indeed, it follows that no information can be extracted from the input signal without any perturbation of the input dynamics. Often, it is assumed that the amplitude of such perturbations is comparatively small and thus that the feedback can safely be ignored.
Indeed, the PWS scheme was derived under the assumption of no feedback. In this section, we drop this assumption and explicitly consider systems where the produced output perturbs the input, i.e. systems where the output feeds back onto the input. We will first discuss the additional problems that arise when computing the mutual information for a system with feedback, and subsequently present a modified version of PWS that can be used to compute the trajectory mutual information for these systems.
2.4.1 Computing the Mutual Information with Feedback between Input and Output
PWS requires the computation of the trajectory likelihood $P[\boldsymbol{x}|\boldsymbol{s}]$, a quantity that is not readily available for systems with feedback. Indeed, as already mentioned in Section 2.2.3.1, for a given input trajectory $\boldsymbol{s}$, the output dynamics are no longer described by a Markov process in a system with feedback, and therefore we cannot find an expression for $P[\boldsymbol{x}|\boldsymbol{s}]$ based on the master equation. This implies that for systems with feedback, PWS schemes cannot be used without modification. While it is generally not possible to derive an expression for the conditional probability $P[\boldsymbol{x}|\boldsymbol{s}]$ in systems with feedback, we often still can compute the joint probability density $P[\boldsymbol{s},\boldsymbol{x}]$ instead. Based on this quantity, we will present a modified PWS scheme to compute the mutual information for systems with feedback.
Since PWS is a model-based approach to compute the mutual information, when there is feedback from the output back to the input we require a complete model of the combined system. Specifically, such a model must provide an expression for the joint probability $P[\boldsymbol{s},\boldsymbol{x}]$, describing the input dynamics and the interaction between input and output, including the feedback.
An estimate of the mutual information that only relies on the computation of joint probability densities can be obtained by expressing the mutual information as
$$I(S;X) = \left\langle \ln \frac{P[\boldsymbol{s},\boldsymbol{x}]}{P[\boldsymbol{s}]\, P[\boldsymbol{x}]} \right\rangle_{P[\boldsymbol{s},\boldsymbol{x}]}.$$
Thus, the PWS scheme with feedback consists of the computation of
$$\ln P[\boldsymbol{s},\boldsymbol{x}] - \ln P[\boldsymbol{s}] - \ln P[\boldsymbol{x}],$$
which we want to estimate via a Monte Carlo average using samples $(\boldsymbol{s}_i, \boldsymbol{x}_i)$ from $P[\boldsymbol{s},\boldsymbol{x}]$. We see that while we do not need to evaluate the likelihood $P[\boldsymbol{x}|\boldsymbol{s}]$, we now need to explicitly compute the joint density $P[\boldsymbol{s}_i,\boldsymbol{x}_i]$ and two marginal densities, $P[\boldsymbol{s}_i]$ and $P[\boldsymbol{x}_i]$, for each Monte Carlo sample $(\boldsymbol{s}_i, \boldsymbol{x}_i)$. While the joint density can be evaluated directly by assumption, each of the marginalized densities can only be computed using a nested Monte Carlo estimate.
Specifically, for PWS with feedback, we need to compute two marginalization integrals per Monte Carlo sample:
$$P[\boldsymbol{s}_i] = \int \mathcal{D}[\boldsymbol{x}]\; P[\boldsymbol{s}_i, \boldsymbol{x}]$$
and
$$P[\boldsymbol{x}_i] = \int \mathcal{D}[\boldsymbol{s}]\; P[\boldsymbol{s}, \boldsymbol{x}_i].$$
However, these marginalization integrals cannot be directly computed with the techniques described so far. Therefore, in the following subsection, we discuss how to compute marginalization integrals for systems with feedback.
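The structure of the feedback-capable estimator can be sketched on a toy joint distribution of our own choosing, where for a binary pair the two marginalizations reduce to simple sums (for trajectories they require the nested Monte Carlo estimates discussed in this section):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy joint distribution over a binary pair (s, x): the estimator
# averages ln P[s,x] - ln P[s] - ln P[x] over samples from the joint.
P_JOINT = np.array([[0.4, 0.1],
                    [0.1, 0.4]])       # P(s, x); rows: s, columns: x

P_S = P_JOINT.sum(axis=1)              # marginal over x
P_X = P_JOINT.sum(axis=0)              # marginal over s

def feedback_estimate(n=200_000):
    flat = rng.choice(4, size=n, p=P_JOINT.ravel())
    s, x = np.divmod(flat, 2)          # (s, x) ~ P[s, x]
    return float(np.mean(np.log(P_JOINT[s, x])
                         - np.log(P_S[s]) - np.log(P_X[x])))
```

Note that only the joint density and its marginals enter; no conditional likelihood is required, which is exactly what makes this form of the estimator applicable when the output feeds back onto the input.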
Additionally, as discussed in Section 2.3, we may also need to integrate out internal components of the master equation even when the output feeds back onto these internal components. The technique discussed below can also be used in this case as a way to compute the marginalization integral in Equation 2.19.
2.4.2 Marginalization Integrals for Systems with Feedback
Computing marginalization integrals in systems with feedback is harder than in the case without feedback. Specifically, we will show that it is not obvious how to apply the Monte Carlo estimate from Equation 2.7 to systems with feedback. Nevertheless, if the system with feedback can be decomposed into a non-interacting part and an interacting part that includes the feedback, it is often still possible to compute marginalization integrals. Below, we sketch the steps necessary to compute marginalization integrals for systems with feedback using such a decomposition.
For concreteness, we discuss how to compute
$$ \mathrm{P}(\boldsymbol{x}) = \int \mathcal{D}\boldsymbol{s}\; \mathrm{P}(\boldsymbol{s},\boldsymbol{x}) $$
as the prototype for a marginalization integral we want to compute. Unlike before, we now assume that the output $\boldsymbol{x}$ feeds back onto the input $\boldsymbol{s}$. That means that we have access to the joint distribution's density $\mathrm{P}(\boldsymbol{s},\boldsymbol{x})$, but not to the marginal density $\mathrm{P}(\boldsymbol{s})$ or the conditional density $\mathrm{P}(\boldsymbol{x}|\boldsymbol{s})$.
Formulated in the language of statistical physics, marginalization is equivalent to the computation of the free-energy difference between two ensembles described by potentials $U$ and $U_0$. Previously, for systems without feedback, we chose these potentials to be $U(\boldsymbol{s}) = -\ln[\mathrm{P}(\boldsymbol{s})\,\mathrm{P}(\boldsymbol{x}|\boldsymbol{s})]$ and $U_0(\boldsymbol{s}) = -\ln \mathrm{P}(\boldsymbol{s})$, with the idea that $U$ is the potential corresponding to the actual system and $U_0$ is the potential of a reference system with known free energy. Then, by computing the free-energy difference between the reference system and the actual system, we could compute the marginal probability $\mathrm{P}(\boldsymbol{x})$.
However, in systems with feedback we face a problem. Note that the actual system is still described by the potential $U(\boldsymbol{s}) = -\ln \mathrm{P}(\boldsymbol{s},\boldsymbol{x})$, even with feedback. Yet, for the reference system described by $U_0$ we cannot make the same choice as before, because the previous choice involved the marginal probability $\mathrm{P}(\boldsymbol{s})$, which is not available with feedback.
Instead, we have to find an alternative expression for $U_0$. To construct a suitable reference potential, we can use a decomposition of the full potential into three parts,
$$ U = U_{\mathcal{S}}(\boldsymbol{s}) + U_{\mathcal{X}}(\boldsymbol{x}) + U_{\mathrm{int}}(\boldsymbol{s},\boldsymbol{x}) \,, \tag{2.30} $$
where $U_{\mathrm{int}}(\boldsymbol{s},\boldsymbol{x})$ describes the features of the system that induce interaction, or correlation, between $\boldsymbol{s}$ and $\boldsymbol{x}$. The first two terms of the potential above, $U_{\mathcal{S}}(\boldsymbol{s}) + U_{\mathcal{X}}(\boldsymbol{x})$, therefore describe a non-interacting version of the system, in which the input and output are fully independent of each other. We want to use the potential of that non-interacting version as our expression for $U_0$, i.e., $U_0 = U_{\mathcal{S}}(\boldsymbol{s}) + U_{\mathcal{X}}(\boldsymbol{x})$. To be able to do so, we require that the partition function (normalization constant)
$$ Z_0 = \int \mathcal{D}\boldsymbol{s}\; e^{-U_0} \tag{2.31} $$
is known. In other words, we need to choose the decomposition in Equation 2.30 such that the partition function in Equation 2.31 is known either analytically or numerically. If such a decomposition is found, we can compute the marginal probability from the difference in free energy between $U_0$ and $U$:
$$ \mathrm{P}(\boldsymbol{x}) = Z = Z_0\, e^{-\Delta F} \,, $$
where $Z_0$ is known. Because we have a known expression for $U_0$, the free-energy difference $\Delta F$ can now be computed using any of the techniques described in Section 3.2.
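To illustrate the logic of recovering a marginal probability as $Z = Z_0\, e^{-\Delta F}$, here is a deliberately simple one-dimensional sketch. It uses free-energy perturbation, one of the simplest of the free-energy techniques alluded to above; the quadratic potentials and all numerical values are illustrative assumptions, chosen only so that the result can be checked against an analytic answer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Reference potential U0(s) = s**2 / 2: a standard Gaussian ensemble with
# known partition function Z0 = sqrt(2 * pi).
Z0 = np.sqrt(2.0 * np.pi)

def U_int(s):
    """Hypothetical interaction part of the potential; in the text, this
    contribution stems from the reactions coupling input and output."""
    return 0.5 * (s - 1.0) ** 2

# Free-energy perturbation: Z / Z0 = < exp(-(U - U0)) >_{U0}, where the
# average runs over samples from the reference ensemble, so
# Delta F = -ln < exp(-U_int) >_{U0}.
s = rng.standard_normal(200_000)          # samples from the reference ensemble
delta_F = -np.log(np.mean(np.exp(-U_int(s))))
Z_est = Z0 * np.exp(-delta_F)             # plays the role of P(x) = Z

# For this quadratic toy potential the partition function is also known
# exactly, which lets us verify the free-energy estimate:
Z_exact = np.sqrt(np.pi) * np.exp(-0.25)
```

In the trajectory setting the reference samples are non-interacting trajectories rather than Gaussian numbers, but the identity $Z = Z_0\, e^{-\Delta F}$ is used in exactly the same way.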
As an example for finding a decomposition like Equation 2.30, let us consider the case where the joint system of input and output is described by a single master equation, i.e., we have a master equation with two components, $\mathrm{S}$, which represents the input, and $\mathrm{X}$, which represents the output. In such a system, information is transmitted if there exist transitions that change the copy number of $\mathrm{X}$ with a rate that depends on the copy number of $\mathrm{S}$. In terms of chemical reactions, $\mathrm{S} \to \mathrm{S} + \mathrm{X}$ is an example of such a transition. In turn, this system exhibits feedback if at least one of the transitions that change the copy number of $\mathrm{S}$ has a rate that depends on $\mathrm{X}$, as for example with the reaction $\mathrm{X} \to \mathrm{X} + \mathrm{S}$. Note that with such reactions, the dynamics of $\mathrm{S}$ depend on the current copy number of $\mathrm{X}$, and therefore we cannot evolve $\mathrm{S}$-trajectories independently of $\mathrm{X}$-trajectories, a consequence of feedback. Both of the reactions $\mathrm{S} \to \mathrm{S} + \mathrm{X}$ and $\mathrm{X} \to \mathrm{X} + \mathrm{S}$ introduce correlations between the $\mathrm{S}$- and $\mathrm{X}$-trajectories.
In a non-interacting system, such interactions between the input and output must be absent. Thus, a non-interacting version of the reaction system contains no single reaction that involves both $\mathrm{S}$ and $\mathrm{X}$. We now describe how we can use that non-interacting version of the reaction system to obtain the reference potential $U_0$. Since the input and output trajectories are completely independent in the non-interacting system, we can express the joint distribution's probability density as the product of the individual components' trajectory densities, $\mathrm{P}_0(\boldsymbol{s},\boldsymbol{x}) = \mathrm{P}_0(\boldsymbol{s})\,\mathrm{P}_0(\boldsymbol{x})$. Note that $\mathrm{P}_0(\boldsymbol{s})$ and $\mathrm{P}_0(\boldsymbol{x})$ should not be confused with the marginal probabilities $\mathrm{P}(\boldsymbol{s})$ and $\mathrm{P}(\boldsymbol{x})$ of the interacting version of the reaction system, which must be computed using a marginalization integral. Since in the non-interacting version both $\mathrm{S}$ and $\mathrm{X}$ obey independent dynamics, each characterized by its own master equation, both $\mathrm{P}_0(\boldsymbol{s})$ and $\mathrm{P}_0(\boldsymbol{x})$ can be computed individually using Equation 2.13. Thus, in this case, the non-interacting potential is $U_0 = -\ln[\mathrm{P}_0(\boldsymbol{s})\,\mathrm{P}_0(\boldsymbol{x})]$ and, since the probability densities $\mathrm{P}_0(\boldsymbol{s})$ and $\mathrm{P}_0(\boldsymbol{x})$ are normalized, the corresponding partition function $Z_0 = \int \mathcal{D}\boldsymbol{s}\; \mathrm{P}_0(\boldsymbol{s})\,\mathrm{P}_0(\boldsymbol{x}) = \mathrm{P}_0(\boldsymbol{x})$ is known, because $\mathrm{P}_0(\boldsymbol{x})$ can itself be evaluated via Equation 2.13. Hence, for this reaction system, we can straightforwardly define a non-interacting version that can be used to obtain the reference potential $U_0$. Using the techniques described in Section 3.2, we can then compute the free-energy difference between $U_0$ and $U$, where the latter potential describes the dynamics of the fully interacting system. Specifically, we can compute the marginal probabilities $\mathrm{P}(\boldsymbol{s})$ and $\mathrm{P}(\boldsymbol{x})$ pertaining to the interacting system, which are required for the mutual information estimate in Equation 2.26.
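A reaction system of the kind discussed above can be simulated exactly with the Gillespie stochastic simulation algorithm. The sketch below generates joint $(\mathrm{S},\mathrm{X})$ trajectories for a minimal network containing both a transmission reaction and a feedback reaction; the rate constants are illustrative assumptions, not values from the text. The key point it demonstrates is structural: because the feedback reaction gives $\mathrm{S}$ a propensity that depends on $x$, the input trajectory cannot be generated on its own.

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical two-species network with feedback (illustrative rates):
#   0:  0 -> S         (input birth,          propensity k0)
#   1:  X -> X + S     (feedback on input,    propensity k1 * x)
#   2:  S -> 0         (input decay,          propensity g1 * s)
#   3:  S -> S + X     (signal transmission,  propensity k2 * s)
#   4:  X -> 0         (output decay,         propensity g2 * x)
k0, k1, g1, k2, g2 = 10.0, 0.3, 1.0, 2.0, 1.0
stoich = [(1, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # (dS, dX) per reaction

def gillespie(t_end, s=0, x=0):
    """Exact stochastic simulation of the joint (S, X) dynamics.  Because
    reaction 1 makes the S-propensity depend on x, S-trajectories cannot be
    generated independently of X: this is the feedback discussed above."""
    t, traj = 0.0, [(0.0, s, x)]
    while True:
        a = np.array([k0, k1 * x, g1 * s, k2 * s, g2 * x])  # propensities
        a_tot = a.sum()                  # k0 > 0 guarantees a_tot > 0
        t += rng.exponential(1.0 / a_tot)
        if t >= t_end:
            return traj
        r = rng.choice(5, p=a / a_tot)   # index of the reaction that fires
        s, x = s + stoich[r][0], x + stoich[r][1]
        traj.append((t, s, x))

traj = gillespie(t_end=20.0)
```

Deleting reactions 1 and 3 yields the non-interacting version of this network, whose trajectory densities $\mathrm{P}_0(\boldsymbol{s})$ and $\mathrm{P}_0(\boldsymbol{x})$ factorize and can serve to define the reference potential $U_0$.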
In summary, for systems with feedback, we can compute marginalization integrals by specifying a reference potential $U_0$ through a non-interacting version of the system. However, barring a decomposition into non-interacting and interacting potentials, there is generally no unambiguous choice of the reference potential for computing the marginal probability $\mathrm{P}(\boldsymbol{x})$. In fact, the optimal reference potential is likely to be system-specific. Still, if a suitable expression for $U_0$ can be found, we can make use of the techniques developed in Section 3.2 to compute the marginal probability $\mathrm{P}(\boldsymbol{x})$.
2.5 Discussion
In this chapter, we have described a general, practical, and flexible method that makes it possible to compute the mutual information between trajectories exactly. PWS is a Monte Carlo scheme based on the exact computation of trajectory probabilities. We showed how to compute exact trajectory probabilities from the master equation and thus how to use PWS for any system described by a master equation. Since the master equation is employed in many fields and in particular provides an exact stochastic model for well-mixed chemical reaction dynamics, PWS is very broadly applicable.
However, it must be noted that PWS cannot be used to directly obtain the mutual information between trajectories from experimental data, in contrast to model-free (yet approximate) methods such as k-nearest-neighbor estimators [24,25], decoding-based information estimates [27], or schemes that compute the mutual information from the data within the Gaussian framework [52]. PWS requires a (generative) model based on a master equation or Langevin equation. Yet, in Chapter 6, we will show how PWS can be combined with machine learning to obtain the information rate directly from time-series data.
We have applied PWS to compute the mutual information rate in steady state, but PWS can be used equally well to study systems out of steady state. For such systems, a (non-)equilibrium initial condition must be specified in addition to the stochastic dynamics of the input trajectories. These distributions are defined by the (experimental) setup and lead to a well-defined output distribution when the system is coupled to the input. Thus, a steady state is not a prerequisite for applying PWS to study the trajectory mutual information.
Overall, PWS is a general framework for computing the mutual information between trajectories. Because of its flexibility and simplicity, we envision that it will become an important and reliable tool for studying information transmission in dynamic stochastic systems.