1 Introduction
The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.
We live in the era of information. Information technology permeates every aspect of modern life, shaping how we communicate, learn, interact socially, and spend our leisure time. Beyond daily life, information plays a crucial role in fields like physics, biology, neuroscience, and engineering, where it is used to study and enhance the function of complex systems and machines. Quantifying the flow of information within these domains is essential, and, although the concept of information is abstract, its power in explaining the processes that shape our world is profound.
The explanatory power of information stems from the intrinsic link between information and performance. Without a potential reward, or the possibility of avoiding harm, information has no value.1 As a result, information collection and processing typically serves a clear purpose. For example, a self-driving car processes information from its sensors in order to make decisions about navigation [3]. Similarly, bacteria acquire chemical information about their environment in order to optimize their movement toward nutrients and away from toxins, maximizing their chance of survival [4]. More generally, evolutionary biology explores the link between genetic information and fitness [5,6]. Thus, whether in biological organisms or engineered systems, understanding how information is used is essential for optimizing performance.
Quantifying information transmission is vital for understanding and improving natural or engineered information-processing systems. Shannon’s information theory [7] provides the framework for studying the efficiency and reliability of any communication channel, whether it is a telephone line, a biochemical signaling cascade, or a neural pathway in the brain. The cornerstone of information theory is a set of mathematical definitions to rigorously quantify amounts of information. These make it possible to determine, in absolute terms, the amount of information that is transmitted by a given information-processing mechanism, for a specific input signal. Moreover, it is possible to quantify the maximum amount of information that can be transmitted through a given mechanism under optimal conditions: this limit is known as the channel capacity, measured in bits per time unit. Shannon’s information measures enable us to characterize a wide range of systems in terms of their information transmission capabilities.
Information theory has found many applications across disciplines, and is frequently used to understand and improve sensory or computational systems. In biology, information transmission is studied, e.g., in the brain, by analyzing the timing of electrical impulses between neurons [8,9]. Within cells, information flow in biochemical signaling and transcription regulation has been extensively studied by analyzing biochemical pathways [10–12]. In artificial intelligence, information theory has proven useful in improving learning in neural networks. The information bottleneck theory [13] suggests that the performance of neural networks can be enhanced by balancing compression and information retention during training [14,15]. In economics and finance, information theory has been applied to describe financial markets [16] and to optimize financial decision-making under uncertainty [17]. In optics, information theory is employed to study the efficiency of signal processing in optical resonators, with applications in precision sensing and optical computing [18,19]. Information theory boasts a wealth of applications and is essential for the analysis and theoretical understanding of information-processing systems.
The canonical measure for the quality of information transmission is the mutual information. It quantifies how much information is shared between two random variables, such as the input and output signals of an information-processing mechanism, see Figure 1.1. Let $S$ and $X$ be two random variables that are jointly distributed according to the density $P(s, x)$ and with marginal densities $P(s)$ and $P(x)$. The mutual information between $S$ and $X$ is then defined as

$$ I(S; X) = \int \mathrm{d}s\, \mathrm{d}x\; P(s, x) \ln \frac{P(s, x)}{P(s)\, P(x)} \tag{1.1} $$
and provides a measure of correlation between the random variables.2 From the definition it follows that $I(S; X) = 0$ only if $S$ and $X$ are statistically independent, and $I(S; X) > 0$ otherwise. Thus, the mutual information quantifies the statistical dependence between random variables, equally characterizing the degree of influence from $S$ on $X$ and from $X$ on $S$. Hence, the mutual information is a symmetric measure, satisfying $I(S; X) = I(X; S)$. In a typical information-processing system, the input $S$ influences the output $X$, but there is no feedback from $X$ to $S$. In such cases, the mutual information provides a measure for how effectively information about $S$ is transmitted through the system into the output $X$.
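To make Equation 1.1 concrete, the short sketch below estimates the mutual information between paired scalar samples with a simple plug-in (histogram) estimator. It is only an illustration of the definition, not one of the methods developed in this thesis; the function name and the bin count are arbitrary choices.

```python
import numpy as np

def mutual_information_plugin(samples_s, samples_x, bins=32):
    """Plug-in estimate of I(S;X) in nats from paired scalar samples,
    using a 2D histogram as a stand-in for the joint density P(s, x)."""
    joint, _, _ = np.histogram2d(samples_s, samples_x, bins=bins)
    p_sx = joint / joint.sum()                     # empirical joint distribution
    p_s = p_sx.sum(axis=1, keepdims=True)          # marginal over x
    p_x = p_sx.sum(axis=0, keepdims=True)          # marginal over s
    nonzero = p_sx > 0
    return float(np.sum(p_sx[nonzero] * np.log(p_sx[nonzero] / (p_s @ p_x)[nonzero])))
```

Such plug-in estimators work well for low-dimensional variables but, as discussed below, break down when the variables are entire trajectories.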
In biological systems, information transmission has frequently been quantified via the instantaneous mutual information (IMI) $I(s_t; x_{t'})$, i.e. the mutual information between the stimulus and the response at two given time points. This measure has been applied to analyze biochemical pathways [12,22,25–29] and neural spiking dynamics [8,30]. However, in many cases, the IMI cannot correctly quantify information transmission due to correlations within the input or the output, which reduce the total information transmitted. More generally, information may be encoded in the temporal patterns of signals, which cannot be captured by a pointwise information measure like the IMI. Thus, the IMI is generally inadequate for computing information transmission in systems that process dynamical signals.
There are many examples of information being encoded in dynamical features of signals. In cellular Ca$^{2+}$ signaling, information seems to be encoded in the timing and duration of calcium bursts [31], while in the MAPK pathway information is encoded in the amplitude and duration of the transient phosphorylation response to external stimuli [32,33]. Moreover, there are reasons to believe that encoding information in dynamical signal features is advantageous for reliable information transmission [34]. Studying the information transmitted via temporal features is thus highly desirable but not possible with an instantaneous information measure. Therefore, in cases where the dynamics of input or output time-series may carry relevant information, the need for appropriate dynamical information measures has been widely recognized [4,33,35–42].
The natural measure for quantifying information transmission via dynamical signals is the trajectory mutual information. It takes into account the total information encoded in the input and output trajectories of a system, and therefore captures all information transmitted over a specific time interval. Conceptually, its definition is simple. The trajectory mutual information is the mutual information between the input and output trajectories of a stochastic process, given by

$$ I(\boldsymbol{S}; \boldsymbol{X}) = \int \mathcal{D}\boldsymbol{s}\, \mathcal{D}\boldsymbol{x}\; P(\boldsymbol{s}, \boldsymbol{x}) \ln \frac{P(\boldsymbol{s}, \boldsymbol{x})}{P(\boldsymbol{s})\, P(\boldsymbol{x})} \tag{1.2} $$
where the bold symbols $\boldsymbol{s}$ and $\boldsymbol{x}$ are used to denote trajectories. These trajectories arise from a stochastic process that defines the joint probability distribution $P(\boldsymbol{s}, \boldsymbol{x})$. The integral itself runs over all possible input and output trajectories.
The closely related mutual information rate is defined as the rate at which the trajectory mutual information increases with the duration of the trajectories in the long-time limit. Let $\boldsymbol{S}_T$ and $\boldsymbol{X}_T$ be trajectories of duration $T$; then the mutual information rate is given by

$$ R = \lim_{T \to \infty} \frac{I(\boldsymbol{S}_T; \boldsymbol{X}_T)}{T} \,. \tag{1.3} $$
The mutual information rate quantifies how many independent messages can be transmitted per unit time, on average, via a communication channel. It depends both on the signal statistics of the input and on the transmission properties of the channel. In the absence of feedback it is equal to the transfer entropy [43,44].
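In practice, the limit in Equation 1.3 is approached by computing the trajectory mutual information for several trajectory durations and extracting the asymptotic slope. The sketch below illustrates this step, assuming the MI estimates have already been obtained; the function name is a placeholder.

```python
import numpy as np

def information_rate(durations, mi_values):
    """Estimate the mutual information rate R (Eq. 1.3) as the asymptotic slope
    of I(S_T; X_T) versus T; a linear fit over sufficiently long durations
    approximates the long-time limit."""
    slope, _intercept = np.polyfit(np.asarray(durations, float),
                                   np.asarray(mi_values, float), deg=1)
    return slope  # nats per unit time
```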
The trajectory mutual information and the mutual information rate are fundamental measures for information transmission in dynamical systems. They serve as key performance metrics for biochemical signaling networks [12,36], as well as for neural sensory systems [8,30]. More generally, in communication channels with memory, the mutual information rate for the optimal input signal determines the channel capacity [20]. In financial markets, it quantifies correlations in stochastic time series, such as stock prices and trading volumes [16]. Finally, in non-equilibrium thermodynamics, the trajectory mutual information provides a link between information theory and stochastic thermodynamics [45,46]. Efficient methods for calculating the trajectory mutual information and the mutual information rate are needed and constitute the primary objective of this thesis.
Unfortunately, calculating the mutual information between trajectories is notoriously difficult due to the high dimensionality of trajectory space [47]. Conventional approaches for computing mutual information require non-parametric estimates of the input and output entropy, typically obtained via histograms or kernel density estimators [8,10,12,38,47,48]. However, the high-dimensional nature of trajectories makes it infeasible to obtain enough data for accurate non-parametric distribution estimates. Other non-parametric entropy estimators such as the k-nearest-neighbor estimator [44,49] depend on a choice of metric in trajectory space and become unreliable for long trajectories [50]. Thus, except for very simple systems [38], the curse of dimensionality makes it infeasible to obtain accurate results for the trajectory mutual information using conventional mutual information estimators.
Due to the inherent difficulty of directly estimating the mutual information between trajectories, previous research has often employed simplified models or approximations. In some cases, the problem can be simplified by considering static (scalar) inputs instead of input signal trajectories [34,39,50]. But this approach ignores the dynamics of the input signal. Lower bounds for the mutual information can be derived from the Donsker-Varadhan inequality [51–53], or obtained through general-purpose compression algorithms [50,54,55]. While exact analytical results for the trajectory mutual information are available for certain simple processes such as Gaussian [36] or Poisson channels [56,57], many complex, realistic systems lack analytical solutions, and approximations have to be employed. For systems governed by a master equation, numerical or analytical approximations are sometimes feasible [58,59] but these become intractable for complex systems. Finally, the Gaussian framework for approximating the mutual information rate is particularly widely used [4,36,40], though it assumes linear system dynamics and Gaussian noise statistics. These assumptions make it ill-suited for many realistic nonlinear information-processing systems.
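For jointly stationary Gaussian signals, the mutual information rate reduces to an integral over the spectral coherence of input and output [36]. The snippet below is a minimal numerical sketch of this spectral formula, assuming the power spectra have already been estimated on a common frequency grid; the function and variable names are illustrative.

```python
import numpy as np

def gaussian_information_rate(omega, S_ss, S_xx, S_sx):
    """Gaussian-framework estimate of the mutual information rate,
    R = -(1 / 4*pi) * Integral d_omega ln(1 - |S_sx|^2 / (S_ss * S_xx)),
    from the input, output, and cross power spectra sampled on the grid omega."""
    coherence = np.abs(S_sx) ** 2 / (S_ss * S_xx)
    integrand = -np.log(1.0 - coherence)
    return np.trapz(integrand, omega) / (4.0 * np.pi)  # nats per unit time
```

Because this formula only sees second-order statistics, any information carried by non-Gaussian features of the signals is invisible to it, which is part of what Chapter 5 examines.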
To address the limitations of previous methods, we introduce Path Weight Sampling (PWS), a novel Monte Carlo technique for computing the trajectory mutual information efficiently and accurately. PWS leverages free-energy estimators from statistical physics and combines analytical and numerical methods to circumvent the curse of dimensionality associated with long trajectories. The approach relies on exact calculations of trajectory likelihoods derived analytically from a stochastic model. By averaging these likelihoods in a Monte Carlo fashion, PWS can accurately compute the trajectory mutual information, even in high-dimensional settings.
PWS is an exact Monte Carlo scheme, in the sense that it provides an unbiased statistical estimate of the trajectory mutual information. In PWS, the mutual information is computed via the identity

$$ I(\boldsymbol{S}; \boldsymbol{X}) = H(\boldsymbol{X}) - H(\boldsymbol{X} \mid \boldsymbol{S}) \tag{1.4} $$
as the difference between the marginal output entropy $H(\boldsymbol{X})$ associated with the marginal distribution $P(\boldsymbol{x})$ of the output trajectories and the conditional output entropy $H(\boldsymbol{X} \mid \boldsymbol{S})$ associated with $P(\boldsymbol{x} \mid \boldsymbol{s})$, the conditional output distribution for a given input $\boldsymbol{s}$. Both entropies are evaluated as Monte Carlo averages over the associated distribution, i.e., $H(\boldsymbol{X}) = -\langle \ln P(\boldsymbol{x}) \rangle$ and $H(\boldsymbol{X} \mid \boldsymbol{S}) = -\langle \ln P(\boldsymbol{x} \mid \boldsymbol{s}) \rangle$, where the notation $\langle \cdot \rangle$ denotes an average with respect to the joint distribution $P(\boldsymbol{s}, \boldsymbol{x})$. The key insights of PWS are that the conditional probability $P(\boldsymbol{x} \mid \boldsymbol{s})$ can be directly evaluated from a generative model of the system, and that the marginal probability $P(\boldsymbol{x})$ can be computed efficiently via marginalization using Monte Carlo procedures inspired by computational statistical physics.
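Schematically, a PWS estimate therefore averages the log-likelihood ratio $\ln P(\boldsymbol{x} \mid \boldsymbol{s}) - \ln P(\boldsymbol{x})$ over samples from the joint distribution. The sketch below conveys this structure using a hypothetical model object that exposes the three required ingredients; it is not the implementation of any specific PWS variant.

```python
import numpy as np

def pws_mutual_information(model, n_samples=1000):
    """Schematic PWS estimate of I(S;X) = H(X) - H(X|S) (Eq. 1.4).
    `model` is assumed to provide:
      sample_joint()          -> one pair (s, x) drawn from P(s, x)
      log_p_conditional(x, s) -> ln P(x | s), exact from the generative model
      log_p_marginal(x)       -> ln P(x), obtained by marginalization (Eq. 1.5)"""
    estimates = []
    for _ in range(n_samples):
        s, x = model.sample_joint()
        estimates.append(model.log_p_conditional(x, s) - model.log_p_marginal(x))
    return float(np.mean(estimates))  # mutual information in nats
```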
The crux of PWS lies in the efficient computation of $P(\boldsymbol{x})$ via the marginalization integral

$$ P(\boldsymbol{x}) = \int \mathcal{D}\boldsymbol{s}\; P(\boldsymbol{x} \mid \boldsymbol{s})\, P(\boldsymbol{s}) \,. \tag{1.5} $$
To evaluate this integral efficiently, we present different variants of PWS. In Chapter 2 we introduce Direct PWS, the simplest variant of PWS, where Equation 1.5 is computed via a “brute-force” Monte Carlo approach that works well for short trajectories, but which becomes exponentially harder for long trajectories. In Chapter 3, we present two additional variants of PWS that evaluate the marginalization integral more efficiently, RR-PWS and TI-PWS. Rosenbluth-Rosenbluth PWS (RR-PWS) is based on efficient free-energy estimation techniques developed in polymer physics [60–63]. Thermodynamic integration PWS (TI-PWS) uses techniques from transition path sampling to derive an MCMC sampler in trajectory space [64]. From this MCMC chain, we can compute the marginalization integral using thermodynamic integration [63,65,66]. Finally, in Chapter 6, we introduce a fourth marginalization technique based on variational inference via neural networks [67]. Its conceptual simplicity, coupled with powerful marginalization methods, makes PWS a versatile framework for computing the trajectory mutual information in a variety of scenarios.
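As an illustration of the brute-force approach of Direct PWS, the marginalization integral of Equation 1.5 can be estimated by averaging the conditional path likelihood over input trajectories drawn from $P(\boldsymbol{s})$, which is numerically best done in log space. The sketch below assumes user-supplied routines for sampling inputs and evaluating $\ln P(\boldsymbol{x} \mid \boldsymbol{s})$; their names are placeholders.

```python
import numpy as np
from scipy.special import logsumexp

def log_p_marginal_direct(x, sample_input, log_p_conditional, n_inputs=1000):
    """Brute-force Monte Carlo estimate of ln P(x) (Eq. 1.5): draw inputs s ~ P(s)
    and average the conditional likelihoods P(x|s) using a stable log-sum-exp."""
    log_weights = np.array([log_p_conditional(x, sample_input())
                            for _ in range(n_inputs)])
    return logsumexp(log_weights) - np.log(n_inputs)
```

The variance of this estimate grows rapidly with trajectory length, which is why the more sophisticated variants of Chapter 3 are needed.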
Yet, to compute the mutual information PWS requires evaluating the conditional trajectory probability $P(\boldsymbol{x} \mid \boldsymbol{s})$, which in turn requires a stochastic model defining a probability measure over trajectories. While (stochastic) mechanistic models of experimental systems are increasingly becoming available, the question remains whether PWS can be applied directly to experimental data when no such model is available. In Chapter 6, we show that machine learning can be used to construct a data-driven stochastic model that captures the trajectory statistics, i.e. $P(\boldsymbol{x} \mid \boldsymbol{s})$, enabling the application of PWS to experimental data.
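As a rough illustration of such a data-driven model, the sketch below parameterizes the conditional trajectory likelihood $P(\boldsymbol{x} \mid \boldsymbol{s})$ autoregressively with a recurrent neural network whose output defines a Gaussian distribution for each time step. This is a hypothetical minimal architecture written in PyTorch, not the specific model used in Chapter 6.

```python
import torch
import torch.nn as nn

class AutoregressiveTrajectoryModel(nn.Module):
    """Learns p(x_t | x_{<t}, s_{<=t}) as a Gaussian whose mean and log-std
    are produced by a GRU; summing the per-step log-probabilities gives an
    exact (model-based) ln P(x | s) for discretely sampled trajectories."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)  # per-step mean and log-std

    def log_prob(self, s, x):
        # s, x: tensors of shape (batch, T); each x_t is predicted from (s_t, x_{t-1})
        x_prev = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        features, _ = self.rnn(torch.stack([s, x_prev], dim=-1))
        mean, log_std = self.head(features).unbind(-1)
        return torch.distributions.Normal(mean, log_std.exp()).log_prob(x).sum(dim=1)
```

Training such a model by maximizing `log_prob` over recorded input-output pairs yields a generative model whose exact trajectory likelihood can then be plugged into the PWS estimator.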
We demonstrate the practical utility of PWS by calculating the trajectory mutual information for a range of systems. In Chapters 3 and 5, we study a minimal model for gene expression, showing that PWS can estimate the mutual information rate for this system more accurately than any previous technique. Using PWS, we reveal that the Gaussian approximation, though expected to hold due to the system’s linearity, does not provide an accurate estimate in this case. In Chapters 5 and 6 we extend our analysis to simple nonlinear models for information transmission, comparing PWS results against the Gaussian approximation; for these models, PWS is the first technique capable of accurately computing the trajectory mutual information. Moreover, in Chapter 4 we apply PWS to a complex stochastic model of bacterial chemotaxis, marking the first instance where the information rate for a system of this complexity can be computed exactly. Together, these examples demonstrate that an exact technique like PWS is indispensable for understanding information transmission in realistic scenarios.
1.1 Contributions of This Work
The main contributions of this thesis are as follows:
PWS: A novel framework for computing the trajectory mutual information: We introduce Path Weight Sampling, a computational framework for calculating the trajectory mutual information in dynamical stochastic systems. This framework is exact, applicable to both continuous- and discrete-time processes, and does not rely on any assumptions about the system’s dynamics. PWS and its main variants are described in Chapters 2 and 3.
Discovery of discrepancies between experiments and mathematical models of chemotaxis: We apply PWS to various systems, including the complex bacterial chemotaxis signaling network. By studying the information transmission rate of chemotaxis and comparing our results against those of Mattingly et al. [4], we find that the widely-used MWC model of chemotaxis cannot explain the experimental data. We find that the number of receptor clusters is smaller and that the size of these clusters is larger than hitherto believed. We describe and characterize this finding in Chapter 4.
Study of the accuracy of the Gaussian approximation for the information rate: In Chapter 5, we use PWS to quantitatively study the accuracy of the widely-used Gaussian approximation. Before PWS, no exact technique was available to obtain ground-truth results for the mutual information rate of nonlinear systems, and the accuracy of the Gaussian framework could not be evaluated. We reveal that the Gaussian model can be surprisingly inaccurate, even for linear reaction systems.
Neural networks for learning the stochastic dynamics from time-series data: In Chapter 6, we demonstrate that recent machine learning techniques can be employed to automatically learn the stochastic dynamics from experimental data. We show that by combining these learned models with PWS, it becomes possible to compute the trajectory mutual information directly from time-series data. This approach outperforms previous techniques, like the Gaussian approximation, for estimating information rates from data.
1.2 Thesis Outline
The remainder of this thesis is divided into five chapters. We first present three variants of PWS, all of which compute the conditional entropy $H(\boldsymbol{X} \mid \boldsymbol{S})$ in the same manner, but differ in the way the Monte Carlo averaging procedure for computing the marginal probability $P(\boldsymbol{x})$ is carried out. Chapters 2 to 4 of this thesis have been published previously in Physical Review X.3
In Chapter 2 we present the simplest PWS variant, Direct PWS (DPWS). To compute $P(\boldsymbol{x})$, DPWS performs a brute-force average of the path likelihoods $P(\boldsymbol{x} \mid \boldsymbol{s})$ over the input trajectories $\boldsymbol{s}$. While we show that this scheme works for simple systems, the brute-force Monte Carlo averaging procedure becomes more difficult for larger systems and exponentially harder for longer trajectories.
In Chapter 3, we present our second and third variants of PWS, which are based on the realization that the marginal probability $P(\boldsymbol{x})$ is akin to a partition function. These schemes leverage techniques for computing free energies from statistical physics. We also apply PWS to a minimal model system consisting of a pair of coupled birth-death processes, which allows us to compare the efficiency of the three PWS variants, as well as to compare the PWS results against analytical results from the Gaussian approximation [36].
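For reference, such a pair of coupled birth-death processes can be simulated with the standard Gillespie algorithm. The sketch below implements a generic version of this kind of model; the reaction scheme and rate names are illustrative and not necessarily the parameterization used in Chapter 3.

```python
import numpy as np

def gillespie_birth_death(kappa, lam, rho, mu, t_end, rng=None):
    """Gillespie simulation of two coupled birth-death processes:
    input S:  birth at rate kappa,   death at rate lam * s
    output X: birth at rate rho * s, death at rate mu * x"""
    if rng is None:
        rng = np.random.default_rng()
    t, s, x = 0.0, 0, 0
    times, traj_s, traj_x = [t], [s], [x]
    while t < t_end:
        rates = np.array([kappa, lam * s, rho * s, mu * x])
        total = rates.sum()
        if total == 0.0:
            break
        t += rng.exponential(1.0 / total)
        reaction = rng.choice(4, p=rates / total)
        if reaction == 0:
            s += 1
        elif reaction == 1:
            s -= 1
        elif reaction == 2:
            x += 1
        else:
            x -= 1
        times.append(t); traj_s.append(s); traj_x.append(x)
    return np.array(times), np.array(traj_s), np.array(traj_x)
```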
In Chapter 4, we apply PWS to the bacterial chemotaxis system, which is arguably the best characterized signaling system in biology. Mattingly et al. [4] recently argued that bacterial chemotaxis in shallow gradients is information limited. Yet, to compute the information rate from their experimental data they had to employ a Gaussian framework. PWS makes it possible to assess the accuracy of this approximation.
Chapter 5 is devoted to studying the accuracy of the Gaussian approximation for non-Gaussian systems. By understanding the limitations and strengths of the Gaussian approximation, this chapter aims to provide deeper insights into selecting the appropriate method for estimating the mutual information depending on the system.
Finally, in Chapter 6 we introduce ML-PWS, which combines recent machine learning models with PWS to compute the mutual information directly from data. This idea significantly extends the range of applications for PWS, since we no longer require a mechanistic model of the system. Instead, the stochastic model is automatically learned from the data.
Footnotes
1. In mathematical terms, this interplay between information and reward can be characterized by utility functions, which quantify the benefits of different actions based on available information [1,2].
2. In contrast to other correlation measures used in statistics, such as the Pearson correlation coefficient, the mutual information captures both linear and nonlinear dependencies between variables. Additionally, in contrast to other correlation measures, the mutual information satisfies the data processing inequality, which states that no type of post-processing can increase the mutual information between the input and output [20,21]. These properties make the mutual information uniquely suited for describing the fidelity of the input-output mapping in information-processing systems. Note, however, that a naïve use of the data processing inequality leads to seemingly contradictory results when applied to the stationary dynamics of processing cascades [22–24].
3. M. Reinhardt, G. Tkačik, and P. R. ten Wolde, Path Weight Sampling: Exact Monte Carlo Computation of the Mutual Information between Stochastic Trajectories, Phys. Rev. X 13, 041017 (2023) [68].