Info-metrics

Info-metrics is an interdisciplinary approach to scientific modeling, inference and efficient information processing. It is the science of modeling, reasoning, and drawing inferences under conditions of noisy and limited information. From the point of view of the sciences, this framework is at the intersection of information theory, statistical methods of inference, applied mathematics, computer science, econometrics, complexity theory, decision analysis, modeling, and the philosophy of science.

Info-metrics provides a constrained optimization framework for tackling under-determined or ill-posed problems, that is, problems where there is not sufficient information to find a unique solution. Such problems are very common across all sciences, because the available information is incomplete, limited, noisy and uncertain. Info-metrics is useful for modeling, information processing, theory building, and inference problems across the scientific spectrum. The info-metrics framework can also be used to test hypotheses about competing theories or causal mechanisms.

Info-metrics evolved from the classical maximum entropy formalism, which is based on the work of Shannon. Early contributions were mostly in the natural and mathematical/statistical sciences. From the mid-1980s, and especially in the mid-1990s, the maximum entropy approach was generalized and extended to handle a larger class of problems in the social and behavioral sciences, especially complex problems and data. The word ‘info-metrics’ was coined in 2009 by Amos Golan, right before the interdisciplinary Info-Metrics Institute was inaugurated.

Consider a random variable ${\textstyle X}$ that can result in one of K distinct outcomes. The probability ${\textstyle p_{k}}$ of each outcome ${\textstyle x_{k}}$ is ${\textstyle p_{k}=p(x_{k})}$ for ${\textstyle k=1,2,\ldots ,K}$ . Thus, ${\textstyle P}$ is a K-dimensional probability distribution defined for ${\textstyle X}$ such that $p_{k}\in [0,1]$ and ${\textstyle \sum _{k}p_{k}=1}$ . Define the informational content of a single outcome ${\textstyle x_{k}}$ to be ${\textstyle h(x_{k})=h(p_{k})=\log _{2}(1/p_{k})}$ (e.g., Shannon). Observing an outcome at the tails of the distribution (a rare event) provides much more information than observing another, more probable, outcome. The entropy^[1] is the expected information content of an outcome of the random variable X whose probability distribution is P:

$H(P)=\sum _{k=1}^{K}p_{k}\log _{2}\left({\frac {1}{p_{k}}}\right)=-\sum _{k=1}^{K}p_{k}\log _{2}(p_{k})=\operatorname {E} \left[\log _{2}\left({\frac {1}{P(X)}}\right)\right]$

Here $p_{k}\log _{2}(p_{k})\equiv 0$ if $p_{k}=0$ , and $\operatorname {E}$ is the expectation operator.
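The two definitions above translate directly into a few lines of code. The following is a minimal sketch in Python; the function names `information_content` and `shannon_entropy` are chosen here for illustration and do not come from any particular library.

```python
import math


def information_content(p):
    """Information content h(p) = log2(1/p), in bits, of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.log2(1.0 / p)


def shannon_entropy(probs):
    """Entropy H(P) = -sum_k p_k log2(p_k), in bits.

    Terms with p_k = 0 are skipped, which implements the convention
    p_k log2(p_k) = 0 when p_k = 0.
    """
    if any(p < 0.0 for p in probs):
        raise ValueError("probabilities must be nonnegative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    # A rare outcome carries more information than a likely one.
    print(information_content(0.01), information_content(0.9))
    # A uniform distribution over six outcomes has entropy log2(6) bits.
    print(shannon_entropy([1.0 / 6.0] * 6))
```

A uniform distribution attains the largest entropy for a given K, while a distribution concentrated on a single outcome has entropy zero.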

Consider the problem of modeling and inferring the unobserved probability distribution of some K-dimensional discrete random variable given just the mean (expected value) of that variable. We also know that the probabilities are nonnegative and normalized (i.e., sum up to exactly 1). For all K > 2 the problem is underdetermined: there are K unknown probabilities but only two equations, so infinitely many distributions are consistent with the information. Within the info-metrics framework, the solution is to maximize the entropy of the random variable subject to the two constraints: mean and normalization. This yields the usual maximum entropy solution.

The solution to that problem can be extended and generalized in several ways. First, one can use another entropy instead of Shannon’s entropy. Second, the same approach can be used for continuous random variables, for all types of conditional models (e.g., regression, inequality and nonlinear models), and for many constraints. Third, priors can be incorporated within that framework. Fourth, the same framework can be extended to accommodate greater uncertainty: uncertainty about the observed values and/or uncertainty about the model itself. Last, the same basic framework can be used to develop new models/theories, validate these models using all available information, and test statistical hypotheses about the model.
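The step from the constrained problem to its solution can be made explicit with a Lagrangian. What follows is a sketch for the discrete, mean-constrained case only. The symbols $y$ (the given mean), $\lambda$ and $\mu$ (the Lagrange multipliers for the mean and normalization constraints) and $\Omega$ (the normalization factor) are labels assumed here for illustration, and logarithms are taken in base 2 as in the entropy definition. The Lagrangian is

$\mathcal{L}(P,\lambda ,\mu )=-\sum _{k=1}^{K}p_{k}\log _{2}(p_{k})+\lambda \left(y-\sum _{k=1}^{K}p_{k}x_{k}\right)+\mu \left(1-\sum _{k=1}^{K}p_{k}\right)$

Setting the derivative with respect to each $p_{k}$ to zero gives

$-\log _{2}(p_{k})-{\frac {1}{\ln 2}}-\lambda x_{k}-\mu =0\quad \Rightarrow \quad p_{k}=2^{-\mu -1/\ln 2}\,2^{-\lambda x_{k}}$

The factor that does not depend on $k$ is fixed by normalization, so that

$p_{k}={\frac {2^{-\lambda x_{k}}}{\Omega (\lambda )}},\qquad \Omega (\lambda )=\sum _{j=1}^{K}2^{-\lambda x_{j}}$

and $\lambda$ is the value for which ${\textstyle \sum _{k}p_{k}x_{k}=y}$ . When $\lambda =0$ the mean constraint is not binding and the solution is the uniform distribution ${\textstyle p_{k}=1/K}$ .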

Six-sided dice

Inference based on information resulting from repeated independent experiments.

The following example is attributed to Boltzmann and was further popularized by Jaynes. Consider a six-sided die, where tossing the die is the event and the distinct outcomes are the numbers 1 through 6 on the upper face of the die. The experiment consists of independent repetitions of tossing the same die. Suppose you observe only the empirical mean value, y, of N tosses of a six-sided die. Given that information, you want to infer the probability that each face will show up in the next toss of the die. You also know that the probabilities must sum to 1. Maximizing the entropy (using log base 2) subject to these two constraints (mean and normalization) yields the most uninformed solution:

${\begin{aligned}&{\underset {\{P\}}{\text{maximize}}}&&H(P)=-\sum _{k=1}^{6}p_{k}\log _{2}(p_{k})\\&{\text{subject to}}&&\sum _{k}p_{k}x_{k}=y{\text{ and }}\sum _{k}p_{k}=1\end{aligned}}$

for ${\textstyle x_{k}=k}$ and ${\textstyle k=1,2,\ldots ,6}$ . The solution is

${\widehat {p}}_{k}={\frac {2^{-{\widehat {\lambda }}x_{k}}}{\sum _{j=1}^{6}2^{-{\widehat {\lambda }}x_{j}}}}\equiv {\frac {2^{-{\widehat {\lambda }}x_{k}}}{\Omega }}$

where ${\textstyle {\widehat {p}}_{k}}$ is the inferred probability of outcome ${\textstyle k}$ , ${\textstyle {\widehat {\lambda }}}$ is the inferred Lagrange multiplier associated with the mean constraint, and ${\textstyle \Omega }$ is the partition (normalization) function. For a fair die with a mean of 3.5, one would expect all faces to be equally likely, and this is exactly what the maximum entropy solution gives. If the die is unfair (or loaded) with a mean of 4, the resulting maximum entropy solution is ${\textstyle p_{k}=(0.103,0.123,0.146,0.174,0.207,0.247)}$ . For comparison, minimizing the least squares criterion ${\textstyle \left(\sum _{k=1}^{6}p_{k}^{2}\right)}$ subject to the same two constraints, instead of maximizing the entropy, yields ${\textstyle p_{k}(LS)=(0.095,0.124,0.152,0.181,0.210,0.238)}$ .
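The die example can be reproduced numerically. The sketch below is one possible way to do it in plain Python, not a reference implementation. It finds the multiplier by bisection, relying on the fact that the mean implied by the exponential-form solution decreases as the multiplier increases. The function names, the bracketing interval [-50, 50] and the tolerance are assumptions made for this illustration. The least squares comparison uses the closed form obtained from the same two constraints, in which each probability is linear in the face value. It does not enforce nonnegativity, which happens to hold for a mean of 4.

```python
def maxent_die(mean, faces=(1, 2, 3, 4, 5, 6), tol=1e-12):
    """Maximum entropy probabilities for a die with a given mean.

    Returns (lam, probs) with probs[k] = 2**(-lam * x_k) / Omega.
    """
    if not min(faces) < mean < max(faces):
        raise ValueError("mean must lie strictly between the smallest and largest face")

    def probs(lam):
        weights = [2.0 ** (-lam * x) for x in faces]
        omega = sum(weights)  # partition (normalization) function
        return [w / omega for w in weights]

    def implied_mean(lam):
        return sum(p * x for p, x in zip(probs(lam), faces))

    # The implied mean is strictly decreasing in lam, so bisection applies.
    lo, hi = -50.0, 50.0  # assumed bracket, wide enough for a six-sided die
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if implied_mean(mid) > mean:
            lo = mid  # mean still too high: a larger multiplier is needed
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, probs(lam)


def least_squares_die(mean, faces=(1, 2, 3, 4, 5, 6)):
    """Minimize sum p_k**2 subject to the mean and normalization constraints.

    The first-order conditions give p_k = a + b * x_k; a and b solve the
    two linear constraint equations. Nonnegativity is not enforced.
    """
    n = len(faces)
    s1 = sum(faces)
    s2 = sum(x * x for x in faces)
    det = n * s2 - s1 * s1
    a = (s2 - s1 * mean) / det
    b = (n * mean - s1) / det
    return [a + b * x for x in faces]


if __name__ == "__main__":
    lam, p = maxent_die(4.0)
    print("lambda:", round(lam, 4))
    print("maximum entropy:", [round(v, 3) for v in p])
    print("least squares:  ", [round(v, 3) for v in least_squares_die(4.0)])
```

For a mean of 4 the multiplier comes out negative, which tilts probability toward the higher faces, and both sets of probabilities should land close to the values quoted above. For a mean of 3.5 the multiplier is essentially zero and the solution is uniform.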

Some cross-disciplinary examples

Rainfall prediction: Using the expected daily rainfall (arithmetic mean), the maximum entropy framework can be used to infer and forecast the daily rainfall distribution.^[2]

Portfolio management: Suppose a portfolio manager needs to allocate assets, or assign portfolio weights to different assets, while taking into account the investor’s constraints and preferences. Using these preferences and constraints, together with observed information such as the market mean return and covariances of each asset over some time period, the entropy maximization framework can be used to find the optimal portfolio weights. In this case, the entropy of the portfolio represents its diversity. The framework can be modified to include other constraints, such as minimal variance or maximal diversity. Such a model involves inequalities and can be further generalized to include short sales. More such examples and related code can be found in ^[3]^[4].

An extensive list of work related to info-metrics can be found here: http://info-metrics.org/bibliography.html

[1] Shannon, Claude (1948). "A mathematical theory of communication". Bell System Technical Journal. 27: 379–423.
[2] Golan, Amos (2018). Foundations of Info-metrics: Modeling, Inference, and Imperfect Information. Oxford University Press.
[3] Bera, Anil K.; Park, Sung Y. (2008). "Optimal portfolio diversification using the maximum entropy principle". Econometric Reviews. 27 (4–6): 484–512.
[4] "Portfolio Allocation – Foundations of Info-Metrics". info-metrics.org.

Classics

Rudolf Clausius. "XI. On the nature of the motion which we call heat". The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 14 (91):108–127, 1857.
Ludwig Boltzmann. "Further studies on the thermal equilibrium of gas molecules (Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen)". Sitzungsberichte der Akademie der Wissenschaften, Mathematisch-Naturwissenschaftliche Klasse, pages 275–370, 1872.
J. W. Gibbs. Elementary principles in statistical mechanics. (New Haven, CT: Yale University Press), 1902.
C. E. Shannon. "A mathematical theory of communication". Bell System Technical Journal, 27:379–423, 1948.
Y. Alhassid and R. D. Levine. "Experimental and inherent uncertainties in the information theoretic approach". Chemical Physics Letters, 73 (1):16–20, 1980.
R. B. Ash. Information Theory. Interscience, New York, 1965.
A Caticha. Relative Entropy and Inductive Inference. 2004.
A Caticha. "Lectures on probability, entropy, and statistical physics". MaxEnt, Sao Paulo, Brazil, 2008.
Jan M. Van Campenhout and Thomas M. Cover. "Maximum entropy and conditional probability". IEEE Transactions on Information Theory, IT-27, No. 4, 1981.
I. Csiszar. "Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems". The Annals of Statistics, 19:2032–2066, 1991.
David Donoho, Hossein Kakavand, and James Mammen. "The simplest solution to an underdetermined system of linear equations". In Information Theory, 2006 IEEE International Symposium on, pages 1924–1928. IEEE, 2007.

Basic books and research monographs

Golan, Amos. Foundations of Info-metrics: Modeling, Inference, and Imperfect Information. Oxford University Press, 2018.
A. Golan. "Information and entropy econometrics – a review and synthesis". Foundations and Trends in Econometrics, 2(1-2):1–145, 2008.
R. D. Levine and M. Tribus. The Maximum Entropy Formalism. MIT Press, Cambridge, MA, 1979.
J. N. Kapur. Maximum Entropy Models in Science and Engineering. Wiley, 1993.
J. Harte. Maximum Entropy and Ecology: A Theory of Abundance, Distribution and Energetics. Oxford U Press, 2011.
A. Golan, G. Judge, and D. Miller. Maximum Entropy Econometrics: Robust Estimation with Limited Data. John Wiley & Sons, 1996.
E. T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, 2003.

Other representative applications

J. R. Banavar, A. Maritan, and I. Volkov. "Applications of the principle of maximum entropy: from physics to ecology". Journal of Physics-Condensed Matter, 22(6), 2010.
Anil K. Bera and Sung Y. Park. "Optimal portfolio diversification using the maximum entropy principle". Econometric Reviews, 27(4-6):484–512, 2008.
Bhati, B. Buyuksahin, and A. Golan. "Image reconstruction: An information theoretic approach". American Statistical Association Proceedings, 2005.
Peter W Buchen and Michael Kelly. "The maximum entropy distribution of an asset inferred from option prices". Journal of Financial and Quantitative Analysis, 31(01):143–159, 1996.
Randall C Campbell and R Carter Hill. "Predicting multinomial choices using maximum entropy". Economics Letters, 64(3):263–269, 1999.
Ariel Caticha and Amos Golan. "An entropic framework for modeling economies". Physica A: Statistical Mechanics and its Applications, 408:149–163, 2014.
Marsha Courchane, Amos Golan, and David Nickerson. "Estimation and evaluation of loan discrimination: An informational approach". Journal of Housing Research, 11(1):67–90, 2000.
Tsukasa Fujiwara and Yoshio Miyahara. "The minimal entropy martingale measures for geometric Lévy processes". Finance and Stochastics, 7(4):509–531, 2003.
Marco Frittelli. "The minimal entropy martingale measure and the valuation problem in incomplete markets". Mathematical Finance, 10(1):39–52, 2000.
D. Glennon and A. Golan. "A Markov model of bank failure estimated using an information-theoretic approach". Report, US Treasury, 2003.
A. Golan. "A multivariable stochastic theory of size distribution of firms with empirical evidence". Advances in Econometrics, 10:1–46, 1994.
A. Golan. "Modcomp model of compensation's effect on personnel retention – an information theoretic approach". Report, US Navy, February 2003.
Amos Golan and Volker Dose. "A generalized information theoretical approach to tomographic reconstruction". Journal of Physics A: Mathematical and General, 34(7):1271, 2001.
Bart Haegeman and Rampal S Etienne. "Entropy maximization and the spatial distribution of species". The American Naturalist, 175(4):E74–E90, 2010.
U. von Toussaint, A. Golan, and V. Dose. "Maximum entropy decomposition of quadrupole mass spectra". Journal of Vacuum Science and Technology A, 22(2):401–406, 2004.

"Info-Metrics Institute: Information-Theoretic Data Analysis and Exposition | American University, Washington, D.C." american.edu. Retrieved 2017-11-07.
"Center for Science of Information NSF STC". soihub.org. Retrieved 2017-11-07.
http://info-metrics.org/
