From Wikipedia, the free encyclopedia
In information theory, Gibbs' inequality is a statement about the information entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality. It was first presented by J. Willard Gibbs in the 19th century.
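Suppose that $P = \{p_1, \ldots, p_n\}$ and $Q = \{q_1, \ldots, q_n\}$ are discrete probability distributions. Then
$$-\sum_{i=1}^{n} p_i \log_2 p_i \le -\sum_{i=1}^{n} p_i \log_2 q_i$$
with equality if and only if $p_i = q_i$ for $i = 1, \ldots, n$.[1]: 68 Put in words, the information entropy of a distribution $P$ is less than or equal to its cross entropy with any other distribution $Q$.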
The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:[2]: 34
$$D_{\mathrm{KL}}(P \parallel Q) \equiv \sum_{i=1}^{n} p_i \log_2 \frac{p_i}{q_i} \ge 0.$$
Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
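As a numerical illustration of the statement, the following minimal sketch computes the entropy, the cross entropy and their difference in bits for one pair of distributions. The function names and the example distributions are illustrative choices, not part of the source; terms with $p_i = 0$ are skipped, following the usual convention $0 \log 0 = 0$.

```python
import math

def entropy_bits(p):
    """Shannon entropy -sum p_i log2 p_i, treating 0 * log 0 as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Cross entropy -sum p_i log2 q_i; infinite if some q_i = 0 where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a zero-probability outcome contributes nothing
        if qi == 0:
            return math.inf
        total -= pi * math.log2(qi)
    return total

def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence as cross entropy minus entropy."""
    return cross_entropy_bits(p, q) - entropy_bits(p)

# Example distributions (chosen so that every logarithm is an exact integer).
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(entropy_bits(p), cross_entropy_bits(p, q), kl_divergence_bits(p, q))
# prints: 1.5 1.75 0.25
```

Here the entropy (1.5 bits) is below the cross entropy (1.75 bits), and the gap of 0.25 bits is the divergence, which drops to zero when `q` is replaced by `p`.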
For simplicity, we prove the statement using the natural logarithm, denoted by $\ln$, since
$$\log_b a = \frac{\ln a}{\ln b},$$
so the particular logarithm base $b > 1$ that we choose only scales the relationship by the factor $1 / \ln b$.
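Let $I$ denote the set of all $i$ for which $p_i$ is non-zero. Then, since $\ln x \le x - 1$ for all $x > 0$, with equality if and only if $x = 1$, we have:
$$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \ge -\sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) = -\sum_{i \in I} q_i + \sum_{i \in I} p_i = -\sum_{i \in I} q_i + 1 \ge 0.$$
The last inequality is a consequence of the $p_i$ and $q_i$ being part of a probability distribution. Specifically, the $p_i$ with $i \in I$ sum to 1, since every excluded $p_i$ is zero. Some non-zero $q_i$, however, may have been excluded, since the index set is chosen according to which $p_i$ are non-zero. Therefore, the sum of the $q_i$ over $I$ may be less than 1.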
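The role of the index set can be seen in a small numerical sketch. The helper name `support_chain` and the example distributions are hypothetical; the code assumes $q_i > 0$ wherever $p_i > 0$, and it evaluates the left-hand side of the chain together with the quantity $1 - \sum_{i \in I} q_i$ that bounds it from below.

```python
import math

def support_chain(p, q):
    """Evaluate two links of the chain over I = {i : p_i > 0}.

    Returns (lhs, middle) where
      lhs    = -sum_{i in I} p_i * ln(q_i / p_i)
      middle = 1 - sum_{i in I} q_i
    Assumes q_i > 0 for every i in I.
    """
    support = [i for i, pi in enumerate(p) if pi > 0]
    lhs = -sum(p[i] * math.log(q[i] / p[i]) for i in support)
    middle = 1.0 - sum(q[i] for i in support)
    return lhs, middle

# q puts mass 0.5 on an outcome that p never produces,
# so the q_i over the support of p sum to only 0.5.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
lhs, middle = support_chain(p, q)
print(lhs, middle)
```

For this example the left-hand side equals $\ln 2 \approx 0.693$ and the middle quantity is 0.5, so both inequalities in the chain are strict.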
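So far, over the index set $I$, we have:
$$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \ge 0,$$
or equivalently
$$-\sum_{i \in I} p_i \ln q_i \ge -\sum_{i \in I} p_i \ln p_i.$$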
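Both sums can be extended to all $i = 1, \ldots, n$, i.e. including those with $p_i = 0$, by recalling that the expression $p \ln p$ tends to 0 as $p$ tends to 0, and $-\ln q$ tends to $\infty$ as $q$ tends to 0. We arrive at
$$-\sum_{i=1}^{n} p_i \ln q_i \ge -\sum_{i=1}^{n} p_i \ln p_i.$$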
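For equality to hold, we require

1. $\frac{q_i}{p_i} = 1$ for all $i \in I$, so that the equality $\ln \frac{q_i}{p_i} = \frac{q_i}{p_i} - 1$ holds, and
2. $\sum_{i \in I} q_i = 1$, which means $q_i = 0$ if $i \notin I$, that is, $q_i = 0$ if $p_i = 0$.

This can happen if and only if $p_i = q_i$ for $i = 1, \ldots, n$.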
The result can alternatively be proved using Jensen's inequality, the log sum inequality, or the fact that the Kullback–Leibler divergence is a form of Bregman divergence.
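Because $\log$ is a concave function, we have that:
$$\sum_i p_i \log \frac{q_i}{p_i} \le \log \sum_i p_i \frac{q_i}{p_i} = \log \sum_i q_i = 0,$$
where the first inequality is due to Jensen's inequality, and $Q$ being a probability distribution implies the last equality.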
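Furthermore, since $\log$ is strictly concave, the equality condition of Jensen's inequality gives equality when
$$\frac{q_1}{p_1} = \frac{q_2}{p_2} = \cdots = \frac{q_n}{p_n}$$
and
$$\sum_i q_i = 1.$$
Suppose that this common ratio is $\sigma$. Then
$$1 = \sum_i q_i = \sum_i \sigma p_i = \sigma,$$
where we use the fact that $P$ and $Q$ are probability distributions. Therefore, equality holds when $P = Q$.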
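The Jensen argument can be spot-checked numerically. The sketch below draws random pairs of strictly positive distributions (the sampling scheme, seed and function names are assumptions made for illustration) and confirms that $\sum_i p_i \ln(q_i / p_i)$ never exceeds 0 beyond rounding error, with the value 0 attained when the two distributions coincide.

```python
import math
import random

def random_distribution(n, rng):
    """Draw n positive weights and normalise them to sum to 1."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def jensen_lhs(p, q):
    """sum_i p_i * ln(q_i / p_i), assuming every p_i and q_i is positive."""
    return sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))

rng = random.Random(0)  # fixed seed so that the run is reproducible
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    # Jensen: the weighted mean of the logs is at most the log of the
    # weighted mean, which is ln(sum_i q_i) = ln 1 = 0.
    assert jensen_lhs(p, q) <= 1e-12

p = random_distribution(5, rng)
print(jensen_lhs(p, p))  # equality case: every ratio q_i / p_i is 1
```

A passing random check is only evidence, not a proof; it is meant to make the direction of the inequality and the equality case concrete.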
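Alternatively, it can be proved by noting that
$$q - p - p \ln \frac{q}{p} \ge 0$$
for all $p, q > 0$, with equality holding iff $p = q$. Then, summing over the states, we have
$$\sum_i \left( q_i - p_i - p_i \ln \frac{q_i}{p_i} \right) \ge 0,$$
with equality holding iff $P = Q$.
This is because the KL divergence is the Bregman divergence generated by the function $t \mapsto t \ln t$.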
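To make the Bregman connection concrete, the sketch below assumes the generator $f(t) = t \ln t$ (the negative-entropy function, a standard choice that is stated here as an assumption) and checks that the scalar Bregman divergence $f(p) - f(q) - f'(q)(p - q)$ coincides with the pointwise expression $q - p - p \ln(q/p)$. All names and example values are illustrative.

```python
import math

def f(t):
    """Assumed generator: f(t) = t * ln t, defined for t > 0."""
    return t * math.log(t)

def f_prime(t):
    """Derivative of the assumed generator: f'(t) = ln t + 1."""
    return math.log(t) + 1.0

def bregman(p, q):
    """Scalar Bregman divergence f(p) - f(q) - f'(q) * (p - q)."""
    return f(p) - f(q) - f_prime(q) * (p - q)

def pointwise_term(p, q):
    """The expression q - p - p * ln(q / p) used in the proof."""
    return q - p - p * math.log(q / p)

P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]
total = sum(bregman(p, q) for p, q in zip(P, Q))
kl_nats = sum(p * math.log(p / q) for p, q in zip(P, Q))
# The two values agree up to rounding because sum(P) == sum(Q) == 1,
# so the (q_i - p_i) terms cancel in the sum.
print(total, kl_nats)
```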
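The entropy of $P$ is bounded by:[1]: 68
$$H(p_1, \ldots, p_n) \le \log n.$$
This follows from Gibbs' inequality by setting $q_i = 1/n$ for all $i$, since the cross entropy of $P$ with the uniform distribution is $\log n$.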
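The bound can be illustrated with a short sketch in natural-log units. The function names and the example distribution are illustrative; the second function evaluates the cross entropy against the uniform distribution $q_i = 1/n$, which should equal $\ln n$ for any $P$.

```python
import math

def entropy_nats(p):
    """Shannon entropy in nats, treating 0 * ln 0 as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy_with_uniform(p):
    """Cross entropy -sum p_i ln(1/n) against the uniform distribution."""
    n = len(p)
    return -sum(pi * math.log(1.0 / n) for pi in p)

p = [0.7, 0.2, 0.1]
print(entropy_nats(p), cross_entropy_with_uniform(p), math.log(len(p)))
```

The first printed value is strictly smaller than the other two, which agree up to rounding; the entropy reaches $\ln n$ only for the uniform distribution itself.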