Gibbs' inequality

Suppose that $P=\{p_{1},\ldots ,p_{n}\}$ and $Q=\{q_{1},\ldots ,q_{n}\}$ are discrete probability distributions. Then

-\sum _{i=1}^{n}p_{i}\log p_{i}\leq -\sum _{i=1}^{n}p_{i}\log q_{i}

with equality if and only if $p_{i}=q_{i}$ for $i=1,\dots ,n$.^[1]^: 68 In words, the information entropy of a distribution $P$ is less than or equal to its cross entropy with any other distribution $Q$.

The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:^[2]^: 34

D_{\mathrm {KL} }(P\|Q)\equiv \sum _{i=1}^{n}p_{i}\log {\frac {p_{i}}{q_{i}}}\geq 0.

The choice of logarithm base is free. With base-2 logarithms, the quantity on each side of the inequality can be read as an "average surprisal" measured in bits.
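As an illustration, the three quantities can be evaluated numerically. The following is a minimal sketch, not a reference implementation: the function names `entropy`, `cross_entropy` and `kl_divergence` and the example distributions are chosen here for illustration, base-2 logarithms are assumed, and terms with $p_{i}=0$ are skipped.

```python
import math

def entropy(p):
    """Entropy -sum p_i log2 p_i in bits; terms with p_i = 0 are skipped."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy -sum p_i log2 q_i in bits.

    Returns infinity if some q_i = 0 while p_i > 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # term contributes nothing
        if qi == 0:
            return math.inf
        total -= pi * math.log2(qi)
    return total

def kl_divergence(p, q):
    """Difference between cross entropy and entropy, in bits."""
    return cross_entropy(p, q) - entropy(p)

# Illustrative distributions (assumed for this example only).
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

h = entropy(p)            # 1.5 bits
ce = cross_entropy(p, q)  # 1.75 bits
d = kl_divergence(p, q)   # 0.25 bits
print(h, ce, d)
```

For these values the entropy (1.5 bits) is below the cross entropy (1.75 bits), and the gap of 0.25 bits is the Kullback–Leibler divergence.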

Proof

For simplicity, we prove the statement using the natural logarithm, denoted by $\ln$, since

\log _{b}a={\frac {\ln a}{\ln b}},

so the particular logarithm base $b>1$ that we choose only scales the relationship by the factor $1/\ln b$.
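The scaling can be checked numerically. This is a small sketch under stated assumptions: the helper `kl` and the two distributions are hypothetical, and every $q_{i}$ is taken to be positive so that no logarithm of zero occurs.

```python
import math

def kl(p, q, log):
    """Sum of p_i * (log p_i - log q_i) over indices with p_i > 0.

    `log` is the logarithm function to use; q_i > 0 is assumed.
    """
    return sum(pi * (log(pi) - log(qi)) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions (assumed for this example only).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

kl_nats = kl(p, q, math.log)   # natural logarithm
kl_bits = kl(p, q, math.log2)  # base-2 logarithm

# Changing the base from e to b = 2 divides the value by ln 2.
print(kl_bits, kl_nats / math.log(2))
```

Both printed values agree up to rounding, and the sign of the quantity does not depend on the base.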

Let $I$ denote the set of all $i$ for which $p_{i}$ is non-zero. Then, since $\ln x\leq x-1$ for all $x>0$, with equality if and only if $x=1$, we have:

-\sum _{i\in I}p_{i}\ln {\frac {q_{i}}{p_{i}}}\geq -\sum _{i\in I}p_{i}\left({\frac {q_{i}}{p_{i}}}-1\right)

=-\sum _{i\in I}q_{i}+\sum _{i\in I}p_{i}=-\sum _{i\in I}q_{i}+1\geq 0

The last inequality holds because the $p_{i}$ and the $q_{i}$ each form a probability distribution. The $p_{i}$ with $i\in I$ are exactly the non-zero $p_{i}$, so they sum to 1. The $q_{i}$ with $i\in I$, however, may omit some non-zero $q_{i}$, because $I$ is defined by the $p_{i}$ being non-zero. Their sum may therefore be less than 1.
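A case in which the sum of the $q_{i}$ over $I$ is strictly less than 1 can be worked through numerically. The sketch below uses distributions assumed for illustration only; the variable names are not from the source.

```python
import math

# P puts no mass on the third outcome, while Q does.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

# Index set I: positions where p_i is non-zero.
support = [i for i, pi in enumerate(p) if pi > 0]

# Left-hand side of the chain: -sum_{i in I} p_i ln(q_i / p_i).
lhs = -sum(p[i] * math.log(q[i] / p[i]) for i in support)

# Lower bound from ln x <= x - 1: 1 - sum_{i in I} q_i.
bound = 1.0 - sum(q[i] for i in support)

print(lhs, bound)
```

Here the left-hand side equals $\ln 2$ and the bound equals $1/2$, which is strictly positive because half of the mass of $Q$ lies outside $I$.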

So far, over the index set $I$ , we have:

-\sum _{i\in I}p_{i}\ln {\frac {q_{i}}{p_{i}}}\geq 0

or equivalently

-\sum _{i\in I}p_{i}\ln q_{i}\geq -\sum _{i\in I}p_{i}\ln p_{i}

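Both sums can be extended to all $i=1,\ldots ,n$, i.e. including $p_{i}=0$, with the convention that terms with $p_{i}=0$ contribute 0, which is justified because the expression $p\ln p$ tends to 0 as $p$ tends to 0. If $q_{i}=0$ for some $i\in I$, the left-hand side is infinite, because $(-\ln q)$ tends to $\infty$ as $q$ tends to 0, and the inequality holds trivially. We arrive at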

-\sum _{i=1}^{n}p_{i}\ln q_{i}\geq -\sum _{i=1}^{n}p_{i}\ln p_{i}

For equality to hold, we require

${\frac {q_{i}}{p_{i}}}=1$ for all $i\in I$ so that the equality $\ln {\frac {q_{i}}{p_{i}}}={\frac {q_{i}}{p_{i}}}-1$ holds,
and $\sum _{i\in I}q_{i}=1$ which means $q_{i}=0$ if $i\notin I$ , that is, $q_{i}=0$ if $p_{i}=0$ .

This can happen if and only if $p_{i}=q_{i}$ for $i=1,\ldots ,n$ .

Alternative proofs

The result can alternatively be proved using Jensen's inequality, the log sum inequality, or the fact that the Kullback–Leibler divergence is a form of Bregman divergence.

Proof by Jensen's inequality

Because $\log$ is a concave function, we have that:

\sum _{i}p_{i}\log {\frac {q_{i}}{p_{i}}}\leq \log \sum _{i}p_{i}{\frac {q_{i}}{p_{i}}}=\log \sum _{i}q_{i}=0

where the inequality follows from Jensen's inequality, and the last equality holds because $q$ is a probability distribution. Here all $p_{i}$ are assumed to be positive, so that each ratio $q_{i}/p_{i}$ is defined.

Furthermore, since $\log$ is strictly concave, by the equality condition of Jensen's inequality we get equality when

{\frac {q_{1}}{p_{1}}}={\frac {q_{2}}{p_{2}}}=\cdots ={\frac {q_{n}}{p_{n}}}

and

\sum _{i}q_{i}=1

Denoting this common ratio by $\sigma$, we have

1=\sum _{i}q_{i}=\sum _{i}\sigma p_{i}=\sigma

where we use the fact that $p$ and $q$ are probability distributions. Therefore $\sigma =1$, and equality holds exactly when $p=q$.
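The Jensen step can be illustrated numerically by comparing the weighted mean of the logarithms with the logarithm of the weighted mean. This is a sketch with assumed example distributions, both strictly positive; the variable names are hypothetical.

```python
import math

# Illustrative distributions (assumed for this example only).
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

ratios = [qi / pi for pi, qi in zip(p, q)]

# E_p[log(q/p)]: weighted mean of the logarithms.
mean_of_logs = sum(pi * math.log(r) for pi, r in zip(p, ratios))

# log(E_p[q/p]) = log(sum of q_i), which is log 1 = 0 up to rounding.
log_of_mean = math.log(sum(pi * r for pi, r in zip(p, ratios)))

print(mean_of_logs, log_of_mean)
```

The first value is negative and the second is zero up to floating-point rounding, in line with the inequality. Since the ratios are not all equal here, the inequality is strict.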

Proof by Bregman divergence

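Alternatively, it can be proved by noting that $q-p-p\ln {\frac {q}{p}}\geq 0$ for all $p,q>0$, with equality holding iff $p=q$. Summing over the states gives $\sum _{i}\left(q_{i}-p_{i}-p_{i}\ln {\frac {q_{i}}{p_{i}}}\right)\geq 0$, with equality holding iff $p=q$. Since $\sum _{i}q_{i}=\sum _{i}p_{i}=1$, the left-hand side reduces to $-\sum _{i}p_{i}\ln {\frac {q_{i}}{p_{i}}}$, which is Gibbs' inequality.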
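The pointwise bound and its sum can be checked on an example. This is a minimal sketch: the function name `bregman_term` and the distributions are assumptions made for illustration, and all entries are taken to be strictly positive.

```python
import math

def bregman_term(p, q):
    """Pointwise quantity q - p - p ln(q/p), defined for p, q > 0."""
    return q - p - p * math.log(q / p)

# Illustrative distributions (assumed for this example only).
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

terms = [bregman_term(pi, qi) for pi, qi in zip(p, q)]

# Direct evaluation of sum p_i ln(p_i / q_i) for comparison.
kl_nats = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(terms, sum(terms), kl_nats)
```

Each term is non-negative on its own, and because both distributions sum to 1, the total agrees with the directly computed divergence up to rounding.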