最大期望算法

歷史

最大期望值算法由亞瑟·P·丹普斯特（英語：Arthur P. Dempster），南·萊爾德（英語：Nan Laird）和唐納德·魯賓（英語：Donald Rubin）在他們1977年發表的經典論文中提出。他們指出此方法之前其實已經被很多作者「在他們特定的研究領域中多次提出過」。

介紹

EM算法用於在方程不能直接求解的情況下尋找統計模型的(局部)最大似然參數。這些模型中較為典型的是含有潛變量，未知參數並且已知觀測數據的模型。也就是說，要麼數據中存在缺失的值，要麼模型可以通過假設存在更多未觀測到的數據點來更簡單地表示。以混合模型（Mixture Model）為例，通過假設每個觀察到的數據點都有一個對應的未觀察到的數據點，也可以說是潛在變量，來指定每個數據點所屬的混合部分，這樣就可以更簡單地描述混合模型。

EM簡單教程

EM是一個在已知部分相關變量的情況下，估計未知變量的迭代技術。EM的算法流程如下：

初始化分佈參數
重複直到收斂：
1. E步驟：根據參數的假設值，給出未知變量的期望值估計，應用於缺失值。
2. M步驟：根據未知變量的估計值，給出當前的參數的極大似然估計。

最大期望值過程說明

我們用 ${\textbf {y}}$ 表示能夠觀察到的不完整的變量值，用 ${\textbf {x}}$ 表示無法觀察到的變量值，這樣 ${\textbf {x}}$ 和 ${\textbf {y}}$ 一起組成了完整的數據。 ${\textbf {x}}$ 可能是實際測量丟失的數據，也可能是能夠簡化問題的隱藏變量，如果它的值能夠知道的話。例如，在混合模型中，如果「產生」樣本的混合元素成分已知的話最大似然公式將變得更加便利（參見下面的例子）。

估計無法觀測的數據

讓 $p\,$ 代表矢量 $\theta$ : $p(\mathbf {y} ,\mathbf {x} |\theta )$ 定義的參數的全部數據的概率密度函數（連續情況下）或者概率質量函數（離散情況下），那麼從這個函數就可以得到全部數據的最大似然值，另外，在給定的觀察到的數據條件下未知數據的條件分佈可以表示為：

p(\mathbf {x} |\mathbf {y} ,\theta )={\frac {p(\mathbf {y} ,\mathbf {x} |\theta )}{p(\mathbf {y} |\theta )}}={\frac {p(\mathbf {y} |\mathbf {x} ,\theta )p(\mathbf {x} |\theta )}{\int p(\mathbf {y} |\mathbf {x} ,\theta )p(\mathbf {x} |\theta )d\mathbf {x} }}

參考文獻

Arthur Dempster, Nan Laird, and Donald Rubin. "Maximum likelihood from incomplete data via the EM algorithm". Journal of the Royal Statistical Society, Series B, 39 (1):1–38, 1977 [1].
Robert Hogg, Joseph McKean and Allen Craig. Introduction to Mathematical Statistics. pp. 359-364. Upper Saddle River, NJ: Pearson Prentice Hall, 2005.
Radford Neal, Geoffrey Hinton. "A view of the EM algorithm that justifies incremental, sparse, and other variants". In Michael I. Jordan (editor), Learning in Graphical Models pp 355-368. Cambridge, MA: MIT Press, 1999.
The on-line textbook: Information Theory, Inference, and Learning Algorithms （頁面存檔備份，存於互聯網檔案館），by David J.C. MacKay includes simple examples of the E-M algorithm such as clustering using the soft K-means algorithm, and emphasizes the variational view of the E-M algorithm.
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models （頁面存檔備份，存於互聯網檔案館），by J. Bilmes includes a simplified derivation of the EM equations for Gaussian Mixtures and Gaussian Mixture Hidden Markov Models.
Information Geometry of the EM and em Algorithms for Neural Networks （頁面存檔備份，存於互聯網檔案館），by Shun-Ichi Amari give a view of EM algorithm from geometry view point.