Tikhonov regularization

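Tikhonov regularization, named after Andrey Nikolayevich Tikhonov, is a method for estimating the coefficients of multiple-regression models when the independent variables are highly correlated.^[1] It has been used in many fields, including econometrics, chemistry, and engineering.^[2] It is the most commonly used method for regularizing ill-posed problems. In statistics the method is known as ridge regression; in machine learning it is known as weight decay. Because several mathematicians discovered it independently, it is also called the Tikhonov–Miller method, the Phillips–Twomey method, the constrained linear inversion method, or linear regularization. It is also related to the Levenberg–Marquardt algorithm for non-linear least squares problems. The method is particularly useful for mitigating multicollinearity in linear regression, which commonly arises in models with a large number of parameters.^[3] In general, it improves the efficiency of parameter estimation in exchange for a tolerable amount of bias (see bias–variance tradeoff).^[4]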

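The theory was first introduced in 1970 by Hoerl and Kennard in two Technometrics papers, "Ridge Regression: Biased Estimation for Nonorthogonal Problems" and "Ridge Regression: Applications to Nonorthogonal Problems".^[5]^[6]^[1] This was the result of ten years of research into the field of ridge analysis.^[7]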

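Ridge regression works by constructing a ridge regression (RR) estimator. It was developed as a possible remedy for the imprecision of least-squares estimates when a linear regression model has multicollinear (highly correlated) independent variables. It provides a more precise estimate of the parameters, because its variance and mean squared error are often smaller than those of the previously derived least-squares estimators.^[8]^[2]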

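When solving an overdetermined problem (that is, $A_{m\times n}x=b$ with $m>n$), the matrix $A^{H}A$ (the Gram matrix of $A$, sometimes loosely called its covariance matrix) may be singular or nearly singular. In that case the least-squares solution ${\hat {x}}_{LS}=(A^{H}A)^{-1}A^{H}b$ may diverge or give an unreasonable approximation of $x$. To address this, Tikhonov proposed in 1963 to modify the least-squares cost function with a regularization term. The modified cost function is: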

$J(x)={\frac {1}{2}}(\lVert Ax-b\rVert _{2}^{2}+\lambda \lVert x\rVert _{2}^{2})$

where $\lambda \geq 0$ is called the regularization parameter.^[9] This method is known as Tikhonov regularization.

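Overview

In the simplest case, the problem of a near-singular moment matrix $(\mathbf {X} ^{\mathsf {T}}\mathbf {X} )$ is alleviated by adding positive elements to its main diagonal, which decreases its condition number. Analogous to the ordinary least squares estimator, the simple ridge estimator is given by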

{\hat {\beta }}_{R}=(\mathbf {X} ^{\mathsf {T}}\mathbf {X} +\lambda \mathbf {I} )^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y}

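where $\mathbf {y}$ is the regressand, $\mathbf {X}$ is the design matrix, $\mathbf {I}$ is the identity matrix, and the ridge parameter $\lambda \geq 0$ is a constant shift of the diagonal of the moment matrix.^[10] It can be shown that this estimator is the solution to the least squares problem subject to the constraint $\beta ^{\mathsf {T}}\beta =c$, which can be expressed as a Lagrangian: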

\min _{\beta }\,(\mathbf {y} -\mathbf {X} \beta )^{\mathsf {T}}(\mathbf {y} -\mathbf {X} \beta )+\lambda (\beta ^{\mathsf {T}}\beta -c)

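This shows that $\lambda$ is nothing but the Lagrange multiplier of the constraint.^[11] In practice $\lambda$ is usually chosen by a heuristic criterion, so the constraint is not satisfied exactly. In particular, when $\lambda =0$, in which case the constraint is non-binding, the ridge estimator reduces to ordinary least squares. A more general approach to Tikhonov regularization is discussed below.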
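The closed-form ridge estimator above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `ridge_estimator`, the synthetic near-collinear data, and the value `lam = 1.0` are all assumptions made for the example.

```python
import numpy as np

def ridge_estimator(X, y, lam):
    """Return (X^T X + lam * I)^{-1} X^T y without forming the inverse."""
    moment = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(moment, X.T @ y)

# Hypothetical data: two nearly collinear regressors (assumed for illustration).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 1e-2 * rng.normal(size=50)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=50)

lam = 1.0  # assumed ridge parameter
cond_ols = np.linalg.cond(X.T @ X)
cond_ridge = np.linalg.cond(X.T @ X + lam * np.eye(2))
beta_ols = ridge_estimator(X, y, 0.0)    # lam = 0 recovers ordinary least squares
beta_ridge = ridge_estimator(X, y, lam)

print(f"condition number: {cond_ols:.3g} -> {cond_ridge:.3g}")
print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
```

Shifting the diagonal by `lam` lowers the condition number of the moment matrix and shrinks the norm of the coefficient vector relative to the least-squares estimate.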

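History

Tikhonov regularization was invented independently in many different contexts. Andrey Tikhonov^[12]^[13]^[14]^[15]^[16] and David L. Phillips^[17] were among the first to use it. The finite-dimensional case was worked out by Arthur E. Hoerl, who took a statistical approach,^[18] and by Manus Foster, who interpreted the method as a Wiener–Kolmogorov (Kriging) filter.^[19] Following Hoerl, the method is known in the statistical literature as ridge regression,^[20] named after the ridge-like shape along the diagonal of the identity matrix.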

Tikhonov regularization

Suppose that for a known matrix $A$ and vector $\mathbf {b}$, we wish to find a vector $\mathbf {x}$ such that

A\mathbf {x} =\mathbf {b} .

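The standard approach is ordinary least squares linear regression. However, if no $\mathbf {x}$ satisfies the equation, or more than one $\mathbf {x}$ does (that is, the solution is not unique), the problem is said to be ill-posed. In such cases, ordinary least squares estimation leads to an overdetermined or, more often, an underdetermined system of equations. Most real-world phenomena act as low-pass filters in the forward direction, where $A$ maps $\mathbf {x}$ to $\mathbf {b}$. Therefore, in solving the inverse problem, the inverse mapping acts as a high-pass filter with the undesirable tendency to amplify noise (eigenvalues or singular values are largest in the inverse mapping where they were smallest in the forward mapping). In addition, ordinary least squares implicitly nullifies every element of the reconstructed version of $\mathbf {x}$ that lies in the null space of $A$, rather than allowing a model to be used as a prior for $\mathbf {x}$. Ordinary least squares seeks to minimize the sum of squared residuals, which can be written compactly as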

\|A\mathbf {x} -\mathbf {b} \|_{2}^{2},

where $\|\cdot \|_{2}$ is the Euclidean norm.

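To give preference to a particular solution with desirable properties, a regularization term can be included in this minimization: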

\|A\mathbf {x} -\mathbf {b} \|_{2}^{2}+\|\Gamma \mathbf {x} \|_{2}^{2}

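for some suitably chosen Tikhonov matrix $\Gamma$. In many cases this matrix is chosen as a scalar multiple of the identity matrix ($\Gamma =\alpha I$), which gives preference to solutions with smaller norms; this is known as $L_{2}$ regularization.^[21] In other cases, if the underlying vector is believed to be mostly continuous, high-pass operators (such as a difference operator or a weighted discrete Fourier operator) may be used to enforce smoothness. This regularization improves the conditioning of the problem, which enables a direct numerical solution. An explicit solution, denoted by ${\hat {x}}$, is given by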

{\hat {x}}=(A^{\top }A+\Gamma ^{\top }\Gamma )^{-1}A^{\top }\mathbf {b} .

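The effect of regularization can be varied through the scale of the matrix $\Gamma$. For $\Gamma =0$ this reduces to the unregularized least-squares solution, provided that $(A^{\top }A)^{-1}$ exists.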
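The explicit solution can be checked numerically. The sketch below computes it in two ways: from the regularized normal equations, and by solving the equivalent stacked least-squares problem $\min \|[A;\Gamma ]\mathbf {x} -[\mathbf {b} ;0]\|_{2}^{2}$. The function names, the random test data, and the first-difference operator used as a smoothing $\Gamma$ are assumptions chosen for illustration.

```python
import numpy as np

def tikhonov_normal(A, b, Gamma):
    """Solve (A^T A + Gamma^T Gamma) x = A^T b."""
    return np.linalg.solve(A.T @ A + Gamma.T @ Gamma, A.T @ b)

def tikhonov_stacked(A, b, Gamma):
    """Minimize ||[A; Gamma] x - [b; 0]||^2 with an ordinary least-squares solver."""
    A_aug = np.vstack([A, Gamma])
    b_aug = np.concatenate([b, np.zeros(Gamma.shape[0])])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

def first_difference(n):
    """(n-1) x n first-difference operator, a simple high-pass choice of Gamma."""
    return np.diff(np.eye(n), axis=0)

# Hypothetical test problem (assumed for illustration).
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 10))
b = rng.normal(size=20)
alpha = 0.5

x_l2 = tikhonov_normal(A, b, alpha * np.eye(10))               # Gamma = alpha * I
x_smooth = tikhonov_normal(A, b, alpha * first_difference(10))  # smoothing Gamma
```

The stacked form avoids forming $A^{\top }A$ explicitly, which is often preferred numerically when $A$ is poorly conditioned.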

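Besides linear regression, $L_{2}$ regularization is used in many other contexts, such as classification with logistic regression or support vector machines,^[22] and matrix factorization.^[23]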

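Generalized Tikhonov regularization

For general multivariate normal distributions for $x$ and the data error, one can apply a transformation of variables to reduce the problem to the case above. Equivalently, one can seek an $x$ that minimizes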

\|Ax-b\|_{P}^{2}+\|x-x_{0}\|_{Q}^{2},

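where $\|x\|_{Q}^{2}$ denotes the squared weighted norm $x^{\top }Qx$ (compare with the Mahalanobis distance). In the Bayesian interpretation, $P$ is the inverse covariance matrix of $b$, $x_{0}$ is the expected value of $x$, and $Q$ is the inverse covariance matrix of $x$. The Tikhonov matrix is then given by a factorization $Q=\Gamma ^{\top }\Gamma$ (for example, the Cholesky factorization) and can be regarded as a whitening filter.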

This generalized problem has an optimal solution $x^{*}$, which can be written explicitly as

x^{*}=(A^{\top }PA+Q)^{-1}(A^{\top }Pb+Qx_{0}),

or equivalently, when $Q$ is not a null matrix:

x^{*}=x_{0}+(A^{\top }PA+Q)^{-1}(A^{\top }P(b-Ax_{0})).
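A short numerical check shows that the two expressions for $x^{*}$ agree and that $x^{*}$ is a stationary point of the weighted objective. This is a sketch under assumed data: the matrices `P` and `Q`, the prior mean `x0`, and the function names are hypothetical choices, with `P` diagonal and `Q` symmetric positive definite.

```python
import numpy as np

def generalized_tikhonov(A, b, P, Q, x0):
    """x* = (A^T P A + Q)^{-1} (A^T P b + Q x0)."""
    return np.linalg.solve(A.T @ P @ A + Q, A.T @ P @ b + Q @ x0)

def generalized_tikhonov_shifted(A, b, P, Q, x0):
    """Equivalent form: x* = x0 + (A^T P A + Q)^{-1} A^T P (b - A x0)."""
    return x0 + np.linalg.solve(A.T @ P @ A + Q, A.T @ P @ (b - A @ x0))

# Hypothetical problem data (assumed for illustration).
rng = np.random.default_rng(2)
A = rng.normal(size=(15, 5))
b = rng.normal(size=15)
P = np.diag(1.0 / rng.uniform(0.5, 2.0, size=15))  # inverse data covariance
M = rng.normal(size=(5, 5))
Q = M.T @ M + np.eye(5)                            # inverse prior covariance (SPD)
x0 = rng.normal(size=5)                            # prior mean

x_star = generalized_tikhonov(A, b, P, Q, x0)
x_alt = generalized_tikhonov_shifted(A, b, P, Q, x0)

# Tikhonov matrix from the Cholesky factorization: Q = Gamma^T Gamma.
Gamma = np.linalg.cholesky(Q).T

# Gradient of ||Ax - b||_P^2 + ||x - x0||_Q^2 evaluated at x_star.
gradient = 2 * A.T @ P @ (A @ x_star - b) + 2 * Q @ (x_star - x0)
```

`np.linalg.cholesky` returns the lower-triangular factor, so its transpose plays the role of $\Gamma$ here.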

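Lavrentyev regularization

In some situations one can avoid using the transpose $A^{\top }$, as proposed by Mikhail Lavrentyev.^[24] For example, if $A$ is symmetric positive definite, i.e. $A=A^{\top }>0$, then so is its inverse $A^{-1}$, which can therefore be used to set up the squared weighted norm $\|x\|_{P}^{2}=x^{\top }A^{-1}x$ in the generalized Tikhonov regularization. This leads to minimizing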

\|Ax-b\|_{A^{-1}}^{2}+\|x-x_{0}\|_{Q}^{2}

or, equivalently up to a constant term,

x^{\top }(A+Q)x-2x^{\top }(b+Qx_{0}).

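This minimization problem has an optimal solution $x^{*}$, which can be written explicitly as

x^{*}=(A+Q)^{-1}(b+Qx_{0}),

which is nothing but the solution of the generalized Tikhonov problem with $A=A^{\top }=P^{-1}$.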

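Lavrentyev regularization, where applicable, is advantageous over the original Tikhonov regularization, because the Lavrentyev matrix $A+Q$ can be better conditioned, i.e. have a smaller condition number, than the Tikhonov matrix $A^{\top }A+\Gamma ^{\top }\Gamma$.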
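The following sketch illustrates the Lavrentyev solution on a small symmetric positive definite matrix and compares condition numbers. The matrix `A` (built from assumed eigenvalues), the choice $Q=10^{-6}I$, and the use of the same $Q$ in place of $\Gamma ^{\top }\Gamma$ for the comparison are all assumptions; the size of the conditioning gap depends on these choices.

```python
import numpy as np

def lavrentyev(A, b, Q, x0):
    """x* = (A + Q)^{-1} (b + Q x0) for symmetric positive definite A."""
    return np.linalg.solve(A + Q, b + Q @ x0)

# Hypothetical SPD matrix with eigenvalues from 1 down to 1e-4 (assumed).
rng = np.random.default_rng(3)
V, _ = np.linalg.qr(rng.normal(size=(5, 5)))
A = V @ np.diag(np.logspace(0, -4, 5)) @ V.T
A = (A + A.T) / 2                      # enforce exact symmetry
Q = 1e-6 * np.eye(5)
b = rng.normal(size=5)
x0 = np.zeros(5)

x_lav = lavrentyev(A, b, Q, x0)

# Same solution from the generalized Tikhonov formula with P = A^{-1}.
P = np.linalg.inv(A)
x_gen = np.linalg.solve(A.T @ P @ A + Q, A.T @ P @ b + Q @ x0)

cond_lav = np.linalg.cond(A + Q)        # Lavrentyev matrix
cond_tik = np.linalg.cond(A.T @ A + Q)  # Tikhonov matrix with Gamma^T Gamma = Q
print(f"cond(A + Q) = {cond_lav:.3g}, cond(A^T A + Q) = {cond_tik:.3g}")
```

Forming $A^{\top }A$ squares the eigenvalues of $A$, which is why the Tikhonov matrix is worse conditioned in this example.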

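Regularization in Hilbert space

Discrete linear ill-conditioned problems typically result from the discretization of integral equations, and one can formulate Tikhonov regularization in the original infinite-dimensional setting. In the above, we can interpret $A$ as a compact operator on Hilbert spaces, and $x$ and $b$ as elements of the domain and range of $A$. The operator $A^{*}A+\Gamma ^{\top }\Gamma$ is then a self-adjoint, bounded, invertible operator.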

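Relation to singular-value decomposition and Wiener filter

With $\Gamma =\alpha I$, this least-squares solution can be analyzed in a special way using the singular-value decomposition. Given the singular value decomposition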

A=U\Sigma V^{\top }

with singular values $\sigma _{i}$, the Tikhonov regularized solution can be expressed as

{\hat {x}}=VDU^{\top }b,

where $D$ has diagonal values

D_{ii}={\frac {\sigma _{i}}{\sigma _{i}^{2}+\alpha ^{2}}}

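and is zero elsewhere. This demonstrates the effect of the Tikhonov parameter on the condition number of the regularized problem. For the generalized case, a similar representation can be derived using a generalized singular-value decomposition.^[25]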

Finally, it is related to the Wiener filter:

{\hat {x}}=\sum _{i=1}^{q}f_{i}{\frac {u_{i}^{\top }b}{\sigma _{i}}}v_{i},

where the Wiener weights are $f_{i}={\frac {\sigma _{i}^{2}}{\sigma _{i}^{2}+\alpha ^{2}}}$ and $q$ is the rank of $A$.
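The SVD form and the Wiener-weight form can be compared numerically against the normal-equation solution with $\Gamma =\alpha I$. The sketch below assumes a full-column-rank random matrix and an arbitrary $\alpha$; the function names are hypothetical.

```python
import numpy as np

def tikhonov_svd(A, b, alpha):
    """x_hat = V D U^T b with D_ii = sigma_i / (sigma_i^2 + alpha^2)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    d = s / (s**2 + alpha**2)
    return Vt.T @ (d * (U.T @ b))

def wiener_weights(s, alpha):
    """Filter factors f_i = sigma_i^2 / (sigma_i^2 + alpha^2)."""
    return s**2 / (s**2 + alpha**2)

# Hypothetical full-column-rank problem (assumed for illustration).
rng = np.random.default_rng(4)
A = rng.normal(size=(12, 6))
b = rng.normal(size=12)
alpha = 0.3

x_svd = tikhonov_svd(A, b, alpha)
x_normal = np.linalg.solve(A.T @ A + alpha**2 * np.eye(6), A.T @ b)

# Wiener form: sum_i f_i * (u_i^T b / sigma_i) * v_i
U, s, Vt = np.linalg.svd(A, full_matrices=False)
f = wiener_weights(s, alpha)
x_wiener = Vt.T @ (f * (U.T @ b) / s)
```

Each weight lies strictly between 0 and 1 for $\alpha >0$, so components associated with small singular values are damped the most.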

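Determination of the Tikhonov factor

The optimal regularization parameter $\alpha$ is usually unknown and in practical problems is often determined by an ad hoc method. One possible approach relies on the Bayesian interpretation described below. Other approaches include the discrepancy principle, cross-validation, the L-curve method,^[26] restricted maximum likelihood, and the unbiased predictive risk estimator. Grace Wahba proved that the optimal parameter, in the sense of leave-one-out cross-validation, minimizes^[27]^[28]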

G={\frac {\operatorname {RSS} }{\tau ^{2}}}={\frac {\|X{\hat {\beta }}-y\|^{2}}{[\operatorname {Tr} (I-X(X^{T}X+\alpha ^{2}I)^{-1}X^{T})]^{2}}},

where $\operatorname {RSS}$ is the residual sum of squares and $\tau$ is the effective number of degrees of freedom.

Using the previous SVD decomposition (with $A=X$ and $b=y$), the above expression can be simplified:

\operatorname {RSS} =\left\|y-\sum _{i=1}^{q}(u_{i}'b)u_{i}\right\|^{2}+\left\|\sum _{i=1}^{q}{\frac {\alpha ^{2}}{\sigma _{i}^{2}+\alpha ^{2}}}(u_{i}'b)u_{i}\right\|^{2},

\operatorname {RSS} =\operatorname {RSS} _{0}+\left\|\sum _{i=1}^{q}{\frac {\alpha ^{2}}{\sigma _{i}^{2}+\alpha ^{2}}}(u_{i}'b)u_{i}\right\|^{2},

and

\tau =m-\sum _{i=1}^{q}{\frac {\sigma _{i}^{2}}{\sigma _{i}^{2}+\alpha ^{2}}}=m-q+\sum _{i=1}^{q}{\frac {\alpha ^{2}}{\sigma _{i}^{2}+\alpha ^{2}}}.
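The criterion $G$ can be evaluated either directly from its definition or from the SVD expressions for $\operatorname {RSS}$ and $\tau$. The sketch below does both and then picks the minimizer over a grid of candidate values. The synthetic data, the grid, and the function names are assumptions; a grid search is only one simple way to locate the minimum.

```python
import numpy as np

def gcv_score(X, y, alpha):
    """G = ||X beta_hat - y||^2 / [Tr(I - X (X^T X + alpha^2 I)^{-1} X^T)]^2."""
    m, n = X.shape
    H = X @ np.linalg.solve(X.T @ X + alpha**2 * np.eye(n), X.T)
    residual = H @ y - y
    tau = np.trace(np.eye(m) - H)
    return (residual @ residual) / tau**2

def gcv_score_svd(X, y, alpha):
    """Same quantity using RSS = RSS_0 + shrinkage term and tau from singular values."""
    m = X.shape[0]
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    coeffs = U.T @ y                                # u_i^T y
    rss0 = np.sum((y - U @ coeffs) ** 2)            # part of y outside range(X)
    shrink = alpha**2 / (s**2 + alpha**2)
    rss = rss0 + np.sum((shrink * coeffs) ** 2)
    tau = m - np.sum(s**2 / (s**2 + alpha**2))
    return rss / tau**2

# Hypothetical regression data (assumed for illustration).
rng = np.random.default_rng(5)
X = rng.normal(size=(30, 8))
y = X @ rng.normal(size=8) + 0.5 * rng.normal(size=30)

alphas = np.logspace(-3, 2, 30)                     # assumed candidate grid
scores = np.array([gcv_score_svd(X, y, a) for a in alphas])
best_alpha = alphas[int(np.argmin(scores))]
print("alpha minimizing G on the grid:", best_alpha)
```

The SVD version computes the decomposition once per call here; in practice it can be computed a single time and reused for every candidate $\alpha$.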

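Relation to probabilistic formulation

The probabilistic formulation of an inverse problem introduces (when all uncertainties are Gaussian) a covariance matrix $C_{M}$ representing the a priori uncertainties on the model parameters, and a covariance matrix $C_{D}$ representing the uncertainties on the observed parameters.^[29] In the special case where both matrices are diagonal and isotropic, $C_{M}=\sigma _{M}^{2}I$ and $C_{D}=\sigma _{D}^{2}I$, the equations of inverse theory reduce to the equations above, with $\alpha ={\sigma _{D}}/{\sigma _{M}}$.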

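Bayesian interpretation

Although the choice of the solution to this regularized problem may at first look artificial, and the matrix $\Gamma$ seems rather arbitrary, the process can be justified from a Bayesian point of view.^[30] Note that for an ill-posed problem one must necessarily introduce additional assumptions in order to obtain a unique solution. Statistically, the prior probability distribution of $x$ is sometimes taken to be a multivariate normal distribution. For simplicity, the following assumptions are made here: the means are zero; the components are independent; the components have the same standard deviation $\sigma _{x}$. The data are also subject to errors, and the errors in $b$ are assumed to be independent, with zero mean and standard deviation $\sigma _{b}$. Under these assumptions the Tikhonov-regularized solution is, according to Bayes' theorem, the most probable solution given the data and the prior distribution of $x$.^[31]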

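If the assumption of normality is replaced by assumptions of homoscedasticity and uncorrelated errors, and if one still assumes zero mean, then the Gauss–Markov theorem entails that the solution is the minimal unbiased linear estimator.^[32]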

See also

The LASSO estimator is another regularization method in statistics.
Elastic net regularization
Matrix regularization

References

[1]
Hilt, Donald E.; Seegrist, Donald W. Ridge, a computer program for calculating ridge regression estimates. 1977 [2023-09-24]. doi:10.5962/bhl.title.68934. (Archived from the original on 2023-02-10).
[2]
Gruber, Marvin. Improving Efficiency by Shrinkage: The James–Stein and Ridge Regression Estimators. CRC Press. 1998: 2 [2023-09-24]. ISBN 978-0-8247-0156-7. (Archived from the original on 2022-05-10).
[3]
Kennedy, Peter. A Guide to Econometrics (5th ed.). Cambridge: The MIT Press. 2003: 205–206. ISBN 0-262-61183-X.
[4]
Gruber, Marvin. Improving Efficiency by Shrinkage: The James–Stein and Ridge Regression Estimators. Boca Raton: CRC Press. 1998: 7–15. ISBN 0-8247-0156-9.
[5]
Hoerl, Arthur E.; Kennard, Robert W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics. 1970, 12 (1): 55–67. JSTOR 1267351. doi:10.2307/1267351.
[6]
Hoerl, Arthur E.; Kennard, Robert W. Ridge Regression: Applications to Nonorthogonal Problems. Technometrics. 1970, 12 (1): 69–82. JSTOR 1267352. doi:10.2307/1267352.
[7]
Beck, James Vere; Arnold, Kenneth J. Parameter Estimation in Engineering and Science. James Beck. 1977: 287 [2023-09-24]. ISBN 978-0-471-06118-2. (Archived from the original on 2022-04-26).
[8]
Jolliffe, I. T. Principal Component Analysis. Springer Science & Business Media. 2006: 178 [2023-09-24]. ISBN 978-0-387-22440-4. (Archived from the original on 2022-04-18).
[9]
Tikhonov A.N. Solution of Incorrectly Formulated Problems and the Regularization Method. Soviet Mathematics Doklady. 1963, 4: 1035–1038.
[10]
For the choice of $\lambda$ in practice, see Khalaf, Ghadban; Shukur, Ghazi. Choosing Ridge Parameter for Regression Problems. Communications in Statistics – Theory and Methods. 2005, 34 (5): 1177–1182. S2CID 122983724. doi:10.1081/STA-200056836.
[11]
van Wieringen, Wessel. Lecture notes on ridge regression. 2021-05-31. arXiv:1509.09169  [stat.ME].
[12]
Tikhonov, Andrey Nikolayevich. Об устойчивости обратных задач [On the stability of inverse problems]. Doklady Akademii Nauk SSSR. 1943, 39 (5): 195–198. (Archived from the original on 2005-02-27).
[13]
Tikhonov, A. N. О решении некорректно поставленных задач и методе регуляризации. Doklady Akademii Nauk SSSR. 1963, 151: 501–504. Translated in Solution of incorrectly formulated problems and the regularization method. Soviet Mathematics: 1035–1038.
[14]
Tikhonov, A. N.; V. Y. Arsenin. Solution of Ill-posed Problems. Washington: Winston & Sons. 1977. ISBN 0-470-99124-0.
[15]
Tikhonov, Andrey Nikolayevich; Goncharsky, A.; Stepanov, V. V.; Yagola, Anatolij Grigorevic. Numerical Methods for the Solution of Ill-Posed Problems. Netherlands: Springer Netherlands. 30 June 1995 [9 August 2018]. ISBN 079233583X. (Archived from the original on 2021-06-20).
[16]
Tikhonov, Andrey Nikolaevich; Leonov, Aleksandr S.; Yagola, Anatolij Grigorevic. Nonlinear ill-posed problems. London: Chapman & Hall. 1998 [9 August 2018]. ISBN 0412786605. (Archived from the original on 2021-06-15).
[17]
Phillips, D. L. A Technique for the Numerical Solution of Certain Integral Equations of the First Kind. Journal of the ACM. 1962, 9: 84–97. S2CID 35368397. doi:10.1145/321105.321114.
[18]
Hoerl, Arthur E. Application of Ridge Analysis to Regression Problems. Chemical Engineering Progress. 1962, 58 (3): 54–59.
[19]
Foster, M. An Application of the Wiener-Kolmogorov Smoothing Theory to Matrix Inversion. Journal of the Society for Industrial and Applied Mathematics. 1961, 9 (3): 387–392. doi:10.1137/0109031.
[20]
Hoerl, A. E.; R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970, 12 (1): 55–67. doi:10.1080/00401706.1970.10488634.
[21]
Ng, Andrew Y. Feature selection, L1 vs. L2 regularization, and rotational invariance (PDF). Proc. ICML. 2004 [2023-09-24]. (Archived (PDF) from the original on 2023-03-15).
[22]
R.-E. Fan; K.-W. Chang; C.-J. Hsieh; X.-R. Wang; C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research. 2008, 9: 1871–1874.
[23]
Guan, Naiyang; Tao, Dacheng; Luo, Zhigang; Yuan, Bo. Online nonnegative matrix factorization with robust stochastic approximation. IEEE Transactions on Neural Networks and Learning Systems. 2012, 23 (7): 1087–1099. PMID 24807135. S2CID 8755408. doi:10.1109/TNNLS.2012.2197827.
[24]
Lavrentiev, M. M. Some Improperly Posed Problems of Mathematical Physics. New York: Springer. 1967.
[25]
Hansen, Per Christian. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion (1st ed.). Philadelphia, USA: SIAM. Jan 1, 1998. ISBN 9780898714036.
[26]
P. C. Hansen, "The L-curve and its use in the numerical treatment of inverse problems". (Archived copy at the Internet Archive).
[27]
Wahba, G. Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics (Society for Industrial and Applied Mathematics). 1990. Bibcode:1990smod.conf.....W.
[28]
Golub, G.; Heath, M.; Wahba, G. Generalized cross-validation as a method for choosing a good ridge parameter (PDF). Technometrics. 1979, 21 (2): 215–223 [2023-09-24]. doi:10.1080/00401706.1979.10489751. (Archived (PDF) from the original on 2017-12-15).
[29]
Tarantola, Albert. Inverse Problem Theory and Methods for Model Parameter Estimation (1st ed.). Philadelphia: Society for Industrial and Applied Mathematics (SIAM). 2005 [2018-08-09]. ISBN 0898717922. (Archived from the original on 2021-02-25).
[30]
Greenberg, Edward; Webster, Charles E., Jr. Advanced Econometrics : A Bridge to the Literature. New York: John Wiley & Sons. 1983: 207–213. ISBN 0-471-09077-8.
[31]
Vogel, Curtis R. Computational methods for inverse problems. Philadelphia: Society for Industrial and Applied Mathematics. 2002. ISBN 0-89871-550-4.
[32]
Amemiya, Takeshi. Advanced Econometrics. Harvard University Press. 1985: 60–61. ISBN 0-674-00560-0.

Further reading

Gruber, Marvin. Improving Efficiency by Shrinkage: The James–Stein and Ridge Regression Estimators. Boca Raton: CRC Press. 1998 [2023-09-24]. ISBN 0-8247-0156-9. (Archived from the original on 2022-10-17).
Kress, Rainer. Tikhonov Regularization. Numerical Analysis. New York: Springer. 1998: 86–90 [2023-09-24]. ISBN 0-387-98408-9. (Archived from the original on 2022-10-17).
Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. Section 19.5. Linear Regularization Methods. Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. 2007 [2023-09-24]. ISBN 978-0-521-88068-8. (Archived from the original on 2011-08-11).
Saleh, A. K. Md. Ehsanes; Arashi, Mohammad; Kibria, B. M. Golam. Theory of Ridge Regression Estimation with Applications. New York: John Wiley & Sons. 2019 [2023-09-24]. ISBN 978-1-118-64461-4. (Archived from the original on 2022-10-21).
Taddy, Matt. Regularization. Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions. New York: McGraw-Hill. 2019: 69–104 [2023-09-24]. ISBN 978-1-260-45277-8. (Archived from the original on 2022-10-17).
