# Ridge Regression¶

## Notes¶

• Also known as Tikhonov Regularization

• Helps to reduce overfitting by reducing model variance through the addition of shrinkage towards zero across all coefficients.

• Can be useful in times when high multicollinearity is found between predictors

## Loss Function and Optimization Problem¶

For the case of Ridge Regression, the OLS loss function is modified by the addition of an $$\mathbf{L}_2$$ penalty with an associated tuning parameter, $$\lambda$$:

$L(\mathbf{\beta}) = \|\mathbf{y} - \mathbf{X}\mathbf{\beta}\|_2^2 + \lambda\|\mathbf{\beta}\|_2^2 \: \: \: \text{ with tuning parameter \lambda \geq 0}$

Using this function to formulate a least-squares optimization problem yields:

$\hat{\mathbf{\beta}} = \arg\min_{\mathbf{\beta}} L(\mathbf{\beta}) = \arg\min_{\mathbf{\beta}} \frac{1}{2n} \|\mathbf{y}-\mathbf{X}\mathbf{\beta} \|_{2}^{2} + \lambda\|\mathbf{\beta}\|_2^2$

Just like OLS, the $$\frac{1}{2n}$$ term is added in order to simplify gradient solving ($$\frac{1}{2}$$) and allow objective function convergence to the expected value of model error by the Law of Large Numbers ($$\frac{1}{n}$$).

## Model Estimator¶

By setting the gradient of the loss function equal to zero and solving for the coefficient vector, $$\hat{\mathbf{ \beta }}$$, the Ridge Estimator is found:

${\hat {\beta }}=(\mathbf {X} ^{\mathsf {T}}\mathbf {X} +\lambda \mathbf {I} )^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y}$

### Proving Uniqueness of the Estimator¶

It turns out that the Ridge problem can be shown to be strongly convex with a positive definite associated Hessian matrix. This Hessian is found as:

$\mathbf{H} = 2\mathbf{X}^\mathbf{T}\mathbf{X} + 2 \lambda \mathbf {I}$

And to show its positive definiteness:

$\mathbf{\beta}^\mathbf{T} (\mathbf{X}^\mathbf{T}\mathbf{X} + \lambda \mathbf {I})\mathbf{\beta} = (\mathbf{X}\mathbf{\beta})\mathbf{X}\mathbf{\beta} + \lambda \mathbf{\beta}^\mathbf{T}\mathbf{\beta} = \|\mathbf{X}\mathbf{\beta}\|_2^2 + \lambda \|\mathbf{\beta}\|_2^2 \succ 0 \: \: \: \forall \:\:\: \mathbf{\beta} \neq \mathbf{0}$

Thus, the Ridge estimator is the unique global minimizer to the Ridge Regression problem. [1][2]

## Sources¶

1

Uc berkeley fall 2020 cs189 (introduction to machine learning) note 2. Sep 2020. URL: https://www.eecs189.org/static/notes/n2.pdf.

2

Anil Aswani. Ieor 165 – engineering statistics, quality control, and forecasting lecture notes 8. Jan 2021. URL: http://courses.ieor.berkeley.edu/ieor165/lecture_notes/ieor165_lec8.pdf.

Contributions made by our wonderful GitHub Contributors: @wyattowalsh