| ... | @@ -16,9 +16,9 @@ This formulation however, has strong limitations and is prone to over-fitting an |
```math
E_{data} = y_{true} - y_p(x, w)
E_{total} = E_{data} + \Lambda \cdot E_{regularization}
```
where $`\Lambda`$ sets the relative weight of the regularization term. $`E_{regularization}`$ is typically defined as a function of the weight vector, and different choices of that function give different regularization methods. The underlying idea is to push the optimizer to decay the weight values towards zero unless the data enforces otherwise. In statistics, this is known as a parameter [shrinkage method](https://en.wikipedia.org/wiki/Shrinkage_(statistics)).
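
As a rough illustration of the shrinkage effect, the sketch below fits a linear model under an L2 penalty for several values of the regularization weight. Several things here are assumptions rather than part of the text above: the model is linear (`X @ w`), the data term is taken as half the sum of squared residuals, the penalty is half the squared norm of the weights, and the names `data_error`, `l2_penalty`, `total_error`, `fit_ridge` and `lam` are hypothetical. Other choices of the penalty would give other regularization methods.

```python
import numpy as np


def data_error(w, X, y_true):
    """Assumed data term: half the sum of squared residuals y_true - y_p(x, w)."""
    residual = y_true - X @ w
    return 0.5 * np.sum(residual ** 2)


def l2_penalty(w):
    """Assumed regularization term: half the squared L2 norm of the weights."""
    return 0.5 * np.sum(w ** 2)


def total_error(w, X, y_true, lam):
    """E_total = E_data + lam * E_regularization."""
    return data_error(w, X, y_true) + lam * l2_penalty(w)


def fit_ridge(X, y_true, lam):
    """Minimize total_error in closed form by solving (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y_true)


# Synthetic data, generated only for this illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_w = np.array([2.0, -1.0, 0.0, 0.5, 3.0])
y_true = X @ true_w + 0.1 * rng.normal(size=20)

# Norm of the fitted weight vector for increasing regularization strength.
lams = (0.0, 1.0, 10.0, 100.0)
weight_norms = [np.linalg.norm(fit_ridge(X, y_true, lam)) for lam in lams]

for lam, norm in zip(lams, weight_norms):
    print(f"lam={lam:6.1f}  ||w||={norm:.4f}")
```

With this penalty, a larger `lam` pulls the fitted weights closer to zero, while `lam = 0` recovers the unregularized least-squares fit.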
## Additional Sources
| ... | |