This formulation, however, has strong limitations and is prone to over-fitting. To mitigate this, a regularization term is added to the data error:
```math
\begin{align*}
E_{data} &= y_{true} - y_p(x, w) \\
E_{total} &= E_{data} + \lambda E_{regularization}
\end{align*}
```
where $`\lambda`$ defines the relative effect of the regularization term. $`E_{regularization}`$ is typically defined as a function of the weight vector $`w`$, and variations of this dependency (for example, penalizing the squared norm versus the absolute values of the weights) lead to alternative regularization methods. The underlying idea is to push the optimizer to decay the weight values towards zero, unless the opposite is enforced by the data. In statistics, this is called a [parameter shrinkage method](https://en.wikipedia.org/wiki/Shrinkage_(statistics)).
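To make the decomposition concrete, here is a minimal NumPy sketch. It assumes a squared-error data term and an L2 penalty as the regularizer, which is one common instantiation rather than the only one; the function name `total_error` and the example data are illustrative, not taken from the text.

```python
import numpy as np

def total_error(w, b, X, y_true, lam):
    """Regularized error: E_total = E_data + lambda * E_regularization.

    Sketch under two assumptions: the data term is the mean squared
    error of a linear predictor, and E_regularization is the squared
    L2 norm of w (an L1 penalty would give lasso-style shrinkage).
    """
    y_p = X @ w + b                        # linear prediction y_p(x, w)
    e_data = np.mean((y_true - y_p) ** 2)  # data-fit term
    e_reg = np.sum(w ** 2)                 # L2 penalty shrinks w towards 0
    return e_data + lam * e_reg

# Example: a larger lambda penalizes large weights more heavily.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)
print(total_error(w_true, 0.0, X, y, lam=0.1))
```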
For mathematical convenience, let's go into the number domain rather than staying with string labels:
```math
\begin{align*}
y_i = w \cdot x_i + b &\ge 0 \quad \text{for positive points} \\
y_i = w \cdot x_i + b &\le 0 \quad \text{for negative points}
\end{align*}
```
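As a quick illustration of this decision rule, here is a hedged NumPy sketch; the function name `classify`, the example hyperplane, and the sample points are assumptions for demonstration, not from the text.

```python
import numpy as np

def classify(X, w, b):
    """Label each point by the sign of the linear decision value w . x + b.

    Returns +1 for points on the non-negative side of the hyperplane
    and -1 for the rest, matching the positive/negative convention above.
    """
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)

# Example: the hyperplane x0 + x1 - 1 = 0 separating two points.
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 2.0],    # positive side: 2 + 2 - 1 =  3 >= 0
              [0.0, 0.0]])   # negative side: 0 + 0 - 1 = -1 <= 0
print(classify(X, w, b))     # -> [ 1 -1]
```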