This formulation, however, has strong limitations and is prone to over-fitting. To counter this, a regularization term is added to the data error. The data error measures the mismatch between the true values and the model prediction:
|
|
```math
E_{data} = y_{true} - y_p(x, w)
```
|
|
|
|
|
|
|
The total error combines this data term with a regularization penalty:

```math
E_{total} = E_{data} + λE_{regularization}
```
|
|
|
|
|
|
|
where λ defines the relative effect of the regularization term. $`E_{regularization}`$ is typically defined as a function of the weight vector (w), and variations of this dependency lead to alternative regularization methods. The underlying idea is to force the optimizer to decay the weight values towards zero, unless the opposite is enforced by the data. In statistics, this is known as a [parameter shrinkage method](https://en.wikipedia.org/wiki/Shrinkage_(statistics)).
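As a concrete illustration, one common choice for $`E_{regularization}`$ is the sum of squared weights (L2, or ridge, regularization). The sketch below is a minimal example, assuming scikit-learn is available and using a hypothetical toy dataset; Ridge's `alpha` parameter plays the role of λ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical toy 1D data: a noisy line (not taken from the text)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = 3.0 * x.ravel() + rng.normal(scale=0.3, size=20)

# Expand to a high-degree polynomial so the un-regularized fit can over-fit
X = np.hstack([x**d for d in range(1, 10)])

plain = LinearRegression().fit(X, y)   # minimizes E_data only
ridge = Ridge(alpha=1.0).fit(X, y)     # minimizes E_data + λ * ||w||^2

# Regularization shrinks the weights towards zero
print("unregularized |w|:", np.abs(plain.coef_).sum())
print("ridge         |w|:", np.abs(ridge.coef_).sum())
```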
|
|
|
|
|
|
|
|
With regularization, we can reduce the effective model complexity, so that models can be trained with a limited amount of data and with much less over-fitting. It should be noted, however, that this addition introduces another hyperparameter (λ), which needs to be determined for the case of interest.
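A common way to determine λ is cross-validation (see the cross-validation link under Additional Sources). The sketch below is a minimal example under the same hypothetical toy data as above, using scikit-learn's `RidgeCV`, whose `alphas` argument lists the candidate λ values.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Same kind of hypothetical toy polynomial features as before
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = 3.0 * x.ravel() + rng.normal(scale=0.3, size=20)
X = np.hstack([x**d for d in range(1, 10)])

# Try several candidate λ values and keep the one with the best
# cross-validated score
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5).fit(X, y)
print("selected λ (alpha):", model.alpha_)
```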
|
|
|
|
|
|
|
|
SVM is now one of the most widely applied techniques in supervised machine learning. Its origins can be traced back to the 1960s, and it became a popular method in the early 1990s with its ability to recognize handwritten digits better than the neural networks of the time. Compared to linear regression, its mathematical engine is much more complicated, but it is guaranteed to find the global minimum / maximum, a rare ability among data-driven techniques. It can also “draw” nonlinear decision boundaries by using transformation functions, an approach commonly called the kernel trick. This is what enabled the SVM classifier to outperform neural networks in the 1990s.
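To make the effect of the kernel trick tangible, here is a minimal sketch, assuming scikit-learn and its synthetic `make_circles` data (a hypothetical example, not from the text): a linear kernel struggles on data where one class surrounds the other, while an RBF kernel separates it easily.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Hypothetical non-linearly-separable data: one class inside a ring of the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A straight line cannot separate the two circles well,
# but the RBF kernel (kernel trick) draws a nonlinear boundary
linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear.score(X, y))
print("rbf kernel accuracy   :", rbf.score(X, y))
```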
|
|
|
|
|
|
|
|
In order to understand how it works, let's go back to its origins and imagine a binary classification task (reds and blues) in a 2D data space which is linearly separable. In other words, there exists at least one line that can separate these points. If we try with a pencil, however, we see that many such lines exist, and the million-dollar question is which line (a hyperplane in the N-dimensional case) separates them best. For mathematical convenience, let's move to the number domain rather than sticking to colours and say that we are trying to separate positive numbers from negative numbers. In this case, the decision boundary corresponds to the locations where this special line evaluates to zero:
|
|
|
|
|
|
|
|
|
|
<div align="center">
|
|
|
|
|
|
|
|
$`y_{i} = w \cdot x_i + b \geq 0`$ for positive points
|
|
|
|
|
|
|
|
$`y_{i} = w \cdot x_i + b \leq 0`$ for negative points
|
|
|
|
|
|
|
|
</div>
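To connect these inequalities to code, here is a minimal sketch, assuming scikit-learn and a hypothetical linearly separable toy set: a linear SVM is fit, the learned w and b are read back, and the sign of $`w \cdot x_i + b`$ reproduces the class labels.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable 2D points: a negative and a positive cluster
X = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.6],   # negative points
              [2.0, 2.0], [2.5, 1.8], [1.8, 2.4]])  # positive points
y = np.array([-1, -1, -1, 1, 1, 1])

# Linear kernel: the model learns the hyperplane w . x + b = 0
clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# The sign of w . x_i + b gives the predicted class
scores = X @ w + b
print("w =", w, "b =", b)
print("sign(w . x + b):", np.sign(scores))   # matches y
print("predictions    :", clf.predict(X))
```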
|
|
|
|
|
|
|
|
|
|
## Additional Sources
|
|
|
|
|
|
|
|
- [Seeing theory: linear regression](https://seeing-theory.brown.edu/#secondPage/chapter6)
- [Cross validation](https://scikit-learn.org/stable/modules/cross_validation.html?highlight=repeatedkfold)
- [Basics of regression](https://ml-cheatsheet.readthedocs.io/en/latest/linear_regression.html)
- [Introduction to linear regression analysis](https://people.duke.edu/~rnau/regintro.htm)
- [Simple Linear Regression Tutorial](https://machinelearningmastery.com/simple-linear-regression-tutorial-for-machine-learning/)
- [Linear Regression Notes](http://ufldl.stanford.edu/tutorial/supervised/LinearRegression/)
- [Support Vector Machines (SVM) Algorithm Explained](https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/)
|