|
|
|
|
|
|
|
## Support Vector Machines (SVM)
|
|
|
|
|
|
|
|
|
|
SVM is now one of the most widely applied techniques in supervised machine learning. Its origins can be traced back to the 1960s, and it became popular in the early 1990s thanks to its ability to recognize handwritten digits better than the then-famous neural networks. Compared to linear regression, its mathematical engine is much more complicated, but it is guaranteed to find the global minimum/maximum, a rare property among data-driven techniques. It can also be used to “draw” nonlinear decision boundaries by means of transformation functions, an approach commonly referred to as the kernel trick. This is, in fact, what enabled the SVM classifier to outperform neural networks in the 1990s.
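To make the kernel-trick claim a little more concrete, here is a minimal sketch. It assumes scikit-learn and its `make_circles` toy dataset, neither of which appears in the original text; it simply contrasts a linear-kernel SVM with an RBF-kernel SVM on data that no straight line can separate.

```python
# Hypothetical example: concentric circles are not linearly separable,
# but become separable through the implicit feature map of the RBF kernel.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)  # kernel trick: no explicit transformation is computed

print("linear kernel training accuracy:", linear_svm.score(X, y))  # near chance level
print("RBF kernel training accuracy:   ", rbf_svm.score(X, y))     # close to 1.0
```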
|
|
|
|
|
|
|
|
In order to understand how it works, let’s go back to its origins and look at a binary classification task (reds and blues) in a 2D data space that is linearly separable. In other words, there exists at least one line that can separate these points. If we try with paper and pen, however, we see that many such lines exist, and the million-dollar question is to figure out which line (a hyperplane in the N-dimensional case) separates them best.
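To illustrate that “many lines exist”, here is a minimal, hypothetical sketch. It assumes scikit-learn and a made-up two-cluster dataset (none of this comes from the original text): two different linear classifiers both separate the training data perfectly, yet end up with different lines.

```python
# Hypothetical two-cluster dataset ("reds" and "blues") that is linearly separable.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
reds = rng.randn(50, 2) + [3, 3]     # cluster of "red" points
blues = rng.randn(50, 2) + [-3, -3]  # cluster of "blue" points
X = np.vstack([reds, blues])
y = np.array([1] * 50 + [-1] * 50)

perceptron = Perceptron().fit(X, y)  # stops at the first separating line it finds
svm = LinearSVC().fit(X, y)          # searches for the "best" separating line

# Both lines separate the training data perfectly ...
print(perceptron.score(X, y), svm.score(X, y))  # 1.0 1.0
# ... but the lines themselves (w and b) are different.
print("perceptron:", perceptron.coef_, perceptron.intercept_)
print("svm:       ", svm.coef_, svm.intercept_)
```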
|
|
|
|
|
|
|
|
For mathematical convenience, let’s go into the number domain, rather than sticking to colors, and say that we are trying to separate positive numbers from negative numbers. In this case, the decision boundary corresponds to the locations of the zeros along this special line:
|
|
|
|
|
|
|
|
<div>
|
|
|
|
|
|
|
|
$`y_{i} = w \cdot x_i + b \geq 0`$ for positive points
|
|
|
|
|
|
|
$`y_{i} = w \cdot x_i + b \leq 0`$ for negative points
|
|
|
|
|
|
|
|
</div>
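As a small worked example of this decision rule, the sketch below uses a hand-picked `w` and `b` (hypothetical values, not ones produced by an actual SVM solver) to show how the sign of $`w \cdot x + b`$ classifies a point.

```python
# Worked example of the rule above with a hand-picked (hypothetical) w and b.
import numpy as np

w = np.array([1.0, 1.0])  # normal vector of the separating line
b = -1.0                  # offset

def classify(x):
    """Return +1 for 'positive' points and -1 for 'negative' ones."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([2.0, 2.0])))  # +1, since w.x + b = 3
print(classify(np.array([0.0, 0.0])))  # -1, since w.x + b = -1
# Points where w.x + b == 0 lie exactly on the decision boundary.
```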
|
|
|
|
|
|
|
|
...
|
|
|
|
|
|
|
|
## Additional Sources
|
|
|
|
|
|
|
|