SVM is now one of the most widely applied techniques in supervised machine learning.

In order to understand how it works, let's go back to its origins and look at a binary classification task (reds and blues) in a 2D data space which is linearly separable. If we try with paper and pen, we see that many separating lines exist, and the million-dollar question is to figure out which line (a hyperplane in the N-dimensional case) separates the two classes best.

<img src="uploads/5581c5e5427aaeb3bd608c9053ddba58/svm_1.png" width="300" height="300">

![hyperplane](uploads/8090e6c4bd5a4d905cbd558d4d1dbcee/hyperplane_possibilities.png)
For mathematical convenience, let's move into the number domain rather than sticking to colors, and say that we are trying to separate the positive numbers from the negative numbers. In this case, the decision boundary corresponds to the locations where this special line evaluates to zero:

$`y_{i} = w \cdot x_i + b \geq 0`$ for positive points

$`y_{i} = w \cdot x_i + b \leq 0`$ for negative points

Note that we do not yet know what w and b are.
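
As a quick worked example (the numbers are purely illustrative assumptions): with $`w = (2, -1)`$ and $`b = -3`$, the point $`x_i = (4, 1)`$ gives $`w \cdot x_i + b = 2 \cdot 4 - 1 \cdot 1 - 3 = 4 \geq 0`$, so it lands on the positive side, while $`x_j = (1, 1)`$ gives $`2 \cdot 1 - 1 \cdot 1 - 3 = -2 \leq 0`$, so it lands on the negative side.
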
At this stage, we will add a custom rule stating that the samples should stay at least one unit away from that zero-line solution:

$`y_{i} = w \cdot x_i + b \geq 1`$

$`y_{i} = w \cdot x_i + b \leq -1`$

We are adding this rule to limit the degrees of freedom, since many solutions to the classification problem can be proposed. This little touch is what we call the 'margin'. The margin is defined as the perpendicular distance between the decision boundary and the closest of the data points. When we set an objective such as "maximize the margin", we end up with a particular solution. Herein, the solution is found via a subset of points called support vectors.
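
One standard way to make "maximize the margin" concrete: the perpendicular distance from a point $`x`$ to the boundary $`w \cdot x + b = 0`$ is $`|w \cdot x + b| / \lVert w \rVert`$. The closest points satisfy the constraints above with equality, i.e. $`|w \cdot x + b| = 1`$, so each of them sits at distance $`1 / \lVert w \rVert`$ from the boundary, and the empty band between the two classes has total width $`2 / \lVert w \rVert`$. Maximizing the margin is therefore the same as minimizing $`\lVert w \rVert`$ (usually written as minimizing $`\frac{1}{2}\lVert w \rVert^2`$) subject to the two constraints above.
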
![svm_margin](uploads/6c64d6a7cb1301871ac47ee3bc6e31e1/svm_margin.png)
## Additional Sources