SVM is now one of the most widely applied techniques in supervised machine learning.

In order to understand how it works, let’s go back to its origins and look at a binary classification task (reds and blues) in a 2D data space that is linearly separable. If we try with paper and pen, we see that many separating lines exist, and the million-dollar question is to figure out which line (a hyperplane in the N-dimensional case) separates the two classes best.
<img src="uploads/a74fff981e021b4459b34f8db9286161/svm_1.png" width="300">
|
|
<img src="uploads/a74fff981e021b4459b34f8db9286161/svm_1.png" width="200">
|
|
|
|
|
|
|
|
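
As a concrete illustration of how one particular line gets chosen, here is a minimal sketch that fits a linear SVM to a toy, linearly separable 2D dataset and reads off the line it picks. The synthetic data, the use of scikit-learn, and all parameter values are assumptions for illustration, not part of the original example.

```python
# Minimal sketch: fit a linear SVM on synthetic, linearly separable 2D data
# and inspect the separating line w.x + b = 0 that it settles on.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
blues = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))  # one class
reds = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))     # the other class
X = np.vstack([blues, reds])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6)  # a very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]          # normal vector of the chosen hyperplane
b = clf.intercept_[0]     # its offset
print("w =", w, ", b =", b)
print("support vectors:")
print(clf.support_vectors_)
```
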
For mathematical convenience, let’s move from colors into the number domain and say that we are trying to separate positive numbers from negative numbers. In this case, the decision boundary will correspond to the locations of the zeros along this special line:
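
Written out with the same $`w`$, $`x_i`$ and $`b`$ that appear in the constraint below, the line assigns the number $`w \cdot x + b`$ to a point $`x`$, so the decision boundary is the set of points where this number vanishes:

$`w \cdot x + b = 0`$

Points that map to a positive number fall on one side of the boundary, and points that map to a negative number fall on the other.
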
Negative points are then required to land at least one unit below zero:

$`y_{i} = w \cdot x_i + b \le -1`$

We add this rule to limit the degrees of freedom, since otherwise many solutions to the classification problem could be proposed. This little touch is what we call the ‘margin’. The margin is defined as the perpendicular distance between the decision boundary and the closest data points. When we set an objective such as “maximize the margin”, we end up with one particular solution. Herein, the solution is determined by a subset of the points, called the support vectors.
<img src="uploads/1363053e315b3f87492ed55dbdbfeee2/svm_2.png" width="300">
|
|
<img src="uploads/1363053e315b3f87492ed55dbdbfeee2/svm_2.png" width="400">
|
|
|
|
|
|
|
|
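
The constraint shown above handles the negative points; the positive points get the mirror-image requirement. This pair is the standard hard-margin formulation and is written out here for completeness rather than copied from the original text:

$`w \cdot x_i + b \ge +1`$ for positive points, and $`w \cdot x_i + b \le -1`$ for negative points.

The support vectors are exactly the closest points, i.e. the points for which one of these two inequalities holds with equality.
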
Then the question becomes “how can we evaluate the margin in the first place?”. We first need an expression for the margin, and then a smart way to find its maximum. Here we will use a normalized definition, which is best seen with an illustration: consider a plane separating blue and red data points as shown below. Here we can easily evaluate the width of the “street” separating the observations as twice the margin (γ):
<img src="uploads/f1688d840610c5aabf43042f8111e57c/svm_3.png" width="300">
|
|
<img src="uploads/f1688d840610c5aabf43042f8111e57c/svm_3.png" width="200">
|
|
|
|
|
|
|
|
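
For reference, under the unit-margin constraints above this construction leads to the usual closed-form expression for the street width (a standard result for this normalization; it is inferred here rather than copied from the figure):

$`\gamma = \dfrac{1}{\lVert w \rVert}`$, so the width of the street is $`2\gamma = \dfrac{2}{\lVert w \rVert}`$.
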
Now our objective becomes the maximization of the geometric margin γ. Nonetheless, solving this optimization problem directly is very hard, so we make another decision and convert it into a different problem that is easier to solve (as we always do).
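
The standard conversion, assuming the unit-margin constraints above, works as follows: maximizing $`\gamma = 1/\lVert w \rVert`$ is the same as minimizing $`\lVert w \rVert`$, and squaring it (with a convenient factor of one half) turns the objective into a convex quadratic:

$`\min_{w,\,b} \ \dfrac{1}{2}\lVert w \rVert^{2}`$ subject to $`w \cdot x_i + b \ge +1`$ for positive points and $`w \cdot x_i + b \le -1`$ for negative points.
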