where λ defines the relative effect of the regularization term.
|
|
|
|
|
|
|
With regularization, we can reduce the effective model complexity so that models can be trained with a limited amount of data and much less over-fitting. It should be noted, however, that this addition introduces another hyperparameter (λ), which needs to be determined for the case of interest.
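As a minimal sketch of the idea, the snippet below adds an L2 penalty to an ordinary least-squares loss; the function name, the toy data, and the choice of λ are illustrative assumptions, not part of the original text.

```python
import numpy as np

def regularized_loss(w, X, y, lam):
    """Sum-of-squares error plus an L2 penalty on the weights.

    lam (the hyperparameter λ) controls how strongly large weights
    are punished; lam = 0 recovers plain least squares.
    """
    residuals = X @ w - y                   # prediction errors
    data_term = 0.5 * np.sum(residuals**2)  # fit-to-data term
    reg_term = 0.5 * lam * np.sum(w**2)     # regularization term
    return data_term + reg_term

# Illustrative usage with random data and an assumed λ = 0.1.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
print(regularized_loss(np.zeros(3), X, y, lam=0.1))
```

In practice, λ is usually tuned by evaluating several candidate values on held-out validation data.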
|
|
|
|
|
|
|
|
|
## Support Vector Machines (SVM)
|
|
|
|
|
|
|
|
SVM is now one of the most widely applied techniques in supervised machine learning. Its origins can be traced back to the 1960s, and it became a popular method in the early 1990s thanks to its ability to recognize handwritten digits better than the famous neural networks of the time. Compared to linear regression, its mathematical engine is much more complicated, but it is guaranteed to find the global minimum/maximum; a rare ability among data-driven techniques. It can also be applied to "draw" nonlinear decision boundaries by using transformation functions, an approach commonly called the kernel trick. This is what enabled the SVM classifier to outperform neural networks in the 1990s.
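To make the kernel idea concrete, here is a minimal sketch using scikit-learn (an assumption; the original text names no library). The same `SVC` class fits either a linear or a nonlinear decision boundary depending on the chosen kernel:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by any straight line.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# A linear kernel struggles here, while the RBF kernel implicitly
# maps the points into a higher-dimensional space where a separating
# hyperplane exists (the "kernel trick").
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
```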
|
|
|
|
|
|
|
|
In order to understand how it works, let's go back to its origins and imagine a binary classification task (reds and blues) in a 2D data space that is linearly separable. In other words, there exists at least one line that can separate these points. If we try with a pencil, however, we see that many such lines exist, and the million-dollar question is which line (a hyperplane in the N-dimensional case) separates the points best.
|
|
|
|
For mathematical convenience, let's move into the number domain rather than sticking to colours, and say that we are trying to separate positive numbers from negative numbers. In this case, the decision boundary corresponds to the set of zeros of this special line:
|
|
|
|
|
|
|
|
```math
\begin{aligned}
y_i = w \cdot x_i + b &\geq 0 \quad \text{for positive points} \\
y_i = w \cdot x_i + b &\leq 0 \quad \text{for negative points}
\end{aligned}
```
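As a small numerical illustration of this rule (the weights, bias, and points below are made-up values, not from the original text), classifying a point amounts to checking the sign of $`w \cdot x_i + b`$:

```python
import numpy as np

# Assumed toy hyperplane parameters, for illustration only.
w = np.array([1.0, 2.0])
b = -1.0

def classify(x):
    """Return +1 for the positive side of the line, -1 for the negative."""
    score = np.dot(w, x) + b  # y = w . x + b
    return 1 if score >= 0 else -1

print(classify(np.array([2.0, 1.0])))   # w.x + b = 3.0  -> +1
print(classify(np.array([-1.0, 0.0])))  # w.x + b = -2.0 -> -1
```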
|
|
|
|
|
|
|
|
|
|
|
|
## Additional Sources
|
|
|
|
|
|