One popular building block here is the decision tree (DT).
|
|
|
|
|
|
|
<img src="uploads/fcafc1afd4bd03ae016ffcf29039ca31/DT-0.png" width="600">
|
|
|
|
|
|
|
|
|
|
|
Fig. 1: How decision trees work: 2D feature space
|
|
|
|
|
|
|
During training, we learn the split criteria, the threshold values for the split conditions, and the leaf values (class labels or regression outputs). How is this managed? How can we build useful tree graphs? One option is to try all combinations, but such an approach is practically infeasible even for a fixed number of nodes in a decision tree. Therefore, greedy optimization is usually deployed. Here, we start with a single root node that splits the space into two. For a simple illustration, a classification task in 2D is drawn below. In 2D, there are two candidate first splits: a vertical line or a horizontal line as the decision boundary. We test both and keep the better one. In the second iteration, the number of candidate splits grows, because each region can be split again. As we go further, the model becomes more complex while fitting the data better.
|
|
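The greedy split search described above can be sketched in a few lines: for each feature (axis) and each candidate threshold, score the resulting two regions and keep the best. This is a minimal illustration using Gini impurity as the score; the data and function names are hypothetical, not taken from any particular library.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedy search for the single best axis-aligned split.

    Tries every feature (axis) and every midpoint between adjacent
    sorted feature values, and keeps the split with the lowest
    weighted Gini impurity of the two child regions.
    """
    n, d = X.shape
    best = (None, None, np.inf)  # (feature index, threshold, score)
    for j in range(d):  # in 2D: vertical vs. horizontal boundary
        values = np.unique(X[:, j])
        thresholds = (values[:-1] + values[1:]) / 2
        for t in thresholds:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Toy 2D classification data (made up for illustration only).
X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 1.0], [3.5, 2.5]])
y = np.array([0, 0, 1, 1])
feature, threshold, score = best_split(X, y)
# Here feature 0 at threshold 2.25 separates the classes perfectly.
```

Growing a full tree repeats this search recursively inside each resulting region.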
|
|
|
|
|
|
|
|
<img src="uploads/6c8a02398ab92843ff69d30d54008702/DT_learning.png" width="600">
|
|
|
|
|
|
|
|
|
|
Fig. 2: Learning decision boundaries in DT
|
|
|
|
|
|
|
|
In general, the optimization procedure determines (i) how the input space is divided into regions, (ii) the choice of the input variable at each split and (iii) the value of the threshold. In short, we try to split the sub-samples further at every iteration. For classification, we can continue until every sub-space contains only one label or cannot be split further, meaning the model makes no classification errors on the training data (Fig. 2, bottom, middle). However, such an effort yields complex trees with poor generalization capabilities. In practice, simple trees are preferred over complex structures, which is achieved by utilizing different tree inducer algorithms (Fig. 2, bottom, right).
|
|
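The trade-off between a fully grown tree (zero training error) and a simpler, better-generalizing one can be seen directly. A minimal sketch, assuming scikit-learn is available (the dataset and parameter choices are illustrative, not from the text):

```python
# Contrast a fully grown tree with a depth-limited one on noisy 2D data.
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Fully grown tree: keeps splitting until every leaf is pure,
# so it makes no classification errors on the training data.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Simpler tree: a depth limit is one crude inducer constraint;
# cost-complexity pruning (ccp_alpha) is another common option.
simple = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print(full.score(X, y))    # 1.0 training accuracy
print(simple.score(X, y))  # lower training accuracy, far fewer leaves
print(full.get_n_leaves(), simple.get_n_leaves())
```

On held-out data the depth-limited tree typically wins, which is the point of preferring simple structures.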
|
|
|
|
|
|