Fig. 2: Learning decision boundaries in DT
|
|
|
|
|
|
|
In general, the optimization procedure involves (i) deciding which regions of the input space are divided further, (ii) the choice of the input variable to split on, and (iii) the value of the threshold. In short, at every iteration we try to split the sub-samples even further. We can continue until every sub-space contains only a single label (for classification) or cannot be split any further. At that point the model makes no classification error on the training set (Fig. 2, bottom, middle). However, such an effort yields complex trees with poor generalization capabilities. In practice, simple trees are preferred over complex structures, which is achieved by utilizing different tree-inducer algorithms (Fig. 2, bottom, right).
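To make this concrete, here is a minimal sketch (the dataset and parameter values are assumptions for illustration, not from the text) contrasting a tree grown until every leaf is pure with a depth-limited one:

```python
# Minimal sketch: a fully grown tree reaches zero training error but
# generalizes worse than a simpler, depth-limited tree.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fully grown tree: every leaf is pure, so the training error is zero ...
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# ... but the complex decision boundary tends to generalize poorly.
print("full tree    - train:", full.score(X_train, y_train),
      "test:", full.score(X_test, y_test))

# A simpler, depth-limited tree usually generalizes better.
simple = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("depth-3 tree - train:", simple.score(X_train, y_train),
      "test:", simple.score(X_test, y_test))
```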
|
|
|
|
|
|
|
|
|
|
|
### Learning in Decision Trees
|
|
|
|
|
|
|
|
Decision trees are one of the fundamental building blocks of data-driven learning, so let's look a bit closer at the learning algorithms. One important feature to remember is that learning is based on splitting the input domain (see the figures above). Therefore, the size of the training set is of critical importance, as it dictates the complexity of the tree graph. To be more precise, the algorithms create trees subject to a certain set of rules, such as the number of leaves allowed in the model. The tree architecture is thus indirectly controlled by the training dataset, and we should keep this in mind while discussing the learning procedure for DTs.
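As a quick illustration (the synthetic dataset and the `min_samples_leaf` value are assumptions for this sketch), the same inducer rules produce trees of different sizes depending on how much training data is available:

```python
# Sketch: for a fixed set of inducer rules, the resulting tree size
# depends on the size of the training set.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

for n in (100, 1000, 10000):
    X, y = make_classification(n_samples=n, n_features=10, random_state=0)
    # Same rule every time: a leaf must contain at least 20 samples.
    tree = DecisionTreeClassifier(min_samples_leaf=20, random_state=0).fit(X, y)
    print(f"n={n:>5}  leaves={tree.get_n_leaves()}  depth={tree.get_depth()}")
```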
|
|
|
|
|
|
|
|
The first important aspect of learning a DT from a given example set is that, as an optimization problem, it is very difficult (NP-hard); in short, an exact solution is infeasible for complex problems. So we rely on heuristic methods, such as top-down induction (ID3, C4.5, CART) or bottom-up methods, where the former is generally preferred (this is also the case in scikit-learn, which uses CART; note that CART uses a grow-and-prune strategy). The top-down method consists of the following steps:
|
|
|
|
createTree(exampleSet, Features, labels, splitEvaluator, stopCriteria):

i. Create a new tree "Tree"

ii. If stopCriteria is True for the given instances in exampleSet:

   a. The node is a "Leaf" labeled with the most common label in exampleSet

iii. Else:

   a. Use the splitEvaluator function, splitEvaluator(f, exampleSet), on each feature f to find the best feature f* for exampleSet

   b. Label the node with f*

   c. For each outcome v* of f*:

      i. Create subTree via createTree(the subset of exampleSet where f* = v*, Features, labels, splitEvaluator, stopCriteria)

      ii. Connect the subTree to Tree with the "edge" information v*

iv. Prune the tree

v. Return the Tree
|
|
|
|
The recursive algorithm stops when any of the following is true:

• All examples in the training set belong to a single class / value y,

• The maximum tree depth is reached,

• Splitting the node further would leave fewer cases in one or more child nodes than the minimum number of instances allowed per node,

• The number of examples left in the node is less than the minimum number required for a parent node,

• The best split yields a split score below a threshold value.
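For concreteness, below is a minimal runnable sketch of the createTree procedure above for categorical features, using information gain as the splitEvaluator and label purity as the stopCriteria. The pruning step (iv) is omitted for brevity; labels are passed to the evaluator explicitly, and the helper names and toy dataset are illustrative assumptions, not part of any library.

```python
# Sketch of top-down tree induction (ID3-style) for categorical features.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of labels."""
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(feature, example_set, labels):
    """splitEvaluator: entropy reduction obtained by splitting on `feature`."""
    total, remainder = len(example_set), 0.0
    for value in set(ex[feature] for ex in example_set):
        idx = [i for i, ex in enumerate(example_set) if ex[feature] == value]
        remainder += len(idx) / total * entropy([labels[i] for i in idx])
    return entropy(labels) - remainder

def create_tree(example_set, features, labels, split_evaluator, stop_criteria):
    # ii. Stop: return a leaf labeled with the most common label.
    if stop_criteria(example_set, features, labels):
        return Counter(labels).most_common(1)[0][0]
    # iii.a. Pick the best feature according to the split evaluator.
    best = max(features, key=lambda f: split_evaluator(f, example_set, labels))
    # iii.b/c. Label the node with f*, then recurse on each outcome v*.
    tree = {best: {}}
    remaining = [f for f in features if f != best]
    for value in set(ex[best] for ex in example_set):
        idx = [i for i, ex in enumerate(example_set) if ex[best] == value]
        subtree = create_tree([example_set[i] for i in idx], remaining,
                              [labels[i] for i in idx],
                              split_evaluator, stop_criteria)
        tree[best][value] = subtree  # connect subTree with edge information v*
    return tree

# stopCriteria: labels are pure or no features are left to split on.
def stop_criteria(example_set, features, labels):
    return len(set(labels)) <= 1 or not features

# Toy example: decide whether to play outside.
examples = [{"sky": "sunny", "wind": "weak"}, {"sky": "sunny", "wind": "strong"},
            {"sky": "rainy", "wind": "weak"}, {"sky": "rainy", "wind": "strong"}]
labels = ["yes", "yes", "yes", "no"]
print(create_tree(examples, ["sky", "wind"], labels, information_gain, stop_criteria))
```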
|
|
|
|
|
|
|
|
### CART
|
|
|
|
|
|
|
|
[CART](https://www.analyticssteps.com/blogs/classification-and-regression-tree-cart-algorithm) is the tree inducer used in [scikit-learn](https://scikit-learn.org/stable/modules/tree.html#tree-algorithms). It can create both Classification And Regression Trees, hence the initials. The constructed trees are binary: each parent node has two child nodes. For classification, the split is chosen to maximize the homogeneity of the child nodes, measured by the [Gini index]( https://en.wikipedia.org/wiki/Gini_coefficient) (the twoing criterion is an alternative in the original CART). Once the tree is created, it is pruned via cost-complexity pruning. For regression, the split criterion is based on minimizing the squared error.
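A brief sketch of how these CART ingredients appear in scikit-learn: Gini-based classification splits, cost-complexity pruning via `ccp_alpha`, and squared-error regression splits. The datasets and the choice of alpha are assumptions for illustration.

```python
# Assumes scikit-learn >= 1.0 (for criterion="squared_error").
from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: binary splits chosen to minimize the Gini impurity of the children.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
# Cost-complexity pruning path: candidate alpha values for pruning the grown tree.
path = clf.cost_complexity_pruning_path(X, y)
pruned = DecisionTreeClassifier(criterion="gini", random_state=0,
                                ccp_alpha=path.ccp_alphas[-2]).fit(X, y)
print("pruned classifier leaves:", pruned.get_n_leaves())

# Regression: splits chosen to minimize the squared error within the children.
Xr, yr = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3,
                            random_state=0).fit(Xr, yr)
print("regressor depth:", reg.get_depth())
```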
|
|
|
|
|
|
|
|
|
|
|
|
|
## Additional references
|