|
|
|
|
|
|
|
Nobody is perfect. That is also true for our ML models. Generalization is a major challenge in all ML implementations: we select our model complexity very carefully and deploy various regularization strategies to overcome this issue. This is, however, only one way of solving a difficult problem. At the opposite end, we aim to have many weak learners instead of one powerful, tailored model. In the most basic case, we ask N models for their opinion and then use the most frequent answer (classification) or the average prediction (regression). Another way is to split the data space into regions and train "expert" models, each on a sub-domain of the input space (mixtures of experts). Training of such individual models can be decoupled (independently trained models, preferably relying on different learning methods) or coupled. One smart way of coupling is to use the feedback obtained from the i-th model during the training of the (i+1)-th model. This strategy is referred to as boosting.
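To make the boosting idea concrete, here is a minimal sketch of squared-error gradient boosting, using shallow scikit-learn trees as the weak learners. The function names and hyperparameters are illustrative choices, not prescribed by the text: each new tree is fit to the residuals, i.e. the feedback left behind by the models trained before it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_ensemble(X, y, n_models=50, learning_rate=0.1):
    """Fit shallow trees sequentially; each tree targets the
    residuals (the 'feedback') left by the previous ones."""
    models = []
    residual = y.astype(float).copy()  # ensemble prediction starts at 0
    for _ in range(n_models):
        tree = DecisionTreeRegressor(max_depth=2)  # a weak learner
        tree.fit(X, residual)
        models.append(tree)
        # Update the feedback: what the ensemble still gets wrong.
        residual = residual - learning_rate * tree.predict(X)
    return models

def predict_boosted(models, X, learning_rate=0.1):
    """Sum the (damped) contributions of all weak learners."""
    return learning_rate * sum(m.predict(X) for m in models)
```

Replacing the sequential loop with independently trained trees on bootstrap samples, and averaging their outputs, would instead give the decoupled (bagging-style) ensemble mentioned above.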
|
|
|
|
|
|
|
|
|
|
|
|
One popular building block here is the decision tree (DT). The model can be interpreted as a graph consisting of sequential decisions performed at each node; these decisions are how the data space is divided into sub-regions. To illustrate, let's imagine a 2D data space. In the first step, the data space is divided into two based on a model parameter $`γ_1`$. Then, each sub-domain is further split based on the model parameters $`γ_2`$ and $`γ_3`$. Once we have a tree, we can pass a new input x' through it and find out which region that new instance falls into. At the leaf nodes, we may have constants or class indices printed out as the output. Decision trees are transparent models: we can trace the reasoning behind an output and judge the relative influence of the input features on that decision. This property is what makes DT-based models a popular choice in the medical field.
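As a sketch of the 2D example above (the threshold values for $`γ_1`$, $`γ_2`$, $`γ_3`$ are made up for illustration), the whole tree is just nested comparisons:

```python
def predict(x, gamma1=0.5, gamma2=0.3, gamma3=0.7):
    """Toy depth-2 decision tree over a 2D input x = (x0, x1).
    The root splits on feature 0 at gamma1; each child then
    splits on feature 1 (at gamma2 or gamma3). The leaves
    return class indices for the four resulting regions."""
    if x[0] < gamma1:                       # first split: gamma_1
        return 0 if x[1] < gamma2 else 1    # second split: gamma_2
    else:
        return 2 if x[1] < gamma3 else 3    # second split: gamma_3

# Route a new instance x' down to one of the four regions:
print(predict((0.2, 0.9)))   # -> 1
```

The transparency claim is visible here: for any input, the chain of `if` comparisons that fired is exactly the explanation of the output.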
|
|
|
|
|
|
|
|
|
|
|
|
During the training, we learn the split criteria (which feature to test at each node) and the threshold values for the split conditions. We also learn the leaf values (class indices for classification, constants for regression).
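One common way to learn those quantities, sketched below for a squared-error regression tree and not tied to any particular library: scan candidate (feature, threshold) pairs and keep the one that minimizes the squared error of the resulting leaf means, which are themselves the learned leaf values.

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search over (feature, threshold) pairs.
    The leaf value on each side is the mean of y, the standard
    CART choice under squared-error loss."""
    best = (None, None, np.inf)  # (feature, threshold, sse)
    for j in range(X.shape[1]):
        # Dropping the largest unique value keeps both sides non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            sse = (((left - left.mean()) ** 2).sum()
                   + ((right - right.mean()) ** 2).sum())
            if sse < best[2]:
                best = (j, t, sse)
    return best
```

Applying this search recursively to each resulting sub-domain, until a depth limit or a minimum leaf size is reached, grows the full tree.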
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|