|
|
|
|
|
|
|
<div align="center">

<img src="uploads/c8788ae7ed388aac4da978e352a6e1e6/lr1.png" width="600">

Outline of the logistic regression model

</div>
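As a minimal sketch of the model outlined above, the following pure-Python example fits a logistic regression by gradient descent on the log loss. The helper names (`sigmoid`, `train_logistic`, `predict`) and the toy data are our own illustration, not part of any library:

```python
import math

def sigmoid(z):
    # Logistic function: maps any real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by stochastic gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the linear score
            for j in range(n_features):
                w[j] -= lr * err * xi[j]
            b -= lr * err
    return w, b

def predict(X, w, b):
    # Threshold the predicted probability at 0.5 to get a class label.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Toy 1-D data: class 0 for negative inputs, class 1 for positive inputs.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict(X, w, b))  # the fitted model separates the two classes
```

In practice one would use scikit-learn's `LogisticRegression`, which adds regularization and faster solvers; the sketch above only shows the mechanics.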
|
|
|
|
|
|
|
|
|
|
|
|
|
[CART](https://www.analyticssteps.com/blogs/classification-and-regression-tree-cart-algorithm) is the tree generator used in [scikit-learn](https://scikit-learn.org/stable/modules/tree.html#tree-algorithms). It can build both Classification And Regression Trees, hence the initials. The constructed trees are binary: each parent node has exactly two children. Splits are chosen by the twoing criterion; that is, we search for the split yielding the most homogeneous child nodes, as measured by the [Gini index](https://en.wikipedia.org/wiki/Gini_coefficient). Once the tree is grown, it is pruned via cost-complexity balancing. For regression, the split criterion minimizes the squared error instead.
|
|
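To make the split criterion concrete, here is a small pure-Python sketch (the helper names `gini` and `best_split` are our own) that computes the Gini impurity and scans a single feature for the threshold minimizing the weighted child impurity, which is what CART does at each node:

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions; 0 means a pure node.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((cnt / n) ** 2 for cnt in counts.values())

def best_split(xs, ys):
    """Scan candidate thresholds on one feature and return (threshold, score),
    where score is the weighted child impurity that CART minimizes."""
    best = (None, float("inf"))
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    for k in range(1, len(xs)):
        i, j = order[k - 1], order[k]
        thr = (xs[i] + xs[j]) / 2.0  # midpoint between consecutive sorted values
        left = [ys[m] for m in range(len(xs)) if xs[m] <= thr]
        right = [ys[m] for m in range(len(xs)) if xs[m] > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if score < best[1]:
            best = (thr, score)
    return best

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # threshold 6.5 yields two pure children (impurity 0.0)
```

A full CART implementation applies this search recursively over all features; scikit-learn's `DecisionTreeClassifier(criterion="gini")` does exactly this impurity minimization.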
|
|
|
|
|
|
|
|
|
## Model evaluation
|
|
|
|
|
|
|
|
Once the models of interest are trained and ready, we test them using various tools:
|
|
|
|
|
|
|
|
|
|
|
|
<div align="center">
|
|
|
|
<img src="uploads/89b65167fd3b61ce5bd3779b7db794cb/lr2.png" width="600">
|
|
|
|
|
|
|
|
Outline of model testing with LogLoss, Confusion Matrix, ROC and PR curves
|
|
|
|
</div>
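As a small illustration of two of these tools, the following sketch computes the LogLoss and a binary confusion matrix from scratch. The function names are our own; in practice the equivalents `log_loss` and `confusion_matrix` from `sklearn.metrics` would be used:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    # Average negative log-likelihood of the true labels under the predicted probabilities.
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def confusion_matrix(y_true, y_pred):
    """Return (TN, FP, FN, TP) counts for binary labels."""
    tn = fp = fn = tp = 0
    for y, yhat in zip(y_true, y_pred):
        if y == 0 and yhat == 0:
            tn += 1
        elif y == 0 and yhat == 1:
            fp += 1
        elif y == 1 and yhat == 0:
            fn += 1
        else:
            tp += 1
    return tn, fp, fn, tp

y_true = [0, 0, 1, 1]
p_pred = [0.1, 0.6, 0.8, 0.3]           # predicted probabilities of class 1
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]
print(round(log_loss(y_true, p_pred), 4))  # 0.6122
print(confusion_matrix(y_true, y_pred))    # (1, 1, 1, 1): one of each outcome
```

The ROC and PR curves are built by sweeping the 0.5 threshold above over the whole (0, 1) range and recording the resulting confusion-matrix rates at each point.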
|
|
|
|
|
|
|
|
You may find the details in the lecture notes.
|
|
|
|
|
|
|
|
## Additional references
|
|
|
|
|
|
|
|