y_p(x, w) = σ(w_0 + w_1x_1 + ... + w_Mx_M)
The continuous nature of the basis function gives us a gentle transition from Class 1 to Class 2, which can be interpreted as the probability of belonging to a class.
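As a minimal sketch (not from the original text), here is how that prediction might look in Python with NumPy; the `sigmoid` helper, the toy weights `w`, bias `b`, and inputs `X` are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # Weighted sum of the inputs plus bias, passed through the sigmoid.
    # The smooth output in (0, 1) is read as the probability of Class 2.
    return sigmoid(X @ w + b)

# Toy example: 2-dimensional inputs with made-up weights.
X = np.array([[0.5, 1.2],
              [2.0, -0.3],
              [-1.0, 0.7]])
w = np.array([1.5, -0.8])
b = 0.1

print(predict_proba(X, w, b))  # approximately [0.47 0.97 0.12]
```

Points near the decision boundary get probabilities close to 0.5, while points far from it get values close to 0 or 1.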
Learning is similar to regression: for an M-dimensional input array, we learn M trainable parameters (plus a bias). Model training is therefore very fast even in high dimensions. As in regression, learning is typically done with the gradient descent algorithm.
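A minimal batch gradient descent sketch for this model might look as follows; the learning rate `lr`, iteration count `n_iters`, and the function name `fit_logistic_gd` are assumptions for illustration, not values from the text. Each update touches only the M weights and the bias, which is what keeps training cheap in high dimensions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=1000):
    # X: (n_samples, M) inputs, y: (n_samples,) labels in {0, 1}.
    n, m = X.shape
    w = np.zeros(m)   # M trainable weights
    b = 0.0           # plus one bias term
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)          # predicted probabilities
        error = p - y                   # gradient of the cross-entropy w.r.t. the logit
        w -= lr * (X.T @ error) / n     # gradient step for the weights
        b -= lr * error.mean()          # gradient step for the bias
    return w, b
```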
Also note that logistic regression suffers from over-fitting if the training dataset is perfectly linearly separable. Trained on such data, the logistic sigmoid function becomes very steep, like [a Heaviside step function](https://en.wikipedia.org/wiki/Heaviside_step_function). Therefore, you should add regularization to the error function (penalize w for growing to very large values).
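One common choice is an L2 (weight-decay) penalty on w. Below is a sketch of the same gradient descent loop with that penalty added; the regularization strength `lam` is an assumed illustrative value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd_l2(X, y, lr=0.1, n_iters=1000, lam=0.1):
    # Same batch gradient descent as before, but with an L2 penalty
    # lam * ||w||^2 / 2 added to the cross-entropy error, which keeps w
    # from growing without bound on a perfectly linearly separable dataset.
    n, m = X.shape
    w = np.zeros(m)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)
        error = p - y
        w -= lr * ((X.T @ error) / n + lam * w)  # extra lam * w term from the penalty
        b -= lr * error.mean()                   # the bias is usually left unregularized
    return w, b
```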