```math
h_t = \tanh(W^{xh}_{64}X_t + W^{hh}_{66}h_{t-1})
```
In the training, what we learn is how to combine $`X_t`$ and $`h_{t-1}`$ by finding the weights that give the minimum error. Note that I use superscripts to explain the relationships in the weights and subscripts to denote the size of the weight matrix. Also note that tanh is applied in an element-wise fashion. Since the equation is additive, the initial $`h`$ can simply be a vector of zeros.
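As an illustration (my own sketch, not part of the original text), a single recurrent step with the shapes implied by the subscripts above — $`W^{xh}`$ of shape (6, 4) and $`W^{hh}`$ of shape (6, 6) — can be written in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight shapes follow the subscripts in the equation:
# W^{xh} is (6, 4): maps a 4-dim input to the 6-dim hidden state.
# W^{hh} is (6, 6): maps the previous hidden state to the new one.
W_xh = rng.normal(size=(6, 4))
W_hh = rng.normal(size=(6, 6))

def rnn_step(x_t, h_prev):
    """One recurrent step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev)

x_t = rng.normal(size=4)   # input at time t (4 features)
h_prev = np.zeros(6)       # the initial hidden state can be all zeros
h_t = rnn_step(x_t, h_prev)
print(h_t.shape)           # (6,)
```

Because tanh squashes its input, every entry of `h_t` lies in (-1, 1) regardless of the weight magnitudes.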
In the next step, we find $`y_t`$ from $`h_t`$. If we want to create an output of a particular size, we adjust the shape of the weight matrix accordingly. For a single output:

```math
y_t = W^{yh}_{16}h_t
```
Or, we can also apply an activation function here:
```math
y_t = \mathrm{ReLU}(W^{yh}_{16}h_t)
```
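Continuing the earlier sketch (the variable names are my own), the output layer is just a matrix product with $`W^{yh}`$ of shape (1, 6), optionally followed by ReLU:

```python
import numpy as np

rng = np.random.default_rng(1)

# W^{yh} is (1, 6): maps the 6-dim hidden state to a single output.
W_yh = rng.normal(size=(1, 6))

def relu(z):
    """Element-wise ReLU: max(z, 0)."""
    return np.maximum(z, 0.0)

# Stand-in hidden state, as if produced by the recurrent step above.
h_t = np.tanh(rng.normal(size=6))

y_linear = W_yh @ h_t        # y_t = W^{yh} h_t, shape (1,)
y_relu = relu(W_yh @ h_t)    # y_t = ReLU(W^{yh} h_t), shape (1,)
print(y_linear.shape, y_relu.shape)
```

The (1, 6) weight matrix is what makes this a single-output model; a (k, 6) matrix would give k outputs from the same hidden state.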
Imagine that we have an input $`X`$ of the shape (1,3,4), where the second dimension is time. If we use our model for all time steps, we get the following:

```math