## Backpropagation Through Time
Training a neural network is based on updating its weight matrices with back-propagation (you may want to [check here](https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/) for a better understanding of backprop). The same strategy is used here as well; the only difference is that we first unroll the computational graph in time and then apply the usual algorithm to the unrolled network, which is why this procedure is called backpropagation through time (BPTT).
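
To make the unrolling concrete, here is a minimal NumPy sketch of backpropagation through time for a vanilla tanh RNN with a squared-error loss. The layer sizes, the sequence length of 24, and the random data are illustrative assumptions, not the setup used in the course notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny vanilla RNN; the sizes below are arbitrary choices for illustration.
n_in, n_hid, n_out, T = 3, 5, 1, 24

Wxh = rng.normal(scale=0.1, size=(n_hid, n_in))   # input  -> hidden
Whh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # hidden -> hidden (recurrent)
Why = rng.normal(scale=0.1, size=(n_out, n_hid))  # hidden -> output
bh, by = np.zeros((n_hid, 1)), np.zeros((n_out, 1))

xs = [rng.normal(size=(n_in, 1)) for _ in range(T)]        # dummy input sequence
targets = [rng.normal(size=(n_out, 1)) for _ in range(T)]  # dummy targets

# ---- forward pass: unroll the recurrence over all T time steps ----
hs = {-1: np.zeros((n_hid, 1))}
ys, loss = {}, 0.0
for t in range(T):
    hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)
    ys[t] = Why @ hs[t] + by
    loss += 0.5 * np.sum((ys[t] - targets[t]) ** 2)

# ---- backward pass (BPTT): ordinary backprop applied to the unrolled graph ----
dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
dbh, dby = np.zeros_like(bh), np.zeros_like(by)
dh_next = np.zeros((n_hid, 1))  # gradient flowing in from the future time step
for t in reversed(range(T)):
    dy = ys[t] - targets[t]
    dWhy += dy @ hs[t].T
    dby += dy
    dh = Why.T @ dy + dh_next          # from the output AND from step t+1
    dpre = (1.0 - hs[t] ** 2) * dh     # through the tanh nonlinearity
    dWxh += dpre @ xs[t].T
    dWhh += dpre @ hs[t - 1].T
    dbh += dpre
    dh_next = Whh.T @ dpre             # pass the gradient one step further back

# One vanilla gradient-descent update, with all T steps contributing.
lr = 1e-2
for param, grad in [(Wxh, dWxh), (Whh, dWhh), (Why, dWhy), (bh, dbh), (by, dby)]:
    param -= lr * grad

print(f"loss over {T} steps: {loss:.3f}")
```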
You can think of each time step of the unrolled RNN as an additional MLP layer, where the internal state from the previous step is the input to the following step. This is not something we can overlook: if you have a 3-layer RNN that processes a sequence of length 24, it is as if you were training a 72-layer MLP. A single weight update therefore requires computing many derivatives, which makes learning slow (compare the notebooks from weeks 9 and 7). The effectively much deeper architecture also causes gradients to vanish or explode, which is why we typically need special recurrent cells when the sequences are long and/or the network architecture is deep.
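
The same depth-in-time picture explains why gradients vanish or explode. Ignoring the nonlinearity for a moment, the gradient of the last hidden state with respect to the first one is the recurrent weight matrix multiplied by itself once per time step, so its magnitude is governed by that matrix's spectral radius. The sketch below is an illustrative simplification (the linear recurrence, the hidden size, and the chosen spectral radii are assumptions), not the analysis from the references listed further down.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hid, T = 5, 24

def jacobian_norm(spectral_radius):
    """Norm of the product of T identical recurrent Jacobians (linear RNN),
    with Whh rescaled so its largest |eigenvalue| equals `spectral_radius`."""
    Whh = rng.normal(size=(n_hid, n_hid))
    Whh *= spectral_radius / np.max(np.abs(np.linalg.eigvals(Whh)))
    J = np.eye(n_hid)
    for _ in range(T):          # one factor of Whh per unrolled time step
        J = Whh @ J
    return np.linalg.norm(J)

for rho in (0.5, 1.0, 1.5):
    print(f"spectral radius {rho}: |d h_T / d h_0| ~ {jacobian_norm(rho):.2e}")
```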
If interested in more technical details, see:
- [Backpropagation through time](https://d2l.ai/chapter_recurrent-neural-networks/bptt.html)
- [Backpropagation through time: what it does and how to do it](https://ieeexplore.ieee.org/document/58337)
## Special Recurrent Cells