This is what the custom model looks like:
RNN model as a graph. Here the features are denoted as a, b, c, d for simplicity. Computational nodes are shown in gray-blue, and time information is highlighted in purple. Note that the recurrent neurons (RNs) are unrolled in time: over three steps we pass in the time-step inputs (1), (2), (3) of a training instance.
...
Let's go over it step by step. In one instance, we have 12 values: 4 features at 3 consecutive time steps, denoted with numbers. When we pass an example to the model, we first feed the data with time label (1) to the 6 recurrent neurons (RNs). Note that we have increased the number of dimensions from 4 to 6. This is a critical decision, and what we essentially do here is feature engineering: the RNs learn different combinations of our 4 base features. In this very first step, the RNs update their hidden states and give us 6 memory signals, h (denoted as purple arrows).
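As a minimal sketch of this first step (assuming the TensorFlow Keras API and a plain `SimpleRNN` cell; the exact cell type is not fixed by the text), the layer takes instances of shape (3 time steps, 4 features) and holds 6 recurrent neurons:

```python
import tensorflow as tf

# One training instance: 3 consecutive time steps with 4 features (a, b, c, d) each.
inputs = tf.keras.Input(shape=(3, 4))

# 6 recurrent neurons: the layer lifts the 4 base features into 6 learned
# combinations and updates its hidden state h once per time step.
first_rnn = tf.keras.layers.SimpleRNN(6)

h = first_rnn(inputs)
print(h.shape)  # (None, 6): one 6-dimensional memory signal per instance
```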
In the next time iteration, the same RNs use the current hidden states and the second time step's data for the base features a, b, c, d (denoted as (2)) to update the hidden states once more. We do the same for the input data of the third time step (denoted as (3) in the figure).
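In other words, the same weights are reused at every step. Assuming the standard simple-RNN update with a tanh activation (the actual cell could also be an LSTM or GRU), each memory signal is computed as

```math
h_t = \tanh\left(W x_t + U h_{t-1} + b\right)
```

where $`x_t`$ is the 4-dimensional feature vector at time step $`t`$, $`h_t`$ is the 6-dimensional hidden state passed to the next step, and $`W`$, $`U`$, $`b`$ are the weights shared by the 6 RNs across time steps.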
<div align="center">
<img src="uploads/70f0652481b74afbd53f9d7e7d4b7a7d/rnn_n2.png" width="600">
RNN graph representation. Here you can see explicitly which time step's data is fed at each unrolled step of the first RNN layer.
</div>
At this stage, we reach a critical decision point. (i) Do I want to use the intermediate temporal outputs from the first RNN layer, or (ii) am I only interested in the final output of the first layer? This decision will change how the following RNN layer functions.
In the first choice, we simply connect the output of every unrolled time step to the corresponding time step of the following layer. Let's call this output m:
<div align="center">
<img src="uploads/278fad37adc0e79876326a7b2f2d10a7/rnn_n3.png" width="600">
RNN graph representation. Here you can see explicitly which time step's data is fed at each unrolled step of the first and second RNN layers. The outputs of the first RNN layer, m, are connected to the corresponding time steps of the second RNN layer. Note that the output of the second RNN layer is denoted as n, which is connected to the first MLP layer. The output of the first MLP layer, p, is then fed to the final output layer.
</div>
Note that each m vector has a different time stamp. By doing so, we keep the relationships between time steps separate and follow them explicitly. In TensorFlow, this is done simply by setting `return_sequences=True` in the layer definition.
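A minimal sketch of this choice (Keras again, with `SimpleRNN` standing in for the actual cell type):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(3, 4))  # 3 time steps x 4 features

# return_sequences=True: the layer emits its hidden state at every time step,
# i.e. the three m vectors, so the next RNN layer receives a (3, 6) sequence.
m = tf.keras.layers.SimpleRNN(6, return_sequences=True)(inputs)
print(m.shape)  # (None, 3, 6)
```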
The next layer of our graph is the MLP layer, a basic neural network layer. We again have two options: (i) connect every time step to a different MLP layer, or (ii) use only the most recent time step's output. We will discuss the former in the encoder-decoder implementation below. This time, I am only interested in making predictions one hour ahead of time, so my network should give me the end result; I am not interested in predicting sequences. Therefore, we only pass the most recent output $`n_i^3`$ to the MLP. In TensorFlow, we either set `return_sequences=False` or say nothing, as this is the default setting. In the final step, we reduce the number of dimensions further from 3 to 1 by adding a single-neuron layer with a linear activation function.
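Putting the pieces together, here is a minimal end-to-end sketch of the described architecture (assuming Keras and `SimpleRNN` cells; the unit count of the second RNN layer and the activation of the MLP layer are my own placeholders, while the 6 RNs, the 3-to-1 reduction, and the `return_sequences` settings follow the text):

```python
import tensorflow as tf

def build_model():
    # Input: 3 consecutive time steps, each with the 4 base features a, b, c, d.
    inputs = tf.keras.Input(shape=(3, 4))

    # First RNN layer: 6 recurrent neurons, returning all three unrolled
    # outputs m so the second RNN layer sees the full sequence.
    m = tf.keras.layers.SimpleRNN(6, return_sequences=True)(inputs)

    # Second RNN layer: return_sequences=False (the default), so only the
    # most recent output n is passed on. The unit count here is an assumption.
    n = tf.keras.layers.SimpleRNN(6)(m)

    # MLP layer p with 3 neurons (activation assumed), followed by the
    # single-neuron linear layer that reduces the dimensions from 3 to 1.
    p = tf.keras.layers.Dense(3, activation="relu")(n)
    output = tf.keras.layers.Dense(1, activation="linear")(p)

    return tf.keras.Model(inputs, output)

model = build_model()
model.compile(optimizer="adam", loss="mse")  # optimizer/loss are placeholders
model.summary()
```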