C. Ates
Autoencoders
Autoencoders are feed-forward neural networks designed for a particular purpose: feature discovery. The model can be considered a form of unsupervised learning, since it does not require labels: in the backpropagation algorithm, the cost function is computed against the input X itself, so the model is trained to reconstruct its own input.
Figure 1.
The number of neurons in the hidden layers typically follows a bow-tie pattern: the layers shrink toward a narrow bottleneck and then expand again. This forces the model to learn low-dimensional representations rather than a useless identity mapping. With this strategy, we minimize the reconstruction error (e.g., ‖X − X̂‖²) while the hidden layers learn the most important features of the original high-dimensional data.
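A minimal sketch of this bow-tie idea in Keras, with illustrative (assumed) layer sizes; note that the target passed to fit() is the input X itself:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 64   # original (high) dimensionality -- illustrative value
latent_dim = 8   # bottleneck size -- illustrative value

encoder = keras.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(latent_dim, activation="relu"),   # low-dimensional code
])
decoder = keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(input_dim, activation="linear"),  # reconstruct X
])
autoencoder = keras.Sequential([encoder, decoder])

# The cost is the reconstruction error against X itself: fit(X, X)
autoencoder.compile(optimizer="adam", loss="mse")
X = np.random.rand(256, input_dim).astype("float32")
autoencoder.fit(X, X, epochs=1, batch_size=32, verbose=0)
```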
...
Hints and Tips
-
Linear autoencoders are equivalent to PCA: if you do not use a nonlinear activation function, there is no need to bother with neural-network training.
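To see why training is unnecessary in the linear case, here is a numpy-only sketch: the optimal linear encoder/decoder pair is the projection onto the top-k principal directions, which the SVD gives in closed form (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X = X - X.mean(axis=0)          # PCA assumes centred data

k = 3                           # latent dimension -- illustrative value
U, S, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k]                      # top-k principal directions (k x 10)

# The optimal linear AE is a projection onto the principal subspace:
# encoder: code = X @ W.T   decoder: X_hat = code @ W
X_hat = (X @ W.T) @ W
pca_mse = np.mean((X - X_hat) ** 2)
```

By the Eckart-Young theorem, no rank-k linear map can achieve a lower reconstruction MSE, so a linear AE trained by gradient descent can at best converge to this same subspace.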
-
If the reconstruction loss cannot be improved, you may be stuck in a local minimum. Play around with the hyperparameters to bypass it. If that still does not help, a greedy pre-training approach (as with Boltzmann machines) can be used to initialize the network weights for better performance. Alternatively, you can recursively train the layers of the autoencoder. First, train a shallow AE instead of a deep one. Once you are satisfied with the reconstruction loss, take the latent representation as a second data set and train a second AE model on it, further reducing the number of dimensions. Once the training of the second model is done, use its latent space for the third model, and so on. When you are finished, combine the trained layers into one deep AE.
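The recursive recipe above can be sketched in Keras as follows (layer sizes and epoch counts are illustrative assumptions, not a definitive implementation):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def train_shallow_ae(X, latent_dim):
    """Train a one-hidden-layer AE and return its (encoder, decoder)."""
    dim = X.shape[1]
    enc = keras.Sequential([layers.Input(shape=(dim,)),
                            layers.Dense(latent_dim, activation="relu")])
    dec = keras.Sequential([layers.Input(shape=(latent_dim,)),
                            layers.Dense(dim, activation="linear")])
    ae = keras.Sequential([enc, dec])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(X, X, epochs=1, verbose=0)   # train much longer in practice
    return enc, dec

X = np.random.rand(128, 32).astype("float32")
enc1, dec1 = train_shallow_ae(X, 16)    # first AE: 32 -> 16
Z1 = enc1.predict(X, verbose=0)         # latent codes = second data set
enc2, dec2 = train_shallow_ae(Z1, 8)    # second AE: 16 -> 8

# Combine the trained layers into one deep AE
deep_ae = keras.Sequential([enc1, enc2, dec2, dec1])
```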
-
If you want to build an autoencoder for high-dimensional data sets such as images, you will need a convolutional autoencoder. In this case, the encoder trades the spatial dimensions of the data (the width and height of an image) for image features (depth, i.e., the number of channels); the decoder works in the opposite direction. For this purpose, you can use transposed convolutional layers in TensorFlow (e.g., layers.Conv2DTranspose). This is illustrated in the DDE 2 lecture, CNN week.
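A minimal convolutional AE sketch, assuming a 28x28 single-channel input (filter counts are illustrative); strided Conv2D layers shrink the spatial dimensions while growing the depth, and Conv2DTranspose layers reverse this:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: spatial size down, depth up
encoder = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),  # -> 14x14x16
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),  # -> 7x7x32
])
# Decoder: transposed convolutions work in the opposite direction
decoder = keras.Sequential([
    layers.Input(shape=(7, 7, 32)),
    layers.Conv2DTranspose(16, 3, strides=2, padding="same",
                           activation="relu"),                           # -> 14x14x16
    layers.Conv2DTranspose(1, 3, strides=2, padding="same",
                           activation="sigmoid"),                        # -> 28x28x1
])
conv_ae = keras.Sequential([encoder, decoder])
conv_ae.compile(optimizer="adam", loss="mse")
```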
-
If you are dealing with time series, you can use RNN layers in an AE as well. Here, the AE is built as a sequence-to-sequence model, where the encoder acts as a sequence-to-vector RNN and the decoder as a vector-to-sequence RNN. This can be achieved with “return_sequences=False”, the RepeatVector layer, and the TimeDistributed layer in TF. See the notebook from the RNN week for an explicit illustration.
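The three building blocks mentioned above fit together as in this sketch (sequence length, feature count, and latent size are illustrative assumptions):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features, latent_dim = 20, 3, 8   # illustrative sizes

rnn_ae = keras.Sequential([
    layers.Input(shape=(timesteps, features)),
    # sequence-to-vector encoder: only the final output is kept
    layers.LSTM(latent_dim, return_sequences=False),
    # copy the latent vector once per time step for the decoder
    layers.RepeatVector(timesteps),
    # vector-to-sequence decoder: emit an output at every time step
    layers.LSTM(latent_dim, return_sequences=True),
    # apply the same Dense reconstruction at every time step
    layers.TimeDistributed(layers.Dense(features)),
])
rnn_ae.compile(optimizer="adam", loss="mse")
```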