The lecture and notebook go through many tricks to iteratively improve MAE on the dataset:
- Lambda layer after dense layer to upscale
- 2 Bidirectional LSTM layers instead of RNN
- Learning rate scheduling
- Huber loss instead of conventional MSE
- Specialized SGD optimizer with momentum, etc.
- Training for 400 epochs
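I don't have the notebook's exact code in front of me, but the tricks above roughly correspond to something like this sketch (the layer sizes, `window_size`, the ×100 scale, and the learning-rate values are my assumptions, not the notebook's actual numbers):

```python
import tensorflow as tf

window_size = 20  # assumed window length for the univariate series

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(window_size, 1)),
    # Trick 2: two Bidirectional LSTM layers instead of a plain RNN
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
    # Trick 1: Lambda layer after the Dense layer to upscale the output
    # toward the series' rough magnitude (the 100.0 is a guess)
    tf.keras.layers.Lambda(lambda x: x * 100.0),
])

model.compile(
    # Trick 4: Huber loss, more robust to outliers than plain MSE
    loss=tf.keras.losses.Huber(),
    # Trick 5: SGD with momentum instead of a default optimizer
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9),
    metrics=["mae"],
)

# Trick 3: a learning-rate schedule callback, here sweeping the rate
# upward each epoch to find a good fixed value (constants assumed)
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10 ** (epoch / 20))
```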
Just to experiment, I implemented a very simple RNN without using any of the tricks above. I was expecting a poor MAE given my bare-bones implementation. Surprisingly, I am getting a much better MAE on both the validation set and the training set in just 100 epochs. How can this be? How can I get better results than the ones shown in the lectures with a 5x simpler implementation? Am I doing something really wrong?
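For reference, my "simple" version looks roughly like this (again, the unit count and `window_size` here are placeholders, not my exact values):

```python
import tensorflow as tf

window_size = 20  # assumed window length, same shape as above

# Bare-bones model: one SimpleRNN layer, plain MSE, default Adam —
# no Lambda upscaling, no Bidirectional LSTMs, no Huber loss,
# no SGD-with-momentum, no learning-rate schedule.
simple_model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(window_size, 1)),
    tf.keras.layers.SimpleRNN(40),
    tf.keras.layers.Dense(1),
])
simple_model.compile(loss="mse", optimizer="adam", metrics=["mae"])

# Trained for only 100 epochs, e.g.:
# simple_model.fit(train_set, epochs=100, validation_data=valid_set)
```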