C4_W3_Lab2
The lecture and notebook go through several tricks to iteratively improve the MAE on the dataset (roughly sketched in Keras after this list):
- Lambda layer after the Dense layer to upscale the outputs
- 2 Bidirectional LSTM layers instead of a simple RNN
- Learning rate scheduling
- Huber loss instead of conventional MSE
- Specialized SGD optimizer with momentum, etc.
- 400 epochs of training
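For reference, here is roughly what that trick stack looks like in Keras. This is a sketch, not the exact lab code: the layer sizes, the window size, the 100.0 scaling factor, and the learning rate are my assumptions.

```python
import tensorflow as tf

window_size = 20  # assumption; the lab's actual window size may differ

# Rough sketch of the lecture's trick stack (sizes and constants are illustrative)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window_size, 1)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 100.0),  # upscale outputs toward the series' range
])

model.compile(
    loss=tf.keras.losses.Huber(),  # less sensitive to outliers than plain MSE
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9),
    metrics=["mae"],
)
# plus a LearningRateScheduler sweep run beforehand to pick the learning rate
```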
Just to experiment, I implemented a very simple RNN without using any of the above five tricks. I was expecting a poor MAE given my super simple implementation. Surprisingly, I am getting a much better MAE on both the validation set and the training set in just 100 epochs. How can this be? I mean, how can I get better results than shown in the lectures with a 5x simpler implementation? Am I doing something really wrong? My implementation is essentially the snippet below.
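Here the hyperparameters are illustrative, and `train_set` / `val_set` stand in for the windowed tf.data datasets built in the lab.

```python
import tensorflow as tf

window_size = 20  # assumption, matching the sketch above

# My "simple" baseline: a single SimpleRNN layer, plain MSE and Adam,
# no Lambda scaling, no bidirectional layers, no LR scheduling
simple_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window_size, 1)),
    tf.keras.layers.SimpleRNN(40),
    tf.keras.layers.Dense(1),
])

simple_model.compile(loss="mse", optimizer="adam", metrics=["mae"])
# simple_model.fit(train_set, epochs=100, validation_data=val_set)
```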
