Course 4, Week 3, Lab 2: Better MAE without any tricks. How can this be?

The lecture and notebook go through several tricks to iteratively improve the MAE on the dataset:

  1. Lambda layer after dense layer to upscale
  2. 2 Bidirectional LSTM layers instead of RNN
  3. Learning rate scheduling
  4. Huber loss instead of conventional MSE
  5. Specialized SGD optimizer with momentum, etc.
    (6. 400 epochs)
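For reference, here is a hedged sketch of what a model using those tricks might look like; the layer sizes, the x100 scale factor, and the learning-rate values are my own guesses, not the exact notebook code:

```python
import tensorflow as tf

# Sketch of a lecture-style model; all sizes and constants are assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 1)),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(32, return_sequences=True)),  # trick 2
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 100.0),           # trick 1: upscale
])

model.compile(
    loss=tf.keras.losses.Huber(),                          # trick 4
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5,
                                      momentum=0.9),       # trick 5
    metrics=["mae"],
)

# trick 3: sweep the learning rate across epochs to find a good value
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10 ** (epoch / 20))
```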

Just to experiment, I implemented a very simple RNN without using any of the tricks above. I was expecting poor MAE given my uber-simple implementation. Surprisingly, I am getting much better MAE on both the validation set and the training set in just 100 epochs. How can this be? How can I get better results than shown in the lectures with a 5X simpler implementation? Am I doing something really wrong?
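For comparison, this is my guess at the kind of "no tricks" baseline being described; it is a sketch, not the poster's actual code:

```python
import tensorflow as tf

# Hedged sketch of a plain baseline: SimpleRNN layers, MSE loss,
# default Adam optimizer, no Lambda rescaling, no LR schedule.
simple_model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 1)),
    tf.keras.layers.SimpleRNN(40, return_sequences=True),
    tf.keras.layers.SimpleRNN(40),
    tf.keras.layers.Dense(1),
])
simple_model.compile(loss="mse", optimizer="adam", metrics=["mae"])
```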

Did you restart the kernel before doing this?
There is a high probability that, if you are re-initializing the same model without restarting the kernel, it still carries information learnt from previous epochs.
Can you restart the kernel and try again?
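If a full kernel restart is inconvenient, one common Keras idiom for getting a clean slate is `tf.keras.backend.clear_session()` plus re-seeding before rebuilding the model; a minimal sketch (the architecture here is just a placeholder):

```python
import tensorflow as tf

def build_fresh_model(seed=42):
    # clear_session() drops cached layer/optimizer state, so a rebuilt
    # model cannot inherit anything from a previous training run.
    tf.keras.backend.clear_session()
    tf.keras.utils.set_random_seed(seed)  # reproducible initial weights
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 1)),
        tf.keras.layers.SimpleRNN(40, return_sequences=True),
        tf.keras.layers.SimpleRNN(40),
        tf.keras.layers.Dense(1),
    ])
    model.compile(loss="mse", optimizer="adam", metrics=["mae"])
    return model

model_a = build_fresh_model()
model_b = build_fresh_model()
# With the same seed, both builds start from identical weights,
# confirming no state leaks between them.
```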