Hello! 
Laurence describes his use of a lambda layer as a way to change the scale of the inputs/outputs so that they match the values in the series (if I understood correctly). Is there a rule of thumb for picking a relevant scale factor, knowing that the trend over time can bring in the same series values separated by orders of magnitude? Could the mean value of Ys be suitable or is there something more grounded with the nature of the series?
Scale of data matters when it comes to training a model quickly. This goes back to Course 1, Week 1 (house price prediction), which asks you to take this into consideration. Here's another post that elaborates on the importance of data scale.
Please look at the Course 4, Week 1 material on how to remove trend and seasonality to make a time series stationary. This step comes before rescaling the dataset using parameters learnt from the training split.
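As a minimal sketch of that order of operations (the series values and split point are made up for illustration): difference the series to remove the trend, then fit the scaling parameters on the training split only and apply them everywhere.

```python
import numpy as np

# Hypothetical series with a strong upward trend (illustrative data).
series = np.array([10.0, 12.0, 15.0, 19.0, 24.0, 30.0, 37.0, 45.0])

# 1. First-order differencing removes the trend component.
diff = series[1:] - series[:-1]

# 2. Fit scaling parameters on the training split only, then apply
#    them to the whole differenced series (no leakage from the future).
split = 5
train = diff[:split]
mean, std = train.mean(), train.std()
scaled = (diff - mean) / std
```

The key point is that `mean` and `std` come from `train` alone; the validation portion is scaled with the same numbers, never with its own statistics.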
Do look at these posts as well:
- Post 1
- Post 2
Thanks Balaji! I read through the posts you referred me to. So the scaling layer acts, in a way, as normalising the data to avoid problems with the activation functions, and it gives a new hyperparameter to play with, right?
Scaling has to do with the optimizer, not with the activation function of a layer. The scaling parameter is indeed a hyperparameter to tune.
Edit: While the choice of activation function matters, scaling significantly impacts the learning ability of the model.
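To make the hyperparameter concrete, here is a plain-NumPy sketch of what an output-scaling layer does (the function name and the factor `100.0` are illustrative, not a fixed rule; in Keras this would be a `Lambda` layer doing the same elementwise multiply):

```python
import numpy as np

def output_scaling(x, scale=100.0):
    # The network's raw outputs (roughly unit-range after tanh-like
    # activations) are mapped back to the series' magnitude.
    # In Keras: tf.keras.layers.Lambda(lambda x: x * scale)
    return x * scale

raw = np.array([0.12, -0.05, 0.33])   # illustrative model outputs
preds = output_scaling(raw)           # values on the series' scale
```

Picking `scale` near the typical magnitude of the training targets (for example, the mean of the training-split Ys) keeps the pre-scaled values close to unit range; it then remains a hyperparameter to tune, as noted above.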