Neural network hyperparameters include the number of hidden layers, the number of neurons per hidden layer, the learning rate, and the batch size. Hyperparameter tuning methods include grid search, random search, and Bayesian optimisation. As this analysis is a time-series prediction of sunspots, the window_size is set to 30 points (equal to 2.5 years of monthly observations), but it can be changed later if you want to experiment.
If you look at the window dataset function, each window of window_size points is flattened and shuffled, and the result is then divided into batches of batch_size.
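For reference, here is a minimal sketch of such a window dataset helper, assuming the standard tf.data windowing pattern; the function and argument names (e.g. shuffle_buffer) are illustrative rather than taken from the lab code:

```python
import tensorflow as tf

def windowed_dataset(series, window_size=30, batch_size=32, shuffle_buffer=1000):
    """Turn a 1-D series into shuffled (window, label) training batches."""
    ds = tf.data.Dataset.from_tensor_slices(series)
    # Slide a window of window_size + 1 over the series (inputs + label)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    # Flatten each window sub-dataset into a single tensor
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    # Shuffle the windows, then split each into (inputs, label)
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[-1]))
    return ds.batch(batch_size).prefetch(1)
```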
The LSTM layer size is a grid-search hyperparameter. Dropout, a related regularisation technique, helps prevent overfitting by ignoring randomly selected neurons during training, and hence reduces sensitivity to the specific weights of individual neurons.
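For context, if dropout were used it could be passed directly to a Keras LSTM layer; this snippet is illustrative only, since (as discussed below) dropout is not used in this model, and the unit count of 64 is an assumption:

```python
# Illustrative only: an LSTM layer with dropout, not used in this model.
# dropout applies to the layer inputs; recurrent_dropout to the recurrent state.
layer = tf.keras.layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2)
```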
A Lambda layer allows us to perform arbitrary operations within the model definition itself, effectively expanding the functionality of TensorFlow's Keras. The first Lambda layer will be used to help us with dimensionality.
If you recall, when we wrote the window dataset helper function, it returned two-dimensional batches of windows on the data, with the first dimension being the batch size and the second the number of timestamps. But an RNN expects three dimensions: batch size, number of timestamps, and series dimensionality. With a Lambda layer, we can fix this without rewriting the helper function: we simply expand the array by one dimension. Similarly, scaling up the outputs, by 400 for example, can help training. The default activation function in the RNN layers is tanh, the hyperbolic tangent, which outputs values between -1 and 1. Since the time-series values are of a much larger order of magnitude, scaling the outputs up to the same ballpark can help with learning. We can do that in a Lambda layer too, simply multiplying by a multiple of 100 chosen to suit the dataset. This scale factor interacts with the learning rate as well as with training of the time-series model.
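Putting the two Lambda layers together, the model definition looks roughly like this; it is a sketch, with the LSTM sizes assumed and the scale factor of 400 following the discussion above:

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    # Expand (batch, timestamps) -> (batch, timestamps, 1) for the RNN
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1),
                           input_shape=[None]),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),
    # Scale the tanh-range output up to the ballpark of the series values
    tf.keras.layers.Lambda(lambda x: x * 400.0),
])
```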
Here again, in this lab the optimizer used is SGD, a variant of the gradient descent algorithm used for optimizing machine learning models. It addresses the computational inefficiency of traditional gradient descent on large datasets by updating the weights from small batches rather than the full dataset.
A momentum of 0.9 is used with the SGD optimizer.
If the momentum hyperparameter is set too close to 1 (e.g., 0.99999) when using an SGD optimizer, then the algorithm will likely pick up a lot of speed, hopefully moving roughly toward the global minimum, but its momentum will carry it right past the minimum.
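As a sketch, compiling the model with this optimizer might look as follows; the learning rate and the Huber loss are assumptions commonly used in this kind of sunspot lab, not values confirmed by the text above:

```python
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),  # assumed loss; robust to outliers
              optimizer=optimizer,
              metrics=["mae"])
```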
Why is dropout not used in this model? The window batches are already flattened and shuffled in the window dataset function, and for the goal of this model, predicting sunspots, an additional dropout layer is not required: the data has already been divided into batches of batch_size by the window dataset function, and the last Lambda layer, multiplying by a multiple of 100, applies a uniform scaling.
Regards
DP