How many dense layers for a time series LSTM model

Hey,

I have been working on building a time series LSTM model to predict energy consumption of a building. My data set consists of date, energy consumption and temperature.

In the DeepLearning.AI TensorFlow course, Laurence Moroney uses either a single dense layer with as many neurons as there are predicted outputs, or two dense layers where the first has more than one neuron (see the architecture below) for his time series LSTM models.

e.g.:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, input_shape=[window_size], activation='relu'),
    tf.keras.layers.Dense(150, activation='relu'),
    tf.keras.layers.Dense(1)
])

Could anyone kindly enlighten me on why two dense layers are better than just one? And how do you choose how many neurons to put in the first dense layer?

For reference, here is the time series LSTM model I have built:

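from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

lstm_model = Sequential()

# Input shape is (timesteps, features), taken from the training windows
lstm_model.add(LSTM(128, activation="relu", return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
lstm_model.add(Dropout(0.1))

lstm_model.add(LSTM(128, activation="relu", return_sequences=True))
lstm_model.add(Dropout(0.1))

# The last LSTM returns only its final state, so the Dense layers see one vector per window
lstm_model.add(LSTM(40, activation="relu", return_sequences=False))
lstm_model.add(Dropout(0.1))

# Output layer (predicting master_consumption)
lstm_model.add(Dense(10, activation='relu'))
lstm_model.add(Dense(1))

lstm_model.summary()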
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 lstm (LSTM)                 (None, 48, 128)           69120
 dropout (Dropout)           (None, 48, 128)           0
 lstm_1 (LSTM)               (None, 48, 128)           131584
 dropout_1 (Dropout)         (None, 48, 128)           0
 lstm_2 (LSTM)               (None, 40)                27040
 dropout_2 (Dropout)         (None, 40)                0
 dense (Dense)               (None, 10)                410
 dense_1 (Dense)             (None, 1)                 11
=================================================================
Total params: 228165 (891.27 KB)
Trainable params: 228165 (891.27 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
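
As a side note, the parameter counts in the summary can be reproduced by hand, which shows how little the dense head contributes compared with the LSTM stack. The sketch below uses the standard formulas for Keras LSTM and Dense layers with biases. The helper names are made up for illustration, and the value of 6 input features is an assumption inferred from the first layer's 69120 parameters, not something stated in the post.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * units * (input_dim + units + 1)


def dense_params(units, input_dim):
    # One weight per input plus one bias, for every neuron
    return units * (input_dim + 1)


n_features = 6  # assumption: inferred from the 69120 parameters of the first LSTM

counts = {
    "lstm": lstm_params(128, n_features),
    "lstm_1": lstm_params(128, 128),
    "lstm_2": lstm_params(40, 128),
    "dense": dense_params(10, 40),
    "dense_1": dense_params(1, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:8s} {n:7d}")
print("total   ", total)  # 228165, matching the summary
```

The two dense layers account for 421 of the 228165 parameters, so changing the size of the first dense layer barely moves the total model size here.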

Thanks for your help! :)

The more layers (and neurons) you have, the more complex the relationships your model can fit, because there are more paths and parameters. There is no particular rule for exactly how many neurons to choose; it's trial and error, so go with whichever gives the best results on your validation data.
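
One way to do that trial and error is to train the same model with a few different dense heads and compare validation loss. This is only a minimal sketch, not a recommended setup: the data is random synthetic data standing in for the real windows, and the window length of 48, the 6 features, the small LSTM, the candidate sizes (0, 10, 50) and the 3 epochs are all assumptions chosen so it runs quickly. `build_model` is a hypothetical helper.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data (assumption): 200 windows of 48 steps with 6 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 48, 6)).astype("float32")
y = (0.5 * X[:, -1, :1] + rng.normal(scale=0.1, size=(200, 1))).astype("float32")

X_train, X_val = X[:160], X[160:]
y_train, y_val = y[:160], y[160:]


def build_model(dense_units):
    """LSTM encoder, an optional hidden Dense layer, and a 1-neuron output."""
    layers = [tf.keras.Input(shape=X.shape[1:]), tf.keras.layers.LSTM(16)]
    if dense_units:  # 0 means "no hidden dense layer"
        layers.append(tf.keras.layers.Dense(dense_units, activation="relu"))
    layers.append(tf.keras.layers.Dense(1))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse")
    return model


results = {}
for dense_units in (0, 10, 50):
    tf.keras.utils.set_random_seed(0)  # same initialisation seed for each candidate
    model = build_model(dense_units)
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=3, batch_size=32, verbose=0,
    )
    # Keep the best validation loss seen during training
    results[dense_units] = min(history.history["val_loss"])

best = min(results, key=results.get)
print(results, "-> best hidden size:", best)
```

On real data you would use your own train/validation split and train for longer, ideally with early stopping, and repeat each configuration a few times, since differences between head sizes can be smaller than the run-to-run noise.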
