This had me baffled for a while as well. Looking at the results of model.summary() gave some helpful clues. The issue was with the second LSTM including return_sequences=True.
First, I changed the input_shape from [None, 1] to [window_size, 1]. Conv1D takes an input_shape of [n_timesteps, n_features]. While it can take an input_shape of [None, n_features] to accept variable-length sequences, each with n_features, we are working with fixed-length sequences, so why not use the explicit input size?
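To see the difference, here is a minimal sketch (not from the original assignment; model_var is just an illustrative name) of the same Conv1D declared with a variable-length input. The time dimension shows up as None in the summary, which is what the explicit window_size replaces:

import tensorflow as tf

# Same Conv1D layer, declared to accept sequences of any length.
model_var = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=5, strides=1,
                           padding="causal", activation="relu",
                           input_shape=[None, 1]),
])
model_var.summary()  # conv1d output shape: (None, None, 64)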
Check out the model summary of the original model (with the change to the input_shape):
import tensorflow as tf

window_size = 64
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=5, strides=1,
                           padding="causal", activation="relu",
                           input_shape=[window_size, 1]),
    tf.keras.layers.LSTM(60, return_sequences=True),
    tf.keras.layers.LSTM(60, return_sequences=True),  # second LSTM still returns the full sequence
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 400)  # scale the output by 400
])
model.summary()
Output:
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_4 (Conv1D) (None, 64, 64) 384
lstm_8 (LSTM) (None, 64, 60) 30000
lstm_9 (LSTM) (None, 64, 60) 29040
dense_12 (Dense) (None, 64, 30) 1830
dense_13 (Dense) (None, 64, 10) 310
dense_14 (Dense) (None, 64, 1) 11
lambda_4 (Lambda) (None, 64, 1) 0
=================================================================
Total params: 61,575
Trainable params: 61,575
Non-trainable params: 0
_________________________________________________________________
You can see that the final output is shaped [batch_size, 64, 1], i.e. one prediction per time step rather than one per window. Changing the second LSTM to return_sequences=False changes the final output shape.
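As a quick illustration (a sketch with dummy shapes, not from the original assignment), you can see what return_sequences controls by applying the two variants of the layer directly:

import tensorflow as tf

x = tf.random.normal([8, 64, 1])  # [batch, timesteps, features]

# return_sequences=True: one output vector per time step
print(tf.keras.layers.LSTM(60, return_sequences=True)(x).shape)   # (8, 64, 60)

# return_sequences=False: only the final hidden state
print(tf.keras.layers.LSTM(60, return_sequences=False)(x).shape)  # (8, 60)

Here is the corrected model: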
import tensorflow as tf

window_size = 64
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=5, strides=1,
                           padding="causal", activation="relu",
                           input_shape=[window_size, 1]),
    tf.keras.layers.LSTM(60, return_sequences=True),
    tf.keras.layers.LSTM(60, return_sequences=False),  # return only the final time step
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 400)  # scale the output by 400
])
model.summary()
Output:
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_5 (Conv1D) (None, 64, 64) 384
lstm_10 (LSTM) (None, 64, 60) 30000
lstm_11 (LSTM) (None, 60) 29040
dense_15 (Dense) (None, 30) 1830
dense_16 (Dense) (None, 10) 310
dense_17 (Dense) (None, 1) 11
lambda_5 (Lambda) (None, 1) 0
=================================================================
Total params: 61,575
Trainable params: 61,575
Non-trainable params: 0
_________________________________________________________________
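As a quick sanity check (a sketch, assuming the corrected model and window_size above are in scope), feeding a dummy batch confirms there is now a single forecast per window:

batch = tf.random.normal([32, window_size, 1])
print(model(batch).shape)  # (32, 1) -- one prediction per window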
The two models have the same number of weights, so training time is similar, but the per-epoch training loss and error metrics are very different: with return_sequences=True the model emits a prediction at every one of the 64 time steps, so the loss is computed over a whole sequence of outputs rather than the single forecast per window that you actually want.
It’s been a long time since this thread started, but I hope others find this useful.
cheers,
Dennis