This had me baffled for a while as well. Looking at the results of `model.summary()` gave some helpful clues. The issue was with the second `LSTM` including `return_sequences=True`.
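
For anyone who hasn't seen the difference, here is a minimal sketch of what `return_sequences` controls (the tensor is made up for illustration; shapes assume a standard TF 2.x install):

```
import tensorflow as tf

x = tf.random.normal([8, 64, 1])  # [batch, timesteps, features]

# return_sequences=True emits the hidden state at every timestep...
seq = tf.keras.layers.LSTM(60, return_sequences=True)(x)
print(seq.shape)   # (8, 64, 60)

# ...while return_sequences=False emits only the final hidden state.
last = tf.keras.layers.LSTM(60, return_sequences=False)(x)
print(last.shape)  # (8, 60)
```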

First, I changed the `input_shape` from `[None, 1]` to `[window_size, 1]`. `Conv1D` takes an `input_shape` of `[n_timesteps, n_features]`. While it can take an `input_shape` of `[None, n_features]` to accept variable-length sequences, each with `n_features`, we are working with fixed-length sequences, so why not use the explicit input size?
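
As a quick illustration (a throwaway snippet, not part of the original model), the same `Conv1D` layer handles either case, because its weights depend only on `n_features`:

```
import tensorflow as tf

conv = tf.keras.layers.Conv1D(filters=64, kernel_size=5, padding="causal")

# "causal" padding preserves the timestep dimension, whatever it is,
# so the layer accepts any sequence length with 1 feature:
print(conv(tf.random.normal([2, 100, 1])).shape)  # (2, 100, 64)
print(conv(tf.random.normal([2, 64, 1])).shape)   # (2, 64, 64)
```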

Check out the model summary of the original model (with the change to the `input_shape`):

```
import tensorflow as tf

window_size = 64
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=5, strides=1, padding="causal",
                           activation="relu", input_shape=[window_size, 1]),
    tf.keras.layers.LSTM(60, return_sequences=True),
    tf.keras.layers.LSTM(60, return_sequences=True),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 400)  # scale outputs up by a constant factor
])
model.summary()
```

Output:

```
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d_4 (Conv1D)            (None, 64, 64)            384
lstm_8 (LSTM)                (None, 64, 60)            30000
lstm_9 (LSTM)                (None, 64, 60)            29040
dense_12 (Dense)             (None, 64, 30)            1830
dense_13 (Dense)             (None, 64, 10)            310
dense_14 (Dense)             (None, 64, 1)             11
lambda_4 (Lambda)            (None, 64, 1)             0
=================================================================
Total params: 61,575
Trainable params: 61,575
Non-trainable params: 0
_________________________________________________________________
```

You can see that the output is shaped `[batch_size, 64, 1]`. Changing the second `LSTM` to `return_sequences=False` changes the final output shape:

```
window_size = 64
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=5, strides=1, padding="causal",
                           activation="relu", input_shape=[window_size, 1]),
    tf.keras.layers.LSTM(60, return_sequences=True),
    tf.keras.layers.LSTM(60, return_sequences=False),  # only the last timestep's output
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 400)
])
model.summary()
```

Output:

```
Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d_5 (Conv1D)            (None, 64, 64)            384
lstm_10 (LSTM)               (None, 64, 60)            30000
lstm_11 (LSTM)               (None, 60)                29040
dense_15 (Dense)             (None, 30)                1830
dense_16 (Dense)             (None, 10)                310
dense_17 (Dense)             (None, 1)                 11
lambda_5 (Lambda)            (None, 1)                 0
=================================================================
Total params: 61,575
Trainable params: 61,575
Non-trainable params: 0
_________________________________________________________________
```

The two models have the same number of weights, so training time is similar, but the training loss and error metrics per epoch are very different.
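
If you want to convince yourself the counts really are identical, you can reproduce them by hand; none of these layers' weight shapes depend on the timestep dimension:

```
# Per-layer parameter counts (timesteps never appear in any formula):
conv1d = (5 * 1 + 1) * 64        # (kernel * in_channels + bias) * filters = 384
lstm_1 = 4 * (64 + 60 + 1) * 60  # 4 gates * (input + recurrent + bias) * units = 30000
lstm_2 = 4 * (60 + 60 + 1) * 60  # = 29040
dense  = (60*30 + 30) + (30*10 + 10) + (10*1 + 1)  # = 1830 + 310 + 11
print(conv1d + lstm_1 + lstm_2 + dense)  # 61575, matching both summaries
```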

It’s been a long time since this thread started but I hope others find this useful.

cheers,

Dennis