Hello,
I built two time-series models with different architectures to predict an arbitrary integer sequence. The integer sequence was one-hot encoded.
The first model looks like this:

Model: "sequential_1"
 Layer (type)                                 Output Shape        Param #
=============================================================================
 lstm_2 (LSTM)                                (None, 1, 50)       11000
 bidirectional_1 (Bidirectional)              (None, 1, 2000)     8408000
 lstm_4 (LSTM)                                (None, 1000)        12004000
 batch_normalization_1 (BatchNormalization)   (None, 1000)        4000
 dense_2 (Dense)                              (None, 100)         100100
 dense_3 (Dense)                              (None, 4)           404
=============================================================================
Total params: 20,527,504
Trainable params: 20,525,504
Non-trainable params: 2,000
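The per-layer parameter counts are consistent with a one-hot input of width 4 (that width is my inference from the `lstm_2` count and the final `Dense(4)` layer, not something stated explicitly above). A quick arithmetic check using the standard LSTM parameter formula:

```python
# Sanity-check of the summary above. LSTM parameter count follows
# 4 * units * (input_dim + units + 1); a Bidirectional wrapper doubles it
# and concatenates the two directions' outputs.
def lstm_params(units, input_dim):
    return 4 * units * (input_dim + units + 1)

def dense_params(units, input_dim):
    return input_dim * units + units

assert lstm_params(50, 4) == 11_000            # lstm_2 (assumed input width 4)
assert 2 * lstm_params(1000, 50) == 8_408_000  # bidirectional_1 (output width 2000)
assert lstm_params(1000, 2000) == 12_004_000   # lstm_4
assert dense_params(100, 1000) == 100_100      # dense_2
assert dense_params(4, 100) == 404             # dense_3
# 4 * 1000 BatchNormalization params (gamma, beta, moving mean, moving variance)
total = (lstm_params(50, 4) + 2 * lstm_params(1000, 50)
         + lstm_params(1000, 2000) + 4_000
         + dense_params(100, 1000) + dense_params(4, 100))
assert total == 20_527_504
```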
The second, more complicated model looks like this:

Model: "sequential"
 Layer (type)                                 Output Shape        Param #
=============================================================================
 lstm (LSTM)                                  (None, 1, 50)       11000
 bidirectional (Bidirectional)                (None, 1, 2000)     8408000
 bidirectional_1 (Bidirectional)              (None, 1, 2000)     24008000
 bidirectional_2 (Bidirectional)              (None, 1, 2000)     24008000
 bidirectional_3 (Bidirectional)              (None, 1, 2000)     24008000
 lstm_5 (LSTM)                                (None, 1000)        12004000
 batch_normalization (BatchNormalization)     (None, 1000)        4000
 dense (Dense)                                (None, 100)         100100
 dense_1 (Dense)                              (None, 4)           404
=============================================================================
Total params: 92,551,504
Trainable params: 92,549,504
Non-trainable params: 2,000
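The same formula accounts for the difference between the two totals: the second model is the first one plus three extra `Bidirectional(LSTM(1000))` blocks, each sitting on the 2000-wide concatenated output of the previous layer.

```python
# LSTM parameter count: 4 * units * (input_dim + units + 1);
# Bidirectional doubles it.
def lstm_params(units, input_dim):
    return 4 * units * (input_dim + units + 1)

# Each extra bidirectional block takes the previous block's 2000-wide output:
assert 2 * lstm_params(1000, 2000) == 24_008_000

# The three extra blocks are exactly the gap between the two model totals:
assert 92_551_504 - 20_527_504 == 3 * 24_008_000
```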
Training both models for 20,000 epochs ended with nearly identical loss figures:
(first)
loss: 1.3517 - accuracy: 0.3077 - val_loss: 1.5252 - val_accuracy: 0.5000 - lr: 8.1873e-15
(second)
loss: 1.3517 - accuracy: 0.2308 - val_loss: 1.5252 - val_accuracy: 0.0000e+00 - lr: 6.7032e-15
I have already incorporated the LearningRateScheduler and ReduceLROnPlateau callbacks.
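To illustrate the learning-rate figures shown above (the exact schedules I used are more involved; the decay factor below is purely illustrative): repeated multiplicative decay over 20k epochs drives the learning rate toward zero, which is consistent with the ~1e-14 values both runs ended at.

```python
# Illustrative only: a modest per-epoch multiplicative decay, compounded
# over 20k epochs, collapses the learning rate by many orders of magnitude.
lr = 1e-3
for epoch in range(20_000):
    lr *= 0.999  # e.g. a scheduler shaving ~0.1% off per epoch

# lr has fallen from 1e-3 to the ~1e-12 range; weight updates this small
# leave the loss and accuracy numerically frozen.
assert lr < 1e-11
```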
My questions are:
- Why did the accuracies never change after so many training epochs?
- What can I do to raise the accuracy?
- Why did the losses of both architectures end up at the same figure?
- I one-hot encoded the input sequence to ensure the prediction would produce an integer. Is one-hot encoding the right approach at all?
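For reference, the one-hot step I describe in the last question works roughly like this (a minimal sketch assuming 4 integer classes, matching the Dense(4) output layer; my actual preprocessing code is not shown here):

```python
# One-hot encode a sequence of integers in [0, num_classes) and decode it
# back via argmax; the round trip is lossless.
NUM_CLASSES = 4

def one_hot(seq, num_classes=NUM_CLASSES):
    return [[1.0 if i == v else 0.0 for i in range(num_classes)] for v in seq]

def decode(rows):
    # argmax over each row recovers the original integer
    return [max(range(len(r)), key=r.__getitem__) for r in rows]

seq = [0, 2, 1, 3, 2]
encoded = one_hot(seq)            # list of 4-wide rows, one per integer
assert decode(encoded) == seq     # round trip recovers the sequence
```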
Thanks in advance.