Building model for categorical time series

Hello,

I built two time-series models with different architectures, trying to predict an arbitrary integer sequence. The integer sequence was one-hot encoded.

The first model looks like this:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                                Output Shape       Param #
=================================================================
lstm_2 (LSTM)                               (None, 1, 50)      11000
bidirectional_1 (Bidirectional)             (None, 1, 2000)    8408000
lstm_4 (LSTM)                               (None, 1000)       12004000
batch_normalization_1 (BatchNormalization)  (None, 1000)       4000
dense_2 (Dense)                             (None, 100)        100100
dense_3 (Dense)                             (None, 4)          404
=================================================================
Total params: 20,527,504
Trainable params: 20,525,504
Non-trainable params: 2,000

The second, more complicated model looks like this:

Model: "sequential"
_________________________________________________________________
Layer (type)                              Output Shape       Param #
=================================================================
lstm (LSTM)                               (None, 1, 50)      11000
bidirectional (Bidirectional)             (None, 1, 2000)    8408000
bidirectional_1 (Bidirectional)           (None, 1, 2000)    24008000
bidirectional_2 (Bidirectional)           (None, 1, 2000)    24008000
bidirectional_3 (Bidirectional)           (None, 1, 2000)    24008000
lstm_5 (LSTM)                             (None, 1000)       12004000
batch_normalization (BatchNormalization)  (None, 1000)       4000
dense (Dense)                             (None, 100)        100100
dense_1 (Dense)                           (None, 4)          404
=================================================================
Total params: 92,551,504
Trainable params: 92,549,504
Non-trainable params: 2,000

Training both models for 20k epochs ended with similar loss figures:
(first)
loss: 1.3517 - accuracy: 0.3077 - val_loss: 1.5252 - val_accuracy: 0.5000 - lr: 8.1873e-15

(second)
loss: 1.3517 - accuracy: 0.2308 - val_loss: 1.5252 - val_accuracy: 0.0000e+00 - lr: 6.7032e-15

I have already incorporated the LearningRateScheduler and ReduceLROnPlateau callbacks.
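A quick sketch of why the learning rate can collapse to the 1e-15 scale when the reduction has no floor: each plateau multiplies the rate by a fixed factor, so repeated reductions shrink it geometrically. The starting rate and factor below are hypothetical, not my exact settings:

```python
# Hypothetical values: initial rate 1e-3, ReduceLROnPlateau-style factor 0.5,
# and a reduction applied every time the loss plateaus.
initial_lr = 1e-3
factor = 0.5

lr = initial_lr
reductions = 0
while lr > 1e-14:      # keep reducing while the rate is above ~1e-14
    lr *= factor
    reductions += 1

# A few dozen reductions are enough to collapse the rate to ~1e-15,
# at which point the weight updates are effectively zero.
print(reductions, lr)
```

With Keras, passing a `min_lr` to `ReduceLROnPlateau` (e.g. `min_lr=1e-5`) prevents this collapse.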

My questions are:

  1. Why did the accuracies never change after so many training epochs?
  2. What can I do to raise the accuracy?
  3. Why did both architectures end up with the same loss?
  4. I one-hot encoded the input sequence to ensure the prediction would produce an integer. Is one-hot encoding the right approach at all?

Thanks in advance.

First of all, a Bi-LSTM stack generally should not exceed two layers; deeper stacks overfit very easily. The accuracy and loss problems can be caused by many things; you could try a smaller learning rate or reduce the number of layers. One more point: in general, time-series inputs do not need one-hot encoding.
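Concretely, one way to drop the one-hot input is to feed the raw integers through an Embedding layer and keep the targets as integers with a sparse loss. A rough sketch, assuming 4 distinct integers as in your 4-way output; the window length and layer sizes are placeholders, not a tuned architecture:

```python
import tensorflow as tf

num_classes = 4   # assumption: the sequence uses 4 distinct integers
seq_len = 10      # hypothetical input window length

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    # Each integer becomes a learned 16-dimensional vector; no one-hot input needed.
    tf.keras.layers.Embedding(input_dim=num_classes, output_dim=16),
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

# Targets stay as plain integers 0..3.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```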

Thanks for your reply. I will try other architectures, but why do you think my model is overfitting? The loss practically didn't change after 100 epochs, and the accuracy was hardly moving either.

But my main issue is: how can we improve the situation when the loss stays flat for thousands of epochs? I used a learning-rate scheduler, and the learning rate ended up on the order of 1e-15.

The other point is: if we are trying to predict a categorical sequence without one-hot encoding, what other options do we have?
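For the targets, one option is to keep them as plain integers and use `sparse_categorical_crossentropy`, which computes exactly the same loss as `categorical_crossentropy` on one-hot targets. A quick check in plain Python, with a hypothetical probability vector:

```python
import math

# Predicted probabilities over the 4 classes for one sample (hypothetical).
probs = [0.1, 0.6, 0.2, 0.1]
label = 1                          # integer target
one_hot = [0.0, 1.0, 0.0, 0.0]     # the same target, one-hot encoded

# sparse_categorical_crossentropy: -log p[label]
sparse_ce = -math.log(probs[label])

# categorical_crossentropy: -sum(t * log p)
ce = -sum(t * math.log(p) for t, p in zip(one_hot, probs))

assert math.isclose(sparse_ce, ce)  # identical losses, no one-hot targets needed
```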

The reason I put up the performance of the two models is that, no matter how complex (model 2) or simple (model 1) the model is, the resulting loss is more or less the same. Is that something we can improve by tweaking the model architecture?
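One way to read that shared plateau: with 4 balanced classes, a model that always outputs a uniform distribution scores a cross-entropy of ln(4) ≈ 1.386, close to the 1.3517 both models settle at. That would suggest both models, large and small, are predicting near-uniform probabilities rather than learning the sequence. A quick check, assuming 4 classes as in the summaries:

```python
import math

num_classes = 4
# Cross-entropy of a predictor that always outputs uniform probabilities:
uniform_loss = -math.log(1.0 / num_classes)
print(round(uniform_loss, 4))   # 1.3863, close to the observed plateau of 1.3517
```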

As the loss has stabilised after 10,000 training epochs, can we use the trained model to predict, even though the accuracy is zero?