Building model for categorical time series

Hello,

I built two time-series models with different architectures, trying to predict an arbitrary integer sequence. The integer sequence was one-hot encoded.

The first model looks like this:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                                Output Shape       Param #
=================================================================
lstm_2 (LSTM)                               (None, 1, 50)      11000
bidirectional_1 (Bidirectional)             (None, 1, 2000)    8408000
lstm_4 (LSTM)                               (None, 1000)       12004000
batch_normalization_1 (BatchNormalization)  (None, 1000)       4000
dense_2 (Dense)                             (None, 100)        100100
dense_3 (Dense)                             (None, 4)          404
=================================================================
Total params: 20,527,504
Trainable params: 20,525,504
Non-trainable params: 2,000

The second, more complicated model looks like this:

Model: "sequential"
_________________________________________________________________
Layer (type)                              Output Shape       Param #
=================================================================
lstm (LSTM)                               (None, 1, 50)      11000
bidirectional (Bidirectional)             (None, 1, 2000)    8408000
bidirectional_1 (Bidirectional)           (None, 1, 2000)    24008000
bidirectional_2 (Bidirectional)           (None, 1, 2000)    24008000
bidirectional_3 (Bidirectional)           (None, 1, 2000)    24008000
lstm_5 (LSTM)                             (None, 1000)       12004000
batch_normalization (BatchNormalization)  (None, 1000)       4000
dense (Dense)                             (None, 100)        100100
dense_1 (Dense)                           (None, 4)          404
=================================================================
Total params: 92,551,504
Trainable params: 92,549,504
Non-trainable params: 2,000

Training both models for 20k epochs ended with similar loss figures:
(first)
loss: 1.3517 - accuracy: 0.3077 - val_loss: 1.5252 - val_accuracy: 0.5000 - lr: 8.1873e-15

(second)
loss: 1.3517 - accuracy: 0.2308 - val_loss: 1.5252 - val_accuracy: 0.0000e+00 - lr: 6.7032e-15

I have already incorporated the LearningRateScheduler and ReduceLROnPlateau callbacks.
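A quick sketch of why the learning rate can collapse to the 1e-15 scale when the reduction has no floor: each plateau multiplies the rate by a fixed factor, so repeated reductions shrink it geometrically. The starting rate and factor below are hypothetical, not my exact settings:

```python
# Hypothetical values: initial rate 1e-3, ReduceLROnPlateau-style factor 0.5,
# and a reduction applied every time the loss plateaus.
initial_lr = 1e-3
factor = 0.5

lr = initial_lr
reductions = 0
while lr > 1e-14:      # keep reducing while the rate is above ~1e-14
    lr *= factor
    reductions += 1

# A few dozen reductions are enough to collapse the rate to ~1e-15,
# at which point the weight updates are effectively zero.
print(reductions, lr)
```

With Keras, passing a `min_lr` to `ReduceLROnPlateau` (e.g. `min_lr=1e-5`) prevents this collapse.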

My questions are:

  1. Why did the accuracies never change after so many training epochs?
  2. What can I do to raise the accuracy?
  3. Why did both architectures end up with the same loss?
  4. I one-hot encoded the input sequence to ensure the prediction would produce an integer. Is one-hot encoding the right approach at all?

Thanks in advance.

First of all, a Bi-LSTM stack generally should not exceed two layers; deeper stacks overfit very easily. The accuracy and loss problems can be caused by many things; you could try a smaller learning rate or reduce the number of layers. One more point: in general, time-series inputs do not need one-hot encoding.
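Concretely, one way to drop the one-hot input is to feed the raw integers through an Embedding layer and keep the targets as integers with a sparse loss. A rough sketch, assuming 4 distinct integers as in your 4-way output; the window length and layer sizes are placeholders, not a tuned architecture:

```python
import tensorflow as tf

num_classes = 4   # assumption: the sequence uses 4 distinct integers
seq_len = 10      # hypothetical input window length

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    # Each integer becomes a learned 16-dimensional vector; no one-hot input needed.
    tf.keras.layers.Embedding(input_dim=num_classes, output_dim=16),
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

# Targets stay as plain integers 0..3.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```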

Thanks for your reply. I will try other architectures, but why do you think my model is overfitting? The loss practically didn't change after 100 epochs, and the accuracy was hardly moving either.

But my main issue is: how can we improve the situation when the loss stays flat for thousands of epochs? I used a learning-rate scheduler, and the learning rate ended up on the order of 1e-15.

The other point is: if we are trying to predict a categorical sequence without one-hot encoding, what other options do we have?
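For the targets, one option is to keep them as plain integers and use `sparse_categorical_crossentropy`, which computes exactly the same loss as `categorical_crossentropy` on one-hot targets. A quick check in plain Python, with a hypothetical probability vector:

```python
import math

# Predicted probabilities over the 4 classes for one sample (hypothetical).
probs = [0.1, 0.6, 0.2, 0.1]
label = 1                          # integer target
one_hot = [0.0, 1.0, 0.0, 0.0]     # the same target, one-hot encoded

# sparse_categorical_crossentropy: -log p[label]
sparse_ce = -math.log(probs[label])

# categorical_crossentropy: -sum(t * log p)
ce = -sum(t * math.log(p) for t, p in zip(one_hot, probs))

assert math.isclose(sparse_ce, ce)  # identical losses, no one-hot targets needed
```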

The reason I put up the performance of the two models is that, no matter how complex (model 2) or simple (model 1) the model is, the resulting loss is more or less the same. Is that something we can improve by tweaking the model architecture?
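One way to read that shared plateau: with 4 balanced classes, a model that always outputs a uniform distribution scores a cross-entropy of ln(4) ≈ 1.386, close to the 1.3517 both models settle at. That would suggest both models, large and small, are predicting near-uniform probabilities rather than learning the sequence. A quick check, assuming 4 classes as in the summaries:

```python
import math

num_classes = 4
# Cross-entropy of a predictor that always outputs uniform probabilities:
uniform_loss = -math.log(1.0 / num_classes)
print(round(uniform_loss, 4))   # 1.3863, close to the observed plateau of 1.3517
```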

As the loss has stabilised after 10,000 training epochs, can we use the trained model to predict, even though the accuracy is zero?