My LSTM model isn't improving no matter what I try (it either overfits or doesn't improve at all)

I am currently doing a lab for a university course.

There are 4 choices, and each choice has 4 associated properties (circles), arranged in a 4x4 grid: each column is a choice, and each row is a different type of property. The properties are win amount, win probability, loss amount, and delay.

Participants came in and used their eyes (gaze) to explore the grid and pick a choice. Initially, all the circles are face down (each circle has the property's value on one side and is blank on the other), and a participant looks at a circle to turn it over. Only one circle may be turned over at a time, so only one value can be revealed at a time. Participants can take as much time as they want and can look at any of the property values for any of the choices. At the end, they pick a choice based on what they have seen.

My data (X) essentially consists of the sequences of tiles each participant looked at, and (y) is the label of the choice they picked.
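To illustrate the shape of the data, here is a rough sketch of how one such sample could be encoded (the fixation values here are made up, and the one-hot-over-16-tiles scheme is just one possible encoding, not necessarily exactly what my pipeline does):

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical fixation sequence: each look is (choice_column, property_row)
# on the 4x4 grid, flattened to a tile index in 0..15.
fixations = [(0, 1), (2, 1), (2, 0), (3, 3)]
tile_ids = [col * 4 + row for col, row in fixations]

# One-hot encode each fixation so one sample has shape (timesteps, 16),
# then pad all samples to a common length for the LSTM (maxlen is arbitrary here).
one_hot = np.eye(16)[tile_ids]                            # (4, 16)
X = pad_sequences([one_hot], maxlen=30, dtype='float32')  # (1, 30, 16)
y = np.array([2])  # the choice the participant finally picked (0..3)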

I am trying to predict which choice they pick, but my model is overfitting. I have changed some things before, but then it just doesn't fit at all (accuracy stays at 25%, which is no better than chance given there are 4 options). Is my data just not possible to fit with an ML model? I have quite a bit of data.

Here is my current model:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Normalize the data (scale the test set with the training max so both
# sets are on the same scale and no test statistics leak in)
x_max = np.max(X_train)
X_train = X_train / x_max
X_test = X_test / x_max

# Compute class weights
class_weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weights = {i: class_weights[i] for i in range(len(class_weights))}

input_layer = Input(shape=(X_train.shape[1], X_train.shape[2]))
lstm = LSTM(128, return_sequences=True)(input_layer)
dropout1 = Dropout(0.4)(lstm)
lstm2 = LSTM(64, return_sequences=False)(dropout1)
dropout2 = Dropout(0.4)(lstm2)
dense1 = Dense(64, activation='relu')(dropout2)
dropout3 = Dropout(0.4)(dense1)
output_layer = Dense(4, activation='softmax')(dropout3)

model = Model(inputs=input_layer, outputs=output_layer)
optimizer = Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)

model.summary()

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_data=(X_test, y_test),
    class_weight=class_weights,
    callbacks=[early_stopping, reduce_lr]
)
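One quick way to tell which failure mode is happening is to plot the training history: overfitting shows up as the train and validation curves diverging, while "not learning" shows both staying flat around chance level (25% here). A sketch:

import matplotlib.pyplot as plt

# Compare train vs. validation accuracy across epochs.
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()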


Have you tried using a much simpler model to begin with?
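For instance, a baseline that throws away the sequence order entirely is a useful sanity check; if the LSTM can't beat it, the order of fixations may not be adding much signal. A rough sketch, reusing the shapes from your post:

from tensorflow.keras.layers import Input, GlobalAveragePooling1D, Dense
from tensorflow.keras.models import Model

# Order-free baseline: average over timesteps, then classify.
inp = Input(shape=(X_train.shape[1], X_train.shape[2]))
pooled = GlobalAveragePooling1D()(inp)
out = Dense(4, activation='softmax')(pooled)

baseline = Model(inp, out)
baseline.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])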

Have you attended Course 5 of the Deep Learning Specialization?

Yes, I agree this is not the easiest problem to start with, but the course is highly recommended. They also go into a lot more detail on RNNs in Course 3 of the NLP Specialization.

Also, just from a quick glance: you have only worked out a train and test set (i.e. no validation set). That's okay if you want to run rough and fast with it, but then you have nothing to tune on.

But in your model you are using your test set for validation, and that's not how it's supposed to work:

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_data=(X_test, y_test),
    class_weight=class_weights,
    callbacks=[early_stopping, reduce_lr]
)
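A simple fix is to carve a validation split out of the training data and keep the test set for a single final evaluation, e.g. something like this (a sketch):

from sklearn.model_selection import train_test_split

# Hold out 20% of the training data for tuning; X_test/y_test are then
# only touched once, at the very end.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42)

history = model.fit(
    X_tr, y_tr,
    epochs=50,
    batch_size=32,
    validation_data=(X_val, y_val),
    class_weight=class_weights,
    callbacks=[early_stopping, reduce_lr]
)

test_loss, test_acc = model.evaluate(X_test, y_test)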

It could be a whole host of other issues too, including the optimizer choice, and your learning rate is already pretty low.

A lot, in the end, really depends on the data you have and the actual problem you are trying to solve (i.e. does an LSTM even make sense for this problem and data?).

DLS is good for learning how to frame all those problems.