Lip Reading Neural Network - Same Training and Validation Accuracy

I am currently working on a project focused on lip reading using neural networks and have encountered a peculiar issue that has left me scratching my head. Despite my best efforts, I am consistently getting the same training and validation accuracy. I believe there might be something wrong with my model architecture or training process.

I have designed a neural network for lip reading, and the architecture is as follows:

import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import Adam

input_shape = (22, 80, 112, 3)  # (frames, height, width, channels)
model = tf.keras.models.Sequential()
# Block 1: (22, 80, 112, 3) -> (10, 39, 55, 8)
model.add(tf.keras.layers.Conv3D(8, (3, 3, 3), activation='relu', input_shape=input_shape, kernel_regularizer=regularizers.l2(0.001)))
model.add(tf.keras.layers.MaxPooling3D((2, 2, 2)))
# Block 2: (10, 39, 55, 8) -> (4, 18, 26, 32)
model.add(tf.keras.layers.Conv3D(32, (3, 3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(tf.keras.layers.MaxPooling3D((2, 2, 2)))
# 4 * 18 * 26 * 32 = 59904 = 16 * 3744
model.add(tf.keras.layers.Reshape((16, 3744), input_shape=input_shape))
# Recurrent layers over the 16 reshaped steps
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True)))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)))
model.add(tf.keras.layers.Flatten())
# Classifier head
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(50, activation='softmax'))

model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

The issue I am facing is that training and validation accuracy both plateau at the same level after the 6th epoch, which suggests a problem somewhere. I have tried tweaking hyperparameters, adjusting layers, and even modifying the architecture, but the problem persists. Both training and validation accuracy stay low, at 30%.

Questions:

  1. Is there anything wrong with my model architecture?
  2. Are there specific hyperparameters that I should focus on adjusting for better convergence?

I appreciate your time and expertise in advance. Looking forward to your responses!

That’s a very complicated model.
How many output categories do you have (I’m guessing it’s 50)?
Is that also how many labels are in your training data?
How much data do you have for training and validation?

Yes, I do have 50 unique labels in the entire dataset. The dataset has 2323 examples: I am using 1858 for training and 465 for testing.

That’s not a lot of examples of each label, considering how many weights you have to learn.
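
To put rough numbers on that, here is a small sketch that counts the trainable weights of the posted architecture by hand and compares them with the 1858 training examples. It assumes the standard Keras parameter formulas (bias on every layer, four gates per LSTM) and the layer sizes from the question; the helper names are made up for this illustration.

```python
def conv3d_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv3D layer."""
    kd, kh, kw = kernel
    return kd * kh * kw * in_channels * filters + filters


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights plus biases of a Dense layer."""
    return input_dim * units + units


# Layer sizes taken from the model in the question.
# A Bidirectional wrapper holds two LSTMs and concatenates their outputs.
layers = {
    "conv3d_1": conv3d_params((3, 3, 3), 3, 8),
    "conv3d_2": conv3d_params((3, 3, 3), 8, 32),
    "bilstm_1": 2 * lstm_params(3744, 256),
    "bilstm_2": 2 * lstm_params(2 * 256, 128),
    "dense_1": dense_params(2 * 128, 1024),
    "dense_2": dense_params(1024, 256),
    "dense_3": dense_params(256, 64),
    "output": dense_params(64, 50),
}

total = sum(layers.values())
train_examples, num_labels = 1858, 50

for name, count in layers.items():
    print(f"{name:>10}: {count:>10,}")
print(f"total parameters: {total:,}")
print(f"parameters per training example: {total / train_examples:,.0f}")
print(f"average training examples per label: {train_examples / num_labels:.1f}")
```

Under these assumptions the model has about 9.4 million weights, roughly 8.2 million of them in the first bidirectional LSTM, against an average of about 37 training examples per label.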

I am not able to collect more data, though. Are there any other alternatives to improve my model's accuracy?

You can try augmenting the data set.
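
As a rough idea of what augmentation could look like here, the sketch below perturbs one sample with NumPy. It assumes each sample is a stack of frames shaped (frames, height, width, channels) with pixel values scaled to [0, 1], as the input shape in the question suggests. The function name, the shift range and the brightness range are made up for illustration, and whether a horizontal flip is appropriate for your labels is something you would need to check.

```python
import numpy as np


def augment_clip(clip, rng, max_shift=4, brightness=0.2):
    """Return a randomly perturbed copy of one clip of shape (frames, H, W, C)."""
    _, height, width, _ = clip.shape
    out = clip.astype(np.float32)

    # Horizontal flip, applied to every frame of the clip or to none.
    if rng.random() < 0.5:
        out = out[:, :, ::-1, :]

    # One random translation for the whole clip: pad the edges, then crop back.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(out, pad, mode="edge")
    top, left = max_shift + dy, max_shift + dx
    out = padded[:, top:top + height, left:left + width, :]

    # One brightness factor for the whole clip, clipped back into [0, 1].
    factor = 1.0 + rng.uniform(-brightness, brightness)
    return np.clip(out * factor, 0.0, 1.0)


rng = np.random.default_rng(0)
clip = rng.random((22, 80, 112, 3))   # stand-in for one real training sample
augmented = augment_clip(clip, rng)
print(augmented.shape)
```

The same flip, shift and brightness change are applied to all frames of a sample so the mouth motion across frames stays consistent. Augment only the training split, not the validation data.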

Is your training set made up of video clips, or separate images?

Okay, I will try that. It's made up of images.

You might also try a much simpler model to get a baseline, and then add only as much complexity as you need to improve on it.
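
Something along these lines could serve as a starting point. It is only a sketch: the layer sizes are arbitrary choices, not a recommendation, and it assumes the same input shape and 50 one-hot encoded labels as in the question.

```python
import tensorflow as tf


def build_baseline(input_shape=(22, 80, 112, 3), num_classes=50):
    """Small 3D-CNN baseline: two conv blocks, global pooling, one softmax layer."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv3D(16, (3, 3, 3), activation="relu", padding="same"),
        tf.keras.layers.MaxPooling3D((1, 2, 2)),    # pool space only, keep all frames
        tf.keras.layers.Conv3D(32, (3, 3, 3), activation="relu", padding="same"),
        tf.keras.layers.MaxPooling3D((2, 2, 2)),
        tf.keras.layers.GlobalAveragePooling3D(),   # one 32-value vector per sample
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


model = build_baseline()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```

This version has fewer than 17,000 weights. If it already reaches about 30%, the extra layers in the original model are probably not what is holding accuracy back; if it does worse, add capacity one piece at a time and watch the validation accuracy after each change.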

Start with a larger first Conv3D layer, for example 128 filters, and a larger kernel size. Also consider lowering the dropout rate and reducing the size of the test set.