Training leads to overfitting

I have tried multiple models, yet I still can’t get past the overfitting. Can anyone suggest ways to solve the problem?

My code is:

import tensorflow as tf

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    # Provided hyperparameters (not all are used in this architecture)
    gru_dim = 32
    dense_dim = 32
    filters = 128
    kernel_size = 5
    lstm_dim = 32
    ### START CODE HERE

    model = tf.keras.Sequential([
        # This is how you need to set the Embedding layer when using pre-trained embeddings
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(gru_dim, return_sequences=True)),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(dense_dim, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])

    ### END CODE HERE

    return model
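For reference, this is roughly how I’m training it (a sketch only; train_pad, train_labels, val_pad, and val_labels are placeholder names standing in for the notebook’s padded sequences and label arrays):

model = create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix)

# Placeholder names for the notebook's padded sequences and label arrays
history = model.fit(train_pad, train_labels,
                    epochs=20,
                    validation_data=(val_pad, val_labels))

# Training accuracy keeps climbing while validation loss starts rising
# after a few epochs, which is the overfitting I'm describing.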

Hey Stan!

Welcome to the Discourse!

The first thing I might suggest is going for a simpler model architecture. If you need some ideas on where to get started with that, I would reference the previous labs from this week!
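Just to illustrate what I mean (a minimal sketch under my own assumptions, not the graded solution), a smaller network with a little regularization could look like this:

import tensorflow as tf

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    # Deliberately small: one recurrent layer, one dense layer
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        # dropout here regularizes the recurrent layer's inputs directly
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32, dropout=0.2)),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

Fewer parameters means less capacity to memorize the training set, and the dropout adds some regularization on top.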

I hope that provides some insights, and if you have any further questions - please don’t hesitate to reach out! :smiley:

Hi Chris,

Thank you for the quick response. I believe I have tried the suggested models, yet, while they come close, they never satisfy the gradient requirement.
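By the “gradient requirement” I mean the notebook’s check on the slope of the validation loss curve. As I understand it, it fits a straight line to the per-epoch values, along these lines (a sketch with illustrative numbers, not the actual grader code):

import numpy as np

# Sketch of the check as I understand it; in the notebook, val_loss would
# come from history.history['val_loss'] after training.
val_loss = [0.45, 0.41, 0.40, 0.42, 0.44]  # illustrative values only
slope, _ = np.polyfit(np.arange(len(val_loss)), val_loss, 1)
print(f"val_loss slope: {slope:.5f}")  # needs to stay small for the check to pass, as far as I can tell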

For example, as a starting point I tried:

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    gru_dim = 32
    dense_dim = 40

    model_gru = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(gru_dim)),
        tf.keras.layers.Dense(dense_dim, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model_gru.compile(loss="binary_crossentropy",
                      optimizer="adam",
                      metrics=['accuracy'])

    ### END CODE HERE

    return model_gru

I’m really not sure what I’m missing.

So when I run the provided model, I am able to pass the assessment.

Are you not finding that to be the case, or is your question more general: “How can we further reduce overfitting?”

Sorry for the potential confusion, Stan! :smile:

Thank you so much for looking into it! For some reason I am unable to pass the assignment with that model; for me, the gradient check is incorrect. All the other tests seem to be OK. I am uploading my full notebook. I have spent a lot of time trying and can’t find any obvious errors.
C3W3_Assignment (1).ipynb (75.2 KB)