Training leads to overfitting

I have tried multiple models, yet I still can’t get past the overfitting. Can anyone suggest ways to solve the problem?

My code is:

import tensorflow as tf

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    # Provided hyperparameters (not all are used in this architecture)
    gru_dim = 32
    dense_dim = 32
    filters = 128
    kernel_size = 5
    lstm_dim = 32
    ### START CODE HERE

    model = tf.keras.Sequential([
        # This is how you need to set the Embedding layer when using pre-trained embeddings
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(gru_dim, return_sequences=True)),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(dense_dim, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])

    ### END CODE HERE

    return model
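For reference, this is roughly how I’m training it (a sketch only; train_pad, train_labels, val_pad, and val_labels are placeholder names standing in for the notebook’s padded sequences and label arrays):

model = create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix)

# Placeholder names for the notebook's padded sequences and label arrays
history = model.fit(train_pad, train_labels,
                    epochs=20,
                    validation_data=(val_pad, val_labels))

# Training accuracy keeps climbing while validation loss starts rising
# after a few epochs, which is the overfitting I'm describing.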

Hey Stan!

Welcome to the Discourse!

The first thing I might suggest is going for a simpler model architecture. If you need some ideas on where to get started with that, I would reference the previous labs from this week!
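Just to illustrate what I mean (a minimal sketch under my own assumptions, not the graded solution), a smaller network with a little regularization could look like this:

import tensorflow as tf

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    # Deliberately small: one recurrent layer, one dense layer
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        # dropout here regularizes the recurrent layer's inputs directly
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32, dropout=0.2)),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

Fewer parameters means less capacity to memorize the training set, and the dropout adds some regularization on top.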

I hope that provides some insights, and if you have any further questions - please don’t hesitate to reach out! :smiley:

Hi Chris,

Thank you for the quick response. I believe I have tried the suggested models, yet, while they come close, they never satisfy the gradient requirement.
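By the “gradient requirement” I mean the notebook’s check on the slope of the validation loss curve. As I understand it, it fits a straight line to the per-epoch values, along these lines (a sketch with illustrative numbers, not the actual grader code):

import numpy as np

# Sketch of the check as I understand it; in the notebook, val_loss would
# come from history.history['val_loss'] after training.
val_loss = [0.45, 0.41, 0.40, 0.42, 0.44]  # illustrative values only
slope, _ = np.polyfit(np.arange(len(val_loss)), val_loss, 1)
print(f"val_loss slope: {slope:.5f}")  # needs to stay small for the check to pass, as far as I can tell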

For example, as a starting point I tried:

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    gru_dim = 32
    dense_dim = 40

    model_gru = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(gru_dim)),
        tf.keras.layers.Dense(dense_dim, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model_gru.compile(loss="binary_crossentropy",
                      optimizer="adam",
                      metrics=['accuracy'])

    ### END CODE HERE

    return model_gru

I’m really not sure what I’m missing.

So when I run the provided model, I am able to pass the assessment.

Are you not finding that to be the case, or is your question more general: “How can we further reduce overfitting?”

Sorry for the potential confusion, Stan! :smile:

Thank you so much for looking into it! For some reason I am unable to pass the assignment with that model; for me, the gradient check is incorrect. All the other tests seem to be OK. I am uploading my full notebook. I have spent a lot of time trying and can’t find any obvious errors.
C3W3_Assignment (1).ipynb (75.2 KB)