I have tried multiple models but still can’t get rid of the overfitting. Can anyone suggest ways to solve the problem?
My code is:
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    gru_dim = 32
    dense_dim = 32
    filters = 128
    kernel_size = 5
    lstm_dim = 32
    ### START CODE HERE
    model = tf.keras.Sequential([
        # This is how you need to set the Embedding layer when using pre-trained embeddings
        tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(gru_dim, return_sequences=True)),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(dense_dim, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    ### END CODE HERE
    return model
Hey Stan!
Welcome to the Discourse!
The first thing I might suggest is going for a simpler model architecture. If you need some ideas on where to get started with that, I would refer to the previous labs from the week!
I hope that provides some insights, and if you have any further questions - please don’t hesitate to reach out!
Hi Chris,
Thank you for the quick response. I believe I have tried the suggested models, but while they come close, they never satisfy the gradient requirement.
For example as a starting point I have tried:
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    gru_dim = 32
    dense_dim = 40
    model_gru = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(gru_dim)),
        tf.keras.layers.Dense(dense_dim, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model_gru.compile(loss="binary_crossentropy",
                      optimizer="adam",
                      metrics=['accuracy'])
    ### END CODE HERE
    return model_gru
I'm really not sure what I am missing.
So when I run the provided model, I am able to pass the assessment.
Are you not finding that to be the case - or is your question more generally: “How can we further reduce overfitting”?
Sorry for the potential confusion, Stan!
Thank you so much for looking into it! For some reason I am unable to pass the assignment with this model: for me the gradient is incorrect, while all the other tests seem to be OK. I am uploading my full notebook. I have spent a lot of time trying and can’t find any obvious errors.
C3W3_Assignment (1).ipynb (75.2 KB)
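A note for anyone debugging the same check: assuming the "gradient" test mentioned above refers to the slope of the validation loss curve (the thread does not spell this out), a rough way to inspect it yourself is to fit a straight line to the per-epoch validation loss. The sketch below uses NumPy only. `val_loss_slope` and the two loss lists are hypothetical, made up for illustration, and are not taken from the assignment or its grader.

```python
import numpy as np


def val_loss_slope(val_loss):
    """Return the least-squares slope of validation loss against epoch index.

    A clearly positive slope means validation loss is rising over training,
    which is a common sign of overfitting.
    """
    epochs = np.arange(len(val_loss))
    # Fit a degree-1 polynomial; polyfit returns [slope, intercept].
    slope, _intercept = np.polyfit(epochs, val_loss, deg=1)
    return float(slope)


# Hypothetical validation-loss curves, invented for illustration only.
overfitting = [0.60, 0.55, 0.54, 0.56, 0.60, 0.65]  # falls, then climbs
healthy = [0.60, 0.55, 0.52, 0.50, 0.49, 0.485]     # keeps falling

print("overfitting slope:", val_loss_slope(overfitting))
print("healthy slope:", val_loss_slope(healthy))
```

With Keras, the list to pass in would normally be `history.history['val_loss']`, where `history` is the object returned by `model.fit(...)` when validation data is supplied. What threshold the grader applies to the slope is not stated in this thread, so compare against the value printed in your own notebook.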