I have been working on the Week 3 assignment for about a week now and can’t seem to get a model that doesn’t overfit! I have tried models with only one LSTM layer and one Conv1D layer, which run a little faster, but the validation loss starts increasing by epoch 5 even with dropout!
Currently I am running the model below in the hope that a more complex model will avoid the increasing validation loss, but my epochs take so long that the notebook session ends before training finishes.
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    ### START CODE HERE
    lstm1_dim = round(embedding_dim / 20)
    lstm2_dim = round(lstm1_dim / 2)
    model = tf.keras.Sequential([
        # This is how you need to set the Embedding layer when using pre-trained embeddings
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm1_dim, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm2_dim, return_sequences=True)),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dropout(.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=['accuracy'])
    ### END CODE HERE
    return model
Any thoughts or suggestions on model architecture, or on speeding up training without the validation loss increasing?
Hi,
I believe we use Colab in these courses. Can you try using a GPU instead of a CPU? You can select this from the Colab menu.
Let me know if it helps.
@ewanww
Use a Colab GPU to reduce training time. To select the GPU, follow these steps:
Edit > Notebook Settings > Hardware Accelerator > GPU, then click Save.
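Once the runtime has switched, you can double-check that TensorFlow actually sees the GPU; a minimal sketch, assuming TensorFlow 2.x:

import tensorflow as tf

# Lists the GPUs visible to TensorFlow; an empty list means the
# notebook is still running on the CPU.
print(tf.config.list_physical_devices('GPU'))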
Hi Ewan! It seems you have a very large network. Try adding dropout after each LSTM layer (or simply use the dropout parameter of the layer, as in the sketch below) and see if you get better results. From other exercises, I have also noticed that Bidirectional LSTMs tend to overfit more quickly the longer you train. I’m not sure if you’re allowed to reduce the number of epochs, so I suggest just using plain LSTM layers; those will also train a lot faster per epoch than Bidirectional ones. Another thing you can try is increasing the dropout rate. I recommend continuing to use Coursera Labs for a more seamless submission process. Hope these help!
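To illustrate the two ways of adding dropout, here is a minimal sketch (the vocabulary size, sequence length, layer dims, and rates are placeholder values, not the assignment’s):

import tensorflow as tf

# Option 1: a separate Dropout layer between the LSTMs
model_a = tf.keras.Sequential([
    tf.keras.layers.Embedding(1000, 16, input_length=120),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dropout(.5),  # drops activations flowing to the next layer
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Option 2: the LSTM's built-in dropout argument
# (dropout applies to the layer's inputs; recurrent_dropout would
# apply to the recurrent state instead)
model_b = tf.keras.Sequential([
    tf.keras.layers.Embedding(1000, 16, input_length=120),
    tf.keras.layers.LSTM(32, return_sequences=True, dropout=.5),
    tf.keras.layers.LSTM(16, dropout=.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])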
Hi Chris,
I changed my model to the following:
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    ### START CODE HERE
    lstm1_dim = round(embedding_dim / 20)
    lstm2_dim = round(lstm1_dim / 2)
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen),
        tf.keras.layers.LSTM(5, return_sequences=True, dropout=.5),
        tf.keras.layers.LSTM(lstm2_dim, dropout=.5),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=['accuracy'])
    ### END CODE HERE
    return model
but there doesn’t seem to be any change in the 20-minute epoch time. If I use Conv1D it also takes around twenty minutes per epoch, even with just one or two layers (besides the last two Dense layers)… Maybe there’s something wrong with my computer, or I need to figure out how to use the Colab notebooks?
Sorry for the lag,
Ewan
Hi Ewan! The architectures you used look good, although I recommend adjusting the dimensionalities of the LSTM layers in the first model you presented. Try experimenting with powers of two, starting at 32. That should take about 80 seconds per epoch. The second model you showed takes around 10 seconds per epoch.
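For reference, a minimal sketch of that change (the exact values are a suggestion, not the graded answer):

# powers of two instead of dims derived from embedding_dim
lstm1_dim = 32
lstm2_dim = 16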
Another possible source of the problem is the way the Embedding layer is initialized. The boilerplate code uses the pre-defined embeddings:
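For pre-defined embeddings, the usual Keras pattern is along these lines (treat the exact arguments as an assumption, not the assignment’s verbatim code):

# Seed the Embedding layer with the pre-trained matrix and freeze it
tf.keras.layers.Embedding(vocab_size + 1,
                          embedding_dim,
                          input_length=maxlen,
                          weights=[embeddings_matrix],  # pre-trained vectors
                          trainable=False)              # don't update them during training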
And create_model() should then be called like this in the next ungraded cell:
model = create_model(VOCAB_SIZE, EMBEDDING_DIM, MAXLEN, EMBEDDINGS_MATRIX)
Please check if you are implementing it this way. In case you want to refresh your notebook and start from scratch, you can do so by following the steps here. Hope these help!
I have exactly the same problem. I have been working on the assignment for 4 days and have tried all kinds of layer combinations. I had to use Colab to test it; each epoch in the Coursera notebook takes about 20 minutes. Did you solve the problem?