I have been working on the Week 3 assignment for about a week now and can’t seem to get a model that doesn’t overfit! I have tried models with only one LSTM layer and one Conv1D layer, which run a little faster, but the validation loss starts increasing by epoch 5 even with dropout!
Currently I am running the model below in the hope that a more complex model will avoid the increasing validation loss, but my epochs take so long that the notebook session ends before training finishes.
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    ### START CODE HERE
    lstm1_dim = round(embedding_dim / 20)
    lstm2_dim = round(lstm1_dim / 2)
    model = tf.keras.Sequential([
        # This is how you need to set the Embedding layer when using pre-trained embeddings
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm1_dim, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm2_dim, return_sequences=True)),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dropout(.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=['accuracy'])
    ### END CODE HERE
    return model
Any thoughts or suggestions on model architecture, or on speeding up training without the validation loss increasing?
Hi,
I believe we use Colab in these courses. Can you try using a GPU instead of a CPU? You can select this from the Colab menu.
Let me know if it helps.
@ewanww
Use a Colab GPU to reduce training time. To select the GPU, follow these steps:
Edit > Notebook Settings > Hardware Accelerator > GPU, then click Save.
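Once the runtime has switched, you can double-check that TensorFlow actually sees the GPU; a minimal sketch, assuming TensorFlow 2.x:

import tensorflow as tf

# Lists the GPUs visible to TensorFlow; an empty list means the
# notebook is still running on the CPU.
print(tf.config.list_physical_devices('GPU'))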
Hi Ewan! It seems you have a very large network. Try adding dropout after each LSTM layer (or simply use the dropout parameter of the layer, as in the sketch below) and see if you get better results. From other exercises, I have also noticed that Bidirectional LSTMs tend to overfit more quickly the longer you train. I’m not sure if you’re allowed to reduce the number of epochs, so I suggest just using plain LSTM layers; those will also train a lot faster per epoch than Bidirectional ones. Another thing you can try is increasing the dropout rate. I recommend continuing to use Coursera Labs for a more seamless submission process. Hope these help!
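To illustrate the two ways of adding dropout, here is a minimal sketch (the vocabulary size, sequence length, layer dims, and rates are placeholder values, not the assignment’s):

import tensorflow as tf

# Option 1: a separate Dropout layer between the LSTMs
model_a = tf.keras.Sequential([
    tf.keras.layers.Embedding(1000, 16, input_length=120),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dropout(.5),  # drops activations flowing to the next layer
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Option 2: the LSTM's built-in dropout argument
# (dropout applies to the layer's inputs; recurrent_dropout would
# apply to the recurrent state instead)
model_b = tf.keras.Sequential([
    tf.keras.layers.Embedding(1000, 16, input_length=120),
    tf.keras.layers.LSTM(32, return_sequences=True, dropout=.5),
    tf.keras.layers.LSTM(16, dropout=.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])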
Hi Chris,
I changed my model to the following:
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    ### START CODE HERE
    lstm1_dim = round(embedding_dim / 20)
    lstm2_dim = round(lstm1_dim / 2)
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen),
        tf.keras.layers.LSTM(5, return_sequences=True, dropout=.5),
        tf.keras.layers.LSTM(lstm2_dim, dropout=.5),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=['accuracy'])
    ### END CODE HERE
    return model
but there doesn’t seem to be any change in the 20-minute epoch time. If I use Conv1D it also takes around twenty minutes per epoch, even with just one or two layers (besides the last two Dense layers)… Maybe there’s something wrong with my computer, or I need to figure out how to use the Colab notebooks?
Sorry for the lag,
Ewan
Hi Ewan! The architectures you used look good, although I recommend adjusting the dimensionalities of the LSTM layers in the first model you presented. Try experimenting with powers of two, starting at 32. That should take about 80 seconds per epoch. The second model you showed takes around 10 seconds per epoch.
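For reference, a minimal sketch of that change (the exact values are a suggestion, not the graded answer):

# powers of two instead of dims derived from embedding_dim
lstm1_dim = 32
lstm2_dim = 16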
Another possible source of the problem is the way the Embedding layer is initialized. The boilerplate code uses the pre-defined embeddings:
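For pre-defined embeddings, the usual Keras pattern is along these lines (treat the exact arguments as an assumption, not the assignment’s verbatim code):

# Seed the Embedding layer with the pre-trained matrix and freeze it
tf.keras.layers.Embedding(vocab_size + 1,
                          embedding_dim,
                          input_length=maxlen,
                          weights=[embeddings_matrix],  # pre-trained vectors
                          trainable=False)              # don't update them during training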
And create_model() should then be called like this in the next ungraded cell:
model = create_model(VOCAB_SIZE, EMBEDDING_DIM, MAXLEN, EMBEDDINGS_MATRIX)
Please check if you are implementing it this way. In case you want to refresh your notebook and start from scratch, you can do so by following the steps here. Hope these help!
I have exactly the same problem. I have been working on the assignment for 4 days and have tried all kinds of layer combinations. I had to use Colab to test it; each epoch in the Coursera notebook takes about 20 minutes. Did you solve the problem?