Architecture takes 20 minutes per epoch

Hello!

I have been working on the Week 3 assignment for about a week now and can’t seem to get a model that doesn’t overfit! I have tried models with only one LSTM layer and one Conv1D layer, which run a little faster, but the validation loss starts increasing by epoch 5 even with dropout!

Currently I am running the model below in the hope that a more complex model will avoid the increasing validation loss, but my epochs take so long to run that the notebook session ends before training finishes :frowning:
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):

    ### START CODE HERE

    lstm1_dim = round(embedding_dim / 20)
    lstm2_dim = round(lstm1_dim / 2)

    model = tf.keras.Sequential([
        # This is how you need to set the Embedding layer when using pre-trained embeddings
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm1_dim, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm2_dim, return_sequences=True)),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=['accuracy'])

    ### END CODE HERE

    return model

Any thoughts or suggestions on model architecture, or on increasing training speed without increasing validation loss?

Thanks so much!
Ewan

Hi,
I believe we use Colab in these courses. Can you try using a GPU instead of a CPU? You can select this from a Colab menu.
Let me know if it helps.

@ewanww
Use a Colab GPU to reduce training time. To select the GPU, follow these steps:
Edit > Notebook settings > Hardware accelerator > GPU, then click Save.
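
Once the runtime switches over, you can confirm that TensorFlow actually sees the GPU with a quick check (this uses the standard TensorFlow API, nothing assignment-specific):

import tensorflow as tf

# An empty list here means the runtime is still CPU-only.
print(tf.config.list_physical_devices('GPU'))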

Ok, I will do that, but how do I submit the pickle file for the assignment later?

Also, there is no “Open in Colab” badge in the assignment notebook…


You can see this for submitting the pickle file for the assignment.
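
In case it helps, saving a Keras training history with pickle typically looks like the snippet below. The variable history (the return value of model.fit) and the file name 'history.pkl' are assumptions on my part, so use whatever names the notebook’s submission instructions specify:

import pickle

# 'history' is assumed to be the object returned by model.fit(...),
# and 'history.pkl' is a placeholder file name; check the notebook
# for the name the grader actually expects.
with open('history.pkl', 'wb') as f:
    pickle.dump(history.history, f)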

Hi Ewan! It seems you have a very large network. Try using dropout after each LSTM layer (or simply use the dropout parameter of the layer) and see if you get better results. From other exercises, I have also noticed that bidirectional LSTMs tend to overfit more quickly as you train longer. I’m not sure if you’re allowed to reduce the number of epochs, so I suggest just using plain LSTM layers; those will also train a lot quicker per epoch than bidirectional ones. Another thing you can try is increasing the dropout rate. I recommend continuing to use Coursera Labs for a more seamless submission process. Hope these help!
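
To make that concrete, here is a minimal sketch of plain LSTMs using the built-in dropout argument; the layer sizes are placeholders, and vocab_size, embedding_dim, and maxlen are the variables already defined in the notebook:

# Plain (non-bidirectional) LSTMs with the built-in dropout argument,
# which applies dropout to each layer's inputs. Sizes are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen),
    tf.keras.layers.LSTM(32, return_sequences=True, dropout=0.2),
    tf.keras.layers.LSTM(16, dropout=0.2),
    tf.keras.layers.Dense(1, activation='sigmoid')
])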

Hi Chris,
I changed my model to the following:
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):

    ### START CODE HERE

    lstm1_dim = round(embedding_dim / 20)
    lstm2_dim = round(lstm1_dim / 2)

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen),
        tf.keras.layers.LSTM(5, return_sequences=True, dropout=0.5),
        tf.keras.layers.LSTM(lstm2_dim, dropout=0.5),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=['accuracy'])

    ### END CODE HERE

    return model

but there doesn’t seem to be any change in the 20-minute epoch time. If I use Conv1D it also takes around twenty minutes, even with just one or two layers (besides the last two Dense layers)… Maybe there’s something wrong with my computer, or I need to figure out how to use the Colab notebooks?
Sorry for the lag,
Ewan

For instance, a model like this still has 20-minute epochs:

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen),
    tf.keras.layers.Conv1D(10, 5, activation='relu'),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

Hi Ewan! The architectures you used look good, although I recommend adjusting the dimensionalities of the LSTM layers in the first model you presented. Try experimenting with powers of two, starting at 32. That should take about 80 seconds per epoch. The second model you showed takes around 10 seconds per epoch.

Another possible source of the problem is how the Embedding layer is initialized. The boilerplate code uses the pre-defined embeddings:

tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=maxlen, weights=[embeddings_matrix], trainable=False), 

And create_model() should be called like this in the next ungraded cell:

model = create_model(VOCAB_SIZE, EMBEDDING_DIM, MAXLEN, EMBEDDINGS_MATRIX)
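
Putting both suggestions together, a sketch of the fixed function could look like this; the 32/16 LSTM sizes are just the power-of-two starting point mentioned above, not required values:

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    model = tf.keras.Sequential([
        # Frozen pre-trained embeddings: pass the matrix as initial
        # weights and keep the layer non-trainable.
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim,
                                  input_length=maxlen,
                                  weights=[embeddings_matrix],
                                  trainable=False),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=['accuracy'])

    return model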

Please check if you are implementing it this way. In case you want to refresh your notebook and start from scratch, you can do so by following the steps here. Hope these help!

Hi Ewan!

I have exactly the same problem. I have been working on the assignment for 4 days and have tried all kinds of layer combinations. I had to use Colab to test it. Each epoch in the Coursera notebook takes about 20 minutes. Did you solve the problem?

Hi,

I have the same issue. I tried to keep it simple, but each epoch in the Coursera notebook takes about 20 minutes. Did you solve the problem?

Could someone provide hints on how to accelerate this training?

In case this is useful to someone else: removing the bidirectional layers and adding dropouts decreased the time per epoch from 140 s to 70 s.
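
If you want to verify timings like that on your own setup, one way (a sketch of mine, using the standard Keras callback API) is a small callback that prints wall-clock seconds per epoch:

import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    # Prints wall-clock seconds per epoch, e.g. to compare architectures.
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print(f"Epoch {epoch + 1}: {time.time() - self._start:.1f} s")

# Usage: model.fit(..., callbacks=[EpochTimer()])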