C3W3_Assignment High training/validation accuracy after one epoch

Hi everyone,

I’m not sure if I built the perfect model for this assignment, but I am getting around 100% accuracy after only one epoch on both the training and validation sets. My loss and validation loss are also very small from the start of training.

Please take a look at the following plots. Am I missing something?

[image]

[image]

I’ve seen this issue before, but don’t recall the cause (I’m not a mentor for that course).

Maybe you can read back through the thread history for this course forum area. Or use the forum Search tool for “accuracy”.

So does the graph indicate overfitting? The assignment name, Exploring Overfitting in NLP, also points in that direction.

Things to look into:

  1. Check that your function for parsing the data from the file handles the points below correctly:
    csv.reader returns an iterable that yields one row per iteration, so the label can be accessed via row[0] and the text via row[5].
    The labels are originally encoded as strings (‘0’ representing negative and ‘4’ representing positive). You need to change this so that the labels are integers and 0 is used for representing negative, while 1 should represent positive.

  2. Next, in the training/validation split,
    check that the length of the sentences is defined correctly, so that the value is an integer.

  3. Can you explain, based on the screenshot below, how you created the model? Do not share code. Just describe how many layers you used, which activations you used, how many Dense layers, and so on. Also describe the loss, optimizer, and accuracy metric in model.compile.

  4. Another important point for this model is the line below:
    This is how you need to set the Embedding layer when using pre-trained embeddings
    Which vocab size and weights did you use?

Regards
DP

I’m not sure if this is overfitting. As I recall, overfitting shows up on the training set only, but in my case the near-perfect results appear on both sets?

  1. This part of my code is correct. The test function works and I can see the expected output.

  2. The variable train_size is an integer. Also here, the test function works.

  3. I tried two different architectures:

    • The first one is a model with Embedding, Conv1D, GlobalMaxPooling1D, and two Dense layers with a Dropout layer between them.

    • The second one is a model with Embedding, Dropout, Bidirectional LSTM and two Dense layers.
      The activation of the output layer is sigmoid; all other layers use a ReLU activation function. The loss function is binary_crossentropy and the optimizer is adam.

  4. The Embedding layer is already provided and I haven’t changed the code here. It has the following arguments:

    • Input dimension is vocabulary size +1
    • Output dimension is embedding dimension
    • Input length is maximum length of all sequences
    • The weights are equal to the embedding matrix
    • trainable is set to false

I’ve already submitted my code and it did pass :sweat_smile:

OK, great. If you have already solved it, please close the thread by marking the comment that solved your issue, or by explaining how you solved the issue yourself.

I meant that the code which produces these two graphs passed with 100/100 after my submission. The slope of the val_loss is 0, and that’s why all tests are fine. But I still think that the trained model is not correct.

this is correct

This is correct, but what were the units for the last two Dense layers?

  • The second one is a model with Embedding, Dropout, Bidirectional LSTM and two Dense layers.
    The activation of the output layer is sigmoid; all other layers use a ReLU activation function. The loss function is binary_crossentropy and the optimizer is Adam.

In model.compile, which accuracy metric did you use?

Also, did you notice the statement below, which appears after model training?

To pass this assignment your val_loss (validation loss) should either be flat or decreasing.

Although a flat val_loss and a lowering train_loss (or just loss) also indicate some overfitting, what you really want to avoid is having a lowering train_loss and an increasing val_loss.

Probably that’s why you cleared the assignment submission.
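For anyone who wants to check this on their own training history, here is a minimal sketch of a "flat or decreasing" test. The grader's real check is not shown in this thread, so the function names `loss_slope` and `looks_overfit`, the tolerance `tol`, and the sample loss values are all assumptions made up for illustration.

```python
def loss_slope(values):
    """Least-squares slope of the values against epoch index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two epochs")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def looks_overfit(train_loss, val_loss, tol=1e-3):
    # Hypothetical rule of thumb: training loss goes down while
    # validation loss goes up by more than an arbitrary tolerance.
    return loss_slope(train_loss) < 0 and loss_slope(val_loss) > tol


# Made-up loss histories, one value per epoch.
train = [0.6, 0.4, 0.2]
val_flat = [0.5, 0.5, 0.5]
val_rising = [0.5, 0.6, 0.7]

print(looks_overfit(train, val_flat))    # False
print(looks_overfit(train, val_rising))  # True
```

Note that a flat val_loss passes a check like this even when the model has learned nothing useful, which would explain a passing submission with suspicious plots.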

Explore your model with this pointer,

  • Try simpler architectures first to avoid long training times. Architectures that are able to solve this problem usually have around 3-4 layers (excluding the last two Dense ones)

Regards
DP

I found the mistake in my code. When changing the labels from 0 or 4 to 0 or 1, I was checking for integer values instead of string. Therefore, all my labels had one as a value. Now I check for string values and the results seem to be reasonable.
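In case it helps someone else, here is a minimal sketch of that kind of mistake. It is not the assignment code: the two sample rows and the function names `parse_labels_buggy` and `parse_labels_fixed` are made up, and only the column layout (label in row[0], text in row[5]) comes from the hints above.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows: label in column 0 ('0' or '4'), text in column 5.
SAMPLE = (
    '"0","1","date","NO_QUERY","user_a","this is awful"\n'
    '"4","2","date","NO_QUERY","user_b","this is great"\n'
)


def parse_labels_buggy(fileobj):
    # Bug: csv.reader yields strings, so row[0] == 0 is never True
    # and every row ends up with label 1.
    return [0 if row[0] == 0 else 1 for row in csv.reader(fileobj)]


def parse_labels_fixed(fileobj):
    # Compare against the string '0' instead of the integer 0.
    return [0 if row[0] == '0' else 1 for row in csv.reader(fileobj)]


buggy = parse_labels_buggy(io.StringIO(SAMPLE))
fixed = parse_labels_fixed(io.StringIO(SAMPLE))

print(buggy)           # [1, 1]
print(fixed)           # [0, 1]
print(Counter(buggy))  # Counter({1: 2})
```

When every label is 1, a model can reach about 100% accuracy on both sets by always predicting 1, so counting the label values right after parsing is a quick sanity check.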

Thank you!
