I’m not sure if I built the perfect model for this assignment, but I am getting around 100% accuracy after only one epoch on both the training and validation sets. My loss and validation loss are also very small from the very beginning of training.
Please take a look at the following plots. Am I missing something?
So does the graph indicate overfitting? The assignment name also suggests it: Exploring Overfitting in NLP.
Things to look into:

First, check that your parse_data_from_file function handles the rows correctly. `csv.reader` returns an iterable that yields one row per iteration, so the label can be accessed via `row[0]` and the text via `row[5]`.
The labels are originally encoded as strings (‘0’ representing negative and ‘4’ representing positive). You need to convert them so that the labels are integers, with 0 representing negative and 1 representing positive.
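A minimal sketch of that parsing step, assuming the function is named parse_data_from_file as the step suggests and that the dataset has no header row (everything else here is illustrative):

```python
import csv

def parse_data_from_file(filename):
    """Read sentences and labels from the CSV file (sketch)."""
    sentences = []
    labels = []
    with open(filename, "r") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        for row in reader:
            # Labels arrive as strings: '0' = negative, '4' = positive.
            labels.append(0 if row[0] == "0" else 1)
            sentences.append(row[5])
    return sentences, labels
```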
Next, in the training/validation split, check that you computed the length of the sentences list correctly, so that the resulting split index is an integer.
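A sketch of that split, assuming a `training_split` fraction such as 0.9 (the variable names are illustrative):

```python
def train_val_split(sentences, labels, training_split=0.9):
    """Split sentences and labels into training and validation sets (sketch)."""
    # int() ensures train_size can be used as a slice index.
    train_size = int(len(sentences) * training_split)

    train_sentences = sentences[:train_size]
    train_labels = labels[:train_size]
    validation_sentences = sentences[train_size:]
    validation_labels = labels[train_size:]

    return train_sentences, validation_sentences, train_labels, validation_labels
```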
Based on the screenshot below, can you explain how you created the model? Do not share code; just explain how many layers you used, which activations, how many Dense layers, etc. Also, for model.compile: the loss, the optimizer, and the accuracy metric.
Another important point for this model is the Embedding layer setup, shown below.
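The exact line isn’t reproduced here, but a frozen pre-trained Embedding layer is typically set up along these lines (`vocab_size`, `embedding_dim`, `max_length`, and `embeddings_matrix` are assumed to come from earlier steps):

```python
import tensorflow as tf

embedding_layer = tf.keras.layers.Embedding(
    vocab_size + 1,               # +1 to account for the padding/OOV index
    embedding_dim,                # e.g. 100 for 100-dimensional GloVe vectors
    input_length=max_length,
    weights=[embeddings_matrix],  # matrix of pre-trained word vectors
    trainable=False,              # keep the pre-trained weights frozen
)
```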
This is how you need to set up the Embedding layer when using pre-trained embeddings. What vocab size and weights did you use?
I’m not sure if this is overfitting. I recall that overfitting occurs on the training set only, but in my case the near-perfect numbers show up on both sets?
This part of my code is correct. The test function works and I can see the expected output.
The variable train_size is an integer. Here, too, the test function works.
I tried two different architectures:
The first one is a model with Embedding, Conv1D, GlobalMaxPooling1D, and two Dense layers with a Dropout layer between them.
The second one is a model with Embedding, Dropout, Bidirectional LSTM and two Dense layers.
The activation of the output layer is sigmoid; all other layers use a ReLU activation function. The loss function is binary_crossentropy and the optimizer is adam.
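For reference, the first architecture as described would look roughly like this (the layer sizes and dropout rate are illustrative guesses, not the actual submission):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    embedding_layer,                                 # the provided pre-trained embedding
    tf.keras.layers.Conv1D(128, 5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```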
The Embedding layer is already provided and I haven’t changed the code here; it is set up for the pre-trained embeddings as described above.
Ok great. If you have already solved it, then kindly close the thread by marking the comment that solved your issue, or by explaining how you solved the issue yourself.
I meant that the code which produces these two graphs did pass with 100/100 after my submission. The slope of the val_loss is 0, and that’s why all tests pass. But I still think that the trained model is not correct.
This is correct, but how many units did you use for the last two Dense layers?
> The second one is a model with Embedding, Dropout, Bidirectional LSTM and two Dense layers.
> The activation of the output layer is sigmoid; all other layers use a ReLU activation function. The loss function is binary_crossentropy and the optimizer is Adam.
In model.compile, which accuracy metric did you use?
Also, did you notice the statement below after model training?
> To pass this assignment your val_loss (validation loss) should either be flat or decreasing.
>
> Although a flat val_loss and a lowering train_loss (or just loss) also indicate some overfitting, what you really want to avoid is having a lowering train_loss and an increasing val_loss.
That’s probably why your submission passed.
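If you want to check that trend yourself, one quick sketch is to fit a line to the per-epoch val_loss from the History object that model.fit returns (the variable names are illustrative):

```python
import numpy as np

val_loss = history.history["val_loss"]  # history = model.fit(...)
epochs = np.arange(len(val_loss))

# Slope of a least-squares line through val_loss:
# <= 0 means flat or decreasing, which is what you want.
slope, _ = np.polyfit(epochs, val_loss, 1)
print(f"val_loss slope: {slope:.4f}")
```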
Explore your model with this pointer: try simpler architectures first to avoid long training times. Architectures that are able to solve this problem usually have around 3–4 layers (excluding the last two Dense ones).
I found the mistake in my code. When converting the labels from 0 or 4 to 0 or 1, I was checking for integer values instead of strings, so all my labels ended up with a value of one. Now I check for string values and the results look reasonable.
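In other words, the bug boiled down to something like this (a reconstruction, not the exact submitted code):

```python
# Buggy: row[0] is a string, so the comparison with the integer 0
# is never True and every label ends up as 1.
label = 0 if row[0] == 0 else 1

# Fixed: compare against the string the CSV actually contains.
label = 0 if row[0] == "0" else 1
```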