Week 3, Weekly Assignment, Unable to prevent overfitting of the model

I have tried LSTM, Conv1D with Dropout, regularization, normalization, GlobalAveragePooling1D, and GlobalMaxPooling1D, but nothing seems to work. Assistance required. The model code is below:

import tensorflow as tf

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    """
    Creates a binary sentiment classifier model
    
    Args:
        vocab_size (int): size of the vocabulary for the Embedding layer input
        embedding_dim (int): dimensionality of the Embedding layer output
        maxlen (int): length of the input sequences
        embeddings_matrix (array): predefined weights of the embeddings
    
    Returns:
        model (tf.keras Model): the sentiment classifier model
    """
    ### START CODE HERE
    
    model = tf.keras.Sequential([
        # This is how you need to set the Embedding layer when using pre-trained embeddings
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
        # tf.keras.layers.Conv1D(128, 5, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        # tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation="sigmoid")
    ])

    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])

    ### END CODE HERE

    return model


# Create your untrained model
model = create_model(VOCAB_SIZE, EMBEDDING_DIM, MAXLEN, EMBEDDINGS_MATRIX)

# Train the model and save the training history
history = model.fit(train_pad_trunc_seq, train_labels, epochs=20, validation_data=(val_pad_trunc_seq, val_labels))
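To confirm the overfitting, the training and validation curves can be plotted from the returned history (a minimal sketch, assuming matplotlib is available; the metric keys match the accuracy metric compiled above):

import matplotlib.pyplot as plt

# A growing gap between the two curves is the signature of overfitting
epochs = range(len(history.history["accuracy"]))
plt.plot(epochs, history.history["accuracy"], label="train accuracy")
plt.plot(epochs, history.history["val_accuracy"], label="val accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()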

Please follow these guidelines:

  • You can try different combinations of layers covered in previous ungraded labs such as:

    • Conv1D
    • Dropout
    • GlobalMaxPooling1D
    • MaxPooling1D
    • LSTM
    • Bidirectional(LSTM)
  • The last two layers should be Dense layers.

  • There are multiple ways of solving this problem, so try an architecture that you think will not overfit.

  • Try simpler architectures first to avoid long training times. Architectures that are able to solve this problem usually have around 3-4 layers (excluding the last two Dense ones).

  • Include at least one Dropout layer to mitigate overfitting.

Try a different architecture. For me, Embedding, Dropout, Conv1D, MaxPooling1D, LSTM, and 2 Dense layers worked.
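For reference, a minimal sketch of that layer ordering. The filter counts, units, and dropout rate are illustrative assumptions, not the actual settings used:

# Sketch only: all sizes below are assumed values, not shared hyperparameters
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                              weights=[embeddings_matrix], trainable=False),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv1D(64, 5, activation="relu"),  # keeps the time dimension
    tf.keras.layers.MaxPooling1D(pool_size=4),         # downsamples but stays 3-D
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])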

I have tried this before, but it didn’t work. Maybe it’s because of the hyperparameters I am using. Could you please share your hyperparameters?

Sorry, but I cannot share hyperparameters. This is your opportunity to try and learn. Start with a simple one.


I wanted to try the architecture you mentioned, but I got a problem with the inputs between Conv1D and LSTM. Can you please tell me how you avoided that?

Please share your full error…

Input 0 of layer "lstm" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)

I think maybe it’s because I tried to use GlobalMaxPooling1D.
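That is indeed the cause: GlobalMaxPooling1D collapses the time axis entirely, producing a 2-D (batch, features) tensor, while LSTM expects a 3-D (batch, timesteps, features) input. MaxPooling1D, by contrast, only downsamples the time axis and keeps it. A minimal sketch of the difference (the shapes and sizes are illustrative assumptions):

import tensorflow as tf

x = tf.keras.Input(shape=(120, 64))  # (batch, timesteps, features)

# GlobalMaxPooling1D removes the time axis -> (None, 64), which LSTM rejects
print(tf.keras.layers.GlobalMaxPooling1D()(x).shape)

# MaxPooling1D only shrinks the time axis -> (None, 30, 64), which LSTM accepts
pooled = tf.keras.layers.MaxPooling1D(pool_size=4)(x)
print(tf.keras.layers.LSTM(32)(pooled).shape)  # (None, 32)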

Check the ungraded labs of this week. You will get the idea.