Week 3, Weekly Assignment, Unable to prevent overfitting of the model

I have tried LSTM, Conv1D with Dropout, regularization, normalization, GlobalAveragePooling1D, and GlobalMaxPooling1D, but nothing seems to work. Assistance required. The model code is below:

import tensorflow as tf

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    """
    Creates a binary sentiment classifier model
    
    Args:
        vocab_size (int): size of the vocabulary for the Embedding layer input
        embedding_dim (int): dimensionality of the Embedding layer output
        maxlen (int): length of the input sequences
        embeddings_matrix (array): predefined weights of the embeddings
    
    Returns:
        model (tf.keras Model): the sentiment classifier model
    """
    ### START CODE HERE
    
    model = tf.keras.Sequential([
        # This is how you need to set the Embedding layer when using pre-trained embeddings
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
        # tf.keras.layers.Conv1D(128, 5, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        # tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation="sigmoid")
    ])

    model.compile(loss="binary_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])

    ### END CODE HERE

    return model


# Create your untrained model
model = create_model(VOCAB_SIZE, EMBEDDING_DIM, MAXLEN, EMBEDDINGS_MATRIX)

# Train the model and save the training history
history = model.fit(train_pad_trunc_seq, train_labels, epochs=20, validation_data=(val_pad_trunc_seq, val_labels))
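To confirm the overfitting, the training and validation curves can be plotted from the returned history (a minimal sketch, assuming matplotlib is available; the metric keys match the accuracy metric compiled above):

import matplotlib.pyplot as plt

# A growing gap between the two curves is the signature of overfitting
epochs = range(len(history.history["accuracy"]))
plt.plot(epochs, history.history["accuracy"], label="train accuracy")
plt.plot(epochs, history.history["val_accuracy"], label="val accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()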

Please follow these guidelines:

  • You can try different combinations of layers covered in previous ungraded labs such as:

    • Conv1D
    • Dropout
    • GlobalMaxPooling1D
    • MaxPooling1D
    • LSTM
    • Bidirectional(LSTM)
  • The last two layers should be Dense layers.

  • There are multiple ways of solving this problem, so try an architecture that you think will not overfit.

  • Try simpler architectures first to avoid long training times. Architectures that are able to solve this problem usually have around 3-4 layers (excluding the last two Dense ones).

  • Include at least one Dropout layer to mitigate overfitting.

Try a different architecture. For me, Embedding, Dropout, Conv1D, MaxPooling1D, LSTM, and 2 Dense layers worked.
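For reference, a minimal sketch of that layer ordering. The filter counts, units, and dropout rate are illustrative assumptions, not the actual settings used:

# Sketch only: all sizes below are assumed values, not shared hyperparameters
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                              weights=[embeddings_matrix], trainable=False),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv1D(64, 5, activation="relu"),  # keeps the time dimension
    tf.keras.layers.MaxPooling1D(pool_size=4),         # downsamples but stays 3-D
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])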

I have tried this before, but it didn’t work. Maybe it’s because of the hyperparameters I am using. Could you please share your hyperparameters?

Sorry, but I cannot share hyperparameters. This is your opportunity to try and learn. Start with a simple one.


I wanted to try the architecture you mentioned, but I got a problem with the inputs between Conv1D and LSTM. Can you please tell me how you avoided that?

Please share your full error…

Input 0 of layer "lstm" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)

I think maybe it’s because I tried to use GlobalMaxPooling1D.
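That is indeed the cause: GlobalMaxPooling1D collapses the time axis entirely, producing a 2-D (batch, features) tensor, while LSTM expects a 3-D (batch, timesteps, features) input. MaxPooling1D, by contrast, only downsamples the time axis and keeps it. A minimal sketch of the difference (the shapes and sizes are illustrative assumptions):

import tensorflow as tf

x = tf.keras.Input(shape=(120, 64))  # (batch, timesteps, features)

# GlobalMaxPooling1D removes the time axis -> (None, 64), which LSTM rejects
print(tf.keras.layers.GlobalMaxPooling1D()(x).shape)

# MaxPooling1D only shrinks the time axis -> (None, 30, 64), which LSTM accepts
pooled = tf.keras.layers.MaxPooling1D(pool_size=4)(x)
print(tf.keras.layers.LSTM(32)(pooled).shape)  # (None, 32)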

Check the ungraded labs of this week. You will get the idea.