How to build a hybrid model with LSTM and convolutional layers? ValueError: Input 0 of layer "conv1d_1" is incompatible with the layer

I have this error with my model architecture for a sentiment analysis problem (binary classification).
It is a text corpus with an average review length of 373 words, so each review consists of several lengthy sentences, and the model with the two LSTM layers is overfitting to the data: the validation loss fails to decrease steadily.

After reading academic articles, I discovered that adding a 1D Convolutional layer in combination with a pooling layer can help mitigate the problem by selecting the most important features (Basiri et al., 2021; Xu et al., 2021).
So I am trying to implement this suggestion.

So my code is:


import tensorflow as tf
from tensorflow.keras import regularizers

# Hyperparameters
EMBEDDING_DIM = 50
MAXLEN = 500  # 1000, 1400
VOCAB_SIZE = 33713

DENSE1_DIM = 64
DENSE2_DIM = 32

LSTM1_DIM = 32
LSTM2_DIM = 16

WD = 0.001

FILTERS = 64
KERNEL_SIZE = 5

# Model Definition 
model_lstm = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE + 1, EMBEDDING_DIM, input_length=MAXLEN,
                              weights=[EMBEDDINGS_MATRIX], trainable=False),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        LSTM1_DIM, dropout=0.5, kernel_regularizer=regularizers.l2(WD),
        return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        LSTM2_DIM, dropout=0.5, kernel_regularizer=regularizers.l2(WD))),
    tf.keras.layers.Dense(DENSE2_DIM, activation='relu'),
    tf.keras.layers.Conv1D(FILTERS, KERNEL_SIZE, activation='relu'),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Set the training parameters
model_lstm.compile(loss='binary_crossentropy',
                   optimizer=tf.keras.optimizers.Adam(), 
                   metrics=[tf.keras.metrics.BinaryAccuracy()])

# Print the model summary
model_lstm.summary()


num_epochs = 35
history_lstm = model_lstm.fit(sent_tok_train, labels_train, epochs=num_epochs,
                              validation_data=(sent_tok_val, labels_val), verbose=2)

....

File ~\.conda\envs\tf-gpu\lib\site-packages\keras\engine\input_spec.py:228, in assert_input_compatibility(input_spec, inputs, layer_name)
    226   ndim = x.shape.rank
    227   if ndim is not None and ndim < spec.min_ndim:
--> 228     raise ValueError(f'Input {input_index} of layer "{layer_name}" '
    229                      'is incompatible with the layer: '
    230                      f'expected min_ndim={spec.min_ndim}, '
    231                      f'found ndim={ndim}. '
    232                      f'Full shape received: {tuple(shape)}')
    233 # Check dtype.
    234 if spec.dtype is not None:

ValueError: Input 0 of layer "conv1d_1" is incompatible with the layer: expected min_ndim=3, found ndim=2. Full shape received: (None, 32)

How can I fix this error, please? Thank you.

Please read these pages keeping input shapes in mind:

  1. GlobalAveragePooling1D
  2. Conv1D

Consider Conv1D, for instance. It expects an input with 3 dimensions (batch, steps, channels). Looking at your architecture, the layer feeding it only produces 2 dimensions.

Input 0 of layer "conv1d_7" is incompatible with the layer: expected min_ndim=3, found ndim=2. Full shape received: (None, 32).
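
For illustration, here is a minimal sketch (the shapes are hypothetical, not taken from your model) of what Conv1D accepts and what triggers this error:

import tensorflow as tf

conv = tf.keras.layers.Conv1D(filters=64, kernel_size=5, activation='relu')

ok = tf.random.normal((8, 500, 50))   # rank 3: (batch, steps, channels)
print(conv(ok).shape)                 # (8, 496, 64) -- works

bad = tf.random.normal((8, 32))       # rank 2: (batch, features)
# conv(bad)  # uncommenting this reproduces "expected min_ndim=3, found ndim=2"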


Yes, thank you. That is what my question is about: I do not understand what I should change in the code to get the expected min_ndim=3 instead of ndim=2. Do I need to use the input_shape= argument to set the dimensions?

@bluetail How about calling model.summary() as you add layers? That way, the input and output shape of each layer will become clear.
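
For example, a rough sketch of that workflow (the vocabulary and layer sizes here are just placeholders):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 50, input_length=500),
])
model.summary()  # Embedding output: (None, 500, 50) -> rank 3

model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)))
model.summary()  # without return_sequences the output is (None, 64) -> rank 2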


The Conv1D layer expects an input with 3 dimensions, while you are supplying a shape with only 2.
You can use a simple trick of reshaping the input to the Conv1D layer as (x, y, 1), where x and y are the original dimensions and the trailing 1 adds the 3rd dimension without altering the total number of elements in the array, as in the sketch below.
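
In Keras, one way to apply that trick (a sketch, assuming the (None, 32) tensor from the question) is a Reshape layer:

import tensorflow as tf

# Turn the rank-2 tensor (None, 32) into rank 3 (None, 32, 1);
# 32 * 1 keeps the total number of elements unchanged.
reshape = tf.keras.layers.Reshape((32, 1))
conv = tf.keras.layers.Conv1D(filters=64, kernel_size=5, activation='relu')

x = tf.random.normal((8, 32))
print(conv(reshape(x)).shape)  # (8, 28, 64)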
Hope that helps


I have a summary like this before my Conv1D layer.

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_5 (Embedding)     (None, 500, 50)           1685700   
                                                                 
 bidirectional_9 (Bidirectio  (None, 500, 64)          21248     
 nal)                                                            
                                                                 
 bidirectional_10 (Bidirecti  (None, 32)               10368     
 onal)                                                           
                                                                 
 dense_6 (Dense)             (None, 32)                1056      
                                                                 
=================================================================
Total params: 1,718,372
Trainable params: 32,672
Non-trainable params: 1,685,700
_________________________________________________________________

Can I try anything as input_shape? Can I try input_shape = (None, 16, 128) or input_shape = (16, 64, 1), for example?
Is it just a matter of parameter tuning from there to get a better fit?
Thank you very much.

input_shape should reflect the shape of actual data that’s fed into the model. Don’t use random values for this.
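
For instance (a hypothetical sketch): if the tokenized training data has shape (num_samples, 500), the model's input_shape should be (500,), not arbitrary values:

import numpy as np
import tensorflow as tf

sent_tok_train = np.random.randint(0, 10000, size=(1000, 500))  # stand-in data

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 50, input_shape=(500,)),  # matches the data
])
print(model.output_shape)  # (None, 500, 50)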

Can you please explain more about this? I have access to the Andrew Ng Coursera courses and also to the resources of the University of Edinburgh, if you could refer me to a course or a book about this.
I still do not know how to progress from (None, 32) to a 3D (…, …, …) input for the Conv1D layer.
Thank you very much.

No worries.

Please follow these steps:

  1. Create an input layer.
  2. Add an embedding layer.
  3. Create a model and observe the output shape keeping in mind the number of dimensions.

Can you continue to expand this model beyond these two layers by adding a Conv1D layer? A sketch of the steps follows.
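
A rough sketch of those steps (the vocabulary size and dimensions are placeholders, not taken from the question):

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(500,))        # 1. input layer
x = tf.keras.layers.Embedding(10000, 50)(inputs)    # 2. embedding layer
model = tf.keras.Model(inputs, x)
model.summary()  # 3. output shape (None, 500, 50) -> rank 3

# Expanding with Conv1D: its input here is already rank 3, so this works.
x = tf.keras.layers.Conv1D(64, 5, activation='relu')(x)
model = tf.keras.Model(inputs, x)
model.summary()  # (None, 496, 64)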

Do you know why, in my example, I get an output of 3 dimensions after my first bidirectional layer, but then my second bidirectional layer outputs 2 dimensions?

thank you.

It’s because you don’t have return_sequences=True in the LSTM layer inside the 2nd bidirectional layer.
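
For instance, a sketch of the model with that change (hyperparameters as in the question; the pretrained embedding matrix and the intermediate Dense layer are omitted here for brevity):

import tensorflow as tf
from tensorflow.keras import regularizers

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(33714, 50, input_length=500),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        32, dropout=0.5, kernel_regularizer=regularizers.l2(0.001),
        return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        16, dropout=0.5, kernel_regularizer=regularizers.l2(0.001),
        return_sequences=True)),  # keeps the time axis: (None, 500, 32)
    tf.keras.layers.Conv1D(64, 5, activation='relu'),  # now gets rank-3 input
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.summary()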
