How to build biLSTM for multiclass text classification? Shaping problem

bluetail · July 11, 2023, 12:41am

Hello. Can someone walk me through how to build a biLSTM model for multiclass classification (7 classes) using text data? The data is from a Kaggle competition (News Category Dataset | Kaggle).
I labelled it as shown below, then used embeddings to get arrays of the following shapes:

label_dict = {'CRIME':0, 'BUSINESS':1, 'SPORTS':2 ,'WEDDINGS':3, 'DIVORCE':4, 'PARENTING':5}
        
df['label'] = df['category'].map(label_dict).fillna(6).astype(int)

X_train data shape - (171812, 384)
y_train data shape - (171812,)
X_test data shape - (37715, 384)
y_test data shape - (37715,)

I am trying to use a biLSTM model:

import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import regularizers

# Parameters
DENSE1_DIM = 64
DENSE2_DIM = 32
LSTM1_DIM = 32 
LSTM2_DIM = 16
WD = 0.001
FILTERS = 64

input_dim= 10000
output_dim =128
max_length =384

# Model Definition 
model_lstm = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim, output_dim, input_length=max_length),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM1_DIM, dropout=0.2, kernel_regularizer = regularizers.l2(WD), return_sequences=True)), 
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM2_DIM, dropout=0.2, kernel_regularizer = regularizers.l2(WD), return_sequences=True)),
    tf.keras.layers.Dense(DENSE1_DIM, activation='relu', kernel_regularizer = regularizers.l2(WD)), 
    tf.keras.layers.Dense(DENSE2_DIM, activation='relu'),    
    tf.keras.layers.Dense(7, activation='softmax')
])

# Set the training parameters
model_lstm.compile(loss='categorical_crossentropy',
                   optimizer=tf.keras.optimizers.Adam(), 
#                   metrics=[tf.keras.metrics.Accuracy()])
                    
                   metrics = [tfa.metrics.F1Score(average="macro", threshold=None,num_classes=7, name='f1_score', dtype=None)])

model_lstm.summary()
Model: "sequential_20"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_18 (Embedding)    (None, 384, 128)          1280000   
                                                                 
 bidirectional_30 (Bidirecti  (None, 384, 64)          41216     
 onal)                                                           
                                                                 
 dense_56 (Dense)            (None, 384, 64)           4160      
                                                                 
 dense_57 (Dense)            (None, 384, 32)           2080      
                                                                 
 dense_58 (Dense)            (None, 384, 7)            231       
                                                                 
=================================================================

Then I try to train it and get this error: ValueError: Shapes (None, 1) and (None, 384, 7) are incompatible.

history = model_lstm.fit(X_train, y_train,
          epochs=epochs,
          validation_data=(X_test, y_test),
          batch_size=batch_size)

Can someone explain to me in simple words how to get the shapes right for my data?
I do not quite understand where (None, 1) comes from.

balaji.ambresh · July 11, 2023, 5:22am

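Remove return_sequences=True from the last Bidirectional layer. With it set, that layer emits one vector per timestep, which is why the model output has shape (None, 384, 7) instead of (None, 7).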
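
For illustration, here is a minimal sketch of the stacked model with only the first Bidirectional layer returning sequences. It is not the exact model from the question: dropout and the L2 regularizers are left out for brevity, the constant names are my own, and an explicit tf.keras.Input layer is assumed in place of the input_length argument.

```python
import tensorflow as tf

MAX_LENGTH = 384    # sequence length, taken from the question
VOCAB_SIZE = 10000  # input_dim in the question
NUM_CLASSES = 7

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LENGTH,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    # return_sequences=True: one vector per timestep -> (None, 384, 64),
    # which the next recurrent layer needs as its input.
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(32, return_sequences=True)),
    # return_sequences left at its default (False): the final forward and
    # backward outputs are concatenated into one vector -> (None, 32).
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])

# One probability vector per example rather than one per timestep.
print(model.output_shape)
```

The first layer keeps return_sequences=True because a stacked LSTM needs the full sequence as input; only the last recurrent layer should collapse the time axis.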

bluetail · July 11, 2023, 8:38am

Done, thank you. It still says that ‘ValueError: Shapes (None, 1) and (None, 7) are incompatible’

balaji.ambresh · July 11, 2023, 9:26am

The true classes are encoded as integers, so use the variant of the loss function that accepts sparse (integer) labels: sparse_categorical_crossentropy instead of categorical_crossentropy.
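
To show why the two loss forms expect different label shapes, here is a small hand-rolled sketch in plain Python. The function names mirror the Keras loss names but are my own illustrative implementations, not the library code, and the prediction vector is made up.

```python
import math

NUM_CLASSES = 7

def one_hot(label, num_classes=NUM_CLASSES):
    """Turn an integer class id into a one-hot list of length num_classes."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def categorical_crossentropy(y_true_one_hot, y_pred):
    """Form that expects a one-hot target with one entry per class."""
    return -sum(t * math.log(p) for t, p in zip(y_true_one_hot, y_pred))

def sparse_categorical_crossentropy(y_true_int, y_pred):
    """Form that expects a single integer class id per example."""
    return -math.log(y_pred[y_true_int])

# Made-up softmax output for one example (entries sum to 1).
y_pred = [0.05, 0.10, 0.60, 0.05, 0.05, 0.05, 0.10]
label = 2  # 'SPORTS' in the label_dict from the question

loss_sparse = sparse_categorical_crossentropy(label, y_pred)
loss_dense = categorical_crossentropy(one_hot(label), y_pred)

# Both forms give the same value; only the label encoding differs.
print(loss_sparse, loss_dense)
```

The (None, 1) in the error is the integer label column: y_train has shape (171812,), one id per example, while categorical_crossentropy expects (None, 7). Either pass loss='sparse_categorical_crossentropy' to compile, or one-hot encode the labels, for example with tf.keras.utils.to_categorical(y_train, num_classes=7), and keep categorical_crossentropy. As far as I know, tfa.metrics.F1Score expects one-hot targets, so if you keep that metric the second route may be the simpler one.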
