Hello. Can someone walk me through how to build a biLSTM model for multiclass classification (7 classes) on text data? The data is from a Kaggle competition (News Category Dataset | Kaggle).
I labelled the data as shown below, then used embeddings to get arrays with the following shapes:
label_dict = {'CRIME': 0, 'BUSINESS': 1, 'SPORTS': 2, 'WEDDINGS': 3, 'DIVORCE': 4, 'PARENTING': 5}
df['label'] = df['category'].map(label_dict).fillna(6).astype(int)
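To make the labelling step concrete, here is a minimal sketch on a toy DataFrame. The category values are made up for illustration (only the mapping itself is from the post). It shows that any category missing from label_dict ends up in class 6, which is where the seventh class comes from.

```python
import pandas as pd

label_dict = {'CRIME': 0, 'BUSINESS': 1, 'SPORTS': 2,
              'WEDDINGS': 3, 'DIVORCE': 4, 'PARENTING': 5}

# Toy data (assumption): 'POLITICS' is not a key in label_dict.
df = pd.DataFrame({'category': ['CRIME', 'SPORTS', 'POLITICS', 'DIVORCE']})

# map() yields NaN for unmapped categories; fillna(6) sends them to class 6.
df['label'] = df['category'].map(label_dict).fillna(6).astype(int)

print(df['label'].tolist())  # [0, 2, 6, 4]
print(df['label'].shape)     # (4,) -> one integer per row, not one-hot
```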
X_train data shape - (171812, 384)
y_train data shape - (171812,)
X_test data shape - (37715, 384)
y_test data shape - (37715,)
I am trying to use a biLSTM model:
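import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import regularizers

# parameters
DENSE1_DIM = 64
DENSE2_DIM = 32
LSTM1_DIM = 32
LSTM2_DIM = 16
WD = 0.001
FILTERS = 64
input_dim = 10000
output_dim = 128
max_length = 384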
# Model Definition
model_lstm = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim, output_dim, input_length=max_length),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM1_DIM, dropout=0.2, kernel_regularizer=regularizers.l2(WD), return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM2_DIM, dropout=0.2, kernel_regularizer=regularizers.l2(WD), return_sequences=True)),
    tf.keras.layers.Dense(DENSE1_DIM, activation='relu', kernel_regularizer=regularizers.l2(WD)),
    tf.keras.layers.Dense(DENSE2_DIM, activation='relu'),
    tf.keras.layers.Dense(7, activation='softmax')
])
# Set the training parameters
model_lstm.compile(loss='categorical_crossentropy',
                   optimizer=tf.keras.optimizers.Adam(),
                   # metrics=[tf.keras.metrics.Accuracy()])
                   metrics=[tfa.metrics.F1Score(average="macro", threshold=None, num_classes=7, name='f1_score', dtype=None)])
model_lstm.summary()
Model: "sequential_20"
_________________________________________________________________
 Layer (type)                      Output Shape         Param #
=================================================================
 embedding_18 (Embedding)          (None, 384, 128)     1280000
 bidirectional_30 (Bidirectional)  (None, 384, 64)      41216
 dense_56 (Dense)                  (None, 384, 64)      4160
 dense_57 (Dense)                  (None, 384, 32)      2080
 dense_58 (Dense)                  (None, 384, 7)       231
=================================================================
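The parameter counts in the summary can be reproduced by hand, which helps when reading the shapes. The sketch below uses the standard LSTM and Dense parameter formulas. The helper names lstm_params and dense_params are made up for this example. Note that the 4160 parameters of dense_56 match a Dense layer with a 64-wide input, which is consistent with it being fed directly by bidirectional_30.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # weight matrix plus one bias per unit
    return input_dim * units + units

embedding = 10000 * 128            # input_dim * output_dim
bilstm = 2 * lstm_params(128, 32)  # forward + backward LSTM, 32 units each
dense_56 = dense_params(64, 64)    # input is the 64-wide biLSTM output
dense_57 = dense_params(64, 32)
dense_58 = dense_params(32, 7)

print(embedding, bilstm, dense_56, dense_57, dense_58)
# 1280000 41216 4160 2080 231
```

Every Output Shape in the summary keeps the 384 axis, so the model emits one 7-way prediction per time step rather than one per sample.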
When I then try to train it, I get this error: ValueError: Shapes (None, 1) and (None, 384, 7) are incompatible.
history = model_lstm.fit(X_train, y_train,
                         epochs=epochs,
                         validation_data=(X_test, y_test),
                         batch_size=batch_size)
Can someone explain to me in simple terms how to get the shapes right for my data?
I do not quite understand where (None, 1) comes from.
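One possible reading of the two shapes, offered as a sketch rather than a definitive answer: (None, 1) appears to be the labels, since y_train has shape (171812,) and Keras, as far as I understand, treats a 1-D target array as one value per sample, i.e. (batch, 1). (None, 384, 7) is the model output, because the last LSTM uses return_sequences=True and so every later layer keeps the 384 time steps. The example below one-hot encodes the labels to shape (batch, 7) and lets the last LSTM return only its final state. The random arrays are stand-ins, and it assumes the inputs are integer token ids below input_dim. If X_train actually holds float embedding vectors, the Embedding layer would not apply. Dropout, regularizers, and the F1 metric are left out to keep it short.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 7
VOCAB_SIZE = 10000  # input_dim in the post
MAX_LENGTH = 384

# Stand-in data (assumption): integer token ids and integer class labels.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, VOCAB_SIZE, size=(32, MAX_LENGTH))
y_demo = rng.integers(0, NUM_CLASSES, size=(32,))

# categorical_crossentropy expects one-hot labels of shape (batch, 7).
y_demo_onehot = tf.keras.utils.to_categorical(y_demo, num_classes=NUM_CLASSES)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    # No return_sequences on the last LSTM: it outputs one vector per sample,
    # so the following Dense layers produce (batch, 7) instead of (batch, 384, 7).
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam')

preds = model(X_demo[:4])
print(preds.shape)          # (4, 7)
print(y_demo_onehot.shape)  # (32, 7)

history = model.fit(X_demo, y_demo_onehot, epochs=1, batch_size=16, verbose=0)
```

An alternative would be to keep the integer labels and switch the loss to sparse_categorical_crossentropy, but as far as I know tfa.metrics.F1Score expects one-hot targets, so one-hot encoding fits the original compile call more directly.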