Problem Summary
My model builds and compiles without errors, but the validation loss is NaN on every epoch. The training accuracy is also vanishingly small and keeps decreasing. I can't find a mistake in the tokenization, embedding, or model-building code.
I am using the training CSV file of the BBC article dataset:
- Some common words like a, as, are, at, and be are removed after loading the file into a variable.
- I was able to reconstruct a readable article from the tokenized training data using the text vectorizer's vocabulary.
- Manually reviewing the first sentence and labels, I found they match the file I stored in my Google Drive.
- The model outputs one array of five floats per input via a softmax activation. There are five output neurons, one for each article class. I used the `SparseCategoricalCrossentropy` loss function (see the sketch just after this list).
- The labels are vectorized using a separate `TextVectorization()` instance. Here, I adapt the vectorizer to ALL labels and then tokenize both the training and validation labels.
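For context on those last two bullets: `tf.keras.losses.SparseCategoricalCrossentropy` pairs a five-way softmax output with integer class ids in the range [0, 5). A minimal, self-contained check (the variable names here are just for illustration, not from my notebook):

```python
import tensorflow as tf

# With five softmax outputs, valid integer class ids are 0, 1, 2, 3, 4.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
probs = tf.constant([[0.2, 0.2, 0.2, 0.2, 0.2]])  # one softmax prediction

print(loss_fn(tf.constant([4]), probs).numpy())  # valid id -> finite loss (~1.609)
```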
Text Vectorization Check
I turn the articles into sequences of integers over a vocabulary using `TextVectorization()`. I adapt the vectorizer to the training split only and then use it to tokenize both the training and validation articles.
- After calling `adapt()` on the `TextVectorization()` layer with `train_sentences`, I just call the layer directly:

```python
tokenizer = fit_tokenizer(train_sentences, NUM_WORDS, MAXLEN)
train_padded_seq = tokenizer.call(train_sentences)
val_padded_seq = tokenizer.call(val_sentences)
```
The output shapes look fine: (1192, 256) and (298, 256). Each article is padded or truncated to 256 integers, and the training and validation data are tensors:
- `train_padded_seq.shape` outputs `TensorShape([1192, 256])`.
- `print(train_padded_seq[0])` outputs a sequence of integers with `shape=(256,), dtype=int64`.
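`fit_tokenizer` itself isn't shown above; it is a small helper roughly along these lines (my reconstruction, not the exact assignment code):

```python
import tensorflow as tf

def fit_tokenizer(train_sentences, num_words, maxlen):
    # Build a TextVectorization layer adapted to the training split only.
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=num_words,           # vocabulary size (NUM_WORDS)
        output_mode='int',
        output_sequence_length=maxlen,  # pad/truncate each article to MAXLEN
    )
    vectorizer.adapt(train_sentences)
    return vectorizer
```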
- When tokenizing the labels, I tried subtracting one from every label to make sure they all started at 0, but that only pushed the first-epoch accuracy down into scientific-notation territory instead of a low decimal.
I tokenize based on the full list of labels:

```python
train_label_seq = tokenize_labels(labels, train_labels)
val_label_seq = tokenize_labels(labels, val_labels)
```
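`tokenize_labels` does roughly the following (a sketch assuming the second `TextVectorization` instance described in the summary, not the exact code):

```python
import tensorflow as tf

def tokenize_labels(all_labels, split_labels):
    # Adapt a separate TextVectorization layer to ALL labels, then
    # vectorize the requested split. split=None treats each label
    # string as a single token, so each label maps to one integer id.
    label_vectorizer = tf.keras.layers.TextVectorization(split=None)
    label_vectorizer.adapt(all_labels)
    return tf.squeeze(label_vectorizer(split_labels), axis=-1)
```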
Model Building
- The model compiles and runs, but its accuracy is terrible. With accuracy as the metric, the best performance is in the first epoch, and it gets worse from there:

```
loss: nan - accuracy: 0.0076 - val_loss: nan - val_accuracy: 0.0000e+00
```

- When I use categorical accuracy as the metric, the training and validation losses immediately go to NaN and both sets report an accuracy of one:

```
loss: nan - categorical_accuracy: 1.0000 - val_loss: nan - val_categorical_accuracy: 1.0000
```
In both cases, running `model.predict()` gives me nothing but NaNs in the output:
```python
model = create_model(NUM_WORDS, EMBEDDING_DIM, MAXLEN)
history = model.fit(train_padded_seq, train_label_seq, epochs=3,
                    validation_data=(val_padded_seq, val_label_seq))
```
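`create_model` isn't shown here either; it is roughly the following sketch (my reconstruction of the architecture described in the summary; the pooling layer in particular is an assumption):

```python
import tensorflow as tf

def create_model(num_words, embedding_dim, maxlen):
    # Embedding over the NUM_WORDS vocabulary, pooled and fed into a
    # five-neuron softmax layer, one neuron per article class.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(maxlen,)),
        tf.keras.layers.Embedding(num_words, embedding_dim),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(5, activation='softmax'),
    ])
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        optimizer='adam',
        metrics=['accuracy'],
    )
    return model
```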
- `model.predict(train_padded_seq)[0].shape` gives `(5,)`.
- `model.predict(train_padded_seq)[0, :5]` gives `array([nan, nan, nan, nan, nan], dtype=float32)`.
Changing the maximum sequence length, the embedding dimension, and the vocabulary size (NUM_WORDS) did not change the results much: predictions were always NaN, and accuracy never rose above very small decimals.