Why is my accuracy stuck at tiny decimals, categorical accuracy at one, loss NaN, and predictions all NaN?

Problem Summary

My model builds and compiles properly but reports a NaN validation loss on every epoch. The training-set accuracy is also vanishingly small and keeps decreasing. I couldn't find a mistake in the tokenization, embedding, or model-building code.

I am using the training CSV file of the BBC article dataset:

  • Common stopwords such as a, as, are, at, and be are removed after the file is loaded into a variable.
  • I can reconstruct a readable article from the tokenized training data using the text vectorizer's vocabulary.
  • Manually reviewing the first sentence and labels, I found they match the file I stored in my Google Drive.
  • The model outputs one array of five decimals per input using a softmax activation. There are five output neurons, one for each article class. I use the SparseCategoricalCrossentropy loss function.
  • The labels are vectorized with a separate TextVectorization() instance. Here I adapt the vectorizer to ALL labels and then tokenize both the training and validation labels.

Text Vectorization Check

I turn the articles into sequences of integers based on a vocabulary using TextVectorization(). I adapt the vectorizer on the training split only and then use it to tokenize both the training and validation articles.

  • After calling adapt() on the TextVectorization layer with train_sentences, I simply call the layer on the data:
tokenizer = fit_tokenizer(train_sentences, NUM_WORDS, MAXLEN)
train_padded_seq = tokenizer.call(train_sentences)
val_padded_seq = tokenizer.call(val_sentences)

The output shape looks fine: (1192, 256) and (298, 256). Each article is padded or truncated to be 256 integers long. The training and evaluation data are tensors.
train_padded_seq.shape outputs TensorShape([1192, 256]).
print(train_padded_seq[0]) outputs a sequence of integers with shape=(256,), dtype=int64.
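For reference, the adapt-then-call pattern described above can be sketched as follows. The sentences here are hypothetical stand-ins for the BBC training split, and NUM_WORDS/MAXLEN values are only illustrative:

```python
import tensorflow as tf

# Hypothetical stand-ins for the BBC training sentences.
train_sentences = tf.constant([
    "tv future in the hands of viewers",
    "worldcom boss left books alone",
])

NUM_WORDS = 1000   # vocabulary size
MAXLEN = 256       # pad/truncate every article to 256 integers

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=NUM_WORDS,
    output_sequence_length=MAXLEN,
)
vectorizer.adapt(train_sentences)  # build the vocabulary from the training split only

train_padded_seq = vectorizer(train_sentences)
print(train_padded_seq.shape)      # (2, 256)
print(train_padded_seq.dtype)      # int64
```

This matches the shapes reported above: each row is one article, padded or truncated to MAXLEN int64 tokens.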

  • When tokenizing the labels, I tried subtracting one from every label to make sure they all started at 0, but that only made the first-epoch accuracy orders of magnitude smaller instead of a low decimal.

I tokenize based on the full list of labels.

train_label_seq = tokenize_labels(labels, train_labels)
val_label_seq = tokenize_labels(labels, val_labels)
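The tokenize_labels() helper is not shown, so here is a minimal sketch of what vectorizing labels with a second TextVectorization instance (adapted to ALL labels, as described above) might look like; the label strings are hypothetical:

```python
import tensorflow as tf

# Hypothetical label data; the real tokenize_labels() helper is not shown.
labels = tf.constant(["sport", "business", "politics", "tech", "entertainment"])
train_labels = tf.constant(["tech", "sport"])

label_vectorizer = tf.keras.layers.TextVectorization()
label_vectorizer.adapt(labels)                 # adapt to ALL labels
train_label_seq = label_vectorizer(train_labels)

# Caveat: TextVectorization reserves index 0 for padding and index 1 for
# the OOV token, so five label strings map to indices 2..6, not 0..4.
print(label_vectorizer.get_vocabulary())
print(train_label_seq)
```

Note the caveat in the comment: with TextVectorization defaults, subtracting one from these indices still does not yield labels in the range 0–4, which may be relevant to the behavior described in the previous bullet.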

Model Building

  • The model compiles and runs, but its accuracy is terrible. With the accuracy metric, the best performance comes in the first epoch, when I get

loss: nan - accuracy: 0.0076 - val_loss: nan - val_accuracy: 0.0000e+00.

It gets worse from there.

When I use categorical accuracy as my metric, the training and validation losses immediately go to NaN and both sets report an accuracy of one:

loss: nan - categorical_accuracy: 1.0000 - val_loss: nan - val_categorical_accuracy: 1.0000.

In both cases, running model.predict() gives me nothing but NaNs in the output.

model = create_model(NUM_WORDS, EMBEDDING_DIM, MAXLEN)
history = model.fit(train_padded_seq, train_label_seq, epochs=3, validation_data=(val_padded_seq, val_label_seq))

model.predict(train_padded_seq)[0].shape gives (5,).

model.predict(train_padded_seq)[0, : 5] gives array([nan, nan, nan, nan, nan], dtype=float32).
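Since create_model() is not shown, here is a minimal hypothetical sketch consistent with the description above (an embedding layer, five softmax outputs, sparse categorical crossentropy); the pooling layer and optimizer are assumptions, not the poster's actual architecture:

```python
import tensorflow as tf

NUM_WORDS, EMBEDDING_DIM, MAXLEN = 1000, 16, 256

def create_model(num_words, embedding_dim, maxlen):
    # Embedding -> pooling -> five-way softmax, matching the description above.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(maxlen,)),
        tf.keras.layers.Embedding(num_words, embedding_dim),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(5, activation="softmax"),
    ])
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model

model = create_model(NUM_WORDS, EMBEDDING_DIM, MAXLEN)
out = model.predict(tf.zeros((2, MAXLEN), dtype=tf.int64), verbose=0)
print(out.shape)   # (2, 5) -- one five-way probability vector per input
```

A healthy model of this shape produces finite probabilities that sum to one per row, which is the behavior to compare against the all-NaN predictions above.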

Changing the maximum sequence length, embedding dimension, and vocabulary size (NUM_WORDS) did not change the results much. Predictions were always NaN and accuracy never rose above very small decimals.

Please use the correct loss function for a multiclass classification problem. categorical_crossentropy can be used as the loss for multiclass classification only if the labels are one-hot encoded. Check the TensorFlow docs and select the correct function for a multiclass problem where the labels are integers.
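To illustrate the distinction: CategoricalCrossentropy expects one-hot labels, while SparseCategoricalCrossentropy expects integer class ids, and the two agree when each is given its matching encoding (hypothetical probability values below):

```python
import tensorflow as tf

y_true_int = tf.constant([2, 0])                 # integer class ids
y_true_onehot = tf.one_hot(y_true_int, depth=5)  # one-hot equivalent
y_pred = tf.constant([[0.05, 0.05, 0.80, 0.05, 0.05],
                      [0.90, 0.025, 0.025, 0.025, 0.025]])

sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy()(y_true_int, y_pred)
dense_loss = tf.keras.losses.CategoricalCrossentropy()(y_true_onehot, y_pred)

print(float(sparse_loss), float(dense_loss))  # identical values
```

Feeding one-hot labels to the sparse loss, or integer labels to the dense loss, is a common source of the kind of broken metrics described above.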

TextVectorization is not covered in the course. Please move your question to the General Discussions category. Here's the community user guide.

I used SparseCategoricalCrossentropy as my loss function. It accepts integer true labels and arrays of probabilities as the model output. What else should I do?

If all else is correct, then check your model architecture.
If you're stuck, click my name and send your notebook as an attachment.