Problem Summary
My model builds and compiles without errors, but the validation loss is NaN on every epoch. The training accuracy is also vanishingly small and keeps decreasing. I can't find a mistake in the tokenization, embedding, or model-building code.
I am using the training CSV file of the BBC article dataset:
- Some common words like a, as, are, at, and be are removed after loading the file into a variable.
- I was able to reconstruct a readable article from the tokenized training data using the text vectorizer's vocabulary.
- Manually reviewing the first sentence and labels, I found they match the file I stored in my Google Drive.
- The model outputs one array of five floats per input via a softmax activation. There are five output neurons, one for each article class. I used the `SparseCategoricalCrossentropy` loss function (see the sketch just after this list).
- The labels are vectorized using a separate `TextVectorization()` instance. Here, I adapt the vectorizer to ALL labels and then tokenize both the training and validation labels.
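For context on those last two bullets: `tf.keras.losses.SparseCategoricalCrossentropy` pairs a five-way softmax output with integer class ids in the range [0, 5). A minimal, self-contained check (the variable names here are just for illustration, not from my notebook):

```python
import tensorflow as tf

# With five softmax outputs, valid integer class ids are 0, 1, 2, 3, 4.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
probs = tf.constant([[0.2, 0.2, 0.2, 0.2, 0.2]])  # one softmax prediction

print(loss_fn(tf.constant([4]), probs).numpy())  # valid id -> finite loss (~1.609)
```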
Text Vectorization Check
I turn the articles into sequences of integers over a vocabulary using `TextVectorization()`. I adapt the vectorizer to the training split only and then use it to tokenize both the training and validation articles.
- After calling `adapt()` on the `TextVectorization()` layer with `train_sentences`, I just call the layer directly:

```python
tokenizer = fit_tokenizer(train_sentences, NUM_WORDS, MAXLEN)
train_padded_seq = tokenizer.call(train_sentences)
val_padded_seq = tokenizer.call(val_sentences)
```
The output shapes look fine: (1192, 256) and (298, 256). Each article is padded or truncated to 256 integers, and the training and validation data are tensors:
- `train_padded_seq.shape` outputs `TensorShape([1192, 256])`.
- `print(train_padded_seq[0])` outputs a sequence of integers with `shape=(256,), dtype=int64`.
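`fit_tokenizer` itself isn't shown above; it is a small helper roughly along these lines (my reconstruction, not the exact assignment code):

```python
import tensorflow as tf

def fit_tokenizer(train_sentences, num_words, maxlen):
    # Build a TextVectorization layer adapted to the training split only.
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=num_words,           # vocabulary size (NUM_WORDS)
        output_mode='int',
        output_sequence_length=maxlen,  # pad/truncate each article to MAXLEN
    )
    vectorizer.adapt(train_sentences)
    return vectorizer
```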
- When tokenizing the labels, I tried subtracting one from every label to make sure they all started at 0, but that only pushed the first-epoch accuracy down into scientific-notation territory instead of a low decimal.
I tokenize based on the full list of labels:

```python
train_label_seq = tokenize_labels(labels, train_labels)
val_label_seq = tokenize_labels(labels, val_labels)
```
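`tokenize_labels` does roughly the following (a sketch assuming the second `TextVectorization` instance described in the summary, not the exact code):

```python
import tensorflow as tf

def tokenize_labels(all_labels, split_labels):
    # Adapt a separate TextVectorization layer to ALL labels, then
    # vectorize the requested split. split=None treats each label
    # string as a single token, so each label maps to one integer id.
    label_vectorizer = tf.keras.layers.TextVectorization(split=None)
    label_vectorizer.adapt(all_labels)
    return tf.squeeze(label_vectorizer(split_labels), axis=-1)
```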
Model Building
- The model compiles and runs, but its accuracy is terrible. With accuracy as the metric, the best performance is in the first epoch, and it gets worse from there:

```
loss: nan - accuracy: 0.0076 - val_loss: nan - val_accuracy: 0.0000e+00
```

- When I use categorical accuracy as the metric, the training and validation losses immediately go to NaN and both sets report an accuracy of one:

```
loss: nan - categorical_accuracy: 1.0000 - val_loss: nan - val_categorical_accuracy: 1.0000
```
In both cases, running `model.predict()` gives me nothing but NaNs in the output:
```python
model = create_model(NUM_WORDS, EMBEDDING_DIM, MAXLEN)
history = model.fit(train_padded_seq, train_label_seq, epochs=3,
                    validation_data=(val_padded_seq, val_label_seq))
```
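`create_model` isn't shown here either; it is roughly the following sketch (my reconstruction of the architecture described in the summary; the pooling layer in particular is an assumption):

```python
import tensorflow as tf

def create_model(num_words, embedding_dim, maxlen):
    # Embedding over the NUM_WORDS vocabulary, pooled and fed into a
    # five-neuron softmax layer, one neuron per article class.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(maxlen,)),
        tf.keras.layers.Embedding(num_words, embedding_dim),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(5, activation='softmax'),
    ])
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        optimizer='adam',
        metrics=['accuracy'],
    )
    return model
```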
- `model.predict(train_padded_seq)[0].shape` gives `(5,)`.
- `model.predict(train_padded_seq)[0, :5]` gives `array([nan, nan, nan, nan, nan], dtype=float32)`.
Changing the maximum sequence length, the embedding dimension, and the vocabulary size (NUM_WORDS) did not change the results much: predictions were always NaN, and accuracy never rose above very small decimals.