It seems like labels has the wrong dimensions. Could it be that your one_hot implementation isn’t correct? The ‘train_signs’ dataset has six classes, so each one-hot label should have six entries:

From "Basic Optimization with GradientTape":

The dataset that you’ll be using during this assignment is a subset of the sign language digits. It contains six different classes representing the digits from 0 to 5.

Double check the correctness of the one_hot implementation.
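
As a rough way to check the expected shape, here is a minimal NumPy sketch of one-hot encoding for six classes. It is not the assignment's solution: the function name `one_hot`, the `NUM_CLASSES` constant, and the use of NumPy rather than TensorFlow are my own assumptions for illustration. The point is only that every encoded label should come out as a flat vector of length 6.

```python
import numpy as np

NUM_CLASSES = 6  # digits 0 to 5, per the assignment text quoted above


def one_hot(label, depth=NUM_CLASSES):
    """Return a 1-D one-hot vector of length `depth` for an integer label.

    Hypothetical helper for illustration; not the assignment's function.
    """
    # Accept a plain int or a single-element array such as np.array([3]).
    label = int(np.squeeze(label))
    if not 0 <= label < depth:
        raise ValueError(f"label {label} is outside the range [0, {depth})")
    vec = np.zeros(depth, dtype=np.float32)
    vec[label] = 1.0
    return vec


encoded = one_hot(3)
print(encoded.shape)  # (6,)
print(encoded)        # [0. 0. 0. 1. 0. 0.]
```

If your own implementation produces something like (1, 6) or (6, 1) for a single label instead of (6,), that would be consistent with the dimension mismatch you are seeing. In TensorFlow, `tf.one_hot` followed by a `tf.reshape` to a flat shape is one common way to get the same result.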