It seems like `labels` has the wrong dimensions. Could it be that your `one_hot` implementation isn't correct? The `train_signs` dataset has 6 labels:
From "Basic Optimization with GradientTape":

> The dataset that you'll be using during this assignment is a subset of the sign language digits. It contains six different classes representing the digits from 0 to 5.
Double-check your `one_hot` implementation: with 6 classes, each single label should be encoded as a vector of length 6.
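For reference, here is a minimal sketch of what a correct per-label encoding could look like. The function name `one_hot_matrix` and the default `depth=6` are assumptions for illustration, not necessarily the exact signature your notebook expects. It relies only on `tf.one_hot` and `tf.reshape`:

```python
import tensorflow as tf

def one_hot_matrix(label, depth=6):
    """Hypothetical helper: encode one integer label as a vector of shape (depth,)."""
    # tf.one_hot places the "1" at index `label`. With axis=0, a scalar label
    # gives shape (depth,), and a label of shape (1,) gives shape (depth, 1).
    # The reshape flattens either case to (depth,).
    one_hot = tf.reshape(tf.one_hot(label, depth, axis=0), shape=(depth,))
    return one_hot

# Sanity check: label 3 out of the 6 classes (digits 0 to 5).
encoded = one_hot_matrix(tf.constant(3))
print(encoded.shape)    # (6,)
print(encoded.numpy())  # [0. 0. 0. 1. 0. 0.]
```

If your version returns a shape like `(6, 1)` or `(1, 6)` instead of `(6,)`, that extra dimension is a likely cause of the mismatch in `labels`.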