Hello @skinx.learning
Nowhere does it say to mask with 0. What Lucas meant is that if you do not pass an axis argument, tf.math.argmax uses axis 0 by default, so check that you have chosen the correct value for axis.
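To see why the axis matters, here is a small sketch with made-up logits shaped (batch, seq_len, n_classes). The shape and values are assumptions for illustration only. Without an axis argument, tf.math.argmax reduces over axis 0, the batch axis, instead of over the classes.

```python
import tensorflow as tf

# Made-up logits with shape (batch=2, seq_len=3, n_classes=4).
logits = tf.constant([
    [[0.1, 0.9, 0.0, 0.0], [0.8, 0.1, 0.1, 0.0], [0.0, 0.0, 0.2, 0.7]],
    [[0.3, 0.3, 0.4, 0.0], [0.0, 1.0, 0.0, 0.0], [0.6, 0.2, 0.1, 0.1]],
])

# No axis passed: argmax reduces over axis 0 (the batch), not over the classes.
default_axis = tf.math.argmax(logits)
print(default_axis.shape)  # (3, 4)

# axis=-1: argmax reduces over the class axis, giving one class id per token.
class_axis = tf.math.argmax(logits, axis=-1)
print(class_axis.shape)  # (2, 3)
```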
I will explain step by step how the code should look.
First, I hope you have gone through the instructions given in the notebook.
Then, in the masked_accuracy grader cell:

The first step handles each item in the batch. As the instructions say, you must always cast the tensors to the same type in order to use them in training, and since you will make divisions, it is safe to use the tf.float32 data type.
So you should use tf.cast on the true labels with the data type mentioned in the instructions, tf.float32.
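A minimal sketch of this cast, using a hypothetical padded label batch (the values, and -1 as the padding label, are assumptions for illustration). Python integers become int32 tensors by default, while tf.math.argmax returns int64, so casting both sides to tf.float32 keeps the later comparison and the final division on a single dtype.

```python
import tensorflow as tf

# Hypothetical padded labels; -1 is assumed to mark padding.
y_true = tf.constant([[2, 0, -1], [1, 1, -1]])
print(y_true.dtype.name)  # int32, TensorFlow's default for Python ints

# Cast once so every later comparison and the final division use float32.
y_true = tf.cast(y_true, tf.float32)
print(y_true.dtype.name)  # float32
```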
Next, to create the mask (the values that will be ignored), there are again two steps.
First, build the mask by applying tf.not_equal to the true labels and the padding value, -1.
Then tf.cast that mask, again using the tf.float32 data type.
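A sketch of the two mask steps on toy float labels, again assuming -1 is the padding label. In this toy batch 0 is a real class id, which is why masking with 0 would wrongly drop real tokens.

```python
import tensorflow as tf

# Toy labels already cast to float32; -1 is the assumed padding value.
y_true = tf.constant([[2.0, 0.0, -1.0], [1.0, 1.0, -1.0]])

mask = tf.not_equal(y_true, -1)    # bool: True wherever the label is real
mask = tf.cast(mask, tf.float32)   # 1.0 for real tokens, 0.0 for padding
print(mask.numpy())
```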
Getting the predicted values also takes two steps.
First, apply tf.math.argmax to the prediction logits (y_pred) along the class axis, which is the last one (axis=-1).
The result of that step is stored as y_pred_class; then tf.cast y_pred_class to the tf.float32 data type.
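The two prediction steps, sketched on made-up probabilities shaped (batch, seq_len, n_classes); the shape and values are assumptions for illustration. tf.math.argmax returns int64 class ids, which is why the cast to tf.float32 is needed before comparing with the cast y_true.

```python
import tensorflow as tf

# Made-up model outputs with shape (batch=2, seq_len=3, n_classes=3).
y_pred = tf.constant([
    [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05], [0.3, 0.3, 0.4]],
    [[0.2, 0.5, 0.3], [0.1, 0.1, 0.8], [0.6, 0.2, 0.2]],
])

y_pred_class = tf.math.argmax(y_pred, axis=-1)    # int64 class ids, shape (2, 3)
y_pred_class = tf.cast(y_pred_class, tf.float32)  # same dtype as the cast y_true
print(y_pred_class.numpy())
```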
Next, compare the true values with the predicted ones (again in two steps).
First, check whether y_true is tf.equal to y_pred_class, and store the result as matches_true_pred.
Then apply the same tf.cast to matches_true_pred, using the tf.float32 data type.
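Comparing labels and predictions on toy float32 tensors (values made up for illustration): tf.equal returns a boolean tensor, and the cast turns it into 1.0 for a match and 0.0 otherwise.

```python
import tensorflow as tf

y_true = tf.constant([[1.0, 0.0, -1.0], [1.0, 1.0, -1.0]])  # -1 = assumed padding
y_pred_class = tf.constant([[1.0, 0.0, 2.0], [1.0, 2.0, 0.0]])

matches_true_pred = tf.equal(y_true, y_pred_class)          # bool tensor
matches_true_pred = tf.cast(matches_true_pred, tf.float32)  # 1.0 = match, 0.0 = no match
print(matches_true_pred.numpy())
```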
Now multiply matches_true_pred from the previous step by mask.
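Multiplying by the mask zeroes out every padded position, so only real tokens can count as matches. In this sketch the values are made up, including a "match" at a padded position so that the effect is visible.

```python
import tensorflow as tf

matches_true_pred = tf.constant([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])
mask = tf.constant([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]])  # 0.0 marks padding

# Element-wise product: any value at a padded position becomes 0.0.
matches_true_pred *= mask
print(matches_true_pred.numpy())
```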

The last step computes masked_acc (here it is important to take the numerator and the denominator as separate sums):
divide the tf.reduce_sum of matches_true_pred by the tf.reduce_sum of mask.
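A sketch of this last step on toy values, compared with tf.reduce_mean, to show why the two sums are kept separate: the denominator should count only real (unmasked) tokens, while a plain mean divides by every position, padding included.

```python
import tensorflow as tf

# Toy values; matches_true_pred is assumed to be already multiplied by mask.
matches_true_pred = tf.constant([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
mask = tf.constant([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]])

# Masked accuracy: correct real tokens / number of real tokens = 3 / 4.
masked_acc = tf.reduce_sum(matches_true_pred) / tf.reduce_sum(mask)

# For contrast, a plain mean divides by all 6 positions, padding included: 3 / 6.
naive_acc = tf.reduce_mean(matches_true_pred)
print(float(masked_acc), float(naive_acc))  # 0.75 0.5
```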
Regards
DP