I found two possible problems in my code:
I think the expected “:” could be missing in the cell marked #GRADED FUNCTION: predict, in the last for loop, which looks something like this:
for tag_idx in None  # the ":" is missing here!
    pred_label = None
...
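To check that a missing colon in a for loop really produces this kind of error, here is a minimal sketch that compiles a broken and a fixed version of a loop. The names `tags`, `broken`, `fixed`, and `syntax_error_of` are made up for illustration and are not from the assignment.

```python
# Hypothetical snippets: `tags` stands in for whatever the loop iterates over.
broken = "for tag_idx in tags\n    pred_label = tag_idx\n"   # no ":" after the header
fixed = "for tag_idx in tags:\n    pred_label = tag_idx\n"   # ":" added

def syntax_error_of(source):
    """Return the SyntaxError message from compiling `source`, or None if it compiles."""
    try:
        compile(source, "<cell>", "exec")
    except SyntaxError as err:
        return err.msg
    return None

# The exact wording of the message depends on the Python version.
print(syntax_error_of(broken))
print(syntax_error_of(fixed))
```

The broken version fails to compile before any of the loop body runs, so the placeholder values inside the loop do not matter for this particular error.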
Regarding why my model did not train, I think the error was due to explicitly choosing an axis with tf.reduce_sum() when calculating the masked_acc. But more likely than not, this was just another problem. Hope this helps!
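For anyone wondering what the axis argument changes, here is a rough sketch of a masked accuracy, not the assignment's solution. The function name `masked_accuracy`, the padding value of -1, and the toy tensors are all my own assumptions.

```python
import tensorflow as tf

def masked_accuracy(y_true, y_pred, pad_value=-1):
    """Accuracy over non-padding positions only (pad_value=-1 is an assumption)."""
    y_true = tf.cast(y_true, tf.int64)
    # 1.0 where the label is a real tag, 0.0 where it is padding
    mask = tf.cast(tf.not_equal(y_true, pad_value), tf.float32)
    # Predicted class per token: index of the largest score on the last axis
    y_pred_class = tf.math.argmax(y_pred, axis=-1)
    matches = tf.cast(tf.equal(y_true, y_pred_class), tf.float32) * mask
    # No axis argument: both sums reduce over batch and sequence, giving scalars
    return tf.reduce_sum(matches) / tf.reduce_sum(mask)

# Toy batch: 2 sentences, 3 positions, 3 tags; -1 marks padding.
y_true = tf.constant([[0, 1, -1], [2, -1, -1]])
# One-hot scores so that the predicted classes are [[0, 2, 0], [2, 1, 1]].
y_pred = tf.one_hot([[0, 2, 0], [2, 1, 1]], depth=3)

print(float(masked_accuracy(y_true, y_pred)))
```

In this toy batch 2 of the 3 real tokens are predicted correctly. If both sums used axis=-1 instead, the result would be one value per sentence rather than a single scalar, and averaging those values weights short and long sentences equally, which is a different number.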