I no longer have access to the notebook now that I've completed the certificate. But according to the copy I snapshotted last year, it doesn't seem that attention_mask was ever used in training the model:
the TF training dataset has only input_ids. However, I recently found a Hugging Face notebook on training NER, and it uses the same DistilBERT model but does pass attention_mask in the input. In fact, the TF dataset input there is a dict with both input_ids and attention_mask. Using the attention mask makes sense, since you don't want to build representations that attend to padding tokens.
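For reference, here is roughly what that looks like (a minimal sketch from memory of the Hugging Face tutorial, not the course notebook; the checkpoint name and example sentences are my own):

```python
import tensorflow as tf
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

sentences = ["EU rejects German call", "Peter Blackburn"]
enc = tokenizer(sentences, padding="max_length", truncation=True,
                max_length=16, return_tensors="np")

# The input is a dict carrying BOTH input_ids and attention_mask,
# so self-attention can zero out the padding positions.
dataset = tf.data.Dataset.from_tensor_slices({
    "input_ids": enc["input_ids"],
    "attention_mask": enc["attention_mask"],
}).batch(2)

for batch in dataset.take(1):
    print(batch["input_ids"].shape, batch["attention_mask"].shape)
```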
So I wonder why the notebook in this specialization never uses the attention mask?
@TMosh Thanks. I wonder whether these optional notebooks are also updated frequently. I copied mine locally last year after completing the specialization, so maybe there's a chance they have fixed it since?
I also ran into another issue with this notebook, concerning metrics=['accuracy']; I will post a new thread on it.
I am actually trying out different ways to train NER for my own project, so I'm looking at this notebook more closely than I otherwise would. I strongly suspect the course instructors/TAs based it on Hugging Face's own tutorial (maybe an older version), with their own variation on how the dataset is prepared. I found some of the code and ideas useful.
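For comparison, the label-alignment step in the current Hugging Face token-classification tutorial looks roughly like this (my reconstruction, not the course's parsing functions; the word_ids() call requires a fast tokenizer, and -100 is the label id the standard loss ignores):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_and_align_labels(words, word_labels, max_length=16):
    enc = tokenizer(words, is_split_into_words=True,
                    padding="max_length", truncation=True,
                    max_length=max_length)
    labels = []
    previous_word_id = None
    for word_id in enc.word_ids():
        if word_id is None:                 # special token or padding
            labels.append(-100)
        elif word_id != previous_word_id:   # first subword of a word
            labels.append(word_labels[word_id])
        else:                               # later subword: ignore it
            labels.append(-100)
        previous_word_id = word_id
    enc["labels"] = labels
    return enc

example = tokenize_and_align_labels(
    ["EU", "rejects", "German", "call"], [3, 0, 7, 0])
print(example["labels"])
```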
But the ungraded lab notebook is also thin on explanation, and I suspect those long, winding parsing functions have bugs... I think the last week of Course 5 has had its fair share of complaints in the past; hopefully it will improve in the future.