Why is BERT fine-tuned on the TyDi QA dataset able to do a better job at answering the question about comic books?

I understand we fine-tune BERT using the TyDi QA dataset, and I assume the question about comics was NOT in the TyDi QA dataset. How can I intuitively understand why fine-tuning BERT on TyDi QA improved its answer about the comic books?

Is pre-trained BERT actually not well trained on Q&A tasks, so that for any Q&A task we SHOULD fine-tune BERT using the TyDi QA dataset (or an equivalent Q&A dataset)?

Hi @Francois_Ascani

If I understand your question correctly, then the answer would be: fine-tuning on a specific sub-problem helps the model's performance in that domain. In other words, if you have a problem in the medicine domain, then fine-tuning on a medicine-specific dataset improves the fine-tuned model's performance in that domain. Does that make sense?


Thanks! Yes, it makes sense, but I am still puzzled by the failure of BERT using the pre-trained checkpoint. After all, (1) pre-trained models are supposed to be "pretty good" already, (2) the task asked (answer the question given the context) does not seem so special that you would need to fine-tune, and, last but not least, (3) we give the model the entire context needed to answer the question.
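One way to build intuition for why the raw checkpoint struggles: BERT's pre-training objectives (masked language modeling and next-sentence prediction) never train it to point at an answer span. In extractive QA, a span-prediction head scores every token as a possible answer start and end, and that head only gets useful weights through QA fine-tuning; before that, its logits are essentially noise. Here is a minimal toy sketch of that span-selection logic (pure NumPy, with hypothetical hand-made logits; the sentence, question, and logit values are invented for illustration):

```python
import numpy as np

# Toy extractive QA: the QA head gives every context token a "start"
# logit and an "end" logit. The predicted answer is the span (i, j),
# i <= j, that maximizes start_logits[i] + end_logits[j].
def extract_span(start_logits, end_logits, max_len=15):
    best_score, best_span = -np.inf, (0, 0)
    for i in range(len(start_logits)):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    return best_span

tokens = "Tony Stark first appeared in Tales of Suspense 39".split()

# A fine-tuned head: logits peak on the correct span for a question
# like "Who is Iron Man?" (values made up for the sketch).
ft_start = np.array([8., 1., 0., 0., 0., 0., 0., 0., 0.])
ft_end   = np.array([1., 8., 0., 0., 0., 0., 0., 0., 0.])
i, j = extract_span(ft_start, ft_end)
print(" ".join(tokens[i : j + 1]))  # prints "Tony Stark"

# An untrained (randomly initialized) head: the logits are noise, so
# the predicted span is arbitrary -- roughly what happens when you run
# the raw pre-trained checkpoint on QA without fine-tuning.
rng = np.random.default_rng(0)
i, j = extract_span(rng.normal(size=9), rng.normal(size=9))
print(" ".join(tokens[i : j + 1]))  # some arbitrary span
```

So even though the checkpoint "knows" a lot from pre-training, the part that turns that knowledge into a span answer starts out untrained; fine-tuning on a QA dataset like TyDi QA teaches the *format* of the task, which then generalizes to unseen topics such as comics.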

Maybe the answer is the one provided to my other question: BERT actually does not understand dates well? That would make sense, and that would be interesting. I would then wonder what other idiosyncratic things BERT is not naturally good at.

Hi @Francois_Ascani

:slight_smile: the quotes are important here: it depends on what one considers "pretty good". Compared to older models, BERT is pretty good.

The "task" is hard enough that pre-training alone would not solve it, but fine-tuning might help.

I would argue that is not true. What we learned as humans (in life, in school, and through evolution overall) is context the model does not have. For us humans, those "words" are all the context that is needed, because we have prior knowledge (prior context of contexts of contexts, etc.) that the model lacks.


Interesting. Thanks for the discussion! I will ponder your response :slight_smile: