Why is BERT fine-tuned on the TyDi QA dataset able to do a better job at answering the question about comic books?

I understand we fine-tune BERT using the TyDi QA dataset, and I assume the question about comics was NOT in TyDi QA. How can I intuitively understand why fine-tuning BERT on TyDi QA improved its answer about the comic books?

Is pre-trained BERT actually not well trained for Q&A tasks, so that for any Q&A task we SHOULD fine-tune BERT on TyDi QA (or an equivalent Q&A dataset)?

Hi @Francois_Ascani

If I understand your question correctly, then the answer would be: fine-tuning on a specific sub-problem helps the model’s performance in that domain. In other words, if you have a problem in the medical domain, then fine-tuning on a medicine-specific dataset improves the fine-tuned model’s performance in that domain. Does that make sense?
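To make that concrete, here is a minimal sketch of what “fine-tuning BERT on a QA dataset like TyDi QA” amounts to. This is my own illustration using the Hugging Face `transformers` library (not the course code), and the example sentence, token indices, and learning rate are assumptions: a fresh span-prediction head is attached to the pre-trained encoder and trained on (question, context, answer-span) examples.

```python
# Sketch only: fine-tuning BERT for extractive QA with Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Loading a QA model from a plain pre-trained checkpoint attaches a NEW span head
# (start/end logits). transformers warns that these weights are newly initialized
# and should be trained on a downstream task before use.
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
model.train()

# One (question, context, answer-span) training example, TyDi QA style.
question = "Who created Spider-Man?"
context = "Spider-Man was created by writer Stan Lee and artist Steve Ditko."
inputs = tokenizer(question, context, return_tensors="pt")

# Gold answer span ("Stan Lee") as token start/end positions. These indices are
# placeholders; in a real pipeline they are derived from the dataset's character
# offsets via the tokenizer's offset mapping.
start_positions = torch.tensor([15])
end_positions = torch.tensor([16])

# A single fine-tuning step: the loss updates both the encoder and the new head.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
outputs.loss.backward()
optimizer.step()
```

The point of the sketch is that the answer-extraction behaviour lives in weights that only exist (or only become useful) because of this QA fine-tuning; the domain of the question (comics, medicine, etc.) matters less than the model having learned the span-extraction skill at all.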

Cheers

Thanks! Yes, it makes sense, but I am still puzzled by the failure of BERT with only the pre-trained checkpoint. After all, (1) pre-trained models are supposed to be “pretty good” already, (2) the task asked (answering a question given the context) does not seem so special that you would need to fine-tune, and, last but not least, (3) we give the model the entire context needed to answer the question.

Maybe the answer is the one provided to my other question: BERT actually does not understand dates well? That would make sense, and it would be interesting. I would then wonder what other idiosyncratic things BERT is naturally not good at.

Hi @Francois_Ascani

:slight_smile: the quotes are important here - it depends on what one considers “pretty good”. Compared to older models, BERT is pretty good.

The “task” is hard enough that fine-tuning alone would not fully solve it, but it does help.
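To show what the “task” concretely asks of the model, here is a short sketch of inference with a BERT checkpoint that has already been fine-tuned for extractive QA (again my own illustration with `transformers`; the checkpoint name is just one publicly available example, and the question/context are made up). The model only scores a start and an end position over the context tokens, and learning to score those spans is exactly what QA fine-tuning provides - the pre-trained checkpoint alone never learned it.

```python
# Sketch only: extractive QA inference once the span head has been trained.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# A publicly available BERT checkpoint fine-tuned on SQuAD, used as an example.
ckpt = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForQuestionAnswering.from_pretrained(ckpt)
model.eval()

question = "Who drew the first Spider-Man comic?"
context = "Spider-Man first appeared in Amazing Fantasy #15, drawn by Steve Ditko."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One start logit and one end logit per token; the predicted answer is simply
# the highest-scoring span, decoded back to text. (A real pipeline would also
# handle the case where the best start comes after the best end.)
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits)
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)  # expected: a span such as "steve ditko"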

I would argue that is not true - what we learned as humans (in life, in school, and through evolution overall) is context the model does not have. For us (humans), those “words” are all the context that is needed, because we bring prior knowledge (prior context of contexts of contexts, etc.) that the model lacks.

Cheers

Interesting. Thanks for the discussion! I will ponder your response :slight_smile: