Question regarding a quiz from "Bird recognition case study"

According to two of the questions, a difference in distribution between the training set and the dev/test sets doesn’t matter, but a difference in distribution between the dev set and the test set does.
Can someone explain why?

Also, I wonder whether adding samples from a different distribution to the training set could even improve the model's performance, since it would have a regularizing effect?

Lastly, does “distribution” indicate the same thing as “domain” in Machine Learning?

Thank you in advance.

“Distribution” here simply refers to the characteristics of the data in a set: how the examples in one set differ from, or resemble, each other and the examples in the other sets.

Normally the train, dev, and test sets should have the same distribution, because if the ML model learns from one set, then it needs to encounter similar data in the other sets in order to be accurate.
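To make the quiz's point a bit more concrete, here is a rough sketch of one way to build the three sets so that dev and test share a distribution while the training set is allowed to include extra data from elsewhere. The function name `split_sets` and the "web" / "target" labels are made up for illustration and do not come from the course material.

```python
import random

def split_sets(extra_examples, target_examples, dev_size, test_size, seed=0):
    """Return (train, dev, test).

    dev and test are drawn only from target_examples, so they share one
    distribution. extra_examples (possibly from a different distribution)
    are used for training only.
    """
    if dev_size + test_size > len(target_examples):
        raise ValueError("not enough target examples for dev and test")
    rng = random.Random(seed)
    target = list(target_examples)
    rng.shuffle(target)  # shuffle before slicing so dev and test are comparable
    dev = target[:dev_size]
    test = target[dev_size:dev_size + test_size]
    # Leftover target examples join the extra data in the training set.
    train = list(extra_examples) + target[dev_size + test_size:]
    rng.shuffle(train)
    return train, dev, test

# Hypothetical data: 1000 examples from another source, 100 from the target source.
web = [("web", i) for i in range(1000)]
target = [("target", i) for i in range(100)]

train, dev, test = split_sets(web, target, dev_size=25, test_size=25)
print(len(train), len(dev), len(test))  # 1050 25 25
```

Because dev and test are slices of the same shuffled pool, a model tuned on dev is being tuned toward the same kind of data it will finally be scored on, whereas the training set can be a mix.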

As for “domain”, I am not sure, but I think it refers to the specific application of the ML model rather than to the distribution of the data that the model learns from or is tested on.