Multi-task training references

In the lecture, Andrew mentioned that multi-task learning only performs comparably to training separate networks when the network is big enough. Does anyone have a reference or literature on that? Thanks!

Hey @derickwh,

You may want to check out papers on multilingual translation, for example. Many of those models were trained using multi-task learning, and the papers include some analysis of the training data.

