In lecture, Andrew mentioned that multi-task learning performance is only similar when the network is big enough. Does anyone have a reference or any literature on that? Thanks!