Hi,
Does the Chinchilla paper's guideline of using a number of tokens ≈ 20× the number of parameters in pre-training apply to instruction fine-tuning? Any recommendations on the size of the instruction fine-tuning dataset?
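For reference, a minimal sketch of the arithmetic behind that rule (the 20× ratio is the Chinchilla heuristic; the 7B model size is just an illustrative example):

```python
def chinchilla_tokens(n_params: float, ratio: float = 20.0) -> float:
    """Compute-optimal pre-training token count under the
    Chinchilla heuristic: tokens ~= ratio * parameters."""
    return ratio * n_params

# e.g., a 7B-parameter model would target roughly 140B pre-training tokens
print(f"{chinchilla_tokens(7e9):.2e}")  # 1.40e+11
```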
Thanks