Question about optimal parameters and training dataset size for fine-tuning

The lecture "Scaling laws and compute-optimal models" mentions an optimal training dataset size in relation to the number of parameters in the model being trained (roughly 20 tokens per parameter) in the context of pretraining. Do these rules apply to fine-tuning as well?
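For concreteness, here is a minimal sketch of that rule of thumb; the function name and the 7B-parameter example are illustrative assumptions, not from the lecture:

```python
# Rule of thumb from the scaling-laws lecture: compute-optimal pretraining
# uses roughly 20 training tokens per model parameter.

def optimal_pretraining_tokens(num_parameters: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal pretraining dataset size, in tokens."""
    return tokens_per_param * num_parameters

# Example: a 7B-parameter model would want on the order of 140B tokens.
print(f"{optimal_pretraining_tokens(7e9):.1e} tokens")  # 1.4e+11
```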

It depends on the kind of fine-tuning. If it's full fine-tuning, which updates all of the model's weights, then probably yes; if it's PEFT or LoRA (you will learn about these later on), probably not, since only a small fraction of the parameters are actually trained.
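To give a rough sense of why PEFT changes the picture, here is a hypothetical sketch of the trainable-parameter count for a single LoRA adapter; the function name, matrix dimensions, and rank are illustrative assumptions, not course material:

```python
# Hypothetical illustration: a rank-r LoRA adapter on a (d x k) weight
# matrix trains r * (d + k) values instead of the full d * k.

def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for one rank-r LoRA adapter on a d x k matrix."""
    return r * (d + k)

d = k = 4096                              # assumed size of one projection matrix
full = d * k                              # 16,777,216 weights if fully fine-tuned
lora = lora_trainable_params(d, k, r=8)   # 65,536 weights with a rank-8 adapter
print(f"LoRA trains {lora / full:.2%} of this layer's weights")  # ~0.39%
```

With so few trainable weights, the 20-tokens-per-parameter pretraining heuristic has no direct analogue; fine-tuning dataset sizes are typically driven by task coverage and data quality instead.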