Identify correct parameter values

The course gives default values for various parameters in the labs and assignments, e.g. 50 or 400 for the size of the word embeddings, 0.003 for alpha, batch_size = 4/128, N = 2/3/4 in the previous week’s N-gram lecture, and so on.

In the real world, how do we identify the correct values for these parameters, and how do we confirm that the values chosen are indeed the best possible?

From experience with similar models, or from trial and error. That includes search algorithms that try different hyperparameters automatically; essentially, they are trial and error too.

I had imagined that there would be a way to identify, or at least estimate, good enough values. A trial and error approach might be fine for small models, but imagine the cost of training LLMs again and again using just trial and error!

No, there is no ready-made formula you can apply to find them. You can run automated hyperparameter searches such as grid search, Bayesian search, or random search, but it's a good idea to start with values used in similar models.
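
To make the "automated trial and error" idea concrete, here is a minimal sketch of random search in plain Python. It is only an illustration: `validation_loss` is a hypothetical stand-in for "train the model with these values and measure the loss on a validation set", and the search ranges and the toy formula inside it are assumptions, loosely based on the course defaults mentioned above.

```python
import math
import random


def validation_loss(alpha, embedding_size, batch_size):
    # Hypothetical stand-in for training a model and scoring it on
    # validation data. This toy function is smallest at alpha=0.003,
    # embedding_size=400, batch_size=128 (an assumption for the demo).
    return (
        (math.log10(alpha) - math.log10(0.003)) ** 2
        + ((embedding_size - 400) / 400) ** 2
        + ((batch_size - 128) / 128) ** 2
    )


def random_search(n_trials, seed=0):
    # Sample hyperparameters at random and keep the best combination seen.
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Learning rates are usually sampled on a log scale.
            "alpha": 10 ** rng.uniform(-4, -1),
            "embedding_size": rng.choice([50, 100, 200, 400]),
            "batch_size": rng.choice([4, 16, 64, 128]),
        }
        loss = validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_trials=50)
print(best_params, best_loss)
```

In a real project each call to `validation_loss` would be a full training run, which is why the number of trials is limited by budget and why starting from values that worked for similar models matters so much.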