Hello Community! I hope everyone is having a good time. If not, do not worry. Good times will come!
I have a question about how AI researchers and engineers choose a suitable number of parameters. For example, Llama2 comes in several versions, such as 7B, 13B, and 70B. How do researchers estimate what number of parameters will be suitable? What I mean is that one could pick any number of parameters when creating a model. Maybe Llama2 7B could have been 8B or 11B, and Llama2 13B could have been 15B or 17B, or a model could even have only millions of parameters. That is my doubt about how the number of parameters is selected.
I hope I was able to articulate my question. If my question is not clear, feel free to let me know and I will try to provide further clarification.
I believe they choose the number of parameters mostly through experimentation, given the amount of compute and data available. If you don’t have much data or compute, a model with fewer parameters is usually the better choice. On the other hand, if you have a lot of data and compute, a bigger model tends to give you better results.
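To make the compute trade-off a bit more concrete, here is a rough sketch. It assumes the commonly used approximation that training costs about 6 FLOPs per parameter per token, plus a tokens-per-parameter ratio of 20, which is a rule of thumb from published scaling-law work and not something I know was used for Llama2. The function name and the example budgets are made up for illustration.

```python
import math


def size_for_budget(train_flops: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that roughly use up `train_flops`.

    Assumes training cost C ~= 6 * N * D and a fixed ratio D = r * N,
    which gives N = sqrt(C / (6 * r)). Both are approximations.
    """
    params = math.sqrt(train_flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    # Hypothetical training budgets in FLOPs, chosen only for illustration.
    for budget in (1e21, 1e22, 1e23):
        n, d = size_for_budget(budget)
        print(f"{budget:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

Under these assumptions a tenfold increase in budget only grows the model by about 3.2x, because the extra compute is split between more parameters and more tokens. Real teams also weigh inference cost and hardware fit, so treat the output as a starting point for experiments and not as an answer.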
Some time ago, researchers showed that larger models (trained on more data) tend to perform better, and since then they have been experimenting with larger and larger models to see if there’s a limit to how well they perform.
It turns out that they do get better the bigger they are, and we still don’t know whether there is a limit to this.
However, we do have limits on the amount of (quality) data and on the compute budget, and that is becoming a potential limiting factor for research.
It’s pretty common for companies to offer products at three levels of performance. This gives customers a choice between capability and cost (or in this case, memory size and speed of execution).
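For a feel of the memory side of that trade-off, here is a back-of-the-envelope sketch. It uses the nominal sizes (7B, 13B, 70B) instead of exact parameter counts and counts only the weights, so activations, the KV cache, and runtime overhead are ignored. The helper name and the dtype table are my own, not from any library.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Approximate memory for the weights alone, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Nominal sizes; the real parameter counts differ slightly.
    for name, n in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
        row = ", ".join(
            f"{dt}: {weight_memory_gb(n, dt):.1f} GB" for dt in BYTES_PER_PARAM
        )
        print(f"{name} -> {row}")
```

At 16-bit precision this works out to roughly 14 GB, 26 GB, and 140 GB of weights, which gives a sense of why the three sizes target very different hardware.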