When we are building a neural network, how do we decide on the number of hidden layers and the number of units inside each hidden layer? For example, I was recently working on a loan dataset where I had to predict whether the customer would be able to repay the loan. There were about 15+ features (such as salary, age, previous loans, etc.).
I am curious what the deciding factor for the number of hidden layers is. I completed the ML Specialization recently, but did not find an answer to this in any of its courses.
Welcome to the community!
Here you should find a similar thread where the question is answered: Dimensioning a neural network - #2 by AbdElRhaman_Fakhry
In summary:
There is no right or wrong answer, but you can think about whether the dimensional space spanned by the neurons is sufficient to solve your problem. Ultimately it is your task to design a good architecture, also through trial and error. I have often seen ML engineers increase the feature dimension (i.e. the number of neurons) in the first hidden layer relative to the input layer, which can be interpreted as giving the net more capacity to learn complex and abstract behaviour.
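As a minimal sketch (assuming Keras and the 15-feature loan example above; the layer sizes are illustrative assumptions, not a recommendation), such an architecture with a first hidden layer wider than the input could look like this:

```python
# Minimal sketch: first hidden layer wider than the 15-feature input,
# then tapering down to a single sigmoid output for repay / default.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(15,)),                     # 15 tabular features
    tf.keras.layers.Dense(32, activation="relu"),    # wider than the input layer
    tf.keras.layers.Dense(16, activation="relu"),    # then narrow down again
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```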
For example you can:
- check and quantify whether these features contain mutual information, and remove redundant information (with Principal Component Analysis (PCA) or Partial Least Squares (PLS)) to improve your ratio of data to features;
- also check the feature importance to focus on the most meaningful ones (a short sketch of both checks follows this list)!
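As a rough illustration (a minimal sketch assuming scikit-learn; the random data below is only a placeholder for your real loan features and labels), both checks could look like this:

```python
# (1) PCA: how many components are needed to retain e.g. 95% of the variance?
# (2) A simple tree-based feature-importance ranking.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))              # placeholder for the real feature matrix
y = rng.integers(0, 2, size=500)            # placeholder for repay / default labels

pca = PCA(n_components=0.95).fit(X)
print("components needed for 95% variance:", pca.n_components_)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
print("features ranked by importance:", ranking)
```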
More details on PCA, PLS, importance calculation etc. can be found here: https://github.com/christiansimonis/CRISP-DM-AI-tutorial/blob/master/Classic_ML.ipynb
Please let me know if you have any open questions.
Best regards
Christian
Hello @Devarsh_Mavani
In addition to what @Christian_Simonis has said, DeepLearning.AI has another interesting course that covers hyperparameter tuning and searching for the best model architecture. Check it out: Machine Learning Modeling Pipelines in Production
Thanks for your reply @Christian_Simonis.
I went through the links, and what I understood is that there is no mathematical way of arriving at the number of layers and neurons; it is based entirely on trial and error. If that is the case, then wouldn't it be computationally expensive and time-consuming to build something as powerful as ChatGPT?
@Devarsh_Mavani For such massive models, there is a technique called Neural Architecture Search (NAS),
which can be carried out with tools such as KerasTuner.
KerasTuner is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search. Easily configure your search space with a define-by-run syntax, then leverage one of the available search algorithms to find the best hyperparameter values for your models. KerasTuner comes with Bayesian Optimization, Hyperband, and Random Search algorithms built-in, and is also designed to be easy for researchers to extend in order to experiment with new search algorithms.
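As a rough sketch of how this looks in practice (the layer ranges, unit ranges, trial count, and placeholder data below are illustrative assumptions, not the setup used for models at ChatGPT scale), you can let the tuner search over the number of hidden layers and units:

```python
# Minimal KerasTuner sketch: random search over the number of hidden layers,
# units per layer, and learning rate for a 15-feature binary classifier.
import numpy as np
import tensorflow as tf
import keras_tuner as kt

def build_model(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(15,)))  # e.g. the 15 loan features
    # Let the tuner choose how many hidden layers and how wide each one is.
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(tf.keras.layers.Dense(
            units=hp.Int(f"units_{i}", min_value=16, max_value=128, step=16),
            activation="relu"))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
        loss="binary_crossentropy",
        metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=10,
    directory="tuning",
    project_name="loan_default")

# Placeholder data standing in for your real training set.
x_train = np.random.rand(200, 15).astype("float32")
y_train = np.random.randint(0, 2, size=200).astype("float32")

tuner.search(x_train, y_train, epochs=5, validation_split=0.2)
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hps.values)  # e.g. chosen num_layers, units, learning_rate
```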
You can read more about it in these papers
@rmwkwok might be interested in this as NAS came up recently in a different thread
Thank you @ai_curious!