About how to determine the number of layers of the neural network and the number of neurons in each layer

Hi @_AisingioroHao
Welcome to the community!

  • Why the ReLU activation function is used in the hidden layers is discussed in this thread

  • As for why the number of neurons decreases from layer to layer:

Suppose you have 2,000 inputs and 1 output.

Let’s look at some potential architectures.

The hidden layers are:

  1. 2000, 2000, 2000, 2000 (~16 million parameters)

  2. 2000, 1000, 500, 250 (~6.63 million parameters)

  3. 1000, 1000, 1000, 1000 (~5 million parameters)

Obviously network 1 can do anything networks 2 or 3 can do, and more, but at roughly triple the cost per evaluation/backprop. It also requires more data, so the training time is likely more than tripled.
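To make those counts concrete, here’s a minimal sketch (my own illustration, not from the course) that sums the weight matrices between consecutive fully connected layers. Biases are omitted, which is why the figures above are approximate:

```python
def param_count(widths):
    """Number of weights in a dense network with the given layer widths
    (input, hidden layers, output); biases are ignored."""
    return sum(a * b for a, b in zip(widths, widths[1:]))

# 2000 inputs, the hidden layers from the list above, 1 output.
candidates = {
    "Network 1": [2000, 2000, 2000, 2000, 2000, 1],
    "Network 2": [2000, 2000, 1000, 500, 250, 1],
    "Network 3": [2000, 1000, 1000, 1000, 1000, 1],
}

for name, widths in candidates.items():
    print(f"{name}: ~{param_count(widths) / 1e6:.2f} million parameters")
```

Running this prints roughly 16.00, 6.63, and 5.00 million, matching the list above.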

There are two things to consider here:

  1. Is the performance increase in network 1 worth the increased cost?
  2. Networks 2 and 3 have roughly the same number of parameters, but which is more effective?

The answer to both of these will depend on the particular problem, but most of the time, for a starting architecture, Network 2 will be a pretty safe choice.
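If you want to try Network 2 as a starting point, a minimal Keras sketch might look like the following. The ReLU hidden activations and sigmoid output are my assumptions for illustration (e.g. binary classification); only the layer widths come from the list above:

```python
import tensorflow as tf

# "Network 2" above: 2000 inputs narrowing through 2000 -> 1000 -> 500 -> 250
# hidden units to a single output. Swap the output activation to suit your
# problem (e.g. no activation for regression).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2000,)),
    tf.keras.layers.Dense(2000, activation="relu"),
    tf.keras.layers.Dense(1000, activation="relu"),
    tf.keras.layers.Dense(500, activation="relu"),
    tf.keras.layers.Dense(250, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.summary()  # reports ~6.63 million parameters (weights plus biases)
```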

Whether the increased performance is worth the cost is something you’ll have to judge for yourself, but if it is, then perhaps you can get similar performance for even less cost by training a narrowing network that simply starts out larger.

Our ultimate goal is to map 2000 dimensions of data into 1. Intuitively, it seems like doing it smoothly might be better than doing it all at once. At each layer, the network learns new features based on the previous layers. We would like to remove the “noise” in the data while keeping the important information. Narrowing the network forces it to give up some information, while desperately trying to keep as much relevant information as possible.

In practice, narrowing networks seem to work better. In theory, we can see a little bit of why they might, but it’s very hard to say anything conclusive.
