In the labs, the w’s and b’s are generated automatically once the model, with its layers and units, is created. I would like to know how the model knows how to create these w’s and b’s without knowing what the layers and their units are for.
Hi @William_Chau ,
Welcome to the community! This is your first post
Regarding W and b, these are initialized at the beginning. W has to be initialized randomly, and there are several methods for doing so. ‘b’ can be initialized randomly as well, or simply with zeros.
The shape of W and b depends on the number of units (neurons) in the layer and the number of units in the previous layer. In the case of the 1st layer, the shape comes from the number of units of that layer and the number of features of the input.
Once the training starts, the W and b of each layer are updated by the back propagation process.
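If it helps to see this in code, here is a minimal sketch assuming TensorFlow/Keras as in the course labs; the layer sizes and the data are just placeholders:

```python
import numpy as np
import tensorflow as tf

# A tiny model: 4 input features -> 3 units -> 1 unit.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Keras has already created and initialized W and b for every layer:
# the kernel (W) is random by default, the bias (b) starts at zeros.
for layer in model.layers:
    if layer.weights:
        W, b = layer.get_weights()
        print(layer.name, 'W:', W.shape, 'b:', b.shape)
# Expected: W (4, 3), b (3,) for layer 1 and W (3, 1), b (1,) for layer 2.

# Training then updates these same arrays through back propagation.
X = np.random.rand(16, 4)             # placeholder data, just to run fit()
y = np.random.randint(0, 2, (16, 1))
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=2, verbose=0)
```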
I hope this sheds light on your question!
Juan
How do we know how many layers and how many units per layer to create? Is there a way to specify what unit represents what? For example, can I ask unit 1 in layer 1 to learn whether an animal has a tail or not, and so on? I thought that what each unit in each layer does is specified by human input. No? But I can’t find that in the example code.
You, as the architect of your NN, will define how many layers, and how many units within each layer. As you advance in your course you’ll learn more about this.
You’ll also learn that every unit receives every input from the previous layer, and it is up to the model to learn what to represent on each unit of each layer. This is also something you’ll shortly learn.
I’d say that your questions are headed in the right direction as that’s exactly what’s coming in your next weeks, so stay tuned!
Juan
Thank you for your quick responses, Juan!
Hi Juan,
I just finished week 4 of Advanced Learning Algorithms, but I still haven’t had my questions answered. Can you point me to the sections of the course that address them?
Thanks in advance!
–William
I’m not sure which questions are unanswered; I think all of them were addressed. Can you check the thread again and share your pending questions?
You said that:
I’d say that your questions are headed in the right direction as that’s exactly what’s coming in your next weeks, so stay tuned!
Can you point out which week I can find that in?
Ahhh ok. If you finished all of MLS, then you might want to take the next level, which is the Deep Learning Specialization. I may have been referring to that; sorry for the confusion.
Hello @William_Chau
Once the layers and the number of neurons in each layer have been decided, we leave it to the learning algorithm to find the w and b of each of the neurons such that the cost function J is minimized.
When we decided on the number of layers and the number of neurons in each layer, we did not do it with the knowledge that a specific neuron would perform a specific function and assume the role of a specific piece in the overall puzzle. This is also why the neural network is sometimes referred to as a black box. Except for helping the learning algorithm with a few techniques like the Adam optimizer, Dropout, etc., the rest is controlled by the math, with the sole goal of achieving the convergence condition, and we remain merely spectators to the whole process.
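To make this concrete, here is a minimal sketch of that division of labour, assuming Keras as in the labs (the layer sizes and hyperparameters are arbitrary placeholders): we only pick the topology and a few helpers like Adam and Dropout, and the optimizer decides what every W and b ends up being.

```python
import tensorflow as tf

# We, the architects, only choose the topology and a few "helping" techniques.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dropout(0.2),                # one of the techniques mentioned above
    tf.keras.layers.Dense(15, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Adam then drives every W and b toward whatever minimizes the cost J
# (here binary cross-entropy); which neuron learns which feature is not up to us.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss='binary_crossentropy')
# model.fit(X_train, y_train, epochs=100)        # X_train, y_train: your own data
```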
Hi @shanup,
Given that a neural network is a black box, how do we know how many layers and how many units in each layer we need? The Advanced Learning Algorithms course does not cover that.
So the truth is that we do not know how many layers and units in each layer are needed. This choice of the number of layers and units is arbitrary.
We could look at other models for inspiration and take an educated guess. Other than that, we really do not know much more.
We can try out several models, with different choices of the layers and neurons and then decide on the smallest topology that best meets the convergence conditions.
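Here is a minimal sketch of that trial-and-error loop, assuming Keras; the candidate topologies and the synthetic stand-in data are purely illustrative:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data so the sketch runs end to end; use your own train/validation split.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 10)), rng.integers(0, 2, (200, 1))
X_val, y_val = rng.random((50, 10)), rng.integers(0, 2, (50, 1))

def build_model(hidden_units, n_features):
    """One Dense hidden layer per entry in hidden_units, plus a sigmoid output."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(n_features,))])
    for u in hidden_units:
        model.add(tf.keras.layers.Dense(u, activation='relu'))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model

# Candidate topologies, smallest first (the list itself is just an educated guess).
for hidden_units in [[8], [16, 8], [32, 16, 8]]:
    model = build_model(hidden_units, X_train.shape[1])
    model.fit(X_train, y_train, epochs=20, verbose=0)
    val_loss = model.evaluate(X_val, y_val, verbose=0)
    print(hidden_units, val_loss)
# Then keep the smallest topology whose validation loss is close to the best one.
```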
Yes, in the beginning that would be our reaction
But over time we realise that it’s not so bad after all.
Does the Deep Learning Specialization cover the general guidelines for setting the number of layers and the number of units?
Yes, DLS covers all those issues. You’ll want to take at least C1 and C2 of the series to get all that information. Course 1 gives you an in-depth introduction to fully connected feed forward Neural Networks, including how back propagation works. Course 1 will show you some examples of networks that can solve some types of image recognition problems. Then Course 2 will go into more detail about how to assess the performance of a network and how to systematically approach solving problems when your network does not perform as well as you want, which includes how to tune network architectures and deal with issues like regularization and the selection of hyperparameters in general.
Hello @William_Chau,
I do have my 2 cents regarding your question. I suppose not many people will like it when I say, “you may keep adding more layers or neurons until you achieve the best performance”, because it does not seem like a direct answer. Instead, it seems like I am asking you to try it yourself.
However, if you are willing to try, then I would suggest you have in your left hand “how others set up their neural network for a problem”, which will require you to read a lot.
In your right hand, you need to know “how to adjust your or others’ neural networks to make them better and better”. This is one of the many things that, as Paul told you, the DLS can give you. If you want a glimpse of that, “hyperparameter tuning” is the keyword to start searching with.
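If it helps, here is a minimal sketch of what a first, very manual round of hyperparameter tuning can look like, assuming Keras; the grid of learning rates and L2 strengths is arbitrary, and the data is a synthetic stand-in:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data; swap in your own training/validation split.
rng = np.random.default_rng(1)
X_tr, y_tr = rng.random((200, 10)), rng.integers(0, 2, (200, 1))
X_va, y_va = rng.random((50, 10)), rng.integers(0, 2, (50, 1))

best = (None, float('inf'))
# Two hyperparameters tuned on a tiny, arbitrary grid: learning rate and L2 strength.
for lr in [1e-2, 1e-3]:
    for l2 in [0.0, 0.01]:
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(10,)),
            tf.keras.layers.Dense(16, activation='relu',
                                  kernel_regularizer=tf.keras.regularizers.l2(l2)),
            tf.keras.layers.Dense(1, activation='sigmoid'),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss='binary_crossentropy')
        model.fit(X_tr, y_tr, epochs=20, verbose=0)
        val_loss = model.evaluate(X_va, y_va, verbose=0)
        if val_loss < best[1]:
            best = ((lr, l2), val_loss)
print('best (learning_rate, l2):', best)
```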
You will hear a lot about fighting overfitting and underfitting in DLS C2 when hyperparameter tuning comes, and actually you might have already heard of it in MLS C2 W3. I suggest you look at this “fighting” in two ways. First, it is fixing an overfitting (or underfitting) problem that you observe; second, introducing “overfitting” and then suppressing it opens up some new possibilities. Here is what Andrew said in his weekly “The Batch”:
When supervised deep learning was at an earlier stage of development, experienced hyperparameter tuners could get much better results than less-experienced ones. We had to pick the neural network architecture, regularization method, learning rate, schedule for decreasing the learning rate, mini-batch size, momentum, random weight initialization method, and so on. Picking well made a huge difference in the algorithm’s convergence speed and final performance.
Thanks to research progress over the past decade, we now have more robust optimization algorithms like Adam, better neural network architectures, and more systematic guidance for default choices of many other hyperparameters, making it easier to get good results. I suspect that scaling up neural networks — these days, I don’t hesitate to train a 20 million-plus parameter network (like ResNet-50) even if I have only 100 training examples — has also made them more robust. In contrast, if you’re training a 1,000-parameter network on 100 examples, every parameter matters much more, so tuning needs to be done much more carefully.
Was Andrew saying that we need 20 million+ parameters for a problem of only 100 training samples? I don’t think he was. I suggest you read the article in full. The link to it is here.
To conclude, I think that if we want to get to a good model, we need to know all the hyperparameters we can tune, and with the experience in our left hand and the skills in our right hand, we will be better able to answer for ourselves how many layers and neurons we can handle.
Cheers,
Raymond
There isn’t a rule that controls what the number of layers and the number of neurons in each layer should be; it is up to you. For example, if you want better performance and time doesn’t matter to you, you could build many layers, say 100, each with a large number of neurons, but that may lead to overfitting and take much, much more time. Many deep NN architectures, like VGG-16, were arrived at by experiment: how does this model perform on different data? So you decide the architecture of the model you build, but everything has a cost: training time increases if the NN is very big, and it may lead to overfitting. There are rules for how to build an NN, though, such as placing the batch normalization layer between the fully connected (FC) layer and the activation layer, etc. Many rules control how to build an NN, but not what the architecture should be (the number of layers and neurons).
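For example, here is a minimal sketch of that Dense -> BatchNormalization -> Activation ordering, assuming Keras (the layer sizes are placeholders):

```python
import tensorflow as tf

# FC layer -> batch normalization -> activation, as described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(128, use_bias=False),   # FC layer; bias is redundant when BN follows
    tf.keras.layers.BatchNormalization(),         # normalize the pre-activations
    tf.keras.layers.Activation('relu'),           # activation applied afterwards
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()
```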
Thanks!
Abdelrahman