Hello,
If you define a layer with 25 neurons, does each neuron have different values for the w vector and b?
I mean, in Course 1 you learn how to determine the best w vector and b from the features (2 features, so 2 elements in the vector w), and one neuron is enough for that. But in a network model you could have 25 or more neurons, and all of them receive the same input. Is there a relationship among them?
How does the network model work under the hood to calculate the loss and cost function?
How do you decide how many neurons are required in each layer?
Thanks.
Gus
@gmazzaglia My understanding is: yes, each neuron has its own weight vector (w) and its own bias term (b). For simplicity in calculation, though, these are stacked per layer: the weight vectors become the rows of a matrix W, and the biases become a vector b.
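Here is a minimal NumPy sketch of that idea (not from the course materials; the shapes and names are illustrative): a dense layer with 25 neurons and 2 input features, where every neuron sees the same input x but applies its own row of W and entry of b.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons = 2, 25

W = rng.standard_normal((n_neurons, n_features))  # 25 weight vectors, stacked as rows
b = np.zeros(n_neurons)                           # one bias per neuron

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.5, -0.3])   # one example with 2 features (the same input goes to all neurons)
a = sigmoid(W @ x + b)      # one activation per neuron
print(a.shape)              # (25,)
```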
As to optimization (i.e. the updates) and the loss, well, that is a little more complicated to just 'gloss over'. But traditionally you work out the gradient of the cost function with respect to each layer's parameters, propagating it backward from the output layer; this is what is called 'back propagation', and it lets you slowly move the weights (and thus the value of the cost function) toward a minimum.
There are actually a number of ways to do the updates (including momentum, RMSProp, Adam, etc.), plus the broader choice of which cost function to use.
But the one you have probably heard of is gradient descent, or specifically stochastic gradient descent.
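To make the update rule concrete, here is a toy sketch of plain gradient descent for a single sigmoid neuron with the binary cross-entropy cost (my own illustration, not course code); back propagation applies the same chain-rule gradients layer by layer in a deeper network.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))           # 100 examples, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # synthetic labels

w, b, alpha = np.zeros(2), 0.0, 0.1         # parameters and learning rate
for _ in range(500):
    a = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # forward pass: predictions
    dz = a - y                              # dJ/dz for sigmoid + cross-entropy
    dw = X.T @ dz / len(y)                  # gradient w.r.t. the weights
    db = dz.mean()                          # gradient w.r.t. the bias
    w -= alpha * dw                         # gradient descent step
    b -= alpha * db
```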
To explain all this in detail would be pretty hard in one post, but rest assured it is all covered in great detail in the Deep Learning Specialization.
As to the size (the number of neurons in a layer), I think this is a bit of an art, but it also really depends on what type of architecture you are dealing with: is it dense (i.e. fully connected), a convolutional net, etc.?
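For the dense case, here is a quick Keras sketch of what choosing layer sizes looks like in code (the widths 25/15/1 are just illustrative choices, usually tuned by experiment rather than derived from a formula):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                      # 2 input features, as in the question
    tf.keras.layers.Dense(25, activation="relu"),    # hidden layer with 25 neurons
    tf.keras.layers.Dense(15, activation="relu"),    # a second, narrower hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # single output for a binary task
])
model.summary()
```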
This is all covered in the DLS.
Hi @Nevermnd, thanks for the explanation. I can't wait for that course, but well, step by step.
Thanks.
Regards.
Gus