I’m in week 1 of the class, where the example of T-shirt demand based on affordability, quality, etc. is used. A simple neural network is created, with a hidden layer accepting a vector of 4 values and outputting a vector of 3 values. This is fed to the output layer.
I am confused: how is it that each neuron in the hidden layer, which accepts the same 4 values, outputs a different value? I get that the output of each neuron is based on W and B, so I am unclear how W and B would be different for each neuron, since each neuron gets the same 4 inputs.
Hi @Michael_McCandless. Welcome to the community.
You posted this in AI discussions. I have moved it to the right course section. The mentors will attend to your question.
There are two aspects of NN implementation that make this possible:
The hidden layer activations always include some non-linear function.
The individual weights that feed into each hidden layer unit are initialized to small random values.
This second point is critical - it’s called “symmetry breaking”, and sets each hidden layer unit on a separate path toward convergence.
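To make the two points above concrete, here is a minimal sketch in plain Python, not taken from the course materials. It assumes a sigmoid activation, made-up input values, and a hypothetical helper named `dense`; each hidden unit has its own row of weights and its own bias. It only shows the forward pass: when all three units start with identical weights they produce identical outputs, and when the weights start as small random values the same 4 inputs produce 3 different outputs.

```python
import math
import random


def sigmoid(z):
    """Non-linear activation applied to each unit's weighted sum."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, W, b):
    """Hypothetical helper: one activation per unit.

    Each row of W holds the weights of one unit, and b holds one bias per unit.
    """
    return [
        sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_j)
        for w, b_j in zip(W, b)
    ]


# Made-up values standing in for the 4 input features.
x = [0.5, 0.8, 0.2, 0.9]
b = [0.0, 0.0, 0.0]

# Case 1: all 3 units share the same weights, so they compute the same thing.
W_same = [[0.1] * 4 for _ in range(3)]
a_same = dense(x, W_same, b)

# Case 2: small random initial weights (seeded so the run is repeatable).
rng = random.Random(0)
W_rand = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(3)]
a_rand = dense(x, W_rand, b)

print("identical weights:", a_same)
print("random weights:   ", a_rand)
```

In the first case the three units are interchangeable, so training would have nothing to tell them apart; the random start in the second case is what lets each unit learn something different.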
The magic is that it actually works extremely well. The theoretical background is complicated and I personally do not worry about it. You can probably find some papers about it online if you want further details.
Thank you. I agree that it feels like “magic” going on with NN, but you clearly answered my question.
Will we encounter, in the advanced algorithms course, any discussion of how to determine the number of hidden layers and the number of units within each hidden layer? I suspect those choices affect the success of the “magic”, so it would be good to know.
It’s based on guided experimentation. I believe it’s covered during MLS.
Essentially, there is a trade-off between getting good-enough results and ending up with a model that is too complex or too difficult to train, given the specific goals of your project.
The “guided experimentation” comes in deciding what “good-enough” is for a specific situation.
Yes, you’re right that choices of that type are important in whether the effort succeeds or not. As Tom says, it is based on guided experimentation, previous experience, and comparison with known solutions to similar problems. I have not taken MLS, so I don’t know how much they discuss such design choices. The topic of how to approach making these design choices in a systematic way is covered in some detail in DLS, particularly in Course 2 and Course 3. A reasonable path would be to complete MLS and then take DLS to go “deeper” (pun fully intended).