Hello there,
I am currently doing the second course (Advanced Learning Algorithms) of the Machine Learning Specialization on Coursera. I am wondering why Andrew didn't explain how hidden layers get trained. I know that we need a set of inputs and their correct outputs to train a typical machine learning model, but how do the weights of hidden layers get calculated without having a set of inputs and outputs for each neuron separately?
Another question: how is the number of neurons in each layer decided? And how does each neuron learn a different parameter, given that all neurons receive exactly the same input and have the same activation function and cost function?
Sorry if these questions are a bit silly; I have just started my path in AI.
Not silly questions at all! These are all key issues for understanding how things work here. There are several layers to the answer:
The number of neurons in each hidden layer and how many hidden layers you have is a design choice that you need to make. I'm from DLS and haven't seen how he presents this in MLS yet, but in DLS he treats this as a more advanced topic (how to make that choice intelligently) and covers it in Course 2 of DLS. The basic idea is that you try different architectures based on your previous experience, or on what others have shown works on similar problems, and then you run some experiments to fine tune the architecture based on how it performs. Design choices like this are what Prof Ng calls "hyperparameters". The "parameters" are values you can learn through training, but the "hyperparameters" you must choose yourself and then figure out whether you made good choices.
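To make that concrete, here's a minimal sketch of treating the hidden layer sizes as hyperparameters. It uses TensorFlow/Keras (which MLS uses), and the data names `X_train`, `y_train`, `X_val`, `y_val` are just placeholders I've assumed; the point is only that you build the same kind of model a few times with different widths and compare validation performance.

```python
import tensorflow as tf

def build_model(hidden_units):
    # hidden_units is a list like [25, 15]; each entry is a design choice
    # (a "hyperparameter"), not something learned by training.
    layers = [tf.keras.layers.Dense(n, activation="relu") for n in hidden_units]
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))  # output layer
    return tf.keras.Sequential(layers)

# Try a few candidate architectures and keep the one with the best
# validation accuracy (X_train, y_train, X_val, y_val are assumed to exist).
for hidden_units in ([10], [25, 15], [64, 32, 16]):
    model = build_model(hidden_units)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(X_train, y_train, epochs=20,
                        validation_data=(X_val, y_val), verbose=0)
    print(hidden_units, history.history["val_accuracy"][-1])
```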
Then the question of how the neurons get trained when they all get the same inputs is also interesting. Every neuron at a given layer always gets all the outputs from the previous layer as inputs. But what we have to do is start with different random values for each of the weights on each of the neurons so that we “break symmetry” and each neuron can learn a unique behavior. Here’s a thread in DLS that explains Symmetry Breaking. The key first step before we start any training is “Random Weight Initialization” precisely so that we get different behavior from each neuron.
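Here's a small NumPy sketch (just an illustration, not the course's code) of why random initialization matters: if every neuron in a layer starts with identical weights, they all compute the same output and will receive the same gradient, so they can never learn different features.

```python
import numpy as np

n_in, n_hidden = 3, 4

# Symmetric initialization: every hidden neuron starts identical,
# so they all compute the same output and get the same gradient.
W_symmetric = np.zeros((n_hidden, n_in))

# Random initialization ("breaking symmetry"): each neuron starts with
# its own small random weights, so each one can learn a distinct behavior.
rng = np.random.default_rng(0)
W_random = rng.normal(0, 0.01, size=(n_hidden, n_in))

x = np.array([1.0, 2.0, 3.0])
print(W_symmetric @ x)  # all 4 hidden units produce the identical value
print(W_random @ x)     # each hidden unit produces a different value
```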
The actual learning in the hidden layers all happens through “back propagation”. Just as forward propagation takes all inputs through each neuron in each layer going forward, the gradients propagate backwards through the layers to “push” the weights at each neuron in the direction of a better result. The hidden layers all get touched by that process.
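To show the idea, here's a minimal NumPy sketch of forward and back propagation for a tiny network with one hidden layer, sigmoid activations, and a squared-error loss. This is only an illustration of how the output error gets propagated back so the hidden layer's weights are updated too; it is not how TensorFlow implements it under the hood, and the data here is made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2))           # 4 examples, 2 input features
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0, 0.5, size=(2, 3))  # hidden layer: 3 neurons (randomly initialized)
b1 = np.zeros((1, 3))
W2 = rng.normal(0, 0.5, size=(3, 1))  # output layer: 1 neuron
b2 = np.zeros((1, 1))
lr = 0.5

for step in range(1000):
    # Forward propagation: inputs flow through the hidden layer to the output.
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)

    # Back propagation: the error at the output is propagated backwards,
    # giving gradients for the hidden layer's weights as well.
    dZ2 = (A2 - y) * A2 * (1 - A2)         # gradient at the output layer
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0, keepdims=True)
    dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)     # gradient pushed back to the hidden layer
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)

    # Gradient descent update for every layer, hidden layer included.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final loss:", 0.5 * ((A2 - y) ** 2).mean())
```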
Hope that gives some background. But even if all that doesn't totally crystallize based on what Prof Ng has said so far in MLS, just "hold that thought" and I'm sure things will become clearer as you go through the lectures and listen to what he has to say in the rest of the course.