If I understand the image above correctly, I assume that neuron_1 (I will call them units from now on, because I am not 100% sure of the proper terminology) only checks whether the temperature is greater than 175 degrees. Unit 2 checks whether the time is greater than 12 minutes. And Unit 3 checks whether the ratio of time to temperature does not exceed some function. Am I correct in that assumption?
What I am most confused about is how the units organize themselves in this way. How does unit 1 say “hey, I’ll make sure that roasting is not too cold” while the others leave unit 1 to it and do their own thing? What stops all three from doing unit 1’s job at the same time? Did I miss something in the lab or previous lessons?
The units appear to organize themselves because they learn the weights which minimize the cost.
That’s kind of the remarkable thing about neural networks - you don’t have to tell them how to work. You just give them a cost function to optimize, set up a suitable number of units in each layer, and then keep out of the way.
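To make that concrete, here is a minimal sketch of what “set up the units and give them a cost to minimize” looks like in code. This is not the lab’s actual notebook: the data values, layer name, learning rate, and epoch count are all made-up assumptions for illustration, with 3 hidden units to match the figure.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the roasting data: columns are [temperature, duration],
# label 1 = good roast. The numbers are made up for illustration.
X = np.array([[185.0, 12.5], [200.0, 17.0], [160.0, 12.0], [220.0, 15.5]])
y = np.array([[1.0], [0.0], [0.0], [0.0]])
Xn = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize the features

# We only specify the architecture (3 hidden units, as in the figure) and
# the cost to minimize; we never tell any unit what to specialize in.
hidden = tf.keras.layers.Dense(3, activation='sigmoid', name='hidden')
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    hidden,
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
model.fit(Xn, y, epochs=200, verbose=0)

# Each column of this (2, 3) matrix is what one hidden unit learned.
print(hidden.get_weights()[0])
```

If you run this twice without fixing the random seed, the hidden units will generally end up with different weights, and a different unit may take on the “not too cold” role each time - which is exactly the initialization point below.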
Each neuron tends to learn different things, because we initialize them randomly before we start the training. That is necessary for “symmetry breaking”: if we started with all the neurons having equal weights, then they would all learn the same thing. But if we initialize them all differently, then they will take separate optimization paths during gradient descent as it minimizes the cost function. That means you can’t predict what a given neuron will learn: if you do the initialization differently, then a different neuron will (probably) learn the “make sure that roasting is not too cold” thing. What you hope is that the same things get learned overall, but you don’t know which exact neuron will learn a given aspect of the pattern recognition in a given training run starting from a particular random initialization.
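Here is a small NumPy sketch of that symmetry-breaking argument (the network size, data, learning rate, and step count are all made up for illustration). With all-zero initialization the three hidden units receive identical gradients at every step, so they end training as three copies of the same unit; random initialization is what lets them diverge.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, b1, W2, b2, X, y, lr=0.5, steps=200):
    """Plain gradient descent on binary cross-entropy for a 2-3-1 sigmoid net."""
    n = X.shape[0]
    for _ in range(steps):
        # forward pass
        A1 = sigmoid(X @ W1 + b1)          # hidden activations, shape (n, 3)
        A2 = sigmoid(A1 @ W2 + b2)         # output probability, shape (n, 1)
        # backward pass
        dZ2 = A2 - y
        dW2 = A1.T @ dZ2 / n
        db2 = dZ2.mean(axis=0)
        dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
        dW1 = X.T @ dZ1 / n
        db1 = dZ1.mean(axis=0)
        # gradient descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1

# Made-up, already-normalized stand-in for [temperature, duration] inputs
X = np.array([[0.9, 0.3], [0.2, 0.8], [0.7, 0.7], [0.1, 0.2]])
y = np.array([[1.0], [0.0], [1.0], [0.0]])

# Case 1: every hidden unit starts with the same (zero) weights.
W1 = train(np.zeros((2, 3)), np.zeros(3), np.zeros((3, 1)), np.zeros(1), X, y)
print("zero init:\n", W1)    # all 3 columns identical: 3 copies of one unit

# Case 2: random initialization breaks the symmetry.
rng = np.random.default_rng(0)
W1 = train(rng.normal(0.0, 0.5, (2, 3)), np.zeros(3),
           rng.normal(0.0, 0.5, (3, 1)), np.zeros(1), X, y)
print("random init:\n", W1)  # columns differ: the units can specialize
```

The columns of W1 are the hidden units. In the zero-init run they stay identical no matter how long you train, which is why every practical framework initializes weights randomly by default.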
It’s an important question, of course, but there is no easy answer. It requires some experience, and there is no guaranteed way to get a correct answer on your first try. A good place to start is to be aware of other systems that solve similar or related problems. Starting with an architecture that was successful on a problem of similar type and complexity is a good first guess. But then you have to try it and see how it works. Then you may have to tune or adjust depending on the results.
This is not an easy question and Prof Ng will discuss it at a number of points as you go through the courses here and again in the DLS specialization (particularly in DLS Course 2 and Course 3). So maybe the best idea is to “hold that thought” and watch the examples that Prof Ng shows of systems that are able to solve particular problems and listen for the cases in which he discusses how to make design choices like this.