Orchestration of units in a layer: Who tells each unit what to look for?

Hi, in the first lessons of Week 1, Andrew explained the general structure of a neural network (NN) and stated that the features of a hidden layer are developed by the NN itself, based on the given training data.
He gave an example where each of the units in layer 1 looks for a different kind of line or edge (or an attribute like “affordability” or “quality”).
Who orchestrates the units in a layer and tells each unit what to look for?
I assume they do not randomly decide what kind of pattern they “want to” look for, do they?
Because if they did, wouldn’t some of them (e.g. 3) accidentally search for the same pattern and do the same job (e.g. affordability three times, while no one looks for quality)?
Wouldn’t that make some of them (e.g. 2) useless, or weight the activation vector in a misleading way (e.g. suggesting affordability is more important than quality)?
Thanks in advance!

No one tells a layer what to learn.

“Detecting an edge” or any other attribute that Andrew mentioned is just an intuition to help explain the overall method.

In practice, there’s no way to specify what each layer learns. Nor should there be.

As Tom says, nothing controls which layers and which neurons within a given layer learn which attributes to detect. The reason that they don’t all learn the same thing is precisely because we do “Symmetry Breaking”, by randomly initializing all the weights differently. Of course all this is statistical, so it’s still at least technically possible that two neurons could end up learning the same thing, but the probability of that happening is pretty low. Here’s a thread about Symmetry Breaking.
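
Here’s a minimal NumPy sketch of why that matters (a toy network made up for illustration, not code from the course): with all-zero initialization, the two hidden units receive identical gradients on every step and remain clones of each other forever, while random initialization breaks the symmetry so they can learn different things. Biases are omitted to keep it short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (invented for illustration): 4 examples, 3 features, binary
# labels, one hidden layer with 2 units, sigmoid activations everywhere.
X = rng.normal(size=(4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W1, W2, lr=0.1):
    A1 = sigmoid(X @ W1)                  # hidden activations, shape (4, 2)
    A2 = sigmoid(A1 @ W2)                 # output, shape (4, 1)
    dZ2 = A2 - y                          # cross-entropy gradient at output
    dW2 = A1.T @ dZ2
    dZ1 = (dZ2 @ W2.T) * A1 * (1.0 - A1)  # chain rule back into hidden layer
    dW1 = X.T @ dZ1
    return W1 - lr * dW1, W2 - lr * dW2

# Zero initialization: both hidden units get identical gradients every step,
# so the two columns of W1 stay identical -- the units are clones.
W1, W2 = np.zeros((3, 2)), np.zeros((2, 1))
for _ in range(100):
    W1, W2 = train_step(W1, W2)
print("zero init:\n", W1)    # both columns identical

# Random initialization breaks the symmetry from step one.
W1 = rng.normal(scale=0.1, size=(3, 2))
W2 = rng.normal(scale=0.1, size=(2, 1))
for _ in range(100):
    W1, W2 = train_step(W1, W2)
print("random init:\n", W1)  # columns diverge: units learn different features
```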

Speaking of low probability events, it is technically possible that 5 seconds from now, through Brownian Motion, all the molecules in the atmosphere of the room you’re in could be concentrated in one cubic centimeter up in one of the corners of the ceiling and that you would instantly be in a vacuum and your lungs would explode. The Laws of Physics do not prevent that from happening, but should you be worried about it? The probability that it could happen is literally non-zero, but it’s so close to zero that it’s not worth worrying about.


If you had to choose the conductor 🙂 then it would be the loss function. The way this conductor orchestrates the units is by backpropagation.
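
To put that in the notation from the lectures (where $J$ is the cost function and $\alpha$ the learning rate), the only “instruction” any unit ever receives is the gradient of the overall cost with respect to its own parameters:

$$
w := w - \alpha \frac{\partial J}{\partial w}, \qquad b := b - \alpha \frac{\partial J}{\partial b}
$$

applied to every weight and bias in the network on every step of gradient descent.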


Great, thanks for the info about Symmetry Breaking, I wasn’t aware of that.
Interesting that different starting points end up in different cost-optimization results; it sounds to me like each unit finds something like a different local minimum.

It’s interesting to think about the implications, but remember that the loss is not measured on a “per neuron” basis. They are using the housing price example here, but over in DLS Course 1 we deal with the case where the inputs are 64 x 64 RGB images and the network is trying to identify whether or not there is a cat in the picture. So the loss is measured only on the final distilled answer of “yes, it’s a cat” or “no, there is no cat” that we get as the output of the final layer of the network, compared to the “label” values that tell us the correct answer. How the loss is affected by the behavior of individual neurons is exactly what Arvydas described: back propagation of the gradients of that final loss backwards through all the layers.
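
A quick numerical check of that point, using the same made-up toy network shape as in the sketch above: the loss function only ever looks at the final output, yet it still has a perfectly well-defined gradient with respect to any individual hidden-layer weight, which is what backpropagation computes efficiently via the chain rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same toy shapes as above: 3 inputs -> 2 hidden units -> 1 output.
X = rng.normal(size=(4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])
W1 = rng.normal(scale=0.1, size=(3, 2))
W2 = rng.normal(scale=0.1, size=(2, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W1, W2):
    # Binary cross-entropy, measured ONLY on the network's final output.
    A2 = sigmoid(sigmoid(X @ W1) @ W2)
    return -np.mean(y * np.log(A2) + (1.0 - y) * np.log(1.0 - A2))

# Numerically estimate d(loss)/d(W1[0,0]): a hidden-layer weight the loss
# never "sees" directly still influences it through the chain rule.
eps = 1e-6
W1_plus = W1.copy()
W1_plus[0, 0] += eps
grad = (loss(W1_plus, W2) - loss(W1, W2)) / eps
print("d(loss)/d(W1[0,0]) ~", grad)
```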


Hi Paul,
thanks for clarifying!
Backpropagation and the cost function of an NN weren’t covered in Week 1, so I guess I just need to be more patient.

Hi, Bernhard.

Sorry, I haven’t taken MLS yet, so I wasn’t aware of the order in which they cover the material, but you were asking more advanced questions. As you say, just “stay tuned” and Prof Ng will have much more to say about all this. And then DLS will cover it with yet more depth (pun intended) if you take that after MLS.

Regards,
Paul
