Okay, I’ll be honest. I’m starting this course late in the week because I just finished the previous course. Fortunately, I already have some familiarity with the material from the previous course, and I’ve heard talks about some of the content in this one. But I’ve never actually implemented a neural network… so I’m finding myself surprised as I think this through.
I only just finished the first practice quiz in week 1.
But I was thinking about it, and I’m not sure I understand something. If all the neurons in the first layer receive the same input vector x, why don’t they end up computing the same weights and the same activation (the probability, or f(z), or whatever)? Does this rely on the neural network using stochastic processes, such as stochastic gradient descent, or something like that? If it were deterministic, the minimum of the cost function should be the same, right? Or is this a way of finding different local minima by starting from different seed values for the parameters in the w vector?
Let’s suppose that $\mathbf{x}$ is the input vector, $\mathbf{W}$ is the matrix of weights, and $\mathbf{b}$ is the bias vector. A single layer of the neural network performs the computation $\mathbf{a} = g(\mathbf{x}^\top \mathbf{W} + \mathbf{b})$. Although the input $\mathbf{x}$ is the same for every neuron in the layer (each column of $\mathbf{W}$ is one neuron), the activations $\mathbf{a}$ differ because each neuron has its own weights and bias. The weights are initialized to random values (the biases are often simply set to zero), which ensures that the neurons in a layer start with different parameters. Without this randomness (e.g., if all weights were initialized to the same value), every neuron in the layer would perform the identical computation during forward propagation and would receive the identical update during backward propagation. That would collapse the layer into a single effective neuron, defeating the purpose of having multiple neurons. UPD: fixed the notation to match the lectures.
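To make that concrete, here is a minimal NumPy sketch (my own illustration, not code from the course) of one dense layer: the same input $\mathbf{x}$ multiplied by randomly initialized columns of $\mathbf{W}$ gives a different activation for each neuron. The sizes and the sigmoid activation are just assumptions for the example.

```python
# Minimal sketch of a = g(x^T W + b): same input, different neurons.
import numpy as np

rng = np.random.default_rng(0)

n_features, n_units = 4, 3                          # sizes chosen just for illustration
x = rng.normal(size=(1, n_features))                # one input example, as a row vector
W = rng.normal(size=(n_features, n_units)) * 0.01   # each column = one neuron's weights
b = np.zeros((1, n_units))                          # biases (often just initialized to zero)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a = sigmoid(x @ W + b)   # one activation per neuron
print(a)                 # three different values, because the columns of W differ
```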
Okay, so I guess this is maybe what was confusing me… I think what you’re saying is that the whole thing is “solved” at once, rather than each neuron being minimized separately with gradient descent before moving “forward” in the network? I don’t think I actually know how forward propagation and back propagation work yet; maybe I should keep watching. But it sounds like there’s some sort of simultaneous process going on with the different values, stepping them all at the same time.
All of the weights in an NN are initialized to small random values. This is called “symmetry breaking”; if it isn’t performed, then every hidden layer unit does in fact learn exactly the same weights.
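To illustrate the symmetry point, here is a toy NumPy sketch (my own, not course code) of one backprop step in a tiny two-layer network: with identical initial weights, every hidden unit receives the same gradient, so the units would stay identical after every update, while random initialization gives each unit a different gradient. The network shape and squared-error loss are just assumptions for the demo.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_gradient(W1, W2, x, y):
    """One backprop step for a tiny 2-layer net with squared-error loss."""
    h = sigmoid(x @ W1)                 # hidden activations, shape (1, n_hidden)
    y_hat = h @ W2                      # linear output, shape (1, 1)
    d_out = y_hat - y                   # dL/dy_hat
    d_h = (d_out @ W2.T) * h * (1 - h)  # backprop through the sigmoid
    return x.T @ d_h                    # dL/dW1, one column per hidden unit

x = np.array([[1.0, 2.0]])
y = np.array([[1.0]])

# Identical initialization: every column (neuron) gets the same gradient.
W1_same = np.full((2, 3), 0.5)
W2_same = np.full((3, 1), 0.5)
print(hidden_gradient(W1_same, W2_same, x, y))   # all 3 columns identical

# Random initialization breaks the symmetry.
rng = np.random.default_rng(1)
W1_rand = rng.normal(size=(2, 3)) * 0.01
W2_rand = rng.normal(size=(3, 1)) * 0.01
print(hidden_gradient(W1_rand, W2_rand, x, y))   # columns differ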
@s-dorsher, please watch the Vectorization section. It shows how to implement neural networks efficiently using matrix and vector operations, so there is no need to minimize each neuron separately.
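As a taste of what that section covers, here is a small sketch (my own, with assumed shapes, not the course implementation) contrasting a per-neuron loop with the single matrix product that computes every neuron in the layer, for all examples, at once.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))         # 5 examples, 4 features
W = rng.normal(size=(4, 3)) * 0.01  # 3 neurons in the layer (one per column)
b = np.zeros((1, 3))

# Loop version: compute each neuron's activation separately.
A_loop = np.zeros((5, 3))
for j in range(3):
    A_loop[:, j] = sigmoid(X @ W[:, j] + b[0, j])

# Vectorized version: the whole layer, for all examples, in one expression.
A_vec = sigmoid(X @ W + b)

print(np.allclose(A_loop, A_vec))   # True: same result, no per-neuron loop
```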