Hi everyone,
So I am a little bit confused about how the calculations within the layers work. First of all, when we pass our matrix X into the first layer and it gets distributed to each unit, how come we are getting different parameters (i.e. w’s and b’s) if it’s the same dataset that all three units share? Secondly, how are the parameters w and b being calculated if we just have X’s and the targets are not present? I thought the model needed the Y’s to do its computation. Would really appreciate some help with understanding this. Thanks so much.
The trick is in how the gradients of the w matrices are learned. The method is “backpropagation”, and it works from the output (where we have the ‘y’ labels), backward through the hidden layer (where we don’t have labels).
The process is complicated and isn’t covered in this course, but you can find explanations online quite easily. The video series from “3Blue1Brown” on YouTube is quite good.
Also if your drawing is of a neural network, you’re missing the depiction of the weight matrices, and of the hidden layer.
There is a weight matrix that connects each pair of adjacent layers.
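For instance, here is a minimal NumPy sketch (the layer sizes are just made up for illustration) with one weight matrix and bias vector per pair of adjacent layers:

```python
import numpy as np

# Made-up sizes: 4 inputs, a hidden layer of 3 units, and 1 output unit
n_x, n_h, n_y = 4, 3, 1

# One weight matrix (and bias) connects each pair of adjacent layers
W1 = np.random.randn(n_h, n_x) * 0.01   # input layer -> hidden layer
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01   # hidden layer -> output layer
b2 = np.zeros((n_y, 1))

print(W1.shape, W2.shape)               # (3, 4) (1, 3)
```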
Okay. I’ll check out the YouTube series. Thanks so much :). For the drawing, aren’t the weight matrices the “a[1]” in the screenshot, which shows the output from layer one?
This drawing is backwards from the normal presentation. Here the weights are the rectangles, and the layers are the arrows. Usually it’s shown the other way around (the units are the boxes and the weights are the arrows).
In addition to what @TMosh clearly explained, I’d like to add a couple of thoughts regarding your questions:
Regarding your first question: “how come we are getting different parameters (i.e. w’s and b’s) if it’s the same dataset that all three units share?”
You’ll learn that, at each node, you compute the following operation: W.T*X + b, which is a linear function, and then you apply an activation function, like the sigmoid.
From this linear equation we have that, as you very well said, X goes to all units of the layer, so they all receive the same values of X; the difference is in W and b. Both W and b are initialized with random values, and then, as the NN is trained, they are updated by the ‘backward propagation’ or ‘backprop’ process, which you will soon learn about. Backprop is a series of calculations that runs from the end of the NN back to the beginning, and at each step it updates the W’s and b’s of each layer. That is why each unit (and each layer) ends up with different W’s and b’s.
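To make that concrete, here is a tiny NumPy sketch (the feature values and layer size are made up) showing that the same X gives different outputs at each unit simply because each unit starts with its own randomly initialized weights and bias:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(1)

X = np.array([[0.5], [1.0], [2.0]])   # one example with 3 features, shape (3, 1)

# A layer of 3 units: row i of W holds the randomly initialized weights of unit i
W = np.random.randn(3, 3) * 0.01
b = np.zeros((3, 1))

Z = W @ X + b        # same X, but three different linear combinations
A = sigmoid(Z)       # three different activations from the same input
print(A)
```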
Regarding your second question: “how are the parameters w and b being calculated if we just have X’s and the targets are not present? I thought the model needed the Y’s to do its computation.”
The W and b parameters are initialized with random values. Then, once the NN is being trained, there are many iterations of forward and backward ‘propagations’. It is in the backward propagation, where the labels Y come into play, that W and b get updated, and this is actually the magic of the NN. This is the process in which the NN learns.
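Here is a minimal sketch of that loop for a single sigmoid unit (the data is made up, and a real NN repeats this for every layer): the forward pass uses the current w and b, the backward pass uses the labels Y to compute gradients, and the update step is where w and b actually change.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)

# Toy data (made up): 4 features, 5 examples, binary labels Y
X = np.random.randn(4, 5)
Y = np.array([[0, 1, 1, 0, 1]])

# Random initialization, as described above
w = np.random.randn(4, 1) * 0.01
b = 0.0
lr = 0.1
m = X.shape[1]

for i in range(1000):
    # Forward propagation: predictions from the current w and b
    A = sigmoid(w.T @ X + b)
    # Backward propagation: the gradients need the labels Y
    dZ = A - Y
    dw = (X @ dZ.T) / m
    db = np.sum(dZ) / m
    # Gradient-descent update: this is where w and b are learned
    w -= lr * dw
    b -= lr * db
```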
So my big hint is: when you get to forward and backward propagation, make sure you understand those two processes perfectly, because that is where most of the magic happens.
Hope this sheds some more light on your questions!
Juan
This makes a lot of sense now. Thanks so much for explaining