Hi,
After having seen this optional lab I am a bit lost.
When we saw logistic regression during the previous weeks, I thought I understood the usefulness of the sigmoid function as a tool to determine the probability of an input being a 1 or not.
During this week, new functions such as the ReLU function have been explained to us. In this lab, this function is presented as a tool that allows “models to stitch together linear segments to model complex non-linear functions”.
We use a layer with three units to explain how it works.
I have several questions:
If x is a vector, why is there only one multiplying component?
In previous labs we had seen that each of the units provides a component to the activation vector of the next layer.
In this lab it seems that the operation is different. Are we only seeing the first component of the activation vector?
Why is there a 0 at the end of the sum of each unit?
I still have many more questions, but I don’t want to overload this topic.
Could someone please help me?
Hi @Thrasso00, great questions! I will try to answer them as best I can.
First question:
I think it’s a matter of notation. If you want to represent the third element at layer 5 you could use a^{[5]}_{3}; in this case the notation is representing the vector x as the only element of the input layer.
Second question:
I didn’t understand the second question, do you mind asking again?
Third question:
The output layer calculates the sum of the weights and biases of the previous layer plus a constant value or intercept. So, in this case, the constant value will be 0.
However, in the following image it seems that the approach is different.

Third question
Why is the constant value 0 in this case?

Fourth question
In this lab we are the ones who decide the slopes and intercepts. We also decide at which point each section starts and ends. But how is the algorithm able to find these values automatically?
Q1. A sample’s x is always a vector, and it can be a vector of one feature, which makes it look like a scalar. The bottom-right equation is the general form for a sample with n features. The upper-right equation is the simple case of having only one feature.
Q2. The upper screenshot is for the general case of n features, while the lower one is for the specific case of 1 feature.
Q3. Actually, it isn’t just that the bias is 0; all the weights are set to 1 as well. The slide also said “fixed weights”, so they are fixed to 1’s and 0.
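To make Q3 concrete, here is a minimal sketch (my own toy numbers, not the lab’s exact code) of three ReLU units whose outputs are summed by an output layer with weights fixed to 1 and bias fixed to 0; the result is a piecewise-linear curve:

```python
import numpy as np

def relu(z):
    # ReLU clips negative values to 0, which is what lets each unit
    # "switch on" only after a chosen point on the x-axis.
    return np.maximum(0, z)

# Hand-picked (hypothetical) weights and biases for a layer of 3 ReLU units.
# Each unit contributes one linear segment that starts where w*x + b crosses 0.
w1 = np.array([1.0, 2.0, 3.0])    # slopes of the three segments
b1 = np.array([0.0, -2.0, -6.0])  # shifts where each segment begins

x = np.linspace(0, 4, 9)          # a few sample inputs

# Hidden layer: each unit computes relu(w*x + b)
a1 = relu(np.outer(x, w1) + b1)   # shape (9, 3), one column per unit

# Output layer with *fixed* weights of 1 and bias 0: it just sums the units.
y = a1 @ np.ones(3) + 0.0

print(np.round(y, 2))  # a piecewise-linear curve whose slope increases at x=1 and x=2
```

Each unit only contributes after the point where its w*x + b crosses 0, which is how the linear segments get “stitched together”.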
Q4. By gradient descent. Gradient descent is an optimization algorithm (optimizing for the lowest cost) that updates the weights and biases bit by bit towards lower cost. At the lowest cost, the slopes and intercepts should be the same as those you decided.
Yes, at each update, each weight takes a gradient descent step on the objective to get to a smaller cost.
Let me first ask you a question. In linear regression, gradient descent updates each weight with this formula:
w := w - \alpha\frac{\partial{J}}{\partial{w}}
Linear regression is identical to a neural network in the following config:
only one layer, in which there is only one neuron
use linear as activation
use squared loss as the loss function
Therefore anything about gradient descent that you have learnt in C1 can apply to the above NN. My question is: I think the above formula is sufficient to explain how the weights (in the neuron) are updated by gradient descent, do you agree?
My question is important because this is how we can transition to a broader world with existing knowledge.
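To make the correspondence concrete, here is a minimal sketch (my own toy example with made-up numbers, not course code) of that one-neuron linear-regression “network” being trained with exactly the update above:

```python
import numpy as np

# Tiny made-up dataset that follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

w, b = 0.0, 0.0   # the single neuron's weight and bias
alpha = 0.05      # learning rate
m = x.shape[0]

for _ in range(2000):
    y_hat = w * x + b                          # linear activation: forward pass
    # Squared-error cost J = (1/(2m)) * sum((y_hat - y)^2), as in C1
    dJ_dw = (1 / m) * np.sum((y_hat - y) * x)  # partial J / partial w
    dJ_db = (1 / m) * np.sum(y_hat - y)        # partial J / partial b
    w = w - alpha * dJ_dw                      # w := w - alpha * dJ/dw
    b = b - alpha * dJ_db                      # same update rule for the bias

print(w, b)   # should end up close to the true slope 2 and intercept 1
```

A bigger network just repeats this same update for every weight, with the partial derivatives supplied by backpropagation.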
In course one there were optional labs that showed step by step what the code did.
In the week one lab C2_W1_Lab02_CoffeeRoasting_TF you can see that there are three phases in a neural network: model building, model compile/fit, and predictions. I don’t think we have seen what exactly happens in the “fit” phase.
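For reference, this is roughly what I mean by the three phases (a rough sketch from memory, with placeholder layer sizes and data, not the exact lab code):

```python
import numpy as np
import tensorflow as tf

# Placeholder data just so the sketch runs; the real lab uses coffee-roasting data.
X_train = np.array([[0.2, 0.3], [0.9, 0.7], [0.1, 0.8], [0.8, 0.2]])
y_train = np.array([0.0, 1.0, 0.0, 1.0])

# Phase 1: model building - describe the architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation='sigmoid'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Phase 2: compile and fit - gradient descent happens inside .fit(),
# which is the part we never see step by step.
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
model.fit(X_train, y_train, epochs=10)

# Phase 3: predictions on new inputs.
print(model.predict(np.array([[0.5, 0.5]])))
```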
Maybe it is not necessary, or it is impossible, but showing an example with an input matrix of several rows and columns and layers with a few units, so that we can see step by step how gradient descent is applied, would help me understand it better. Maybe I want to go too far, or maybe I have not understood well what has already been explained.
There are 3 new videos and 2 new optional labs on backpropagation now available in the MLS. Backpropagation is the technique for calculating how each weight in a neural network is updated. Please check out course 2 week 2 for the new materials, or you may click this link to go to the first video.
There is also this video from Andrew which covers both forward and backward propagation.
After you watch the videos, if you want to set up a simple NN and a simple dataset and write down step-by-step how forward and backward propagations are carried out, I can help check your maths.
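To give a flavour of what that write-up could look like, here is a minimal sketch (a toy single-neuron network with a sigmoid activation and squared loss, chosen just for illustration, not the course’s exact setup) where forward and backward propagation are written out step by step:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One training example and one sigmoid neuron, so every step fits on a line.
x, y = 2.0, 1.0
w, b = 0.1, 0.0
alpha = 0.5

# Forward propagation: compute each intermediate value left to right.
z = w * x + b              # linear part
a = sigmoid(z)             # activation (the prediction)
J = 0.5 * (a - y) ** 2     # squared loss

# Backward propagation: chain rule, right to left.
dJ_da = a - y              # dJ/da
da_dz = a * (1 - a)        # derivative of the sigmoid
dJ_dz = dJ_da * da_dz      # dJ/dz
dJ_dw = dJ_dz * x          # dJ/dw
dJ_db = dJ_dz              # dJ/db

# Gradient descent update, same formula as before.
w = w - alpha * dJ_dw
b = b - alpha * dJ_db
print(J, w, b)
```

A real network repeats this pattern layer by layer, but the chain-rule steps are the same idea.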