Hi,
After having seen this optional lab I am a bit lost.
When we saw logistic regression during the previous weeks, I thought I understood the usefulness of the sigmoid function as a tool to determine the probability of an input being a 1 or not.
During this week, new functions such as the ReLU function have been explained to us. In this lab, this function is presented as a tool that allows “models to stitch together linear segments to model complex non-linear functions”.
We use a layer with three units to explain how it works.
I have several questions:
If x is a vector, why is there only one multiplying component?
In previous labs we had seen that each of the units provides a component to the activation vector of the next layer.
In this lab it seems that the operation is different. Are we only seeing the first component of the activation vector?
Why is there a 0 at the end of the sum of each unit?
I still have many more questions, but I don’t want to overload this topic.
Could someone please help me?
Hi @Thrasso00, great questions! I will try to answer them as best I can.
First question:
I think it’s a matter of notation. If you want to represent the third element at layer 5 you could use a^{[5]}_{3}; in this case the notation is representing the vector x as the only element of the input layer.
Second question:
I didn’t understand the second question, do you mind asking again?
Third question:
The output layer calculates the sum of the weights and biases of the previous layer plus a constant value or intercept. So, in this case, the constant value will be 0.
However, in the following image it seems that the approach is different.

Third question
Why is the constant value 0 in this case?

Fourth question
In this lab we are the ones who decide the slopes and intercepts. We also decide at which point each section starts and ends. But how is the algorithm able to find these values automatically?
Q1. A sample’s x is always a vector, and it can be a vector of one feature, which makes it look like a scalar. The bottom-right equation is the general form for a sample with n features. The upper-right equation is the simple case of having only one feature.
Q2. The upper screenshot is for the general case of n features, while the lower one is for the specific case of 1 feature.
Q3. Actually, it isn’t just that the bias is 0; all the weights are set to 1 as well. The slide also said “fixed weights”, so they are fixed to 1’s and 0.
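To make Q3 concrete, here is a minimal sketch (my own toy numbers, not the lab’s exact code) of three ReLU units whose outputs are summed by an output layer with weights fixed to 1 and bias fixed to 0; the result is a piecewise-linear curve:

```python
import numpy as np

def relu(z):
    # ReLU clips negative values to 0, which is what lets each unit
    # "switch on" only after a chosen point on the x-axis.
    return np.maximum(0, z)

# Hand-picked (hypothetical) weights and biases for a layer of 3 ReLU units.
# Each unit contributes one linear segment that starts where w*x + b crosses 0.
w1 = np.array([1.0, 2.0, 3.0])    # slopes of the three segments
b1 = np.array([0.0, -2.0, -6.0])  # shifts where each segment begins

x = np.linspace(0, 4, 9)          # a few sample inputs

# Hidden layer: each unit computes relu(w*x + b)
a1 = relu(np.outer(x, w1) + b1)   # shape (9, 3), one column per unit

# Output layer with *fixed* weights of 1 and bias 0: it just sums the units.
y = a1 @ np.ones(3) + 0.0

print(np.round(y, 2))  # a piecewise-linear curve whose slope increases at x=1 and x=2
```

Each unit only contributes after the point where its w*x + b crosses 0, which is how the linear segments get “stitched together”.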
Q4. By gradient descent. Gradient descent is an optimization algorithm (optimizing for the lowest cost) that updates the weights and biases bit by bit towards lower cost. At the lowest cost, the slopes and intercepts should be the same as those you decided.
Yes, at each update, each weight takes a gradient descent step on the objective to get to a smaller cost.
Let me first ask you a question. In linear regression, gradient descent updates each weight with this formula:
w := w - \alpha\frac{\partial{J}}{\partial{w}}
Linear regression is identical to a neural network in the following config:
only one layer, in which there is only one neuron
use linear as activation
use squared loss as the loss function
Therefore anything about gradient descent that you have learnt in C1 can apply to the above NN. My question is: I think the above formula is sufficient to explain how the weights (in the neuron) are updated by gradient descent, do you agree?
My question is important because this is how we can transition to a broader world with existing knowledge.
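To make the correspondence concrete, here is a minimal sketch (my own toy example with made-up numbers, not course code) of that one-neuron linear-regression “network” being trained with exactly the update above:

```python
import numpy as np

# Tiny made-up dataset that follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

w, b = 0.0, 0.0   # the single neuron's weight and bias
alpha = 0.05      # learning rate
m = x.shape[0]

for _ in range(2000):
    y_hat = w * x + b                          # linear activation: forward pass
    # Squared-error cost J = (1/(2m)) * sum((y_hat - y)^2), as in C1
    dJ_dw = (1 / m) * np.sum((y_hat - y) * x)  # partial J / partial w
    dJ_db = (1 / m) * np.sum(y_hat - y)        # partial J / partial b
    w = w - alpha * dJ_dw                      # w := w - alpha * dJ/dw
    b = b - alpha * dJ_db                      # same update rule for the bias

print(w, b)   # should end up close to the true slope 2 and intercept 1
```

A bigger network just repeats this same update for every weight, with the partial derivatives supplied by backpropagation.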
In course one there were optional labs that showed step by step what the code did.
In the week one lab C2_W1_Lab02_CoffeeRoasting_TF you can see that there are three phases in a neural network: model building, model compile/fit, and predictions. I don’t think we have seen what exactly happens in the “fit” phase.
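For reference, this is roughly what I mean by the three phases (a rough sketch from memory, with placeholder layer sizes and data, not the exact lab code):

```python
import numpy as np
import tensorflow as tf

# Placeholder data just so the sketch runs; the real lab uses coffee-roasting data.
X_train = np.array([[0.2, 0.3], [0.9, 0.7], [0.1, 0.8], [0.8, 0.2]])
y_train = np.array([0.0, 1.0, 0.0, 1.0])

# Phase 1: model building - describe the architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation='sigmoid'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Phase 2: compile and fit - gradient descent happens inside .fit(),
# which is the part we never see step by step.
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
model.fit(X_train, y_train, epochs=10)

# Phase 3: predictions on new inputs.
print(model.predict(np.array([[0.5, 0.5]])))
```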
Maybe it is not necessary, or it is impossible, but showing an example with an input matrix of several rows and columns and layers with a few units, so that we can see step by step how gradient descent is applied, would help me understand it better. Maybe I want to go too far, or maybe I have not understood well what has already been explained.
There are 3 new videos and 2 new optional labs on backpropagation now available in the MLS. Backpropagation is the technique for calculating how each weight in a neural network is updated. Please check out course 2 week 2 for the new materials, or you may click this link to go to the first video.
There is also this video from Andrew which covers both forward and backward propagation.
After you watch the videos, if you want to set up a simple NN and a simple dataset and write down step-by-step how forward and backward propagations are carried out, I can help check your maths.
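To give a flavour of what that write-up could look like, here is a minimal sketch (a toy single-neuron network with a sigmoid activation and squared loss, chosen just for illustration, not the course’s exact setup) where forward and backward propagation are written out step by step:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One training example and one sigmoid neuron, so every step fits on a line.
x, y = 2.0, 1.0
w, b = 0.1, 0.0
alpha = 0.5

# Forward propagation: compute each intermediate value left to right.
z = w * x + b              # linear part
a = sigmoid(z)             # activation (the prediction)
J = 0.5 * (a - y) ** 2     # squared loss

# Backward propagation: chain rule, right to left.
dJ_da = a - y              # dJ/da
da_dz = a * (1 - a)        # derivative of the sigmoid
dJ_dz = dJ_da * da_dz      # dJ/dz
dJ_dw = dJ_dz * x          # dJ/dw
dJ_db = dJ_dz              # dJ/db

# Gradient descent update, same formula as before.
w = w - alpha * dJ_dw
b = b - alpha * dJ_db
print(J, w, b)
```

A real network repeats this pattern layer by layer, but the chain-rule steps are the same idea.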