I want to ensure I understand the w.T @ X + b equation when doing logistic regression and Neural network calculations.
X sub-1 represents all of the input values for the first training example. Therefore, if I have a sample of shape (64, 64, 3), then X sub-1 will be a column vector of shape (12288, 1).
w sub-1 is initially a column vector of all zeros, and b is initialized to zero as well.
The learning rate is applied to incrementally descend toward the optimum.
With this in mind, we would do the same thing with X sub-2 and so forth.
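To check that I have the shapes right, here is a rough NumPy sketch of what I am describing above (the variable names are just my own):

```python
import numpy as np

# one training example: a (64, 64, 3) image flattened into a (12288, 1) column vector
x1 = np.random.rand(64, 64, 3).reshape(-1, 1)

# weights and bias initialized to zeros, as described above
w = np.zeros((12288, 1))   # column vector of weights
b = 0.0                    # scalar bias

# z = w.T @ x + b, then the sigmoid gives the activation a
z = np.dot(w.T, x1) + b        # shape (1, 1)
a = 1 / (1 + np.exp(-z))       # sigmoid(z); 0.5 here because w and b are all zeros
print(z.shape, a)
```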
How are the hidden layer nodes going to be different if the inputs are X1, X2, and X3, for example? Isn’t each hidden layer node going to come away with the same values, since each node has the same inputs and starts from the same point?
As a final comment, it would be extremely helpful if the lectures weren’t always so abstract. How about doing some demonstrations along the way? This would help drive the points home and give the attendee a better understanding of what is happening under the covers.
That’s not in the actual equation. Let’s say I have the following inputs: X1, X2, X3. Three separate inputs; therefore, I have to calculate Z for each one and then eventually A, which provides y-hat. To keep the results for each output separate, I was using X1 with a subscript. I hope this provides some clarification.
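In code, what I mean is something like this (just my own sketch, with three separate examples each getting its own Z and A):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_features = 12288
w = np.zeros((n_features, 1))   # weights start at zero
b = 0.0

# three separate inputs X1, X2, X3, each a (12288, 1) column vector
X1, X2, X3 = (np.random.rand(n_features, 1) for _ in range(3))

# calculate Z and then A (y-hat) separately for each input
for i, x in enumerate([X1, X2, X3], start=1):
    z = np.dot(w.T, x) + b    # Z for this example
    a = sigmoid(z)            # A, i.e. y-hat, for this example
    print(f"example {i}: y-hat = {a.item():.3f}")
```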
w = the slope or gradient and b = the y-intercept. Are all of these assumptions correct?
Looking at the inputs and the hidden layers, it would appear to me that the nodes in the neural network will all have the same output. Am I missing something?
Let’s go with programming exercise 1. Training examples → 209.
What do you mean by features in each example? How many were there in programming exercise 1? I don’t recall the number of features ever being discussed.
Is that “Logistic Regression with a Neural Network Mindset”?
I ask because your thread has tags for both Week 2 and Week 3, but doesn’t identify which assignment specifically by name. Each week has two assignments.
Does it matter? I’m actually in Week 3 and looking at the neural network representation. It shows X1, X2, X3 all as inputs into each node in the hidden layer. Each node in the hidden layer computes w.T * X + b and then applies the sigmoid function. The outputs of all the nodes in hidden layer 1 eventually go to hidden layer 2 and eventually to the output.
Anyhow, how will the outputs of the nodes in layer 1 differ from node 1 to node 2 and so on? They are all processing the exact same data and features.
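Here is a small sketch of what I am picturing, with three inputs and a 4-node hidden layer; if every node starts from the same zero weights, all four outputs come out identical, which is exactly what confuses me:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, n_h = 3, 4                        # 3 inputs (X1, X2, X3), 4 hidden nodes
x = np.array([[0.5], [0.1], [0.9]])    # one example as a (3, 1) column vector

W1 = np.zeros((n_h, n_x))              # every node starting from the same point
b1 = np.zeros((n_h, 1))

A1 = sigmoid(np.dot(W1, x) + b1)       # shape (4, 1)
print(A1.ravel())                      # all four hidden-node outputs are the same
```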
For the cat / not-cat exercise, what features are being evaluated?
These videos really need to be improved. Just talking in the abstract does not do much to help the student understand the material. You need some examples as you go along.
This is explained in the notebooks in both Week 2 and Week 4. The inputs there are RGB images. Each one has 64 x 64 pixels and each pixel has three color values. So each image is a 64 x 64 x 3 array. Before we can process the images, we must “flatten” or “unroll” them into vectors. If we do that, we end up with vectors with 64 * 64 * 3 = 12288 values, which are the “features”. Each such value is an 8-bit unsigned integer, so it is a number between 0 and 255. Before we use them, we divide the pixel values by 255 to convert them to floating point values between 0 and 1. That turns out to be necessary for our training to work efficiently.
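Here is a minimal sketch of that flatten-and-normalize step, assuming the images are stored as a (m, 64, 64, 3) NumPy array the way the notebooks store them (the variable names here are just for illustration):

```python
import numpy as np

# stand-in training set: m = 209 RGB images of 64 x 64 pixels, 8-bit values 0..255
m = 209
train_set_x_orig = np.random.randint(0, 256, size=(m, 64, 64, 3), dtype=np.uint8)

# flatten each image into a 12288-element column; the result has shape (12288, m)
train_set_x_flatten = train_set_x_orig.reshape(m, -1).T
print(train_set_x_flatten.shape)   # (12288, 209)

# divide by 255 so every feature becomes a float between 0 and 1
train_set_x = train_set_x_flatten / 255.
```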
Here’s a thread which talks about how the flatten operation works.
The point there is that the input of each layer is different. The input of the first layer is the actual input data (e.g. the images in the “cat” or “not cat” case). In the Logistic Regression case, there is only one layer. But as soon as we get to Week 3 and real Neural Networks, the input to layer 2 is the output of layer 1. That’s the whole point of having multiple layers in a Neural Network: each layer has its own weights, takes the outputs of the previous layer as its inputs, learns how to transform them appropriately, and passes the results along to the following layers. So each layer adds another function to the network. Prof Ng does discuss this in the lectures in Week 3 and Week 4 in a number of places.
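For concreteness, here is a rough NumPy sketch of a two-layer forward pass (the layer sizes are made up just to show the shapes, and I am using sigmoid everywhere for simplicity). The key point is that layer 2’s input is A1, the output of layer 1, and each layer has its own W and b:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, n_h, n_y = 12288, 4, 1              # input features, hidden units, output unit
m = 209                                  # number of training examples

X = np.random.rand(n_x, m)               # layer 1's input: the actual data

W1 = np.random.randn(n_h, n_x) * 0.01    # layer 1 has its own weights and bias
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01    # layer 2 has its own, different weights
b2 = np.zeros((n_y, 1))

A1 = sigmoid(np.dot(W1, X) + b1)         # layer 1 output, shape (4, m)
A2 = sigmoid(np.dot(W2, A1) + b2)        # layer 2 takes A1 as input, shape (1, m)
```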