# w.T @ X + b clarification

I want to make sure I understand the w.T @ X + b equation used in logistic regression and neural network calculations.

1. X sub-1 represents all of the input values for the first set of inputs. Therefore, if I have a sample of shape (3, 64, 64), then X sub-1 will be a vector of shape (12288, 1).

2. w sub-1 starts out as a column vector of all zeros, and the same goes for b.

3. The learning rate controls the size of each step as we incrementally descend toward the optimum.
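Items 1 to 3 can be sketched for a single example in NumPy. This is a minimal illustration with made-up data, not the assignment's code: the example x, its label y and the learning rate are hypothetical, and b is a single scalar here because logistic regression has only one output unit.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

n_x = 12288                      # 64 * 64 * 3 flattened pixel values
x = rng.random((n_x, 1))         # one made-up example, shape (12288, 1)
y = 1.0                          # its made-up label

w = np.zeros((n_x, 1))           # column vector of zeros, same shape as x
b = 0.0                          # a single scalar bias
learning_rate = 0.005            # hypothetical value

# Forward pass: w.T @ x + b gives a (1, 1) array
z = w.T @ x + b
a = sigmoid(z)                   # y-hat; exactly 0.5 at the start because z = 0

# Gradients of the cross-entropy loss for this one example
dz = a - y
dw = x * dz                      # broadcasts to shape (12288, 1), same as w
db = dz.item()                   # plain float, same "shape" as b

# One gradient-descent step: move against the gradient, scaled by the learning rate
w = w - learning_rate * dw
b = b - learning_rate * db
```

Repeating the forward pass, gradient and update in a loop is what "incrementally descending" means in practice; the learning rate only scales how far each step moves.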

With this in mind, we would do the same thing with X sub-2 and so forth.

How are the hidden layers going to differ if the inputs are, for example, X1, X2, and X3? Isn't each hidden-layer node going to come away with the same values, since I have the same inputs and start from the same point for each node?
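The concern is valid for one specific case: if every hidden node starts from the same weights, every node computes the same value. A minimal NumPy sketch, with made-up feature values and a hypothetical hidden layer of four nodes whose weights and biases are all initialized to zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.2], [0.7], [0.1]])   # one example with three made-up features x1, x2, x3

n_h = 4                               # hypothetical number of hidden nodes
W1 = np.zeros((n_h, 3))               # row i holds node i's weights (its w.T)
b1 = np.zeros((n_h, 1))               # one bias per node

A1 = sigmoid(W1 @ x + b1)             # shape (4, 1): one activation per node
# With identical (zero) rows in W1, every node outputs sigmoid(0) = 0.5
```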

As a final comment, it would be extremely helpful if the lectures weren't always so abstract. How about doing some demonstrations along the way? That would help drive the points home and give attendees a better understanding of what is happening under the covers.

What do you mean by "sub-1"?

X sub-1 is short for X subscript 1

But there are no subscripts in the (w.T @ X) + b equation.

That's not in the actual equation. Let's say I have the following inputs: X1, X2, X3. These are three separate inputs, so I have to calculate Z for each one and then eventually A, which provides y-hat. I need to keep the results for each output separate, which is why I was writing X with a subscript. I hope this clarifies things.
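If X1, X2 and X3 are three separate examples (the reading in this post), they do not need three separate equations. One common way to organize this is to stack the examples as the columns of one matrix X, so a single w.T @ X + b produces all three z values at once. A sketch with made-up numbers, assuming NumPy and a hypothetical 4 features per example:

```python
import numpy as np

rng = np.random.default_rng(1)
n_x = 4                                   # hypothetical: 4 features per example
x1, x2, x3 = (rng.random((n_x, 1)) for _ in range(3))   # three separate examples

w = rng.random((n_x, 1))                  # made-up weights
b = 0.5                                   # made-up scalar bias

# One example at a time: each z is a (1, 1) array
z_each = [w.T @ x + b for x in (x1, x2, x3)]

# All examples at once: stack them as the columns of X, shape (n_x, 3)
X = np.hstack([x1, x2, x3])
Z = w.T @ X + b                           # shape (1, 3): column i is z for example i

# Both approaches give the same numbers
assert np.allclose(Z, np.hstack(z_each))
```

The same w and b are shared by all examples; only the columns of X change.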

w = the slope or gradient and b = the y-intercept. Are all of these assumptions correct?
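The slope/intercept reading holds literally when there is a single feature. Written out for one example x with n_x features:

$$
z = w^{T}x + b = \sum_{i=1}^{n_x} w_i x_i + b
$$

With one feature ($n_x = 1$) this reduces to $z = w_1 x_1 + b$, a line with slope $w_1$ and intercept $b$. With more features, $w$ holds one coefficient (slope) per feature, and $b$ is still a single offset.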

Looking at the inputs and the hidden layers, it appears to me that every node in the neural network will have the same output. Am I missing something?

Are your X1, X2, and X3 individual features? Or are they separate examples?

Simplifying questions (avoiding notation confusion):

• How many features are in each example?
• How many examples do you have?

Let's go with programming exercise 1. Training examples → 209.

What do you mean by features in each example? How many were there in programming exercise 1? I don't recall the number of features ever being discussed.
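One way to see the difference between features and examples is through the shape of the data matrix. In the convention used in these assignments, each example is one column, so X has shape (number of features, number of examples). A sketch assuming NumPy, with an array of zeros as a hypothetical stand-in for the 209 flattened 64 x 64 x 3 images (it only mirrors the shape, not the real data):

```python
import numpy as np

# Hypothetical stand-in for the flattened training set: one column per example
m = 209                          # number of training examples
n_x = 64 * 64 * 3                # features per example: one per pixel color value
X = np.zeros((n_x, m), dtype=np.float32)

print("features per example:", X.shape[0])   # 12288
print("number of examples:  ", X.shape[1])   # 209
```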

Is that "Logistic Regression with a Neural Network Mindset"?

I ask because your thread has tags for both Week 2 and Week 3 but doesn't name the specific assignment. Each week has two assignments.

Does it matter? I'm actually in Week 3, looking at the neural network representation. It shows X1, X2, X3 all feeding into each node in the hidden layer. Each node in the hidden layer computes w.T @ X + b and then applies the sigmoid function. The outputs of all the nodes in hidden layer 1 then go to hidden layer 2 and eventually to the output.
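The layer-by-layer computation described here can be written with one weight matrix per layer, where row i of the matrix is node i's w.T. A minimal sketch, assuming NumPy, hypothetical layer sizes (3 inputs, 4 and 2 hidden nodes, 1 output), and sigmoid everywhere to match this post (hidden layers often use tanh or ReLU instead). The weights are small random values; the 0.01 scale is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)

n_x, n_h1, n_h2, n_y = 3, 4, 2, 1        # hypothetical layer sizes
x = rng.random((n_x, 1))                 # one made-up example: x1, x2, x3

# Row i of W1 is node i's w.T, so W1 @ x computes w.T @ x for every node at once
W1, b1 = rng.standard_normal((n_h1, n_x)) * 0.01, np.zeros((n_h1, 1))
W2, b2 = rng.standard_normal((n_h2, n_h1)) * 0.01, np.zeros((n_h2, 1))
W3, b3 = rng.standard_normal((n_y, n_h2)) * 0.01, np.zeros((n_y, 1))

A1 = sigmoid(W1 @ x + b1)      # (4, 1): every node in layer 1 sees all of x
A2 = sigmoid(W2 @ A1 + b2)     # (2, 1): layer 2 takes all of layer 1's outputs
y_hat = sigmoid(W3 @ A2 + b3)  # (1, 1): output layer

# Same result as computing each layer-1 node separately with its own w and b
node_by_node = np.array([[sigmoid(W1[i] @ x[:, 0] + b1[i, 0])] for i in range(n_h1)])
assert np.allclose(A1, node_by_node)
```

Because each row of W1 starts with different values, the four layer-1 nodes produce different outputs even though they all see the same x.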

Anyhow, how will the outputs of node 1, node 2, and so on in layer 1 differ from one another? They are all processing exactly the same data and features.
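A sketch of why the starting point matters during training, assuming NumPy, a hypothetical network with 3 inputs, 4 sigmoid hidden nodes and 1 sigmoid output, made-up data, and an arbitrary learning rate of 0.5. If all weights start at zero, every hidden node receives the same gradient at every step, so the nodes never become different. Starting from small random weights (the 0.01 scale is an assumption) is the standard way to break that symmetry.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W1, b1, W2, b2, X, Y, lr=0.5):
    """One forward/backward pass of a 1-hidden-layer net (sigmoid units, cross-entropy loss)."""
    m = X.shape[1]
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)      # sigmoid derivative is A1 * (1 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

rng = np.random.default_rng(3)
X = rng.random((3, 5))                       # 3 features, 5 made-up examples
Y = (rng.random((1, 5)) > 0.5).astype(float) # made-up 0/1 labels

def hidden_outputs(init):
    """Train for 100 steps from the given initializer, then return hidden activations."""
    W1, b1 = init((4, 3)), np.zeros((4, 1))
    W2, b2 = init((1, 4)), np.zeros((1, 1))
    for _ in range(100):
        W1, b1, W2, b2 = train_step(W1, b1, W2, b2, X, Y)
    return sigmoid(W1 @ X + b1)              # (4, 5): one row per hidden node

same = hidden_outputs(np.zeros)
diff = hidden_outputs(lambda shape: rng.standard_normal(shape) * 0.01)

print(np.allclose(same, same[0]))   # True: all 4 hidden nodes still identical
print(np.allclose(diff, diff[0]))   # False: random start lets the nodes differ
```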

For the cat/non-cat exercise, what features are being evaluated?

These videos really need to be improved. Just talking in the abstract does not do much to help students understand the material. You need some examples along the way.

This is explained in the notebooks in both Week 2 and Week 4. The inputs there are RGB images. Each one has 64 x 64 pixels and each pixel has three color values, so each image is a 64 x 64 x 3 array. Before we can process the images, we must "flatten" or "unroll" them into vectors. If we do that, we end up with vectors of 64 * 64 * 3 = 12288 values, which are the "features". Each such value is an 8-bit unsigned integer, so it is a number between 0 and 255. Before we use them, we divide the pixel values by 255.0 to convert them to floating point values between 0 and 1. That turns out to be necessary for our training to work efficiently.
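The flattening and scaling steps can be sketched in NumPy. Random 8-bit pixel values stand in for the real images here (the notebooks load them from a dataset file, which this sketch does not do); the shape (m, 64, 64, 3) with m = 209 follows the description in this thread.

```python
import numpy as np

# Hypothetical stand-in for the image array: m RGB images of 64 x 64 pixels,
# stored as 8-bit unsigned integers (0-255), shape (m, 64, 64, 3)
rng = np.random.default_rng(4)
m = 209
images = rng.integers(0, 256, size=(m, 64, 64, 3), dtype=np.uint8)

# "Flatten" each image into one column: result has shape (12288, m)
X_flat = images.reshape(m, -1).T

# Scale pixel values to floats between 0 and 1
X = X_flat / 255.0
```

Reshaping to (m, -1) first and then transposing keeps each image's 12288 values together in one column; reshaping directly to (12288, m) would mix values from different images in the same column.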