Question about Neural Network input and output vectors

I have completed the MLS and begun week one of the DLS and now have a question going back to this module from week one of the Advanced Learning Algorithms course.

Please tell me if this is correct:

When one feeds a feature vector X as input to a dense hidden layer with the ReLU activation function, each node will fit a ReLU function through every feature in vector X and provide a prediction. Each of these predictions is organized into a new vector a[1], which is used as input for the next hidden layer, and so on.

If this is correct, wouldn't the output vector a[1] be a vector of length equal to the number of nodes in hidden layer one, with each element in vector a[1] being the same number?

In a dense hidden layer, every feature from vector X was used in every node, so every node will produce the same output value in vector a[1]. Specifically, elements a[1]_1, a[1]_2, … a[1]_j are all the same number.

That just feels like it can't be correct. Can someone help me understand better? Thanks for the help!

No, that's not what a hidden layer does. Predictions are only formed at the output layer. Every unit has an activation value, but only in the output layer is that a prediction.

Each pair of adjacent layers in a neural network is connected by a weight matrix.

The size of the weight matrix is {outputs by inputs}, where outputs is the number of units in the next layer, and inputs is the number of units in the previous layer.

The number of units in each hidden layer is independent. You select that as part of designing the model.

The number of units in the input layer is determined by the number of features (the input), and the number of units in the output layer by the number of outputs (i.e. the number of labels).
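As a rough NumPy sketch of those shape rules (the layer sizes 100, 25, 15, and 1 are just the ones discussed in this thread; the code itself is my own illustration, not from the course):

```python
import numpy as np

# Hypothetical network: 100 input features, hidden layers of 25 and 15
# units, and a single output unit.
n_x, n_h1, n_h2, n_y = 100, 25, 15, 1

rng = np.random.default_rng(0)
# Each weight matrix has shape (units in next layer, units in previous layer).
W1 = rng.standard_normal((n_h1, n_x))   # (25, 100)
W2 = rng.standard_normal((n_h2, n_h1))  # (15, 25)
W3 = rng.standard_normal((n_y, n_h2))   # (1, 15)

print(W1.shape, W2.shape, W3.shape)     # (25, 100) (15, 25) (1, 15)
```

Only the input size (100) and output size (1) are fixed by the problem; the 25 and 15 are design choices.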

This is correct, thank you.

@TMosh You've lost me a bit, here. Allow me to rephrase my question and include a visual reference:

Suppose each hidden layer in this NN diagram is a dense hidden layer with the same activation function, ReLU. Also, suppose that vector X has 100 features.

Please tell me if these statements are correct:

  1. Vector a[1] is the output of hidden layer 1 and will have length 25. Correct.
  2. Vector a[2] is the output of hidden layer 2 and will have length 15. Correct.
  3. Elements a[1]_1, a[1]_2, … a[1]_25 are all the same value. Incorrect.
  4. Elements a[2]_1, a[2]_2, … a[2]_15 are all the same value. Incorrect.

Thanks for your patience and willingness to help me out.

That's a really bad diagram, and I wish they'd replace it with something better.

Iā€™ll see if I can find a better one and post it here.

Your notes 3 and 4 are incorrect. Every unit has a unique output value. Otherwise there's no point in having multiple units per layer.


Thank you! That is exactly my point: it makes no sense to have multiple units per layer if points 3 and 4 are correct.

Can you explain in more detail how the hidden layers compute the output vectors a[1] and a[2]?

Here's a simple depiction of the connections in a neural network. This one is from the "coffee roasting" lab; I'm not sure which course has that.

This NN has two input units, three hidden layer units, and one output.
The units are the circles, every straight line represents a weight value.

The size of W1 is (3 x 2).
The size of W2 is (1 x 3).

The weight (and bias) values are learned so as to minimize the cost at the output layer. The method is called "backpropagation of errors". The math involves a lot of calculus, which most ML courses assume is already proven and take as fact.

This diagram doesnā€™t show the bias value that is included with each unit.

Each unit computes the sum of the products of its weights and the input values, plus a bias, with some activation g(…) applied. In vector algebra, this gives you the whole result as a vector for each example.

For the hidden layer, with each example as a row of X, it's A1 = g(X · W1ᵀ + b1).

So what you actually get at A1 is a matrix of size (m x 3), since 'm' is the number of examples (rows) in the training set X.

A similar process happens to compute A2.

At the output layer, there will be an additional step to compute predictions. For example, if you're doing classification, then A2 will be turned into logical true/false values by applying a threshold operation (>= 0.5). This works because sigmoid() has a range of 0 (False) to 1 (True), and 0.5 is the boundary right in the middle.
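To make that concrete, here's a minimal NumPy sketch of the forward pass for the little 2-3-1 network above. The random weights, the ReLU hidden activation, the sigmoid output, and the row-per-example layout are my illustrative choices, not taken from the lab:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
m = 5                                   # number of examples
X = rng.standard_normal((m, 2))         # two input features per example

W1 = rng.standard_normal((3, 2))        # hidden layer: 3 units, 2 inputs each
b1 = rng.standard_normal(3)
W2 = rng.standard_normal((1, 3))        # output layer: 1 unit, 3 inputs
b2 = rng.standard_normal(1)

A1 = relu(X @ W1.T + b1)                # shape (m, 3): one row per example
A2 = sigmoid(A1 @ W2.T + b2)            # shape (m, 1): values in (0, 1)
predictions = (A2 >= 0.5)               # threshold into True/False labels

print(A1.shape, A2.shape)               # (5, 3) (5, 1)
```

Note that each column of A1 comes from a different row of W1, which is why the three hidden-unit activations differ for a given example.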


This is helpful. Thank you for taking the time to explain this to me.

Can you point me to a good resource that I could use to read more myself? Beyond the MLS and DLS, which I'm already working through.

Sorry, I don't have any other references.

@cwebster, one more key point is that the weights and biases need to be randomly initialized before training. Otherwise, if they are all the same and the inputs are the same, then each node in a layer will learn the same thing. It sounds like this may be the missing piece that was puzzling you.
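A quick NumPy toy example of why that matters (my own sketch, not from the course): with identical starting weights, every unit in a layer computes the same activation, so their gradient updates are identical too and they can never learn different things.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

x = np.array([0.5, 1.2])                 # one example with two features

# Symmetric initialization: all three hidden units share the same weights.
W_same = np.full((3, 2), 0.1)
b_same = np.zeros(3)
a_same = relu(W_same @ x + b_same)
print(a_same)                            # [0.17 0.17 0.17] -- all identical

# Random initialization breaks the symmetry: each unit starts out different.
rng = np.random.default_rng(0)
W_rand = rng.standard_normal((3, 2)) * 0.01
a_rand = relu(W_rand @ x + b_same)
print(a_rand)                            # generally three different values
```

This is exactly the scenario in the original question: it's only when every unit has identical weights that every element of a[1] comes out the same.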


This is the key point that I misunderstood! Thank you for pointing that out.

For anyone else, this is the video that answers my question: https://www.coursera.org/learn/neural-networks-deep-learning/lecture/XtFPI/random-initialization
