Week 2 has z = w^T x + b (see the notes on Logistic Regression Cost Function). This is also the case in the Week 3 Neural Networks Overview.

However, Week 4 shows z = Wx + b (see Vectorized Implementation and other places in Week 4).

Why the difference?


The parameters of one neuron in one of the layers are (w_i^T, b).

He generalized the function by vectorizing all of the w_i^T parameters into a single matrix W.

The matrix W contains all of the parameters w_i^T of a single layer.

W = (w_1^T; w_2^T; w_3^T; …; w_n^T), with each w_i^T as a row. Vectorizing b is unnecessary because B will be equivalent to:

B = (b, b, …, b); you add the same b to every neuron's w_i^T x + b.

So in general, W x + b generalizes all of the equations w_i^T x + b.
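As a sketch (with made-up sizes, not from the course notebooks), stacking the per-neuron row vectors w_i^T into W means each row of W holds one neuron's weights, so a single matrix product computes every neuron's output at once:

```python
import numpy as np

n_x = 4                        # hypothetical number of input features
w1 = np.random.randn(n_x, 1)   # column weight vector of neuron 1
w2 = np.random.randn(n_x, 1)   # neuron 2
w3 = np.random.randn(n_x, 1)   # neuron 3

# Stack the transposed (row) vectors: W has one row per neuron.
W = np.vstack([w1.T, w2.T, w3.T])   # shape (3, n_x)

x = np.random.randn(n_x, 1)         # one input sample
b = 0.5                             # same scalar bias for each neuron

# W @ x + b computes every neuron's w_i^T x + b in one shot.
z = W @ x + b                       # shape (3, 1)

# Same result as computing neuron 1 separately:
z1 = w1.T @ x + b
assert np.allclose(z[0], z1)
```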

This is a common convention in linear algebra.

I understand the vectorization of the equation. What I do not understand is why we use w^T in the calculation for the single sample and W in the vectorized version. I would have expected w^T or W to be used for both. My question is about the move from w^T to W.

The definitions of the weights are different in the two cases:

For Logistic Regression, the weights are a vector w with the same dimension as each input sample. It is a choice, but Prof Ng chooses to define all vectors as column vectors. So if we have a vector w of dimension n_x x 1 and a vector x of dimension n_x x 1 and we want to compute:

z = \displaystyle \sum_{i = 1}^{n_x} w_i * x_i + b

as a vector computation, it requires a transpose to get the dot product to work:

z = w^T \cdot x + b

Dotting 1 x n_x with n_x x 1 gives a 1 x 1 or scalar output, which is what we want (for a single sample input).
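A quick NumPy sketch of the single-sample logistic regression case (illustrative sizes only): the transpose turns the column vector w into a 1 x n_x row so the dot product with the n_x x 1 sample x yields a 1 x 1 result.

```python
import numpy as np

n_x = 4                        # hypothetical number of input features
w = np.random.randn(n_x, 1)    # weights as a column vector, per Prof Ng's convention
x = np.random.randn(n_x, 1)    # one input sample, also a column vector
b = 0.5                        # scalar bias

# (1, n_x) dot (n_x, 1) -> (1, 1), effectively a scalar
z = np.dot(w.T, x) + b
print(z.shape)  # (1, 1)
```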

Once we graduate to real neural networks in Week 3, he gets to redefine things. The weights are now a matrix, because we have a separate weight vector for each output neuron of the layer. He could have chosen to define the W matrix such that a transpose is required, but why make things more messy? Here's a thread which discusses the portion of the lecture that explains the structure of the W matrix in Week 3. With Prof Ng's new definition of the weight matrix, the linear activation becomes:

z = W \cdot x + b

for a single input sample x. Note that b, the bias term, is now a vector, not a scalar, with one value per output neuron of the layer.
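In NumPy terms (a sketch with hypothetical layer sizes), the Week 4 layer computation looks like this; because each neuron's weights are already a row of W, no transpose is needed:

```python
import numpy as np

n_x = 4        # hypothetical input features
n_units = 3    # hypothetical neurons in this layer

W = np.random.randn(n_units, n_x)   # one row of weights per output neuron
x = np.random.randn(n_x, 1)         # a single input sample (column vector)
b = np.random.randn(n_units, 1)     # bias is now a vector: one entry per neuron

# (n_units, n_x) dot (n_x, 1) -> (n_units, 1); b adds elementwise
z = np.dot(W, x) + b
print(z.shape)  # (3, 1)
```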

It sounds a bit arbitrary. I prefer the latter, but why confuse the matter by being inconsistent? Why not use the latter from the start?

Chuck Walsh

I explained the rationale, but you are of course entitled to your own opinion on the subject. You're right that it's arbitrary, but notation is always arbitrary. Sorry, but Prof Ng is the teacher here, so he gets to be the arbiter and we just have to read what he says and "deal with it".