Week 2 - Logistic Regression Model

I need some help understanding the model \hat{y} = \sigma(w^T x + b). What does w transpose mean? I understand that x is a feature vector, but why do we use w transpose?

Thanks!

The transpose is needed because Prof Ng chooses the convention that all standalone vectors are formatted as column vectors. So both the weights vector w and the input vector x are n_x x 1 column vectors, where n_x is the number of features (elements) in each input vector.

Then the key is that we need to implement the following mathematical formulas in two steps. First there is the linear combination of w, x and the bias b:

z = \displaystyle \sum_{i = 1}^{n_x} (w_i * x_i) + b

Then we apply the non-linear sigmoid activation function to get the final output of logistic regression:

\hat{y} = \sigma(z)
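
If it helps to see those two steps in code, here is a minimal sketch with an explicit loop; the shapes and values are made up purely for illustration, not assignment code:

import numpy as np

def sigmoid(z):
    # the logistic (sigmoid) activation function
    return 1. / (1. + np.exp(-z))

np.random.seed(1)             # made-up example values, fixed for reproducibility
n_x = 4                       # number of features (just an example)
w = np.random.rand(n_x, 1)    # weights, an n_x x 1 column vector
x = np.random.rand(n_x, 1)    # one input sample, an n_x x 1 column vector
b = 0.5                       # the bias, a scalar

z = b
for i in range(n_x):          # the sum of w_i * x_i, one term at a time
    z = z + w[i, 0] * x[i, 0]
y_hat = sigmoid(z)
print(z, y_hat)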

So then the question is how to express that first linear combination (really “affine” transformation) to compute z using vector operations for efficiency. The easiest way is to write that sum of the products formula as a dot product:

z = w^T \cdot x + b

The way dot products work is that the inner dimensions need to agree. Both w and x are n_x x 1, so if we transpose w we have w^T is 1 x n_x vector. If you then dot 1 x n_x with n_x x 1, you end up with a 1 x 1 or scalar result, which is what we want. If you think about what “dot product” means, it is exactly that sum of the products of w_i * x_i for each pair of elements in the two vectors that is shown in the math formula above. But we need the transpose in order for the operation to work when the vectors have those dimensions.
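
Here is the same made-up example written as a dot product, just to show the vectorized pattern (again only a sketch):

import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

np.random.seed(1)             # same made-up values as the loop sketch above
n_x = 4
w = np.random.rand(n_x, 1)    # n_x x 1 column vector
x = np.random.rand(n_x, 1)    # n_x x 1 column vector
b = 0.5

z = np.dot(w.T, x) + b        # (1, n_x) dot (n_x, 1) -> (1, 1), the same value as the loop
y_hat = sigmoid(z)
print(z.shape, y_hat)         # (1, 1) and a 1 x 1 array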

But notice that we can then take the vectorization one step further by concatenating the m input x vectors into an input matrix X, which is now n_x x m (one column for each sample). Now you can compute all the individual \hat{y} values at once by doing this:

Z = w^T \cdot X + b

So we have 1 x n_x dot n_x x m, which gives us a 1 x m output. Then we get:

\hat{Y} = \sigma(Z)
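
In numpy, that fully vectorized version might look like the following sketch, with made-up sizes n_x = 4 and m = 5:

import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

np.random.seed(1)
n_x, m = 4, 5
w = np.random.rand(n_x, 1)    # weights, n_x x 1
X = np.random.rand(n_x, m)    # m samples stacked as columns, n_x x m
b = 0.5                       # scalar bias

Z = np.dot(w.T, X) + b        # (1, n_x) dot (n_x, m) -> (1, m); b is broadcast
Y_hat = sigmoid(Z)            # elementwise sigmoid, shape (1, m)
print(Z.shape, Y_hat.shape)   # (1, 5) (1, 5)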

There is one other thing worth saying here: Prof Ng first shows us Logistic Regression with the idea that you can consider it to be a “trivial” Neural Network that only has an output layer. Next week, he’ll show us how to add more layers to get a real Neural Network. In that case, the weights become matrices W^{[l]} with dimensions n^{[l]} x n^{[l-1]}, where n^{[l]} is the number of output neurons in layer l of the network. In that case, he gets to define the format of the W matrices and for simplicity chooses to orient them such that the transpose is no longer required.
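
Just as a preview of that orientation (this is Week 3 material, so treat it only as a sketch with made-up layer sizes), the linear step for layer 1 of a small network would look something like this:

import numpy as np

np.random.seed(1)
n_x, n_1, m = 4, 3, 5              # made-up sizes: input features, hidden units, samples
W1 = np.random.rand(n_1, n_x)      # W^[1] is n^[1] x n^[0], so no transpose is needed
b1 = np.random.rand(n_1, 1)        # the bias is now a column vector, n^[1] x 1
X = np.random.rand(n_x, m)         # input matrix, n_x x m

Z1 = np.dot(W1, X) + b1            # (n_1, n_x) dot (n_x, m) -> (n_1, m); b1 broadcasts across columns
print(Z1.shape)                    # (3, 5)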

Thank you! So is b in the Z (capitalized) formula a 1 x m row vector?

No, b (the bias term) is always a scalar in Logistic Regression. That will no longer be true once we get to real Neural Networks in Week 3. Adding a scalar to a 1 x m row vector simply adds the same value to each element of the vector. This is a trivial example of what is called “broadcasting” in numpy. Here’s a thread which gives examples of that.
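
As a quick illustration of that specific case (the numbers here are made up), adding a scalar b to a 1 x m row vector looks like this:

import numpy as np

Zw = np.array([[0.1, 0.2, 0.3, 0.4, 0.5]])   # pretend this is w^T dot X, shape 1 x m
b = 1.                                        # scalar bias
Z = Zw + b                                    # b is broadcast: added to every element
print(Z)                                      # [[1.1 1.2 1.3 1.4 1.5]]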

If a scalar is added to a matrix calculation in an equation, does the scalar need to be added m times?

The meaning of adding (or subtracting, multiplying, or dividing) a scalar to a matrix or vector is that you perform the operation “elementwise”. The result is a matrix or vector of the same shape, with the scalar value added (or whatever the operation is) to each element of the original matrix or vector.

Python is an interactive language. You don’t have to wonder what something does: you can try it and watch what happens.

import numpy as np

np.random.seed(42)            # fix the seed so the output below is reproducible
A = np.random.rand(3,4)       # a 3 x 4 matrix of random values
print("A = " + str(A))
b = 1.
print("b = " + str(b))
C = A + b                     # the scalar b is broadcast and added to every element
print("C = " + str(C))
b = -2.
print("b = " + str(b))
D = A * b                     # the scalar b multiplies every element
print("D = " + str(D))

Running that gives this result:

A = [[0.37454012 0.95071431 0.73199394 0.59865848]
 [0.15601864 0.15599452 0.05808361 0.86617615]
 [0.60111501 0.70807258 0.02058449 0.96990985]]
b = 1.0
C = [[1.37454012 1.95071431 1.73199394 1.59865848]
 [1.15601864 1.15599452 1.05808361 1.86617615]
 [1.60111501 1.70807258 1.02058449 1.96990985]]
b = -2.0
D = [[-0.74908024 -1.90142861 -1.46398788 -1.19731697]
 [-0.31203728 -0.31198904 -0.11616722 -1.73235229]
 [-1.20223002 -1.41614516 -0.04116899 -1.9398197 ]]

Hi Paul, thank you for the example. It’s very helpful!