Week 2 Video 2. What does Linear Regression have to do with the classification problem?

I am very new to this whole topic, so forgive me if this question is beyond basic, but I am just starting the videos on logistic regression. Professor Ng talks about how, given a vector x, we want to know the probability that the picture is a cat. Then he pulls out w^T x + b. I am wondering what that has to do with figuring out whether or not the picture is a cat, why we transposed w, or even where w came from. As you can tell I am pretty confused, so any pointers in the right direction would be a huge help.

Cheers

I think the best idea is just to continue listening to the lectures or perhaps start over and watch them again with the following thoughts in mind:

The w values are the “weights”, which are the coefficients that multiply each element of the input vector (in our particular case here, the elements are pixel values from the image). Those “weights” are learned through the process of back propagation (that’s where the “machine learning” happens).

So we compute the output in two steps. First we do a linear transformation that looks exactly like linear regression:

z = w^T \cdot x + b

But then here is the key point about a “binary” (yes/no) classification: we take that z value, the “linear output”, and feed it to the sigmoid function, which converts it into a number between 0 and 1. We define that output to be the probability of a “yes” answer: if the sigmoid output is > 0.5, then we interpret that as “yes, there is a cat in the image”. The learning happens by comparing the actual outputs just described to the “labels” on the training data, which are the correct answers for our training images. Then we use “back propagation” to keep adjusting the weight values (w) and the bias value (b) to get better and better answers from our model. Prof Ng explains how learning works through back propagation in the lectures.
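
Here is a minimal NumPy sketch of those two steps for a single input vector. Note the numbers are made up just to illustrate the flow: in the course, w and b come from training, not from random initialization like this.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Toy example: pretend the image flattens to n_x = 4 pixel values (made-up data).
n_x = 4
x = np.random.rand(n_x, 1)          # input column vector (pixel values)
w = np.random.rand(n_x, 1) * 0.01   # weights -- normally learned by back propagation
b = 0.0                             # bias -- also learned

# Step 1: the "linear" part, which looks just like linear regression
z = np.dot(w.T, x) + b              # shape (1, 1)

# Step 2: convert z into a probability with the sigmoid
a = sigmoid(z)                      # a number between 0 and 1

# Interpret the probability as a yes/no answer
prediction = a > 0.5                # True means "yes, there is a cat"
print(z, a, prediction)
```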

The reason that the z formula involves a transpose is that Prof Ng chooses to use the convention that all standalone vectors are defined to be column vectors. So both w and x are vectors of dimension n_x x 1, where n_x is the number of elements in each input vector. If you just write it as a math formula, what we are defining is this:

z = \displaystyle \sum_{i = 1}^{n_x} w_i \cdot x_i + b

When you express that as vector operations with w and x as column vectors, then you end up with:

z = w^T \cdot x + b

w^T will have dimensions 1 x n_x, so the dot product output has dimension 1 x 1, i.e. a scalar, which is what we want.
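
If it helps, here is a tiny NumPy check (with random values, just to illustrate the shapes) showing that the summation formula and the vectorized w^T \cdot x + b give the same number:

```python
import numpy as np

n_x = 5
w = np.random.rand(n_x, 1)   # column vector, shape (n_x, 1)
x = np.random.rand(n_x, 1)   # column vector, shape (n_x, 1)
b = 0.3

# Element-wise version: exactly the summation formula above
z_sum = sum(w[i, 0] * x[i, 0] for i in range(n_x)) + b

# Vectorized version: w^T has shape (1, n_x), so w^T x has shape (1, 1)
z_vec = np.dot(w.T, x) + b

print(w.T.shape)                        # (1, n_x)
print(z_vec.shape)                      # (1, 1) -- effectively a scalar
print(np.isclose(z_sum, z_vec[0, 0]))   # True: the two forms agree
```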


Fantastic. Thanks so much!