Difference between y and y hat

Hello everyone,
I’d like to know what each of these terms means and the difference between them:
w, b, y, and \hat{y} = a
Thank you

It sounds like you are asking about Logistic Regression in Week 2.

The goal is to make predictions based on input data. In order to train the “model”, we are given a collection (dataset) full of pairs of values: x and y.

x is the input data sample and is formatted as a vector with dimensions n_x x 1, where n_x is the number of “features” or elements in each input vector. For example, the image data that we use in the Logistic Regression exercise has 12288 elements because it is “unrolled” from an RGB image that is 64 x 64 x 3 (64 x 64 pixels, each of which has 3 color values).
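In NumPy, that “unrolling” is just a reshape. Here is a minimal sketch, with a random array standing in for a real image from the dataset:

```python
import numpy as np

# Hypothetical stand-in for one 64 x 64 RGB image from the dataset.
image = np.random.rand(64, 64, 3)

# "Unroll" it into a single column vector of shape (n_x, 1).
x = image.reshape(-1, 1)
print(x.shape)   # (12288, 1), since 64 * 64 * 3 = 12288
```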

y is the “label” that corresponds to x. So it is the “correct answer”, which is either 0 (no) or 1 (yes). In our case here, 1 means the image x is a picture of a cat and 0 means x is not a picture of a cat.

Now that we understand the input data, here is how we make a prediction:

We have another vector w of “weights” that is the same size as x. We first perform the “linear combination” of w and x and then we add the bias value b, which is just a scalar:

z = \displaystyle \sum_{i = 1}^{n_x} w_i x_i + b
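Written out element by element, that sum is just a loop over the n_x entries. This is purely illustrative, with made-up values; the vectorized version appears further down:

```python
import numpy as np

n_x = 12288
w = np.random.randn(n_x, 1) * 0.01   # weights, same shape as x
x = np.random.rand(n_x, 1)           # one made-up input sample
b = 0.0                              # bias, a scalar

# Element-by-element version of the sum above.
z = b
for i in range(n_x):
    z += w[i, 0] * x[i, 0]
```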

That sum gives us a single real number z as the output. We then want to convert z into a number between 0 and 1 that we can interpret as a probability. To do that, we feed it through the sigmoid function:

\sigma(z) = \displaystyle \frac {1}{1 + e^{-z}}
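If you want to see what the sigmoid does numerically, here is a small sketch (not the course’s utility function, just an illustration):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # ~0.99995, close to 1
print(sigmoid(-10.0))  # ~0.00005, close to 0
```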

The final output or prediction of the model is called either a or \hat{y}:

\hat{y} = a = \sigma(z)

The way we interpret \hat{y} is that if it is >= 0.5, then the model is predicting that the input sample is a “yes” (a picture of a cat in our particular case). If \hat{y} < 0.5, then the model is predicting that the input sample is classified as a “no” (not a cat in our example).
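Here is a tiny sketch of that thresholding step, using a made-up value of z:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.3                                  # made-up value of w^T x + b
y_hat = sigmoid(z)                       # ~0.574
prediction = 1 if y_hat >= 0.5 else 0    # 1 = "cat", 0 = "not a cat"
print(y_hat, prediction)
```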

The goal is that, for a given input x, \hat{y} is as close as possible to the corresponding correct answer given by the label y for that x. The big question, now that we have defined all of this, is: how do we find the w and b values such that the computations described above give accurate predictions? That is what training with Back Propagation and Gradient Descent is all about.

The one additional point to make here is that if we express the linear combination formula above as a vector operation, it is the following:

z = w^T \cdot x + b

The reason we need the transpose there is that both w and x are formatted as n_x x 1 column vectors. It is thus necessary to transpose w so that it becomes 1 x n_x in order for the dot product to work. Note that Prof Ng could have chosen to define w as a row vector, but he uses the convention that all standalone vectors are column vectors.
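In NumPy, that vectorized version looks something like this (again just a sketch with made-up values), and you can see the shapes work out once w is transposed:

```python
import numpy as np

n_x = 12288
w = np.random.randn(n_x, 1) * 0.01   # (n_x, 1) column vector of weights
x = np.random.rand(n_x, 1)           # (n_x, 1) column vector, one sample
b = 0.0

z = np.dot(w.T, x) + b               # (1, n_x) dot (n_x, 1) -> (1, 1)
print(z.shape)                       # (1, 1), effectively a scalar
```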

Also please note as a general matter that everything I said above was covered in the lectures, although maybe it was spread out over several lectures. If what I said above does not make sense, you might want to watch the Week 2 lectures again.
