[CS229] Probabilistic interpretation of linear regression


How do I get the part that is circled in red below? The notes say “This implies that”, but I don't quite get how it works.

It is from page 16 of the CS229 main notes.

But the steps are already in the notes. What don’t you get?


(Follow my steps 1, 2, 3, and 4)

Hi Raymond, thanks for your reply. Let me clarify a bit: what I actually don't get is why $$p(\epsilon^{(i)})$$ is converted to $$p(y^{(i)}|x^{(i)};\theta)$$.

Because we have replaced epsilon with y, x, and theta. What is the problem? Perhaps you can try to explain what you have understood from that so far? Then I can make some comments or understand your question a bit better.

CS229 is not a DLAI course.

I know it is replacing epsilon with $$y^{(i)}-\theta^T x^{(i)}$$, but why is it written in the form $$p(y^{(i)}|x^{(i)};\theta)$$?

For example, it could be $$p(y^{(i)},x^{(i)};\theta)$$. Why does $$y^{(i)}-\theta^T x^{(i)}$$ give this particular form of the probability, which is the probability of y given x and theta?

Hello @WONG_Lik_Hang_Kenny

Because you can evaluate the probability of y given x, and the equation is parameterized by theta. However, you probably won’t be satisfied with that answer, even though I am trying to keep it very straightforward.

Put it this way: y is probabilistic because of epsilon. The normal distribution assumption is made for epsilon and is, as a result, “transferred” to y through $$y = \theta^T x + \epsilon$$. x is considered given, and we have not made any assumption about the probabilistic nature of x itself (e.g. we have not assumed any probability distribution on x), so that equation evaluates neither the probability of x nor the joint probability of x and y, but only the probability of y.
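
To spell that step out, here is a rough sketch. It assumes, as the notes do, that $$\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$$, and it additionally assumes that $$\epsilon^{(i)}$$ is independent of $$x^{(i)}$$ (my reading of the setup, not something quoted from the notes). The subscript in $$p_{\epsilon}$$ is just my label for the density of epsilon. Once $$x^{(i)}$$ is fixed, $$\theta^T x^{(i)}$$ is a constant, so $$y^{(i)}$$ is simply $$\epsilon^{(i)}$$ shifted by that constant, and its conditional density is the density of epsilon evaluated at the residual:

$$
p(y^{(i)} \mid x^{(i)}; \theta) = p_{\epsilon}\left(y^{(i)} - \theta^T x^{(i)}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right)
$$

A joint density would need one more factor that the model never specifies:

$$
p(y^{(i)}, x^{(i)}; \theta) = p(y^{(i)} \mid x^{(i)}; \theta)\, p(x^{(i)})
$$

Since nothing is assumed about $$p(x^{(i)})$$, the substitution can only give the conditional form. The semicolon before theta indicates that theta is treated as a fixed parameter of the density rather than a random variable being conditioned on.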

