Why use N(0, 1) for Lesson 2 - MLE: Linear Regression

William_Chau · June 5, 2023, 3:00pm

Why did the instructor use N(0, 1) to estimate the likelihood of generating the distances of the sample points from the line of best fit? Is there any theoretical reason for that?

lucas.coutinho · June 12, 2023, 4:26pm

Hi @William_Chau,

Welcome to our community!

I think there are two layers in your question. I will consider them here, because they have quite distinct explanations.

The two parts of your question are:

“Why do we assume that the errors have a normal distribution?”
“Why do we assume the normal distribution has mean 0 and standard deviation 1?”

First, note that this is an assumption, not a natural or canonical choice. However, there are some justifications for it.

Regarding the normality of the errors: we assume it partly for mathematical convenience. For instance, the linear regression solution is quite easy to derive under this assumption. Another justification is the Central Limit Theorem: if you think of the error as the sum of many small, independent underlying effects on the process, that sum will tend to behave like a normal distribution.
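To make the "mathematical convenience" point concrete, here is a short sketch of the derivation. The notation is my own and not necessarily the one used in the lecture: m and b are the slope and intercept of the line, (x_i, y_i) are the n sample points, and σ is the standard deviation of the errors.

```latex
\begin{aligned}
y_i &= m x_i + b + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}, \\
L(m, b) &= \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
           \exp\!\left(-\frac{(y_i - m x_i - b)^2}{2\sigma^2}\right), \\
\log L(m, b) &= -n \log\!\left(\sigma\sqrt{2\pi}\right)
           - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - m x_i - b)^2 .
\end{aligned}
```

The first term does not depend on m or b, so maximizing the log-likelihood over the line is the same as minimizing the sum of squared residuals, which is ordinary least squares.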

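The second question is about why the chosen normal is N(0, 1). In linear regression we usually standardize the features so that they have mean 0 and standard deviation 1; therefore, it is natural to think that, in this new set of coordinates, the errors will also have mean 0 and standard deviation 1. Keep in mind that this is a simplifying convention rather than a strict consequence of standardization: for any fixed standard deviation, the line that maximizes the likelihood is the same, so choosing 1 mainly keeps the formulas simple.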
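If it helps, here is a minimal Python sketch (my own illustration, not code from the course) of how the assumed standard deviation affects the result. It scores a grid of candidate lines with a Gaussian log-likelihood, once with sigma = 1 and once with sigma = 3, and compares the winning line with the least-squares fit. The synthetic data, the candidate grid, and all function names are made up for the example.

```python
import math
import random


def gaussian_log_likelihood(slope, intercept, xs, ys, sigma=1.0):
    """Log-likelihood of the data if residuals are i.i.d. N(0, sigma^2)."""
    n = len(xs)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - sse / (2 * sigma ** 2)


def least_squares_fit(xs, ys):
    """Closed-form least-squares slope and intercept for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def best_on_grid(xs, ys, sigma, candidates):
    """Candidate (slope, intercept) pair with the highest log-likelihood."""
    return max(
        candidates,
        key=lambda mb: gaussian_log_likelihood(mb[0], mb[1], xs, ys, sigma),
    )


# Synthetic data (an assumption for this example): y = 2x + 1 plus noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

# Hypothetical grid of candidate lines: slopes 0.0..4.0, intercepts -1.0..3.0.
candidates = [(m / 10, b / 10) for m in range(0, 41) for b in range(-10, 31)]

best_sigma_1 = best_on_grid(xs, ys, 1.0, candidates)
best_sigma_3 = best_on_grid(xs, ys, 3.0, candidates)
ols = least_squares_fit(xs, ys)

print("best candidate with sigma = 1:", best_sigma_1)
print("best candidate with sigma = 3:", best_sigma_3)
print("least-squares fit:", ols)
```

Changing sigma rescales the log-likelihood values but should not change which line scores highest, and the winning candidate should sit close to the least-squares line. So N(0, 1) can be read as a convenient default rather than a claim about the true spread of the errors.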

I hope that this answers your questions.

Thanks,
Lucas
