Why use N(0, 1) for Lesson 2 - MLE: Linear Regression

Why did the instructor use N(0, 1) to estimate the likelihood of generating the distances of the sample points from the line of best fit? Is there a theoretical reason for that?


Hi @William_Chau,

Welcome to our community!

I think there are two layers to your question, and I will address them separately because they have quite distinct explanations.

The two parts of your question are:

  1. “Why do we assume that the errors have a normal distribution?”
  2. “Why do we assume the normal distribution has mean 0 and standard deviation 1?”

First, note that this is an assumption; it is not a natural or canonical choice. However, there are some justifications for it.

Regarding the normality of the errors, we assume it mainly for mathematical convenience. With normally distributed errors, the log-likelihood of the data is a constant minus the sum of squared errors (scaled by 1/(2σ²)), so maximizing the likelihood is exactly the same as minimizing the sum of squared errors. In other words, the maximum-likelihood line is the ordinary least-squares line, which has a simple closed-form solution. Another justification is the Central Limit Theorem: you can think of the error as the sum of many small, independent effects that influence the process, and such a sum tends to behave approximately like a normal distribution.
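To make the "maximizing the likelihood = minimizing squared errors" point concrete, here is a minimal sketch, assuming NumPy and made-up toy data. The line y = 2x + 1, the noise level, and the helper `gaussian_log_likelihood` are hypothetical illustrations, not taken from the course. It fits the least-squares line with `np.polyfit` and checks that nudging that line in any direction lowers the N(0, 1) log-likelihood of the residuals.

```python
import numpy as np

def gaussian_log_likelihood(residuals, sigma=1.0):
    """Log-likelihood of residuals under i.i.d. N(0, sigma^2) errors (hypothetical helper)."""
    n = residuals.size
    sse = np.sum(residuals ** 2)  # sum of squared errors
    # log L = -(n/2) * log(2*pi*sigma^2) - SSE / (2*sigma^2)
    return -0.5 * n * np.log(2 * np.pi * sigma ** 2) - sse / (2 * sigma ** 2)

# Hypothetical toy data: y = 2x + 1 plus N(0, 1) noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

# Closed-form least-squares fit: minimizes the sum of squared errors.
# np.polyfit returns coefficients highest power first: [slope, intercept].
slope_ls, intercept_ls = np.polyfit(x, y, deg=1)

def log_lik(slope, intercept):
    """N(0, 1) log-likelihood of the distances from the candidate line."""
    return gaussian_log_likelihood(y - (slope * x + intercept))

best = log_lik(slope_ls, intercept_ls)
# Moving the least-squares line in any direction lowers the likelihood,
# i.e. the least-squares line is also the maximum-likelihood line.
for d_slope, d_int in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert log_lik(slope_ls + d_slope, intercept_ls + d_int) < best

print(slope_ls, intercept_ls)
```

Because the log-likelihood depends on the line only through the sum of squared errors, any line with a smaller squared error automatically has a higher likelihood; that is the convenience the normality assumption buys.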

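Regarding the second question, why the normal is chosen to be N(0, 1): the mean is 0 because, if the errors had a nonzero mean, that constant offset would simply be absorbed into the intercept of the line. As for the standard deviation, the value of σ does not change which line maximizes the likelihood: it only rescales the sum-of-squared-errors term and adds a constant, so fixing σ = 1 is a harmless simplification when the goal is to find the line of best fit. In practice the features are also often standardized to mean 0 and standard deviation 1, so working with a unit-scale normal fits naturally in that set of coordinates.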
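Both points can be checked numerically. The sketch below is a minimal illustration assuming NumPy; the data, the noise distribution N(0.5, 2²), and the helper `gaussian_log_likelihood` are hypothetical choices for the demo. The noise deliberately has neither mean 0 nor standard deviation 1, yet the least-squares line (computed without any σ) remains the likelihood maximizer for every assumed σ, and the noise mean ends up in the intercept.

```python
import numpy as np

def gaussian_log_likelihood(residuals, sigma):
    """Log-likelihood of residuals under i.i.d. N(0, sigma^2) errors (hypothetical helper)."""
    n = residuals.size
    sse = np.sum(residuals ** 2)
    return -0.5 * n * np.log(2 * np.pi * sigma ** 2) - sse / (2 * sigma ** 2)

# Hypothetical data whose noise is N(0.5, 2^2): neither mean 0 nor std 1.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=500)
y = 2.0 * x + 1.0 + rng.normal(0.5, 2.0, size=x.size)

# Least-squares fit; note that no sigma appears anywhere in this step.
slope_ls, intercept_ls = np.polyfit(x, y, deg=1)
residuals = y - (slope_ls * x + intercept_ls)

# Whatever sigma we assume, the same line maximizes the likelihood:
# every perturbed line scores lower for every sigma tried.
for sigma in (0.5, 1.0, 2.0, 5.0):
    best = gaussian_log_likelihood(residuals, sigma)
    for d_slope, d_int in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]:
        moved = y - ((slope_ls + d_slope) * x + (intercept_ls + d_int))
        assert gaussian_log_likelihood(moved, sigma) < best

# The noise mean (0.5) is absorbed into the intercept (close to 1.0 + 0.5),
# and the fitted residuals have mean 0 up to floating-point rounding.
print(round(slope_ls, 2), round(intercept_ls, 2), residuals.mean())
```

The choice of σ only affects the height of the likelihood, not where its maximum is, which is why fixing it at 1 does not change the fitted line.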

I hope that this answers your questions.

Thanks,
Lucas
