In the video “Sample Variance” from Week 3 of the course P & S,
the reason for dividing by n-1 is not explained. Instead, it is presented as a way to get to the answer, illustrated with two examples.
Can someone please explain the mathematical reason behind this?
Also, in the video MLE: Gaussian Example (from lesson 2 of the same week),
I didn’t understand what these lines wanted to convey.
“Here, Luis is dividing by 2 using the formula for population variance, not the sample variance as you saw in previous lectures. This is because, as mentioned, the population variance is biased, hence to obtain an unbiased estimator, it should be divided by n-1 instead of n. However, the maximum likelihood estimator (MLE) for the variance is biased and equals the population variance.”
Could someone please help me get a grasp of these two things?
The key distinction is whether your set of data is the entire population or only a sample drawn from a larger population.
The formulas to compute the variance are slightly different for the two cases.
The key reason the values are different is that with a sample you do not know the true population mean, so you measure the deviations from the sample mean instead. The sample mean is computed from the same data, and the sum of squared deviations around it is never larger than the sum around the true mean. As a result, dividing that sum by (N) underestimates the variance on average. Dividing by (N-1) instead corrects for this; the adjustment is known as Bessel's correction.
The full proof is fairly long, but you can find it in the Wikipedia article on “Variance” if you want the details.
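A condensed version of that argument, as a sketch only: it assumes the data points x_1, ..., x_n are independent draws from a distribution with mean \mu and variance \sigma^2, and \bar{x} is the sample mean (this notation is mine, not the course's).

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
\mathbb{E}\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2 \\
\mathbb{E}\!\left[\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= \sigma^2,
\qquad
\mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  = \frac{n-1}{n}\,\sigma^2
\end{align*}
```

The last line is also what the quoted note about the MLE is getting at: the Gaussian maximum likelihood estimate of the variance uses the divisor n, so on average it comes out a factor of (n-1)/n too small.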
The end result is that you divide by (N) if your data is the whole population, but you divide by (N-1) if you only have a sample of the population and want an unbiased estimate of the population variance.
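If you would rather see this numerically than algebraically, here is a minimal simulation sketch. The function name, sample size, number of trials, and the choice of a normal distribution with standard deviation 2 are all my own assumptions for illustration; nothing here comes from the course material.

```python
import random


def average_variance_estimates(n=5, trials=20000, sigma=2.0, seed=0):
    """Average the divide-by-n and divide-by-(n-1) variance estimates
    over many small samples drawn from a normal distribution.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Squared deviations are taken from the SAMPLE mean, not the true mean.
        sum_sq = sum((x - sample_mean) ** 2 for x in sample)
        total_div_n += sum_sq / n
        total_div_n_minus_1 += sum_sq / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials


avg_div_n, avg_div_n_minus_1 = average_variance_estimates()
print("true variance:            ", 2.0 ** 2)
print("average with divisor n:   ", avg_div_n)
print("average with divisor n-1: ", avg_div_n_minus_1)
```

With samples of size 5, the divide-by-n average should land near (4/5) of the true variance, while the divide-by-(n-1) average should land near the true variance itself. The gap shrinks as n grows, which is why the distinction matters most for small samples.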