Hey guys,
I just wanted to confirm whether my understanding of the lab is correct, and I also wanted to ask a query regarding the CLT theorem.

I believe that the lab shows us how the CLT theorem holds true for several distributions, but not for every distribution, and under what conditions it holds true. Would that be correct?

Now, let’s come to the query. Let’s say we don’t know the original distribution at all, and we use the CLT theorem:

In order to know whether the CLT theorem holds true or not, we can always rely on the QQ plot (for the sampling distribution of sample means), am I correct to assume this?

How exactly do we get to know anything about the original distribution, from the sampling distribution of sample means (SDSM)?

Let’s say I plot the SDSM, and compute its mean & standard deviation from the plot. But since I don’t know what kind of distribution the population follows, how do I know which transformation formulae I should apply to get the mean and standard deviation of the population?

For instance, it could be sigma_x_bar = sigma / sq_root(n), but it could also be sigma_x_bar = sq_root(sigma). So, how exactly am I supposed to use the CLT theorem?
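One way to see which formula applies is to simulate it. Below is a minimal sketch, assuming independent draws and two illustrative populations (uniform and exponential) whose sigma is known in advance; the function names `sd_of_sample_means` and `compare` are made up for this example. It compares the observed standard deviation of the sample means with sigma / sq_root(n) for both shapes.

```python
import math
import random
import statistics

def sd_of_sample_means(draw, n, reps, rng):
    """Standard deviation of `reps` sample means, each taken over `n` draws."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

def compare(n=25, reps=4000, seed=0):
    """Return (name, observed sd of means, sigma / sqrt(n)) per population."""
    rng = random.Random(seed)
    # (name, draw function, true population sigma); picked for illustration only
    populations = [
        ("uniform(0, 1)", lambda r: r.random(), math.sqrt(1 / 12)),
        ("exponential(rate=1)", lambda r: r.expovariate(1.0), 1.0),
    ]
    rows = []
    for name, draw, sigma in populations:
        observed = sd_of_sample_means(draw, n, reps, rng)
        rows.append((name, observed, sigma / math.sqrt(n)))
    return rows

for name, observed, predicted in compare():
    print(f"{name}: observed {observed:.4f}, sigma/sqrt(n) {predicted:.4f}")
```

If the two columns roughly agree for both populations, that suggests sigma_x_bar = sigma / sq_root(n) is not tied to one particular shape, as long as the population variance is finite.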

Hi Elemento, I’m a stats noob so please be critical when reading my reply.

In order to know whether the CLT theorem holds true or not, we can always rely on the QQ plot (for the sampling distribution of sample means), am I correct to assume this?

I think it is correct to assume this. The QQ plot’s purpose is to examine the distribution of the data relative to a Gaussian distribution. Therefore, it would be a good tool to test the CLT, which states that the SDSM will be approximately Gaussian in nature.
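To make the QQ idea concrete, here is a rough sketch that computes the points of a normal QQ plot by hand and summarises how straight they are with a correlation coefficient. The exponential population, the sample sizes, and the helper names (`sample_means`, `qq_points`, `qq_correlation`) are all assumptions for illustration, not something from the lab. Keep in mind the check only tells you whether the SDSM looks Gaussian at the n you tried.

```python
import math
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each over `n` draws from the population."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def qq_points(values):
    """Pair each sorted value with the matching standard-normal quantile."""
    ordered = sorted(values)
    m = len(ordered)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / m) for i in range(m)]
    return theoretical, ordered

def qq_correlation(values):
    """Pearson correlation of the QQ points; near 1 means a near-straight line."""
    xs, ys = qq_points(values)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)
draw = lambda r: r.expovariate(1.0)  # skewed population, chosen as an example
for n in (1, 5, 50):
    r = qq_correlation(sample_means(draw, n, 2000, rng))
    print(f"n={n:>2}: QQ correlation {r:.4f}")
```

The points from `qq_points` can be handed to any plotting tool to draw the actual QQ plot; the correlation is just a quick numeric stand-in for eyeballing the line.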

How exactly do we get to know anything about the original distribution, from the sampling distribution of sample means (SDSM)?

I believe that the CLT is strictly limited to inferring properties of the means of the data, and therefore cannot tell us anything about the original distribution. In the same way, knowing just the average income of a country does not enable one to calculate its lowest/highest incomes, quantiles, and so on.

Let’s say I plot the SDSM, and compute its mean & standard deviation from the plot. But since I don’t know what kind of distribution the population follows, how do I know which transformation formulae I should apply to get the mean and standard deviation of the population?

Would you mind elaborating on the population term in this question? Shouldn’t the SDSM follow a Gaussian distribution with standard μ and σ, according to the CLT?

So, how exactly am I supposed to use the CLT theorem?

I also had this exact question, and the CLT appears not to have a purpose other than stating the fact that the SDSM has a Gaussian distribution.

Hey @Tom_Pham,
Thanks a lot for your detailed answer.

Good to know about this.

I would like to disagree with this. Consider this example. Let’s say we know that the population follows a Gaussian distribution, or let’s say that it’s safe to assume that the population follows a Gaussian distribution, but we don’t know the parameters, and we want to estimate those. In this case, we know the relationship between the means & variances of the SDSM and the population distribution. So, we can compute the mean & variance of the SDSM, and using the formulae, we can derive the mean & variance of the population distribution.

And if we have the mean and variance of a Gaussian distribution, I believe we can derive a whole lot of other things as well. However, my question concerns the scenario where we don’t know the distribution of the population, or where we can’t assume it is Gaussian. In that case, how are we supposed to use the CLT?
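One common use in that scenario is an approximate confidence interval for the population mean built from a single sample. The sketch below assumes the sample is large enough for the normal approximation to be reasonable and that the draws are independent; `mean_confidence_interval` is a hypothetical helper, and the exponential data only stands in for an "unknown" population.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the population mean via the CLT normal approximation."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)       # sample estimate of the population sigma
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / math.sqrt(n)  # z times the estimated sigma_x_bar
    return x_bar - half_width, x_bar + half_width

rng = random.Random(1)
# Hypothetical skewed data; in practice this would be the observed sample
sample = [rng.expovariate(0.5) for _ in range(400)]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

Nothing in the function needs the shape of the population; it only uses the sample mean, the sample standard deviation, and n. For small samples a t-based interval is the usual substitute.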

“Population” refers here to the data (or dataset) from which samples are drawn.

Yes, you are correct. When certain conditions are met, the CLT states that the SDSM will follow a Gaussian distribution.

I believe I have already defined one of the use-cases of the CLT theorem, i.e., when we know which kind of distribution the population follows, and we want to estimate the parameters for the population’s distribution, we can use the CLT theorem to achieve that.

Please do let me know if you have any queries regarding this.

Hey @Tom_Pham,
I can’t even recall if I saw any mentions of these formulae in the lecture videos. However, I did find the same for 3 distributions in the ungraded lab for Week 3. Let me quote them here, for your reference.