The law of large numbers says that the larger n is, the better. However, the lectures say that the population should be at least 20 times the sample size to guarantee independence. So what is a good value for n?

I know that in practice it is expensive to collect a large n, so we look for the smallest n that works, but what if collecting data is not a problem?

Hello @Jessica84 , your question is excellent and addresses one of the most challenging aspects when starting a statistical study. In the lectures, I understood that the value for the sample size (n) can vary, but it is crucial to keep in mind that the sample must be representative. If collecting the data is not an issue, in principle, we could consult the entire population, but in practice, this rarely happens due to limitations of time, resources, and feasibility.

However, in situations like the example of dice rolling, when we want to determine if a die is biased, we cannot analyze all possible rolls, as this quantity can be infinite. In this case, we resort to the Law of Large Numbers, which allows us to obtain a more precise approximation of the real information as we accumulate more information from rolls.

This sampling approach is an efficient way to deal with the practical impossibility of examining all possibilities while seeking reliable statistical evidence to make decisions or infer about a phenomenon.
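To make the dice example concrete, here is a minimal sketch, assuming a fair six-sided die simulated with Python's standard `random` module. The function name `running_mean_of_rolls` and the seed are just illustrative choices. It shows the sample mean settling near the expected value of 3.5 as the number of rolls grows.

```python
import random

def running_mean_of_rolls(n_rolls, seed=0):
    """Simulate n_rolls of a fair six-sided die; return the mean after each roll."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # one roll of a fair die (assumption: unbiased)
        means.append(total / i)     # sample mean of the first i rolls
    return means

means = running_mean_of_rolls(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: sample mean = {means[n - 1]:.4f} (expected 3.5)")
```

With a biased die the mean would settle on a different value, which is how a long run of rolls can reveal the bias.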

The more data we collect, the more confident we are about its statistics. So even in the case of an infinite population, like @bruno_ramos_martins mentioned, ideally n should be as high as we can justify. In other words, would a further increase in n make sense if the confidence gained is minuscule?
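A rough way to see the diminishing returns is to look at the margin of error of a sample proportion, which shrinks like 1/sqrt(n). The sketch below assumes the usual normal approximation with z = 1.96 (about 95% confidence) and a worst-case proportion of 0.5. The function name `margin_of_error` is only illustrative.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p with sample size n.

    Assumes independent observations and the normal approximation.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Each step multiplies n by 4, which only halves the margin of error.
for n in (100, 400, 1_600, 6_400):
    print(f"n = {n:>5}: margin of error = {margin_of_error(0.5, n):.4f}")
```

Going from 100 to 400 observations halves the margin of error, but so does going from 1,600 to 6,400, at a much higher cost in data.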

Hi Bruno,

In the Reading ‘Two sample test for proportions’ from week 4, within the conditions for the calculations to be satisfied it says the following:

- Each population size needs to be at least 20 times bigger than the sample size. This is necessary to ensure that all samples are independent.

That tells me that getting n as big as I can is not desirable. Could I say that my n should not be larger than 5% (or 10%) of the population size then?

Hi @Tom_Pham ,

In the Reading ‘Two sample test for proportions’ from week 4, within the conditions for the calculations to be satisfied it says the following:

- Each population size needs to be at least 20 times bigger than the sample size. This is necessary to ensure that all samples are independent.

That tells me that getting n as big as I can is not desirable. Could I say that my n should not be larger than 5% (or 10%) of the population size then?

@Jessica84 Now I think I understand your question better.

When we talk about the law of large numbers, using the example of rolling dice, we can observe that in this case, the larger the sample size, the better. This is because we are dealing with a random experiment that can be repeated as often as we like, so each roll is independent of the others, and collecting more data gives us more information about the probability distribution.

When conducting a statistical study and aiming to make inferences about a finite population, it is crucial to consider sample independence to obtain reliable results. The guideline that each population should be at least 20 times larger than the sample size is often suggested to ensure this independence.
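As a small illustration of the guideline, the sketch below checks the "population at least 20 times the sample" condition and also computes the finite population correction factor, sqrt((N - n) / (N - 1)). That factor is a standard result for sampling without replacement, but it is not taken from the course reading, so treat it as an added assumption. The function names and the population size of 10,000 are hypothetical.

```python
import math

def meets_20x_rule(population_size, sample_size):
    """True if the population is at least 20 times the sample (sample <= 5%)."""
    return population_size >= 20 * sample_size

def finite_population_correction(population_size, sample_size):
    """Factor that scales the standard error when sampling without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

N = 10_000  # hypothetical population size
for n in (100, 500, 2_000, 5_000):
    ok = meets_20x_rule(N, n)
    fpc = finite_population_correction(N, n)
    print(f"n = {n:>5}: 20x rule met = {ok}, correction factor = {fpc:.3f}")
```

When the sample is at most 5% of the population the factor stays close to 1, so treating the observations as independent changes little. For larger fractions the factor drops below 1, meaning the formulas that assume independence overstate the uncertainty rather than the larger sample being worse.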

However, it’s important to note that the specific size of the sample is not rigidly fixed at 5% or any other specific value. The ideal sample size depends on various factors, including the type of study, research objectives, and desired level of precision in estimations.

Selecting a sample that is a large fraction of a finite population does not make the data worse, but when we sample without replacement the observations are no longer approximately independent, so the standard formulas stop applying as written. Beyond that, the focus should be on ensuring the sample is representative, allowing for precise inferences and avoiding biases in the study.

To ensure sample independence with respect to the population, it is recommended to choose an appropriate sample size that suits the study’s requirements. This way, we can ensure that the results obtained in our statistical analyses will be valid and applicable to the entire population, leading to more robust and reliable conclusions. A representative and independent sample forms a solid foundation for drawing relevant conclusions about the population under study without distorting the results.
