Hi All,
I have a question related to a piece of code in C3_W3_Lab_1_Central_Limit_Theorem:
def sample_means...
# Get a sample of the data WITH replacement
sample = np.random.choice(data, size=sample_size)
The comments stress that sampling “WITH replacement” is very important for ensuring independence:
- You take random samples out of the population (the sampling is done with replacement, which means that once you select an element you put it back in the sampling space so you could choose a particular element more than once). This ensures that the independence condition is met.
As I understand this NumPy call, if we have [1, 2, 3, 4, 5] then [1, 2, 2] is a possible sample of size 3, with “2” selected twice within the same sample. But wasn't the point in the lectures that values may repeat across different samples, so that the samples are independent of each other?
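To make the distinction concrete, here is a minimal sketch of the two modes of np.random.choice on the toy array above. The seed and variable names are my own and are not taken from the lab.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability
data = np.array([1, 2, 3, 4, 5])

# Default is replace=True: an element can be drawn more than once
# within a single sample, so something like [1, 2, 2] is possible.
with_repl = np.random.choice(data, size=3)

# replace=False: each element can appear at most once within a sample.
without_repl = np.random.choice(data, size=3, replace=False)

print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

With replace=False the three drawn values are always distinct; with the default they may or may not be.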
If I change the code to use np.random.choice(…, replace=False), the distribution of the sample means still looks pretty Gaussian to me (although the sampling takes longer).
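For reference, this is roughly the comparison I ran, as a self-contained sketch. The population (an exponential one), the sizes, and the helper name sample_means_sketch are my own assumptions and are not the lab's actual code.

```python
import numpy as np

def sample_means_sketch(data, sample_size, n_samples, replace):
    """Hypothetical helper: draw n_samples samples and return their means."""
    return np.array([
        np.random.choice(data, size=sample_size, replace=replace).mean()
        for _ in range(n_samples)
    ])

np.random.seed(0)  # arbitrary seed, only for repeatability

# Assumed population: skewed on purpose, so it is clearly not Gaussian itself.
population = np.random.exponential(scale=2.0, size=10_000)

means_with = sample_means_sketch(population, 30, 1_000, replace=True)
means_without = sample_means_sketch(population, 30, 1_000, replace=False)

# Compare centre and spread of the two sets of sample means.
print("population mean:      ", population.mean())
print("with replacement:     ", means_with.mean(), means_with.std())
print("without replacement:  ", means_without.mean(), means_without.std())
```

Plotting a histogram of means_with and of means_without is how I compared the two shapes by eye.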
What am I missing here?
Thanks
Pavel