CDF of a random variable X (discrete or continuous) follows a uniform distribution

In the final programming assignment of Week 1 of the course Probability & Statistics for ML and Data Science, it is stated that:

“In the lectures, you learned that if 𝑋 is a random variable with CDF 𝐹, then 𝐹(𝑋) follows a uniform distribution between 0 and 1. In other words, the new random variable 𝐹(𝑋) will be uniformly distributed between 0 and 1.”

I re-watched the video “Sampling from distribution”, as suggested in the response to the same doubt posted by another student.

What I am concerned about and where I am getting confused is the following line
with which the video begins:
“Computers generate random numbers from uniform distribution in the given interval.”

Specifically, why should we take random numbers that are distributed uniformly?

Reference in the lab/assignment:
The following line is the beginning of the cell “Exercise 1: Uniform Generator” in the assignment:
“The natural first step is to create a function capable of generating random data that comes from the uniform distribution.”

Also, I might understand this better if someone could highlight the issues I would face if I don’t take random numbers from a “uniform distribution”.

Keeping that aside for a while, here is what happens in the video: the instructor takes the random numbers (probabilities between 0 and 1), finds the corresponding x values on the horizontal axis (which the assignment terms finding F^(-1)(Y)), and finally says that those x values obey the distribution we chose.

Someone please help me understand this. I had no doubts at all until this point.
Thanks.

Hi @Devarapalli_Vamsi, great question. When we take random samples from a distribution, we want numbers that obey that distribution. In other words, you want numbers that have an equal chance of being selected, like the outcomes of a die, but you want to imitate probabilities, not outcomes. This is a way to mimic real-world behavior: if you count the outcomes of a die over 6 trials, it is unlikely that every possibility appears the same number of times, but if you run 100,000 trials, the chance of each possibility appearing in roughly equal proportion increases.

I hope this helps

Sorry @pastorsoto, I didn’t get you. Could you please elaborate?

Your question touches on a fundamental concept in statistics and computer simulations: the use of uniformly distributed random numbers as a basis for generating random samples from other distributions.

Firstly, the reason we often start with uniformly distributed random numbers is due to the way computers generate randomness. Most computer-generated random numbers are actually pseudorandom, meaning they are produced by deterministic algorithms. These algorithms are designed to produce numbers that behave as if they are random. For practical purposes, these numbers are generated uniformly across a specific range, typically between 0 and 1.
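
To make “pseudorandom but uniform” concrete, here is a minimal sketch of a linear congruential generator, one of the simplest pseudorandom schemes. This is only an illustration, not the generator the course or NumPy actually uses, and the constants are my own choice (the classic Numerical Recipes ones):

```python
# A minimal linear congruential generator (LCG). The constants below are the
# classic Numerical Recipes choices; they are an illustrative assumption, not
# necessarily what the course or NumPy uses internally.

def lcg_uniform(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudorandom floats in [0, 1) generated deterministically from `seed`."""
    state = seed
    samples = []
    for _ in range(n):
        state = (a * state + c) % m  # deterministic integer update
        samples.append(state / m)    # scale to [0, 1): approximately uniform
    return samples

print(lcg_uniform(seed=42, n=5))
```

The sequence is completely determined by the seed, yet over many draws the values spread out roughly evenly over [0, 1), which is exactly the property the sampling trick below relies on.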

Now, why is this uniform distribution useful? It’s because of its simplicity and the fact that it covers the entire range of probabilities (from 0 to 1). This uniformity allows us to map these random values to any other probability distribution using various techniques.

Regarding the specific concept you mentioned, the Cumulative Distribution Function (CDF), let’s delve a bit deeper:

  • The CDF, 𝐹(𝑥), of a random variable 𝑋 gives the probability that 𝑋 will take a value less than or equal to 𝑥. It’s a function that grows from 0 to 1 as 𝑥 moves across the range of possible values.

  • If you take a random variable 𝑌 that is uniformly distributed between 0 and 1, and apply the inverse CDF of 𝑋 to it (denoted as 𝐹^(-1)(𝑌)), you effectively transform 𝑌 into a random variable that follows the distribution of 𝑋.

This transformation is crucial in statistical simulations. It allows us to generate random samples from any probability distribution, given that we can generate uniformly distributed random numbers and we know the CDF of the target distribution.
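
To make the inverse-CDF idea concrete, here is a small sketch of inverse transform sampling for an Exponential(λ) target, assuming NumPy is available. It is my own example, not the assignment’s solution:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0  # rate of the target Exponential(lam) distribution

# Step 1: draw Y ~ Uniform(0, 1).
y = rng.uniform(0.0, 1.0, size=100_000)

# Step 2: apply the inverse CDF of the target distribution.
# For Exponential(lam): F(x) = 1 - exp(-lam * x), so F^{-1}(y) = -ln(1 - y) / lam.
x = -np.log(1.0 - y) / lam

# The transformed samples behave like Exponential(lam) draws:
print(x.mean())  # close to the theoretical mean 1 / lam = 0.5
```

This is the same procedure the video walks through graphically: pick a uniform value on the vertical axis of the CDF and read off the corresponding x on the horizontal axis.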

If you don’t start with a uniform distribution, the transformation becomes much more complex and less general. For example, if you start with a distribution that is not uniform, you would first need to understand and mathematically characterize that distribution before you could map it to another distribution. This is often impractical or impossible with arbitrary non-uniform distributions.
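
To illustrate what goes wrong without a uniform starting point, here is a quick sketch (again my own illustration, assuming NumPy) that feeds the same inverse CDF once with uniform inputs and once with non-uniform, Beta-distributed inputs; only the uniform inputs reproduce the target distribution’s mean:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
inv_cdf = lambda y: -np.log(1.0 - y) / lam  # inverse CDF of Exponential(lam)

u = rng.uniform(0.0, 1.0, size=100_000)  # uniform inputs: the correct recipe
b = rng.beta(5.0, 2.0, size=100_000)     # non-uniform inputs on (0, 1)

print(inv_cdf(u).mean())  # close to the true mean 1 / lam = 0.5
print(inv_cdf(b).mean())  # noticeably larger: P(Y <= F(x)) != F(x) when Y is not uniform
```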

In summary, starting with uniformly distributed random numbers is a matter of practicality and simplicity. It allows for a standardized way to simulate random samples from virtually any probability distribution, which is a cornerstone technique in statistics, machine learning, and data science.

Let me know if this helps

Thanks for your lucid explanation @pastorsoto; your response cleared my doubt and it’s crystal clear now. Thanks again.

Hey @Deepti_Prasad, thanks for your additional example-oriented explanation!
It is much clearer now.
And yeah, you didn’t confuse me. :blush:

Attaching the response that I got on Stack Exchange. Thought this might be helpful for the community.

I had explained it the same way :slight_smile:

Happy that the doubt you had is cleared now.

Happy Learning!!!

Regards
DP