What even is a PDF Week # 1

Learning about distributions has never come easy to me. I fully understand a PMF. But when it comes to a PDF, here are my questions:

  1. I understand that on the y axis, we have probability density rather than probability. But what even is a probability density? If the graph has more height somewhere, meaning it has more density, what does that mean?

and then working on this my 2nd question is:

  2. Why is the area under the PDF equal to 1? I know the area represents the probability, but why? I suppose what I am asking is: what is the relationship between area (probability), density, and the values of X (the random variable)?

Hi @i200660_Mirza_Ubaidu

Think of probability density as a measure of how much probability is concentrated around a particular value: it is probability per unit of the variable, not a probability itself.

When the graph of a PDF has more height at a certain point, it shows that the probability of the random variable falling within a small interval around that point is higher compared to other points. However, the height of the PDF alone does not represent the probability. Instead, it represents the relative likelihood of observing different values of the random variable. The actual probability is obtained by considering the area under the curve.

The total area under the PDF curve represents the total probability of all possible outcomes, which must sum to 1. This concept is similar to the idea that the sum of probabilities of all possible outcomes in a probability distribution must equal 1. In the case of continuous random variables, since there are infinitely many possible outcomes within any interval, we use integration to calculate the total probability.

The relationship between the area, density, and the values of the random variable is such that the density function assigns higher density (height) to regions where the probability of observing the random variable is higher, and the total probability (area) under the curve is normalized to 1.
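As a rough illustration of the difference between height and area, here is a minimal sketch using only Python's standard library. The normal distribution with mean 0 and standard deviation 0.1 is an assumption picked for the example, and the helper names `normal_pdf` and `normal_cdf` are hypothetical. It shows that a density can be larger than 1 at its peak while the probability of any interval, which is an area, never exceeds 1.

```python
import math

def normal_pdf(x, mu=0.0, sigma=0.1):
    """Density of a normal distribution at x (a height, not a probability)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=0.1):
    """P(X <= x) for the same normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Height of the curve at its peak: this is a density and may exceed 1.
peak_height = normal_pdf(0.0)

# Probability of landing in [-0.05, 0.05]: the area under the curve there.
prob_near_peak = normal_cdf(0.05) - normal_cdf(-0.05)

# Area under (practically) the whole curve: the total probability.
total_prob = normal_cdf(10.0) - normal_cdf(-10.0)

print(f"peak height (density):    {peak_height:.3f}")
print(f"P(-0.05 <= X <= 0.05):    {prob_near_peak:.3f}")
print(f"total area under the PDF: {total_prob:.3f}")
```

The peak height comes out well above 1, which would be impossible for a probability, while both areas stay within the range 0 to 1.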


Hope this helps. If you have any further questions, feel free to ask.


Hello @i200660_Mirza_Ubaidu,

Besides @Alireza_Saei’s excellent explanation, let me also share a slightly different view of that.

We can use probability for discrete variables, but we can only use probability density for continuous variables.

For discrete variables like “coin’s side”, we can list all possible sides (heads and tails) and assign a probability value to each of them. However, we can’t do the same for continuous variables like “height”, because we can’t list all possible heights: there are infinitely many of them.

If you think further about the height example, how would you assign a probability value to a height of exactly 1.712000 m? Between 1.712000 m and 1.713000 m there are infinitely many height values, so how could we assign a probability value to each of them? We know some people must have those heights, because anyone who will grow taller than 1.713 m in the future could have one of those heights now.

Therefore, we take a different approach. Instead of asking for the probability of each height value, we ask for the probability of being in an interval of heights.

For example, we say 10% of people fall in the interval 1.712 m → 1.713 m. Now, instead of presenting an infinite number of probability values, we present just one.

However, there is always finer structure within the interval, which means that if the interval is too large, we lose some precision in describing the variation within it.

So, what is the best size for the interval? We go finer. We go smaller. We approach zero size.

The size approaches zero, but it is still non-zero. In other words, we expect intervals like this:

1.7120(…many zeros…)00 → 1.7120(…many zeros…)01 = 0.0(…many zeros…)01%
1.7120(…many zeros…)01 → 1.7120(…many zeros…)02 = 0.0(…many zeros…)02%


1.7129(…many nines…)01 → 1.7130(…many zeros…)00 = 0.0(…many zeros…)005%

You see, the closer the interval size gets to zero, the smaller the probability value becomes, because each interval covers fewer counts. This starts to make the probability values hard to comprehend.

How could we make it more comprehensible? We “normalize” each interval by its size! In other words, we divide the probability value by the size of the interval: a very small number divided by another very small number gives us a much larger number. For example:

$$\frac{0.0(...\text{many zeros}...)01\%}{1.7120(...\text{many zeros}...)01 - 1.7120(...\text{many zeros}...)00} = \frac{0.0(...\text{many zeros}...)01\%}{0.0000(...\text{many zeros}...)01} = \frac{\text{probability in the interval}}{\text{interval size}} = 10000$$

The final value 10000 depends on how many zeros there actually are in the numerator and the denominator, so this 10000 is just an example.

Now, this 10000 is more comprehensible than $0.0(...\text{many zeros}...)01\%$.

Just like you get the population density when you divide the population of a city by the city’s area, you get the probability density when you divide the probability of an interval by the interval’s size.
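The "probability divided by interval size" idea can be tried out numerically. The sketch below is only an illustration: it assumes heights follow a normal distribution with mean 1.70 m and standard deviation 0.07 m (made-up numbers, not real data), and the names `cdf` and `pdf` are hypothetical helpers. It shrinks an interval starting at 1.712 m and prints both the probability of the interval and that probability divided by the interval's size.

```python
import math

# Assumed, purely illustrative height distribution (meters).
MU, SIGMA = 1.70, 0.07

def cdf(x):
    """P(height <= x) under the assumed normal distribution."""
    return 0.5 * (1.0 + math.erf((x - MU) / (SIGMA * math.sqrt(2.0))))

def pdf(x):
    """Probability density at x under the assumed normal distribution."""
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))

start = 1.712
widths = [0.1, 0.01, 0.001, 1e-6]  # interval sizes, getting closer to zero

# Probability of each interval [start, start + width].
probs = [cdf(start + w) - cdf(start) for w in widths]

# Normalize each probability by its interval size.
ratios = [p / w for p, w in zip(probs, widths)]

for w, p, r in zip(widths, probs, ratios):
    print(f"width={w:g}  probability={p:.3e}  probability/width={r:.4f}")
print(f"pdf({start}) = {pdf(start):.4f}")
```

As the interval shrinks, the probability itself heads toward zero, but the ratio settles on a stable value that matches the density at 1.712 m. The widest interval gives a noticeably different ratio, which is the loss of precision that comes with intervals that are too large.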

In conclusion, assigning a probability to each value is not feasible for a continuous variable, and the probabilities of close-to-zero intervals are too tiny to comprehend. We therefore turn to probability density, which is normalized by interval size, to get a more comprehensible description!

More density → More probability in the corresponding interval. Since all intervals have the same close-to-zero size, when you compare the densities, you are effectively comparing the probabilities.

From the above, we know that probability density is probability in a close-to-zero interval divided by the interval’s size, so in reverse, we have

$$\text{Area} = \text{probability density} \times \text{interval size} = \text{probability in the interval}$$

This relates area back to probability.

If you sum the areas of all intervals up, you get the total probability which is 1.
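To see this sum in action, here is a minimal sketch that adds up density × interval size over many small intervals. It again assumes a normal distribution of heights with mean 1.70 m and standard deviation 0.07 m (illustrative numbers only), and `total_area` is a hypothetical helper that evaluates the density at the midpoint of each interval.

```python
import math

# Assumed, purely illustrative height distribution (meters).
MU, SIGMA = 1.70, 0.07

def pdf(x):
    """Probability density at x under the assumed normal distribution."""
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))

def total_area(n_intervals, lo=MU - 8 * SIGMA, hi=MU + 8 * SIGMA):
    """Sum of (density at interval midpoint) * (interval size) over [lo, hi]."""
    width = (hi - lo) / n_intervals
    return sum(pdf(lo + (i + 0.5) * width) * width for i in range(n_intervals))

area = total_area(10_000)
print(f"sum of density * interval size = {area:.6f}")
```

The range of 8 standard deviations on each side is a practical stand-in for "all possible heights", since the probability outside it is negligible under this assumption. The printed sum should be very close to 1.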

Cheers,
Raymond
