Reading: Bayesian inference and MAP_Query

The term “normalizing constant” is mentioned in the Bayesian formula, and I have some questions about it:

  1. How do you calculate it?
  2. Why does it appear in the formula?

Thanks for any help or references regarding this topic!

Hello @Yuhan_Chen,

I don’t mentor this course, so I’m not sure how the course teaches it, but I can try to give some simple responses to your questions.

  1. Considering the discrete case, the formula in the denominator tells you how to compute it.


    Notice that the numerator is just one of the terms summed in the denominator: the numerator is the k=i term, and the denominator is the sum over all k. That is why the denominator is interpreted as a normalization: with it, summing P(B=b_i|A) over all i gives a total probability of one.

  2. It comes from the theorem itself. The theorem amounts to the fact that you can factor the joint probability P(A, B) in two different ways: P(A, B) = P(B|A)P(A) = P(A|B)P(B). Keep only the middle and right-hand parts, divide both sides by P(A), and you get P(A) in the denominator.
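
To make the two points above concrete, here is the standard discrete form of Bayes' theorem written in the b_k notation used above. This is a sketch of the usual textbook statement, assuming B takes one of the mutually exclusive values b_1, ..., b_n; it is not necessarily the exact formula shown in the course material.

```latex
% Two factorizations of the joint probability
P(A, B = b_i) = P(B = b_i \mid A)\, P(A) = P(A \mid B = b_i)\, P(B = b_i)

% Law of total probability gives the normalizing constant
P(A) = \sum_{k} P(A, B = b_k) = \sum_{k} P(A \mid B = b_k)\, P(B = b_k)

% Dividing the factorization by P(A)
P(B = b_i \mid A)
  = \frac{P(A \mid B = b_i)\, P(B = b_i)}
         {\sum_{k} P(A \mid B = b_k)\, P(B = b_k)}

% Summing over i: numerator and denominator become the same sum
\sum_{i} P(B = b_i \mid A) = 1
```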

We can base our discussions on my response, but if you want to focus on the lectures, just let us know and I will pass it back to the M4ML mentors.

Cheers,
Raymond

Thanks for your explanation! It did help me understand this topic!

Hi Yuhan,

Let me explain your second question: Why is the normalizing constant there?
To clarify the need for the ‘normalizing constant’ on the right-hand side of Bayes’ theorem, let’s see what happens if we ignore it.

Consider the equation for Bayes’ theorem without the normalizing constant:
P(B_i | A) = P(A | B_i) P(B_i)

Now, let’s sum both sides of the above equation over all i:
Σ_i P(B_i | A) = Σ_i P(A | B_i) P(B_i)

You can easily see that it does not make sense. Why?
On the left side, the sum clearly equals 1, since we are covering all possible values of B given that A occurred. On the right side, summing the products P(A|B_i) P(B_i) over all i gives P(A) by the law of total probability, and P(A) is generally less than 1. So the two sides do not match unless the right-hand side is divided by P(A). That is why you need a ‘normalizing constant’ on the right-hand side.

Let’s go back to your first question: How to calculate it? The easiest way to explain it (even though not the smartest way) is like this:
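
A minimal numeric sketch of the calculation, in Python. The numbers below are made up purely for illustration (they are not from the course), and the variable names `prior`, `likelihood`, `joint`, `evidence`, and `posterior` are my own choices. B is assumed to take three mutually exclusive values.

```python
# Hypothetical numbers, chosen only for illustration.
prior = [0.5, 0.3, 0.2]        # P(B_i); must sum to 1
likelihood = [0.1, 0.4, 0.8]   # P(A | B_i) for each i

# Unnormalized terms: P(A | B_i) * P(B_i) for each i
joint = [l * p for l, p in zip(likelihood, prior)]

# Normalizing constant: P(A) = sum over i of P(A | B_i) * P(B_i)
evidence = sum(joint)

# Posterior: divide each unnormalized term by the constant
posterior = [j / evidence for j in joint]

print("sum of unnormalized terms (this is P(A)):", evidence)
print("posterior P(B_i | A):", posterior)
print("sum of posterior:", sum(posterior))
```

With these numbers the unnormalized terms add up to about 0.33 rather than 1; after dividing by that value, the posterior adds up to 1 (up to floating-point rounding).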

I hope this will help you.