Reading: Bayesian inference and MAP_Query

Yuhan_Chen · November 4, 2023, 11:48am

The term “normalizing constant” is mentioned in Bayes’ formula, and I have some questions about it:

How is it calculated?
Why does it appear in the formula?

Thanks for any help or references regarding this topic!

rmwkwok · November 4, 2023, 12:04pm

I don’t mentor this course, so I’m not sure how the course teaches it, but I can try to give some simple responses to your questions.

Considering the discrete case, the formula in the denominator tells you how to compute it.

[Image: Bayes’ theorem for the discrete case]

Notice that the numerator is just one of the cases that are summed over in the denominator: the numerator is the term with k=i, and the denominator is the sum over all k. That is why the denominator is interpreted as a normalization: with it, summing P(B=b_i|A) over all i gives a total probability of one.
As for why it is there: it comes from the theorem itself. The theorem just says that you can factor P(A, B) in two different ways: P(A, B) = P(B|A)P(A) = P(A|B)P(B). Keep only the middle and the right parts, move P(A) to the other side, and you get P(A) in the denominator.
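
For reference, the discrete form described above can be written out as follows. This is only a sketch in the b_i, b_k notation used in this reply, assuming B takes one of n possible values (n is an assumed symbol, not from the original post); the formula in the image may be typeset differently.

```latex
% Bayes' theorem, discrete case: the numerator is the k = i term of the denominator
P(B=b_i \mid A) = \frac{P(A \mid B=b_i)\,P(B=b_i)}{\sum_{k=1}^{n} P(A \mid B=b_k)\,P(B=b_k)}

% The denominator equals P(A), so summing the posterior over all i gives one
\sum_{i=1}^{n} P(B=b_i \mid A)
  = \frac{\sum_{i=1}^{n} P(A \mid B=b_i)\,P(B=b_i)}{\sum_{k=1}^{n} P(A \mid B=b_k)\,P(B=b_k)}
  = 1
```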

We can base our discussions on my response, but if you want to focus on the lectures, just let us know and I will pass it back to the M4ML mentors.

Cheers,
Raymond

Yuhan_Chen · November 6, 2023, 6:22am

Thanks for your explanation! It really helped me understand this topic!

taekyo_lee · November 8, 2023, 8:36am

Hi Yuhan,

Let me explain your second question: Why is the normalizing constant there?
To clarify the need for the ‘normalizing constant’ on the right-hand side of Bayes’ theorem, let’s see what happens if we ignore it.

Consider the equation for Bayes’ theorem without the normalizing constant:

Now, let’s take the sigma on both sides of the above equation:

You can easily see that it does not make sense. Why?
On the left side, the sum clearly equals 1, since we are covering all possible B scenarios given A. On the right side, summing the products P(A|B_i)P(B_i) over all i gives P(A) by the law of total probability, and P(A) is generally less than 1. That is why you need a ‘normalizing constant’ on the right-hand side.
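
Written out, the two equations described above would presumably look like the following. This is a sketch based on the surrounding text, assuming B takes the values B_1, ..., B_n (n is an assumed symbol); the first line is the deliberately incorrect version with the normalizing constant dropped.

```latex
% Bayes' theorem with the normalizing constant dropped (not a valid identity)
P(B_i \mid A) \overset{?}{=} P(A \mid B_i)\,P(B_i)

% Summing both sides over i: the left side is 1, the right side is P(A)
\underbrace{\sum_{i=1}^{n} P(B_i \mid A)}_{=\,1}
  \overset{?}{=}
\underbrace{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}_{=\,P(A)}
```

The two sides agree only when P(A) = 1; dividing the right-hand side by P(A) restores the equality in general.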

Let’s go back to your first question: How to calculate it? The easiest way to explain it (even though not the smartest way) is like this:
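
One way to carry out the calculation, as a minimal sketch rather than the course's own method, is to apply the law of total probability directly. The function names and the numbers below are hypothetical and chosen only for illustration.

```python
def normalizing_constant(likelihoods, priors):
    """Return P(A) = sum over k of P(A | B_k) * P(B_k)."""
    return sum(l * p for l, p in zip(likelihoods, priors))


def posterior(likelihoods, priors):
    """Return the list of P(B_i | A) for every i, using Bayes' theorem."""
    p_a = normalizing_constant(likelihoods, priors)
    return [l * p / p_a for l, p in zip(likelihoods, priors)]


# Hypothetical example: B has three possible values.
priors = [0.5, 0.3, 0.2]       # P(B_k); these sum to 1
likelihoods = [0.1, 0.4, 0.8]  # P(A | B_k); these need not sum to 1

p_a = normalizing_constant(likelihoods, priors)
post = posterior(likelihoods, priors)

print("P(A) =", p_a)                    # sum of the unnormalized products
print("posterior =", post)
print("posterior sums to", sum(post))   # 1, up to floating-point rounding
```

With these numbers the unnormalized products add up to about 0.33 rather than 1, and dividing each product by that value is what makes the posterior probabilities sum to one.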

I hope this will help you.
