Question about the Bayesian inference and MAP reading

Hi, I have a quick question about the constant used when calculating the posterior. When getting 7 heads out of 10 tosses, why does P_X((1,1,1,1,1,1,1,0,0,0)) lead to a constant of 11!/(7!3!) rather than 10!/(7!3!)? Similarly, after 5 more tosses with 3 more heads, the constant is said to be 16!/(10!5!) rather than 15!/(10!5!). Please help me figure this out, thanks!

screenshot 1:

screenshot 2:

Hey @Eddy_J!

You are correct. In both cases a 1 is being added for no apparent reason.

@magdalena.bouza, can you take a look at it when you have a chance?


Hi @lucas.coutinho
Why haven’t we included the complete probability of getting 7 heads out of 10 flips, i.e. 10C7 / 2^10?

The expression in the note is correct because the normalizing constant for that beta distribution is B(8, 4) = 7!3!/11!. See the video “The Beta distribution in 12 minutes!” from Serrano.Academy:

In the case of the example for continuous parameter and continuous data, in the section called “Continuing the example”, after grouping the exponents it is shown that the exponents of (theta, 1 - theta) are (10, 5). This implies a beta distribution with normalizing constant B(10 + 1, 5 + 1) = B(11, 6) = 10!5!/16! (using the Gamma function, since Gamma(n + 1) = n! for a non-negative integer n). See: 1) Beta distribution - Wikipedia, and, 2) Gamma function - Wikipedia.
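To make the two constants concrete, here is a minimal Python sketch that evaluates B(a + 1, b + 1) both through factorials and through the Gamma function. The helper names `beta_const` and `beta_via_gamma` are made up for this illustration and are not from the reading.

```python
from fractions import Fraction
from math import factorial, gamma, isclose


def beta_const(a: int, b: int) -> Fraction:
    """Exact B(a + 1, b + 1) = a! b! / (a + b + 1)! for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def beta_via_gamma(a: int, b: int) -> float:
    """Same quantity via Gamma(a + 1) Gamma(b + 1) / Gamma(a + b + 2)."""
    return gamma(a + 1) * gamma(b + 1) / gamma(a + b + 2)


first = beta_const(7, 3)    # 7 heads, 3 tails   -> B(8, 4)
second = beta_const(10, 5)  # 10 heads, 5 tails  -> B(11, 6)

print(first, second)
print(isclose(beta_via_gamma(7, 3), float(first)))
```

The denominators are (a + b + 1)!, i.e. 11! and 16!, which is where the "extra one" in the question comes from.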

@acamachodepass, the linked video simply mentions the Beta distribution but doesn’t explain what’s behind the B(a+1,b+1) notation (it just mentions that it’s the area).

So in the end it doesn’t explain why pX((1,1,1,1,1,1,1,0,0,0)) is replaced by “7!3!/11!”.

The notation for beta is B(a+1,b+1), and in this reading we have “a!b!/(a+b+1)!”; it’s not at all clear what led to that.

I also asked ChatGPT4 (which of course may be wrong and hallucinate). At first it stated there must be an error in the article, and even when I pushed the point that a Beta distribution is most likely involved, it remained sceptical:

When thinking about the normalizing constant, you can think of it as accounting for every possible way the data could have come about (in other words, what’s the probability that we saw the data, regardless of which coin produced it?). In the previous case, when the prior and the data were discrete, we only had 2 possible options.

Either we picked the fair coin with probability 0.75 AND we got 7 heads and 3 tails, OR we picked the biased coin with probability 0.25 AND got 7 heads and 3 tails. Our normalizing constant in this case would then be

p_X(x) = 0.5^7(1-0.5)^3 • 0.75 + 0.8^7(1-0.8)^3 • 0.25
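As a quick check of the two-coin case, the sum above can be evaluated directly. This is only a sketch: the variable names are invented here, and the numbers (fair coin with prior 0.75, biased coin with theta = 0.8 and prior 0.25) are the ones used in this post.

```python
from math import isclose


def likelihood(theta: float, heads: int = 7, tails: int = 3) -> float:
    """Probability of one specific sequence with the given head/tail counts."""
    return theta**heads * (1 - theta)**tails


prior_fair, prior_biased = 0.75, 0.25

# Normalizing constant: sum over both coins of likelihood times prior.
evidence = likelihood(0.5) * prior_fair + likelihood(0.8) * prior_biased

# Posterior probability of each coin given the observed sequence.
post_fair = likelihood(0.5) * prior_fair / evidence
post_biased = likelihood(0.8) * prior_biased / evidence

print(evidence, post_fair, post_biased)
print(isclose(post_fair + post_biased, 1.0))
```

Dividing by `evidence` is what makes the two posterior probabilities add up to 1.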

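Now for our current problem, we have infinitely many options for p (or theta). Our p_X(x) would begin to look something like

p_X(x) = .01^7(1-0.01)^3 + .02^7(1-0.02)^3 + .03^7(1-0.03)^3 + .04^7(1-0.04)^3 + infinitely many more of these values for theta

with each term weighted by the prior probability of its small slice of theta values (just the width of the slice, assuming a uniform prior on [0, 1], which is what the 7!3!/11! result implies). When we see infinite sums like this we can start thinking about integrals. The integral of this would be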

Integral from 0 to 1 of θ^7(1-θ)^3 dθ


This gives us 1/1320, which is 7!3!/11!.
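If you want to see the "infinite sum becomes an integral" step numerically, here is a small sketch using a midpoint Riemann sum. It assumes a uniform prior on [0, 1]; the function name `midpoint_integral` and the number of slices are arbitrary choices for this illustration.

```python
def midpoint_integral(f, n: int = 100_000) -> float:
    """Approximate the integral of f over [0, 1] with n equal-width slices."""
    width = 1.0 / n
    # Evaluate f at the middle of each slice and weight it by the slice width.
    return sum(f((i + 0.5) * width) for i in range(n)) * width


def integrand(theta: float) -> float:
    """Likelihood of the observed sequence: 7 heads and 3 tails."""
    return theta**7 * (1 - theta)**3


approx = midpoint_integral(integrand)
exact = 1 / 1320

print(approx, exact)
```

Each term in the sum is the likelihood at one theta times the width of its slice, which is the weighted sum over "infinitely many" coins in the limit of many slices.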

I don’t know the exact reason or strategy used to get these factorials, but this should give us the same solution. So these numbers are NOT incorrect if I am thinking about this correctly.
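For anyone wondering where the factorials come from, one standard route is repeated integration by parts. The sketch below uses my own shorthand I(a, b) for the integral (it is not notation from the reading) and assumes a and b are non-negative integers.

```latex
\begin{align*}
I(a,b) &= \int_0^1 \theta^{a}(1-\theta)^{b}\,d\theta \\
       &= \left[\frac{\theta^{a+1}}{a+1}(1-\theta)^{b}\right]_0^1
          + \frac{b}{a+1}\int_0^1 \theta^{a+1}(1-\theta)^{b-1}\,d\theta
        = \frac{b}{a+1}\,I(a+1,\,b-1) \qquad (b \ge 1) \\
I(a,b) &= \frac{b}{a+1}\cdot\frac{b-1}{a+2}\cdots\frac{1}{a+b}\;I(a+b,\,0)
        = \frac{a!\,b!}{(a+b)!}\cdot\frac{1}{a+b+1}
        = \frac{a!\,b!}{(a+b+1)!} \\
I(7,3) &= \frac{7!\,3!}{11!} = \frac{1}{1320},
\qquad
I(10,5) = \frac{10!\,5!}{16!} = \frac{1}{48048}
\end{align*}
```

The extra "+1" in the denominator comes from the last step, the integral of θ^(a+b) from 0 to 1, which equals 1/(a+b+1). The binomial coefficient 10!/(7!3!) counts orderings of 7 heads among 10 tosses, which is a different quantity: here the data is one specific sequence, and even if that coefficient were included in the likelihood, it would appear in both the numerator and the normalizing constant and cancel in the posterior.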