Hi, I have a quick question about the constant used when calculating the posterior. When getting 7 heads out of 10 tosses, why is the constant for P_X((1,1,1,1,1,1,1,0,0,0)) 11!/(7!3!) rather than 10!/(7!3!)? Similarly, after 5 more tosses with 3 more heads, the constant is said to be 16!/(10!5!) rather than 15!/(10!5!). Please help me figure this out, thanks!

The expression in the note is correct because the normalizing constant for that beta distribution is B(8, 4) = 7!3!/11!. See the video "The Beta distribution in 12 minutes!" from Serrano.Academy: https://youtu.be/juF3r12nM5A?si=D9kMEX9JMidMg-Gt.

In the case of the example for continuous parameter and continuous data, in the section called “Continuing the example”, after grouping the exponents it is shown that the exponents of theta and 1 - theta are 10 and 5. This implies a beta distribution with normalizing constant B(10 + 1, 5 + 1) = B(11, 6) = 10!5!/16!, using the Gamma function: B(x, y) = Γ(x)Γ(y)/Γ(x + y), and Γ(n + 1) = n! for a non-negative integer n. See: 1) Beta distribution - Wikipedia, and, 2) Gamma function - Wikipedia.
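If it helps, here is a minimal Python sketch that checks both constants numerically. The function names are my own, made up for illustration, and it assumes the exponents are non-negative integers so that the Gamma function reduces to factorials.

```python
from fractions import Fraction
from math import factorial, gamma


def beta_from_factorials(a: int, b: int) -> Fraction:
    """B(a + 1, b + 1) written as a! b! / (a + b + 1)! (integer a, b only)."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def beta_from_gamma(x: float, y: float) -> float:
    """B(x, y) = Gamma(x) * Gamma(y) / Gamma(x + y), in floating point."""
    return gamma(x) * gamma(y) / gamma(x + y)


# 7 heads and 3 tails: exponents (7, 3), so the constant is B(8, 4).
first = beta_from_factorials(7, 3)
# 10 heads and 5 tails: exponents (10, 5), so the constant is B(11, 6).
second = beta_from_factorials(10, 5)

print(first, beta_from_gamma(8, 4))
print(second, beta_from_gamma(11, 6))
```

The exact fractions and the floating-point Gamma values should agree up to rounding, which is the "+1" shift between the exponents and the Beta arguments showing up in the factorials.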

@acamachodepass the linked video simply mentions the Beta distribution but doesn’t explain what’s behind the B(a+1,b+1) notation (it just mentions it’s the area).

So in the end it doesn’t explain why pX((1,1,1,1,1,1,1,0,0,0)) is substituted by “7!3!/11!”?

The notation for beta is B(a+1,b+1), and in this reading we have “a!b!/(a+b+1)!”; it’s not at all clear what led to that.

I’ve also asked ChatGPT4 (which of course may be wrong and hallucinate). At first it stated there must be an error in the article, and when I pushed that the “Beta” distribution is most likely involved, it still remained sceptical:

When thinking about the normalizing constant, you can think of it as accounting for every possible outcome that could happen (in other words, what’s the probability we saw the data regardless?). In the previous case, when the prior and the data were discrete, we only had 2 possible options.

Either we picked the fair coin with probability .75 AND we got 7 heads and 3 tails, OR we picked the biased coin with probability .25 AND got 7 heads and 3 tails. Our normalizing constant in this case would then be the sum of those two products: .75 × P(7 heads, 3 tails | fair coin) + .25 × P(7 heads, 3 tails | biased coin).

Now for our current problem, we have infinitely many options for p (or theta). Our p_X(x) would begin to look something like

p_X(x) = 0.01^7(1-0.01)^3 + 0.02^7(1-0.02)^3 + 0.03^7(1-0.03)^3 + 0.04^7(1-0.04)^3 + infinitely many more of these values for theta

Now when we see infinite sums we can start thinking about integrals. The integral of this would be

Integral from 0 to 1 of θ^7(1-θ)^3 dθ

This gives us 1/1320 which is 7!3!/11!
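For anyone who wants to check that value, here is a rough Python sketch. It assumes integer exponents, and the helper names are just ones I made up. It evaluates the integral two ways: exactly, by expanding (1-θ)^3 with the binomial theorem and integrating term by term, and approximately, with a midpoint Riemann sum (the "sum over many thetas" idea, with each term weighted by the width of its slice).

```python
from fractions import Fraction
from math import comb


def exact_integral(a: int, b: int) -> Fraction:
    """Integrate theta^a * (1 - theta)^b over [0, 1] exactly.

    (1 - theta)^b is expanded with the binomial theorem, and each term
    theta^(a + k) integrates to 1 / (a + k + 1) on [0, 1].
    """
    return sum(Fraction((-1) ** k * comb(b, k), a + k + 1) for k in range(b + 1))


def midpoint_sum(a: int, b: int, n: int = 10_000) -> float:
    """Approximate the same integral with n equal-width midpoint slices."""
    width = 1.0 / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * width  # midpoint of the i-th slice
        total += theta ** a * (1.0 - theta) ** b
    return total * width


exact = exact_integral(7, 3)
approx = midpoint_sum(7, 3)
print(exact, approx)
```

The exact result is a fraction, so it can be compared directly against 7!3!/11!, while the Riemann sum only matches it up to a small numerical error.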

I don’t know the exact reason or strategy used to get these factorials, but this should give us the same solution. So these numbers are NOT incorrect if I am thinking about this correctly.
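One possible route to the factorials, sketched here under the assumption that the exponents a and b are non-negative integers, is repeated integration by parts. The symbol I(a, b) is shorthand introduced only for this sketch; the reading may well get there differently, for example through the Gamma function.

```latex
\begin{aligned}
I(a,b) &:= \int_0^1 \theta^{a}(1-\theta)^{b}\,d\theta \\
       &= \left[\frac{\theta^{a+1}}{a+1}(1-\theta)^{b}\right]_0^1
          + \frac{b}{a+1}\int_0^1 \theta^{a+1}(1-\theta)^{b-1}\,d\theta
        = \frac{b}{a+1}\,I(a+1,\,b-1) \qquad (b \ge 1) \\
I(a,b) &= \frac{b}{a+1}\cdot\frac{b-1}{a+2}\cdots\frac{1}{a+b}\;I(a+b,\,0)
        = \frac{a!\,b!}{(a+b)!}\cdot\frac{1}{a+b+1}
        = \frac{a!\,b!}{(a+b+1)!} \\
I(7,3) &= \frac{7!\,3!}{11!} = \frac{1}{1320}
\end{aligned}
```

The boundary term vanishes at both endpoints, and the last integral is just the integral of θ^(a+b) over [0, 1], which is 1/(a+b+1). That extra factor is where the "+1" in the denominator comes from.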