Smoothed probability for Naive Bayes


Can someone offer an explanation (preferably a proof) of why the second term in the denominator should be “V_class”, the number of unique words in the vocabulary?

Here’s the equation:
P(w_i | class) = (freq(w_i, class) + 1) / (N_class + V_class)

Any insight would be helpful.


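Okay, I think I got it. You are essentially adding “1” to the count of every word in the probability table, one row per unique word in the vocabulary, so the counts summed over all rows grow from N_class to N_class + V_class. Dividing by that new total is what keeps the smoothed probabilities summing to 1.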
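
To check this numerically, here is a minimal sketch. The toy vocabulary, the token list, and the helper name `smoothed_probs` are all made up for illustration, and it assumes every token belongs to the vocabulary. It uses exact fractions so the sum can be compared to 1 without rounding error.

```python
from collections import Counter
from fractions import Fraction


def smoothed_probs(tokens, vocab):
    """Add-one smoothed P(w | class) for every word in vocab (hypothetical helper)."""
    counts = Counter(tokens)   # freq(w, class); words not seen count as 0
    n_class = len(tokens)      # N_class: total word occurrences in the class
    v = len(vocab)             # V_class: number of unique words in the vocabulary
    return {w: Fraction(counts[w] + 1, n_class + v) for w in vocab}


# Toy data, assumed for illustration only.
vocab = ["happy", "sad", "great", "bad"]
positive_tokens = ["happy", "great", "happy"]

probs = smoothed_probs(positive_tokens, vocab)
total = sum(probs.values())

print(probs["happy"], probs["sad"], total)
```

Each of the 4 rows gets +1 in the numerator, so the numerators sum to 3 + 4 = 7, which is exactly the denominator N_class + V_class. With any other second term in the denominator, the total would not be 1.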