Wrong number of "thee"

My code passes the tests up to w1_unittest.test_get_count(get_count, word_l).
The tests confirm that the count of “thee” is 240.
My P(‘thee’) = 240/6116 = 0.0392, but the test says it should be 0.0045 in one case and 0.1267 in another. The value should not change if the corpus is the same.
Has anyone run into this, or can anyone shed light on a possible reason?
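For reference, assuming get_probs is meant to compute the standard unigram estimate (an assumption, not something stated in this thread), a word's probability is its count divided by the total number of words M in the corpus, not by the number of distinct words V:

$$P(w) = \frac{C(w)}{M}, \qquad M = \sum_{w' \in V} C(w')$$

Here C(w) is the number of times w occurs and V is the set of distinct words. Dividing by |V| instead gives 240/6116 ≈ 0.0392, which matches the number above, so it is worth checking whether 6116 is the number of distinct words rather than the total word count.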

The test changes the size of word_l, so it now makes sense that P(‘thee’) changes. The question now is why my code doesn't account for this. Will dig in.

get_count is able to process the different corpora in the test cases and gets the count of “thee” right in each one.
The problem is in get_probs. It gets the number of words right (6116), but the probability of ‘thee’ is wrong.
Frustrating, as this looks easy enough.
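A minimal sketch of the two functions, assuming get_count takes a list of words and get_probs takes the resulting count dictionary. These signatures, the Counter-based approach, and the toy word list are assumptions for illustration, not the assignment's reference solution. The point it shows is that the denominator is the total number of words, so the probabilities shift whenever the test swaps in a different word_l:

```python
from collections import Counter


def get_count(word_l):
    """Map each word to the number of times it occurs in word_l."""
    return dict(Counter(word_l))


def get_probs(word_count_dict):
    """Unigram probability of each word: count(w) / total number of words."""
    # Total number of words (tokens), NOT the number of distinct words.
    m = sum(word_count_dict.values())
    return {word: count / m for word, count in word_count_dict.items()}


# Hypothetical toy corpus, not the assignment's data.
word_l = ["thee", "the", "thee", "and", "the", "thee"]
counts = get_count(word_l)
probs = get_probs(counts)

print(len(probs))                      # 3 distinct words
print(probs["thee"])                   # 3 / 6 = 0.5
print(counts["thee"] / len(counts))    # buggy variant, divides by distinct words: 1.0
```

With the buggy denominator the "probability" can even reach 1.0 while other words still have nonzero counts, which is a quick sign that the vocabulary size is being used where the word total belongs.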

Phew, figured it out. Thankfully I didn't waste anyone’s time.