I have a doubt about how exactly we calculate the tf-idf score for a document. As per the course slide, we calculate the tf for each doc and normalize it. We calculate the idf of each word as mentioned. All good. But my question is whether we multiply the idf of each word by the tf of each (word, doc) pair. That doesn’t factor in the normalization, right? Do we normalize after multiplying by the idf scores? Please help.
TF is normalized by dividing the word's count by the total number of words in that document, and IDF then measures how rare the word is across the complete corpus.
The tf-idf score is then calculated as tf x idf. Please note there is no further normalisation at this step: the normalization is already part of tf, so it carries through the multiplication.
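If it helps to see the order of the steps, here is a minimal sketch in Python. The IDF formula from the course slide isn't quoted in this thread, so the sketch assumes the common form idf(w) = log(N / df(w)), where N is the number of documents and df(w) is the number of documents containing w. The function name `tf_idf` and the two toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    # Assumed IDF form: log(N / df). The course slide may define it differently.
    idf = {word: math.log(n_docs / count) for word, count in df.items()}

    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)  # total number of words in this document
        # TF is normalized here (count / total); the product with IDF
        # is not normalized again afterwards.
        scores.append({w: (c / total) * idf[w] for w, c in counts.items()})
    return scores

# Toy corpus, already split into words (hypothetical example data).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
]
scores = tf_idf(docs)
```

With this toy corpus, "the" appears in both documents, so its idf is log(2/2) = 0 and its tf-idf score is 0 in both. "cat" appears in only one document, so it gets (1/6) x log(2) in the first document. Note that the sketch assumes every document has at least one word; an empty document would cause a division by zero in the tf step.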