In slide 71 of Module 3, why isn’t there a MaxSim for the first word?
hi @billyboe
In information retrieval models, MaxSim is the maximum similarity score between a single query token and all the tokens of a separate document.
The reason there is no MaxSim value for the first text (or token) in a given context is that MaxSim is an asymmetric, comparative metric that requires two distinct sets of tokens (a query and a document) to compare against each other.
MaxSim requires comparison - it takes a single token from the query text and finds its single most similar token in an entirely separate document text. This is repeated for every query token, and the per-token maximum similarity scores are summed into a final relevance score for the document.
A standalone first token has nothing to compare to - the first token of a text, taken on its own, has no other text serving as a separate document for comparison.
Lastly, MaxSim is a cross-text metric - it measures the relevance of one piece of text (a document) to another (a query); it is not a property of an individual token in isolation within its own document.
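To make the “max per query token, then sum” idea concrete, here is a rough sketch with tiny hand-made 2-d embeddings (toy numbers, not the course’s actual model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def maxsim_score(query_emb, doc_emb):
    # For each query token embedding, keep only its single most similar
    # document token, then sum those per-token maxima into one score.
    return sum(max(cosine(q, d) for d in doc_emb) for q in query_emb)

# Toy embeddings: 2 query tokens, 3 document tokens.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(maxsim_score(query, doc))
```

Note the asymmetry: every query token gets a max, but document tokens that are nobody’s best match contribute nothing.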
Hope this clears your doubt.
Regards
DP
Sorry, it doesn’t, and honestly it sounds like an AI-generated answer.
The reason there is no MaxSim value for the first text (or token) in a given context is that MaxSim is an asymmetric, comparative metric that requires two distinct sets of tokens (a query and a document) to compare against each other.
What does this phrase mean? Is the reason just the definition of the function?
A standalone first token has nothing to compare to - the first token of a text, taken on its own, has no other text serving as a separate document for comparison.
Yes it does: the document token “The” can be compared with all the prompt tokens, just like the other document tokens have been.
It can be compared, but MaxSim pipelines typically ignore stop words like “the”, “a”, “is”, since they add noise to text analysis. The same idea shows up in techniques like TF-IDF, where the weighting scheme gives high scores to words that appear often in a specific document (high TF) but rarely across the entire collection (high IDF). Stop words occur in nearly every document, so they have low IDF and thus a low overall score.
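A toy IDF computation shows the low-IDF point directly (smoothing terms that real implementations add are omitted here for brevity):

```python
import math

def idf(term, docs):
    """Inverse document frequency: log(N / document frequency)."""
    df = sum(term in doc for doc in docs)  # how many docs contain the term
    return math.log(len(docs) / df)

# Three tiny documents represented as sets of tokens.
docs = [{"the", "cat"}, {"the", "dog"}, {"the", "bird", "sings"}]
print(idf("the", docs))    # in every doc -> log(3/3) = 0.0
print(idf("sings", docs))  # in one doc   -> log(3/1) ≈ 1.10
```

Because “the” appears in all three documents, its IDF is exactly zero, so any TF-IDF weight built on it vanishes.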
In the preprocessing step, before vectorisation (converting text into numbers), a standard step is to filter out a predefined list of common stop words.
If “the” were included, its high frequency might skew the vector’s direction even when the core meaning (nouns, verbs) is similar, making the similarity score less accurate for topic relevance. By removing “the”, the vectors align on the meaningful terms, giving a more reliable MaxSim score for the content.
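That filtering step is a one-liner in practice. A minimal sketch (the stop-word list here is illustrative; real toolkits such as NLTK or spaCy ship much longer predefined lists):

```python
# Tiny illustrative stop-word list, not a production one.
STOP_WORDS = {"the", "a", "an", "is", "of", "to"}

def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

print(remove_stop_words("The cat sat on a mat"))
# -> ['cat', 'sat', 'on', 'mat']
```

After this pass, “The” never reaches the similarity computation, which is why it gets no MaxSim arrow on the slide.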
Now it makes more sense. Thanks!
