In the second video, titled “Introduction to embedding models”, at about the 3:00 mark, the discussion is on “algorithms for optimal retrieval”, where the speaker talks about using a Cross-Encoder to determine relevance.
One of the drawbacks is stated as: “requires you to run the classification operation for every text chunk in your dataset, so this doesn’t scale”.
This Cross-Encoder approach is then juxtaposed to “sentence embedding models” and an ingestion flow.
The speaker goes on to contrast the “sentence embedding approach” with the “cross-encoding approach”, as though these are competing approaches with efficiency / accuracy tradeoffs.
My question, and what confuses me, is how the cross-encoding approach can be done in isolation, as a holistic solution. What would this look like without a vector database to retrieve from?
Is the speaker suggesting that an implementation could:
(1) iterate over the source documents / every “text chunk” in the dataset, concatenating Question, Separator, Answer for each chunk and computing relevance, and
(2) take the top-relevance chunks to send to the LLM for inference?
What confuses me is that cross-encoding is presented here as an independent approach to retrieval, which I’ve never heard of before. I’ve only heard of cross-encoders being used for “re-ranking” (which implies the results have already been ranked, presumably by something like “sentence embedding models”).
Are there any sources anyone can link where a cross-encoder only is used for retrieval?
While the encoder-decoder architecture can handle sequential data effectively, it struggles with long-range dependencies and may fail to capture relevant information from distant parts of the input sequence. This is where the attention mechanism comes into play.
I don’t know if you have taken the Natural Language Processing Specialisation, but the cross-encoder approach is best viewed from the perspective of the attention mechanism, which is a crucial component that allows the decoder to selectively focus on different parts of the input sequence when generating each output word.
It computes a context vector, which is a weighted sum of the encoder’s hidden state vectors, where the weights are dynamically calculated based on the relevance of each input word to the current decoding step.
This context vector, along with the decoder’s hidden state and the previously generated word, is used to predict the next word in the target sequence.
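To make the context-vector idea concrete, here is a tiny sketch of dot-product attention (my own illustration, not taken from the course or the attached file; the shapes and names are just assumptions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Encoder hidden states: one vector per input word (here 4 words, dim 8)
encoder_states = np.random.randn(4, 8)

# Current decoder hidden state (dim 8)
decoder_state = np.random.randn(8)

# Relevance of each input word to the current decoding step (dot-product scores)
scores = encoder_states @ decoder_state      # shape (4,)
weights = softmax(scores)                    # attention weights, sum to 1

# Context vector: weighted sum of the encoder hidden states
context = weights @ encoder_states           # shape (8,)

# The context vector is then combined with the decoder's hidden state and the
# previously generated word to predict the next word in the target sequence.
print(weights, context.shape)
```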
I am attaching a file which could help one understand how this works.
Hi @0Ddumas0
Yes, that is exactly what I meant. You could (hypothetically) store each chunk of text (it wouldn’t be in a vector database per se, just in a plain text database). Then during retrieval, you would run the cross-encoder on each question/chunk pair, rank the chunks that way, and select the top chunks to send to the LLM for generation.
You are right to say “I’ve never heard of this before”, because it’s not really practical. I only mention it to explain why we use reranking as a second step rather than as a replacement for embeddings (on its own it’s too slow to be practical).
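For illustration only, here is a minimal sketch of what that hypothetical cross-encoder-only retrieval could look like, assuming the sentence-transformers CrossEncoder class (the checkpoint name, chunk store, and variable names are just illustrative choices, not something the course prescribes):

```python
from sentence_transformers import CrossEncoder

# Hypothetical plain-text store: every chunk in the dataset, no vector index
chunks = [
    "Chunk 1 text ...",
    "Chunk 2 text ...",
    "Chunk 3 text ...",
]

question = "Your question here"

# The cross-encoder scores every (question, chunk) pair jointly -- this is the
# part that doesn't scale, since the model runs once per chunk for every query.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(question, chunk) for chunk in chunks])

# Take the top-k chunks by relevance and send them to the LLM for generation
top_k = 2
ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
context_for_llm = [chunk for _, chunk in ranked[:top_k]]
print(context_for_llm)
```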
I was just going through the course, and I have a query, or more like a discussion point: don’t you feel the cross-encoder is still dependent on the answers fed in by the coder in order to detect the right embedding?
For example, what if the answers fed in are for a place whose name was changed?
Question: What is the capital of Karnataka?
Answers fed:
Bengaluru is the capital of Karnataka
Mysuru is the capital of Karnataka
Bangalore is the capital of Karnataka
So when I ran this code, this was the output: [0.9996363 0.999736 0.9997397]
The most relevant passage is: Bengaluru is the capital of Karnataka.
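(For context, pair scoring like this is typically done along the following lines with the sentence-transformers CrossEncoder class; the checkpoint name below is only my illustrative guess and may not match the one used in the course.)

```python
from sentence_transformers import CrossEncoder

question = "What is the capital of Karnataka?"
passages = [
    "Bengaluru is the capital of Karnataka",
    "Mysuru is the capital of Karnataka",
    "Bangalore is the capital of Karnataka",
]

# Score each (question, passage) pair jointly with the cross-encoder
model = CrossEncoder("cross-encoder/stsb-roberta-base")
scores = model.predict([(question, p) for p in passages])
print(scores)

# Pick the passage with the highest score as the "most relevant"
best = passages[int(scores.argmax())]
print("The most relevant passage is:", best)
```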
Although this is totally correct with respect to the current facts, why couldn’t it also take into account the context that Bengaluru was renamed from Bangalore, to be more precise, given that the model also uses cosine similarity and GloVe embeddings for contextual relevance?
I am sorry, I should have explained in more detail why I chose this question.
Before independence, the capital of Karnataka used to be Mysore (Mysuru).
Currently the capital of Karnataka is Bengaluru (which was previously called Bangalore).
So the output scores again seem to be higher for Mysuru and Bangalore, and yet Bengaluru is chosen as the best answer. It looks like the RAG setup is not able to understand the difference between is and was, which the output scores clearly show, since the score for Mysuru is also high.
But given all the details above, the cross-encoder doesn’t seem to hold up, as I still don’t get the reasoning behind the choice of best answer. If you look at the output scores, Bangalore seems to be the right answer!
I am just trying to make the model work as well as possible.
An embedding model should be able to correlate the answers and give the best response.