Hi @ofermend
I ran the question-answer model, and here is an answer that seemed a bit misleading, or more correlative than semantic.
My input:
answers = [
    "What are the colors in a rainbow?",
    "There are seven colors in rainbow.",
    "Violet-Indigo-Blue-Green-Yellow-Orange-Red are rainbow colors",
    "Indian flag has orange, white and green color",
    "White color signifies peace"
]
question = "What are the colors in a rainbow?"
The output I got:
[0.4508, 0.4945, 0.4285, 0.416, 0.3439]
Question = What are the colors in a rainbow?
Best answer = There are seven colors in rainbow.
The question asked what the colors in a rainbow are, and yet the best answer gives the number of colors in a rainbow?
Is the model relating the embeddings only through the words and subwords “colors” and “rainbow”? Why didn’t the model consider the start word, or I should say the meaning of the whole question?
Regards
DP
@Deepti_Prasad,
I have not taken this class, but I thought: what a question Deepti has produced here, all by herself!
Unfortunately I cannot answer it, yet your contributions are always appreciated.
*I feel anyone who can answer her question should treat it as a serious request, not a ‘trivial’ one.
Hi @Deepti_Prasad
Is this using the facebook_DPR embedding model?
I just ran this example and I get a different set of scores ([0.6339, 0.7427, 0.6784, 0.5929, 0.4869]). Nevertheless, the best answer is “There are seven colors in rainbow”, so let me address that anyway: embedding models are not perfect, and they really depend on the dataset they were trained on. It’s difficult to say exactly why this happened without visibility into DPR and its training data and training process. In this case the answers that discuss “colors in a rainbow” are the first, second and third, and as you can see they all have relatively high scores (0.63 - 0.74 in my run), while the other answers score lower. If “There are seven colors in rainbow” had come back with a really high score like 0.95, I would be surprised.
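For reference, here is roughly how I reproduced this. I’m assuming the standard Hugging Face DPR checkpoints (facebook/dpr-question_encoder-single-nq-base and facebook/dpr-ctx_encoder-single-nq-base) with cosine similarity; the lesson notebook may differ slightly, so treat this as a sketch rather than the exact lesson code:

import torch
from transformers import (DPRContextEncoder, DPRContextEncoderTokenizer,
                          DPRQuestionEncoder, DPRQuestionEncoderTokenizer)

# load the DPR question encoder and context (passage) encoder
q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

question = "What are the colors in a rainbow?"
answers = [
    "What are the colors in a rainbow?",
    "There are seven colors in rainbow.",
    "Violet-Indigo-Blue-Green-Yellow-Orange-Red are rainbow colors",
    "Indian flag has orange, white and green color",
    "White color signifies peace",
]

with torch.no_grad():
    # the question and every answer are embedded independently into 768-dim vectors
    q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output
    a_emb = c_enc(**c_tok(answers, return_tensors="pt", padding=True, truncation=True)).pooler_output

# one cosine-similarity score per answer
scores = torch.nn.functional.cosine_similarity(q_emb, a_emb)
print([round(float(s), 4) for s in scores])

# the "best answer" is simply the highest-scoring sentence
print("Best answer =", answers[int(scores.argmax())])

The important point is that each sentence is turned into a single vector independently and compared to the question vector with one similarity number, so there is no notion of a “start word” or of sentence position in the scoring, only overall semantic similarity as the model learned it.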
I cannot confirm this right now, but it is from Lesson 5; my short-term memory is failing me even though I did this assignment only yesterday.
But as you stated, you too got the same best answer as I did. Could the reason be that the sentence “There are seven colors in rainbow.” is given priority because it is mentioned first?
But being able to understand contextual meaning should not depend on the order in which an answer is mentioned.
I just noticed one thing: the score output, in mine and in yours as well, has the highest score for the right statement, i.e. “Violet-Indigo-Blue-Green-Yellow-Orange-Red are rainbow colors”.
So the RAG pipeline is getting the context with the right score but is not choosing it as the best answer?!!! Probably that is the key thing to address: not prioritising whichever sentence is mentioned first or second, and understanding the difference between “what” and “how” in a sentence?
I will recheck which embedding model the lesson uses and update you!!
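Assuming it is the facebook DPR model you asked about, this is the kind of sanity check I plan to run when I recheck (my own sketch, not the lesson notebook): score the same sentences in a shuffled order and see whether any score changes with position.

import random
import torch
from transformers import (DPRContextEncoder, DPRContextEncoderTokenizer,
                          DPRQuestionEncoder, DPRQuestionEncoderTokenizer)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

def score(question, sentences):
    # embed the question and each sentence independently, compare by cosine similarity
    with torch.no_grad():
        q = q_enc(**q_tok(question, return_tensors="pt")).pooler_output
        s = c_enc(**c_tok(sentences, return_tensors="pt", padding=True, truncation=True)).pooler_output
    return torch.nn.functional.cosine_similarity(q, s)

question = "What are the colors in a rainbow?"
answers = [
    "There are seven colors in rainbow.",
    "Violet-Indigo-Blue-Green-Yellow-Orange-Red are rainbow colors",
    "Indian flag has orange, white and green color",
    "White color signifies peace",
]

shuffled = random.sample(answers, k=len(answers))
before = score(question, answers)
after = score(question, shuffled)

# if the scoring is purely semantic, every sentence keeps the same score
# no matter where it appears in the list
for sent in answers:
    print(round(float(before[answers.index(sent)]), 4),
          round(float(after[shuffled.index(sent)]), 4), sent)

If the scores match sentence by sentence, then the order of the answers cannot explain which one is picked as best.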
Thank you for the response.
Regards
DP