# Lesson 5: Using embeddings in RAG

I ran the question-answer model, and here is an answer that seemed a bit misleading, or more correlative than semantic.

My input:

[
"What are the colors in a rainbow?",
"There are seven colors in rainbow.",
"Violet-Indigo-Blue-Green-Yellow-Orange-Red are rainbow colors",
"Indian flag has orange, white and green color",
"White color signifies peace"
]

question = "What are the colors in a rainbow?"

The output I got:
[0.4508, 0.4945, 0.4285, 0.416, 0.3439]
Question = What are the colors in a rainbow?
Best answer = There are seven colors in rainbow.
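For context, here is a minimal, self-contained sketch of the score-and-select step being discussed. The bag-of-words cosine similarity below is only a toy stand-in for the real DPR encoder (it is not the course's actual code), but it shows the mechanics: embed the question and each candidate, score them, and pick the argmax. Notably, even pure word overlap favors "There are seven colors in rainbow.", because it shares more exact words with the question than the Violet-Indigo sentence does.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a learned sentence encoder: a bag-of-words count vector.
    words = [w.strip('.,?"') for w in text.lower().replace("-", " ").split()]
    return Counter(words)

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)

answers = [
    "There are seven colors in rainbow.",
    "Violet-Indigo-Blue-Green-Yellow-Orange-Red are rainbow colors",
    "Indian flag has orange, white and green color",
    "White color signifies peace",
]
question = "What are the colors in a rainbow?"

q = embed(question)
scores = [round(cosine(q, embed(a)), 4) for a in answers]
best = max(zip(scores, answers))[1]  # highest-scoring candidate wins
print("Scores =", scores)
print("Best answer =", best)
```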

The question asked what the colors in a rainbow are, and yet the best answer gives the number of colors in a rainbow?

Is the model relating the embeddings mainly through the words (or subwords) "colors" and "rainbow"? Why didn't the model consider the starting word, or I should say the meaning of the whole question?

Regards
DP

I have not taken this class, but I thought: wow, Deepti is producing quite a question by herself!

I feel anyone who can answer her question should consider it a serious request, not a "trivial" one.

Is this using the facebook_DPR embedding model?
I just ran this example and I get a different set of scores ([0.6339, 0.7427, 0.6784, 0.5929, 0.4869]). Nevertheless, the best answer is still "There are seven colors in rainbow", so let me address that anyway: embedding models are not perfect, and they depend heavily on the dataset they were trained on. It's difficult to say exactly why this happened without visibility into DPR and its training data and training process. In this case, the entries that discuss "colors in a rainbow" are the first, second, and third, and as you can see they all have relatively high scores (0.63 - 0.74 in my run), while the other answers score lower. If "there are seven colors in the rainbow" got a really high score like 0.95, I would be surprised.
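A side note on the two differing score lists: the same embeddings scored with a raw dot product versus a normalized cosine give different absolute numbers but can agree on the ranking, which is one benign reason two runs can report different magnitudes yet pick the same best answer. The vectors below are made up purely for illustration, not taken from DPR.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product normalized by both vector lengths.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

q = [0.2, 0.9, 0.1]            # made-up "question" vector
cands = [[0.1, 0.8, 0.3],      # made-up candidate vectors
         [0.3, 0.6, 0.2],
         [0.9, 0.1, 0.0]]

dot_scores = [dot(q, c) for c in cands]
cos_scores = [cosine(q, c) for c in cands]

# The magnitudes differ, but the argmax (best candidate) agrees.
print(dot_scores)
print(cos_scores)
```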

I cannot confirm this, but it is from Lesson 5; my short-term memory is failing me even though I did this assignment only yesterday.

But as you stated, you too got the same best answer as I did. Could the reason be the priority of the sentence "There are seven colors in a rainbow" being mentioned first?

But truly understanding contextual meaning still shouldn't depend on the order in which the answers are mentioned.

I just noticed one thing: the score output, in mine and yours as well, has the highest score for the right statement, i.e. "Violet-Indigo-Blue-Green-Yellow-Orange-Red are rainbow colors".

So RAG is actually getting the context with the right score but is not choosing it as the best answer?! Probably that is the key issue to address: not prioritising whichever sentence is mentioned first or second, and understanding the difference between "what" and "how" in a sentence?
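On the "mentioned first" hypothesis: in a standard retrieval setup each candidate is scored against the question independently, so list order cannot change which sentence scores highest, only where it appears in the printed score list. A quick sanity check with a toy word-overlap scorer (a stand-in for the embedding model, not DPR itself):

```python
import random

def overlap_score(question, answer):
    # Toy stand-in for embedding similarity: count shared lowercase words.
    qw = set(question.lower().strip("?").split())
    aw = set(answer.lower().strip(".").split())
    return len(qw & aw)

question = "What are the colors in a rainbow?"
answers = [
    "There are seven colors in rainbow.",
    "Violet-Indigo-Blue-Green-Yellow-Orange-Red are rainbow colors",
    "Indian flag has orange, white and green color",
    "White color signifies peace",
]

def best(cands):
    # Each candidate is scored independently of its position.
    return max(cands, key=lambda a: overlap_score(question, a))

original_best = best(answers)
shuffled = answers[:]
random.shuffle(shuffled)
# Reordering the candidates does not change the winner.
assert best(shuffled) == original_best
print("Best answer =", original_best)
```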

I will recheck the embedding model and update you!

Thank you for the response.

Regards
DP