Does the exercise actually verify that there are no duplicates within batches?

The lectures (e.g. computing-the-cost-i) say that the batches should be constructed with no duplicate questions in different rows. In the lab notebook, after constructing the train and test words, the markdown says that q1_i and q2_k are duplicates if and only if i = k. I see how the construction guarantees duplicate questions for i = k, but I don’t see any code that actually ensures there are no duplicates if i != k. Am I missing something, or are we simply assuming that it is unlikely to have two pairs of duplicate questions which are also duplicates of each other (at least in the same batch)?

Hi @David_Fox

In this lecture, a duplicate question means a question with the same meaning, not a literal duplicate (an exact copy). For example, “How are you?” and “How are you doing?” should count as duplicate questions, so a string comparison in code for i != k would not help much.

That said, the dataset itself should not contain duplicate questions in different rows.
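If you want to sanity-check a batch yourself, here is a minimal sketch of a literal check. It is not code from the lab notebook: the function name `find_exact_cross_row_duplicates` and the sample questions are hypothetical, and it assumes each batch is a plain list of question strings. It only catches exact copies (after lowercasing and collapsing whitespace), so it illustrates why such a check cannot find same-meaning duplicates.

```python
def find_exact_cross_row_duplicates(q1_batch, q2_batch):
    """Return (i, k) pairs with i != k where q1_batch[i] and q2_batch[k]
    are the same string after simple normalization.

    Hypothetical helper: it detects literal repeats only, not paraphrases.
    """
    def normalize(text):
        # Lowercase and collapse runs of whitespace.
        return " ".join(text.lower().split())

    # Map each normalized q2 question to the rows where it occurs.
    rows_by_question = {}
    for k, question in enumerate(q2_batch):
        rows_by_question.setdefault(normalize(question), []).append(k)

    collisions = []
    for i, question in enumerate(q1_batch):
        for k in rows_by_question.get(normalize(question), []):
            if i != k:  # i == k is the intended duplicate pair
                collisions.append((i, k))
    return collisions


# Made-up sample batch, for illustration only.
q1 = ["How are you?", "What is AI?", "How are you?"]
q2 = ["How are you doing?", "What is AI?", "how are  you?"]
print(find_exact_cross_row_duplicates(q1, q2))
```

On this sample, row 0 of `q1` collides with row 2 of `q2`, but “How are you?” and “How are you doing?” are not flagged, even though they have the same meaning.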