NLP Specialization C1 W4 Conclusion Video

In the 4th-week conclusion video, Kaiser mentioned that having more regions (planes as I understand) should result in higher search time. How can this be the case when higher regions will result in fewer data points in each region? My understanding is that we initially perform the locality-sensitive hashing algorithm for all the vectors in the french vector space to tell in which bucket each vector falls and after that, for each query, I compare the query vector to only the P vectors to determine in which bucket to search into and then only compare the query vector to the ones that fall in that bucket. What is my misunderstanding here?

Hi melhamamsy,

It is a matter of the number of hashes to compute and the number of hash collisions. These depend on the number of different randomized hash functions used (i.e. the number of sets of planes).

For an elaborate discussion see this post.

Also note that the text in the assignment states the following:

“Given a vector, you then identify the buckets in all the tables. You can then iterate over the buckets and consider much fewer vectors. The more buckets you use, the more accurate your lookup will be, but also the longer it will take.”

So, it is a matter of how much time it takes to iterate over all relevant buckets.