NLP Specialization C1 W4 Conclusion Video

melhamamsy · January 27, 2022, 2:19pm

In the week 4 conclusion video, Kaiser mentioned that having more regions (planes, as I understand it) should result in a longer search time. How can that be, when more regions mean fewer data points in each region? My understanding is this: we first run the locality-sensitive hashing algorithm on all the vectors in the French vector space to determine which bucket each vector falls into. Then, for each query, I compare the query vector only with the P vectors (the planes) to determine which bucket to search, and then compare the query vector only with the vectors that fall into that bucket. What am I misunderstanding here?
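The single-table pipeline described in the question (hash every French vector once, then hash each query with the planes and search only its bucket) can be sketched as follows. This is a minimal plain-Python sketch, assuming random Gaussian plane normals and cosine similarity; the function names (`hash_vector`, `build_table`, `query`) are hypothetical and not the assignment's code. It also prints how the number of occupied buckets and the average bucket size change as planes are added.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def random_planes(n_planes, dim, rng):
    # Each plane is represented by its normal vector, drawn from a standard normal.
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]

def hash_vector(v, planes):
    # One bit per plane: bit i is 1 if v is on the non-negative side of plane i.
    return sum(2 ** i for i, p in enumerate(planes) if dot(v, p) >= 0)

def build_table(vectors, planes):
    # Done once, offline: every vector id goes into the bucket named by its hash.
    table = {}
    for idx, v in enumerate(vectors):
        table.setdefault(hash_vector(v, planes), []).append(idx)
    return table

def query(q, vectors, planes, table):
    # Per query: len(planes) dot products pick the bucket, then q is compared
    # only with the vectors stored in that bucket.
    candidates = table.get(hash_vector(q, planes), [])
    return max(candidates, key=lambda idx: cosine(q, vectors[idx]), default=None)

rng = random.Random(0)
dim = 10
vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(1000)]

for n_planes in (2, 4, 8):
    planes = random_planes(n_planes, dim, rng)
    table = build_table(vectors, planes)
    avg = len(vectors) / len(table)
    print(f"{n_planes} planes: {len(table)} buckets, {avg:.1f} vectors per bucket on average")
```

With more planes per table the average bucket does get smaller, which matches the intuition in the question; the trade-off is that the true nearest neighbor is then more likely to fall into a neighboring bucket and be missed.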

reinoudbosch · May 29, 2022, 2:36am

Hi melhamamsy,

It is a matter of the number of hashes to compute and the number of hash collisions. These depend on the number of different randomized hash functions used (i.e. the number of sets of planes).
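The point about the number of hashes and collisions can be made concrete with a rough per-query cost model. This is an illustrative sketch, not taken from the course: assume N indexed vectors of dimension d, T sets of planes (one hash table each) with n planes per set, and buckets of roughly equal size; B_t(h) denotes the bucket with hash h in table t, and h_t is that table's hash function.

```latex
\text{cost}(q) \;\approx\; \underbrace{T\,n\,d}_{\text{hash } q \text{ in every table}}
\;+\; \underbrace{|C_T(q)|\,d}_{\text{compare } q \text{ with candidates}},
\qquad C_T(q) = \bigcup_{t=1}^{T} B_t\big(h_t(q)\big)

\mathbb{E}\,|B_t| \approx \frac{N}{2^{n}}
\;\;\Rightarrow\;\; |C_T(q)| \lesssim \frac{T\,N}{2^{n}}
```

For illustration only, with N = 10,000, d = 300, n = 10 and T = 25, each bucket holds about 10,000 / 1,024 ≈ 9.8 vectors, so a query needs 25 · 10 · 300 = 75,000 multiply-adds for hashing and at most about 244 candidate comparisons (roughly 73,000 multiply-adds), versus 3,000,000 for a brute-force scan. Increasing n shrinks each bucket, while increasing T adds both hashing work and candidates.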

For a detailed discussion, see this post.

Also note that the text in the assignment states the following:

“Given a vector, you then identify the buckets in all the tables. You can then iterate over the buckets and consider much fewer vectors. The more buckets you use, the more accurate your lookup will be, but also the longer it will take.”

So, it is a matter of how much time it takes to iterate over all relevant buckets.
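Looking up the query's bucket in every table and iterating over those buckets can be sketched as follows. This is a minimal plain-Python sketch assuming random-hyperplane LSH with Gaussian plane normals and cosine similarity; the names `build_tables` and `approximate_nn` are hypothetical and not the assignment's functions. It reports how many candidate vectors are examined as the number of tables grows.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def hash_vector(v, planes):
    # One bit per plane (sign of the dot product with the plane's normal).
    return sum(2 ** i for i, p in enumerate(planes) if dot(v, p) >= 0)

def build_tables(vectors, plane_sets):
    # One hash table per set of planes, built once offline.
    tables = []
    for planes in plane_sets:
        table = {}
        for idx, v in enumerate(vectors):
            table.setdefault(hash_vector(v, planes), []).append(idx)
        tables.append(table)
    return tables

def approximate_nn(q, vectors, plane_sets, tables):
    # Hash q once per table and take the union of the matching buckets.
    candidates = set()
    for planes, table in zip(plane_sets, tables):
        candidates.update(table.get(hash_vector(q, planes), []))
    # Iterate over the candidates only; this is the loop that grows with more tables.
    best = max(candidates, key=lambda idx: cosine(q, vectors[idx]), default=None)
    return best, len(candidates)

rng = random.Random(42)
dim, n_vecs, n_planes, n_sets = 20, 1000, 8, 16
vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_vecs)]
plane_sets = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
              for _ in range(n_sets)]
tables = build_tables(vectors, plane_sets)

# Query with a slightly perturbed copy of vector 7 and compare with brute force.
q = [x + rng.gauss(0.0, 0.3) for x in vectors[7]]
exact = max(range(n_vecs), key=lambda idx: cosine(q, vectors[idx]))
for n_tables in (1, 4, 16):
    found, n_candidates = approximate_nn(q, vectors, plane_sets[:n_tables], tables[:n_tables])
    print(f"{n_tables} tables: {n_candidates} candidates, exact match: {found == exact}")
```

Because the candidate set is a union over tables, it can only grow as tables are added: each extra table costs one more hash and some extra comparisons, in exchange for a better chance that the exact nearest neighbor is among the candidates.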
