In the videos of the course it is mentioned that, to find the best nearest-neighbor approximation, groups of several random planes are tested. But in both the lab and the assignment, all of the randomly generated planes pass through the origin (0,0), i.e. the offset is not considered. Is this enough for partitioning the whole space? I mean, shouldn't other planes with different values of offset (intercept) also be considered in this algorithm?
In the context of NLP, most often the data (embeddings) are centered around 0, and cosine similarity is about the angle between vectors, so the offset should not add any additional benefit.
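To make that concrete, here is a tiny numpy sketch (my own toy code, not the course's implementation) of hashing with planes through the origin: each plane contributes one bit, the sign of the dot product with that plane's normal. Since scaling a vector does not change its angle, it does not change its hash either, which is exactly what you want when similarity is measured by cosine.

```python
import numpy as np

rng = np.random.default_rng(0)
planes = rng.normal(size=(3, 2))  # 3 random plane normals in 2D, all with zero offset

def hash_vector(v, planes):
    # One bit per plane: which side of the plane (through the origin) v falls on.
    sides = (planes @ v) >= 0
    return sum(int(s) << i for i, s in enumerate(sides))

v = np.array([1.0, 2.0])
# Same direction => same side of every plane => same hash, regardless of length.
print(hash_vector(v, planes), hash_vector(2.5 * v, planes))
```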
In any case, a good question
Thanks for the quick reply. But I still can't understand this, as I tried to plot several centered vectors so that the total mean is equal to zero, and here, to appropriately separate similar vectors, 3 hyperplanes (p1, p2 and p3) are needed, and they do not necessarily cross the origin. I.e. zero-offset hyperplanes can't partition the space well in this case.
If you had this toy dataset then of course the offset would benefit you a lot: in your case the three planes ideally separate the four classes (I assume the colors are classes and you use Euclidean distance).
But if we use cosine similarity the two data-points with red arrows are similar:
And the real data look more like this (but in more dimensions):
Forgive me for my picture being nowhere near as nice as yours (even the one I ruined), but I tried to paint normally distributed data (the biggest mass of the data lies near 0 and spreads outward, getting sparser as it goes).
In the context of Locality Sensitive Hashing, having random hyperplanes reduces the amount of compute needed to compare each data point with every other data point (with each additional data point the number of pairwise comparisons grows quadratically; if we had only the 12 data points we would not have needed LSH at all). So slicing the space like that lets the three black arrows (which are similar in cosine similarity) be compared efficiently only with each other, without worrying about the other data points. We might miss the ones near the boundaries, but for that we have other "universes" (attempts/sets of random planes).
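To illustrate the "universes" idea, here is a rough sketch (again my own toy numpy code with made-up sizes, not the assignment's implementation): several independent sets of random planes through the origin each define a hash table, and a query is compared only against the vectors that land in the same bucket in at least one of the tables.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_planes, n_universes = 2, 3, 5
data = rng.normal(size=(1000, dim))  # toy "embeddings" centered around 0

def hash_vector(v, planes):
    # One bit per plane through the origin: which side of the plane v falls on.
    sides = (planes @ v) >= 0
    return sum(int(s) << i for i, s in enumerate(sides))

# Each "universe" is an independent set of random planes with its own hash table.
universes = [rng.normal(size=(n_planes, dim)) for _ in range(n_universes)]
tables = []
for planes in universes:
    table = {}
    for idx, v in enumerate(data):
        table.setdefault(hash_vector(v, planes), []).append(idx)
    tables.append(table)

# A query is only compared against vectors sharing its bucket in some universe,
# so neighbors missed near a boundary in one universe can be caught in another.
query = rng.normal(size=dim)
candidates = set()
for planes, table in zip(universes, tables):
    candidates.update(table.get(hash_vector(query, planes), []))
print(f"comparing against {len(candidates)} candidates instead of {len(data)}")
```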
Thanks a lot for this elaborate explanation. Yes, in the case of cosine similarity exactly what you mentioned applies. The reason I was a bit confused was that I was thinking about Euclidean similarity, and of course the slides of the course, in which these triangle-shaped planes were plotted during the whole explanation of LSH and its algorithm:
and this picture in the last assignment of Course 1, in which the hyperplanes have offsets:
and when I googled the algorithm I saw the same plots in which there were several planes all with offsets!
Yeah… I know… I was confused too. But don't worry, it will get more confusing in Course 4 Week 4, when you will learn about random rotations, which in principle are the same as or similar to the random planes in this assignment - so it's good for you to understand them better now.
Oh, thanks for the preparation alert :))) So, get ready for the upcoming questions from me in Course 4