Why do we use rot_size//2 as the number of buckets?

I'm a bit confused about why we use rot_size//2 as the number of buckets when the bucket count is defined as rot_size = n_buckets.

Each vector will be hashed into a hash table and into rot_size//2 buckets

According to the instructions:

Step 1 create an array of random normal vectors which will be our hash vectors. Each vector will be hashed into a hash table and into rot_size//2 buckets. We use rot_size//2 to reduce computation. Later in the routine we will form the negative rotations with a simple negation and concatenate to get a full rot_size number of rotations.

  • use fastmath.random.normal and create an array of random vectors of shape (vecs.shape[-1],n_hashes, rot_size//2)

Hi @Amazing_Patrick

As @gent.spah pointed out, this is done to reduce computation. Later, in step 3, the negated rotations are concatenated and the rotation size becomes the full rot_size.
Check this post for more details.
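
A minimal sketch of that shortcut, in case it helps. It uses NumPy's random generator as a stand-in for fastmath.random.normal (an assumption, so that it runs without Trax). The function name hash_vectors_sketch, the seed argument and the toy shapes are hypothetical, and this is not the assignment's reference solution.

```python
import numpy as np

def hash_vectors_sketch(vecs, n_buckets, n_hashes, seed=0):
    """Assign each row of vecs to one of n_buckets buckets, once per hash."""
    rot_size = n_buckets
    assert rot_size % 2 == 0, "n_buckets must be even to split into +/- halves"

    # Stand-in for fastmath.random.normal (assumption): only rot_size // 2
    # random directions are drawn per hash.
    rng = np.random.default_rng(seed)
    random_rotations = rng.normal(size=(vecs.shape[-1], n_hashes, rot_size // 2))

    # Project every vector onto every random direction.
    # Result shape: (n_hashes, seq_len, rot_size // 2)
    rotated = np.einsum("tf,fhb->htb", vecs, random_rotations)

    # The negated projections cost only a sign flip; concatenating them
    # restores the full rot_size along the last axis.
    rotated = np.concatenate([rotated, -rotated], axis=-1)

    # The bucket id is the index of the largest projection.
    return np.argmax(rotated, axis=-1)  # shape: (n_hashes, seq_len)

vecs = np.random.default_rng(1).normal(size=(6, 16))  # 6 toy vectors of size 16
buckets = hash_vectors_sketch(vecs, n_buckets=8, n_hashes=3)
print(buckets.shape)  # (3, 6)
```

So the matrix multiply only touches rot_size//2 columns, while the bucket ids still range over all rot_size values.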

Cheers

When concatenating the negative rotation results, is it because the bucketing measures similarity by the maximum absolute value, regardless of direction? What does the larger value mean? Does it mean that the projection of one vector onto the other has the greatest length, which indicates similarity?
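
One way to probe the direction question is a small experiment. The sketch below uses NumPy only, and bucket_id is a hypothetical helper rather than assignment code. It hashes a vector, a scaled copy of it, and its negation. Since the concatenation is [p, -p] and argmax picks the largest signed value, the winning index says which random direction has the largest absolute projection, and which half it falls in says the sign of that projection.

```python
import numpy as np

def bucket_id(v, directions):
    """Bucket of a single vector for one set of random directions."""
    p = v @ directions                       # projections, shape (rot_size // 2,)
    return int(np.argmax(np.concatenate([p, -p])))

rng = np.random.default_rng(0)
rot_size = 8
directions = rng.normal(size=(16, rot_size // 2))
v = rng.normal(size=16)

b = bucket_id(v, directions)
b_scaled = bucket_id(2.0 * v, directions)    # same direction, longer vector
b_negated = bucket_id(-v, directions)        # opposite direction

print(b == b_scaled)                                 # True
print(b_negated == (b + rot_size // 2) % rot_size)   # True
```

In this toy setup the length of the vector does not change the bucket, but flipping its direction moves it to the mirrored bucket in the other half, so direction is not ignored.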

What exactly do the random normal vectors do? Are they performing rotations on the embedding of each Query word? Why does taking the maximum value after rotating with random vectors let us hash the entries in the Query, i.e. the Query words?
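
The locality idea behind these questions can be checked empirically. The sketch below is a toy experiment with NumPy only; every name and size in it is hypothetical and chosen for illustration. It hashes a query vector many times with independent random directions and compares how often a nearly identical vector and an unrelated vector end up in the same bucket as the query.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_hashes, rot_size = 16, 500, 8

# One independent set of rot_size // 2 random directions per hash.
directions = rng.normal(size=(d_model, n_hashes, rot_size // 2))

def buckets_per_hash(v):
    """Bucket id of v under each of the n_hashes hash functions."""
    p = np.einsum("f,fhb->hb", v, directions)        # (n_hashes, rot_size // 2)
    return np.argmax(np.concatenate([p, -p], axis=-1), axis=-1)

query = rng.normal(size=d_model)
near = query + 0.01 * rng.normal(size=d_model)       # almost the same direction
far = rng.normal(size=d_model)                       # unrelated vector

# Fraction of hashes in which each vector shares the query's bucket.
near_rate = np.mean(buckets_per_hash(query) == buckets_per_hash(near))
far_rate = np.mean(buckets_per_hash(query) == buckets_per_hash(far))
print(near_rate, far_rate)
```

Vectors pointing in nearly the same direction have nearly the same projections, so they tend to share the same argmax and therefore the same bucket, while unrelated vectors collide far less often. That is the sense in which the random projections act as a hash that groups similar Query entries.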