In C4_W4, we learned the intricacies of implementing Locality-Sensitive Hashing (LSH) in self-attention to save memory. However, what I cannot see is how it is actually used in the ReformerLM.
shape11 = trax.shapes.ShapeDtype((1, 1), dtype=np.int32)

def attention(*args, **kwargs):
    kwargs['predict_mem_len'] = 120   # max length for predictions
    kwargs['predict_drop_len'] = 120  # never drop old stuff
    return tl.SelfAttention(*args, **kwargs)

model = ReformerLM(
    vocab_size=33000,
    n_layers=6,
    mode='predict',
    attention_type=attention,
)
All I see is a plain tl.SelfAttention. Please help me understand if I may be missing something here.
Thanks!
Hey @cmosguy,
Indeed, the ReformerLM used in the assignment doesn't use the LSH Self-Attention that was implemented in the Ungraded Lab 1: Reformer LSH. However, trax offers different implementations of LSH Self-Attention, which you can easily swap in at the lines of code below (can be found here):
def attention(*args, **kwargs):
    kwargs['predict_mem_len'] = 120   # max length for predictions
    kwargs['predict_drop_len'] = 120  # never drop old stuff
    return tl.SelfAttention(*args, **kwargs)
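For instance, a minimal sketch of that swap, assuming trax's tl.LSHSelfAttention layer (this is not the assignment's reference code, and the n_hashes and chunk_len values are illustrative picks, not tuned ones):

def lsh_attention(*args, **kwargs):
    kwargs['predict_mem_len'] = 120   # max length for predictions
    kwargs['predict_drop_len'] = 120  # never drop old stuff
    kwargs['n_hashes'] = 4            # number of LSH hash rounds (assumed value)
    kwargs['chunk_len'] = 64          # chunk size for bucketed attention (assumed value)
    return tl.LSHSelfAttention(*args, **kwargs)

model = ReformerLM(
    vocab_size=33000,
    n_layers=6,
    mode='predict',
    attention_type=lsh_attention,
)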
The only other things you need to change are the hyper-parameters and the pre-trained model (since it was most likely trained using tl.SelfAttention only). Feel free to do it as a self-exercise, and do share your results with the community. I hope this helps.
Cheers,
Elemento
Thanks @Elemento
I'll go and try to do this. Is the main reason LSH was not used that the data chunks are so small it made no sense for the authors of the notebook to implement LSH? LSH is designed to handle inputs on the order of 1M tokens, and the input sentences in this dataset are much shorter. Am I correct?
Hey @cmosguy,
Indeed, that could be one of the reasons. Another reason, I believe, which is more or less related to what you have mentioned, might be the absence of pre-trained models using LSH Self-Attention that have been trained on smaller datasets like the one used in the assignment. And lastly, perhaps the developers wanted the learners to focus more on the Reversible Layers instead of LSH Self-Attention, since it was already discussed in considerable depth in UGL 1, and hence they avoided the use of LSH Self-Attention. I hope this resolves your query.
Cheers,
Elemento