Want to understand multi-head latent attention (MLA) in transformers, which course to pick?

I’m trying to write a school essay about multi-head latent attention (MLA), which is used in the DeepSeek-R1 model. I have basic knowledge of transformers, but when I look at the research papers, I can’t understand them :confused:
Is there a course on Coursera that covers MLA, standard multi-head attention, or the transformer architecture? I think I need more preparation first.
Thanks for your time!

The Deep Learning Specialization and the Natural Language Processing Specialization from DeepLearning.AI both explain the transformer architecture.
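In the meantime, here is a rough sketch of the core idea behind MLA, once you already understand standard multi-head attention. The key trick is that instead of caching full per-head keys and values, the model compresses each token's hidden state into a small latent vector and reconstructs K and V from it, shrinking the KV cache. All dimensions and weight names below are illustrative, not the paper's exact configuration (the real model also handles rotary position embeddings separately, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not DeepSeek's actual configuration.
d_model, d_latent, n_heads, d_head, seq = 64, 16, 4, 16, 8

# Down-projection to the shared KV latent, and up-projections back to K/V.
W_dkv = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
W_uk = rng.normal(size=(d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_uv = rng.normal(size=(d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_q = rng.normal(size=(d_model, n_heads * d_head)) / np.sqrt(d_model)

h = rng.normal(size=(seq, d_model))  # token hidden states

# During generation, only this small latent is cached (seq x d_latent)
# instead of full K and V (seq x n_heads*d_head each).
c_kv = h @ W_dkv

q = (h @ W_q).reshape(seq, n_heads, d_head)
k = (c_kv @ W_uk).reshape(seq, n_heads, d_head)  # K reconstructed from latent
v = (c_kv @ W_uv).reshape(seq, n_heads, d_head)  # V reconstructed from latent

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# From here it is standard scaled dot-product attention per head.
scores = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(d_head)
attn = softmax(scores, axis=-1)
out = np.einsum('hqk,khd->qhd', attn, v).reshape(seq, n_heads * d_head)

print(out.shape)   # (8, 64)
print(c_kv.shape)  # cached latent: (8, 16), vs (8, 64) for each of K and V
```

Notice that the cache shrinks from two `(seq, 64)` tensors to one `(seq, 16)` tensor here; that memory saving at inference time is the main motivation for MLA.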


Also, there is a free Short Course from DeepLearning.AI that discusses how transformers work.


Thanks for the suggestions! I’ll look them up.


Thanks for the link!