Want to understand multi-head latent self-attention (MLSA) in transformers: which course to pick?

I’m trying to write a school essay about MLSA, which is used in the DeepSeek R1 model. I have basic knowledge of transformers, but when I look at the research papers, I cannot understand them :confused:
Is there a course on Coursera that covers MLSA, general MSA, or the transformer architecture? I think I need to be better prepared.
Thanks for your time!

The Deep Learning Specialization and the Natural Language Processing Specialization from DeepLearning.AI both explain the transformer architecture.

Also, there is a free Short Course that discusses how Transformers work.

Thanks for the suggestions! I’ll look them up.

Thanks for the link!