C4W2 ungraded attention lab question

I have a question about dimensions.
First, d_k = K.shape[-1]. But it is also equal to Q.shape[-1], right? This is the embedding size, which should match for Q and K. Are those statements correct?
Second question is about this code:

mask_size = q.shape[-2]

Why not just say q.shape[0]? For a 2-D matrix Q, the two are the same.
In general, I find that the most complicated part of AI work is dealing with dimensions and tensor manipulations, and I just hope to get to the bottom of it. Thank you very much!

Hi @Dennis_Sinitsky

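Yes. d_k = K.shape[-1] = Q.shape[-1]: the queries and keys must have the same depth, otherwise the product Q K^T is not defined.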
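To make the shape argument concrete, here is a minimal NumPy sketch of scaled dot-product attention. It is not the lab's own code (the lab uses TensorFlow); the function name and the mask convention (1 = may attend, 0 = blocked) are assumptions for illustration. It reads d_k from K and checks that Q has the same depth, because Q K^T only exists when the last axes match.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, mask=None):
    """Minimal NumPy sketch of softmax(Q K^T / sqrt(d_k)) V.

    Shapes: q (..., seq_len_q, d_k), k (..., seq_len_k, d_k), v (..., seq_len_k, d_v).
    """
    d_k = k.shape[-1]
    # Q and K must share the same last (depth) axis, otherwise Q K^T is undefined.
    if q.shape[-1] != d_k:
        raise ValueError(f"query depth {q.shape[-1]} != key depth {d_k}")
    # Transpose only the last two axes of k so this works for 2-D and batched inputs.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)  # (..., seq_len_q, seq_len_k)
    if mask is not None:
        # Assumed convention: 1 = may attend, 0 = blocked (gets a large negative score).
        scores = scores + (1.0 - mask) * -1e9
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage: 3 queries and 5 keys, all with depth d_k = 4; values have depth 6.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 6))
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # (3, 6) (3, 5)
```

Note that V only has to agree with K on the sequence axis (seq_len_k); its depth d_v is free, which is why the output takes its last dimension from V rather than from Q or K.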

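Depending on the number of dimensions of q, they could be the same or they could be different. For a 2-D q of shape (seq_len_q, d_k), q.shape[-2] and q.shape[0] are both the query sequence length; for a batched q of shape (batch_size, seq_len_q, d_k), q.shape[0] is the batch size while q.shape[-2] is still the sequence length. In this case, q.shape[-2] is the same as q.shape[0].
Why they chose -2 in this case is not very obvious, but a plausible reason is that indexing from the end keeps the code correct if q ever gains leading (e.g. batch) dimensions.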
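A quick NumPy check shows where the two indices agree and where they diverge. The create_look_ahead_mask helper below is hypothetical (a NumPy stand-in, not the lab's function), and the shapes are made up for illustration.

```python
import numpy as np


def create_look_ahead_mask(size):
    """Hypothetical helper: lower-triangular (size, size) mask, 1 = may attend, 0 = future position."""
    return np.tril(np.ones((size, size)))


q_2d = np.zeros((3, 4))          # (seq_len_q, d_k)
q_batched = np.zeros((2, 3, 4))  # (batch_size, seq_len_q, d_k)

for q in (q_2d, q_batched):
    mask_size = q.shape[-2]  # always the query sequence length
    mask = create_look_ahead_mask(mask_size)
    print(q.shape, "shape[0] =", q.shape[0], "shape[-2] =", mask_size, "mask:", mask.shape)
```

This prints `(3, 4) shape[0] = 3 shape[-2] = 3 mask: (3, 3)` and then `(2, 3, 4) shape[0] = 2 shape[-2] = 3 mask: (3, 3)`. For the batched q, using q.shape[0] would have built a 2 x 2 mask sized by the batch instead of the sequence.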

It is definitely not the most complicated part, but it is very important. Understanding the shapes that flow through the model is absolutely crucial (one of the reasons why I prefer the PyTorch way of explicitly defining the sizes of the inputs and outputs when creating layers). Make sure you’re comfortable with dimensions and tensor manipulations before trying to understand the calculations (activations, normalizations, etc.).

Cheers

Thank you. Do you have any recommendations for good PyTorch courses on Coursera or Udemy? I have already taken two TF specializations from Laurence Moroney: TF professional and TF advanced. But I was also told that PyTorch is taking over from TF.

I have not done any searching for PyTorch courses, but there is one specialization from DeepLearning.AI that uses PyTorch: the GANs Specialization. That’s an interesting topic in its own right. If you take GANs, you get a nice introduction to PyTorch as a useful side effect.


Thank you, Paul. I am planning to enroll in this specialization after I complete NLP. If you or anyone else knows of a good advanced transformer/diffusion course that also uses PyTorch, please let me know. It does not have to be from OpenAI or even from Coursera.
Thanks in advance!
DS


That sounds like a great plan. My take is that once you’ve done the 3 courses of GANs, you’ll have a pretty solid understanding of how to use PyTorch. You probably won’t need to take an explicit PyTorch course after that, but can just google new topics and find StackExchange articles or the PyTorch documentation and tutorials for the “finer points” or more advanced techniques. pytorch.org has lots of documentation and a discussion forum that’s been in operation for quite a few years, so there’s a good body of knowledge to search there. But see how you feel when you finish GANs and want to apply torch to something new.


Hi @Dennis_Sinitsky

I don’t really know the current state of PyTorch courses, since I learned a while ago and things change fast (the best PyTorch course from a couple of years ago might not be the best recommendation today, and the most recent PyTorch course is not necessarily the best either). In other words, research what courses are out there and choose according to your current understanding.

I would recommend one excellent overview of what you have already learned, but from another perspective: “Neural Networks: Zero to Hero” by Andrej Karpathy. The PyTorch knowledge required is minimal, and the content does a great job of explaining many details, both those covered in the NLP Specialization and those it leaves out.

Cheers


Thanks for the link! Yet another example of the general principle that anything by Andrej Karpathy is worth reading or watching. :grinning:
