C4W2 ungraded attention lab question

Dennis_Sinitsky · March 15, 2024, 4:00pm

I have a question about dimensions.
First, d_k = K.shape[-1]. But it is also equal to Q.shape[-1], right? This is the embedding size, which should match for Q and K? Are those statements correct?
Second question is about this code:

mask_size = q.shape[-2]. Why not just say q.shape[0]? For a 2-D matrix Q the two are the same.
In general, I find that the most complicated part of AI work is dealing with dimensions and tensor manipulations, just hope to get to the bottom of it. Thank you very much!

arvyzukai · March 15, 2024, 4:38pm

Hi @Dennis_Sinitsky

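Yes, both statements are correct: d_k is the last dimension of K, and the last dimension of Q has to match it.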
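
As a minimal NumPy sketch of that point (the sizes and variable names below are made up for illustration and are not taken from the lab): the product of Q with the transpose of K contracts over the last axis of both matrices, so Q.shape[-1] and K.shape[-1] must be equal, while the number of queries and the number of keys are free to differ.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_queries, n_keys, d_k = 2, 3, 4

rng = np.random.default_rng(0)
Q = rng.normal(size=(n_queries, d_k))
K = rng.normal(size=(n_keys, d_k))

# The scores contract over the shared last axis, so both must have size d_k.
scores = Q @ K.T / np.sqrt(K.shape[-1])

print(Q.shape[-1] == K.shape[-1])  # True
print(scores.shape)                # (2, 3)
```

If the last dimensions did not match, the matrix product would raise a shape error rather than silently produce scores.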

Depending on how many dimensions q has, the two could be the same or they could be different. In this case q is a 2-D matrix, so q.shape[-2] is the same as q.shape[0].
Why they chose -2 in this case is not very obvious.
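
One plausible reason, which is an assumption on my part and not something the lab states, is that indexing from the end keeps working if q gains a leading batch axis. A small NumPy sketch with made-up sizes, using a square lower-triangular mask purely as an example:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch, seq_len, d_k = 5, 3, 4

q_2d = np.zeros((seq_len, d_k))         # a single sequence
q_3d = np.zeros((batch, seq_len, d_k))  # a batch of sequences

# For a 2-D q, both indices pick the sequence-length axis.
print(q_2d.shape[0], q_2d.shape[-2])  # 3 3

# For a batched q, shape[0] is the batch size,
# while shape[-2] is still the sequence length.
print(q_3d.shape[0], q_3d.shape[-2])  # 5 3

# A square mask sized from shape[-2] comes out the same in both cases.
mask_size = q_3d.shape[-2]
mask = np.tril(np.ones((mask_size, mask_size)))
print(mask.shape)  # (3, 3)
```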

It is definitely not the most complicated part, but it is very important. Understanding the shapes that go through the model is absolutely crucial (one of the reasons why I prefer the PyTorch way of explicitly defining the sizes of the inputs and the outputs when creating layers). Make sure you’re comfortable with dimensions and tensor manipulations before trying to understand the calculations (activations, normalizations, etc.).

Cheers

Dennis_Sinitsky · March 15, 2024, 5:01pm

Thank you. Any recommendations for good PyTorch courses on Coursera or Udemy? I have already taken two TF specializations from Laurence Moroney: TF professional and TF advanced. But I was also told that PyTorch is taking over from TF.

paulinpaloalto · March 16, 2024, 1:08am

I have not done any searching for PyTorch courses, but there is one specialization from DeepLearning.AI that uses PyTorch: the GANs Specialization. That’s an interesting topic in its own right. If you take GANs, you get a nice introduction to PyTorch as a useful side effect.

Dennis_Sinitsky · March 16, 2024, 2:34pm

Thank you, Paul. I am planning to enroll in this course after I complete NLP. If you or anyone else knows of a nice advanced transformer/diffusion course that also uses PyTorch, please let me know. It does not have to be from OpenAI or even from Coursera.
Thanks in advance!
DS

paulinpaloalto · March 16, 2024, 3:54pm

That sounds like a great plan. My take is that once you’ve done the 3 courses of GANs, you’ll have a pretty solid understanding of how to use PyTorch. You probably won’t need to take an explicit PyTorch course after that, but can just google new topics and find StackExchange articles or the PyTorch documentation and tutorials for the “finer points” or more advanced techniques. pytorch.org has lots of documentation and a discussion forum that’s been in operation for quite a few years, so there’s a good body of knowledge to search there. But see how you feel when you finish GANs and want to apply torch to something new.

arvyzukai · March 18, 2024, 7:20am

Hi @Dennis_Sinitsky

I don’t really know the current state of PyTorch courses, since I learned it a while ago and things change fast (a PyTorch course from a couple of years ago would not be the best recommendation, and the most recent PyTorch course is not necessarily the best either). In other words, research what courses are out there and choose according to your current understanding.

I would recommend one excellent overview of what you have already learned, but from another perspective: “Neural Networks: Zero to Hero” by Andrej Karpathy. The PyTorch knowledge required is minimal, and the content is great at explaining a lot of details, both those covered and those not covered in the NLP Specialization.

Cheers

paulinpaloalto · March 18, 2024, 3:27pm

Thanks for the link! Yet another example of the general principle that anything by Andrej Karpathy is worth reading or watching.
