Not sure the first unit test is right in week 4 lab

s-dorsher · May 23, 2025, 2:02pm

The following information is provided prior to the first routine we are asked to implement in the week 4 lab on transformers.

First I’ll show the error I get when I implement the routine in a way I believe is wrong, because it provides some useful debugging info, and then the error I get when I implement it in a way I believe is correct. (In reality, I implemented it the correct way first.)

The unit test is below

from public_tests import *

get_angles_test(get_angles)

# Example
position = 4
d_model = 8
pos_m = np.arange(position)[:, np.newaxis]
dims = np.arange(d_model)[np.newaxis, :]
get_angles(pos_m, dims, d_model)

The debugging output

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-20-2c76efaf3fad> in <module>
      1 from public_tests import *
      2 
----> 3 get_angles_test(get_angles)
      4 
      5 # Example

~/work/W4A1/public_tests.py in get_angles_test(target)
     11 
     12     assert type(result) == np.ndarray, "You must return a numpy ndarray"
---> 13     assert result.shape == (position, d_model), f"Wrong shape. We expected: ({position}, {d_model})"
     14     assert np.sum(result[0, :]) == 0
     15     assert np.isclose(np.sum(result[:, 0]), position * (position - 1) / 2)

AssertionError: Wrong shape. We expected: (4, 16)

Okay, well, that doesn’t surprise me. What it does tell me is that the position is supposed to be dimension zero.

result[pos,dims]

is the desired indexing.

Okay, I’m done with the definitely wrong way, which I knew was wrong from the start, now that I’ve confirmed that the position runs down the rows (dimension zero).

I’ve transposed it (actually, removed a transpose, since I had the dimensions right initially), and now I get:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-22-2c76efaf3fad> in <module>
      1 from public_tests import *
      2 
----> 3 get_angles_test(get_angles)
      4 
      5 # Example

~/work/W4A1/public_tests.py in get_angles_test(target)
     12     assert type(result) == np.ndarray, "You must return a numpy ndarray"
     13     assert result.shape == (position, d_model), f"Wrong shape. We expected: ({position}, {d_model})"
---> 14     assert np.sum(result[0, :]) == 0
     15     assert np.isclose(np.sum(result[:, 0]), position * (position - 1) / 2)
     16     even_cols =  result[:, 0::2]

AssertionError:

Here I’m seeing that if I set the position equal to zero and sum along the row, it is supposed to sum to zero. That’s a discrete Fourier sum in terms of sines and cosines of the following segment of the table

A DFT sums to zero if and only if the signal being transformed is zero. However, a DFT can be a Dirac delta function (zero everywhere but at one frequency). This is a DFT where the amplitude of every component is one. In other words, a constant signal. So I actually computed this sum analytically, to check, and it is approximately one, not approximately zero. Handwritten calculation attached.

Could someone please let me know what you think? Either if something is going wrong conceptually, with this calculation, or with the unit test?

I’m very tired right now, but fortunately there are also a couple of days to look into this.

Thanks,
Steven Dorsher

paulinpaloalto · May 23, 2025, 4:19pm

It worked for me. I added some print statements in the body of get_angles to show what is going on and here’s what I see:

pos [[0]
 [1]
 [2]
 [3]]
k [[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]]
angles [[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.00000000e+00 1.00000000e+00 3.16227766e-01 3.16227766e-01
  1.00000000e-01 1.00000000e-01 3.16227766e-02 3.16227766e-02
  1.00000000e-02 1.00000000e-02 3.16227766e-03 3.16227766e-03
  1.00000000e-03 1.00000000e-03 3.16227766e-04 3.16227766e-04]
 [2.00000000e+00 2.00000000e+00 6.32455532e-01 6.32455532e-01
  2.00000000e-01 2.00000000e-01 6.32455532e-02 6.32455532e-02
  2.00000000e-02 2.00000000e-02 6.32455532e-03 6.32455532e-03
  2.00000000e-03 2.00000000e-03 6.32455532e-04 6.32455532e-04]
 [3.00000000e+00 3.00000000e+00 9.48683298e-01 9.48683298e-01
  3.00000000e-01 3.00000000e-01 9.48683298e-02 9.48683298e-02
  3.00000000e-02 3.00000000e-02 9.48683298e-03 9.48683298e-03
  3.00000000e-03 3.00000000e-03 9.48683298e-04 9.48683298e-04]]
All tests passed
pos [[0]
 [1]
 [2]
 [3]]
k [[0 1 2 3 4 5 6 7]]
angles [[0.e+00 0.e+00 0.e+00 0.e+00 0.e+00 0.e+00 0.e+00 0.e+00]
 [1.e+00 1.e+00 1.e-01 1.e-01 1.e-02 1.e-02 1.e-03 1.e-03]
 [2.e+00 2.e+00 2.e-01 2.e-01 2.e-02 2.e-02 2.e-03 2.e-03]
 [3.e+00 3.e+00 3.e-01 3.e-01 3.e-02 3.e-02 3.e-03 3.e-03]]

So you can see that (at least this time) the docstring of the function is actually correct: pos is a column vector and k is a row vector. So you don’t need to do any manipulation of the shapes or orientations: just write the code as if they were scalars and it all “just works” through the magic of broadcasting.

To see a demonstration of what is happening there, watch this:

A = np.arange(4)[:, np.newaxis]
B = 2 * np.ones((1,6))
print(f"A.shape {A.shape}")
print(f"A {A}")
print(f"B.shape {B.shape}")
print(f"B {B}")
quotient = A / B
print(f"quotient.shape {quotient.shape}")
print(f"quotient {quotient}")

Running that gives:

A.shape (4, 1)
A [[0]
 [1]
 [2]
 [3]]
B.shape (1, 6)
B [[2. 2. 2. 2. 2. 2.]]
quotient.shape (4, 6)
quotient [[0.  0.  0.  0.  0.  0. ]
 [0.5 0.5 0.5 0.5 0.5 0.5]
 [1.  1.  1.  1.  1.  1. ]
 [1.5 1.5 1.5 1.5 1.5 1.5]]

The “/” operation is “elementwise”, so broadcasting just expands both operands to get a compatible shape.

Also note that we aren’t doing Fourier Transforms yet. That doesn’t come until the next function. The reason the first row sums to zero is that it’s all zeros, right?
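As a quick sanity check, the value assertions shown in the traceback can be replayed against the d_model = 8 printout above. This is only a sketch: the table is copied by hand from that printed output, and `angles`, `row0_sum` and `col0_sum` are just local names chosen here, not anything from the lab.

```python
import numpy as np

# Angle table copied by hand from the printed output above
# (position = 4, d_model = 8). Not computed by the lab's function.
angles = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1e-1, 1e-1, 1e-2, 1e-2, 1e-3, 1e-3],
    [2.0, 2.0, 2e-1, 2e-1, 2e-2, 2e-2, 2e-3, 2e-3],
    [3.0, 3.0, 3e-1, 3e-1, 3e-2, 3e-2, 3e-3, 3e-3],
])
position, d_model = angles.shape

# The same kind of checks that appear in the traceback from public_tests.py.
row0_sum = np.sum(angles[0, :])   # pos = 0 row: every angle is 0
col0_sum = np.sum(angles[:, 0])   # first column: 0 + 1 + 2 + 3

assert row0_sum == 0
assert np.isclose(col0_sum, position * (position - 1) / 2)
print(row0_sum, col0_sum)
```

Both assertions pass on this table: the first row is literally all zeros, and the first column is just the positions 0 through 3.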

s-dorsher · May 24, 2025, 3:10am

Up until this sentence, I agreed with everything.

The problem is that we’re supposed to calculate the angles. And those are given by alternating sines and cosines of

\frac{pos}{10000^\frac{2i}{d}}

Let’s consider what the first row of this matrix SHOULD be.

pos=0

so

OH, THE ANGLES AREN’T GIVEN BY THE SINES AND COSINES. THE ANGLE IS THE

\frac{pos}{10000^\frac{2i}{d}}

Okay that was the concept I was missing. I wasn’t sure what the function was supposed to return.

It’s actually not quite a Fourier sum, as it turns out, because the factor 10000^\frac{2i}{d} doesn’t scale linearly with i. But it certainly is inspired by one.
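To make the distinction concrete, here is a small sketch for the pos = 0 row with d = 8. It assumes the usual convention for the following step (sine on the even columns, cosine on the odd columns); all variable names are illustrative and not taken from the lab.

```python
import numpy as np

d_model = 8

# Angles for pos = 0: pos / 10000^(2i/d) is 0 for every i.
angles_row0 = np.zeros(d_model)

# What the unit test checks: the angles themselves sum to zero.
angle_sum = angles_row0.sum()

# What the original question summed instead: the encoded values,
# assuming sin on even columns and cos on odd columns.
encoded_row0 = np.empty(d_model)
encoded_row0[0::2] = np.sin(angles_row0[0::2])  # sin(0) = 0
encoded_row0[1::2] = np.cos(angles_row0[1::2])  # cos(0) = 1
encoded_sum = encoded_row0.sum()

# The denominators 10000^(2i/d) for i = 0 .. d/2 - 1 grow geometrically,
# not linearly in i as the frequencies of a Fourier sum would.
i = np.arange(d_model // 2)
denominators = 10000.0 ** (2 * i / d_model)

print(angle_sum, encoded_sum)
print(denominators)
```

The sum of the angles is zero, while the sum of the sines and cosines is not, since every cosine term contributes one. The denominators come out as roughly 1, 10, 100, 1000, which matches the 1, 0.1, 0.01, 0.001 pattern in the printed angle table for pos = 1.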

Steven

paulinpaloalto · May 24, 2025, 4:55pm

Fair enough, but note that I was not the one who brought up Fourier Transforms here.

Here are the instructions for that function:

### Exercise 1 - get_angles

Implement the function `get_angles()` to calculate the possible angles for the sine and cosine positional encodings

s-dorsher · May 24, 2025, 6:47pm

It’s been right, according to the unit tests, for at least 12 hours. The first three exercises within the lab have all been right according to the unit tests for at least 12 hours. I don’t know why I got a zero on the homework. The unit tests say three parts out of eight are 100% right.

paulinpaloalto · May 24, 2025, 11:06pm

I explained it on this thread.

There is no point in submitting to the grader until you pass all the tests in the notebook, and of course passing them is only a necessary condition for passing the grader, not a sufficient one.

s-dorsher · May 24, 2025, 11:50pm

Then why is the pass condition 80/100 instead of 100?

paulinpaloalto · May 25, 2025, 12:21am

It’s possible that you can fail a grader check without throwing an exception.

paulinpaloalto · May 25, 2025, 12:24am

But given that they seem to have written a lot of the tests in this notebook using assertions, maybe it literally isn’t possible to get 80/100 on this assignment. I would bet you all the beer you can drink in one sitting that the course staff set that threshold, but never actually ran any tests to make sure it was possible to get a score strictly between 80 and 100.
