Not sure the first unit test is right in week 4 lab

The following information is provided prior to the first routine we are asked to implement in the week 4 lab on transformers.

First I’ll show the error I get when I implement the routine in a way I believe is wrong, because it provides some useful debugging info. Then I’ll show the error I get when I implement it in a way I believe is correct. (In reality, I implemented it the correct way first.)

The unit test is below:

from public_tests import *

get_angles_test(get_angles)

# Example
position = 4
d_model = 8
pos_m = np.arange(position)[:, np.newaxis]
dims = np.arange(d_model)[np.newaxis, :]
get_angles(pos_m, dims, d_model)

The debugging output

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-20-2c76efaf3fad> in <module>
      1 from public_tests import *
      2 
----> 3 get_angles_test(get_angles)
      4 
      5 # Example

~/work/W4A1/public_tests.py in get_angles_test(target)
     11 
     12     assert type(result) == np.ndarray, "You must return a numpy ndarray"
---> 13     assert result.shape == (position, d_model), f"Wrong shape. We expected: ({position}, {d_model})"
     14     assert np.sum(result[0, :]) == 0
     15     assert np.isclose(np.sum(result[:, 0]), position * (position - 1) / 2)

AssertionError: Wrong shape. We expected: (4, 16)

Okay, well, that doesn’t surprise me. What it does tell me is that the position is supposed to form dimension zero.

result[pos, dims]

is the desired order of dimensions.
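
A quick way to check the orientation is the sketch below, which uses the same inputs as the example cell. The `+` is only a stand-in for whatever elementwise arithmetic the routine does; it is not the formula from the lab.

```python
import numpy as np

position = 4
d_model = 8
pos_m = np.arange(position)[:, np.newaxis]   # shape (4, 1): one row per position
dims = np.arange(d_model)[np.newaxis, :]     # shape (1, 8): one column per dimension

# Any elementwise operation broadcasts (4, 1) with (1, 8) to (4, 8).
# The addition is a placeholder, not the lab's formula.
combined = pos_m + dims
print(combined.shape)      # (4, 8)

# Transposing swaps the axes, which is what the shape assertion rejects.
print(combined.T.shape)    # (8, 4)
```

So as long as the inputs keep their given orientation, the result comes out as (position, d_model) without any explicit transpose.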

Okay, I’m done with the definitely wrong way, which I knew was wrong from the start, now that I’ve proved the point that the position runs down the columns (dimension zero).

I’ve transposed it (actually, removed a transpose, since I had the dimensions right initially), and now I get:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-22-2c76efaf3fad> in <module>
      1 from public_tests import *
      2 
----> 3 get_angles_test(get_angles)
      4 
      5 # Example

~/work/W4A1/public_tests.py in get_angles_test(target)
     12     assert type(result) == np.ndarray, "You must return a numpy ndarray"
     13     assert result.shape == (position, d_model), f"Wrong shape. We expected: ({position}, {d_model})"
---> 14     assert np.sum(result[0, :]) == 0
     15     assert np.isclose(np.sum(result[:, 0]), position * (position - 1) / 2)
     16     even_cols =  result[:, 0::2]

AssertionError: 

Here I’m seeing that if I set the position equal to zero and sum along the row, the result is supposed to be zero. That’s a discrete Fourier sum, in terms of sines and cosines, of the following segment of the table:

image

A DFT sums to zero if and only if the signal being transformed is zero. However, a DFT can be a Dirac delta function (zero everywhere but at one frequency). This is a DFT where the amplitude of every component is one, in other words, a constant signal. So I actually computed this sum analytically to check, and it is approximately one, not approximately zero. Handwritten calculation attached.

Could someone please let me know what you think? Is something going wrong conceptually, with this calculation, or with the unit test?

I’m very tired right now, but fortunately there are also a couple of days to look into this.

Thanks,
Steven Dorsher

It worked for me. I added some print statements in the body of get_angles to show what is going on and here’s what I see:

pos [[0]
 [1]
 [2]
 [3]]
k [[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]]
angles [[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.00000000e+00 1.00000000e+00 3.16227766e-01 3.16227766e-01
  1.00000000e-01 1.00000000e-01 3.16227766e-02 3.16227766e-02
  1.00000000e-02 1.00000000e-02 3.16227766e-03 3.16227766e-03
  1.00000000e-03 1.00000000e-03 3.16227766e-04 3.16227766e-04]
 [2.00000000e+00 2.00000000e+00 6.32455532e-01 6.32455532e-01
  2.00000000e-01 2.00000000e-01 6.32455532e-02 6.32455532e-02
  2.00000000e-02 2.00000000e-02 6.32455532e-03 6.32455532e-03
  2.00000000e-03 2.00000000e-03 6.32455532e-04 6.32455532e-04]
 [3.00000000e+00 3.00000000e+00 9.48683298e-01 9.48683298e-01
  3.00000000e-01 3.00000000e-01 9.48683298e-02 9.48683298e-02
  3.00000000e-02 3.00000000e-02 9.48683298e-03 9.48683298e-03
  3.00000000e-03 3.00000000e-03 9.48683298e-04 9.48683298e-04]]
All tests passed
pos [[0]
 [1]
 [2]
 [3]]
k [[0 1 2 3 4 5 6 7]]
angles [[0.e+00 0.e+00 0.e+00 0.e+00 0.e+00 0.e+00 0.e+00 0.e+00]
 [1.e+00 1.e+00 1.e-01 1.e-01 1.e-02 1.e-02 1.e-03 1.e-03]
 [2.e+00 2.e+00 2.e-01 2.e-01 2.e-02 2.e-02 2.e-03 2.e-03]
 [3.e+00 3.e+00 3.e-01 3.e-01 3.e-02 3.e-02 3.e-03 3.e-03]]

So you can see that (at least this time :smile:) the docstring of the function is actually correct: pos is a column vector and k is a row vector. So you don’t need to do any manipulation of the shapes or orientations: just write the code as if they were scalars and it all “just works” through the magic of broadcasting.

To see a demonstration of what is happening there, watch this:

import numpy as np

A = np.arange(4)[:, np.newaxis]
B = 2 * np.ones((1,6))
print(f"A.shape {A.shape}")
print(f"A {A}")
print(f"B.shape {B.shape}")
print(f"B {B}")
quotient = A / B
print(f"quotient.shape {quotient.shape}")
print(f"quotient {quotient}")

Output:

A.shape (4, 1)
A [[0]
 [1]
 [2]
 [3]]
B.shape (1, 6)
B [[2. 2. 2. 2. 2. 2.]]
quotient.shape (4, 6)
quotient [[0.  0.  0.  0.  0.  0. ]
 [0.5 0.5 0.5 0.5 0.5 0.5]
 [1.  1.  1.  1.  1.  1. ]
 [1.5 1.5 1.5 1.5 1.5 1.5]]

The “/” operation is “elementwise”, so broadcasting just expands both operands to get a compatible shape.

Also note that we aren’t doing Fourier Transforms yet. That doesn’t come until the next function. The reason the first row sums to zero is that it’s all zeros, right? :nerd_face:

Up until this sentence, I agreed with everything.

The problem is that we’re supposed to calculate the angles. And those are given by alternating sines and cosines of

\frac{pos}{10000^{\frac{2i}{d}}}

Let’s consider what the first row of this matrix SHOULD be.

pos=0

so

OH, THE ANGLES AREN’T GIVEN BY THE SINES AND COSINES. THE ANGLE IS THE

\frac{pos}{10000^{\frac{2i}{d}}}

Okay that was the concept I was missing. I wasn’t sure what the function was supposed to return.
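
For anyone else who trips on the same thing, here is a minimal sketch of the distinction. It assumes column index k maps to i = k // 2, which is what the printed values earlier in the thread suggest (the columns come in equal pairs). The name `angles_sketch` is made up for illustration, and this is not meant as the lab’s reference solution.

```python
import numpy as np

def angles_sketch(pos, k, d):
    """Return pos / 10000^(2i/d), with i = k // 2 (assumed pairing of columns)."""
    i = k // 2
    return pos / np.power(10000.0, 2 * i / d)

position, d_model = 4, 8
pos_m = np.arange(position)[:, np.newaxis]   # column vector of positions
dims = np.arange(d_model)[np.newaxis, :]     # row vector of dimension indices
angles = angles_sketch(pos_m, dims, d_model)

# pos = 0 makes every entry of the first row exactly zero, so that row sums to zero.
print(angles[0])

# i = 0 makes the denominator 1, so the first column is just pos: 0 + 1 + 2 + 3.
print(angles[:, 0].sum(), position * (position - 1) / 2)

# The sines and cosines are only applied afterwards, to these angles.
```

Both assertions from the traceback hold for the raw angles: the first row is all zeros, and the first column sums to position * (position - 1) / 2.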

It’s actually not quite a Fourier sum, as it turns out, because

10000^{\frac{2i}{d}} is a factor that doesn’t scale linearly with i

But it certainly is inspired by one
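
To make the “doesn’t scale linearly” point concrete, here is a small sketch comparing the two sets of frequencies. The DFT side uses the standard 2πk/N bin frequencies; d = 8 and N = 8 are arbitrary choices for illustration.

```python
import numpy as np

d = 8
i = np.arange(d // 2)

# Positional-encoding rates: 1 / 10000^(2i/d), a geometric progression in i.
pe_rates = 1.0 / np.power(10000.0, 2 * i / d)
print(pe_rates)
print(pe_rates[1:] / pe_rates[:-1])   # the ratio between neighbours is constant

# DFT bin frequencies: 2*pi*k/N, an arithmetic progression in k.
N = 8
dft_freqs = 2 * np.pi * np.arange(N // 2) / N
print(np.diff(dft_freqs))             # the difference between neighbours is constant
```

The encoding’s frequencies shrink by a constant factor from one pair of columns to the next, whereas DFT frequencies grow by a constant step, so the sinusoids here are not harmonics of a common fundamental.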

Steven

Fair enough, but note that I was not the one who brought up Fourier Transforms here. :nerd_face:

Here are the instructions for that function:

### Exercise 1 - get_angles

Implement the function `get_angles()` to calculate the possible angles for the sine and cosine positional encodings

It’s been right, according to the unit tests, for at least 12 hours. The first three exercises within the lab have all been right according to the unit tests for at least 12 hours. I don’t know why I got a zero on the homework. The unit tests say three parts out of eight are 100% right.

I explained it on this thread.

There is no point in submitting to the grader until you pass all the tests in the notebook. And of course, passing them is only a necessary condition for passing the grader, not a sufficient one.

Then why is the pass condition 80/100 instead of 100?

It’s possible that you can fail a grader check without throwing an exception.

But given that they seem to have written a lot of the tests in this notebook using assertions, maybe it literally isn’t possible to get 80/100 on this assignment. I would bet you all the beer you can drink in one sitting that the course staff set that threshold, but never actually ran any tests to make sure it was possible to get a score strictly between 80 and 100.
