Wrong value while training for encoder

Final lab

I’m on exercise 4 out of 8 on the Transformers lab. Obviously, having worked this hard, I would really like to pass this course. I’ve gotten here with some help, but without any use of AI or googling solutions, and without anyone actually showing me how to do anything, on this forum or outside of it.

Now I am half a lab from done with a 20ish week class. I’m not a cheater. I have never cheated on a homework or test problem in my life.

But I would honestly like what help you are able to give me without actually cheating here. Please. I’m so sorry to bother you.

I’ve got code written that passes the unit tests for the previous three parts. Thanks. Clarifying what question one was asking did help. Question four here honestly looks okay to me. I’ve checked it up, down, and backward, but it is giving me this error.

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-21-00617004b1af> in <module>
      1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
     92                        [[ 0.23017104, -0.98100424, -0.78707516,  1.5379084 ],
     93                        [-1.2280797 ,  0.76477575, -0.7169283 ,  1.1802323 ],
---> 94                        [ 0.14880152, -0.48318022, -1.1908402 ,  1.5252188 ]]), "Wrong values when training=True"
     95 
     96     encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True

So I guess my first question: is it supposed to have training=training for every layer call, since it is in fact training? I would have assumed so, since backpropagation has to happen. That’s how it currently is in my Jupyter notebook, though I’ve tried zillions of variations.

Secondly, based on the documentation for tf.keras.layers.MultiHeadAttention, I think I’m probably not supposed to modify the mask in this specific function, so that it stays exactly the ones and zeros that were passed in? Right now I have not modified it.

Thirdly, all this routine is asking for is the MHA, followed by a layer normalization (of the input plus the attention output), followed by an FFN, followed by a Dropout layer, followed by a layer normalization of a sum, right?

Am I missing anything here conceptually?
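
For concreteness, here is a minimal sketch of the ordering I have in mind, with generic names rather than the notebook’s exact variables, and assuming training is only forwarded to the dropout layer (one of the variants I have tried):

import tensorflow as tf

def encoder_block_sketch(x, mha, ffn, layernorm1, layernorm2, dropout_ffn,
                         training, mask):
    # Self-attention: query, value and key are all x; the mask is passed
    # through unmodified (whether to also pass training here is question one).
    attn_output = mha(x, x, x, attention_mask=mask)
    # First residual (skip) connection plus layer normalization.
    skip_attn = layernorm1(x + attn_output)
    # Position-wise feed-forward network.
    ffn_output = ffn(skip_attn)
    # Dropout, only active when training=True.
    ffn_output = dropout_ffn(ffn_output, training=training)
    # Second residual connection plus layer normalization.
    return layernorm2(skip_attn + ffn_output)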

Steven

No, just for the layers where it’s mentioned in the comments above them.

You are not supposed to modify the mask in this function, correct.

Yes.

I wasn’t sure whether you meant that the MHA layer should also have training=training, so I tried it both ways, and neither gives the correct output. My assumption was that that layer should use training=training.

When it says to add the skip connection to the ffn output, does it literally mean to add the variables skip_x_attention and ffn_output? That would make sense (and is what I did). Or does it mean something more nuanced?

Steven

And, I guess, a second related question. I don’t see a place to include it here, but to account for backpropagation when training, usually we need something like model = Model(inputs=..., outputs=...). Here the EncoderLayer object, which inherits from tf.keras.layers.Layer, is playing the role that a Sequential or Model object normally plays, rather than that of a Layer object. I’m not quite sure how something that inherits from Layer can do that; I’m not familiar enough with Keras. But how is the necessary information about the skip connection provided to the model/layer/whatever for backpropagation purposes if there are no inputs or outputs? Or should there be?
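
For comparison, the pattern I am used to is the functional API, where the graph is wired up explicitly (a hypothetical toy example, not from the notebook):

import tensorflow as tf

# Hypothetical functional-API example: the graph is declared explicitly by
# wiring inputs to outputs, so Keras knows exactly what to differentiate.
inputs = tf.keras.Input(shape=(3, 4))
outputs = tf.keras.layers.Dense(4)(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Here, by contrast, EncoderLayer subclasses Layer and there is no explicit
# inputs=/outputs= wiring anywhere, which is what I am asking about.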

It’s not in the solution; from what I understand of this note in the TF documentation:

training: Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (no dropout). Will go with either using the training mode of the parent layer/model, or False (inference) if there is no parent layer.

So when there is dropout, the layer picks up training=True from the parent model while it is training.
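
A quick way to see what that flag does to a dropout layer (a small standalone example, not assignment code):

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 4))

print(drop(x, training=True))   # roughly half the entries zeroed, the rest scaled by 1/(1 - rate)
print(drop(x, training=False))  # identical to x: dropout does nothing in inference mode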

That’s right, you are correct.

I have to study this point thoroughly myself; it’s been a long time, and I am not sure I analyzed this when I was doing the course back then!

I’m just not getting their answer, and I don’t see why. Is there someone who can take a look and see whether something is wrong with how it is running? As far as I can tell, my understanding is correct and my previous code is correct.

Steven

This class costs actual money for me. I make a couple hundred a month. It costs actual time for me. I work part time for health reasons. I need this for actual career reasons. Please take a look.

The way Layer functions work is in two steps: you invoke it as a “Layer” to “instantiate” it with the parameters you want. The return value of that is the actual instantiated function. Then you call that function on your inputs to get the outputs. Here’s a thread that discusses that general idea in more detail.
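
For example, here is a tiny illustration of that two-step pattern (not assignment code):

import tensorflow as tf

# Step 1: "instantiate" the layer with the configuration parameters you want.
dense = tf.keras.layers.Dense(units=4, activation="relu")

# Step 2: call the instantiated layer on an input tensor to get the output.
x = tf.ones((2, 3))
y = dense(x)   # shape (2, 4)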

It all worked for me but I don’t recall how many iterations it took me to get to that point. :nerd_face:

If the tests fail, that means your code is incorrect. Now we need to figure out why. This is all complicated enough that we probably can’t do it just by talking about issues in general. I will send you a DM and we can use that private thread to discuss the actual code. Stay tuned …


Yeah, I know. I’ve dealt with functions like that before. I’ll take a look at the thread though. That wasn’t what I meant by my question.

Inheritance in Object Oriented Programming

# UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION EncoderLayer
class EncoderLayer(tf.keras.layers.Layer):
    """
    The encoder layer is composed by a multi-head self-attention mechanism,
    followed by a simple, positionwise fully connected feed-forward network. 
    This architecture includes a residual connection around each of the two 
    sub-layers, followed by layer normalization.
    """
    def __init__(self, embedding_dim, num_heads, fully_connected_dim,
                 dropout_rate=0.1, layernorm_eps=1e-6):
        super(EncoderLayer, self).__init__()

In this class definition, the class EncoderLayer, which is defined by whoever wrote the project, inherits from the Keras class tf.keras.layers.Layer. This is specified in the line

class EncoderLayer(tf.keras.layers.Layer):

In the __init__ method, which specifies what happens when the constructor is called to instantiate an EncoderLayer object, super() is called on EncoderLayer (the class defined right here) in order to invoke the constructor inherited from the parent class tf.keras.layers.Layer.

But the parent class is tf.keras.layers.Layer, not tf.keras.Sequential or tf.keras.Model. Trying to look up the inheritance structure of Layer and Model, I just found that Layers are recursively composable. So it is possible to have Layer objects as member variables of a Layer, each created by its own constructor within the __init__ method and then called, as you said, within the call method.

Recursive layer inheritance
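
A minimal sketch of what I mean, with made-up names (a Layer that owns sub-Layers created in __init__ and invoked in call, with a skip connection that TensorFlow differentiates automatically):

import tensorflow as tf

class ToyBlock(tf.keras.layers.Layer):
    """Hypothetical composite layer: sub-Layers live inside a Layer."""
    def __init__(self):
        super(ToyBlock, self).__init__()
        self.dense = tf.keras.layers.Dense(4)
        self.layernorm = tf.keras.layers.LayerNormalization(epsilon=1e-6)

    def call(self, x):
        # The skip connection needs no special declaration for backpropagation;
        # the ops executed in call() are traced and differentiated automatically.
        return self.layernorm(x + self.dense(x))

x = tf.random.normal((2, 3, 4))
block = ToyBlock()
with tf.GradientTape() as tape:
    y = block(x)
    loss = tf.reduce_sum(y)
grads = tape.gradient(loss, block.trainable_variables)  # gradients flow through the sum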

So I should have been doing that part basically right; the question is about the exact details.

I guess the only remaining question, then, is about the skip connection. The example given here has only one input and one output to the layer that is taking the place of the model. What happens if there are two outputs?
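
If “two outputs” just means the call method returning more than one tensor, I gather that is allowed, and gradients flow into whichever outputs end up in the loss (a made-up example):

import tensorflow as tf

class TwoOutputLayer(tf.keras.layers.Layer):
    """Hypothetical layer whose call returns two tensors."""
    def __init__(self):
        super(TwoOutputLayer, self).__init__()
        self.dense_a = tf.keras.layers.Dense(4)
        self.dense_b = tf.keras.layers.Dense(4)

    def call(self, x):
        return self.dense_a(x), self.dense_b(x)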

I took this class 7 years ago! and don’t have access to the current version. Apologies in advance if this older material is obsolete, but it seems this error message has a history in the forum. This thread, for example:

Have you compared any of those to your situation? The learner who created that post seems to have resolved the impasse, and that of others with the same assertion error. Maybe worth a look?
