Question about Week-3 Assignment 1

Where we define the shared layers as global variables, why do we use a custom softmax rather than passing the name of a built-in activation?

activator = Activation(softmax, name='attention_weights') # We are using a custom softmax(axis = 1) loaded in this notebook

Did you take a look at the source for the softmax function that they provide here? You can find it by clicking “File -> Open” and then opening the file nmt_utils.py (I figured out that name by examining the import cell early in the notebook).

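You can see that it applies the softmax along a chosen axis (axis = 1 here, per the comment in the notebook) and that they have added some logic to handle some special cases.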
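
To see why the axis matters, here is a small NumPy sketch. It is not the code from nmt_utils.py; the helper name softmax_along and the example tensor are made up for illustration, and I am assuming the attention energies have a shape like (batch, time steps, 1). Keras's built-in 'softmax' activation normalizes over the last axis by default, and with a last axis of size 1 that would turn every weight into 1.0. Normalizing over axis 1 instead spreads the weights across the time steps.

```python
import numpy as np

def softmax_along(x, axis):
    """Numerically stable softmax over the given axis (illustrative helper)."""
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

# Hypothetical attention energies: batch of 2, 4 time steps, 1 score per step.
energies = np.array([[[1.0], [2.0], [3.0], [4.0]],
                     [[0.0], [0.0], [0.0], [0.0]]])

# Softmax over the last axis: each slice holds a single value, so it becomes 1.0.
last_axis = softmax_along(energies, axis=-1)

# Softmax over axis 1: the weights for each example sum to 1 across time steps.
time_axis = softmax_along(energies, axis=1)

print(last_axis[0, :, 0])
print(time_axis[0, :, 0])
```

The first print shows weights that carry no information, while the second shows a proper distribution over the time steps, which is what the attention weights need to be.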