Where we define the shared layers as global variables, why use a custom softmax rather than passing the name of a built-in activation?
activator = Activation(softmax, name='attention_weights') # We are using a custom softmax(axis = 1) loaded in this notebook
Did you take a look at the source for the softmax function that they provide here? You can find it by clicking "File -> Open" and then opening the file nmt_utils.py (I figured out that name by examining the import cell early in the notebook). You can see that they have added some logic to handle some special cases.
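A likely reason the axis matters: assuming the attention energies reaching this layer have shape (m, Tx, 1), as they do in the notebook, a softmax over the last axis (the default for Keras's built-in 'softmax') normalizes over a dimension of size 1, so every weight comes out as exactly 1. The weights have to sum to 1 across the Tx time steps, which is axis 1. The NumPy sketch below is NOT the nmt_utils.py code; `softmax` here is a hypothetical re-implementation with `axis=1` as its default, and the 1-D guard is only an illustrative stand-in for the kind of special-case handling mentioned above.

```python
import numpy as np

def softmax(x, axis=1):
    """Numerically stable softmax along `axis` (illustrative sketch only)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        # Hypothetical guard: a 1-D input has no batch axis to keep separate.
        raise ValueError("Cannot apply softmax to a 1-D tensor")
    # Subtract the max along `axis` so exp() cannot overflow.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

# Toy energies shaped (m, Tx, 1) = (2, 5, 1), like the attention scores.
energies = np.arange(10, dtype=float).reshape(2, 5, 1)

w_last = softmax(energies, axis=-1)  # normalizes over the size-1 axis
w_time = softmax(energies, axis=1)   # normalizes over the Tx time steps

print(w_last[0].ravel())  # every entry is 1.0: no attention at all
print(w_time[0].ravel())  # a proper distribution over the 5 time steps
```

If you'd rather not depend on the helper, `Activation` also accepts any callable, so something like `Activation(lambda x: tf.keras.activations.softmax(x, axis=1), name='attention_weights')` should give the same normalization (not tested against the notebook).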