Neural_machine_translation_with_attention_v4a // Why are the alphas calculated with axis=1?

Hey there,

In the homework, we see the line below used to compute the alphas, i.e. the attention weights that the output at step t pays to each hidden state of the source. I'm only wondering here about axis=1.

activator = Activation(softmax, name='attention_weights') # We are using a custom softmax(axis=1) loaded in this notebook

Under the hood it is tf.nn.softmax(logits, axis=1), but
why :man_shrugging: axis=1? As far as I know, axis 1 is the timestep axis referred to by Tx (if I am wrong, just correct me).

Hi @Chris.X ,

axis: Integer, axis along which the softmax normalization is applied

So when axis=1, the normalization is applied along the second axis. In this assignment the energies fed into the activator have shape (m, Tx, 1), so axis=1 is exactly the Tx (input timestep) axis: for each example, the attention weights over all source timesteps sum to 1.
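
To make that concrete, here is a minimal sketch; the (m, Tx, 1) shape is assumed from the assignment's Dense(1) energies, and the values are just illustrative:

import tensorflow as tf

# Hypothetical energies for one example (m=1) with Tx=3 input timesteps
energies = tf.constant([[[1.0], [2.0], [3.0]]])   # shape (1, 3, 1)
alphas = tf.nn.softmax(energies, axis=1)          # normalize over the Tx axis
print(alphas.numpy().squeeze())                   # approx [0.09 0.24 0.67]
print(tf.reduce_sum(alphas, axis=1).numpy())      # [[1.]] -- one distribution per example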

Why must it be 1 and not -1 here? And must the softmax be applied along the columns? @Kic

Hi @Chris.X ,

This softmax() function is a custom-built utility function implemented with Keras. You can find it in nmt_utils.py; just click File -> Open on the top menu bar of your Jupyter notebook.

This softmax() function has two input parameters, and axis is one of them. Axis is an integer specifying the axis along which the normalization is applied. In this particular case it is set to 1, which for the (m, Tx, 1)-shaped energies is the Tx axis, so the alphas form a distribution over the source timesteps. If it were set to 0, the normalization would run across the batch dimension instead. And axis=-1 would normalize over the last axis, which has size 1 here, so every weight would simply come out as 1.0.
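
A quick way to see why axis=-1 would be wrong here (again assuming the (m, Tx, 1) shape from the assignment):

import tensorflow as tf

energies = tf.constant([[[1.0], [2.0], [3.0]]])            # shape (1, 3, 1)
# The last axis has size 1, so softmax over axis=-1 normalizes
# each single value against itself and every weight becomes 1.0:
print(tf.nn.softmax(energies, axis=-1).numpy().squeeze())  # [1. 1. 1.]
# axis=1 gives a proper distribution over the Tx timesteps:
print(tf.nn.softmax(energies, axis=1).numpy().squeeze())   # approx [0.09 0.24 0.67]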

Thanks,

BTW, I find that using the built-in function to do this is much more efficient. :wink:

import tensorflow as tf
from tensorflow.keras.layers import Activation

def softmax(logits):
    # Normalize along axis=1, the Tx (input timestep) axis
    return tf.nn.softmax(logits, axis=1)

activator = Activation(softmax, name='attention_weights')

Hi @Chris,

One of the beautiful things about software development is the creativity of the designer. I am sure you would agree with me here.
