C5W3A1 NMT w/ attention - What if previous predictions are fed into the input along with the context

I understand that date format translation doesn't involve inter-character dependency, so the post-attention LSTM takes the context as its only input (along with the previous hidden and cell state). But suppose we have a different task where the prediction at <t-1> helps (language translation, perhaps?). How should the input be constructed then?

My guess is to concatenate the context with the prediction at <t-1> (more like the classic RNN) and treat the concatenation as the input to the post-attention LSTM cell. I'm not sure about this approach since it makes the input shape larger; I would appreciate any insight!

_, s, c = post_attention_LSTM_cell(inputs=context + ???, initial_state=[s_prev, c_prev])

Your approach looks right. What are your results?

I tried this for time steps 2 to Ty (because t=1 doesn't have a previous prediction):
_, s, c = post_activation_LSTM_cell(inputs=concatenator([context, outputs[t-1]]), initial_state=[s, c])

Using the Keras Concatenate layer threw a ValueError, saying that shape(context) = (1, number of hidden nodes) differs from shape(outputs[t-1]), which is a softmax output, i.e. a 1D vector of length len(Y_vocab). My question is how to combine pred<t-1> and the context so that the new output of the post-attention LSTM takes both into account. Thanks!

ValueError: Shape must be rank 3 but is rank 2 for '{{node concatenate/concat_2}} = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](dot/MatMul_1, dense_2/Softmax, concatenate/concat_2/axis)' with input shapes: [?,1,64], [?,11], .

One way to combine both pieces of information is to add them.

Here’s some code to help with that:

    output = None
    # Step 2: Iterate for Ty steps
    for t in range(Ty):
        # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t (≈ 1 line)
        context = None # fill this
        if output is None:
            output = tf.zeros_like(context)
        context = tf.add(context, output)
        # ...
        # Step 2.B: Apply the post-attention LSTM cell 
        output, s, c = None # post_activation_LSTM_cell
        output = tf.expand_dims(output, axis=-2)

You'll have to play around with the shapes and the rest of the code to make sure the network works properly in the concatenation case as well.
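For concreteness, here is a minimal standalone sketch of the addition idea with the shapes written out. It runs eagerly in TF 2.x; the batch size is made up, n_s = 64 matches the assignment, and the attention output is faked with a random tensor of the shape one_step_attention would produce:

    import tensorflow as tf

    batch, n_s = 2, 64
    # Stand-in for one_step_attention's output at step t: shape (batch, 1, n_s)
    context = tf.random.normal((batch, 1, n_s))
    # At t = 0 there is no previous output yet, so use zeros of the same shape
    output = tf.zeros_like(context)

    combined = tf.add(context, output)  # shapes match, so addition is well-defined

    # A fresh LSTM layer standing in for post_activation_LSTM_cell
    lstm_cell = tf.keras.layers.LSTM(n_s, return_state=True)
    s = tf.zeros((batch, n_s))
    c = tf.zeros((batch, n_s))
    output, s, c = lstm_cell(combined, initial_state=[s, c])  # output: (batch, n_s)

    # Back to rank 3 so it can be added to the next step's context
    output = tf.expand_dims(output, axis=-2)
    print(combined.shape, output.shape)  # (2, 1, 64) (2, 1, 64)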


tf.expand_dims(output, axis=-2)
seems to give a new shape of [?,1,11]. Can this be added to a tensor of shape [?,1,64]?
According to the broadcasting rule, two dimensions either have to be equal or one of them needs to be 1, so it seems I have to further enlarge [?,1,11] to [?,1,64]. Is this the correct mindset, and if so, how do I reshape it accordingly?

One thing to keep in mind is that we can always use a Dense layer to get a different representation of the features. For instance, if you have data of shape (None, 11), it can be mapped to (None, 64) via a Dense layer. Couple this with expand_dims and you have (None, 1, 64). With the same shape as the context, adding is straightforward.
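As a quick sketch (the names here are mine, not from the assignment; only the sizes 11 and 64 come from your error message):

    import tensorflow as tf
    from tensorflow.keras.layers import Dense

    batch, vocab, n_s = 2, 11, 64
    prev_pred = tf.random.uniform((batch, vocab))  # stand-in for the softmax output at t-1: (None, 11)
    context = tf.random.normal((batch, 1, n_s))    # stand-in for the attention context: (None, 1, 64)

    project = Dense(n_s)                           # learns a (None, 11) -> (None, 64) mapping
    pred_projected = project(prev_pred)            # (None, 64)
    pred_projected = tf.expand_dims(pred_projected, axis=-2)  # (None, 1, 64)

    combined = tf.add(context, pred_projected)     # same shape, element-wise add
    print(combined.shape)                          # (2, 1, 64)

In a real model, create the Dense layer once outside the time loop so the same weights are shared across all Ty steps.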

You said:
output, s, c = post_activation_LSTM_cell(context, initial_state=[s, c])
Both output and s are the final hidden state, because in the lab return_state=True. So basically you are feeding s twice?

From the Keras documentation:


lstm = keras.layers.LSTM(
    4, return_sequences=True, return_state=True)
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)

I think the previous prediction is
out = output_layer(s)
and we have to pass it as input along with the context? And when Andrew mentioned that the previous time step's prediction can be fed as input along with the context, did he really mean the output of the dense layer?

If yes, kindly help me resolve the query below. I am working on a self project.

Below is a section of the training model code:

    prev_pred = None

    for t in range(Ty):
        # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t
        context = one_step_attention(X[:, t, :], cnn_model_input, s)
        print(context.shape)

        # Apply the post-attention LSTM cell to the "context" vector.
        if prev_pred is None:
            # Initialize prev_pred with the desired shape
            prev_pred = tf.keras.Input(shape=(1, features))
            prev_pred = tf.zeros_like(prev_pred)
            # print(prev_pred.shape)
        else:
            prev_pred = RepeatVector(1)(prev_pred)
            # print(prev_pred.shape)

        s, _, c = LSTM(n_s, return_state=True)(context, initial_state=[s, c])

        # Apply a Dense layer to the hidden state output of the post-attention LSTM
        out = Dense(features, activation='linear')(s)
        prev_pred = out

        # Step 2.D: Append "out" to the "outputs" list (≈ 1 line)
        outputs.append(out)

    # Create model
    model = Model(inputs=[X_CNN_input, X_lstm_input, s0, c0], outputs=outputs)

    return model

Below are the dimensions for my variables.

context -> (None, None, 64)
s -> (None, 64)
out -> (None, 30)
prev_pred -> I have made it of shape (None, 1, 30)

I have tried so many things, like concatenation, and applying the Add() layer after padding prev_pred with 34 extra zeros in the channel direction so that it can be added to the context, but nothing is working; I keep getting different errors.

# context = tf.concat([context, prev_pred], axis=-1)  # concatenate with previous prediction
# context = Add()(context, prev_pred)  # already added zeros to prev_pred in the channel direction, but that part is not included here to avoid confusion

Neither of these two methods works.

In my code, n_s is 64 and features is 30.

If I use neither concatenation nor Add, my code works fine.

It seems simpler to initialize prev_pred to zeros before the loop; then you can avoid the complicated (and probably incorrect) if-statement that includes defining an Input shape in the middle of the loop.
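A rough sketch of what I mean, using your sizes (n_s = 64, features = 30). The project layer is my addition so prev_pred can match the context's channel dimension, and I'm assuming your one_step_attention returns shape (None, 1, n_s); adjust to your setup. Note also that Keras Add() expects a list of tensors, i.e. Add()([context, prev]), which may be one reason your commented-out line failed.

    # Shared layers, created once outside the loop so weights are reused across time steps
    post_lstm = LSTM(n_s, return_state=True)
    output_dense = Dense(features, activation='linear')
    project = Dense(n_s)  # hypothetical: maps (None, features) -> (None, n_s)

    outputs = []
    prev_pred = tf.zeros_like(s[:, :features])  # zeros of shape (None, features); no extra Input needed
    for t in range(Ty):
        context = one_step_attention(X[:, t, :], cnn_model_input, s)  # assumed (None, 1, n_s)
        prev = tf.expand_dims(project(prev_pred), axis=-2)            # (None, 1, n_s)
        context = tf.add(context, prev)                               # or Add()([context, prev])
        s, _, c = post_lstm(context, initial_state=[s, c])
        out = output_dense(s)                                         # (None, features)
        prev_pred = out
        outputs.append(out)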
