W3A1E2 - def modelf error "Input 0 of layer repeat_vector is incompatible with the layer: expected ndim=2, found ndim=1. Full shape received: [64]"

I’m defining modelf and I suspect I’m not getting the variables right for the one_step_attention call or the Bidirectional(LSTM(...))(X) call.

I have printed the dimensions of a and s to see how to go about slicing with t for each of the Ty steps, in case there’s something I’m missing. My thought was that if I could remove the ‘None’ (batch) dimension from the tensors, my slice of ‘a’ could match the rank and shape of ‘s’.

a dim : 3
s dim : 2
a shape = (None, 30, 64)
s shape = (None, 64)
a[:,0,:] shape = (None, 64)
s[0] shape = (64,)
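For concreteness, here is a minimal sketch (not the assignment code; it just assumes TF 2.x and the notebook’s sizes Tx = 30, n_s = 64) that reproduces these shapes and shows why repeat_vector rejects s[0] but accepts s:

import tensorflow as tf
from tensorflow.keras.layers import Input, RepeatVector

Tx, n_s = 30, 64

a = Input(shape=(Tx, n_s))   # symbolic shape (None, 30, 64), like the Bi-LSTM output
s = Input(shape=(n_s,))      # symbolic shape (None, 64)

print(a.shape)               # (None, 30, 64)
print(s.shape)               # (None, 64)
print(a[:, 0, :].shape)      # (None, 64)  -- slicing axis 1 keeps the batch axis
print(s[0].shape)            # (64,)       -- indexing axis 0 drops the batch axis

repeator = RepeatVector(Tx)
print(repeator(s).shape)     # (None, 30, 64): a rank-2 (None, 64) input is what RepeatVector expects
# repeator(s[0])             # would raise the "expected ndim=2, found ndim=1" ValueError shown below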

This is the error I’m getting:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-cf0a64ed59c0> in <module>
     35 
     36 
---> 37 modelf_test(modelf)

<ipython-input-27-cf0a64ed59c0> in modelf_test(target)
     12 
     13 
---> 14     model = target(Tx, Ty, n_a, n_s, len_human_vocab, len_machine_vocab)
     15 
     16     print(summary(model))

<ipython-input-26-b2337de09fe4> in modelf(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size)
     49 
     50         # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t (≈ 1 line)
---> 51         context = one_step_attention(a, s[t-1])
     52 
     53         # Step 2.B: Apply the post-attention LSTM cell to the "context" vector. (≈ 1 line)

<ipython-input-9-0faf1fc1147e> in one_step_attention(a, s_prev)
     17     ### START CODE HERE ###
     18     # Use repeator to repeat s_prev to be of shape (m, Tx, n_s) so that you can concatenate it with all hidden states "a" (≈ 1 line)
---> 19     s_prev = repeator(s_prev)
     20     # Use concatenator to concatenate a and s_prev on the last axis (≈ 1 line)
     21     # For grading purposes, please list 'a' first and 's_prev' second, in this order.

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    924     if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
    925       return self._functional_construction_call(inputs, args, kwargs,
--> 926                                                 input_list)
    927 
    928     # Maintains info about the `Layer.call` stack.

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
   1090       # TODO(reedwm): We should assert input compatibility after the inputs
   1091       # are casted, not before.
-> 1092       input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
   1093       graph = backend.get_graph()
   1094       # Use `self._name_scope()` to avoid auto-incrementing the name.

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/input_spec.py in assert_input_compatibility(input_spec, inputs, layer_name)
    178                          'expected ndim=' + str(spec.ndim) + ', found ndim=' +
    179                          str(ndim) + '. Full shape received: ' +
--> 180                          str(x.shape.as_list()))
    181     if spec.max_ndim is not None:
    182       ndim = x.shape.ndims

ValueError: Input 0 of layer repeat_vector is incompatible with the layer: expected ndim=2, found ndim=1. Full shape received: [64]

I think the problem is the way that you are calling one_step_attention. I added print statements in one_step_attention to see what is happening when I call it from modelf. Here’s what I see with those print statements when I run the test cell for modelf:

before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
before repeator s_prev.shape = (None, 64)
after repeator s_prev.shape = (None, 30, 64)
[['InputLayer', [(None, 30, 37)], 0], ['InputLayer', [(None, 64)], 0], ['Bidirectional', (None, 30, 64), 17920], ['RepeatVector', (None, 30, 64), 0, 30], ['Concatenate', (None, 30, 128), 0], ['Dense', (None, 30, 10), 1290, 'tanh'], ['Dense', (None, 30, 1), 11, 'relu'], ['Activation', (None, 30, 1), 0], ['Dot', (None, 1, 64), 0], ['InputLayer', [(None, 64)], 0], ['LSTM', [(None, 64), (None, 64), (None, 64)], 33024, [(None, 1, 64), (None, 64), (None, 64)], 'tanh'], ['Dense', (None, 11), 715, 'softmax']]
All tests passed!
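For reference, the debug-print idea can be sketched in isolation like this (illustrative only, not the graded one_step_attention; it assumes the notebook’s Tx = 30, Ty = 10, n_s = 64):

import tensorflow as tf
from tensorflow.keras.layers import Input, RepeatVector

Tx, Ty, n_s = 30, 10, 64
repeator = RepeatVector(Tx)

def debug_repeat(s_prev):
    # temporary shape prints around the layer call, as described above
    print("before repeator s_prev.shape =", s_prev.shape)
    s_prev = repeator(s_prev)
    print("after repeator s_prev.shape =", s_prev.shape)
    return s_prev

s = Input(shape=(n_s,))
for t in range(Ty):
    _ = debug_repeat(s)   # prints (None, 64) then (None, 30, 64), once per output step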

Thanks again, Paul. I was able to troubleshoot the code and produce a successful test using your feedback, but now I have more questions.

  1. Originally, I thought for t in range(Ty): was going to require slicing some of the tensors (s, c) with t. My working version references (s, c) without any slicing. What is the need for this for loop? I didn’t have to use t at any point, so I don’t understand how we’re moving through the attention network.

  2. I removed inputs= and outputs= from the Model(...) statement. Some documentation shows that we can just pass our inputs and outputs into the function positionally (see the sketch just below). The statement as formatted by the notebook was producing errors; do we need to report this to the course?
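On question 2, a toy check (not the assignment’s model, just TF 2.x behavior) suggests the two calling styles build the same graph, so dropping the keywords should not by itself change whether the model is constructed correctly:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

x = Input(shape=(8,))
y = Dense(4)(x)

m1 = Model(inputs=x, outputs=y)   # keyword form, as in the notebook template
m2 = Model(x, y)                  # positional form; same underlying graph
print(m1.count_params() == m2.count_params())   # True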

The architecture here is pretty complicated and differs in at least one respect from the earlier models we have seen that do some form of language translation. The difference is explained in the earlier part of the notebook; e.g., take a look at what they say in the section titled:

Each time step does not use predictions from the previous time step

I did not have a problem using the Model statement as written, so there must still be something wrong in your code if you had to take the actions you described in order to pass.
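On question 1, here is a simplified sketch of why the loop still matters even though t is never used for slicing: the state tensors are reassigned on every pass and one prediction is collected per iteration. This is not the graded modelf; the attention step is replaced by a trivial pooling stand-in and all the names (post_cell, out_layer, etc.) are just illustrative.

import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense, GlobalAveragePooling1D, Reshape

Tx, Ty, n_s = 30, 10, 64

a  = Input(shape=(Tx, n_s))   # encoder activations
s0 = Input(shape=(n_s,))      # initial decoder hidden state
c0 = Input(shape=(n_s,))      # initial decoder cell state

# Stand-in for the real attention step: it deliberately ignores the previous
# state and just averages the encoder states, to keep this away from the graded code.
pool, expand = GlobalAveragePooling1D(), Reshape((1, n_s))
post_cell = LSTM(n_s, return_state=True)
out_layer = Dense(11, activation="softmax")

outputs, s, c = [], s0, c0
for t in range(Ty):                      # t is only a counter, never used to slice
    context = expand(pool(a))            # shape (None, 1, 64)
    _, s, c = post_cell(context, initial_state=[s, c])   # s and c advance each pass
    outputs.append(out_layer(s))         # one prediction per output time step

model = tf.keras.Model([a, s0, c0], outputs)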