Why is the translation result from English to Portuguese so strange?

I have completed an assignment on Neural Machine Translation using LSTM networks with attention, but the results when translating from English to Portuguese are very strange.

For example, translating “i love languages” into Portuguese gives very strange results:

With temperature = 0.0:

"preferir instinto vento instinto enviadas terminassem amado empresta gorda este ovo encontrasse apressemse apressemse apressemse pensar moda moda falir oh idoso idoso maravilhosa ultravioleta ultravioleta transfusao transfusao nervoso nervoso nervoso nervoso nervoso nervoso juiz roubaram produtor estudem privadas lhes transmitir asiaticos transmitir produtor ajudemme asa pulso asiaticos asiaticos era era"

With temperature = 0.7:

"acreditam vera baralho arregacou procurei trocadilho prece gentileza europeus sarcastica comecara convencida insatisfeito detergente cala audiolivro reconhecendo australianos abri tradutor morcego conversamos quintal fornecido estando coberta lavar suicidou morte polen afirmacao vizinha porao einstein vulcoes aconteca indios adoravam esquecerao atingiu es prantos aparencia fechou terei excluir hemisferios mendigos viver cancelada" 

Both results above are wrong, and they are very long for such a short English input sentence. The output has nothing to do with the English input at all.

When I check it on Google Translate, the real result is very short and carries the correct meaning: “eu amo línguas”.
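(For reference, the temperature here controls sampling randomness. A minimal sketch of how temperature sampling typically works, not the assignment's exact code: the logits are divided by the temperature before sampling, so temperature = 0.0 falls back to greedy argmax, while higher temperatures flatten the distribution and make the output more random.)

import tensorflow as tf

def sample_token(logits, temperature):
    # logits: tensor of shape (1, vocab_size) for the current step
    if temperature == 0.0:
        # Greedy decoding: always take the most likely token
        return tf.argmax(logits, axis=-1)
    # Scale the logits, then sample from the resulting distribution
    return tf.squeeze(
        tf.random.categorical(logits / temperature, num_samples=1), axis=-1)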

Hi @trungsanglong25

This means that you completed the assignment with incorrect code that the grader and unit tests failed to catch. Exercises 3, 4, and 5 are the ones you should look at again.

Cheers

Could it also be because this model is trained on a much smaller dataset than the Google Translate model?

I doubt it, @gent.spah. Translating 3 words should not output ~50; something's wrong with OP's implementation. The inference loop is supposed to stop as soon as the end-of-sequence token is generated, roughly as in the sketch below.
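A rough sketch of such a loop, using the generate_next_token signature that appears in the traceback later in this thread; the other names (translate_loop, sos_id, initial_state, max_length) are hypothetical:

def translate_loop(decoder, context, sos_id, initial_state, temperature, max_length=50):
    # Sketch only: decode one token at a time until EOS or the length cap
    next_token, state, done = sos_id, initial_state, False
    tokens = []
    for _ in range(max_length):
        # Same signature as the generate_next_token shown below
        next_token, logit, state, done = generate_next_token(
            decoder, context, next_token, done, state, temperature)
        if done:  # end-of-sequence token produced -> stop
            break
        tokens.append(next_token)
    return tokens

If the done flag never fires, decoding always runs to the length cap, which would match the ~50-token outputs above.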

That's true. I didn't notice that.

Hi @gent.spah @arvyzukai,

I finally managed to solve this problem, but I still have one last question about the state in the Decoder class. When I set the state equal to the hidden state, the code in the translate function runs, but when I set the state like this:

state = [hidden_state, cell_state]

it throws this error:

Cell In[29], line 16, in generate_next_token(decoder, context, next_token, done, state, temperature)
      2 """Generates the next token in the sequence
      3 
      4 Args:
   (...)
     13     tuple(tf.Tensor, np.float, list[tf.Tensor, tf.Tensor], bool): The next token, log prob of said token, hidden state of LSTM and if translation is done
     14 """
     15 # Get the logits and state from the decoder
---> 16 logits, state = decoder(context, next_token, state=state, return_state=True)
     18 # Trim the intermediate dimension 
     19 logits = logits[:, -1, :]

File /usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

Cell In[21], line 66, in Decoder.call(self, context, target, state, return_state)
     60 x = self.embedding(target)
     62 # Pass the embedded input into the pre attention LSTM
     63 # Hints:
     64 # - The LSTM you defined earlier should return the output alongside the state (made up of two tensors)
     65 # - Pass in the state to the LSTM (needed for inference)
---> 66 x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=state)
     67 state = [hidden_state, cell_state]
     68 # Perform cross attention between the context and the output of the LSTM (in that order)

InvalidArgumentError: Exception encountered when calling layer 'lstm_15' (type LSTM).

{{function_node __wrapped__CudnnRNNV3_device_/job:localhost/replica:0/task:0/device:GPU:0}} RNN input_h must be a 3-D vector.
	 [[{{node CudnnRNNV3}}]] [Op:CudnnRNNV3]

Call arguments received by layer 'lstm_15' (type LSTM):
  • inputs=tf.Tensor(shape=(1, 1, 256), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1), dtype=bool)
  • training=None
  • initial_state=[['tf.Tensor(shape=(1, 256), dtype=float32)', 'tf.Tensor(shape=(1, 256), dtype=float32)'], 'tf.Tensor(shape=(1, 256), dtype=float32)']

Hi @trungsanglong25

You get the error because of the way the whole code is set up; also, you should not change the parts outside ### END CODE HERE ###.

As you can probably see, the problem is that with your modification the state variable already contains both the hidden state and the cell state, and your code then wraps it in a list together with the cell state once again. That produces the nested initial_state=[[hidden_state, cell_state], cell_state] shown in the traceback, which the LSTM cannot accept.
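To see the shape requirement in isolation, here is a minimal standalone sketch (an illustration, not the assignment code): a Keras LSTM built with return_state=True returns (output, hidden_state, cell_state), and its initial_state must be a flat list of exactly those two 2-D tensors.

import tensorflow as tf

units = 256
lstm = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)

x = tf.random.normal((1, 1, units))   # (batch, time, features)
out, h, c = lstm(x)                   # first call: no initial state needed

# Correct: initial_state is a flat list of two 2-D tensors, [hidden, cell]
out, h, c = lstm(x, initial_state=[h, c])

# Wrong: nesting the state list reproduces the error above,
#   initial_state=[[h, c], c]  ->  "RNN input_h must be a 3-D vector"
# out, h, c = lstm(x, initial_state=[[h, c], c])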

Also, note that sharing large portions of your code is against the rules, so please remove the first part (your Decoder class implementation).

Cheers

Oh nice, thanks so much for your reply. I also removed the Decoder implementation.

Thank you again.
