Explanation of LSTM_cell call

Maybe this is trivial, but in Week 1, Assignment 3, UNQ_C1 djmodel, there’s a call to LSTM_cell:
a, _, c = LSTM_cell(inputs=x, initial_state=[a, c])

Here I don’t understand why the outputs should be a, something apparently unimportant (_), and c. I took a look at examples in the documentation, but they often write:
outputs = lstm_cell(inputs)

I tried outputs = LSTM_cell(inputs=x, initial_state=[a, c]) and got a list of many Tensors, but still have no idea what they are exactly.

Can somebody explain what this _ part is or give me a pointer?

a = hidden state, which is also the output for the current step
c = cell state

Does this help?

The _ is a placeholder for a return parameter that is ignored.

When you call a function that returns multiple return values in python, you have two choices:

  1. You can unpack the result into exactly as many names as there are return values.
  2. You can assign the result to a single name.

In some other languages, e.g. MATLAB, option 2 just gives you the first return value and discards the rest. Python is different: it packs all the return values into a single “tuple” and returns that. The thing to note for option 1 is that your list of names must match the number of return values exactly, or Python throws a ValueError.
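A quick illustration of the two options in plain Python (nothing Keras-specific here):

```python
def three_values():
    # A function "returning multiple values" actually returns one tuple
    return 1, 2, 3

# Option 1: unpack into exactly as many names as there are return values
a, b, c = three_values()

# The underscore is just an ordinary name, used by convention for values we ignore
a, _, c = three_values()

# Option 2: a single name receives the whole tuple
result = three_values()
print(type(result).__name__, result)  # tuple (1, 2, 3)

# A mismatched number of names raises a ValueError
try:
    a, b = three_values()
except ValueError as err:
    print("ValueError:", err)
```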

But if you want to understand what the output values are, you need to understand the definition of the function, right? Here’s the definition from earlier in the notebook:

LSTM_cell = LSTM(n_a, return_state = True)

So you need to look at the documentation of tf.keras.layers.LSTM. That is a subclass of “Layer” in Keras, so calling the constructor like that returns a callable layer object, which you can then invoke like a function.
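A toy sketch of that construct-then-call pattern (the class here is made up for illustration, not part of Keras):

```python
class TinyLayer:
    """Mimics the Keras Layer pattern: constructing the class with its
    configuration gives you an object you can then call like a function."""

    def __init__(self, units):
        self.units = units

    def __call__(self, inputs):
        # A real layer would apply learned weights; here we just scale the inputs
        return [x * self.units for x in inputs]


# Construction (analogous to: LSTM_cell = LSTM(n_a, return_state=True))
tiny_cell = TinyLayer(3)

# Call (analogous to: a, _, c = LSTM_cell(inputs=x, initial_state=[a, c]))
print(tiny_cell([1, 2]))  # [3, 6]
```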

Here’s what the documentation says about the inputs to that function.

For the output, you have to dig around a bit more. It turns out LSTM inherits from a number of parent classes. The one that matters for the output question is RNN. Here’s a good page to read to understand more about all this.

Thank you for your helpful comment. I reread the documentation you linked and found out that when we set the argument return_state = True, as stated in the programming exercise, a call to the LSTM layer returns the output, the hidden state, and the cell state, respectively. Here’s an example from the documentation:

output, state_h, state_c = layers.LSTM(64, return_state=True, name="encoder")(encoder_embedded)

Compare it to code in the programming notebook:

a, _, c = LSTM_cell(inputs=x, initial_state=[a, c])

we can see that the hidden state a is taken as the output, and _ corresponds to the hidden state state_h. My question is: why do we take a as the output when it should be the hidden state state_h? I.e., why isn’t the code in the programming exercise:

output, a, c = LSTM_cell(inputs=x, initial_state=[a, c])

and then we pass output to the densor() function to get the final output with the desired shape?

In the programming exercise, we consider a as output and pass it to the densor() function, which is quite confusing as it’s the hidden state, not the output.
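For what it’s worth, in the standard LSTM equations the per-step output is the hidden state itself, which may be why passing a to densor() works. Here is a minimal NumPy sketch of one LSTM step (my own illustration, not the course’s or Keras’s implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, a_prev, c_prev, params):
    """One LSTM step in the standard formulation.
    Returns (output, a_next, c_next); note that output IS a_next."""
    Wf, bf, Wi, bi, Wc, bc, Wo, bo = params
    concat = np.concatenate([a_prev, x])     # stack [a_prev; x]
    f = sigmoid(Wf @ concat + bf)            # forget gate
    i = sigmoid(Wi @ concat + bi)            # update gate
    c_tilde = np.tanh(Wc @ concat + bc)      # candidate cell state
    c_next = f * c_prev + i * c_tilde        # new cell state
    o = sigmoid(Wo @ concat + bo)            # output gate
    a_next = o * np.tanh(c_next)             # new hidden state
    return a_next, a_next, c_next            # output == hidden state

n_a, n_x = 4, 3
rng = np.random.default_rng(0)
# Alternate weight matrices and bias vectors for the four gates
params = [rng.standard_normal((n_a, n_a + n_x)) if k % 2 == 0
          else rng.standard_normal(n_a) for k in range(8)]
x = rng.standard_normal(n_x)
a = np.zeros(n_a)
c = np.zeros(n_a)

output, a, c = lstm_step(x, a, c, params)
assert np.array_equal(output, a)  # the output and the hidden state coincide
```

If that reading is right, then with return_state=True and the default return_sequences=False, the first return value (the last step’s output) and the second (the final hidden state) hold the same values, so a, _, c = ... and output, a, c = ... would be interchangeable here.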

This is a good question. I agree that based on the documentation it looks like what they are doing is not what I would have expected. It certainly deserves an explanation.

Unfortunately I do not have an answer at this point, and it will take some effort to figure this out. My life is pretty complicated right now and for the next couple of weeks, so I will not have the kind of concentrated quality thinking time that would be required to get further. Sorry, I would like to understand this better, both from a personal learning perspective and also to make sure that this is not a bug in the course.

Also please realize that the mentors are just fellow student volunteers. So a) we do not get paid to do this, and b) there is no guarantee we actually know all the answers. :nerd_face:

Maybe we’ll get lucky and someone else will stop by who does know more. Or if you have the time and motivation to dig deeper on this, please share anything more that you learn.


buddy, did you find out the answer?