Sequence Models Week 1 - Questions on Assignments 2 and 3

Hello, I have a couple questions on Assignments 2 and 3 in Week 1 of Sequence Models.

  1. Why does Assignment 2 (Dinosaur name generation) use a basic RNN cell whereas Assignment 3 (Music Generation) uses an LSTM cell?

  2. Dinosaur name generation makes sense to me, as it uses x<t> to predict y<t> = x<t+1>. However, I am confused by music generation. I thought it would work the same way, but instead the notebook says that Y is the same as X but shifted one to the left (i.e., the past). Why would we be using x<t> to predict x<t-1>?

  3. I also don’t understand how the model works in the Music Generation assignment. djmodel runs all of the time steps in X to generate a list of outputs (I believe that this model’s parameters are not updated). Then these outputs are used in another model, which is the one that gets trained. I have a couple of confusions here:

  • Why do we have a second model here instead of just training djmodel?

  • What are the outputs generated by djmodel that are used in the second model that is actually trained? I’m not sure how the second model learns anything useful.

  • How does the second model know the model architecture if it is only given the inputs and outputs? (I think this is more of a confusion about how the Functional API works.)

I would really appreciate any help here. Thank you very much!

Which type of network you use just depends on what works for a given situation. RNNs are cheaper to train (fewer parameters), so you try that first. If it works well enough, then you’re done. If it doesn’t give you good enough results, then you try a richer and more powerful architecture like a GRU or an LSTM, which cost more to train (see the sketch below for a concrete comparison of parameter counts).
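Here is a minimal sketch (not assignment code) illustrating the “fewer parameters” point: an LSTM layer has four gates’ worth of weights, so roughly 4x the parameters of a basic RNN cell with the same hidden size. The sizes `n_a` and `n_x` are hypothetical, chosen just for the comparison.

```python
import tensorflow as tf

n_a, n_x = 64, 90                      # hypothetical hidden units, input features

rnn = tf.keras.layers.SimpleRNN(n_a)
lstm = tf.keras.layers.LSTM(n_a)

# Call each layer once on a dummy batch so its weights get created.
dummy = tf.zeros((1, 5, n_x))          # (batch, time steps, features)
rnn(dummy)
lstm(dummy)

print(rnn.count_params())   # n_a*(n_a + n_x) + n_a       =  9,920
print(lstm.count_params())  # 4 * (n_a*(n_a + n_x) + n_a) = 39,680
```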

For question 2, I think you’re just misinterpreting what they are saying. The music generation works the same way as name generation: the label at each time step is the next note. “Y is X shifted one to the left” means Y<t> = X<t+1>, so the labels come from the future, not the past. The model is still predicting future notes based on past notes (see the sketch below).
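A minimal sketch (not the notebook’s exact preprocessing) of what the left shift means, with a hypothetical batch of one-hot note sequences:

```python
import numpy as np

m, Tx, n_values = 2, 5, 4             # hypothetical batch, time steps, vocab size
X = np.eye(n_values)[np.random.randint(0, n_values, size=(m, Tx))]

# Shift along the time axis: the label for step t is the input at step t+1.
Y = np.roll(X, -1, axis=1)            # the last step wraps around here; the
                                      # notebook's preprocessing handles that
                                      # boundary differently

assert (Y[:, 0, :] == X[:, 1, :]).all()   # label at t=0 is the note at t=1
```

So x<t> is never used to predict x<t-1>; the shift is just how the “next note” labels are lined up with the inputs.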

We do train djmodel; the Model object it returns is called simply model. What happens after that gets a little complicated, so we have to track carefully how the inference model is constructed. Note that it takes the LSTM_cell and densor layer objects that are components of djmodel and were trained during that earlier training of model, and then uses those trained layers to make its predictions. There is no training involved in the section of the notebook that defines and runs the inference model; the training happened earlier, as I described above. A sketch of this shared-layer pattern follows.
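Here is a minimal sketch (not the assignment’s exact code) of how the Functional API ties the two models together: the same LSTM_cell and densor objects are used to build both the training model and the inference model, so the weights learned during training are automatically the weights the inference model uses. All dimensions and function names here are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense, Reshape, Lambda
from tensorflow.keras.models import Model

n_values, n_a, Tx, Ty = 90, 64, 30, 50    # hypothetical dimensions

# These layer objects hold the weights. Creating them ONCE and reusing them
# in both models is what makes the trained weights carry over.
LSTM_cell = LSTM(n_a, return_state=True)
densor = Dense(n_values, activation="softmax")
reshaper = Reshape((1, n_values))

def build_training_model():
    X = Input(shape=(Tx, n_values))
    a0 = Input(shape=(n_a,))
    c0 = Input(shape=(n_a,))
    a, c, outputs = a0, c0, []
    for t in range(Tx):
        # Take slice t of X, then run one step of the shared LSTM cell.
        x = Lambda(lambda z, t=t: z[:, t, :])(X)
        x = reshaper(x)
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        outputs.append(densor(a))      # one prediction per time step
    return Model(inputs=[X, a0, c0], outputs=outputs)

def build_inference_model():
    x0 = Input(shape=(1, n_values))
    a0 = Input(shape=(n_a,))
    c0 = Input(shape=(n_a,))
    x, a, c, outputs = x0, a0, c0, []
    for _ in range(Ty):
        # Reuses the SAME LSTM_cell and densor objects trained above;
        # no fitting happens here, the weights are already learned.
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        out = densor(a)
        outputs.append(out)
        # Feed the prediction back in as the next input (argmax -> one-hot).
        x = Lambda(lambda z: tf.one_hot(tf.argmax(z, axis=-1), n_values))(out)
        x = reshaper(x)
    return Model(inputs=[x0, a0, c0], outputs=outputs)
```

In this sketch you would compile and fit only the training model (something like model.fit([X, a0, c0], list(Y), ...)); building the inference model afterwards involves no fit call at all, which is why that section of the notebook does no training. The inference model also “knows” the architecture for the same reason: the Functional API records the graph of layer calls you make while constructing it, so passing inputs and outputs to Model is enough.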