Can I check my understanding of the two models we built in Jazz Solo?
Explain like I’m a 5 yr old
DJModel is similar to the dinosaur character-level model, in that we predict the next character, or in this case the next musical value. We then take what was learned (the model weights and biases) and, in the inference model, use a single note to predict a series of notes?
When we go to test the model, we don’t get the entire input sequence as we do in training. Instead, we generate the notes one at a time, setting x = y so that the previously predicted value is used to predict the next note. We also use globally shared weights so that what was learned can be shared across both models (the 3 pre-created layer functions).
You posted the question under the wrong Specialization/Course (Natural Language Processing). I’m not sure which Course your question is from, so please edit the topic yourself.
From my understanding of the DJModel, you’re mostly right.
During training, we are given the entire “target”/“real” musical sequence (all the time steps). At each time step, we input a “target” note from the training sample, and the model outputs its “predicted” next note. That prediction is compared with the actual next target note, and the loss is computed from that (summing over all the comparisons).
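To make the training step concrete, here is a rough sketch in plain Python. It is not the assignment's Keras code: a tiny tanh recurrent cell stands in for the LSTM, and every name in it (`make_params`, `step`, `teacher_forced_loss`, `N_VALUES`, `N_HIDDEN`) is made up for illustration. It only shows the pattern of feeding in the real note at each step and summing the per-step cross-entropy against the real next note.

```python
import math
import random

N_VALUES = 5   # hypothetical number of distinct musical values
N_HIDDEN = 4   # hypothetical hidden-state size


def make_params(seed=0):
    """Random weights for a toy recurrent cell (a stand-in for the LSTM)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"Wx": mat(N_HIDDEN, N_VALUES),
            "Wh": mat(N_HIDDEN, N_HIDDEN),
            "Wy": mat(N_VALUES, N_HIDDEN)}


def step(params, note, h):
    """One time step: take a note index and the hidden state,
    return (probabilities over the next note, new hidden state)."""
    new_h = [
        math.tanh(params["Wx"][i][note]
                  + sum(params["Wh"][i][j] * h[j] for j in range(N_HIDDEN)))
        for i in range(N_HIDDEN)
    ]
    logits = [sum(params["Wy"][k][i] * new_h[i] for i in range(N_HIDDEN))
              for k in range(N_VALUES)]
    top = max(logits)                       # subtract max for a stable softmax
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps], new_h


def teacher_forced_loss(params, sequence):
    """Sum of cross-entropy losses over all (input note, next note) pairs."""
    h = [0.0] * N_HIDDEN
    loss = 0.0
    for t in range(len(sequence) - 1):
        probs, h = step(params, sequence[t], h)      # input is the REAL note
        loss += -math.log(probs[sequence[t + 1]])    # compare with the REAL next note
    return loss


if __name__ == "__main__":
    print(teacher_forced_loss(make_params(), [0, 2, 1, 3, 4]))
```

Note that the model's own predictions are never fed back in here; the input at every step comes from the training sample.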
During generation, we aren’t given any inputs to start with (except the model, with the globally shared weights). As you mentioned, the sequence is generated one step at a time, using the previously predicted value (plus the hidden state from the LSTM).
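The generation loop can be sketched the same way. Again this is only an illustration under stated assumptions, not the assignment's inference model: the cell is a toy tanh recurrence rather than an LSTM, all names (`make_params`, `step`, `generate`) are hypothetical, the start input is assumed to be "no note" (an all-zero input), and picking the most likely value at each step is just one simple choice (sampling from the probabilities is another).

```python
import math
import random

N_VALUES = 5   # hypothetical number of distinct musical values
N_HIDDEN = 4   # hypothetical hidden-state size


def make_params(seed=0):
    """Random weights standing in for the trained, shared weights."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"Wx": mat(N_HIDDEN, N_VALUES),
            "Wh": mat(N_HIDDEN, N_HIDDEN),
            "Wy": mat(N_VALUES, N_HIDDEN)}


def step(params, note, h):
    """One time step. `note` is a note index, or None for an all-zero input."""
    new_h = []
    for i in range(N_HIDDEN):
        x_term = params["Wx"][i][note] if note is not None else 0.0
        h_term = sum(params["Wh"][i][j] * h[j] for j in range(N_HIDDEN))
        new_h.append(math.tanh(x_term + h_term))
    logits = [sum(params["Wy"][k][i] * new_h[i] for i in range(N_HIDDEN))
              for k in range(N_VALUES)]
    top = max(logits)                       # subtract max for a stable softmax
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps], new_h


def generate(params, n_steps, start_note=None):
    """Generate n_steps notes, feeding each prediction back in as the next input."""
    h = [0.0] * N_HIDDEN        # initial hidden state
    x = start_note              # no real input sequence is available
    notes = []
    for _ in range(n_steps):
        probs, h = step(params, x, h)                         # hidden state carries over
        y = max(range(N_VALUES), key=lambda k: probs[k])      # most likely next note
        notes.append(y)
        x = y                                                 # the "x = y" step
    return notes


if __name__ == "__main__":
    print(generate(make_params(), n_steps=8))
```

The only difference from the training loop is where the input at each step comes from: here it is the previous prediction, while the weights and the step function stay the same.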