In the jazz assignment, the paragraph below says that Section 3 uses the trained weights from Section 2. I don’t see how music_inference_model (Section 3) uses djmodel (Section 2)?
Also, LSTM_cell = LSTM(n_a, return_state = True) was added to ‘reset’ the LSTM cell before calling inference_model = music_inference_model(LSTM_cell, densor, Ty = 50).
And what do “global shared layers” mean?
Why can’t we just use the djmodel directly for inference?
Full text from the assignment:
In Section 2, you’re going to train a model that predicts the next note in a style similar to the jazz music it’s trained on. The training is contained in the weights and biases of the model.
Then, in Section 3, you’re going to use those weights and biases in a new model that predicts a series of notes, and using the previous note to predict the next note.
The weights and biases are transferred to the new model using the global shared layers (LSTM_cell, densor, reshaper) described below
I assume that the global Keras layer variables LSTM_cell and densor are passed by reference to djmodel. Thus, when the model is trained, they retain all trained parameters, and we can use them later as “trained” layers (I am not sure if this is the correct terminology).
But this is not explained well, and it would be good if one of the mentors would respond to this question.
LSTM_cell = LSTM(n_a, return_state = True) was added to ‘reset’ the LSTM cell before calling inference_model = music_inference_model(LSTM_cell, densor, Ty = 50)
This is a kind of workaround for debugging, and it should be removed once you complete the code, simply because it resets all learned parameters.
I explained the cause of this trouble during debugging in this thread.
As answers to the original questions:
And what do “global shared layers” mean?
These are the “core” layers used by both “djmodel” and “music_inference_model”, namely “LSTM_cell” and “densor”. These core layers are defined as global variables so they can be shared by the two models. They are trained in “djmodel” and then used for inference in “music_inference_model”.
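To see why sharing works, here is a minimal sketch in plain Python (no Keras) of the pass-by-reference behavior. SharedDense is a hypothetical stand-in for a Keras layer such as LSTM_cell or densor; the names and the toy “weight” are illustrative, not from the assignment.

```python
# Hypothetical stand-in for a Keras layer object (e.g. LSTM_cell, densor).
class SharedDense:
    def __init__(self, w=0.0):
        self.w = w               # the "trained parameter"
    def __call__(self, x):
        return self.w * x

densor = SharedDense()           # defined ONCE, at global scope

def build_training_model(layer):
    # the model just holds a reference to the SAME layer instance
    return {"layer": layer}

def build_inference_model(layer):
    return {"layer": layer}

train_model = build_training_model(densor)
infer_model = build_inference_model(densor)

# "Training" mutates the shared layer's weight in place...
train_model["layer"].w = 2.5

# ...so the inference model sees the trained weight, because both
# models reference the one and only layer object.
print(infer_model["layer"](4.0))                      # 10.0
print(infer_model["layer"] is train_model["layer"])   # True
```

This is also why re-running LSTM_cell = LSTM(n_a, return_state = True) breaks inference: it rebinds the global name to a brand-new, untrained layer object, and any model built afterwards shares that fresh object instead of the trained one.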
Why can’t we just use the djmodel directly for inference?
The simple answer is that “djmodel” is not designed for inference. It focuses on training, using the same data X for both input and output: for the output, we actually create Y, a one-step time shift of X, and use it as the labels. Once we have trained the core layers, we slightly change the input/output wiring for inference, i.e., the output from the previous step becomes the input to the next step. Again, the core layers are the trained “LSTM_cell” and “densor”. This is why I wrote that resetting “LSTM_cell” is only for debugging: we need to use the trained core layers for inference.
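The difference in wiring can be sketched in plain Python. Here step is a toy stand-in for one pass through the trained LSTM_cell + densor pair (the real assignment uses Keras layers); the function names and toy arithmetic are illustrative only.

```python
def step(x, state):
    # Toy "trained" step: stands in for one LSTM_cell + densor pass.
    new_state = state + x
    y = new_state % 7            # the "predicted note"
    return y, new_state

def training_pass(X):
    # Training (djmodel): every input x_t comes from the DATA; the
    # label for x_t is the next element of X (one-step time shift).
    state, outputs = 0, []
    for x in X:
        y, state = step(x, state)
        outputs.append(y)
    return outputs

def inference_pass(x0, Ty):
    # Inference (music_inference_model): the previous OUTPUT becomes
    # the next INPUT, so the model generates a series of Ty notes.
    state, x, outputs = 0, x0, []
    for _ in range(Ty):
        y, state = step(x, state)
        outputs.append(y)
        x = y                    # feed the prediction back in
    return outputs
```

The layer logic (step) is identical in both passes; only the data flow around it changes, which is why a second model is built for inference from the same shared, trained layers.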