In the jazz assignment, the paragraph below says that Section 3 uses the trained weights from Section 2. I don’t see how music_inference_model (Section 3) uses djmodel (Section 2)?
Also, LSTM_cell = LSTM(n_a, return_state = True) was added to ‘reset’ the LSTM cell before calling inference_model = music_inference_model(LSTM_cell, densor, Ty = 50).
And what do “global shared layers” mean?
Why can’t we just use the djmodel directly for inference?
Full text from the assignment:
In Section 2, you’re going to train a model that predicts the next note in a style similar to the jazz music it’s trained on. The training is contained in the weights and biases of the model.
Then, in Section 3, you’re going to use those weights and biases in a new model that predicts a series of notes, and using the previous note to predict the next note.
The weights and biases are transferred to the new model using the global shared layers (LSTM_cell, densor, reshaper) described below
I assume that the global Keras layer variables LSTM_cell and densor are passed by reference to djmodel. Thus, when the model is trained, they retain all trained parameters, and we can use them later as “trained” layers (I am not sure if this is the correct terminology).
But this is not explained well, and it would be good if one of the mentors would respond to this question.
LSTM_cell = LSTM(n_a, return_state = True) was added to ‘reset’ the LSTM cell before calling inference_model = music_inference_model(LSTM_cell, densor, Ty = 50)
This is a kind of workaround for debugging, and it should be removed once you complete the code, simply because it resets all learned parameters.
I explained the cause of this trouble during debugging in this thread.
As answers to the original questions:
And what do “global shared layers” mean?
These are the “core” layers used by both “djmodel” and “music_inference_model”, namely “LSTM_cell” and “densor”. These core layers are defined as global variables so they can be shared by the two models. They are trained in “djmodel” and then used for inference in “music_inference_model”.
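To see why sharing works, here is a minimal sketch in plain Python (no Keras) of the pass-by-reference behavior. SharedDense is a hypothetical stand-in for a Keras layer such as LSTM_cell or densor; the names and the toy “weight” are illustrative, not from the assignment.

```python
# Hypothetical stand-in for a Keras layer object (e.g. LSTM_cell, densor).
class SharedDense:
    def __init__(self, w=0.0):
        self.w = w               # the "trained parameter"
    def __call__(self, x):
        return self.w * x

densor = SharedDense()           # defined ONCE, at global scope

def build_training_model(layer):
    # the model just holds a reference to the SAME layer instance
    return {"layer": layer}

def build_inference_model(layer):
    return {"layer": layer}

train_model = build_training_model(densor)
infer_model = build_inference_model(densor)

# "Training" mutates the shared layer's weight in place...
train_model["layer"].w = 2.5

# ...so the inference model sees the trained weight, because both
# models reference the one and only layer object.
print(infer_model["layer"](4.0))                      # 10.0
print(infer_model["layer"] is train_model["layer"])   # True
```

This is also why re-running LSTM_cell = LSTM(n_a, return_state = True) breaks inference: it rebinds the global name to a brand-new, untrained layer object, and any model built afterwards shares that fresh object instead of the trained one.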
Why can’t we just use the djmodel directly for inference?
The simple answer is that “djmodel” is not designed for inference. It focuses on training, using the same data X for both input and output: for the output, we actually create Y, a one-step time shift of X, and use it as the labels. Once we have trained the core layers, we slightly change the input/output wiring for inference, i.e., the output from the previous step becomes the input to the next step. Again, the core layers are the trained “LSTM_cell” and “densor”. This is why I wrote that resetting “LSTM_cell” is only for debugging: we need to use the trained core layers for inference.
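The difference in wiring can be sketched in plain Python. Here step is a toy stand-in for one pass through the trained LSTM_cell + densor pair (the real assignment uses Keras layers); the function names and toy arithmetic are illustrative only.

```python
def step(x, state):
    # Toy "trained" step: stands in for one LSTM_cell + densor pass.
    new_state = state + x
    y = new_state % 7            # the "predicted note"
    return y, new_state

def training_pass(X):
    # Training (djmodel): every input x_t comes from the DATA; the
    # label for x_t is the next element of X (one-step time shift).
    state, outputs = 0, []
    for x in X:
        y, state = step(x, state)
        outputs.append(y)
    return outputs

def inference_pass(x0, Ty):
    # Inference (music_inference_model): the previous OUTPUT becomes
    # the next INPUT, so the model generates a series of Ty notes.
    state, x, outputs = 0, x0, []
    for _ in range(Ty):
        y, state = step(x, state)
        outputs.append(y)
        x = y                    # feed the prediction back in
    return outputs
```

The layer logic (step) is identical in both passes; only the data flow around it changes, which is why a second model is built for inference from the same shared, trained layers.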