Week 1 Assignment 3 Exercise 2 - Why?

Exercise 1 of this assignment creates “djmodel”. In Exercise 2, we create “music_inference_model”. But by the end of Step 2 of Exercise 2, we’ve already generated the outputs we want. Why do we build a second inference model in Step 3 of that exercise when we already have the trained model that can generate the sequence we want? Why not just return the outputs and post-process them instead of using them to build a second model?

In other words, why don’t we change the end of the music_inference_model function from this

    # Step 3: Create model instance with the correct "inputs" and "outputs" (≈1 line)
    inference_model = Model(inputs=[//snip//], outputs=//snip//)
    
    ### END CODE HERE ###
    
    return inference_model

to this

    return outputs

Then just post-process the returned outputs instead of instantiating a new model and making yet another function to sample from it?

It is the difference between defining a function and invoking a function. What are the inputs to music_inference_model? They are shared layer objects (LSTM_cell, densor), not data, right? Where does the input data come from? At this point it is only specified by its shape. The outputs there is not actual data either: it is a symbolic tensor in the graph. You’re defining a compute graph, which you then execute with actual input data in the next section.
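
To make that distinction concrete, here is a minimal sketch (a toy Dense layer and made-up shapes, not the assignment’s code, assuming TensorFlow/Keras 2.x):

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

toy_densor = Dense(4, activation="softmax")   # toy stand-in for a shared layer

# Defining the graph: x and out are symbolic tensors, no data has flowed yet
x = Input(shape=(8,))
out = toy_densor(x)
toy_model = Model(inputs=x, outputs=out)

# Executing the graph: only now does real data flow through it
real_data = np.random.rand(1, 8)
predictions = toy_model.predict(real_data)    # actual numbers, shape (1, 4)

The outputs built in Step 2 of the exercise are like out here: placeholders in the graph, which only become actual values once the finished model is run on real data.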

I suppose my confusion is that I don’t understand why defining that second compute graph is necessary, instead of just defining one graph that can do what we want and directly running inference on it. The assignment’s way of defining and training one model, then ignoring it and making a second, similar one that will actually be used for inference, seemed like a very roundabout way of solving the task.

Hi @Alexander_Valarus, I have reviewed the assignment, specifically Exercises 1 and 2.

I’d like to restate, in my own words, what @paulinpaloalto has explained very well.

In Ex. 1 we create a function, djmodel, that is in charge of building a model that predicts the next note based on the previous note. We train this model and, after training, the learned weights and biases live in the globally shared layers (LSTM_cell, densor, reshaper), which is how they are carried over to the new model.

In Ex. 2 we create another function, music_inference_model, that returns another model, one that generates a sequence of notes using the parameters learned by the previous model.

Remember: the objective of these functions is to return a model. They are not the models per se; rather, they are the ‘creators’ of the models.

When you call the functions, you pass them some parameters, and then the function returns a model. For example:

new_inference_model = music_inference_model(LSTM_cell, densor, Ty = 50)

We are calling the ‘music_inference_model’ function, which creates a model, and ‘stores’ this model in the variable ‘new_inference_model’. Now you have a model in this variable that you can use, run inference with, and so on.
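
To see that the first model is not being ignored, here is a minimal, self-contained sketch (toy layer sizes and made-up names, not the assignment’s architecture, assuming TensorFlow/Keras 2.x). Two models built by two different ‘creator’ functions from the same shared layer object literally share the same weights, so training the first updates what the second one uses:

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

shared_densor = Dense(4, activation="softmax")    # plays the role of the shared densor

def make_training_model(densor):
    x = Input(shape=(8,))
    return Model(inputs=x, outputs=densor(x))

def make_inference_model(densor):
    x = Input(shape=(8,))
    return Model(inputs=x, outputs=densor(x))     # a different graph, but the same shared layer

training_model = make_training_model(shared_densor)
inference_model = make_inference_model(shared_densor)

# Train only the first model, on random toy data
training_model.compile(optimizer="adam", loss="categorical_crossentropy")
X = np.random.rand(32, 8)
Y = np.eye(4)[np.random.randint(0, 4, 32)]
training_model.fit(X, Y, epochs=1, verbose=0)

# The second model sees the trained weights, because the layer object is the same
print(np.allclose(training_model.get_weights()[0], inference_model.get_weights()[0]))   # True

That is exactly the role the globally shared LSTM_cell, densor and reshaper play in the assignment: the inference model is a different graph, but it is wired out of the very same trained layers.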

So at no point are we creating a model and then creating another model while ignoring the previously created one.

I hope this sheds some light on your question.

Juan

I appreciate the time you’ve both taken to respond thus far, but I’m still a little confused.

…at no point are we creating a model and then creating another model while ignoring the previously created one.

Code cell 6 defines djmodel(). Code cell 7 calls the djmodel() function, which instantiates a model and assigns the reference to that model to the variable called model.

model = djmodel(Tx=30, LSTM_cell=LSTM_cell, densor=densor, reshaper=reshaper)

Code cells 10 through 12 compile and train model.

Then in Exercise 2 (code cell 14) we define the function music_inference_model(). In code cell 15, we call that function, instantiating a [second] model and storing the reference to that newly instantiated model in the variable called inference_model.

inference_model = music_inference_model(LSTM_cell, densor, Ty = 50)

By my count, we have now instantiated two models. First we instantiated and trained model, which was built and returned by the djmodel() function; then we instantiated inference_model, which was built and returned by the music_inference_model() function, and used it for inference without ever training it.

My question is: why did we bother to train the weights in one model (model), and then take those globally shared layers and use them to instantiate a second, slightly different model (inference_model)?

In my head, it seems like after training the initial model, we could have immediately done something like this (in pseudo-code):

x = [initial_value]                     # seed the sequence with some starting value
Ty = desired_length                     # how many musical values we want to generate
for t in range(Ty):
    predicted_next = model.predict(x)   # reuse the already-trained model directly
    # post-process the prediction appropriately (e.g. argmax and re-encode it)
    x.append(predicted_next)            # feed the prediction back in as part of the input

We give some initial value, have the model predict the next output, append that to the list of values, and just loop it until we have the number of musical values that we want. Where have I gone wrong? Is the ability to specify a new sequence length the only reason why we used music_inference_model() to get a second model? Is needing to pre-specify the length prior to instantiating the model a limiting factor that requires this longer workaround?