Week 1 Assignment 3 Exercise 2 - Why?

Exercise 1 of this assignment creates “djmodel”. In Exercise 2, we create “music_inference_model”. But by the end of Step 2 of Exercise 2, we’ve already generated the outputs we want. Why do we build a second inference model in Step 3 of that exercise when we already have the trained model that can generate the sequence we want? Why not just return the outputs and post-process them instead of using them to build a second model?

In other words, why don’t we change the end of the music_inference_model function from this

    # Step 3: Create model instance with the correct "inputs" and "outputs" (≈1 line)
    inference_model = Model(inputs=[//snip//], outputs=//snip//)
    
    ### END CODE HERE ###
    
    return inference_model

to this

    return outputs

Then just post-process the returned outputs instead of instantiating a new model and making yet another function to sample from it?

It is the difference between defining a function and invoking a function. What are the inputs to music_inference_model? They are shared layer objects (LSTM_cell, densor), not data, right? Where does the input data come from? At this point it is only specified by its shape. The outputs there is not actual data either: it is a symbolic tensor in the graph. You’re defining a compute graph, which you then execute with actual input data in the next section.
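
To make that distinction concrete, here is a minimal sketch (a toy Dense layer and made-up shapes, not the assignment’s code, assuming TensorFlow/Keras 2.x):

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

toy_densor = Dense(4, activation="softmax")   # toy stand-in for a shared layer

# Defining the graph: x and out are symbolic tensors, no data has flowed yet
x = Input(shape=(8,))
out = toy_densor(x)
toy_model = Model(inputs=x, outputs=out)

# Executing the graph: only now does real data flow through it
real_data = np.random.rand(1, 8)
predictions = toy_model.predict(real_data)    # actual numbers, shape (1, 4)

The outputs built in Step 2 of the exercise are like out here: placeholders in the graph, which only become actual values once the finished model is run on real data.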

I suppose my confusion is that I don’t understand why defining that second compute graph is necessary, instead of just defining one graph that can do what we want and directly running inference on it. The assignment’s way of defining and training one model, then ignoring it and making a second, similar one that will actually be used for inference, seemed like a very roundabout way of solving the task.

Hi @Alexander_Valarus, I have reviewed the assignment, specifically Exercises 1 and 2.

I’d like to restate, in my own words, what @paulinpaloalto has explained very well.

In Ex. 1 we create a function, djmodel, that is in charge of building a model that predicts the next note based on the previous note. We train this model and, after training, the learned weights and biases live in the globally shared layers (LSTM_cell, densor, reshaper), which is how they are carried over to the new model.

In Ex. 2 we create another function, music_inference_model, that returns another model, one that generates a sequence of notes using the parameters learned by the previous model.

Remember: the objective of these functions is to return a model. They are not the models per se; rather, they are the ‘creators’ of the models.

When you call the functions, you pass them some parameters, and then the function returns a model. For example:

new_inference_model = music_inference_model(LSTM_cell, densor, Ty = 50)

We are calling the ‘music_inference_model’ function, which creates a model, and ‘stores’ this model in the variable ‘new_inference_model’. Now you have a model in this variable that you can use, run inference with, and so on.
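
To see that the first model is not being ignored, here is a minimal, self-contained sketch (toy layer sizes and made-up names, not the assignment’s architecture, assuming TensorFlow/Keras 2.x). Two models built by two different ‘creator’ functions from the same shared layer object literally share the same weights, so training the first updates what the second one uses:

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

shared_densor = Dense(4, activation="softmax")    # plays the role of the shared densor

def make_training_model(densor):
    x = Input(shape=(8,))
    return Model(inputs=x, outputs=densor(x))

def make_inference_model(densor):
    x = Input(shape=(8,))
    return Model(inputs=x, outputs=densor(x))     # a different graph, but the same shared layer

training_model = make_training_model(shared_densor)
inference_model = make_inference_model(shared_densor)

# Train only the first model, on random toy data
training_model.compile(optimizer="adam", loss="categorical_crossentropy")
X = np.random.rand(32, 8)
Y = np.eye(4)[np.random.randint(0, 4, 32)]
training_model.fit(X, Y, epochs=1, verbose=0)

# The second model sees the trained weights, because the layer object is the same
print(np.allclose(training_model.get_weights()[0], inference_model.get_weights()[0]))   # True

That is exactly the role the globally shared LSTM_cell, densor and reshaper play in the assignment: the inference model is a different graph, but it is wired out of the very same trained layers.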

So at no point are we creating a model and then creating another model while ignoring the previously created one.

I hope this sheds some light on your question.

Juan

I appreciate the time you’ve both taken to respond thus far, but I’m still a little confused.

…at no point are we creating a model and then creating another model while ignoring the previously created one.

Code cell 6 defines djmodel(). Code cell 7 calls the djmodel() function, which instantiates a model and assigns the reference to that model to the variable called model.

model = djmodel(Tx=30, LSTM_cell=LSTM_cell, densor=densor, reshaper=reshaper)

Code cells 10 through 12 compile and train model.

Then in Exercise 2 (code cell 14) we define the function music_inference_model(). In code cell 15, we call that function, instantiating a [second] model and storing the reference to that newly instantiated model in the variable called inference_model.

inference_model = music_inference_model(LSTM_cell, densor, Ty = 50)

By my count, we have now instantiated two models. First we instantiated and trained model, which was built and returned by the djmodel() function; then we instantiated inference_model, which was built and returned by the music_inference_model() function, and used it for inference without ever training it.

My question is: why did we bother to train the weights in one model (model), and then take those globally shared layers and use them to instantiate a second, slightly different model (inference_model)?

In my head, it seems like after training the initial model, we could have immediately done something like this (in pseudo-code):

x = [initial_value]                     # seed the sequence with some starting value
Ty = desired_length                     # how many musical values we want to generate
for t in range(Ty):
    predicted_next = model.predict(x)   # reuse the already-trained model directly
    # post-process the prediction appropriately (e.g. argmax and re-encode it)
    x.append(predicted_next)            # feed the prediction back in as part of the input

We give some initial value, have the model predict the next output, append that to the list of values, and just loop it until we have the number of musical values that we want. Where have I gone wrong? Is the ability to specify a new sequence length the only reason why we used music_inference_model() to get a second model? Is needing to pre-specify the length prior to instantiating the model a limiting factor that requires this longer workaround?