[Week 1] [Assignment 3] Conceptual understanding of model architecture

There are a few points that I have not been able to understand; please help:

  1. When loading the tensors, the instructions say:

Notice that the data in Y is reordered to be dimension (𝑇𝑦, 𝑚, 90), where 𝑇𝑦 = 𝑇𝑥. This format makes it more convenient to feed into the LSTM later.

I still don't understand why this is important and what it achieves.

  2. After creating the model, the instructions for the loss calculation state:

You’ll turn Y into a list, since the cost function expects Y to be provided in this format.

  • list(Y) is a list with 30 items, where each of the list items is of shape (60,90).

Here, 60 is the batch size. So have we created a model that outputs sequences for batches of 60? How does this work at inference time?

I am thoroughly confused. Please help

Hey @Vashishtha_Vidyarthi,
Welcome to the community, and apologies for the delayed response. The answer to your first query lies in the way we have coded our djmodel function, so let’s examine it a bit. I have attached the screenshot below for your reference.

Screenshot from 2022-05-10 20-47-38

In the function, we iterate over the time steps, i.e., Tx, and feed in x with dimensions (m, 90) at every time step. For this input, the output also has dimensions (m, 90), and it is simply appended to the outputs list. So, ultimately, outputs is a list of Tx arrays of shape (m, 90), i.e., its overall dimensions are (Tx, m, 90), or (Ty, m, 90) since Ty = Tx. Note that outputs represents the predicted results, while Y represents the actual results. So, when we keep the dimensions of Y the same as those of outputs, which is (Ty, m, 90), it’s easier for any loss function to compute the loss and backpropagate, without the need for any re-ordering.
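To make the shape argument concrete, here is a minimal NumPy sketch. It is not the assignment's code: the zero-filled arrays and the names `Y_batch_first`, `outputs`, and `Y_list` are placeholders I am assuming for illustration, and the sizes are the ones quoted in this thread (m = 60, Ty = 30, 90 values).

```python
import numpy as np

m, Ty, n_values = 60, 30, 90  # sizes quoted in this thread

# Hypothetical labels in batch-first layout: (m, Ty, n_values)
Y_batch_first = np.zeros((m, Ty, n_values))

# Reorder to time-first layout: (Ty, m, n_values)
Y = np.swapaxes(Y_batch_first, 0, 1)

# Stand-in for the model's outputs: one (m, n_values) array per time step
outputs = [np.zeros((m, n_values)) for _ in range(Ty)]

# list(Y) splits along the first axis, giving one label array per time step
Y_list = list(Y)
print(len(Y_list), Y_list[0].shape)  # 30 (60, 90)

# Each label array lines up with the output of the same time step
shapes_match = all(y.shape == out.shape for y, out in zip(Y_list, outputs))
```

With the time axis first, `list(Y)` pairs up one-to-one with the per-time-step outputs, which is why no further re-ordering is needed before computing the loss.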

As for your second query, the answer is pretty straightforward. In your example, time-steps = 30, batch-size = 60, and #distinct music values = 90, and hence the dimensions of Y are (30, 60, 90). The batch size is just the number of examples processed together, not a fixed property of the model. At inference time, if you want to get the output for each example individually, the dimensions of the output would be (30, 1, 90), where batch-size = 1. I hope this helps.
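As a rough illustration of why the batch size can change between training and inference, here is a toy NumPy sketch. `toy_forward` and the weight matrix `W` are hypothetical stand-ins for the per-time-step computation, not the LSTM from the assignment; the only point is that the same weights handle any number of examples m.

```python
import numpy as np

Ty, n_values = 30, 90
rng = np.random.default_rng(0)
W = rng.normal(size=(n_values, n_values))  # hypothetical shared weights

def toy_forward(X):
    """Map X of shape (m, Ty, n_values) to a list of Ty arrays of shape (m, n_values)."""
    return [X[:, t, :] @ W for t in range(X.shape[1])]

# Same function, same weights, two different batch sizes
train_out = toy_forward(np.zeros((60, Ty, n_values)))
infer_out = toy_forward(np.zeros((1, Ty, n_values)))

stacked_train = np.stack(train_out).shape  # (30, 60, 90)
stacked_infer = np.stack(infer_out).shape  # (30, 1, 90)
```

The weights have no batch dimension, so nothing in the model ties it to 60 examples at a time.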

Regards,
Elemento