What exactly is Y in C5 Week1, Jazz assignment

I probably missed something in the notebook, but what exactly is Y? I know it is a one-hot vector.

What are we predicting in the LSTM?

Hey @skyrockets_21,
Welcome to the community. Y is introduced at the very beginning of the assignment. Let me highlight it for you here.


Y: a (Ty,m,90) dimensional array

  • This is essentially the same as X, but shifted one step to the left (to the past).
  • Notice that the data in Y is reordered to be dimension (Ty,m,90), where Ty=Tx. This format makes it more convenient to feed into the LSTM later.
  • Similar to the dinosaur assignment, you’re using the previous values to predict the next value.
    • So your sequence model will try to predict y⟨t⟩ given x⟨1⟩,…,x⟨t⟩.

So, we are essentially predicting the musical values, which we are hoping to compose as Jazz pieces. Let me know if this helps. If you have any further queries, feel free to let us know.

Regards,
Elemento

Thanks Elemento. I am aware of what you copied from the notebook, perhaps let me rephrase the question:

What exactly does Y mean, i.e., what does each label stand for? A particular musical value?

I suggest more explanations be added to the jupyter notebook to give the user better context.

I think @Elemento posted the right description about Y. But, as you see, it is created from X. So, we need to start from X, of course.

  • X: This is an (m, 𝑇𝑥, 90) dimensional array.
    • You have m training examples, each of which is a snippet of 𝑇𝑥=30 musical values.
    • At each time step, the input is one of 90 different possible values, represented as a one-hot vector.
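To make that shape concrete, here is a minimal NumPy sketch of one-hot encoding. The sizes and the index sequences are toy assumptions for illustration, not the assignment's data (the notebook uses 𝑇𝑥=30 and 90 values).

```python
import numpy as np

# Toy sizes (assumptions for illustration; the notebook uses Tx = 30 and 90 values).
m, Tx, n_values = 2, 4, 6

# Hypothetical data: one integer "value index" per time step, per example.
indices = np.array([[0, 3, 3, 5],
                    [2, 1, 4, 0]])

# Row k of the identity matrix is the one-hot vector for index k,
# so indexing the identity matrix with `indices` one-hot encodes every step.
X = np.eye(n_values, dtype=np.float32)[indices]

print(X.shape)   # (m, Tx, n_values)
print(X[0, 1])   # one-hot vector with its single 1 at position 3
```

Each time step holds exactly one 1, and its position is the index of the musical value played at that step.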

So, this is the starting point. To really understand it, we would need to understand the MIDI and MUSIC21 formats and how they represent a music score. That is the part this exercise skips, as quoted below.

What are musical “values”? (optional)

You can informally think of each “value” as a note, which comprises a pitch and duration. For example, if you press down a specific piano key for 0.5 seconds, then you have just played a note. In music theory, a “value” is actually more complicated than this – specifically, it also captures the information needed to play multiple notes at the same time. For example, when playing a music piece, you might press down two piano keys at the same time (playing multiple notes at the same time generates what’s called a “chord”). But you don’t need to worry about the details of music theory for this assignment.

I also do not want to dig into the details of MIDI and MUSIC21. I want to focus on how MUSIC21 output can be transformed into values that are usable for us, i.e., for machine learning.

First, we have “indices_values” as an output of load_music_utils(). This is quite an important Python dictionary: it maps an index to a MUSIC21-encoded value. Each MUSIC21 value represents scale, interval, and so on. Here are some entries in “indices_values”.

{0: 'S,0.250,<M-2,m-6>',
1: 'S,0.667,<d5,P1>',
2: 'R,0.250',
3: 'C,0.250,<M3,d-3>',
4: 'C,0.500,<m6,M2>',
5: 'S,0.250,<P1,d-5>',
6: 'A,0.250,<M2,d-4>',
7: 'S,0.667,<m3,m-3>',
8: 'S,0.500,<m7,M3>',
9: 'R,0.167',
10: 'C,0.250,<M-2,m-6>',
11: 'C,0.250',

So, essentially, X is a one-hot encoding of MUSIC21 objects, each of which represents a particular part of a music score.
As X’s shape is (m, Tx, n_values), let’s extract one sample, which has 30 time steps, and convert the one-hot encoding back to values with argmax.
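A rough sketch of that decoding step is below. It does not load the real data: the dictionary holds only a few of the entries listed above, and the one-sample X is a hypothetical toy array with 5 time steps instead of 30.

```python
import numpy as np

# A few entries copied from the indices_values listing in this thread.
indices_values = {
    0: 'S,0.250,<M-2,m-6>',
    1: 'S,0.667,<d5,P1>',
    2: 'R,0.250',
    3: 'C,0.250,<M3,d-3>',
}
n_values = len(indices_values)  # toy size; 90 in the assignment

# Hypothetical X with one sample: shape (1, Tx, n_values), Tx = 5 here.
toy_indices = [2, 0, 3, 3, 1]
X = np.eye(n_values)[toy_indices][np.newaxis, ...]

# Extract one sample and undo the one-hot encoding with argmax.
sample = X[0]                                  # shape (Tx, n_values)
decoded_indices = np.argmax(sample, axis=-1)   # one index per time step
decoded_values = [indices_values[int(i)] for i in decoded_indices]

for t, (i, v) in enumerate(zip(decoded_indices, decoded_values)):
    print(t, int(i), v)
```

With the real arrays, the same two lines (argmax over the last axis, then a lookup in indices_values) would turn a sample back into its sequence of MUSIC21-encoded values.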

Now you can see what X is.
Then, Y is just X shifted by one time step, used as the label. In this exercise some dimensions are swapped for later use, but the meaning of Y is unchanged.
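Here is a small sketch of that relationship, assuming toy sizes and made-up index sequences. It is not the assignment's actual preprocessing code, only an illustration of "shift by one step, then swap the first two axes".

```python
import numpy as np

# Toy sizes (assumptions; the assignment uses Tx = Ty = 30 and 90 values).
m, Tx, n_values = 2, 4, 5

# Hypothetical value indices: Tx + 1 per example, so every step has a "next" value.
seq = np.array([[0, 1, 2, 3, 4],
                [4, 3, 2, 1, 0]])
onehot = np.eye(n_values)[seq]     # shape (m, Tx + 1, n_values)

X = onehot[:, :-1, :]              # inputs x<1> .. x<Tx>, shape (m, Tx, n_values)
Y = onehot[:, 1:, :]               # labels: y<t> is the value that follows x<t>

# Swap the first two axes so Y is time-major: (Ty, m, n_values).
Y = np.swapaxes(Y, 0, 1)

print(X.shape, Y.shape)
print(np.argmax(X[0], axis=-1))    # input indices for example 0
print(np.argmax(Y[:, 0], axis=-1)) # label indices for example 0, one step ahead
```

So each label in Y is simply the one-hot vector of the next musical value in the same snippet. Swapping the axes changes only the layout, not the meaning.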

Hope this helps.

Thank you so much Nobu! It is super helpful and everything makes sense to me now!