Jazz improvisation - Summary understanding et al

Dear all,

I have completed all programming assignments of W1 successfully, but have a series of doubts.

I would appreciate clarification on the output of model.summary() and inference_model.summary(). The summaries were clear for CNNs, but for the LSTM they are not.

Additionally, what is the difference between tf.one_hot and tf.keras.utils.to_categorical? Why use one in some places and the other elsewhere?

Equally, why use tf.math.argmax vs np.argmax?

Additional basic doubts:

  1. Example of a tensor with shape (3, None) and
  2. Example of a tensor with shape (None, 3).
  3. I don’t understand why, in exercise 2E, the output of the one-hot encoding has shape=(None, 90)

Thank you all.

Hey @AntonioMaher,
This is interesting indeed. I never really focused on this summary myself as a learner, so let’s see to what extent I am able to explain it. Let’s first talk about the output of model.summary(). Let me paste a shortened version of it here, so that both of us can easily refer to it from time to time.


Layer (type)                    Output Shape           Params   Connected to
input_1 (InputLayer)            [(None, 30, 90)]       0
tf.operators.getitem (Slici     (None, 90)             0        input_1[0][0]
reshape (Reshape)               (None, 1, 90)          0        tf.operators.getitem[0][0]
                                                                tf.operators.getitem_2[0][0] …
a0 (InputLayer)                 [(None, 64)]           0
c0 (InputLayer)                 [(None, 64)]           0
tf.operators.getitem_1 (Sli     (None, 90)             0        input_1[0][0]
lstm (LSTM)                     [(None, 64), (None,    39680    reshape[0][0]
                                                                lstm[0][2] …
tf.operators.getitem_2 (Sli     (None, 90)             0        input_1[0][0]
tf.operators.getitem_3 (Sli     (None, 90)             0        input_1[0][0]
tf.operators.getitem_28 (Sli    (None, 90)             0        input_1[0][0]
tf.operators.getitem_29 (Sli    (None, 90)             0        input_1[0][0]
dense (Dense)                   (None, 90)             5850     lstm[0][0]
                                                                lstm[1][0] …

Now, let’s try to understand this. This summary is based on the djmodel function. The first layer, input_1, is the first input layer in the function (X), which has shape (Tx, n_values). Here, Tx = 30 and n_values = 90, so the shape of this layer is (None, 30, 90), where None indicates the batch size. Since this is an input layer, it doesn’t receive input from any other layer, which is why its Connected to field is empty.

The next layer, tf.operators.getitem (Slici, corresponds to step 2.A, where we select the t-th time-step vector from X. Its output shape is (None, 90): picking one of the 30 time-steps leaves a single vector of 90 musical values per example. Once again, None indicates the batch size. This layer takes its input from input_1, hence the value in the Connected to field.

The next layer is the reshape (Reshape) layer, for step 2.B. Its input and output shapes are straightforward: it turns (None, 90) into (None, 1, 90). The Connected to field is more interesting. We call the same reshape layer at every time-step, so TensorFlow shares a single layer across all of them. That one layer therefore receives input from all 30 tf.operators.getitem layers (one per time-step), which explains the list of values under Connected to.

Now, you may be wondering why the tf.operators.getitem layers are not shared as well, and why there are 30 separate ones instead. If I am not wrong, TensorFlow creates a new layer every time an operator is called, regardless of whether the calls are identical or different. In this case, the operator is getitem. You can validate this by replacing x = X[:, t, :] with x = X[:, 0, :]: the same operation is then performed 30 times, yet TensorFlow still creates 30 separate operator layers.
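To try this out without opening the notebook, here is a minimal sketch of the same pattern. It is not the assignment's djmodel: the names X and reshaper only mirror it, the loop runs 3 steps instead of 30, and it assumes a TensorFlow 2.x install. The exact names and number of the operator layers in the printout may differ between TensorFlow/Keras versions, so the sketch only counts the Reshape layers.

```python
import tensorflow as tf

# Stand-in for the assignment's input X with Tx = 30 and n_values = 90.
X = tf.keras.Input(shape=(30, 90))

# Created once, outside the loop, so every call reuses the same layer object.
reshaper = tf.keras.layers.Reshape((1, 90))

outputs = []
for t in range(3):          # 3 steps instead of 30 to keep the printout short
    x = X[:, 0, :]          # deliberately the same slice on every iteration
    outputs.append(reshaper(x))

model = tf.keras.Model(inputs=X, outputs=outputs)

# Inspect which layers the model ended up with.
layer_names = [layer.name for layer in model.layers]
print(layer_names)

# However many times reshaper was called, it is still a single layer.
n_reshape_layers = sum(
    isinstance(layer, tf.keras.layers.Reshape) for layer in model.layers
)
output_shapes = [tuple(out.shape) for out in outputs]
print(n_reshape_layers, output_shapes)
```

Calling model.summary() on this small model should show the same kind of Connected to entries as in the shortened summary, just with fewer rows.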

By now, I guess you have got the gist of how to read the model summary. With this understanding, you should be able to reason about the rest of model.summary() and inference_model.summary().
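One more column worth checking is Params. The following is my own arithmetic, not something taken from the notebook: it assumes the standard LSTM layout (4 gates, each with an input kernel, a recurrent kernel and a bias) and a Dense layer with a bias, and the name n_a for the hidden size of 64 is just a label for this sketch.

```python
# Sizes read off the summary: 90 musical values, 64 hidden units.
n_values = 90
n_a = 64

# LSTM: 4 gates, each with an input kernel (n_values x n_a),
# a recurrent kernel (n_a x n_a) and a bias vector (n_a).
lstm_params = 4 * (n_values * n_a + n_a * n_a + n_a)

# Dense: one weight matrix (n_a x n_values) plus one bias vector (n_values).
dense_params = n_a * n_values + n_values

print(lstm_params)   # 39680
print(dense_params)  # 5850
```

Both numbers match the summary. They appear only once each because the lstm and dense layers are shared across all time-steps, just like reshape.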

P.S. - Since this post is getting too big, let me divide my post into 3 replies.


As to the difference between tf.one_hot and tf.keras.utils.to_categorical, I wasn’t able to find much on the web myself. It seems that everything that can be done with tf.keras.utils.to_categorical can also be achieved with tf.one_hot. In fact, tf.one_hot offers some additional features, by which I mean extra arguments such as on_value, off_value, etc.; you can check them out in its documentation. A good way to settle this is to swap one for the other in the code and see if you get any error. If you do, the error should point you to the difference; if not, I guess it is safe to assume that we can use tf.one_hot throughout our code, since it supersedes the other. Do share your findings with the community.
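As a starting point for that experiment, here is a small sketch comparing the two on the same toy indices. The array is made up for illustration and TensorFlow 2.x is assumed; it only shows what I am sure of, namely the return types and the extra on_value/off_value arguments of tf.one_hot.

```python
import numpy as np
import tensorflow as tf

# Toy class indices, made up for this comparison.
indices = np.array([0, 2, 1])

via_one_hot = tf.one_hot(indices, depth=3)
via_to_categorical = tf.keras.utils.to_categorical(indices, num_classes=3)

# tf.one_hot returns a TensorFlow tensor, to_categorical a NumPy array here.
print(type(via_one_hot))
print(type(via_to_categorical))

# The encoded values themselves are the same.
same_values = np.array_equal(via_one_hot.numpy(), via_to_categorical)
print(same_values)

# Extra arguments that only tf.one_hot offers.
custom = tf.one_hot(indices, depth=3, on_value=1.0, off_value=-1.0)
print(custom.numpy())
```

Since tf.one_hot is a TensorFlow op, it can sit inside a model graph, whereas to_categorical is more of a preprocessing helper for data that is already in memory. That is my reading of the two, so do test it in the notebook.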

I guess the tf.math.argmax vs np.argmax query follows the same pattern as the previous one, so I leave it up to you to figure out the answer. To get you started: I notice a difference in the output types of the 2 functions. Do share your results with the community.
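To see the difference in output types concretely, here is a short sketch on a made-up score matrix (TensorFlow 2.x assumed). The axis is passed explicitly to both functions, since, as far as I know, their defaults differ when it is left out: np.argmax flattens the array, while tf.math.argmax works along axis 0.

```python
import numpy as np
import tensorflow as tf

# Made-up scores: 2 examples, 3 classes each.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.6, 0.3, 0.1]])

np_result = np.argmax(scores, axis=-1)        # NumPy array
tf_result = tf.math.argmax(scores, axis=-1)   # TensorFlow tensor

print(type(np_result), np_result)
print(type(tf_result), tf_result)

# Same indices, different container types.
same_indices = np.array_equal(np_result, tf_result.numpy())
print(same_indices)
```

So the NumPy version is convenient on arrays you already hold in memory, while the TensorFlow version keeps the result as a tensor inside a TensorFlow computation.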


As to the tensor shapes, None is basically a placeholder for a dimension whose exact size you don’t know in advance; a shape can contain more than one such dimension. You can read more about it here and here.

Here, I would like to highlight one thing. I have usually seen None as a placeholder in the first (batch) dimension, but not in the others. However, you can create such tensors if you want.
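For the two shapes asked about, here is a minimal sketch, assuming TensorFlow 2.x. The function row_sums and the variable names are made up for illustration. A concrete tensor always has fixed sizes, so None only shows up in shape specifications, which is what the sketch uses.

```python
import tensorflow as tf

# Shape specifications with one unknown dimension each.
shape_a = tf.TensorShape([3, None])   # exactly 3 rows, any number of columns
shape_b = tf.TensorShape([None, 3])   # any number of rows, exactly 3 columns

print(shape_a.as_list())
print(shape_b.as_list())

# Which concrete shapes fit which specification?
b_accepts_5x3 = shape_b.is_compatible_with(tf.TensorShape([5, 3]))
b_accepts_3x5 = shape_b.is_compatible_with(tf.TensorShape([3, 5]))
a_accepts_3x5 = shape_a.is_compatible_with(tf.TensorShape([3, 5]))

# A function declared for (None, 3) inputs accepts any batch size.
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 3], dtype=tf.float32)])
def row_sums(x):
    return tf.reduce_sum(x, axis=1)

sums_2 = row_sums(tf.ones((2, 3)))   # batch of 2
sums_5 = row_sums(tf.ones((5, 3)))   # batch of 5
print(sums_2.numpy(), sums_5.numpy())
```

The (None, 3) case is the familiar one: a batch of unknown size with 3 features per example.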

Lastly, coming to exercise 2E: where exactly is the one-hot encoding in 2E? There, we append out, which is the output of densor, a Dense layer with n_values output neurons, where n_values = 90. Once you have gone through the previous replies, the shape (None, 90) makes perfect sense, don’t you think? I hope this helps.
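If you want to see that shape appear on its own, here is a small sketch. It is not the notebook code: the input a and the hidden size of 64 merely stand in for the LSTM's hidden state, the softmax activation is my assumption, and TensorFlow 2.x is assumed.

```python
import tensorflow as tf

n_a, n_values = 64, 90

# Symbolic stand-in for the LSTM hidden state: the batch size is unknown.
a = tf.keras.Input(shape=(n_a,))

# A Dense layer with n_values output neurons, in the spirit of densor.
densor = tf.keras.layers.Dense(n_values, activation="softmax")
out = densor(a)

symbolic_shape = tuple(out.shape)
print(symbolic_shape)

# With a concrete batch of 5 examples, None is replaced by 5.
concrete_shape = tuple(densor(tf.ones((5, n_a))).shape)
print(concrete_shape)
```

The 90 comes from the number of output neurons, and None is the batch size, which is only filled in once real data flows through the layer.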