Hey @AntonioMaher,
This is indeed interesting. Being a learner myself, I never really focused on this summary, so let’s see how far I can get in explaining it. Let’s start with the output of `model.summary()`. I’m pasting a shortened version of it here so that both of us can easily refer back to it from time to time.
Summary

| Layer (type) | Output Shape | Params | Connected to |
|---|---|---|---|
| `input_1 (InputLayer)` | `[(None, 30, 90)]` | 0 | |
| `tf.__operators__.getitem (Slici` | `(None, 90)` | 0 | `input_1[0][0]` |
| `reshape (Reshape)` | `(None, 1, 90)` | 0 | `tf.__operators__.getitem[0][0]` `tf.__operators__.getitem_1[0][0]` `tf.__operators__.getitem_2[0][0]` … `tf.__operators__.getitem_28[0][0]` `tf.__operators__.getitem_29[0][0]` |
| `a0 (InputLayer)` | `[(None, 64)]` | 0 | |
| `c0 (InputLayer)` | `[(None, 64)]` | 0 | |
| `tf.__operators__.getitem_1 (Sli` | `(None, 90)` | 0 | `input_1[0][0]` |
| `lstm (LSTM)` | `[(None, 64), (None, 64), (None, 64)]` | 39680 | `reshape[0][0]` `a0[0][0]` `c0[0][0]` `reshape[1][0]` `lstm[0][0]` `lstm[0][2]` … `reshape[29][0]` `lstm[28][0]` `lstm[28][2]` |
| `tf.__operators__.getitem_2 (Sli` | `(None, 90)` | 0 | `input_1[0][0]` |
| `tf.__operators__.getitem_3 (Sli` | `(None, 90)` | 0 | `input_1[0][0]` |
| … | … | … | … |
| `tf.__operators__.getitem_28 (Sli` | `(None, 90)` | 0 | `input_1[0][0]` |
| `tf.__operators__.getitem_29 (Sli` | `(None, 90)` | 0 | `input_1[0][0]` |
| `dense (Dense)` | `(None, 90)` | 5850 | `lstm[0][0]` `lstm[1][0]` … `lstm[28][0]` `lstm[29][0]` |
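As a quick sanity check on the Params column, here is a small sketch that recomputes the two non-zero counts by hand. It assumes the standard Keras LSTM layout (four gates, each with an input kernel, a recurrent kernel and a bias) and a Dense layer with a bias; the helper names `lstm_params` and `dense_params` are made up for illustration. Note that `lstm` and `dense` are each counted only once, even though they are called 30 times, because their weights are shared across the time steps.

```python
# Sketch: recompute the non-zero "Params" entries of the summary by hand.
# Assumes a standard Keras LSTM (4 gates) and a Dense layer with a bias.
# The helper names below are hypothetical, chosen for illustration.

def lstm_params(n_x, n_a):
    """Trainable parameters of an LSTM with n_a units fed n_x features."""
    # Each of the 4 gates (input, forget, cell candidate, output) has:
    #   an input kernel (n_x * n_a), a recurrent kernel (n_a * n_a), a bias (n_a)
    return 4 * (n_x * n_a + n_a * n_a + n_a)


def dense_params(n_in, n_out):
    """Trainable parameters of a Dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


n_values, n_a = 90, 64  # 90 musical values, 64 LSTM units (from the summary)

print("lstm :", lstm_params(n_values, n_a))   # 39680
print("dense:", dense_params(n_a, n_values))  # 5850
```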
Now, let’s try to understand this. This summary is based on the `djmodel` function. The first layer here, `input_1`, is the first input layer in the function (`X`), which has the shape `(Tx, n_values)`. Here, `n_values = 90` and `Tx = 30`, so the input shape of this layer is `(None, 30, 90)`, where `None` indicates the batch size. Also, since this is the first layer, it doesn’t receive input from any other layer.
The next layer, `tf.__operators__.getitem (Slici`, is for step 2.A, where we select the t-th time step vector from `X`. Since `X` has 30 time steps and each time step holds 90 musical values, picking a single time step drops the time axis and leaves an output of shape `(None, 90)`. Once again, `None` indicates the batch size. Also, this layer is connected to the `input_1` layer, hence the value in the Connected to field.
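To make these shapes concrete, here is a minimal NumPy sketch of the input and of step 2.A, with a plain array standing in for the Keras tensor `X`. The batch size `m = 4` is an arbitrary assumption that plays the role of `None` in the summary.

```python
import numpy as np

# NumPy stand-in for the Keras input X; m is a hypothetical batch size
# playing the role of None in the summary.
m, Tx, n_values = 4, 30, 90
X = np.random.rand(m, Tx, n_values)
print(X.shape)  # (4, 30, 90) -> the summary shows (None, 30, 90)

slices = []
for t in range(Tx):
    x = X[:, t, :]  # step 2.A: the t-th time step for every example
    slices.append(x)

print(len(slices), slices[0].shape)  # 30 (4, 90) -> each is (None, 90)
```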
The next layer is the `reshape (Reshape)` layer, for step 2.B, which turns each `(None, 90)` slice into `(None, 1, 90)`. Let’s focus on the Connected to field, since it is quite interesting. We apply a reshape at every time step, but it is always the same `Reshape` layer object. So TensorFlow shares this one layer across all the time steps, and it receives input from all 30 `tf.__operators__.getitem` layers (one per time step), which explains the current Connected to values.
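In the same spirit, here is a NumPy sketch of step 2.B. The function `reshaper` is a hypothetical stand-in for the single shared `Reshape` layer: one function object is applied to all 30 slices, mirroring how the single `reshape` entry in the summary receives 30 inputs.

```python
import numpy as np

m, Tx, n_values = 4, 30, 90  # m is a hypothetical batch size (None in the summary)
X = np.random.rand(m, Tx, n_values)

def reshaper(x):
    """Hypothetical stand-in for one shared Reshape((1, n_values)) layer."""
    return x.reshape(-1, 1, n_values)  # -1 keeps the batch dimension as-is

# The same function object is applied to every time step's slice.
reshaped = [reshaper(X[:, t, :]) for t in range(Tx)]
print(len(reshaped), reshaped[0].shape)  # 30 (4, 1, 90)
```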
Now, you might be wondering why the `tf.__operators__.getitem` layers aren’t shared as well, and why 30 different layers are created instead. If I am not wrong, for TensorFlow operators (irrespective of whether the calls are the same or different), TensorFlow creates a new layer every time the operator is called; here, the operator is `getitem`. Unlike the reshape layer, which is one object that we call repeatedly, each `X[:, t, :]` expression is a fresh operator call. You can easily validate this by replacing `x = X[:, t, :]` with `x = X[:, 0, :]`. In this case, the operator performs exactly the same operation 30 times, yet TensorFlow still creates 30 different operator layers.
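Finally, here is a toy model of the bookkeeping behind the Connected to column. It is plain Python, not Keras internals: the class `ToyLayer` and the helper `unique_name` are hypothetical. It mimics the two behaviors described above: a layer object that is reused gets a new node index per call (`reshape[0][0]`, `reshape[1][0]`, …), while every operator call creates a brand-new layer whose name gets a `_1`, `_2`, … suffix. The toy ignores the slice index entirely, which is the point of the `X[:, 0, :]` experiment.

```python
from collections import defaultdict

# Toy model (NOT Keras internals) of how names and "Connected to" entries arise.
name_counts = defaultdict(int)

def unique_name(base):
    """Hypothetical helper: returns base, base_1, base_2, ..."""
    n = name_counts[base]
    name_counts[base] += 1
    return base if n == 0 else f"{base}_{n}"

class ToyLayer:
    """Hypothetical layer that records one inbound node per call."""
    def __init__(self, base):
        self.name = unique_name(base)
        self.inbound = []  # one entry per call, i.e. per node

    def __call__(self, source):
        self.inbound.append(source)
        # Output reference in the summary's style: name[node_index][tensor_index]
        return f"{self.name}[{len(self.inbound) - 1}][0]"

Tx = 30
reshape = ToyLayer("reshape")  # created once, called 30 times
getitem_layers = []
for t in range(Tx):  # t is never used: a new op layer is made per call anyway
    op = ToyLayer("tf.__operators__.getitem")
    getitem_layers.append(op)
    reshape(op("input_1[0][0]"))

print(len(getitem_layers), getitem_layers[-1].name)  # 30 tf.__operators__.getitem_29
print(len(reshape.inbound), reshape.inbound[:2])
```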
By now, I guess you have got the gist of how to read the model summary. With this understanding, you can reason through the entire output of both `model.summary()` and `inference_model.summary()`.
P.S. Since this post is getting too long, I’ll split my answer into 3 replies.
Cheers,
Elemento