Week 3 Programming Exercise, Section 3.3 Train the Model

In the Model function, when they use forward propagation (see code below), why are they transposing minibatch_X?

Z3 = forward_propagation(tf.transpose(minibatch_X), parameters)

Isn’t the shape of minibatch_X equal to (12288, minibatch_size) which is what the forward_propagation function would expect?


Hi @Manas_Rastogi ,

The reason transpose(minibatch_X) is needed is that the shape of minibatch_X is (examples, 12288). You can verify this by putting a print statement just before the call to forward_propagation().

In order to carry out tf.linalg.matmul(W1, X), which is a matrix multiplication, the number of columns in W1 has to match the number of rows in X. If you refer back to where W1 is created, its shape is (nodes, 12288), so X must be (12288, examples).
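As a quick sanity check, here is a NumPy sketch of the same inner-dimension rule (the sizes here are made up for illustration; tf.linalg.matmul follows the same rule):

```python
import numpy as np

nodes, n_features, m = 25, 12288, 32           # illustrative sizes, not the notebook's
W1 = np.zeros((nodes, n_features))             # (nodes, 12288), as in the notebook
minibatch_X = np.zeros((m, n_features))        # (examples, 12288), as the Dataset yields it

# W1 @ minibatch_X would fail: inner dimensions 12288 and 32 don't match.
# Transposing first makes the inner dimensions line up:
Z1 = W1 @ minibatch_X.T                        # (nodes, 12288) @ (12288, examples)
print(Z1.shape)                                # (25, 32), i.e. (nodes, examples)
```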


Hi @Kic , thank you for your response. Somehow I was thinking I couldn’t modify the code for non-graded functions, and hence didn’t debug using print 🙂.

So, I understand that the shape of minibatch_X is (examples, 12288). But we know that the shape of X_train is (12288, examples), which means there is a transpose happening. Which function is causing that?

I added three print statements to understand more. Below is the code:


dataset = tf.data.Dataset.zip((X_train, Y_train))
minibatches = dataset.batch(minibatch_size).prefetch(8)


And here is the output of the three print statements:

Now two questions:

  1. Does the .batch function do a transpose as well? (because that’s what it seems to be doing)
  2. Why doesn’t the shape in any of the three outputs show the number of examples? The shape seems to be a 1-D vector. Why is that? Shouldn’t it be a 2-D tensor?

Appreciate your help


Sorry, but that’s not true. Take a look at the earlier code in the notebook. What you’ll see is that the image files (both train and test) are images in the usual orientation where the dimensions are:

samples x height x width x channels

So “samples” is the first dimension. Then we process those datasets by normalizing them using the “map()” method of the Dataset class; the normalization function also converts them from (64, 64, 3) arrays to 12288-length vectors, but the “samples” dimension is not changed by that operation. What you see in the MapDataset and ZipDataset shapes is the shape of one element of the dataset, which is a 1-D vector of dimension (12288,), so it tells you nothing about the “samples” dimension.
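A minimal NumPy sketch of that flattening step (the array contents and sizes here are placeholders, not the notebook’s actual data; the notebook’s normalize() works on one element at a time, but the effect on shapes is the same):

```python
import numpy as np

images = np.ones((5, 64, 64, 3), dtype=np.float32)   # 5 samples in the usual orientation

# Flatten each (64, 64, 3) image into a 12288-length vector and scale,
# leaving the "samples" dimension first and untouched:
flattened = images.reshape(images.shape[0], -1) / 255.0
print(flattened.shape)   # (5, 12288)
```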

Then you see it in the PrefetchDataset shape, where the “None” in the first dimension represents the samples.

But Prof Ng has us write the forward propagation code using the same orientation we’ve been using all the way from Course 1 Week 2 until now, which is features x samples. I assume he does that just to avoid having to explain a change and confuse us. But then when it comes time to call the TF loss function, we have to deal with the fact that it expects the samples dimension to be first.

So what we do is leave everything with “samples” first and then only do the transpose when we call forward_propagation.
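A rough NumPy sketch of that pattern, with a toy one-layer stand-in for forward_propagation (the real notebook function has more layers; all sizes here are illustrative):

```python
import numpy as np

def forward_propagation(X, W1, b1):
    # Toy stand-in: like the notebook's function, it expects X as (features, samples).
    return W1 @ X + b1

m, n_features, nodes = 8, 12288, 25
minibatch_X = np.random.rand(m, n_features)   # samples first, as the Dataset yields it
W1 = np.random.rand(nodes, n_features)
b1 = np.random.rand(nodes, 1)

# Everything stays samples-first; transpose only at the call site:
Z = forward_propagation(minibatch_X.T, W1, b1)
print(Z.shape)   # (25, 8), i.e. (nodes, samples)
```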


That’s just the way TF works: it treats everything as part of the “compute graph”, so we know the shapes of some of the dimensions (height, width, channels), but some dimensions (like “samples”) are not known until we actually execute the compute graph with a particular input. Meaning that the code can handle any value of that dimension. That’s what it’s saying by showing that dimension either as “None” (meaning a placeholder) or by omitting it entirely when it’s dealing with a single sample.

In this case the input has two dimensions “samples” and “features” and the “samples” are not known. When it shows you the dimension of one element, then it’s a 1D tensor (as in the MapDataset and ZipDataset cases), but it becomes a 2D tensor with an unknown first dimension when we get to logic that handles a minibatch.
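If you want to see those shapes directly, here is a small tf.data experiment (a made-up 5-sample dataset, not the notebook’s data):

```python
import tensorflow as tf

# A tiny dataset of 5 "flattened image" vectors of length 12288:
ds = tf.data.Dataset.from_tensor_slices(tf.zeros((5, 12288)))
print(ds.element_spec.shape)       # one element is a 1-D tensor: (12288,)

batched = ds.batch(2)
print(batched.element_spec.shape)  # (None, 12288): the batch dimension is a placeholder
```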


Ah! That is why, in his ‘forward_propagation_test’ function, he needs to do a transpose, because our forward propagation assumes features x samples:


Wow, I understand now, thanks so much Paul!

But this sure was a tough one to grasp! I guess I would have to read the TF documentation in detail to really be on top of it.


Don’t feel bad about being new to TensorFlow. It is very deep water and quite complicated. We’re just barely getting introduced to it here, and there is plenty more to learn. We’ll be using TF a lot more in Course 4 and Course 5 of this series.

It is a great idea to spend some time with the TF documentation. They have a lot of layers of documentation, including some nice tutorials, down to the details of each of the APIs. The other thing to realize is that this is also our first time doing OOP (Object Oriented Programming) in Python. The classes in TF are very rich and inherit from lots of higher layers. When you look at the documentation for a specific API, it frequently doesn’t list all the attributes and methods inherited from those higher layers.


Thank you Paul for your guidance, appreciate it!
