Neural Network_input shape_Batch size vs Features

In this course, the input to a neural network is specified with shape (features, training examples). So when implementing a neural network in TensorFlow, why is the input shape taken as (batch_size, features)? Isn’t that inconsistent?

Please follow the conventions of the framework you’re using. In TensorFlow, the input_shape parameter should specify the dimensions of a single training example; the batch size should not be included in it. When you print model.summary(), you’ll notice that None appears as the 0th element of each shape. That None represents the batch dimension.
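As a minimal sketch of this point (the feature count, layer size, and batch size below are made-up values, and `tf.keras.Input(shape=...)` is used here as one common way to declare the per-example shape), you can check where the batch dimension ends up:

```python
import numpy as np
import tensorflow as tf

n_features = 4  # hypothetical number of features per training example

# The declared shape describes ONE example; no batch size is given.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(3),
])

# Keras adds the batch dimension itself, as the 0th axis.
input_shape = model.input_shape    # (None, 4)
output_shape = model.output_shape  # (None, 3)

# Any batch size works at call time: here 5 examples, one per row.
batch = np.zeros((5, n_features), dtype=np.float32)
out_shape = tuple(model(batch).shape)  # (5, 3)

print(input_shape, output_shape, out_shape)
```

The None in position 0 is simply a placeholder for "however many examples are in the batch".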

Yes exactly, the batch size comes later. But shouldn’t it be (features, None) instead of (None, features)?

Batch is the 0th dimension in TensorFlow. The shape is therefore [None, features].

Got it, but my question was different. Andrew told us to take the features as rows and the training examples as columns, i.e., for a dataset with n_x features and m training examples, the input should have shape (n_x, m), because we stack the training examples column-wise.
So when implementing a neural network in TensorFlow, shouldn’t the input shape be (features, training examples per batch)? In other words, shouldn’t it be (features, None)?

Which lecture and timestamp are you referring to?

In the 1st course of the Deep Learning Specialization (Neural Networks and Deep Learning), Week 3, “Vectorizing over multiple training examples”, at 4:24.

The lecture teaches how to perform vectorization. Andrew focusses on vectorization based on the data layout he uses.

TensorFlow expects batch to be the 0th dimension.

Thank you for the clarification. You are correct that in the lecture I mentioned, Andrew was indeed talking about vectorization. But he also taught us, at the very beginning of the course, to use features as rows and training examples as columns.

Throughout Course 1 and Course 2 up to Week 3 of Course 2, Prof Ng uses the features x samples orientation of the data. Then in the middle of the C2 W3 assignment, right at the point that he needs to call a TF loss function for the first time, he needs to add a transpose to get samples x features.

So the high-level point is that this is simply a choice that you can make either way, but of course choices have consequences. My guess is that Prof Ng uses features x samples in Course 1 because we’re writing the code by hand in Python and it just works out more cleanly with that orientation. But then, as Balaji pointed out, TensorFlow has made the opposite choice, so we need to shift gears when we get to the point of using TF as our primary implementation mechanism instead of hand-coding the core algorithms in Python.
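To make the "either way" point concrete, here is a small NumPy sketch. The sizes and variable names are invented for illustration and are not taken from the assignments. It computes the same linear step in both orientations and confirms that the results differ only by a transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5  # hypothetical: features, hidden units, examples

W = rng.standard_normal((n_h, n_x))  # weights, one row per hidden unit
b = rng.standard_normal((n_h, 1))    # bias as a column vector

# Course 1 layout: features x samples, one example per column.
X_cols = rng.standard_normal((n_x, m))
Z_cols = W @ X_cols + b              # shape (n_h, m)

# TensorFlow-style layout: samples x features, one example per row.
X_rows = X_cols.T                    # shape (m, n_x)
Z_rows = X_rows @ W.T + b.T          # shape (m, n_h)

# Same numbers, transposed layout.
same = np.allclose(Z_cols.T, Z_rows)
print(Z_cols.shape, Z_rows.shape, same)
```

So moving between the two conventions only costs a transpose, which is the adjustment needed when handing course-style data to a TF function.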


Not all datasets and tools are constructed identically.
