Note, the input to the layer must be 2-D, so we'll reshape it

I have a very simple doubt here. Why did we reshape the X_train[0] values to (1,1)?

a1 = linear_layer(X_train[0].reshape(1,1))

Hello @Utsav_Sharma1,

I think you quoted that line of code from this lab: “C2_W1_Lab01_Neurons_and_Layers”. Next time, please state it in your post.

linear_layer is a tf.keras.layers.Dense object, and TensorFlow requires its input to have at least 2 dimensions. We take the zeroth axis as the “sample axis” (or “batch axis”, as many TF docs call it). By reshaping X_train[0] to (1, 1), the zeroth axis becomes the “batch axis” and the first axis becomes the “feature axis”.

So, it is a requirement.
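
To see the shapes concretely, here is a minimal NumPy-only sketch. The X_train values and the weights are assumptions for illustration, and dense_like is a hypothetical stand-in for what a one-unit linear Dense layer computes; it is not TensorFlow's implementation.

```python
import numpy as np

# Assumed toy data: 2 samples with 1 feature each (illustrative values only).
X_train = np.array([[1.0], [2.0]], dtype=np.float32)

x0 = X_train[0]            # a single sample, 1-D
x0_2d = x0.reshape(1, 1)   # 1 sample (batch axis) x 1 feature (feature axis)

print(x0.shape)     # (1,)
print(x0_2d.shape)  # (1, 1)

def dense_like(x, w, b):
    """Hypothetical stand-in for a linear Dense layer: x @ w + b."""
    if x.ndim < 2:
        raise ValueError("expected at least 2 dimensions: (batch, features)")
    return x @ w + b

# Assumed weights: w has shape (features, units), b has shape (units,).
w = np.array([[200.0]], dtype=np.float32)
b = np.array([100.0], dtype=np.float32)

a1 = dense_like(x0_2d, w, b)
print(a1.shape)  # (1, 1): one output row per sample in the batch
```

Passing the 1-D x0 to dense_like raises the ValueError, which mirrors why the reshape is needed before calling the layer.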


I did not understand what you mean by sample axis and feature axis

Hey @Mohammed_Ahmed3,

This is about how we arrange the samples. For example, let’s say we have 2 samples, where the first sample is [1., 2., 3.], and the second one is [4., 5., 6.].

Now, obviously each sample has 3 features.

There are 2 ways we can arrange them into one 2D array.

First way:

import numpy as np
data1 = np.array([
    [1., 2., 3.],
    [4., 5., 6.],
])

Second way:

data2 = np.array([
    [1., 4.],
    [2., 5.],
    [3., 6.],
])

Here, it is important that you understand the difference. The first way stacks the samples vertically (the array grows in the vertical direction as samples are added), whereas the second way stacks them horizontally.

Another way to describe the difference is that, in the first way, we take the zeroth axis (aka the 0th axis) as the sample axis, because if you index an “element” along the zeroth axis, you get a sample. Note that an element of a 2D array is a 1D array. For example, data1[0, :] gives us [1., 2., 3.] and data1[1, :] gives us [4., 5., 6.], and they are the first and the second samples.

Because indexing along the zeroth axis gives us a sample, the zeroth axis can be called the sample axis.

On the other hand, when we index along the first axis, for example, data1[:, 0] gives us [1., 4.]. Here, the two values correspond to the first feature of the first and second samples respectively; in short, data1[:, 0] gives us the first feature. Similarly, data1[:, 1] gives us the second feature, and data1[:, 2] the third.

Because indexing along the first axis gives us a feature, the first axis can be called the feature axis.

In summary, the first way of arrangement, which is data1, stacks the samples vertically. This makes the zeroth axis the sample axis, because indexing along it gives us a sample, and makes the first axis the feature axis, because indexing along it gives us a feature.
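
The indexing described above can be checked with a short sketch. It uses the same data1 as before; the names first_sample and first_feature are only illustrative.

```python
import numpy as np

data1 = np.array([
    [1., 2., 3.],
    [4., 5., 6.],
])

# Axis 0 has length 2 (samples), axis 1 has length 3 (features).
print(data1.shape)    # (2, 3)

first_sample = data1[0, :]    # pick position 0 on the zeroth (sample) axis
first_feature = data1[:, 0]   # pick position 0 on the first (feature) axis

print(first_sample)   # [1. 2. 3.]
print(first_feature)  # [1. 4.]
```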

The second way arranges the samples in the opposite way, but I will let you figure out the rest yourself if you are interested. It is important to note that both this Machine Learning Specialization and TensorFlow adopt the first way, and that’s why I spent all the time explaining the first way. If you understand everything, you should be able to tell which axis is the sample axis in the second way.

Good luck, and cheers,

this example is not clear

Hey @Mohammed_Ahmed3, I have updated my last reply.

Btw, it’s very good that you are trying it!

@Mohammed_Ahmed3, I am going to stay around in the next 30 minutes for you, let me know here if you have any follow up.

Keep trying!

Hey @Mohammed_Ahmed3, I have got to go for a call. Good luck!


data2[0, :] gives us [4., 5., 6.] or give us array([1., 4.]) if it’s vertically so the first sample is 1,4 and that is good because data2[0,:] give us that but data1[0,:] give us [1,2,3] zeroth axis as the sample aixs is that mean wwe should get the same sample

what first axis mean?

I need proof of this. Please share a screenshot, like you did last time, that shows how you defined data2 and ran data2[0, :].

Also, I have a hard time following your last reply. Please use punctuation and new paragraphs so that I can read it statement by statement. Whenever possible, supply a screenshot of running the code you mention in a statement, because that supports the statement.

Lastly, please explain what “wwe” means.

Thank you.

data2[0,:] give us [1,4] as you see

first look at your reply underlined by black line first samples stacked verticaly .
you mean by that samples are [1,4] [2,5] [3,6] where the second way horizontally already so data1[0,:] = [1,4] so that zero axis beacuse it give us sample
second what you mean by sample highlighted in second image ?
third is that the axis in first picture?

This is the first way:

They are stacked vertically because the array grows vertically. Growing vertically means that data1 has 2 rows because it has 2 samples. If there were 10 samples, since they are stacked vertically, it would have 10 rows; in other words, it would be 10 rows tall.

3 samples → 3 rows tall.
4 samples → 4 rows tall.
5 samples → 5 rows tall.
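
As a rough illustration of “growing vertically”, the sketch below appends one more sample to data1 with np.vstack. The third sample [7., 8., 9.] is made up for this example.

```python
import numpy as np

data1 = np.array([
    [1., 2., 3.],
    [4., 5., 6.],
])

# A made-up third sample with the same 3 features.
new_sample = np.array([7., 8., 9.])

# Stacking vertically adds a row; the number of columns (features) is unchanged.
data1_grown = np.vstack([data1, new_sample])

print(data1.shape)        # (2, 3)
print(data1_grown.shape)  # (3, 3)
```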

Is this clear?

This is wrong. Run the code and share your screenshot. :wink:

Don’t mix up data1 with data2.

I won’t go into these questions because it seems to me you were mixing up data1 and data2. As I said, support your argument with screenshot because it helps avoid mixing up.

Run the code, compare the result with my previous reply again, and re-ask the questions if they remain.

Keep trying!

You said that data2[0,:] gives us [4,5,6], and that is wrong.
If we access data[1,:], it gives us [4,5,6].

so how the zeroth axis is the sample axis

Thank you for telling me my mistakes! That helps, and I have corrected my previous post. My previous post should only give examples about data1.

I stand by this explanation:

and your experiments have shown the above statement. You said:

  • data1[0,:] gives us [1,2,3]
  • data1[1,:] gives us [4,5,6]

and they are the first and the second samples.


one also is a sample axis?
if data2[:,0] = [1,2,3] so axis zero is sample axis or first axis is the sample axis because we index the first axis

That depends on which variable we are talking about, right? For data1, the zeroth axis is the sample axis, and for data2, the first axis is the sample axis. Agree?

So can you tell me what causes the sample axis to change from the zeroth to the first axis? What is the difference between data1 and data2?

because we stacked the samples in data1 vertically and in data2 horizontally
is that right?


Additionally, in these MLS courses, we mostly adopt the data1 way, meaning we almost always stack samples vertically, and thus we treat the zeroth axis as the sample axis.
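
For anyone who wants to verify the data2 case, here is a small sketch using the two arrays from earlier in this thread. It suggests that data2 is simply the transpose of data1, so transposing is one way to bring a data2-style array into the data1 layout.

```python
import numpy as np

data1 = np.array([
    [1., 2., 3.],
    [4., 5., 6.],
])
data2 = np.array([
    [1., 4.],
    [2., 5.],
    [3., 6.],
])

# In data2, a sample is obtained by picking a position on the first axis.
first_sample_in_data2 = data2[:, 0]
print(first_sample_in_data2)  # [1. 2. 3.]

# Transposing swaps the two axes, turning the data2 layout into the data1 layout.
same_layout = np.array_equal(data2.T, data1)
print(same_layout)  # True
```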