# Dataframe doubt

I have a question from lab 2 of week 1.

``````
import pandas as pd

df = pd.DataFrame({'a': [1.5, 2.5], 'b': [0.25, 2.75], 'c': [1.25, 0.75]})

print("The data frame")
print(df)
print("The mean value")
print(df.mean())
print("The value after subtraction of mean")
print(df - df.mean())
``````

The output is

``````
The data frame

     a     b     c
0  1.5  0.25  1.25
1  2.5  2.75  0.75

The mean value

a    2.0
b    1.5
c    1.0
dtype: float64

The value after subtraction of mean

     a     b     c
0 -0.5 -1.25  0.25
1  0.5  1.25 -0.25
``````

My question is: how does Python know that it has to subtract the “a” row of df.mean() from the “a” column of df? If I check “.index” on both df and df.mean(), they are different. Python broadcasting doesn’t make sense to me here…

Hi @Krunal_Gedia!
I think your question is for a different course than “TensorFlow: Advanced Techniques Specialization”. I’ll take a stab at answering your question, but can you also change the category on this question to move it to the right course? That way people in that course can see it.

So, about your question: broadcasting will look for dimensions of matching sizes in any axes and adjust accordingly. So, in your case, since df is (2,3) and df.mean() is (3,), it will match along the dimension of size 3.
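A minimal sketch of the shape matching described above, reusing the values from the question (the variable names `means`, `numpy_result`, `shuffled` and `pandas_result` are only for illustration). It also shows that, for `DataFrame - Series`, pandas lines up the Series index with the DataFrame column labels, so the match is by label as well as by shape. That is why `df.mean().index` looks different from `df.index`: it holds the same labels as `df.columns`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.5, 2.5], 'b': [0.25, 2.75], 'c': [1.25, 0.75]})
means = df.mean()  # Series whose index is the column labels 'a', 'b', 'c'

# The index of the mean Series holds the same labels as the DataFrame columns.
print(list(means.index) == list(df.columns))

# Plain NumPy: shapes (2, 3) and (3,) are compatible, so the three means
# are reused for every row.
numpy_result = df.to_numpy() - means.to_numpy()
print(numpy_result)

# pandas: the Series index is matched to the column labels, so a reordered
# Series still subtracts each mean from its own column.
shuffled = means[['c', 'a', 'b']]
pandas_result = df - shuffled
print(pandas_result)
```

Reordering the means changes nothing in the pandas result because each value is paired with the column that carries the same label.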

Thanks @Wendy for your reply. I have rephrased my question. I actually ran into this doubt in similar Python code used for pre-processing in the lab/exercise, where the inputs are normalised with this kind of syntax.

``````
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [2, 4]})
print(df)
#    a  b
# 0  1  2
# 1  2  4
print(df.mean())
# a    1.5
# b    3.0
# dtype: float64
print(df.shape)
# (2, 2)
print(df.mean().shape)
# (2,)
print(df.mean().transpose().shape)
# (2,)
print(pd.DataFrame.equals(df - df.mean().transpose(), df - df.mean()))
# True
``````

My question is: wouldn’t Python broadcasting convert `df.mean()` from `(2,)` to `(2,2)` in the following form?

``````
a    1.5 1.5
b    3.0 3.0
``````

and then subtraction `df - df.mean()` would be

``````
1-1.5   2-1.5
2-3.0   4-3.0
``````

I know this is not what happens, but I wonder why. Also, why is the transpose of `(2,)` not `(1,2)`?

Thanks a lot for your insights!

Ah. You’re right, @Krunal_Gedia. My first answer didn’t go into the nuances of what happens if both dimensions of the larger array match the dimension of the smaller array. To be more precise, broadcasting will look for a match starting with the trailing (rightmost) dimension.

So, just as in the 2x3 example, where we matched the dimension of size 3 with the 3 values going across the top (which happens to be the trailing dimension), when the df is 2x2, the 2 values in the mean array match the 2 values in the trailing dimension: the 2 values across the top. In both cases, broadcasting puts the values from the mean array into the first row and then replicates them in the rows below. So the broadcast version in your example will look like:

``````
     a    b
0  1.5  3.0
1  1.5  3.0
``````
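A small sketch of the trailing-dimension rule on the 2x2 example, using plain NumPy arrays taken from the DataFrame (the names `values`, `m`, `by_row` and `by_column` are only for illustration). It also touches on the transpose question: a 1-D array has a single axis, so transposing it leaves the shape at `(2,)`; getting `(1, 2)` or `(2, 1)` needs an explicit reshape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [2, 4]})
values = df.to_numpy()       # shape (2, 2)
m = df.mean().to_numpy()     # shape (2,), values [1.5, 3.0]

# Trailing-dimension match: m is treated as one row [1.5, 3.0]
# and reused for every row below it.
by_row = values - m
print(by_row)

# The other layout only happens if the means are explicitly shaped
# as a column of shape (2, 1).
by_column = values - m.reshape(2, 1)
print(by_column)

# Transposing a 1-D array is a no-op; a reshape is needed for 2-D shapes.
print(m.T.shape)               # (2,)
print(m.reshape(1, 2).shape)   # (1, 2)

# The default pandas subtraction agrees with the row-wise NumPy result.
print(np.allclose((df - df.mean()).to_numpy(), by_row))
```

Here `by_row` corresponds to what `df - df.mean()` computes, while `by_column` is the layout proposed in the question.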

Also, @Krunal_Gedia, what course is this from? We should move this question to the category for that course.

Many thanks for the answer, @Wendy. It is from lab 2 of week 1. I just simplified the question by using my own DataFrame.