Dataframe doubt

I have a question from lab 2 of week 1.

import pandas as pd
df = pd.DataFrame({'a': [1.5, 2.5], 'b': [0.25, 2.75], 'c': [1.25, 0.75]})

print("The data frame")
print(df)
print("The mean value")
print(df.mean())
print("The value after subtraction of mean")
print(df - df.mean())

The output is

The data frame

    a     b     c
0  1.5  0.25  1.25
1  2.5  2.75  0.75

The mean value

a    2.0
b    1.5
c    1.0
dtype: float64

The value after subtraction of mean

    a     b     c
0 -0.5 -1.25  0.25
1  0.5  1.25 -0.25

My question is: how does Python know that it has to subtract the “a” entry of df.mean() from the “a” column of df? If I check “.index” of both df and df.mean(), they are different. Python broadcasting doesn’t make sense to me here…

Thanks for your reply!

Hi @Krunal_Gedia!
I think your question is for a different course than “TensorFlow: Advanced Techniques Specialization”. I’ll take a stab at answering your question, but can you also change the category on this question to move it to the right course? That way people in that course can see it.

So, about your question: broadcasting will look for dimensions of matching sizes in any axes and adjust accordingly. So, in your case, since df is (2,3) and df.mean() is (3,), it will match along the dimension of size 3.
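To see the shape matching in isolation, here is a minimal NumPy sketch using the same numbers as your DataFrame. The variable names `values`, `col_means`, and `centered` are my own and are not from the lab.

```python
import numpy as np

# Same numbers as the DataFrame in the question, as a plain 2x3 array.
values = np.array([[1.5, 0.25, 1.25],
                   [2.5, 2.75, 0.75]])

# Mean down each column: shape (3,), one value per column.
col_means = values.mean(axis=0)

# The (3,) array is matched against the size-3 axis of the (2, 3) array
# and reused for every row.
centered = values - col_means

print(values.shape, col_means.shape, centered.shape)
print(centered)
```

The result has the same shape as `values`, with each column shifted by its own mean.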


Thanks @Wendy for your reply. I have re-phrased my question below. I actually got this doubt from similar Python code in the lab/exercise, where the inputs to ML algorithms are normalised during pre-processing with this kind of syntax.

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 4]})
>>> print(df)
   a  b
0  1  2
1  2  4
>>> print(df.mean())
a    1.5
b    3.0
>>> print(df.shape)
(2, 2)
>>> print(df.mean().shape)
(2,)
>>> print(df.mean().transpose().shape)
(2,)
>>> pd.DataFrame.equals(df - df.mean().transpose(), df - df.mean())
True

My question is: wouldn’t Python broadcasting convert df.mean() from (2,) to (2,2) in the following form?

a    1.5 1.5
b    3.0 3.0

and then the subtraction df - df.mean() would be

1-1.5   2-1.5
2-3.0   4-3.0

I know this is not what currently happens, but I wonder why. Also, why is the transpose of (2,) not (1,2)?

Thanks a lot for your insights!

Ah. You’re right, @Krunal_Gedia. My first answer didn’t go into the nuances of what happens if both dimensions of the larger array match the dimension of the smaller array. To be more precise, broadcasting will look for a match starting with the trailing (rightmost) dimension.

So, just as in the 2x3 example, where we matched the dimension of size 3 with the 3 values going across the top (which happens to be the trailing dimension), when the df is 2x2, the 2 values in the mean array match the 2 values in the trailing dimension, that is, the 2 values across the top. In both cases, broadcasting puts the values from the mean array in the first row and then replicates them in the rows below. So the broadcast version in your example will look like:

     a    b
0  1.5  3.0
1  1.5  3.0
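Here is a short sketch to check this, using `np.broadcast_to` to make the replicated array explicit. One caveat: as far as I know, pandas also lines up labels when you subtract a Series from a DataFrame (the Series index `a`, `b` against the DataFrame's column names), which is a second reason the means land under the right columns. The `shuffled` Series and the other variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [2, 4]})
means = df.mean()                     # Series with index ['a', 'b']

# NumPy view: the (2,) array is stretched along the rows to shape (2, 2).
stretched = np.broadcast_to(means.to_numpy(), df.shape)
print(stretched)

# pandas view: the Series index is aligned with the column labels,
# so reordering the Series still subtracts 1.5 from 'a' and 3.0 from 'b'.
shuffled = means[['b', 'a']]          # hypothetical reordered copy
aligned = df - shuffled
print(aligned)

# Subtracting one value per row has to be requested explicitly.
row_means = df.mean(axis=1)           # one mean per row, index [0, 1]
by_rows = df.sub(row_means, axis=0)
print(by_rows)
```

The last call, `df.sub(row_means, axis=0)`, is the way to get the row-wise layout you expected; it just isn't the default.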

For more info, you can see the “General Broadcasting Rules” here: Broadcasting — NumPy v1.22 Manual
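On the transpose part of the question: as I understand it, a shape of (2,) has only one axis, so transposing has nothing to swap and the shape stays (2,). Here is a small NumPy sketch (the names `v`, `row`, `col`, and `data` are mine) showing how to get an explicit 2-D row or column instead.

```python
import numpy as np

v = np.array([1.5, 3.0])
print(v.shape, v.T.shape)      # a 1-D array has one axis; .T leaves it unchanged

row = v.reshape(1, 2)          # explicit 2-D row vector
col = v.reshape(2, 1)          # explicit 2-D column vector
print(row.shape, row.T.shape)  # with two axes, the transpose swaps them

data = np.array([[1, 2], [2, 4]])
print(data - row)              # subtracts 1.5 from column 0 and 3.0 from column 1
print(data - col)              # subtracts 1.5 from row 0 and 3.0 from row 1
```

Reshaping to (2, 1) is what makes NumPy replicate the values across columns rather than down rows.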

Also, @Krunal_Gedia, what course is this from? We should move this question to the category for that course.


Many thanks for the answer, @Wendy. It is from lab 2 of week 1. I just simplified the doubt using my own DataFrame.