I have always been curious why we do not use the mean deviation directly, but instead square each difference (x_i − μ) and then take the square root of the result.

As Stephen Gorard (2004) puts it: “The main reason that the standard deviation (SD) was created like this was because the squaring eliminates all negative deviations, making the result easier to work with algebraically.”

Here is a very interesting explanation that I want to share.

Revisiting a 90-year-old debate: the advantages of the mean deviation


Thank you, Nick @Nick_Han, for sharing this with us. I have read the paper and I have changed “mean distribution” in your post to “mean deviation” for you because I thought you meant to talk about the latter.

I think one important point from the paper is that

… SD emphasises the larger deviations (page 3) … The distortion caused by squaring deviations has led us to a culture in which advice is routinely given to students to remove or ignore valid measurements with large deviations because these unduly influence the final results. This is done regardless of their importance as data, and it means that we no longer allow our prior assumptions about distributions to be disturbed merely by the fact that they are not matched by the evidence. Good science should treasure results that show an interesting gulf between theoretical analysis and actual observations, but we have a long and ignoble history of simply ignoring any results that threaten our fundamental tenets (Moss 2001). Extreme scores are important occurrences in a variety of natural and social phenomena, including city growth, income distribution, earthquakes, traffic jams, solar flares, and avalanches … (page 6)
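To make the "SD emphasises the larger deviations" point concrete, here is a small sketch in Python. The numbers are made up for illustration and are not from the paper, and the helper names `mean_deviation` and `standard_deviation` are my own. Both functions use the population form (divide by n).

```python
import math

def mean_deviation(xs):
    """Average absolute distance from the mean."""
    mu = sum(xs) / len(xs)
    return sum(abs(x - mu) for x in xs) / len(xs)

def standard_deviation(xs):
    """Population SD: root of the average squared distance from the mean."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

# Illustrative data only (an assumption, not taken from the paper).
data = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = data + [30]  # same data plus one extreme score

for label, xs in [("original", data), ("with outlier", with_outlier)]:
    md = mean_deviation(xs)
    sd = standard_deviation(xs)
    print(f"{label}: MD={md:.2f}  SD={sd:.2f}  SD/MD={sd / md:.2f}")
```

Both measures grow when the extreme score is added, but the SD/MD ratio also grows, which shows that the single large deviation carries more weight in the SD than in the mean deviation.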

Thank you for that, Raymond.

One more thing worth adding: “square the difference and then root the result” is also how we calculate the distance between two points in a coordinate system, per the Pythagorean theorem.

If we look at the deviations from the perspective of vectors, this makes more sense.
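Here is a rough sketch of that idea in Python, using made-up numbers. Treat the n observations as one point in n-dimensional space and compare it with the point whose coordinates all equal the mean. The straight-line (Pythagorean) distance between the two, divided by √n, is the population SD. The variable names are only for illustration.

```python
import math

# Illustrative data only (an assumption for this example).
xs = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(xs)
mu = sum(xs) / n

# Point p: the data, read as coordinates in n-dimensional space.
# Point q: the point whose every coordinate equals the mean.
p = xs
q = [mu] * n

# math.dist (Python 3.8+) is the Euclidean distance:
# square each coordinate difference, sum, then take the root.
euclidean = math.dist(p, q)
sd = euclidean / math.sqrt(n)

# For comparison, the mean deviation uses the "city block" distance:
# sum of absolute coordinate differences, divided by n.
manhattan = sum(abs(a - b) for a, b in zip(p, q))
md = manhattan / n

print(f"SD from Euclidean distance: {sd:.4f}")
print(f"MD from city-block distance: {md:.4f}")
```

So SD and mean deviation differ mainly in which notion of distance they apply to the same deviation vector.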