
How Does Mean and Median Affect Skew?

Published in Data Distribution Skewness

The relationship between the mean and the median is a key indicator of a dataset's skewness.

Skewness describes the asymmetry of a distribution. In a perfectly symmetrical distribution with a single peak (like a normal distribution), the mean, median, and mode all sit at the same point. When a distribution is skewed, the mean and median move apart, and their relative positions reveal the direction of the skew.

Understanding Skewness

Skewness can be either positive (right-skewed) or negative (left-skewed).

  • Right Skew (Positive Skew): The tail of the distribution extends towards the right. This typically occurs when there are a few unusually high values that pull the mean in that direction.
  • Left Skew (Negative Skew): The tail of the distribution extends towards the left. This typically occurs when there are a few unusually low values that pull the mean in that direction.
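The sign convention above can be illustrated with a small sketch. It uses one common definition of skewness, the third central moment divided by the 1.5 power of the second central moment; the article does not name a formula, so treat this choice and the function name `moment_skewness` as assumptions made for illustration.

```python
def moment_skewness(values):
    """Population skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


high_tail = [1, 2, 3, 4, 10]   # one unusually high value
low_tail = [1, 8, 9, 10, 11]   # one unusually low value

print(moment_skewness(high_tail) > 0)  # True: positive (right) skew
print(moment_skewness(low_tail) < 0)   # True: negative (left) skew
```

Cubing the deviations keeps their sign, so a long right tail contributes large positive terms and a long left tail contributes large negative ones.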

The Relationship Between Mean, Median, and Skew

The core relationship is this:

When the distribution of data is skewed to the left, the mean is often less than the median. When the distribution is skewed to the right, the mean is often greater than the median.

Here's a breakdown of how their positions indicate skew:

Skew Type   | Appearance                                 | Relationship between Mean and Median | Why this happens
------------|--------------------------------------------|--------------------------------------|------------------------------------------------------
Symmetrical | Bell-shaped (e.g., Normal Distribution)    | Mean ≈ Median ≈ Mode                 | No extreme values pulling either measure significantly.
Right Skew  | Tail points to the right                   | Mean > Median                        | High outliers pull the mean towards the right tail.
Left Skew   | Tail points to the left                    | Mean < Median                        | Low outliers pull the mean towards the left tail.
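The gap between mean and median can also be expressed as a single number. One well-known option is Pearson's second skewness coefficient, 3 * (mean - median) / standard deviation. The sketch below is an illustration rather than something this article prescribes: the function name `pearson_median_skewness` is made up for the example, and using the sample standard deviation (`statistics.stdev`) is an assumption.

```python
import statistics


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation."""
    spread = statistics.stdev(values)  # sample standard deviation (an assumption)
    if spread == 0:
        raise ValueError("coefficient is undefined when all values are equal")
    return 3 * (statistics.mean(values) - statistics.median(values)) / spread


print(pearson_median_skewness([1, 2, 3, 4, 10]) > 0)   # True: mean above median
print(pearson_median_skewness([1, 8, 9, 10, 11]) < 0)  # True: mean below median
```

A positive result corresponds to the Right Skew row (Mean > Median), a negative result to the Left Skew row, and a value near zero to the Symmetrical row.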

Why the Mean is Affected More

The mean is calculated by summing all values and dividing by the count. This makes it highly sensitive to extreme values (outliers). The median, on the other hand, is the middle value when the data is ordered; it's less affected by outliers because its position is based on rank rather than the magnitude of extreme values.

  • In a right-skewed distribution, the high values in the right tail significantly increase the sum of the data, pulling the mean up above the median.
  • In a left-skewed distribution, the low values in the left tail significantly decrease the sum of the data, pulling the mean down below the median.
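To see this sensitivity directly, the following sketch replaces the largest value in a small dataset with a much larger one and compares both measures before and after. The numbers are hypothetical and chosen only for illustration.

```python
import statistics

values = [1, 2, 3, 4, 10]
stretched = values[:-1] + [100]  # same data, largest value replaced by 100 (hypothetical)

# Original data: mean 4, median 3
print(statistics.mean(values), statistics.median(values))

# Stretched data: mean 22, median still 3
print(statistics.mean(stretched), statistics.median(stretched))
```

The extreme value enters the sum in full, so the mean jumps from 4 to 22. The median depends only on which value sits in the middle of the ordering, and that has not changed.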

Example:

Consider two small datasets:

  1. Slightly Right-Skewed: 1, 2, 3, 4, 10
    • Mean = (1+2+3+4+10) / 5 = 4
    • Median = 3 (the middle value)
    • Mean (4) > Median (3), indicating right skew.
  2. Slightly Left-Skewed: 1, 8, 9, 10, 11
    • Mean = (1+8+9+10+11) / 5 = 7.8
    • Median = 9 (the middle value)
    • Mean (7.8) < Median (9), indicating left skew.
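The comparison used in both datasets can be written as a small check. This is a minimal sketch, not a formal test for skewness: the helper name `skew_hint` and its return labels are made up for this example, and the result is only a hint, since the mean and median relationship holds often rather than always.

```python
import math
import statistics


def skew_hint(values):
    """Suggest a skew direction by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if math.isclose(mean, median):
        return "no skew indicated"
    return "right" if mean > median else "left"


print(skew_hint([1, 2, 3, 4, 10]))   # right (mean 4 > median 3)
print(skew_hint([1, 8, 9, 10, 11]))  # left (mean 7.8 < median 9)
```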

Because the mean and median react so differently to extreme values, comparing them is a quick first check on the shape of a distribution and on how much outliers are influencing it.
