In statistics, the median represents the middle value in a dataset that has been sorted in ascending or descending order.
Understanding the Median
The median is a measure of central tendency, much like the mean (average) and the mode. However, unlike the mean, the median is largely unaffected by extreme values or outliers, because it depends only on the middle value(s) of the sorted data, not on how far the extreme values lie from the center. This makes it a more robust measure for skewed distributions.
How to Find the Median
- Sort the data: Arrange the dataset in ascending (lowest to highest) or descending (highest to lowest) order.
- Identify the middle value:
  - Odd number of data points: The median is the middle value in the sorted list, that is, the value in position (n + 1) / 2 for n data points. For example, in the dataset {2, 4, 6, 8, 10}, the median is 6.
  - Even number of data points: The median is the average of the two middle values, in positions n / 2 and n / 2 + 1. For example, in the dataset {2, 4, 6, 8}, the median is (4 + 6) / 2 = 5.
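The two steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `median` is a hypothetical helper chosen here, and in practice Python's standard-library `statistics.median` does the same job. The sketch sorts in ascending order; sorting in descending order would give the same result, because the middle position(s) are the same either way.

```python
def median(values):
    """Hypothetical helper: return the median of a non-empty sequence of numbers.

    Python's statistics.median provides the same behavior.
    """
    if not values:
        raise ValueError("median of empty data is undefined")
    # Step 1: sort the data in ascending order (sorted() leaves the input unchanged).
    data = sorted(values)
    n = len(data)
    mid = n // 2  # 0-based index of the upper middle element
    if n % 2 == 1:
        # Odd count: the single middle element, position (n + 1) / 2 counting from 1.
        return data[mid]
    # Even count: average of the elements at positions n / 2 and n / 2 + 1 (1-based).
    return (data[mid - 1] + data[mid]) / 2


print(median([2, 4, 6, 8, 10]))  # 6
print(median([2, 4, 6, 8]))      # 5.0
```

Sorting costs O(n log n); selection algorithms such as quickselect can find the median in expected linear time, but sorting keeps the sketch closest to the steps described.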
Why Use the Median?
- Resistant to Outliers: As mentioned earlier, the median is not significantly affected by extreme values. For example, consider the dataset {1, 2, 3, 4, 100}. The mean is 110 / 5 = 22, larger than four of the five values, while the median is 3. The median provides a better representation of the "typical" value in this dataset.
- Describes the Center: The median splits the sorted data in half: at least half of the values are less than or equal to it, and at least half are greater than or equal to it. It's a direct measure of the "middle" of the data.
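The outlier example from the list above can be checked with Python's standard-library `statistics` module. This sketch also replaces the outlier 100 with a hypothetical, even larger value (1000) to show that the mean moves while the median stays put, and counts the values on each side of the median to illustrate the half-and-half property.

```python
import statistics

data = [1, 2, 3, 4, 100]

mean_value = statistics.mean(data)      # 110 / 5 = 22
median_value = statistics.median(data)  # middle of the sorted list = 3
print(mean_value, median_value)         # mean 22, median 3

# Hypothetical variation: make the outlier even more extreme.
data_extreme = [1, 2, 3, 4, 1000]
print(statistics.mean(data_extreme))    # mean jumps to 1010 / 5 = 202
print(statistics.median(data_extreme))  # median is still 3

# Half-and-half property: count values strictly below and strictly above the median.
below = sum(1 for x in data if x < median_value)
above = sum(1 for x in data if x > median_value)
print(below, above)  # 2 values below, 2 values above
```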
Example
Consider the following dataset representing salaries (in thousands of dollars): {40, 45, 50, 55, 60, 100}.
- Mean: (40 + 45 + 50 + 55 + 60 + 100) / 6 = 350 / 6 ≈ 58.33
- Median: Since there is an even number of data points, the median is the average of the two middle values (50 and 55): (50 + 55) / 2 = 52.5
In this case, the median (52.5) gives a more accurate picture of the typical salary than the mean (58.33), which is skewed upwards by the outlier (100).
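A short Python sketch using the standard-library `statistics` module reproduces the salary calculation. As an added illustration (not part of the original example), it also recomputes both measures with the outlier 100 removed, to show how much that single value shifts each one.

```python
import statistics

# Salaries in thousands of dollars, as in the example above.
salaries = [40, 45, 50, 55, 60, 100]

mean_salary = statistics.mean(salaries)      # 350 / 6, about 58.33
median_salary = statistics.median(salaries)  # (50 + 55) / 2 = 52.5
print(f"mean   = {mean_salary:.2f}")    # mean   = 58.33
print(f"median = {median_salary:.2f}")  # median = 52.50

# Without the outlier, both measures land on 50: the mean moves by about 8.33,
# the median by only 2.5.
without_outlier = [40, 45, 50, 55, 60]
print(statistics.mean(without_outlier), statistics.median(without_outlier))
```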
In essence, the median is a valuable tool for understanding the central tendency of a dataset, especially when dealing with data that may contain outliers or have a skewed distribution. It provides a more stable and representative measure of the "middle" value than the mean in such scenarios.