Median
The median is a measure of central tendency used in statistics to describe the middle value of a dataset when it is ordered from smallest to largest. It is especially useful for describing data that is skewed or that contains extreme values because, unlike the mean, it is largely insensitive to outliers.
Formally, let's denote an ordered dataset as X, which contains N observations:
X = {x_1, x_2, x_3, ..., x_N} where x_1 ≤ x_2 ≤ x_3 ≤ ... ≤ x_N
The median, M, of this dataset is defined as:
- If N is odd, the median is the value at position (N + 1) / 2. Mathematically, this can be written as:
M = x_{(N+1)/2}
- If N is even, the median is the average of the values at positions N / 2 and (N / 2) + 1. Mathematically, this is written as:
M = (x_{N/2} + x_{(N/2)+1}) / 2
Therefore, the median can be formally defined as a piecewise function:
M = x_{(N+1)/2} if N is odd,
M = (x_{N/2} + x_{(N/2)+1}) / 2 if N is even
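The piecewise definition above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `median` and the sample inputs are assumptions made here for demonstration. The input is sorted first, because the definition assumes an ordered dataset, and the 1-based positions in the formula are shifted to Python's 0-based indices.

```python
def median(values):
    """Return the median of a non-empty collection of numbers."""
    ordered = sorted(values)  # the definition assumes x_1 <= x_2 <= ... <= x_N
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # N odd: 1-based position (N + 1) / 2 is 0-based index N // 2
        return ordered[mid]
    # N even: 1-based positions N/2 and N/2 + 1 are 0-based indices mid - 1 and mid
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # odd N: prints 3
print(median([4, 1, 3, 2]))  # even N: prints 2.5
```

Sorting makes this an O(N log N) computation; that is a choice made for clarity in this sketch rather than a requirement of the definition.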
The median is a robust measure of central tendency: a single extreme value changes it little or not at all. In a distribution with an extreme outlier, the mean is pulled toward the outlier, while the median still reflects the center of the bulk of the data. It is often reported alongside other statistical measures, such as the mean and the mode, to give a more complete picture of a dataset.
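A small numerical comparison may make the effect of an outlier concrete. The sketch below uses Python's standard `statistics` module; the two datasets and the helper name `center` are hypothetical, chosen only for illustration. The second dataset differs from the first in a single value.

```python
from statistics import mean, median

typical = [1, 3, 4, 7, 9]
with_outlier = [1, 3, 4, 7, 900]  # hypothetical: the largest value replaced by an outlier


def center(values):
    """Return (mean, median) so the two measures can be compared side by side."""
    return mean(values), median(values)


print(center(typical))       # mean 4.8, median 4
print(center(with_outlier))  # mean 183, median 4
```

Replacing one value moves the mean from 4.8 to 183, while the median stays at 4.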
Example 1: Odd number of observations
Consider the following dataset: {1, 3, 4, 7, 9}
This set has 5 observations. Since 5 is an odd number, we will take the value at position (5+1)/2 = 3. So the third value in our dataset is the median.
Thus, the median for this dataset is 4.
Example 2: Even number of observations
Consider the following dataset: {2, 4, 7, 12, 15, 20}
This set has 6 observations. Since 6 is an even number, we will take the average of the values at positions 6/2 = 3 and (6/2)+1 = 4. So the third and fourth values in our dataset will be used to calculate the median.
Thus, the median for this dataset is (7 + 12)/2 = 9.5.
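Both worked examples can be cross-checked with Python's standard `statistics` module. This is a sketch for verification only; the variable names are assumptions. For the even case, `median_low` and `median_high` return the two middle values that are averaged.

```python
from statistics import median, median_high, median_low

odd_data = [1, 3, 4, 7, 9]           # Example 1
even_data = [2, 4, 7, 12, 15, 20]    # Example 2

print(median(odd_data))              # prints 4

# The two middle values (positions 3 and 4) used in Example 2
print(median_low(even_data), median_high(even_data))  # prints 7 12
print(median(even_data))             # prints 9.5
```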
As these examples show, how the median is calculated depends on whether the number of observations is odd or even. In both cases, the result describes the center of the data, which is particularly useful when the data contains outliers or is skewed.