12

I am working with SQL Server 2008

If I have a Table as such:

Code   Value
-----------------------
4      240
4      299
4      210
2      NULL
2      3
6      30
6      80
6      10
4      240
2      30

How can I find the median AND group by the Code column please? To get a resultset like this:

Code   Median
-----------------------
4      240
2      16.5
6      30

I really like this solution for median, but unfortunately it doesn't include Group By: https://stackoverflow.com/a/2026609/106227

4
  • Possible duplicate of stackoverflow.com/questions/1342898/…
    – jean
    Commented Dec 13, 2013 at 12:29
  • I don't think that link actually deals with grouping by a second column.
    – Stu Harper
    Commented Dec 13, 2013 at 12:33
  • In that link the OP states "What would be the best way (if possible) to do this - allow for the calculation of a median value (assuming a numeric data type) in an aggregate query?" It usually means he ill using a group by (AGGREGATE).
    – jean
    Commented Dec 13, 2013 at 12:37
  • @StuHarper: "it doesn't include Group By"...No it does!! Commented Dec 13, 2013 at 12:54

3 Answers 3

14

The solution using rank works nicely when you have an odd number of members in each group, i.e. the median exists within the sample, where you have an even number of members the rank method will fall down, e.g.

1
2
3
4

The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.

WITH CTE AS
(   SELECT  Code,
            Value, 
            [half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value), 
            [half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
    FROM    T
    WHERE   Value IS NOT NULL
)
SELECT  Code,
        (MAX(CASE WHEN Half1 = 1 THEN Value END) + 
        MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM    CTE
GROUP BY Code;

Example on SQL Fiddle


In SQL Server 2012 you can use PERCENTILE_CONT

SELECT  DISTINCT
        Code,
        Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM    T;

Example on SQL Fiddle

1
2

SQL Server does not have a function to calculate medians, but you could use the ROW_NUMBER function like this:

WITH RankedTable AS (
    SELECT Code, Value, 
        ROW_NUMBER() OVER (PARTITION BY Code ORDER BY VALUE) AS Rnk,
        COUNT(*) OVER (PARTITION BY Code) AS Cnt
    FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1

To elaborate a bit on this solution, consider the output of the RankedTable CTE:

Code   Value   Rnk    Cnt
---------------------------
4      240     2      3   -- Median
4      299     3      3
4      210     1      3
2      NULL    1      2
2      3       2      2   -- Median
6      30      2      3   -- Median
6      80      3      3
6      10      1      3

Now from this result set, if you only return those rows where Rnk equals Cnt / 2 + 1 (integer division), you get only the rows with the median value for each group.

6
  • Thanks Dan. This nearly works, but not quite. I added another row (4, 240). This results in your query missing out Code 4 completely from the resultset.
    – Stu Harper
    Commented Dec 13, 2013 at 12:53
  • Ah yeah, you'll get in trouble when you have multiple identical values, as these are assigned the same RANK. I'm editing my solution to use the ROW_NUMBER function instead, as this will assign a unique value to each row.
    – Dan
    Commented Dec 13, 2013 at 12:54
  • Again, thanks Dan but it still doesn't quite work. I added another row (2,30) and your query then results in 2,30 rather than 2,16.5. GarethD's solution below handles that OK.
    – Stu Harper
    Commented Dec 13, 2013 at 13:30
  • How can 16,5 be the median, if the number doesn't even exist in the original dataset?
    – Dan
    Commented Dec 13, 2013 at 13:33
  • 1
    @Dan The median doesn't have to be in the original dataset. According to Wikipedia: "If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values"
    – GarethD
    Commented Dec 13, 2013 at 13:37
-2

SELECT Code, Value FROM (SELECT Code, Value, ROW_NUMBER() OVER(PARTITION BY Code, ORDER BY Value ASC) as r1, ROW_NUMBER() OVER(PARTITION BY Code, ORDER BY Value DESC) as r2 FROM TABLE)temp WHERE (r1=r2) or (abs(r1-r2)=1);

1
  • 2
    Welcome to Stack Overflow! This question is almost 10 years old, and already has an accepted answer with upvotes. Posting a code-only solution is typically not helpful. You should explain why you feel a different solution is needed and what makes yours better than the other answers. Otherwise, is there really any point in posting it? Commented Jul 3, 2023 at 11:43

Not the answer you're looking for? Browse other questions tagged or ask your own question.