0

I have a sample of cars in my table and I would like to calculate the median price for my sample with SQL. What is the best way to do it?

+-----+-------+----------+
| Car | Price | Quantity |
+-----+-------+----------+
| A   |   100 |        2 |
| B   |   150 |        4 |
| C   |   200 |        8 |
+-----+-------+----------+

I know that I can use percentile_cont (or percentile_disc) if my table is like this:

+-----+-------+
| Car | Price |
+-----+-------+
| A   |   100 |
| A   |   100 |
| B   |   150 |
| B   |   150 |
| B   |   150 |
| B   |   150 |
| C   |   200 |
| C   |   200 |
| C   |   200 |
| C   |   200 |
| C   |   200 |
| C   |   200 |
| C   |   200 |
| C   |   200 |
+-----+-------+

But in the real world, my first table has about 100 million rows and the second table should have about 3 billiard rows (and moreover I don't know how to transform my first table into the second).

5
  • Have you tried this? sisense.com/blog/medians-in-sql Commented Sep 28, 2020 at 15:45
  • All methods are for my second table, not the first
    – Room'on
    Commented Sep 28, 2020 at 15:51
  • jarlh, this will be the average price not median
    – Room'on
    Commented Sep 28, 2020 at 15:58
  • please tag the db you are using Commented Sep 28, 2020 at 16:01
  • George Joseph, I have two DB - Oracle and SQL Server
    – Room'on
    Commented Sep 28, 2020 at 16:09

2 Answers 2

1

Here is a way to do this in sql server

In the first step i do is calculate the indexes corresponding to the lower and upper bounds for the median (if we have odd number of elements then the lower and upper bounds are same else its based on the x/2 and x/2+1th value)

Then i get the cumulative sum of the quantity and the use that to choose the elements corresponding to the lower and upper bounds as follows

with median_dt
  as (
select case when sum(quantity)%2=0 then
                 sum(quantity)/2 
            else 
                 sum(quantity)/2 + 1
        end as lower_limit
      ,case when sum(quantity)%2=0 then
                 (sum(quantity)/2) + 1
            else 
                 sum(quantity)/2 + 1
        end as upper_limit  
 from t 
     )
    ,data
    as (
 select *,sum(quantity) over(order by price asc) as cum_sum
   from t
       )
   ,rnk_val
   as(select * 
       from (
             select price,row_number() over(order by d.cum_sum asc) as rnk
               from data d
               join median_dt b
                 on b.lower_limit<=d.cum_sum
             )x 
      where x.rnk=1
      union all 
     select * 
       from (
             select price,row_number() over(order by d.cum_sum asc) as rnk
               from data d
               join median_dt b
                 on b.upper_limit<=d.cum_sum
             )x 
      where x.rnk=1
      ) 
  select avg(price) as median
    from rnk_val



+--------+
| median |
+--------+
|    200 |
+--------+

db fiddle link https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c5cfa645a22aa9c135032eb28f1749f6

1

This looks right on few results, but try on a larger set to double-check.

First create a table which has the total for each car (or use CTE or sub-query), your choice. I'm just creating a separate table here.

    create table table2 as
    (
        select car, 
        quantity, 
        price, 
        price * quantity as total
        from table1
    )

Then run this query, which looks for the price group that falls in the middle.

    select price
    from (
        select car, price, 
        sum(total) over (order by car) as rollsum, 
        sum(total) over () as total
        from table2
        )a
    where rollsum >= total/2

Correctly returns a value of $200.

2
  • I think this won't work if the same value has more than half the number of rows....
    – SQLpro
    Commented Sep 28, 2020 at 16:41
  • I used the same approach with the extra step where I choose the nearest to 50% price group. But I want to use linear interpolation like in PERCINTILE_CONT and I want it to work when there is only one point in my sample.
    – Room'on
    Commented Sep 28, 2020 at 16:52

Not the answer you're looking for? Browse other questions tagged or ask your own question.