Get most common value for each value of another column in SQL

Question

I have a table like this:

 Column  | Type | Modifiers 
---------+------+-----------
 country | text | 
 food_id | int  | 
 eaten   | date |

And for each country, I want to get the food that is eaten most often. The best I can think of (I'm using postgres) is:

CREATE TEMP TABLE counts AS 
   SELECT country, food_id, count(*) as count FROM munch GROUP BY country, food_id;

CREATE TEMP TABLE max_counts AS 
   SELECT country, max(count) as max_count FROM counts GROUP BY country;

SELECT country, max(food_id) FROM counts 
   WHERE (country, count) IN (SELECT * from max_counts) GROUP BY country;

In that last statement, the GROUP BY and max() are needed to break ties, where two different foods have the same count.

This seems like a lot of work for something conceptually simple. Is there a more straight forward way to do it?

jrouquie · Accepted Answer · 2017-04-25 14:40:13Z

26

It is now even simpler: PostgreSQL 9.4 introduced the mode() function:

select mode() within group (order by food_id)
from munch
group by country

returns (like user2247323's example):

country | mode
--------------
GB      | 3
US      | 1

See documentation here: https://wiki.postgresql.org/wiki/Aggregate_Mode

https://www.postgresql.org/docs/current/static/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE

answered Apr 25, 2017 at 14:40

jrouquie

4,3754 gold badges29 silver badges43 bronze badges

Add a comment |

pilcrow · Accepted Answer · 2012-09-16 17:17:31Z

PostgreSQL introduced support for window functions in 8.4, the year after this question was asked. It's worth noting that it might be solved today as follows:

SELECT country, food_id
  FROM (SELECT country, food_id, ROW_NUMBER() OVER (PARTITION BY country ORDER BY freq DESC) AS rn
          FROM (  SELECT country, food_id, COUNT('x') AS freq
                    FROM country_foods
                GROUP BY 1, 2) food_freq) ranked_food_req
 WHERE rn = 1;

The above will break ties. If you don't want to break ties, you could use DENSE_RANK() instead.

jkramer · Accepted Answer · 2008-12-06 01:46:56Z

SELECT DISTINCT
"F1"."food",
"F1"."country"
FROM "foo" "F1"
WHERE
"F1"."food" =
    (SELECT "food" FROM
        (
            SELECT "food", COUNT(*) AS "count"
            FROM "foo" "F2" 
            WHERE "F2"."country" = "F1"."country" 
            GROUP BY "F2"."food" 
            ORDER BY "count" DESC
        ) AS "F5"
        LIMIT 1
    )

Well, I wrote this in a hurry and didn't check it really well. The sub-select might be pretty slow, but this is shortest and most simple SQL statement that I could think of. I'll probably tell more when I'm less drunk.

PS: Oh well, "foo" is the name of my table, "food" contains the name of the food and "country" the name of the country. Sample output:

   food    |  country   
-----------+------------
 Bratwurst | Germany
 Fisch     | Frankreich

Those need to be single quotes in most places, I think.
– ocket8888
Commented Apr 6, 2017 at 10:20 — ocket8888, Commented Apr 6, 2017 at 10:20

Jamal Hansen · Accepted Answer · 2008-12-06 19:36:27Z

6

try this:

Select Country, Food_id
From Munch T1
Where Food_id= 
    (Select Food_id
     from Munch T2
     where T1.Country= T2.Country
     group by Food_id
     order by count(Food_id) desc
      limit 1)
group by Country, Food_id

answered Dec 6, 2008 at 19:36

Jamal Hansen

9547 silver badges17 bronze badges

Add a comment |

John MacIntyre · Accepted Answer · 2008-12-05 18:17:57Z

3

Try something like this

select country, food_id, count(*) cnt 
into #tempTbl 
from mytable 
group by country, food_id

select country, food_id
from  #tempTbl as x
where cnt = 
  (select max(cnt) 
  from mytable 
  where country=x.country 
  and food_id=x.food_id)

This could be put all into a single select, but I don't have time to muck around with it right now.

Good luck.

answered Dec 5, 2008 at 18:17

John MacIntyre

13k13 gold badges68 silver badges107 bronze badges

Add a comment |

JosephStyons · Accepted Answer · 2008-12-05 20:11:27Z

3

Here's how to do it without any temp tables:

Edit: simplified

select nf.country, nf.food_id as most_frequent_food_id
from national_foods nf
group by country, food_id 
having
  (country,count(*)) in (  
                        select country, max(cnt)
                        from
                          (
                          select country, food_id, count(*) as cnt
                          from national_foods nf1
                          group by country, food_id
                          )
                        group by country
                        having country = nf.country
                        )

edited Dec 5, 2008 at 20:11

answered Dec 5, 2008 at 18:23

JosephStyons

58.1k63 gold badges165 silver badges237 bronze badges

I'd be interested in seeing the plan for this execution vs. the temp table -- those "having" clauses are evaluated after select retrieves matching rows, right? Seems like there may be a boatload of extra IO.
– Ken Gentle
Commented Dec 5, 2008 at 18:32
There are a couple of full table scans in the plan, yes.
– JosephStyons
Commented Dec 5, 2008 at 20:12

Add a comment |

Matt Rogish · Accepted Answer · 2008-12-05 20:50:05Z

SELECT country, MAX( food_id )
  FROM( SELECT m1.country, m1.food_id
          FROM munch m1
         INNER JOIN ( SELECT country
                           , food_id
                           , COUNT(*) as food_counts
                        FROM munch m2
                    GROUP BY country, food_id ) as m3
                 ON m1.country = m3.country
         GROUP BY m1.country, m1.food_id 
        HAVING COUNT(*) / COUNT(DISTINCT m3.food_id) = MAX(food_counts) ) AS max_foods
  GROUP BY country

I don't like the MAX(.) GROUP BY to break ties... There's gotta be a way to incorporate eaten date into the JOIN in some way to arbitrarily select the most recent one...

I'm interested on the query plan for this thing if you run it on your live data!

Theo · Accepted Answer · 2008-12-05 21:31:29Z

3

select country,food_id, count(*) ne  
from   food f1  
group by country,food_id    
having count(*) = (select max(count(*))  
                   from   food f2  
                   where  country = f1.country  
                   group by food_id)

answered Dec 5, 2008 at 21:31

Theo

4523 silver badges3 bronze badges

Add a comment |

JCF · Accepted Answer · 2017-01-12 22:04:49Z

3

Here is a statement which I believe gives you what you want and is simple and concise:

select distinct on (country) country, food_id
from munch
group by country, food_id
order by country, count(*) desc

Please let me know what you think.

BTW, the distinct on feature is only available in Postgres.

Example, source data:

country | food_id | eaten
US        1         2017-1-1
US        1         2017-1-1
US        2         2017-1-1
US        3         2017-1-1
GB        3         2017-1-1
GB        3         2017-1-1
GB        2         2017-1-1

output:

country | food_id
US        1
GB        3

edited Jan 12, 2017 at 22:04

answered Apr 13, 2016 at 15:30

JCF

7015 silver badges9 bronze badges

1

If you are going to propose a new answer after all this time, I recommend trying it on a sample table, and posting the results you get. Also, please mention which database server you are using (mysql or whatever).
– ToolmakerSteve
Commented Jan 12, 2017 at 1:24
3

The distinct on feature is only available in Postgres so, I'm not sure how you would do something like this in another database. OP is using Postgres so it seems appropriate. I wrote this using the database table suggested by op called munch which has three fields: country (text), food_id (int), and eaten (date)
– JCF
Commented Jan 12, 2017 at 21:52

Add a comment |

Collectives™ on Stack Overflow

Get most common value for each value of another column in SQL

9 Answers 9

Not the answer you're looking for? Browse other questions tagged
sql
postgresql
greatest-n-per-group
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

9 Answers 9

Not the answer you're looking for? Browse other questions tagged sqlpostgresqlgreatest-n-per-group or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
sql
postgresql
greatest-n-per-group
or ask your own question.