74

I've been migrating some of my MySQL queries to PostgreSQL to use Heroku. Most of my queries work fine, but I keep having a similar recurring error when I use group by:

ERROR: column "XYZ" must appear in the GROUP BY clause or be used in an aggregate function

What am I doing wrong?

MySQL which works 100%

SELECT `availables`.*
FROM `availables`
INNER JOIN `rooms` ON `rooms`.id = `availables`.room_id
WHERE (rooms.hotel_id = 5056 AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
GROUP BY availables.bookdate
ORDER BY availables.updated_at

PostgreSQL error

ActiveRecord::StatementInvalid: PGError: ERROR: column "availables.id" must appear in the GROUP BY clause or be used in an aggregate function: SELECT "availables".* FROM "availables" INNER JOIN "rooms" ON "rooms".id = "availables".room_id WHERE (rooms.hotel_id = 5056 AND availables.bookdate BETWEEN E'2009-10-21' AND E'2009-10-23') GROUP BY availables.bookdate ORDER BY availables.updated_at

Ruby code generating the SQL

expiration = Available.find(:all,
    :joins => [ :room ],
    :conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
    :group => 'availables.bookdate',
    :order => 'availables.updated_at')

Expected Output (from working MySQL query):

+-----+-------+-------+------------+---------+---------------+---------------+
| id  | price | spots | bookdate   | room_id | created_at    | updated_at    |
+-----+-------+-------+------------+---------+---------------+---------------+
| 414 | 38.0  | 1     | 2009-11-22 | 1762    | 2009-11-20... | 2009-11-20... |
| 415 | 38.0  | 1     | 2009-11-23 | 1762    | 2009-11-20... | 2009-11-20... |
| 416 | 38.0  | 2     | 2009-11-24 | 1762    | 2009-11-20... | 2009-11-20... |
+-----+-------+-------+------------+---------+---------------+---------------+
3 rows in set
2
  • sooo... would I be better served using the distinct function on bookdate? If I did that, would I still need the group by clause?
    – holden
    Commented Nov 20, 2009 at 10:07
  • 2
    DISTINCT is slower than GROUP BY. So you should be careful and prefer a GROUP BY solution if it is possible.
    – Franz
    Commented Nov 20, 2009 at 11:10

9 Answers 9

116

MySQL's totally non standards compliant GROUP BY can be emulated by Postgres' DISTINCT ON. Consider this:

MySQL:

SELECT a,b,c,d,e FROM table GROUP BY a

This delivers 1 row per value of a (which one, you don't really know). Well actually you can guess, because MySQL doesn't know about hash aggregates, so it will probably use a sort... but it will only sort on a, so the order of the rows could be random. Unless it uses a multicolumn index instead of sorting. Well, anyway, it's not specified by the query.

Postgres:

SELECT DISTINCT ON (a) a,b,c,d,e FROM table ORDER BY a,b,c

This delivers 1 row per value of a, this row will be the first one in the sort according to the ORDER BY specified by the query. Simple.

Note that here, it's not an aggregate I'm computing. So GROUP BY actually makes no sense. DISTINCT ON makes a lot more sense.

Rails is married to MySQL, so I'm not surprised that it generates SQL that doesn't work in Postgres.

6
  • 6
    Adding to this, though, Postgres 9.1 allows to not list all columns if their table's primary key is part of the group by clause. Commented Jun 2, 2011 at 15:59
  • 5
    According to this article "Debunking GROUP BY myths", it has nothing to do with "non standards compliant GROUP BY".
    – Rafa
    Commented Aug 8, 2012 at 16:26
  • 5
    According to this article, MySQL's GROUP BY is still noncompliant to both versions of the standard, because it does not verify if the extra columns in the selectlist are dependent on the group by columns. It will output incorrect data without warning (but can serve useful purposes too). PG 9.1 assumes that incuding the PK of a table means all other columns are dependent, which is right. This does not cover the standard 100% (other correct queries may be flagged as errors) but covers most use cases without returning incorrect results...
    – bobflux
    Commented Aug 18, 2012 at 9:15
  • 7
    "Rails is married to MySQL, so I'm not surprised that it generates SQL that doesn't work in postgres." I don't think this is true anymore, as Postgres has gotten really popular in the Rails community because of its noSQL capabilities.
    – yagooar
    Commented Jul 28, 2013 at 8:42
  • 4
    Rails is no longer married to MySQL. Commented Jun 23, 2014 at 9:40
17

PostgreSQL is more SQL compliant than MySQL. All fields - except computed field with aggregation function - in the output must be present in the GROUP BY clause.

11

MySQL's GROUP BY can be used without an aggregate function (which is contrary to the SQL standard), and returns the first row in the group (I don't know based on what criteria), while PostgreSQL must have an aggregate function (MAX, SUM, etc) on the column, on which the GROUP BY clause is issued.

5

Correct, the solution to fixing this is to use :select and to select each field that you wish to decorate the resulting object with and group by them.

Nasty - but it is how group by should work as opposed to how MySQL works with it by guessing what you mean if you don't stick fields in your group by.

3
  • 2
    I suppose MySQL has spoiled me, or ruined me, whichever adjective you prefer, so there's no better way? Ie. throwing in an aggregate function such as MAX(bookdate) or DISTINCT which i was told above is much slower?
    – holden
    Commented Nov 20, 2009 at 11:33
  • I would stick with group by - but tread carefully, especially since you have to manually select which fields you want to decorate the object with. Also writing the manual select with group by is a more database agnostic approach, considering that MSSQL (if you are unfortunate enough to have to use it) and Oracle will also complain in a similar fashion. Commented Nov 21, 2009 at 0:15
  • DISTINCT doesn't necessarily mean slower.
    – nos
    Commented Nov 22, 2009 at 1:40
3

If I remember correctly, in PostgreSQL you have to add every column you fetch from the table where the GROUP BY clause applies to the GROUP BY clause.

2

Not the prettiest solution, but changing the group parameter to output every column in model works in PostgreSQL:

expiration = Available.find(:all,
:joins => [ :room ],
:conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?", hostel_id, date.to_s, (date+days-1).to_s ],
:group => Available.column_names.collect{|col| "availables.#{col}"},
:order => 'availables.updated_at')
1

According to MySQL's "Debuking GROUP BY Myths" http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html. SQL (2003 version of the standard) doesn't requires columns referenced in the SELECT list of a query to also appear in the GROUP BY clause.

3
  • 1
    But, as others have pointed out, it does require them to be "functionally dependent" on columns which are in the GROUP BY. MySQL's ability to reference any non-grouped column is entirely non-standard, and allows users to write illogical and unreliable queries.
    – IMSoP
    Commented Apr 15, 2013 at 19:49
  • It was a standard at the time, so it's not "entirely non-standard." I side with you, but that will be our opinion. Commented Apr 15, 2013 at 20:30
  • At what time? The linked article (via Wayback, or alt URL) says that both SQL:1999 and SQL:2003 impose limits on GROUP BY that MySQL ignores.
    – IMSoP
    Commented Apr 15, 2013 at 21:23
1

For others looking for a way to order by any field, including joined field, in postgresql, use a subquery:

SELECT * FROM(
SELECT DISTINCT ON(availables.bookdate) `availables`.* 
FROM `availables` INNER JOIN `rooms` ON `rooms`.id = `availables`.room_id 
WHERE (rooms.hotel_id = 5056 
AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
) AS distinct_selected
ORDER BY availables.updated_at

or arel:

subquery = SomeRecord.select("distinct on(xx.id) xx.*, jointable.order_field")
      .where("").joins(")
result = SomeRecord.select("*").from("(#{subquery.to_sql}) AS distinct_selected").order(" xx.order_field ASC, jointable.order_field ASC")
-1

I think that .uniq [1] will solve your problem.

[1] Available.select('...').uniq

Take a look at http://guides.rubyonrails.org/active_record_querying.html#selecting-specific-fields

Not the answer you're looking for? Browse other questions tagged or ask your own question.