2

Let's say I have the following records:

Table: sessions
+----+-------------------------+-------------------------+---------+----------+   
| id | created_at              | updated_at              | user_id | group_id |
+----+-------------------------+-------------------------+---------+----------+
| 1  | 2014-06-08 20:37:03 UTC | 2014-06-08 20:37:03 UTC | 1       | 3        | 
| 31 | 2014-06-09 17:23:33 UTC | 2014-06-09 17:23:33 UTC | 1       | 4        |
| 32 | 2014-06-10 22:26:58 UTC | 2014-06-10 22:27:08 UTC | 1       | 2        |
| 33 | 2014-06-11 22:56:06 UTC | 2014-06-11 22:56:18 UTC | 1       | 2        |
| 35 | 2014-06-16 17:25:55 UTC | 2014-06-16 17:26:06 UTC | 1       | 2        | 
| 36 | 2014-06-17 17:26:34 UTC | 2014-06-17 17:26:47 UTC | 1       | 2        |
| 37 | 2014-06-18 17:46:51 UTC | 2014-06-18 17:46:51 UTC | 1       | 2        |
| 38 | 2014-06-19 17:47:47 UTC | 2014-06-19 17:49:00 UTC | 1       | 1        |
+----+-------------------------+-------------------------+---------+----------+

How would I query the database to get a user's longest "streak" of consecutive days.

So, based on records 35-38 above, there are consecutive records between 7/16 - 7/19, which would return 4.

I'm using PostgreSQL with Rails 4.

0

1 Answer 1

1

This first query should show you the "start date" of each "consecutive group" (each string of consecutive days) and the number of consecutive days in each group.

I did this for illustration just so you can see how it works - http://sqlfiddle.com/#!15/ad3e1/22/0

with sub as(
select user_id, dy, prev_dy, grp, case when lead(grp,1) over (order by dy) = grp+1 or lag(grp,1) over (order by dy) = grp-1 then 'X' else null end as gid
  from (select user_id,
               dy,
               prev_dy,
               row_number() over(partition by dy - prev_dy <> 1 order by dy) as grp
          from (select user_id, dy, lag(dy, 1) over(order by dy) as prev_dy
                  from (select distinct user_id,
                                        cast(created_at as date) as dy
                          from sessions
                         where user_id = 1
                         order by dy) x
                 order by dy) x
         order by dy) x
 order by dy)
select x.dy, count(*)
from sub x cross join sub y
where x.gid is null
and y.dy >= x.dy
and (y.dy < (select min(z.dy) from sub z where z.gid is null and z.dy > x.dy)
or not exists (select 1 from sub z where z.gid is null and z.dy > x.dy))
group by x.dy

To just get the result of 4 (there are actually two strings of 4 days so you have a tie in this case), you can run the below which just grabs the highest number of "consecs" from my query above (the row representing the group with the highest # of consecutive days) --

with sub as(
select user_id, dy, prev_dy, grp, case when lead(grp,1) over (order by dy) = grp+1 or lag(grp,1) over (order by dy) = grp-1 then 'X' else null end as gid
  from (select user_id,
               dy,
               prev_dy,
               row_number() over(partition by dy - prev_dy <> 1 order by dy) as grp
          from (select user_id, dy, lag(dy, 1) over(order by dy) as prev_dy
                  from (select distinct user_id,
                                        cast(created_at as date) as dy
                          from sessions
                         where user_id = 1
                         order by dy) x
                 order by dy) x
         order by dy) x
 order by dy)
select max(consecs)
from(
select x.dy, count(*) as consecs
from sub x cross join sub y
where x.gid is null
and y.dy >= x.dy
and (y.dy < (select min(z.dy) from sub z where z.gid is null and z.dy > x.dy)
or not exists (select 1 from sub z where z.gid is null and z.dy > x.dy))
group by x.dy) x

http://sqlfiddle.com/#!15/ad3e1/24/0

6
  • Hi Brian, thanks for your reply and example. However, if I remove one of the insert lines and rebuild the schema, and then execute the SQL, it does not return the "max" as expected. (I'm using your 2nd sqlfiddle).
    – Dodinas
    Commented Jul 14, 2014 at 13:50
  • @Dodinas If you remove 1 insert line the result is and should still be 4 because you have two periods of 4 consecutive days (6/8 to 6/11 and 6/16 to 6/19). By taking out one row, you break one of those strings of 4 days, but there is still another string of 4 days. This is illustrated if you do the same but run the first query. Commented Jul 14, 2014 at 20:36
  • @Dodinas I think you might have just missed that the 6/8-6/11 is also a 4 day consecutive string of time. If not please show me a fiddle where you believe the result is incorrect. Commented Jul 14, 2014 at 20:37
  • Hey @Brian, thanks for your reply. Check out sqlfiddle.com/#!15/ad3e1/2. Shouldn't the max or consecs be 4? (6/16 - 6/19 = 4 days). It comes up with 3 as the max, with both queries.
    – Dodinas
    Commented Jul 14, 2014 at 21:22
  • @Dodinas thanks for the example I think I have it fixed. Try my edit. In the first of the 2 sqls you should now get a result of 3 and 4. The second sql returns 4 (higher of 3 and 4). Seems to correctly reduce the first string of days. Please play around w/ it using your data and let me know if it works. Commented Jul 14, 2014 at 23:03

Not the answer you're looking for? Browse other questions tagged or ask your own question.