Average time between dates in same field by groups

Question

I have a transactional database with sales data and user id like the following:

id_usuarioweb   dt_fechaventa
1551415         2015-08-01 14:57:21.737
1551415         2015-08-06 15:34:21.920
6958538         2015-07-30 09:26:24.427
6958538         2015-08-05 09:30:06.247
6958538         2015-08-31 17:39:02.027
39101175        2015-08-05 16:34:17.990
39101175        2015-09-20 20:37:26.043
1551415         2015-09-05 13:41:43.767
3673384         2015-09-06 13:34:23.440

And I would like to calculate the average diference between dates by the same customer in the data base (to find average frequency with which the user buys).

I'm aware I can do datediff with two columns, but i'm have issues trying to do it in the same field and "grouping" by user id.

The desired outcome would be like this:

id_usuarioweb   avgtime_days
1551415         5
6958538         25
39101175        25
1551415         0
3673384         0

How can I achieve this? I would have the database ordered by user_id and then dt_fechaventa (the sale time).

USING: SQL Server 2008

the expected output is not average. what do you exactly need? — Vamsi Prabhala, Commented Sep 22, 2015 at 17:46
@Lamak they are the same user... purchasing two or three times. There is a unique id for event, but it is not relevant here... — eflores89, Commented Sep 22, 2015 at 17:46

Gordon Linoff · Accepted Answer · 2015-09-22 18:27:40Z

10

I think what you are looking for is calculated like this. Take the maximum and minimum dates, get the difference between them and divide by the number of purchases.

SELECT id_usuarioweb, CASE 
        WHEN COUNT(*) < 2
            THEN 0
        ELSE DATEDIFF(dd, 
                MIN(
                    dt_fechaventa
                ), MAX(
                    dt_fechaventa
                )) / (
                COUNT(*) - 
                1
                )
        END AS avgtime_days
FROM mytable
GROUP BY id_usuarioweb

EDIT: (by @GordonLinoff)

The reason that this is correct is easily seen if you look at the math. Consider three dates, a, b, and c.

The average time between them is:

((b - a) + (c - b)) / 2

This simplifies to:

(c - a) / 2

In other words, the intermediate value cancels out. And, this continues regardless of the number of intermediate values.

edited Sep 22, 2015 at 18:27

Gordon Linoff

1.3m59 gold badges672 silver badges820 bronze badges

answered Sep 22, 2015 at 17:52

Greg Viers

3,5233 gold badges20 silver badges36 bronze badges

Sorry, but this is not the same, because I could have 3 dates, but the average between each one is not necesarilly the same as dividing the max and min.
– eflores89
Commented Sep 22, 2015 at 17:57
2

@eflores89 . . . The average is the max minus the min divided by one less than the count. This is the easiest and best way to calculate that value.
– Gordon Linoff
Commented Sep 22, 2015 at 18:14
yup, this is actually the simplest way
– Lamak
Commented Sep 22, 2015 at 18:24
you're right. Didn't realize the -1 ... this is much simpler
– eflores89
Commented Sep 22, 2015 at 18:27

Add a comment |

Lamak · Accepted Answer · 2015-09-22 18:01:04Z

This should do:

;WITH CTE AS
(
    SELECT  *, 
            RN = ROW_NUMBER() OVER(PARTITION BY id_usuarioweb ORDER BY dt_fechaventa),
            N  = COUNT(*) OVER(PARTITION BY id_usuarioweb)
    FROM dbo.YourTable
)
SELECT  A.id_usuarioweb,
        AVG(DATEDIFF(DAY,A.dt_fechaventa,B.dt_fechaventa)) avgtime_days
FROM CTE A
INNER JOIN CTE B
    ON A.id_usuarioweb = B.id_usuarioweb
    AND A.RN = B.RN - 1
WHERE A.N > 1
GROUP BY A.id_usuarioweb;

I'm filtering the users that only have one row there, because you can't calculate an average of days with them.

Here is a demo in sqlfiddle of this. And the results are:

╔═══════════════╦══════════════╗
║ id_usuarioweb ║ avgtime_days ║
╠═══════════════╬══════════════╣
║       1551415 ║           17 ║
║       6958538 ║           16 ║
║      39101175 ║           46 ║
╚═══════════════╩══════════════╝

Yep! Brilliant! Thanks!
– eflores89
Commented Sep 22, 2015 at 18:05 — eflores89, Commented Sep 22, 2015 at 18:05

Vamsi Prabhala · Accepted Answer · 2015-09-22 18:17:24Z

1

You can first number rows using row_number and then do a self-join with the cte. Then perform the average. However, you would get one row per user, but not as per the expected result.

with x as 
(select id_usuarioweb, dt_fechaventa,
row_number() over(partition by id_usuarioweb order by dt_fechaventa) as rn
from tablename)
select x1.id_usuarioweb, avg(datediff(dd,x1.dt_fechaventa,x2.dt_fechaventa)) as avgdiff
from x x1 join x x2 
on x1.id_usuarioweb = x2.id_usuarioweb and x1.rn = x2.rn-1
group by x1.id_usuarioweb

edited Sep 22, 2015 at 18:17

answered Sep 22, 2015 at 17:56

Vamsi Prabhala

49.1k4 gold badges39 silver badges62 bronze badges

there's no lag in SQL Server 2008
– Lamak
Commented Sep 22, 2015 at 17:57
changed it to exclude lag
– Vamsi Prabhala
Commented Sep 22, 2015 at 18:03

Add a comment |

Collectives™ on Stack Overflow

Average time between dates in same field by groups

3 Answers 3

Not the answer you're looking for? Browse other questions tagged
sql
sql-server
date
average
datediff
or ask your own question.

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Not the answer you're looking for? Browse other questions tagged sqlsql-serverdateaveragedatediff or ask your own question.

Related

Not the answer you're looking for? Browse other questions tagged
sql
sql-server
date
average
datediff
or ask your own question.