6

I have a transactional database with sales data and user id like the following:

id_usuarioweb   dt_fechaventa
1551415         2015-08-01 14:57:21.737
1551415         2015-08-06 15:34:21.920
6958538         2015-07-30 09:26:24.427
6958538         2015-08-05 09:30:06.247
6958538         2015-08-31 17:39:02.027
39101175        2015-08-05 16:34:17.990
39101175        2015-09-20 20:37:26.043
1551415         2015-09-05 13:41:43.767
3673384         2015-09-06 13:34:23.440

And I would like to calculate the average diference between dates by the same customer in the data base (to find average frequency with which the user buys).

I'm aware I can do datediff with two columns, but i'm have issues trying to do it in the same field and "grouping" by user id.

The desired outcome would be like this:

id_usuarioweb   avgtime_days
1551415         5
6958538         25
39101175        25
1551415         0
3673384         0

How can I achieve this? I would have the database ordered by user_id and then dt_fechaventa (the sale time).

USING: SQL Server 2008

9
  • Why do you have duplicate ids on your result?
    – Lamak
    Commented Sep 22, 2015 at 17:45
  • @sstan SQL 2008, added in text
    – eflores89
    Commented Sep 22, 2015 at 17:46
  • the expected output is not average. what do you exactly need? Commented Sep 22, 2015 at 17:46
  • @Lamak they are the same user... purchasing two or three times. There is a unique id for event, but it is not relevant here...
    – eflores89
    Commented Sep 22, 2015 at 17:46
  • 2
    EXACTLY, then you should get one result per user
    – Lamak
    Commented Sep 22, 2015 at 17:49

3 Answers 3

10

I think what you are looking for is calculated like this. Take the maximum and minimum dates, get the difference between them and divide by the number of purchases.

SELECT id_usuarioweb, CASE 
        WHEN COUNT(*) < 2
            THEN 0
        ELSE DATEDIFF(dd, 
                MIN(
                    dt_fechaventa
                ), MAX(
                    dt_fechaventa
                )) / (
                COUNT(*) - 
                1
                )
        END AS avgtime_days
FROM mytable
GROUP BY id_usuarioweb

EDIT: (by @GordonLinoff)

The reason that this is correct is easily seen if you look at the math. Consider three dates, a, b, and c.

The average time between them is:

((b - a) + (c - b)) / 2

This simplifies to:

(c - a) / 2

In other words, the intermediate value cancels out. And, this continues regardless of the number of intermediate values.

4
  • Sorry, but this is not the same, because I could have 3 dates, but the average between each one is not necesarilly the same as dividing the max and min.
    – eflores89
    Commented Sep 22, 2015 at 17:57
  • 2
    @eflores89 . . . The average is the max minus the min divided by one less than the count. This is the easiest and best way to calculate that value. Commented Sep 22, 2015 at 18:14
  • yup, this is actually the simplest way
    – Lamak
    Commented Sep 22, 2015 at 18:24
  • you're right. Didn't realize the -1 ... this is much simpler
    – eflores89
    Commented Sep 22, 2015 at 18:27
2

This should do:

;WITH CTE AS
(
    SELECT  *, 
            RN = ROW_NUMBER() OVER(PARTITION BY id_usuarioweb ORDER BY dt_fechaventa),
            N  = COUNT(*) OVER(PARTITION BY id_usuarioweb)
    FROM dbo.YourTable
)
SELECT  A.id_usuarioweb,
        AVG(DATEDIFF(DAY,A.dt_fechaventa,B.dt_fechaventa)) avgtime_days
FROM CTE A
INNER JOIN CTE B
    ON A.id_usuarioweb = B.id_usuarioweb
    AND A.RN = B.RN - 1
WHERE A.N > 1
GROUP BY A.id_usuarioweb;

I'm filtering the users that only have one row there, because you can't calculate an average of days with them.

Here is a demo in sqlfiddle of this. And the results are:

╔═══════════════╦══════════════╗
║ id_usuarioweb ║ avgtime_days ║
╠═══════════════╬══════════════╣
║       1551415 ║           17 ║
║       6958538 ║           16 ║
║      39101175 ║           46 ║
╚═══════════════╩══════════════╝
1
  • Yep! Brilliant! Thanks!
    – eflores89
    Commented Sep 22, 2015 at 18:05
1

You can first number rows using row_number and then do a self-join with the cte. Then perform the average. However, you would get one row per user, but not as per the expected result.

with x as 
(select id_usuarioweb, dt_fechaventa,
row_number() over(partition by id_usuarioweb order by dt_fechaventa) as rn
from tablename)
select x1.id_usuarioweb, avg(datediff(dd,x1.dt_fechaventa,x2.dt_fechaventa)) as avgdiff
from x x1 join x x2 
on x1.id_usuarioweb = x2.id_usuarioweb and x1.rn = x2.rn-1
group by x1.id_usuarioweb
2
  • there's no lag in SQL Server 2008
    – Lamak
    Commented Sep 22, 2015 at 17:57
  • changed it to exclude lag Commented Sep 22, 2015 at 18:03

Not the answer you're looking for? Browse other questions tagged or ask your own question.