
I'm trying my best to create a custom query on Data Explorer, but I think I've reached the limit of my own incompetence. I want it to identify:

  1. The top 20 users on a given Stack Exchange community
  2. How many days they've been a member
  3. How many days they've hit the reputation cap (e.g. 200 points)

and then return (and rank) those results as an ordered percentage.

e.g.

  • User 1 has been a member for 100 days and has hit the cap 100 times. His ratio is 100%
  • User 2 has been a member for 100 days and has hit the cap 50 times. His ratio is 50%

1 Answer


I started off with the query pointed out by Martijn and ended up with this beauty.

Be careful if you count rows when you are joining tables...

Do note that these results might be slightly off: because voting is anonymous, the reputation a user loses for downvoting answers cannot be subtracted. SEDE is updated weekly (on Monday!?), so the data can be up to a week out of date.

;
with topusers as (select top 200 
                    id
                    , datediff(d , creationdate, getdate()) as Daysmember
             from users 
             order by reputation desc) ,
voting as (
         SELECT 
                SUM(case votes.votetypeid 
                        WHEN 1 THEN 15  -- accept
                        WHEN 2 THEN   -- upvote
                           CASE posts.posttypeid 
                              WHEN 1 THEN 5  -- upvote question
                              WHEN 2 THEN 10  -- upvote answer
                           END
                        WHEN 3 THEN -2  -- downvote
                        WHEN 9 THEN BountyAmount -- collected bounty
                 END) as Rep,
                 Votes.CreationDate AS CreationDate,
                 Posts.OwnerUserId 
          FROM   Posts
                 INNER JOIN Votes              
                   ON Votes.PostId = Posts.Id
                 INNER JOIN topusers t 
                    on t.id = Posts.OwnerUserId 
                      
          WHERE ( Posts.CommunityOwnedDate IS NULL ) 
          GROUP BY Votes.CreationDate,
                    Posts.OwnerUserId
),
bounty as (
         SELECT 
                SUM(-BountyAmount) as Rep,  -- Bounty given
                Votes.CreationDate AS CreationDate,
                Votes.UserID as OwnerUserId
          FROM  Votes                    
          INNER JOIN topusers t on t.id =  Votes.UserID
          WHERE votes.votetypeid = 8
          GROUP BY Votes.CreationDate,
                Votes.UserID
),
sugedit as (  -- suggested edits
          SELECT 
                 COUNT(*) * 2 as Rep 
               , cast(ApprovalDate as date) As CreationDate
               , OwnerUserId
          FROM  SuggestedEdits 
          INNER JOIN topusers t on t.id =  SuggestedEdits.OwnerUserId 
          WHERE ApprovalDate IS NOT NULL
          GROUP BY cast(ApprovalDate as date)
                 , OwnerUserId
)
select userid as [User Link]
, count(*) as [Days of Rep Cap]
, min(daysmember) [Days a member]
from (
select creationdate
       , userid
       , sum(votesrep + bountyrep + sugedtrep) as rep
    from (
    select v.creationdate as creationdate
         , v.owneruserid as userid
         , v.rep as votesrep
         , 0 as bountyrep
         , 0 as sugedtrep
    from voting v
    union all
    select b.creationdate
          , b.owneruserid
          , 0
          , b.rep
          , 0
    from bounty b
    union all
    select s.creationdate
         , s.owneruserid
         , 0
         , 0
         , s.rep
    from sugedit s 
    ) as ingroup
    group by creationdate,
    userid
    having sum(votesrep + bountyrep + sugedtrep)>200  -- keep only days whose total exceeds 200
    ) cntdays
inner join topusers tu on tu.id = userid
group by userid
order by count(*) desc
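
The last step of the query keeps the days whose summed reputation is above 200 and counts them per user. Whether a day with exactly 200 should count is raised in the comments, so here is a minimal sketch that compares both readings. The rows, the `daily` name and the `@cap` variable are made up for illustration and are not part of the query above.

```sql
-- Assumed cap value; the query above tests for strictly more than this.
declare @cap int = 200;

-- Hypothetical sample data: one row per user per day with that day's total rep.
with daily as (
    select *
    from (values
        (1, cast('2014-06-01' as date), 230),
        (1, cast('2014-06-02' as date), 200),
        (1, cast('2014-06-03' as date), 145),
        (2, cast('2014-06-01' as date), 200),
        (2, cast('2014-06-02' as date),  90)
    ) as d(userid, creationdate, rep)
)
select userid
     -- same test as the HAVING clause in the query above
     , sum(case when rep >  @cap then 1 else 0 end) as days_above_cap
     -- alternative reading: a day at exactly the cap also counts
     , sum(case when rep >= @cap then 1 else 0 end) as days_at_or_above_cap
from daily
group by userid
order by userid;
```

On this sample, user 1 gets 1 and 2 and user 2 gets 0 and 1, so the choice of operator alone can shift the count by every day that ends on exactly 200.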

The usual suspects on Stack Overflow are in the top results:

User Link     |Days of Rep Cap|Days a member
--------------------------------------------
Jon Skeet     |    2062       |   2101
Marc Gravell  |    1606       |   2098
Darin Dimitrov|    1558       |   2078
BalusC        |    1526       |   1776
Hans Passant  |    1444       |   2109
SLaks         |    1255       |   2062
VonC          |    1156       |   2114
CommonsWare   |    1139       |   1854
Greg Hewgill  |    1025       |   2148
Mark Byers    |     904       |   1971
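
The question asked for the result as a ranked percentage, which the query does not compute. A minimal sketch of that last step, using the figures from the table above as literal rows; the column names and the `cap_pct` alias are illustrative only.

```sql
-- Figures copied from the result table above; column names are illustrative.
select name
     , capdays
     , memberdays
     -- 100.0 forces decimal arithmetic; nullif guards against a zero-day membership
     , cast(100.0 * capdays / nullif(memberdays, 0) as decimal(5, 1)) as cap_pct
from (values
    ('Jon Skeet',      2062, 2101),
    ('Marc Gravell',   1606, 2098),
    ('Darin Dimitrov', 1558, 2078),
    ('BalusC',         1526, 1776),
    ('Hans Passant',   1444, 2109),
    ('SLaks',          1255, 2062),
    ('VonC',           1156, 2114),
    ('CommonsWare',    1139, 1854),
    ('Greg Hewgill',   1025, 2148),
    ('Mark Byers',      904, 1971)
) as r(name, capdays, memberdays)
order by cap_pct desc;
```

In the real query the same expression could presumably be added to the outer select as `100.0 * count(*) / nullif(min(daysmember), 0)`, with the `order by` switched to that column. Jon Skeet comes out at about 98.1% here.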

Explanation of this query

I based the functional requirements of this query on the FAQ item "How does “Reputation” work?". I'll match each bullet with the corresponding construction in the query so that the query becomes verifiable (as requested in the comments). The line under each bullet is the explanation.
Votes and Posts are joined on Votes.PostId = Posts.Id, and the UserId used here is 1 (Jeff Atwood on SO).

You gain reputation when:

  • one of your questions is voted up/useful: +5
    Sum(5) from Votes where Votes.VoteTypeId = 2 and Posts.PostTypeId = 1 and Posts.OwnerUserId = 1
  • one of your answers is voted up/useful: +10
    Sum(10) from Votes where Votes.VoteTypeId = 2 and Posts.PostTypeId = 2 and Posts.OwnerUserId = 1
  • one of your answers becomes accepted: +15
    Sum(15) from Votes where Votes.VoteTypeId = 1 and Posts.PostTypeId = 2 and Posts.OwnerUserId = 1
  • you accept an answer written by someone else to one of your own questions: +2
    not accounted for yet
  • a downvote on one of your questions or answers is removed: +2
    this is not available in SEDE
  • you suggest an edit and it is accepted: +2 (up to a total of +1000 per user)
    Sum(2) from SuggestedEdits where OwnerUserid = 1 and ApprovalDate is not null
  • you remove a downvote from an answer: +1
    this is not available in SEDE (voting is anonymous)
  • one of your answers is awarded a bounty by the user offering the bounty: +full bounty amount
    Sum(BountyAmount) from Votes where Votes.VoteTypeId = 9 and Posts.PostTypeId = 2 and Posts.OwnerUserId = 1
  • one of your answers is awarded a bounty automatically: +1/2 of the bounty amount (see bounty FAQ for details)
    Sum(BountyAmount) from Votes where Votes.VoteTypeId = 9 and Posts.PostTypeId = 2 and Posts.OwnerUserId = 1
  • you associate accounts of two or more Stack Exchange network sites, and at least one of those accounts already has 200 or more reputation: +100 on each site (awarded a maximum of one time per site)
    Not available in SEDE (there is no cross-site query possible)

You lose reputation when:

  • one of your questions or answers is voted down/not useful: −2
    Sum(-2) from Votes where Votes.VoteTypeId = 3 and Posts.OwnerUserId = 1
  • a post where you had successfully suggested an edit has been deleted (reputation page shows the cause as "removed"): -2
    this is not in SEDE
  • you vote an answer down/not useful: −1
    this is not in SEDE
  • an upvote on one of your questions is removed: −5
    this event is not in SEDE (but not needed if you're interested in totals)
  • an upvote on one of your answers is removed: −10
    this event is not in SEDE (but not needed if you're interested in totals)
  • one of your accepted answers loses accepted status: −15
    this event is not in SEDE (but not needed if you're interested in totals)
  • you unaccept an answer written by someone else to one of your own questions: −2
    this event is not in SEDE (but not needed if you're interested in totals)
  • you place a bounty on a question: −full bounty amount
    Sum(-BountyAmount) from Votes (not joined with Posts!) where Votes.VoteTypeId = 8 and Votes.UserId = 1
  • one of your posts receives 6 spam or "it is not welcome in our community" flags (formerly known as offensive flags): −100
    this is not in SEDE
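
The vote-based rules above boil down to one CASE expression summed per day, which is what the voting part of the query does. A minimal sketch on made-up rows for a single user; the `samplevotes` name and its data are hypothetical, while the `votetypeid` and `posttypeid` values are the ones used in the query above.

```sql
-- Made-up vote rows for one user; only the type ids mirror the query above.
with samplevotes as (
    select *
    from (values
        (cast('2014-06-01' as date), 2, 2, cast(null as int)),  -- upvote on an answer
        (cast('2014-06-01' as date), 2, 1, null),               -- upvote on a question
        (cast('2014-06-01' as date), 1, 2, null),               -- answer accepted
        (cast('2014-06-01' as date), 3, 2, null),               -- downvote received
        (cast('2014-06-02' as date), 9, 2, 50)                  -- bounty collected
    ) as v(creationdate, votetypeid, posttypeid, bountyamount)
)
select creationdate
     , sum(case votetypeid
               when 1 then 15                      -- accept
               when 2 then case posttypeid
                               when 1 then 5       -- upvote on a question
                               when 2 then 10      -- upvote on an answer
                           end
               when 3 then -2                      -- downvote
               when 9 then bountyamount            -- collected bounty
           end) as rep
from samplevotes
group by creationdate
order by creationdate;
```

For these rows the first day sums to 10 + 5 + 15 - 2 = 28 and the second day to 50. Bounties offered and suggested edits are handled separately in the query because they do not come from votes on the user's own posts.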
  • The results I get (when I run the same query on Scifi Stackexchange) aren't making any sense; oi61.tinypic.com/v5zfbd.jpg. The number of days seems accurate but the "times hit the rep cap" are completely wrong.
    – Richard
    Commented Jun 28, 2014 at 10:37
  • @Richard That was somewhat embarrassing, but I believe this now gives better results... sorry...
    – rene
    Commented Jun 28, 2014 at 11:44
  • It works much better now, but the cap results are still incorrect. The scores I get with this; data.stackexchange.com/scifi/query/167539/… don't seem to match up
    – Richard
    Commented Jun 28, 2014 at 12:00
  • That query is wrong for the bounties and for upvotes on questions and answers and doesn't take suggested edits into account
    – rene
    Commented Jun 28, 2014 at 12:06
  • It can't be right. According to this, DVK (legendary) has only hit the rep cap 218 times :-) ; data.stackexchange.com/scifi/query/edit/204793#resultSets
    – Richard
    Commented Jun 28, 2014 at 12:53
  • hmmm, there are some subtle differences....
    – rene
    Commented Jun 28, 2014 at 13:05
  • An upvote on a question is only 5 rep, on an answer it is 10. If I compensate the legendary for that, the results only differ for three days. Also, is the rep cap at 200 or 201? That would be an off-by-one bug.
    – rene
    Commented Jun 28, 2014 at 13:11
  • So how do I correct the rep cap error? Other searches seem to be able to generate an accurate count but I can't see what they're doing right that this is doing wrong
    – Richard
    Commented Jun 28, 2014 at 20:01
  • Well, on this the rep for votetypeid=8 is wrong, I don't know why you get +2 for votetypeid=16, and it doesn't count rep for suggested edits. Those are the things I take into account.
    – rene
    Commented Jun 28, 2014 at 20:06
  • So how do we fix it? I mean short of running both searches and then just dumping the output into Excel :-)
    – Richard
    Commented Jun 28, 2014 at 20:18
  • Maybe a wise thing to do is to ask a new question where you bring those different queries together and ask which is the best approximation. I'm happy to dig into that tomorrow...
    – rene
    Commented Jun 28, 2014 at 20:29
