
I am trying to query SEDE to count the upvotes that a question or answer receives within 90 days of its creation. I would also like to count the downvotes in their own column.

I have tried this, but it always times out once I add the AND DATEADD condition:

SELECT TOP 10
    p.Id,
    p.OwnerUserId,
    p.CreationDate,
    p.PostTypeId,
    p.Score,
    p.CommentCount,
    p.Body,
    v.PostId AS [Post Link],
    COUNT(v.PostId) AS 'UpVote count'
FROM Votes v
INNER JOIN Posts p ON p.Id = v.PostId
WHERE (PostTypeId = 1 OR PostTypeId = 2)
  AND VoteTypeId IN (2)
  AND v.CreationDate = DATEADD(day, 90, CONVERT(DATE, p.CreationDate))
GROUP BY v.PostId, p.Body, p.Id, p.OwnerUserId, p.CreationDate,
    p.PostTypeId, p.Score, p.CommentCount, p.Body
  • Group by … body? Commented Nov 1, 2023 at 23:50
  • Also, you have a TOP 10 but no ORDER BY. Which TOP 10 do you want? Commented Nov 2, 2023 at 0:55

1 Answer

Okay, thank you for the support!

I took out the body and added the ORDER BY, which works.

However, I would also like the Body to appear in the output. If I put Body only in the SELECT list, I get an error saying that it is not in the GROUP BY.

Also, I only used TOP 10 for testing. I want to run the query without TOP 10 (so up to SEDE's 50k-row limit), but that still times out. Any ideas on how to make this more efficient and put less load on the server?

SELECT TOP 10
    p.Id,
    p.OwnerUserId,
    p.CreationDate,
    p.PostTypeId,
    p.Score,
    p.CommentCount,
    v.PostId AS [Post Link],
    COUNT(v.PostId) AS 'UpVote count'
FROM Votes v
INNER JOIN Posts p ON p.Id = v.PostId
WHERE (PostTypeId = 1 OR PostTypeId = 2)
  AND VoteTypeId IN (2)
  AND v.CreationDate = DATEADD(day, 90, CONVERT(DATE, p.CreationDate))
GROUP BY v.PostId, p.Id, p.OwnerUserId, p.CreationDate,
    p.PostTypeId, p.Score, p.CommentCount
ORDER BY p.Score DESC
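
For reference, here is a minimal sketch of one possible shape for this query. It is not a tested SEDE query and may still hit the timeout on a large site. It assumes the standard SEDE Posts and Votes tables and that VoteTypeId 2 is an upvote and 3 is a downvote. Two things differ from the query above. First, the date condition is a range (<=) rather than an equality, because = only matches votes dated exactly 90 days after the post's creation date. Second, the votes are counted in a correlated subquery, so Body can be selected without appearing in a GROUP BY. The column aliases UpVotes90 and DownVotes90 are made up for illustration.

```sql
-- Sketch only: assumes the SEDE schema (Posts, Votes) and that
-- VoteTypeId 2 = upvote and 3 = downvote.
SELECT TOP 10
    p.Id AS [Post Link],
    p.OwnerUserId,
    p.CreationDate,
    p.PostTypeId,
    p.Score,
    p.CommentCount,
    p.Body,
    vc.UpVotes90,      -- hypothetical alias: upvotes in the first 90 days
    vc.DownVotes90     -- hypothetical alias: downvotes in the first 90 days
FROM Posts p
CROSS APPLY (
    -- One row per post; COUNT over a CASE ignores NULLs, so each
    -- column counts only the matching vote type (0 if there are none).
    SELECT
        COUNT(CASE WHEN v.VoteTypeId = 2 THEN 1 END) AS UpVotes90,
        COUNT(CASE WHEN v.VoteTypeId = 3 THEN 1 END) AS DownVotes90
    FROM Votes v
    WHERE v.PostId = p.Id
      AND v.VoteTypeId IN (2, 3)
      -- Range predicate: every vote up to and including day 90.
      AND v.CreationDate <= DATEADD(day, 90, CONVERT(DATE, p.CreationDate))
) vc
WHERE p.PostTypeId IN (1, 2)   -- questions and answers
ORDER BY p.Score DESC;
```

Because the aggregation happens inside the CROSS APPLY, the outer query needs no GROUP BY at all. The ORDER BY p.Score DESC over all posts is likely to remain the expensive part.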
  • If you want SEDE to return all posts, even without including the body, I think you’re in for a tough time. Maybe that would work on a small site but for Stack Overflow that’s millions of rows. Even just the sort would not be able to run within the timeout, I’d expect. Commented Nov 2, 2023 at 15:38
  • If you want to do this kind of heavy analysis I recommend downloading the data dump and restoring the data on your own system, where you won’t be constrained by application parameters necessary for a scalable shared solution. Commented Nov 2, 2023 at 15:41
  • Like Stuck said, getting all post rows from SO in a query like this isn't feasible. SEDE itself limits results to 50,000 rows anyway, which means you won't get anything more than that regardless. If you really want everything, you'll need to download the database from the data dump and host it on something you control, which won't have those limits.
    – zcoop98
    Commented Nov 2, 2023 at 15:48
  • Thank you, and sorry, I wasn’t clear enough. I meant that I want to make this query efficient enough for SEDE to return the results, limited to 50k rows. Any ideas on how to make the query more efficient?
    – Dennisdd
    Commented Nov 2, 2023 at 22:06
  • Returning 50,000 posts (including bodies!) to a web UI is just not going to work. Think about just how much data you're asking to be transmitted to a web page and then rendered to a grid in your browser. Then what are you going to do with 50,000 rows of data? What is the actual goal? Even SELECT TOP (50000) Id, Score FROM Posts ORDER BY Score DESC; takes over a minute. Step back and re-evaluate what you're actually trying to accomplish here, and consider that it would be so much simpler to download the data dump and work on these queries against your own system. Commented Nov 3, 2023 at 0:17
