Zypher promised that data.SE would "soon" get live data:

I realize it's a long weekend, but given that data.SE is now over 2 months out of date, it would be nice if this was finally rolled out.

  • Last I heard, the data sets were to be updated once per quarter, so you probably have to wait another month. But live updates from a read-only (RO) copy sound promising! Commented Sep 2, 2012 at 9:14

1 Answer


The data is updated early every Sunday morning, around 3:00 UTC.
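
For anyone who wants to check when the next refresh is due, here is a minimal sketch of the date arithmetic. It assumes the refresh lands at exactly 03:00 UTC on Sunday (the real time is only approximate), and `next_refresh` is a hypothetical helper written for illustration, not part of Data.SE or its import process.

```python
from datetime import datetime, timedelta, timezone


def next_refresh(now: datetime) -> datetime:
    """Return the next Sunday 03:00 UTC strictly after `now`.

    `now` must be timezone-aware. The 03:00 UTC slot is an assumption
    taken from the "around 3:00 UTC" schedule stated above.
    """
    now = now.astimezone(timezone.utc)
    # datetime.weekday() counts Monday as 0, so Sunday is 6.
    days_ahead = (6 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=3, minute=0, second=0, microsecond=0
    )
    # If this week's slot has already passed (or is right now), use next week's.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_refresh(datetime.now(timezone.utc)).isoformat())
```

Since the actual import takes time to run and the site is briefly down while data reloads, treat the result as an estimate of when the refresh starts, not when fresh data is queryable.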

Update 9 Nov 2012 - Done!

Since our datacenter move has been delayed, I was able to use the new server as soon as it arrived. Data.SE now has fresh data and the import is almost fully automated. I'll finish the automation on Monday and we'll have a weekly refresh of data going on. On Monday we'll also get the graduated sites that are missing in there - we haven't forgotten about them.

There may be issues with this new build that we're unaware of - it's a whole new setup. Please comment here if you notice any issues.


Update: we're just waiting on the new server to arrive (ETA 2012-11-09) in Oregon so I can get Data.SE on an auto-update. We should be able to get it up about a week after it comes in (it arrives the same weekend as our New York datacenter move, so it will likely have to sit a few days).


To clarify/set expectations: for data.SE we're talking about a weekly refresh (that's my goal, since it has to be down for a moment when we reload fresh data and it takes time to run). The SEDE George (Zypher) was talking about in chat was actually our internal copy of SEDE (which is just raw data, not transformed or anonymized - much easier to do "live").

We call that "LocalSEDE" internally and just "SEDE" for the public data.SE. So while it will be much fresher, it won't be instant. To get data there we have to transform and move it, and we can't logistically do that continuously, at least not in the foreseeable future.

Rebecca is the one taking the lead on this, as I'm working on the infrastructure side, but we'll update here in case we run into anything that makes weekly an issue.


Also to clarify, for data.SE we're talking about a weekly refresh or so (that's my goal, since it has to be down for a moment when we reload fresh data).

We have uncovered a critical bug in our network setup paired with SQL Server 2012 clustering... We'll get to the data refresh when Stack Overflow and chat are no longer in danger of going offline. We're working with Microsoft on the issue now.

Data Explorer lives in Oregon, so a faster data refresh is predicated on getting a replica up and running out there for all databases. That replica currently exists for Stack Overflow but not for any other site. To get a data refresh going we need to:

  1. Get the bug fixed and deploy that fix across our network
  2. Get all other sites moved to the second SQL Server 2012 cluster
  3. Move chat to SQL Server 2012
  4. Re-task the chat database server as a 2012 instance that SEDE will run on
  5. Move SEDE to the new server
  6. Re-write the data import process for SEDE to be totally automated and pull from the replicas

We plan on doing all of this, but our primary concern is getting our architecture fixed and moved to support this (for other, more important reasons like moving data centers really). Once that's done then we'll get to SEDE data refresh. Since we are shuffling hardware here and SEDE is dependent on data, it would be a large amount of throw-away effort to get anything working before the above is complete.

The result of this will be no more manual imports, and (hopefully) something like a weekly data refresh to SEDE rather than the somewhat random intervals it's updated at currently. I apologize that things aren't going as planned with our timeline; trust me when I say we are far more frustrated with this than you are. Bear with us, the result will be worth it: SEDE is getting fresher, more reliable data refreshes and a beefier database server to run on.

  • Since it sounds like you're rewriting the data import process, would you be able to include deleted questions in the import? Statistics are always skewed because deleted questions are not included in Data.SE.
    – Rachel
    Commented Sep 6, 2012 at 19:26
  • 3
    @Rachel - They are intentionally excluded (not a bug)...I don't think there are any plans to change this. Commented Sep 6, 2012 at 23:41
  • Out of interest, why do you use MSSQL, not something like MySQL?
    – uınbɐɥs
    Commented Sep 7, 2012 at 1:37
  • @NickCraver - not to be a nag, but it would be greatly appreciated if there was an update to this? Not sure about SO DE, but SFF one still has the posts from the latest date of June 27 2012. I know SE was doing some datacenter work last week, was that related to your bullets?
    – DVK
    Commented Oct 15, 2012 at 1:14
  • @DVK we determined the chat server wasn't a good fit (old, no remote management). We'll be ordering a new one for SEDE, hopefully this week, then I'll be able to move things over to that box. The datacenter work is tangentially related...that's around our moving the NY datacenter, which is a huge company-wide effort that's eaten most of my time the last 2 months. I haven't forgotten about SEDE, it just has lots of prerequisites (which are mostly done) and needs some time, which we haven't had any to spare. Commented Oct 15, 2012 at 1:55
  • 1
    @NickCraver - 6 to 8 weeks, then :))) Good luck with the move - I know how fun that can be. Appreciate the update!
    – DVK
    Commented Oct 15, 2012 at 1:57
  • @DVK - Update: Server is now ordered, I'll get to work on SEDE as soon as it's up and running. Commented Oct 15, 2012 at 20:30
  • 4
    So the data has been borked for 4 months and there isn't so much as a banner warning about the extremely outdated data? Any idea of a timeline to have it fixed (either updated the "old" way, or the new system)?
    – Chris S
    Commented Oct 19, 2012 at 16:10
  • @DVK Nag him! and also Rebecca Chernoff... they both need much more nagging! I can't do it all myself! Commented Oct 19, 2012 at 17:26
  • @ChrisS The date is published beside every site, so that at least is pretty visible.
    – Adam Lear StaffMod
    Commented Oct 21, 2012 at 15:19
  • 3
    @AnnaLear Although people seem to overlook those dates somehow...I guess the resultsets could indicate somewhere that they're showing "Results as of..." or something, if that'd help any.
    – Tim Stone
    Commented Oct 23, 2012 at 13:07
  • 1
    And a chorus of Angels was heard! Commented Nov 10, 2012 at 5:25
  • Typical... after waiting 4 1/2 months, the query I wanted to run doesn't seem to work. :( Commented Nov 10, 2012 at 6:03
  • You're just clearing the results cache and updating the front page statistics as part of the load process now, I take it?
    – Tim Stone
    Commented Nov 10, 2012 at 10:20
  • 2
    Cheers Nick, the faq should be updated as well, right now it says "Data is updated monthly". :) Commented Dec 25, 2012 at 15:12
