
According to this answer from 2012, the SE Data Explorer data is updated every Sunday at 3am. If true, is it possible to schedule a cron job to run every Monday and execute this query?

When the query finishes it would need to download the results in CSV format somewhere.

More specifically - is there a simple way I can run a query against SEDE directly from the terminal?


Concerns

I imagine it's a little complicated getting cron authenticated by Stack Exchange Data Explorer. I've never run cron under my own user ID. It always runs as root, either every 15 minutes to execute updatedb or via scripts in /etc/cron.daily, /etc/cron.weekly and /etc/cron.monthly. SE always wants authentication when I run a Data Explorer query as a "normal" user. I have never used cron to sign into a site, run a query, wait for results, click a "download" button, and log out. It sounds horribly complicated to give cron an X11 session, a GUI desktop and a browser.
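For reference, I understand the scheduling half on its own would just be an entry in my own user's crontab (edited with `crontab -e`, no root needed) — something like the sketch below, where the script path is a placeholder I made up:

```shell
# Edit the current user's crontab (runs as that user, not root):
#   crontab -e
# Fields: minute hour day-of-month month day-of-week
# Run at 04:00 every Monday (day-of-week 1), assuming a hypothetical script:
0 4 * * 1 $HOME/bin/sede-query.sh >> $HOME/sede-query.log 2>&1
```

It's the authenticated query-and-download part that I can't picture.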

  • uh. Wrong question, I suspect. cron doesn't authenticate. cron does nothing but run a command... What you need to ask about is running a SEDE query from the CLI Commented Nov 27, 2021 at 0:56
  • @JourneymanGeek Correct. How do you run a query from CLI :) Commented Nov 27, 2021 at 1:01
  • That doesn't matter at all! Your Unix user and SE user have nothing to do with each other. It's the "how do I run a query against SEDE from the command line" that's the 'real' question. You'd literally give cron the same command you would run in the terminal Commented Nov 27, 2021 at 1:01
  • I don't know D: That's why this is a comment, not an answer. Commented Nov 27, 2021 at 1:02

2 Answers


Is there a simple way I can run a query against SEDE directly from the terminal?

No, there is not.

The Stack Exchange Data Explorer no longer has an API that can be consumed programmatically.

The supported way is to use or build a web scraper (with, for example, Beautiful Soup), or use browser automation (with, for example, Selenium) to authenticate, click the Run button, and then click the Download CSV button.
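The scraping half can be sketched with nothing but the standard library (Beautiful Soup makes this shorter, but the idea is the same): fetch the results page and pull out the CSV download link. The HTML fragment below is a made-up stand-in; the real SEDE markup may differ, so treat the selectors as assumptions:

```python
from html.parser import HTMLParser

class CsvLinkFinder(HTMLParser):
    """Remember the href of any <a> whose link text mentions CSV."""
    def __init__(self):
        super().__init__()
        self.href = None        # result: href of the CSV link, if found
        self._current = None    # href of the <a> we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = dict(attrs).get("href")

    def handle_data(self, data):
        if self._current and "csv" in data.lower():
            self.href = self._current

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None

# Hypothetical fragment of a SEDE results page (structure assumed):
html = '<div class="result-options"><a href="/stackoverflow/csv/1234567">Download CSV</a></div>'

finder = CsvLinkFinder()
finder.feed(html)
csv_url = "https://data.stackexchange.com" + finder.href
print(csv_url)  # → https://data.stackexchange.com/stackoverflow/csv/1234567
```

The hard part, as the answer notes, is the authenticated session needed before this page will render results at all — which is why Selenium is usually the more practical tool here.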

For some scenarios, the Stack Exchange API might be a better alternative, especially if you need unattended operation.
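Unlike SEDE, the API needs no login for read-only access and returns JSON rather than CSV. A minimal sketch of building a request against the real `/2.3/questions` endpoint (the parameter values here are just examples; the actual fetch is left commented out so the sketch stays offline):

```python
from urllib.parse import urlencode

# Ten most recently created questions on a site, via the Stack Exchange API.
# No authentication required for reads; an API key only raises the quota.
params = {
    "order": "desc",
    "sort": "creation",
    "site": "askubuntu",
    "pagesize": 10,
}
url = "https://api.stackexchange.com/2.3/questions?" + urlencode(params)
print(url)

# To actually fetch (returns gzip-compressed JSON):
#   import urllib.request
#   data = urllib.request.urlopen(url).read()
```

The trade-off is that the API exposes fixed endpoints rather than arbitrary SQL, so only some SEDE queries have an API equivalent.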

  • Thanks for your answer. I'm already using web scraping and Beautiful Soup to automatically get and save lyrics in my Python music player, so that would be a viable option. The Stack Exchange API looks similar to the Google Gmail API I'm already using to manage daily backups compressed into Gmail messages. Commented Nov 27, 2021 at 15:46

There is a way but it's not simple. I'm using a cronjob to archive some queries, e.g. this one with refresh timestamps for each site, in the Wayback Machine.

[Screenshot: SEDE query results listing refresh timestamps for each site]

The cronjob starts a Java program which simulates login to Stack Overflow via OAuth (to avoid the captcha), runs the query, and waits for the results. The results are received by the program as JSON (but discarded). Simulating a click on the download link should be possible, but I never tried. After that, the program adds the page to the Wayback Machine (in a similar way to what I do here); because the query has already run, it immediately shows the results, even though the Wayback Machine is not logged into SEDE. And to my surprise, the CSV download link even works in the snapshot!
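The archiving step is the simple part: the Wayback Machine's "Save Page Now" endpoint takes the target URL directly in its path. A sketch in Python (my actual code is Java); the query ID below is a placeholder, not my real query:

```python
# Hypothetical SEDE query permalink to archive (placeholder ID):
query_url = "https://data.stackexchange.com/stackoverflow/query/123456"

# The Wayback Machine's "Save Page Now" endpoint is web.archive.org/save/<url>:
save_url = "https://web.archive.org/save/" + query_url
print(save_url)

# Requesting that URL triggers the snapshot, e.g.:
#   import urllib.request
#   urllib.request.urlopen(save_url)
```

The snapshot only shows results if the query has already been run before archiving, which is why the login-and-run step has to come first.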

  • Thank you for sharing. The refresh timestamps help confirm that the data is refreshed every Sunday, so my extract could run every Monday? I just started learning HTML and CSS last month, so the Java program is above my current skill level. I don't use the Wayback Machine, but it is good to know the CSV download link works. Commented Nov 27, 2021 at 15:57
