
According to this answer from 2012, the SE Data Explorer data is updated every Sunday at 3am. If true, is it possible to schedule a cron job to run every Monday and execute this query?

When the query finishes it would need to download the results in CSV format somewhere.

More specifically - is there a simple way I can run a query against SEDE directly from the terminal?


Concerns

I imagine it's a little complicated getting cron authenticated by Stack Exchange Data Explorer. I've never run cron under my own user ID. It always runs as root, either every 15 minutes to execute updatedb or via scripts in /etc/cron.daily, /etc/cron.weekly and /etc/cron.monthly. SE always wants authentication when I run a Data Explorer query as a "normal" user. I have never used cron to sign into a site, run a query, wait for results, click a "download" button, and log out. It sounds horribly complicated to give cron an X11 session, a GUI desktop and a browser.
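For reference, I understand the scheduling half on its own would just be an entry in my own user's crontab (edited with `crontab -e`, no root needed) — something like the sketch below, where the script path is a placeholder I made up:

```shell
# Edit the current user's crontab (runs as that user, not root):
#   crontab -e
# Fields: minute hour day-of-month month day-of-week
# Run at 04:00 every Monday (day-of-week 1), assuming a hypothetical script:
0 4 * * 1 $HOME/bin/sede-query.sh >> $HOME/sede-query.log 2>&1
```

It's the authenticated query-and-download part that I can't picture.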

  • uh. Wrong question, I suspect. cron doesn't authenticate. cron does nothing but run a command... What you need to ask about is running a SEDE query from the CLI Commented Nov 27, 2021 at 0:56
  • @JourneymanGeek Correct. How do you run a query from CLI :) Commented Nov 27, 2021 at 1:01
  • That doesn't matter at all! Your Unix user and SE user have nothing to do with each other. It's the "how do I run a query against SEDE from the command line" that's the 'real' question. You'd literally give cron the same command you would run in the terminal Commented Nov 27, 2021 at 1:01
  • I don't know D: That's why this is a comment, not an answer. Commented Nov 27, 2021 at 1:02

2 Answers


Is there a simple way I can run a query against SEDE directly from the terminal?

No, there is not.

The Stack Exchange Data Explorer no longer has an API that can be consumed programmatically.

The supported way is to use or build a web scraper (with, for example, Beautiful Soup), or use browser automation (with, for example, Selenium) to authenticate, click the Run button, and then click the Download CSV button.
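The scraping half can be sketched with nothing but the standard library (Beautiful Soup makes this shorter, but the idea is the same): fetch the results page and pull out the CSV download link. The HTML fragment below is a made-up stand-in; the real SEDE markup may differ, so treat the selectors as assumptions:

```python
from html.parser import HTMLParser

class CsvLinkFinder(HTMLParser):
    """Remember the href of any <a> whose link text mentions CSV."""
    def __init__(self):
        super().__init__()
        self.href = None        # result: href of the CSV link, if found
        self._current = None    # href of the <a> we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = dict(attrs).get("href")

    def handle_data(self, data):
        if self._current and "csv" in data.lower():
            self.href = self._current

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None

# Hypothetical fragment of a SEDE results page (structure assumed):
html = '<div class="result-options"><a href="/stackoverflow/csv/1234567">Download CSV</a></div>'

finder = CsvLinkFinder()
finder.feed(html)
csv_url = "https://data.stackexchange.com" + finder.href
print(csv_url)  # → https://data.stackexchange.com/stackoverflow/csv/1234567
```

The hard part, as the answer notes, is the authenticated session needed before this page will render results at all — which is why Selenium is usually the more practical tool here.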

For some scenarios, the Stack Exchange API might be a better alternative, especially if you need unattended operation.
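Unlike SEDE, the API needs no login for read-only access and returns JSON rather than CSV. A minimal sketch of building a request against the real `/2.3/questions` endpoint (the parameter values here are just examples; the actual fetch is left commented out so the sketch stays offline):

```python
from urllib.parse import urlencode

# Ten most recently created questions on a site, via the Stack Exchange API.
# No authentication required for reads; an API key only raises the quota.
params = {
    "order": "desc",
    "sort": "creation",
    "site": "askubuntu",
    "pagesize": 10,
}
url = "https://api.stackexchange.com/2.3/questions?" + urlencode(params)
print(url)

# To actually fetch (returns gzip-compressed JSON):
#   import urllib.request
#   data = urllib.request.urlopen(url).read()
```

The trade-off is that the API exposes fixed endpoints rather than arbitrary SQL, so only some SEDE queries have an API equivalent.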

  • Thanks for your answer. I'm already using web scraping and Beautiful Soup to automatically get and save lyrics in my Python music player, so that would be a viable option. The Stack Exchange API looks similar to the Google Gmail API I'm already using to manage daily backups compressed into Gmail messages. Commented Nov 27, 2021 at 15:46

There is a way but it's not simple. I'm using a cronjob to archive some queries, e.g. this one with refresh timestamps for each site, in the Wayback Machine.

[Screenshot: SEDE query results listing refresh timestamps for each site]

The cronjob starts a Java program which simulates login to Stack Overflow via OAuth (to avoid the captcha), runs the query, and waits for the results. The results are received by the program as JSON (but discarded). Simulating a click on the download link should be possible, but I never tried. After that, the program adds the page to the Wayback Machine (in a similar way to what I do here); because the query has already run, it immediately shows the results, even though the Wayback Machine is not logged into SEDE. And to my surprise, the CSV download link even works in the snapshot!
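The archiving step is the simple part: the Wayback Machine's "Save Page Now" endpoint takes the target URL directly in its path. A sketch in Python (my actual code is Java); the query ID below is a placeholder, not my real query:

```python
# Hypothetical SEDE query permalink to archive (placeholder ID):
query_url = "https://data.stackexchange.com/stackoverflow/query/123456"

# The Wayback Machine's "Save Page Now" endpoint is web.archive.org/save/<url>:
save_url = "https://web.archive.org/save/" + query_url
print(save_url)

# Requesting that URL triggers the snapshot, e.g.:
#   import urllib.request
#   urllib.request.urlopen(save_url)
```

The snapshot only shows results if the query has already been run before archiving, which is why the login-and-run step has to come first.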

  • Thank you for sharing. The refresh timestamps help confirm that the data is refreshed every Sunday, so my extract could run every Monday? I just started learning HTML and CSS last month, so the Java program is above my current skill level. I don't use the Wayback Machine, but it is good to know the CSV download link works. Commented Nov 27, 2021 at 15:57
