Questions tagged [data-dump]

This tag is about the quarterly Creative Commons data dumps of all public data from the Stack Exchange network's Q&A sites.

-185 votes
19 answers
10k views
+150

Announcing a change to the data-dump process

Please note: We usually avoid posting on Fridays, but with the data dump scheduled for the end of July, we wanted to share this information with the community as soon as possible. We will monitor this ...
Philippe's user avatar
  • 21k
34 votes
10 answers
2k views

Why are people so unhappy about the OpenAI partnership? Am I missing something?

I'm reading the answers at Our Partnership with OpenAI, and a lot of people seem very unhappy about the partnership. I feel violated, cheated upon, betrayed, and exploited. And so on. But... why? ...
HolyBlackCat's user avatar
  • 5,208
150 votes
4 answers
27k views

Shifting the data dump schedule: A proposal

My colleague Rosie recently announced that, because of our difficulties predictably getting data dump files to the Internet Archive[1],[2], we are setting more realistic expectations for completing ...
Aaron Bertrand's user avatar
  • 42.8k
67 votes
0 answers
8k views

Data Dumps Releases: Timeline Updates and Clarification

We release our quarterly Data Dump towards the end of every quarter. We begin processing the Data Dump at the beginning of the last month of a quarter, but for the past three quarters, we’ve run into ...
Rosie's user avatar
  • 13.8k
6 votes
1 answer
145 views

Periodically publish a data dump of all IDs in the image ID space (and make it available in SEDE)

I see that Imgur has an API to query for image IDs in its ID space, but api.stack.imgur.com is not an API endpoint (and SE is moving off Imgur anyway). Say I want to get all the IDs of images in Stack ...
starball's user avatar
  • 26.8k
38 votes
1 answer
743 views

Delay in Data Dump release

Our quarterly publication of the Stack Exchange Network data (the “data dumps”) is currently delayed. While the process of copying and uploading data was scheduled to begin on Sunday, December 3rd, we’...
Rosie's user avatar
  • 13.8k
1 vote
0 answers
154 views

Why is the 'Favorite' count so low in the Stack Overflow data? [duplicate]

In the Stack Overflow database, there is a column called "favorite count". I have found that favorite count indicates users find the question interesting or save it for later review. I have ...
I192058 Misbah Minhas's user avatar
6 votes
0 answers
259 views

How can I get notified about released Stack Exchange dumps?

I am interested in being notified when a new data dump is published on the internet archive. I want to find some way to be automatically notified - ideally an RSS feed, but an email or something else ...
VLAZ's user avatar
  • 13k
71 votes
1 answer
1k views

Possible delay in Data Dump release - now targeting September 5th

As you probably know, it’s time for the quarterly publication of our data (affectionately known as the “data dumps”). The process of copying, uploading, and publishing the data over a notoriously ...
Jody Bailey's user avatar
  • 801
3 votes
1 answer
121 views

Discrepancy in Post Counts between Local MySQL Stack Overflow Database and Data Stack Exchange Query

I recently downloaded the Stack Overflow data from Stack Overflow datadump and imported it into my local MySQL database. While attempting to retrieve the total number of posts for the year 2021 using ...
I192058 Misbah Minhas's user avatar
63 votes
11 answers
4k views

The company's commitment to the data dumps, the API, and SEDE

Many words have been written around the company's commitment to the ongoing existence of the data dumps, the API, and the Stack Exchange Data Explorer (SEDE). Much of that text can be confusing or ...
Philippe's user avatar
  • 21k
3 votes
1 answer
523 views

How can I download a small data set (1 year 2023-2022) from Stack Exchange?

I want to download a small dataset for one year. I didn't find a link that allows me to download 1 year of data or the last five years' data. I tried "https://data.stackexchange.com/stackoverflow/...
I192058 Misbah Minhas's user avatar
-9 votes
1 answer
261 views

Does Stack Exchange Inc. intend to sue firms that use language models that are partly trained on Stack Exchange?

Stack Exchange Inc. has expressed an interest in financially profiting from some companies that use Stack Exchange data to train their model. Many language models are partly trained on Stack Exchange ...
Franck Dernoncourt's user avatar
4 votes
1 answer
170 views

Why are the profileimageurl in the data dump of almost all Stack Exchange websites blank?

I wonder why the profileimageurl in the data dump of almost all Stack Exchange websites are blank.
Ambitiouscoder's user avatar
2 votes
2 answers
196 views

Does Stack Exchange offer any incentives related to use of their data explorer in research?

I'm interested in whether or not there have historically been this sort of partnership with research in the academic space or if it is planned for the future. This could be something like a grant, ...
William Ledbetter's user avatar
