13

The Creative Commons data dump seems to be released irregularly at the moment. The last release is from June 2011.

I would welcome a more predictable release schedule for the data dump, e.g. every second or third month. It has now been 3.5 months since the last data dump, whereas before that there were releases almost every month.

2 Answers

8

It's already been released -- check

I'm losing interest in blogging these each and every time; the archive.org area will be updated as they are released.

7

For a long time we have aimed for the end of a quarter for each data dump. Circumstances have occasionally knocked us off course: either other distractions, priorities, and deadlines or, in some cases, actual problems with the data dump itself (recent example).

To deal with these issues, we have done a lot of work over the past year improving and automating the process:

  • we have built more automation and robustness around the process itself, so it takes much less manual effort and hand-holding of fragile scripts
  • we have added notifications and reminders in multiple channels: a week before the work, so we can prepare and plan, and afterward, so we can validate that the automation did everything we expected it to do
  • we have shifted our goal for releasing data dumps to the first week of each traditional calendar quarter
  • we worked with folks at archive.org to help bypass some of the disruptive checks that were forcing manual intervention and sometimes preventing files from being uploaded
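
To make the timing in the list above concrete, here is a minimal sketch of how the target dates could be computed. It is only an illustration, not the actual automation: the function names `next_quarter_start` and `dump_schedule` are hypothetical, and it assumes that "traditional calendar quarter" means quarters starting on 1 January, 1 April, 1 July, and 1 October.

```python
from datetime import date, timedelta


def next_quarter_start(today: date) -> date:
    """Return the first day of the calendar quarter that starts after `today`."""
    # Assumption: quarters start in January, April, July, and October.
    next_start_month = ((today.month - 1) // 3 + 1) * 3 + 1
    if next_start_month > 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, next_start_month, 1)


def dump_schedule(today: date) -> dict:
    """Hypothetical helper: reminder date and target release window."""
    start = next_quarter_start(today)
    return {
        # Reminder one week before the work, to prepare and plan.
        "reminder": start - timedelta(days=7),
        # Target window: the first week of the quarter.
        "window_start": start,
        "window_end": start + timedelta(days=6),
    }
```

For example, `dump_schedule(date(2024, 5, 20))` gives a reminder on 24 June 2024 and a target window of 1 to 7 July 2024.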

We are already preparing to kick off the next data dump, which should start over the weekend or early next week.

We still have other cases where we might need to get involved or where specific sites might just be skipped; for example, in cases where malware scanners still prevent the upload of specific files.
