63

On May 15th 2023, Imgur will implement its new TOS, in which

We will be focused on removing old, unused, and inactive content that is not tied to a user account.

I, and presumably many other users, have uploaded things to Imgur without logging in, and then posted those images to Stack Exchange, meaning many images are in danger of link rot. In addition, the TOS also bans NSFW content, which while not directly a concern for most Stack Exchange sites, the enforcement of this TOS would mostly rely on automation, which could falsely flag thousands of images.

Can we re-upload those to the dedicated Stack Exchange account on imgur, and replace all the links?

6
  • 9
    Does this answer your question? Is the change of policy of Imgur going to create problems for all posts with images on SO? (from MSO)
    – cocomac
    Commented Apr 21, 2023 at 2:32
  • 5
    @cocomac One can still link images from imgur directly, without using the Stackexchange image system.
    – qazwsx
    Commented Apr 21, 2023 at 3:12
  • 2
    Then that's no different from hosting images on other random image hosting sites with aggressive, radical, or unknown retention policies. Basically, your own responsibility if you ignore or subvert the mechanism provided for embedding images on the site itself, and probably just another reason to delete content which is not suitable for the long term.
    – tripleee
    Commented Apr 21, 2023 at 7:30
  • 5
    @tripleee - Since Imgur hosts Stack Imgur images, it's not unreasonable to assume that regular Imgur posts would be stable as well. Also, Stack Imgur posts are (were?) limited to 2MB, while the regular Imgur upload limit is larger. Cases where someone failed to upload to Stack Imgur because of the size and moved to regular Imgur definitely aren't unheard of.
    – Mithical
    Commented Apr 21, 2023 at 7:54
  • 7
    Thank you for accepting my answer, but I feel that's a bit premature. The problem hasn't been solved yet, I'm sure we (the community and/or the company) will solve it, but we're not there yet :)
    – Glorfindel Mod
    Commented Apr 28, 2023 at 20:00
  • 1

6 Answers 6

16

As Spencer indicated, we have updated all of the URLs that we could, across the network, to the new i.stack.imgur.com.

A little more background on what I did; hopefully, this helps explain why it took nearly a week:

  • Spencer approached me and asked if I could help with this issue. I said, "Yes, of course," of course.

  • I downloaded and started looking at the output file (thanks @Glorfindel!).

  • I manually mapped all of the API site parameters listed in the output file to database names (a few were trial and error, like rus/russian/video did not point to the database names I expected).

  • Replaced all of the ## {API site parameter} with USE {database name};.

  • Eliminated all instances of http: and https: in the output file (since I found many cases where the file indicated one but the post contained the other). Removing the protocol prefix prevented me from having to replace both, but would lead to other work.

  • Built two SQL statements from each row in the output file under ### Successful replacements:

    | 15 | https://i.imgur.com/DvpvklR.png 
         | https://i.sstatic.net/jEIKP.jpg 
         | 36199898,40503494,41611512,45018676,45635398,45712604,
           45880438,48328568,49699797,52905718,53318239,55318799,
           55366144,60723134,61174546 |
    

    ...became a much more elaborate...

    /* 15 */ UPDATE Posts SET Body = REPLACE(Body, 
                 N'//i.imgur.com/DvpvklR.png',
                 N'//i.sstatic.net/jEIKP.jpg')
               WHERE Id IN (
        36199898,40503494,41611512,45018676,45635398,45712604,
        45880438,48328568,49699797,52905718,53318239,55318799,
        55366144,60723134,61174546
      );   
    
    UPDATE PostHistory SET Text = REPLACE(Text, 
                 N'//i.imgur.com/DvpvklR.png',
                 N'//i.sstatic.net/jEIKP.jpg')
               WHERE PostId IN (
        36199898,40503494,41611512,45018676,45635398,45712604,
        45880438,48328568,49699797,52905718,53318239,55318799,
        55366144,60723134,61174546
      );        
    
  • We had to update both tables because, if we don't also update the history, anyone who edits the post would get the most recent version from history (or, if they revert to an older version), and that would contain the old URL.

I ran those statements against a handful of sites on Thursday. Initially, I just did the following sites in a very controlled way:

  • StackOverflow
  • ServerFault
  • SuperUser
  • Meta.StackExchange
  • WebApps
  • WebApps Meta
  • Gaming

Then, I updated all instances of https://i.sstatic.net/ to https://i.sstatic.net/ on those sites. This is not something to do lightly because, on Stack Overflow for example, trying to update thousands of posts/history will block all of you. It would also potentially cause issues downstream (we use availability groups and don't want to jam too much into the pipe at once). So I had to spread this out in batches, like this, and it worked like a charm:

EXEC {databasename}.sys.sp_executesql N'

  SET NOCOUNT ON;
  DROP TABLE IF EXISTS #p;

  SELECT Id = Id + 0 INTO #p FROM Posts 
    WHERE Body LIKE @src

  INSERT INTO #p(Id) SELECT PostId FROM PostHistory 
    WHERE Text LIKE @src;

  DECLARE @Id int, @c cursor;

  SET @c = CURSOR FOR SELECT DISTINCT Id FROM #p;
  OPEN @c; FETCH NEXT FROM @c INTO @Id;

  WHILE @@FETCH_STATUS <> -1
  BEGIN
     UPDATE Posts SET Body = REPLACE(Body, @src, @dest)
     WHERE Id = @Id AND Body LIKE LIKE N''%'' + @src + N''%'';

     UPDATE PostHistory SET Text = REPLACE(Text, @src, @dest)
     WHERE PostId = @Id AND Text LIKE N''%'' + @src + N''%'';

    FETCH NEXT FROM @c INTO @Id;
  END',
  N'@src nvarchar(255), @dest nvarchar(255)',
    @src  = N'https://i.sstatic.net/',
    @dest = N'https://i.sstatic.net/';

This took a long time, but it was completely non-disruptive from every measure I could take, so it gave me the confidence to run the remainder of the script against the other sites with a lot less hand-holding (and using sp_ineachdb instead of {specific db}.sys.sp_executesql). This finished over the weekend.

All told, across the network, we updated:

  • Over 200,000 posts affected by the explicit URLs here.
  • Over 1,500,000 URLs with http:// (most of those not from this scope).

There are still:

These are probably a mix of failures/uncertainties from the original output file, and possibly new images that were uploaded since the original script was run. If @Glorfindel can run their script again, we can probably continue to chip away at this.

Keep in mind that some can't ever be fixed, at least automatically, because:

  • they have been removed from imgur and the wayback machine (and in some cases, the user no longer exists or hasn't been seen in years; example)
  • they link to missing galleries, not individual images (example)
  • they link to videos or very large animated gifs (example) - some of these may have failed on first run of the script, but may work next time; others will continue to exceed size limits
  • they link to the imgur website on purpose (example)
49

Of course, things like this always happen during my holiday ... not that I take that many, but still ... Anyway, here's what I (read: my script) did: analyze almost 80k of posts containing links to i.imgur.com and almost 35k of posts containing links to imgur.com and try to upload these to the corporate repository.

Here are the results - note that the script doesn't know about the official site names like 'Arqade' and uses the API site parameter (gaming) instead. Some statistics:

  • 155k images have been uploaded
  • Some images (about 2k) could not be uploaded because they were over 2 MiB, or because of the MIME type (the uploader only supports PNG, JPG and GIF) and they have been archived to the Wayback Machine
  • Sometimes, images have already been removed, like http://i.imgur.com/uqJeW.png. The script tries to fetch those from the Wayback Machine and upload them, but the Wayback Machine snapshot may not be reliable. This question refers to an image URL that does not seem to match the Wayback Machine copy, right?

The results have been grouped per site, and the last column contains the IDs of the posts which have that image. Because there are so many different links (images are likely to be used in a single post), the mass replacement tool is not an option here. Perhaps Aaron Bertrand can help out like he did here; otherwise, we shouldn't bump too many posts only because they might be broken in the future.

13
  • 1
    Speaking of your update, back when Imgur used to expire images before 2015, the image URL would be freed, and sometimes you'd find that a new image would end up by coincidence taking the same URL as a previous expired image. Commented Apr 29, 2023 at 19:24
  • Oh yeah, that's a common problem with image hosters ...
    – Glorfindel Mod
    Commented Apr 29, 2023 at 19:34
  • 3
    I'm looking to manually edit the failed reuploads in Arqade to meet SE Imgur’s requirements. In the results tables, the “failures” do not have their post IDs listed, unlike the “successes”. Is this intentional? Commented May 1, 2023 at 8:40
  • 1
    @galacticninja more or less - the script (part of a larger collection trying to repair broken things) assumes that if it can't do anything, it doesn't matter what the post ID is. You can still search for the image URL - that's faster than waiting for a rerun of my script :) Oh, and thanks for helping out - there's quite a few large screenshots on Arqade to fix!
    – Glorfindel Mod
    Commented May 1, 2023 at 8:49
  • 2
    Thank you very much! I've a couple of questions: (1) Are you planning to do the same for the https://imgur.com URLs? (2) Do you know if there's there any way to find out which images on imgur are "not tied to a user account"? Commented May 3, 2023 at 10:05
  • 3
    @Scortchi-ReinstateMonica I didn't know about (1), so thanks for the pointer. I have no idea for (2)...
    – Glorfindel Mod
    Commented May 3, 2023 at 10:15
  • 2
    @galacticninja the updated results file has post IDs for failures as well.
    – Glorfindel Mod
    Commented May 11, 2023 at 18:23
  • 1
    @Scortchi-ReinstateMonica the script has now run for imgur.com links as well, and I've updated the results file.
    – Glorfindel Mod
    Commented May 11, 2023 at 18:24
  • @Glorfindel: Thank you so much Commented May 11, 2023 at 18:25
  • @Glorfindel Do you know whether there are (m)any such links to imgur.com or i.imgur.com in comments, too? Commented May 14, 2023 at 12:07
  • 1
    @TheAmplitwist yes but the API doesn't let you search for comment text.
    – Glorfindel Mod
    Commented May 14, 2023 at 12:15
  • Hi, has it already been decided? Commented May 28, 2023 at 8:54
  • 7
    This is all done - updated 200K+ posts referenced under the "successful replacements" except for one that didn't work (search for 0pFsYTj.jpg). Updated 1MM+ posts from http://i.stack... -> https://i.stack.... I did not deal with any of the uncertainties/failures. Can you run the script again? You should see a much smaller output but possibly some URLs that have been posted in the meantime (this work took the better part of a week).,
    – Aaron Bertrand Staff
    Commented Jul 17, 2023 at 1:27
15

This is an index to announcement posts on per-site metas (sorted by site name alphabetically).

Site Link
Android Enthusiasts link
Anime & Manga link
Arqade link
Ask Different link
Ask Ubuntu link
Code Golf and Coding Challenges link
Cross Validated link
English Language & Usage link
English Language Learners link
Mathematics link
MathOverflow link
Movies & TV link
Pets link
Science Fiction & Fantasy link
Stack Overflow link
Super User link
Unix & Linux link
14

To quote Cerbrus' answer to a similar question on MSO:

SE runs a corporate version of Imgur, so image persistence is guaranteed.

I also see no valid reason for nudity on SO, so that seems extremely unlikely to have any (negative) effect.

That corporate Imgur being referred to is https://stack.imgur.com.

That last point in the above quote should actually apply in general to all sites in the Stack Exchange network: See the Terms of Service's Acceptable Use Policy section, which states:

Sexually Explicit Material. Accounts that use Stack Exchange to post sexually explicit or pornographic material, or links to it, will be suspended.

But nudity does not necessarily equate to being sexually explicit. Whether nudity is appropriate will vary with context. Ex. it may be necessary (and non-sexually-explicit) on some Biology-related posts.


You do have some point about linked images that don't use SE's corporate Imgur. And yeah... that'll suck (just the same as it has when other image hosting sites that people use go down).

Quoting @Mithical:

Keep in mind that regular Imgur was used before the Stack Imgur agreement was made, and has also been historically used for large images that exceeded the 2MB limit on Stack Imgur. There are thousands of these images across the network; this does indeed effect us not-insignificantly.


As for how many of these images exist, you could search for it using stackexchange.com's search bar, which searches across all network sites: See the following queries: url:https://imgur.com (currently 27,993 results), url:https://i.imgur.com (currently 68,628 results). and url:imgur.io (currently 6 results). To be clear, both those domains are for "regular" (non-Stack-Exchange-corporate) Imgur.

8
  • 11
    Keep in mind that regular Imgur was used before the Stack Imgur agreement was made, and has also been historically used for large images that exceeded the 2MB limit on Stack Imgur. There are thousands of these images across the network; this does indeed effect us not-insignificantly.
    – Mithical
    Commented Apr 21, 2023 at 9:20
  • 6
    Would this be prime opportunity for someone to write something that would import all these to i.stack & shrink to under 2MB [not me, btw, I can't code "Hello World" without a tutorial.] Re imgur licensing - webapps.stackexchange.com/questions/11100/imgur-image-license
    – Tetsujin
    Commented Apr 21, 2023 at 9:48
  • 10
    @Tetsujin - I'm hopeful that Glorfindel's Broken Image Repairer can be directed towards this before everything breaks, but we'll see what he says.
    – Mithical
    Commented Apr 21, 2023 at 9:52
  • the one weird thing here is that that answer seems to conflate nudity with sexually explicit. They don't need to have any correlation at all. If imgur remove images containing nudity then there will be some sites (SO probably not so much) that could have issues
    – Rory Alsop
    Commented Apr 21, 2023 at 11:31
  • 4
    Some of these seem to be videos - and pretty large. I do wonder what's the right solution for these. Commented Apr 21, 2023 at 13:14
  • I also suspect that depending on how Imgur goes about things, if they have some sort of automated filter that's going to trawl through and delete content, that there will be false positives (and depending on the quality, a lot of them). If it won't be run on anything specifically hosted for StackExchange, that's fine, but I've seen things that filter for "explicit material" also trigger on photos of capybaras, for example, which would worry me if such a filter was applied to StackExchange content. Commented Apr 21, 2023 at 15:57
  • 3
    @Mithical are we allowed to re-upload images licence-wise to the Stack instance? Or can only the OP do that?
    – rene
    Commented Apr 25, 2023 at 10:49
  • 2
    @rene - IANAL, but Imgur has a "perpetual, irrevocable worldwide license" to distribute user-submitted content, and it's still Imgur distributing it. Also, it says you can use it for "personal, non-commercial uses" and anything that falls under "fair use"; educational SE content should qualify for one or both of those.
    – Mithical
    Commented Apr 25, 2023 at 10:58
12

We have completed the work on updating these links. Major props to Glorfindel for doing the legwork that they did and saving us some time on working through them.

4

As mentioned in starball's answer, images hosted on i.stack.imgur.com should be fine. If someone is interested in checking and fixing their own images, the following SEDE query returns links to all images (even when the post contains multiple images):

WITH CTE AS (
  SELECT Id,
    SUBSTRING(Body, CHARINDEX('<img src="', Body) + 10, CHARINDEX('"', Body, CHARINDEX('<img src="', Body) + 10) - CHARINDEX('<img src="', Body) - 10) AS Img,
    SUBSTRING(Body, CHARINDEX('"', Body, CHARINDEX('<img src="', Body) + 10), LEN(Body)) AS Rest
  FROM   Posts
  WHERE  Body LIKE '%<img src%'
         AND OwnerUserId = ##UserId##
  UNION ALL
  SELECT Id,
    SUBSTRING(Rest, CHARINDEX('<img src="', Rest) + 10, CHARINDEX('"', Rest, CHARINDEX('<img src="', Rest) + 10) - CHARINDEX('<img src="', Rest) - 10) AS Img,
    SUBSTRING(Rest, CHARINDEX('"', Rest, CHARINDEX('<img src="', Rest) + 10), LEN(Rest)) AS Rest
  FROM   CTE
  WHERE  Rest LIKE '%<img src%'
 )
SELECT Id AS [Post Link], Img AS [Image Link]
FROM CTE;

If you're only interested in finding images that are not hosted on i.stack.imgur.com, you can add WHERE Img NOT LIKE '%://i.stack.imgur.com/%' to the end. Here's a direct link.

11
  • 2
    I would have expected the SE staff to step up and fix these links automatically. It would probably not take much for a software engineer with access to the right APIs to automate it, while it is a lot of work for us users. Commented May 13, 2023 at 10:58
  • @FedericoPoloni We did (see my answer), it just required some planning, and wasn't an immediate priority because the images weren't broken yet.
    – Aaron Bertrand Staff
    Commented Jul 19, 2023 at 17:47
  • @AaronBertrand Great, many thanks! Small suggestion: the next time something similar happens, I suggest announcing that the SE staff are planning to take care of the task, to avoid other users doing duplicate manual work. Commented Jul 19, 2023 at 21:22
  • “Why doesn’t SE staff do something about it!” … “great, they did something about it, even though they didn’t have to do anything, but let’s find some other reason to complain!” It’s no wonder staff participation here has declined. Commented Jul 19, 2023 at 23:45
  • @Federico Did you tell anybody you were doing manual work? This page wasn’t updated in some time, so I don’t know how staff would know they were duplicating effort. Can you provide some examples of these manual edits people performed? Is it really wasted anyway? People get rep and/or badge progress for edits. If it’s too much work, don’t do it? Commented Jul 19, 2023 at 23:48
  • 1
    @Stuckat1337 I wasn't doing manual work myself. Users like Glorfindel and Zach Teller were, though. I started my comment with thanks and tried to keep it constructive; I am grateful that they fixed this problem. Commented Jul 20, 2023 at 6:31
  • 1
    @Stuckat1337 Also I don't get your logic about "This page wasn’t updated in some time": should we update this page every time we fix a link? Commented Jul 20, 2023 at 6:43
  • @Federico It comes across like “gee, thanks for the free sandwich, but next time use better bread, more meat, and would it kill you to include a free beer?!?” Of course I wouldn’t expect an update for every single link fix. But if you’re doing a lot, and the reason to ask staff for anything is to avoid bumping posts for a lot of edits, when you don’t say anything at all it seems unreasonable to blame staff for not somehow knowing that. Commented Jul 20, 2023 at 11:00
  • 1
    @Stuckat1337 I have never seen this as a "free sandwich" relationship. Users contribute free content to the network, which helps provide ad views, they moderate content, and keep the network alive. SE manages the hosting, monetizes the ads, sponsorships, and private teams, and pays staff to perform all sorts of network maintenance. It's a relationship where both provide something and gain something. Commented Jul 20, 2023 at 11:42
  • 2
    @FedericoPoloni Actually, we did reach out to Glorfindel directly. They didn't respond, I assumed because of the broader moderator action, not because they were too busy manually updating posts (I see no indication they were doing that at all - the answer you reference actually asks for staff help precisely so they wouldn't have to do that). And as it turns out, Zach was updating post(s) on one site, that were not duplicates at all, because they focused on failures/uncertainties, which I certainly didn't touch.
    – Aaron Bertrand Staff
    Commented Jul 20, 2023 at 11:45
  • @AaronBertrand Thanks for the info! Commented Jul 21, 2023 at 15:48

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .