Roadmap to HTTPS: serving and uploading HTTPS-images only

Question

This is a cross post from mSO, but affecting the whole network. 1. is already enabled network wide, 2. and 3. will be enabled shortly on mSE and another SE community (main site + meta) of our choosing for the first round of site-wide HTTPS testing. Read the full master plan!

As requested by @patrick, let's go kill dead images. We don't have on-demand review queues, and editing a list of posts works OK for small batches¹, so let's take crowdcrafting.org / pybossa for a spin. So, without further ado:

I need a bosse²

2017-03-07 (meta.)security.SE processed, (only) 28 tasks added to ^^
2017-03-08 serverfault, superuser and their metas processed, 1069 tasks
2017-03-09 25 main SE sites processed, 7911 (somebody call the police) tasks
2017-03-09 deleted mSE posts have been removed from the crowdsourcing queue
2017-03-10 rest of the public SE sites done, 6225 tasks for the queue
2017-03-11 fixed 2 URLs on askubuntu.com which just received https support, but affected ~2k posts
child metas for .SE sites will be processed after the meta move

In the next 6-8 weeks, we'll be rolling out some changes to address the mixed content issue on the first page load, namely:

1. Imgur URLs are converted from HTTP to HTTPS. This will be an HTML baking change, so no Markdown will be affected. This has already shipped, and all the old posts were rebaked.
2. Prevent submission of posts that contain HTTP images. Instead of the HTTP image, the Markdown editor will show an additional error, with an option to bring up the image uploading tool with the URL pre-populated, so you can easily upload it to Imgur (we can't do that on your behalf, because cc-by-sa, and whatnot).
3. After this is enabled, old posts that have HTTP images, accessible via HTTPS, will be edited accordingly. This will be an actual Markdown change, attributed to the "URL Rewriter Bot"; posts won't be bumped. (example)
4. Rebake remaining posts with HTTP images, so that images linking to HTTP addresses will become links - most of those are dead anyway. This will be an HTML baking change, so no Markdown will be affected, but it'll remove any mixed content even when viewing old revisions.
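
The first and last steps above operate on the rendered HTML rather than on the Markdown. As a rough sketch only (the function names and regular expressions are assumptions made for illustration, not Stack Exchange's actual baking code), the two transformations might look like this:

```javascript
// Hypothetical sketch of steps 1 and 4; not Stack Exchange's real implementation.

// Step 1: upgrade Imgur image URLs from HTTP to HTTPS in rendered HTML.
function upgradeImgurImages(html) {
  return html.replace(
    /(<img\b[^>]*\bsrc=")http:\/\/((?:[a-z0-9-]+\.)*imgur\.com\/)/gi,
    '$1https://$2'
  );
}

// Step 4: turn any remaining HTTP image into a plain link,
// reusing the alt text (if any) as the link text.
function httpImagesToLinks(html) {
  return html.replace(/<img\b[^>]*>/gi, function (tag) {
    var src = /\bsrc="(http:\/\/[^"]*)"/i.exec(tag);
    if (!src) return tag; // HTTPS and other images are left alone
    var alt = /\balt="([^"]*)"/i.exec(tag);
    var text = alt && alt[1] ? alt[1] : src[1];
    return '<a href="' + src[1] + '">' + text + '</a>';
  });
}
```

Because both functions only touch the baked HTML, the Markdown source and its revision history stay as they were, which matches the "no Markdown will be affected" note in the list.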

[1] ... and inspired by Let's rescue wayward resource requests! (trial run)
[2] ... depending on how this works out, we might use it for other stuff too

@Stijn If we're being pedantic, lowercase-m is a more proper way. Prevents confusion of Meta Stack Exchange and Math Stack Exchange. — Undo, Commented Mar 6, 2017 at 16:13
@Undo Good thing we don't have any other sites with names that start with M or else we'd be in real trouble. — Adam Lear, Commented Mar 6, 2017 at 21:59
@FrenzyLi "Skip" would be better, "Done" throws it out of the queue, let me add some instructions there — m0sa, Commented Mar 7, 2017 at 9:04
@m0sa I didn't even think about this in the beginning and clicked "Done" for about 20 cases. Is it possible to revert my actions? — Frenzy Li, Commented Mar 7, 2017 at 9:06
@FrenzyLi don't sweat it, it takes two "Done" responses for the task to be completed — m0sa, Commented Mar 7, 2017 at 9:23
@m0sa highly likely Frenzy Li and I could have clicked Done on the same posts, since the same posts repeated quite a few times for me. — muru, Commented Mar 7, 2017 at 9:36
The crowdcrafting site thing seems to have an awful lot of un-editable posts that turn out to have been migrated to MSU/MSO/whatever. There's a number of things that would bore me straight out of contributing, and getting four or five of these from the edit button is definitely one of them. — E.P., Commented Mar 7, 2017 at 15:44
that's totally my bad.. added another button so you can dismiss those, calls the api to see if the post is there or not, to make your life easy... — m0sa, Commented Mar 7, 2017 at 16:21
Okay, so I tried using the export here crowdcrafting.org/project/sehttpimagescleanup/tasks/export and I'm a little concerned the input data may not all be correct. It starts out sane: "info" : { "BaseHostAddress" : "meta.stackexchange.com", "PostId" : "150", ... but then looking for entries for cooking, I find: "info" : { "askubuntu.com" : "cooking.stackexchange.com", ... "149" : "1561" — Cascabel, Commented Mar 12, 2017 at 14:55
ugh, looks like I've imported a CSV without headers and it took the first row's values as column names... I've created a merged, cleaned up CSV file, but unfortunately the pybossa API doesn't have a bulk delete endpoint... So, instead, I made the UI great again, by forcing it to support the alt-columns... — m0sa, Commented Mar 12, 2017 at 16:26
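
The mis-imported batches mean a task's `info` object may carry the host and post ID under keys taken from the first data row of a headerless CSV. Here is a minimal sketch of how a client could normalise both shapes. It assumes only the alternate keys quoted in the export excerpt above (the real set of alt-columns may well be larger), and `normalizeTaskInfo` and `firstPresent` are hypothetical helpers, not part of pybossa:

```javascript
// Hypothetical helper: map a task's `info` object to { host, postId },
// whether or not the CSV batch it came from was imported with headers.
var HOST_KEYS = ['BaseHostAddress', 'askubuntu.com']; // alt key quoted above
var POST_ID_KEYS = ['PostId', '149'];                 // alt key quoted above

// Return the value of the first key in `keys` that exists on `obj`, else null.
function firstPresent(obj, keys) {
  for (var i = 0; i < keys.length; i++) {
    if (Object.prototype.hasOwnProperty.call(obj, keys[i])) {
      return obj[keys[i]];
    }
  }
  return null;
}

function normalizeTaskInfo(info) {
  return {
    host: firstPresent(info, HOST_KEYS),
    postId: firstPresent(info, POST_ID_KEYS)
  };
}

console.log(normalizeTaskInfo({ BaseHostAddress: 'meta.stackexchange.com', PostId: '150' }));
console.log(normalizeTaskInfo({ 'askubuntu.com': 'cooking.stackexchange.com', '149': '1561' }));
```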

Shadow Wizard · Accepted Answer · 2017-03-07 09:16:47Z

36

Nice work on finally bringing HTTPS support in!

I want to note something on point 4 (emphasis mine):

Rebake remaining posts with HTTP images, so that images linking to HTTP addresses will become links - most of those are dead anyway.

If most of those links are dead anyway, why not go through the hassle once and check whether they are actually dead? If they are, just remove them! Why rely on the community to fix them one by one if this can be automated in one go?

It seems automating this is difficult, so m0sa set up a crowd sourcing project to let us do the work manually. Please contribute if you can! About 2500 posts to go...

edited Mar 7, 2017 at 9:16

Shadow Wizard


answered Mar 6, 2017 at 15:37

Patrick Hofman


2

we do check, note point 3. - old posts that have HTTP images, accessible via HTTPS, will be edited accordingly, 4. is just cleaning up what's left
– m0sa
Commented Mar 6, 2017 at 15:39
10

we can't just remove them
– m0sa
Commented Mar 6, 2017 at 15:42
4

Oh yes you can :) . Why not?
– Patrick Hofman
Commented Mar 6, 2017 at 15:42
8

they have an alt text or something describing what is meant to be there
– m0sa
Commented Mar 6, 2017 at 15:43
1

we do put the alt text in the link text
– m0sa
Commented Mar 6, 2017 at 15:49
4

would you rather dig through revision history than have it right there?
– m0sa
Commented Mar 6, 2017 at 15:50
17

if we edit it out, the next person editing the post won't get the HTTPS only images warning, which might prod them into finding a replacement image
– m0sa
Commented Mar 6, 2017 at 15:54
3

Related analysis of dead links of SO (disclaimer...it's my post)
– Andy
Commented Mar 6, 2017 at 16:15
1

@PatrickHofman see update, TL;DR -> crowdcrafting.org/project/sehttpimagescleanup
– m0sa
Commented Mar 7, 2017 at 1:59
4

@PatrickHofman with the information, somebody who cares can find a different image or dig through the Wayback Machine to get it. Without the information, and (as in most cases) without meaningful alt text or sufficient description in the post body, we've got no idea.
– Monica Cellio
Commented Mar 7, 2017 at 3:14
1

It's a one-off process, so they get processed as well, in case they ever get undeleted... But I guess in that case we're OK with a broken link? You're right though, I could've skipped the deleted ones when adding them to the queue. I'm totally planning to reuse the queue for other sites as well, so I'll exclude deleted posts next time
– m0sa
Commented Mar 7, 2017 at 8:08
2

Dead images can be fixed by providing a working URL, if it's clear what was supposed to be there.
– Raphael
Commented Mar 7, 2017 at 9:14
2

@m0sa Can you make a new list for us containing only visible posts? We are now spending tons of time on deleted posts. I haven't actually seen a visible post in the queue yet.
– Patrick Hofman
Commented Mar 7, 2017 at 11:01
3

After 10 deleted posts, I stopped running through the system. Better to exclude them if you do it again.
– Knossos
Commented Mar 7, 2017 at 11:34
1

@AdamKatz not really, since IMHO the steps listed above already get rid of mixed content... the crowdsourcing thing is just an afterthought, a replacement for editing lists of post ids in a markdown post (see footnote 1). It's basically my attempt of putting lipstick on a pig. Anyone from the community could've done it, even on SO, since the lists of changed post IDs are public there as well. So I apologize if the execution isn't ideal, but it's much better than what we did till now IMO.
– m0sa
Commented Mar 8, 2017 at 7:14


Martin Schröder · Accepted Answer · 2017-03-07 09:14:23Z

6

Please make it so that protocol relative URLs (i.e. //foo.bar/fum.png) will still be possible (you write that you will require https:) and are left alone.
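
To illustrate what "protocol relative" means here, the sketch below uses the standard `URL` constructor (available in browsers and Node.js) to show that such a URL simply inherits the scheme of the embedding page. The helper names and the page URLs are made up for the example:

```javascript
// Resolve an image URL against the page that embeds it and report the
// scheme the browser would actually use to fetch it.
function resolvedScheme(imageUrl, pageUrl) {
  return new URL(imageUrl, pageUrl).protocol;
}

// Classify how an image reference is written in the post source.
function classifyImageUrl(imageUrl) {
  if (/^https:\/\//i.test(imageUrl)) return 'https';
  if (/^http:\/\//i.test(imageUrl)) return 'http';
  if (/^\/\//.test(imageUrl)) return 'protocol-relative';
  return 'other';
}

// The same protocol-relative image follows whatever scheme the page uses.
console.log(resolvedScheme('//foo.bar/fum.png', 'https://meta.stackexchange.com/questions/1'));
console.log(resolvedScheme('//foo.bar/fum.png', 'http://meta.stackexchange.com/questions/1'));
```

On a page served over HTTPS such an image is requested over HTTPS, so it does not cause mixed content by itself; whether the host actually answers over HTTPS is a separate question.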

edited Mar 7, 2017 at 9:14

answered Mar 7, 2017 at 6:41

Martin Schröder


5

Some images can only be served over HTTP, others only over HTTPS. Using a protocol relative URL might break things.
– Patrick Hofman
Commented Mar 7, 2017 at 7:50
5

@PatrickHofman I think Martin is (also?) referring to posts that currently have images with a protocol-relative URL in them. Since these are presumably already available over HTTPS, it would make sense to leave this kind of URI alone (as in: don't convert them to a link) as they'd get loaded over HTTPS if the embedding page is also loaded over HTTPS, anyway.
– user2428118
Commented Mar 7, 2017 at 8:28
I could agree with that, although it is little work to update them too.
– Patrick Hofman
Commented Mar 7, 2017 at 8:31
1

@user2428118: Indeed. Also intra-stackexchange links in comments without the schema are shorter, which enables us to write longer comments. :-)
– Martin Schröder
Commented Mar 7, 2017 at 9:20
7

There is absolutely no sense in using protocol-relative URLs anymore. All clients support (or should) https, and TLS is not slow anymore. The only reason protocol-relative URLs were useful was to speed up load times when the page was loaded over http anyways. Now it's no use to use this hack syntax to explicitly downgrade the security of the asset being linked to (image) if it can be served over https. Plus it breaks the filesystem scheme. joonas.fi/2016/12/27/stop-using-protocol-relative-urls
– joonas.fi
Commented Mar 7, 2017 at 16:09
14

@MartinSchröder Did you even read the link in your answer? Now that SSL is encouraged for everyone and doesn’t have performance concerns, this technique is now an anti-pattern. If the asset you need is available on SSL, then always use the https:// asset.
– mbomb007
Commented Mar 7, 2017 at 16:47


Daniel A. White · Accepted Answer · 2017-03-07 19:55:41Z

4

Is anyone else having problems with imageshack images? They keep coming back with failures.

answered Mar 7, 2017 at 19:55

Daniel A. White


8

That's pretty stock for Imageshack, they are very fond of deleting vaguely old images as soon as they get an excuse. (One of the reasons we're so insistent on imgur hosting these days, in fact.)
– Nathan Tuggy
Commented Mar 7, 2017 at 21:13
Do you have the user script installed mentioned halfway this answer: meta.stackexchange.com/questions/263771/…
– rene
Commented Mar 9, 2017 at 12:15
Do know that I have gone over all imageshack images on MSE a couple of weeks ago and rescued what was still available.
– rene
Commented Mar 9, 2017 at 12:39


Cai · Accepted Answer · 2017-03-13 15:01:42Z

I've written the following snippet (also posted at Help us fix broken images!) to filter and return the Crowdcrafting tasks by site.

It currently returns up to 100 tasks (the maximum the API allows). A lot of sites seem to have fewer affected posts than that, but for the ones that have more there's no instant way to get further posts. It is possible to paginate results through the API (see the last note here), so maybe I'll look at adding that later.

I've included links to the SE post (both view and edit links) and the Crowdcrafting task page so that you can hit "Done" on the task, which should eventually get you more tasks (it takes 2 "Done"s to remove the post from the queue I believe).

Just pick a site, hit "Get Tasks" and work through the links...

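// Fetch a URL, parse the JSON response, and hand it to the callback
// together with an optional extra argument (e.g. the target list element).
function getget(url, callback, extra) {
  var xhr = new XMLHttpRequest();
  xhr.addEventListener('load', function() {
    var response = JSON.parse(xhr.responseText);
    callback(response, extra);
  });
  xhr.open('GET', url);
  xhr.send();
}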

function loadSites() {
  var url = 'https://api.stackexchange.com/2.2/sites?pagesize=300&filter=!*L6Sij27hkbD7Gso';
  getget(url, listSites);
}

function listSites(sites) {
  var goBtn = document.getElementById('getTasks');
  var sitesList = document.getElementById('sites');

  for (var i = 0; i < sites.items.length; i++) {
    var siteUrl = sites.items[i].site_url.replace(/^https?\:\/\//i, "");
    var opt = document.createElement('option');
    opt.value = siteUrl;
    opt.textContent = sites.items[i].name;
    sitesList.appendChild(opt);
  }

  goBtn.innerText = 'Get Tasks';
  goBtn.disabled = false;
}

function listTasks(tasks, el) {
  for (var i = 0; i < tasks.length; i++) {
    var task = tasks[i],
      info = task.info,
      taskID = task.id,
      // borked column headers again: some batches were imported without a CSV
      // header row, so fall back to the alternate column names
      postID = info['PostId'] || info['12'] || info['149'] || info['73'] || '',
      siteName = info.BaseHostAddress || info['meta.serverfault.com'] || info['askubuntu.com'] || info['sound.stackexchange.com'] || '';

    lastID = taskID;

    var span = document.createElement('span');
    span.innerText = 'Post ' + postID + ':';

    var seViewLink = document.createElement('a');
    seViewLink.href = '//' + siteName + '/questions/' + postID;
    seViewLink.innerText = 'View';

    var seEditLink = document.createElement('a');
    seEditLink.href = '//' + siteName + '/posts/' + postID + '/edit';
    seEditLink.innerText = 'Edit';

    var ccLink = document.createElement('a');
    ccLink.className = 'ccLink';
    ccLink.href = '//crowdcrafting.org/project/sehttpimagescleanup/task/' + taskID;
    ccLink.innerText = 'Crowdcrafting Task ' + taskID;

    var li = document.createElement('li');
    li.appendChild(span);
    li.appendChild(seViewLink);
    li.appendChild(seEditLink);
    li.appendChild(ccLink);
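    // use the element passed in; fall back to looking up the list by id
    (el || document.getElementById('results')).appendChild(li);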
  }
}

function init() {
  var goBtn = document.getElementById('getTasks');
  goBtn.addEventListener('click', function() {

    var results = document.getElementById('results');
    results.innerHTML = '';

    var site = document.getElementById('sites').value;
    var baseUrl = '//crowdcrafting.org/api/task?project_id=4667&limit=100&info=';

    // task columns are borked... so run the search once per column name
    // that may hold the site's host address
    var hostColumns = ['BaseHostAddress', 'meta.serverfault.com', 'askubuntu.com', 'sound.stackexchange.com'];
    for (var i = 0; i < hostColumns.length; i++) {
      getget(baseUrl + hostColumns[i] + '::' + site, listTasks, results);
    }
  });
  loadSites();
}

// go!
init();

ul { list-style: none; margin: 1em 0; padding: 0; }
li { margin: 0; padding: .5em 0; }
span { display: inline-block; width: 6em; }
a { color: #fff; background-color: #03A7DD; border-radius: 4px; padding: .25em .5em; margin: 0 .5em 0 0; text-decoration: none; }
a.ccLink { background-color: #2B9884; }

<label>Site: <select id="sites"></select></label>
<button id="getTasks" disabled>Loading Sites...</button>
<ul id="results"></ul>

Note, links in Stack Snippets don't really work... just open them in a new tab (ctrl+click, middle-click, right-click+"Open in New Tab" or whatever)
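
The pagination mentioned above could be sketched roughly as follows. This assumes the API accepts a `last_id` parameter for fetching the tasks that follow a given task id, as PyBossa's API documentation describes; `getAllTasks` is a hypothetical helper and has not been run against crowdcrafting.org:

```javascript
// Hypothetical sketch: collect every task matching a query by requesting
// pages of up to 100 tasks and passing the last seen task id as `last_id`.
// `fetchJson(url, callback)` is any function that calls back with parsed JSON,
// such as getget() from the snippet above.
function getAllTasks(baseUrl, fetchJson, done) {
  var all = [];
  function nextPage(lastId) {
    var url = baseUrl + '&limit=100' + (lastId ? '&last_id=' + lastId : '');
    fetchJson(url, function (tasks) {
      all = all.concat(tasks);
      if (tasks.length < 100) {
        done(all); // a short page means there is nothing left to fetch
      } else {
        nextPage(tasks[tasks.length - 1].id);
      }
    });
  }
  nextPage(null);
}
```

It could be called as `getAllTasks('//crowdcrafting.org/api/task?project_id=4667&info=BaseHostAddress::' + site, getget, function (tasks) { listTasks(tasks, results); })`, although with a few thousand tasks per site it would be kinder to the API to fetch further pages only on demand.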

E.P. · Accepted Answer · 2017-03-07 12:55:23Z

2

This is nice =).

It would be nicer if someone could write a SEDE query that can find old posts that need fixing - i.e. posts with non-stack.imgur images included.

And also, while we're here, it would be nice if you guys could eliminate dependencies on profile images from cdn.facebook and other trackersy domains.

answered Mar 7, 2017 at 12:55

E.P.


2

crowdcrafting.org/project/sehttpimagescleanup. (does not really work right now, stuck on deleted posts, but it should bring all posts with dead images.)
– Shadow Wizard
Commented Mar 7, 2017 at 12:57
Indeed, as Shadow Wizard says. See my answer here and its related question about the progress.
– Patrick Hofman
Commented Mar 7, 2017 at 13:04
3

I've been using this query; does that help?
– Mithical
Commented Mar 7, 2017 at 13:24
the api endpoint for enumerating tasks is public, you can get the post ids from there
– m0sa
Commented Mar 10, 2017 at 12:18
@Mithrandir That seems to no longer work.
– E.P.
Commented Mar 21, 2017 at 16:07


Ilmari Karonen · Accepted Answer · 2017-04-03 19:14:46Z

2

I just decided to try the crowdcrafting.org link in the question above, and the first task I received contained a link to https://meta.security.stackexchange.com/questions/227.

Obviously, that's not going to work (unless I edit the URL or manually add a security exception for the invalid certificate), since it's still using the old meta.security hostname. I guess that's something that should be fixed?

Also, it turns out I can't handle that task anyway, since I'm 22 rep points short of having edit privileges on security.SE meta. But that's not really something the crowdcrafting site could possibly know.

edited Apr 3, 2017 at 19:14

answered Apr 3, 2017 at 18:47

Ilmari Karonen


1

thanks, I've fixed the links, they now point to the old HTTP address, which redirects to https by default now.
– m0sa
Commented Apr 4, 2017 at 8:13


Yet Another User · Accepted Answer · 2017-05-22 18:50:48Z

2

I found a bug: First task I got was http://security.stackexchange.com/questions/17393 which was complaining about an img tag which was in backticks so it wasn't actually an image. This img tag doesn't need fixing, but it ended up in the queue.
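
For illustration, a check along these lines would avoid that kind of false positive. It is only a sketch under simplifying assumptions (it handles inline backtick spans but not indented or fenced code blocks), and `findHttpImgTags` is a hypothetical function, not the one that built the queue:

```javascript
// Hypothetical sketch: find HTTP <img> tags in a post's Markdown while
// ignoring anything inside backtick code spans, which never renders as an image.
function findHttpImgTags(markdown) {
  // Drop inline code spans: a run of backticks up to the next run of equal length.
  var withoutCode = markdown.replace(/(`+)[\s\S]*?\1/g, '');
  return withoutCode.match(/<img\b[^>]*\bsrc=["']http:\/\/[^"']*["'][^>]*>/gi) || [];
}

// An <img> quoted in backticks is not reported; a real one is.
console.log(findHttpImgTags('Use `<img src="http://example.com/a.png">` like this'));
console.log(findHttpImgTags('Here: <img src="http://example.com/a.png">'));
```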

answered May 22, 2017 at 18:50

Yet Another User


Daniel A. White · Accepted Answer · 2017-03-10 12:00:39Z

0

I just wanted to say thanks for having the editor make it super easy to upload. It saves a lot of time...

I wish I wasn't rate limited on my edits because I was moving thru a lot...

answered Mar 10, 2017 at 12:00

Daniel A. White

