119

This is a cross-post from mSO, but it affects the whole network. Step 1 is already enabled network-wide; steps 2 and 3 will be enabled shortly on mSE and on another SE community (main site + meta) of our choosing, for the first round of site-wide HTTPS testing. Read the full master plan!

As requested by @patrick, let's go kill dead images. We don't have on-demand review queues, and editing a list of posts works OK for small batches[1], so let's take crowdcrafting.org / pybossa for a spin. So, without further ado:

I need a bosse[2]

  • 2017-03-07 (meta.)security.SE processed, (only) 28 tasks added to ^^
  • 2017-03-08 serverfault, superuser and their metas processed, 1069 tasks
  • 2017-03-09 25 main SE sites processed, 7911 (somebody call the police) tasks
  • 2017-03-09 deleted mSE posts have been removed from the crowdsourcing queue
  • 2017-03-10 rest of the public SE sites done, 6225 tasks for the queue
  • 2017-03-11 fixed 2 URLs on askubuntu.com which just received https support, but affected ~2k posts
  • child metas for .SE sites will be processed after the meta move

In the next 6-8 weeks, we'll be rolling out some changes to address the mixed content issue on the first page load, namely:

  1. Imgur URLs are converted from HTTP to HTTPS. This will be an HTML baking change, so no Markdown will be affected. This has already shipped, and all the old posts were rebaked.

  2. Prevent submission of posts that contain HTTP images. Instead of the HTTP image, the Markdown editor will show an additional error with an option to bring up the image uploading tool with the URL pre-populated, so you can easily upload it to Imgur (we can't do that on your behalf, because cc-by-sa, and whatnot).

  3. After this is enabled, old posts that have HTTP images, accessible via HTTPS, will be edited accordingly. This will be an actual markdown change, attributed to the "URL Rewriter Bot"; posts won't be bumped. (example)

  4. Rebake remaining posts with HTTP images, so that images linking to HTTP addresses will become links - most of those are dead anyway. This will be an HTML baking change, so no Markdown will be affected, but it'll remove any mixed content even when viewing old revisions.
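
As a rough illustration of steps 1 and 4, here is a minimal sketch of what such a baking pass over rendered HTML might look like. It is not Stack Exchange's actual baking code: `bakeImages` is a hypothetical helper, the regular expressions assume double-quoted attributes, and using the alt text as the link text follows what is described in the comments on this post.

```javascript
// Hypothetical sketch of steps 1 and 4; not the real baking implementation.
// Assumes <img> tags with double-quoted src/alt attributes.
function bakeImages(html) {
  return html.replace(/<img\b[^>]*>/gi, function (tag) {
    var src = (/\ssrc="([^"]*)"/i.exec(tag) || [])[1] || '';
    var alt = (/\salt="([^"]*)"/i.exec(tag) || [])[1] || '';

    // Step 1: Imgur serves everything over HTTPS, so just upgrade the scheme.
    if (/^http:\/\/([a-z0-9-]+\.)*imgur\.com\//i.test(src)) {
      return tag.replace(/(\ssrc=")http:/i, '$1https:');
    }

    // Step 4: any other plain-HTTP image becomes a link, with the alt text
    // (or the URL itself, if there is no alt text) as the link text.
    if (/^http:\/\//i.test(src)) {
      return '<a href="' + src + '">' + (alt || src) + '</a>';
    }

    // HTTPS and protocol-relative images are left alone.
    return tag;
  });
}

console.log(bakeImages('<img src="http://i.stack.imgur.com/abc.png" alt="x">'));
console.log(bakeImages('<p><img src="http://example.com/a.png" alt="diagram"></p>'));
```

Because this only touches the baked HTML, the Markdown source and its revision history stay untouched, which is the property steps 1 and 4 rely on.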


[1] ... and inspired by Let's rescue wayward resource requests! (trial run)
[2] ... depending on how this works out, we might use it for other stuff too

36
  • 20
    You have an odd way of writing MSO and MSE :)
    – Stijn
    Commented Mar 6, 2017 at 16:04
  • 14
    @Stijn If we're being pedantic, lowercase-m is a more proper way. Prevents confusion of Meta Stack Exchange and Math Stack Exchange.
    – Undo
    Commented Mar 6, 2017 at 16:13
  • 33
    @Stijn, given his name, he should write it as ms0. :P Commented Mar 6, 2017 at 16:37
  • 19
    @Undo Good thing we don't have any other sites with names that start with M or else we'd be in real trouble.
    – Adam Lear StaffMod
    Commented Mar 6, 2017 at 21:59
  • 1
    featured please? Pretty please? 0:)
    – muru
    Commented Mar 7, 2017 at 4:28
  • 3
    @FrenzyLi "Skip" would be better, "Done" throws it out of the queue, let me add some instructions there
    – m0sa
    Commented Mar 7, 2017 at 9:04
  • 1
    @m0sa I didn't even think about this in the beginning and clicked "Done" for about 20 cases. Is it possible to revert my actions?
    – Frenzy Li
    Commented Mar 7, 2017 at 9:06
  • 2
    @FrenzyLi don't sweat it, it takes two "Done" responses for the task to be completed
    – m0sa
    Commented Mar 7, 2017 at 9:23
  • 1
    @m0sa highly likely Frenzy Li and I could have clicked Done on the same posts, since the same posts repeated quite a few times for me.
    – muru
    Commented Mar 7, 2017 at 9:36
  • 3
    i keep getting the same ones when skipping Commented Mar 7, 2017 at 13:02
  • 2
    The crowdcrafting site thing seems to have an awful lot of un-editable posts that turn out to have been migrated to MSU/MSO/whatever. There's a number of things that would bore me straight out of contributing, and getting four or five of these from the edit button is definitely one of them.
    – E.P.
    Commented Mar 7, 2017 at 15:44
  • 1
    that's totally my bad.. added another button so you can dismiss those, calls the api to see if the post is there or not, to make your life easy...
    – m0sa
    Commented Mar 7, 2017 at 16:21
  • 1
    @DanielA.White have you found a working image?
    – m0sa
    Commented Mar 8, 2017 at 9:13
  • 2
    Okay, so I tried using the export here crowdcrafting.org/project/sehttpimagescleanup/tasks/export and I'm a little concerned the input data may not all be correct. It starts out sane: "info" : { "BaseHostAddress" : "meta.stackexchange.com", "PostId" : "150", ... but then looking for entries for cooking, I find: "info" : { "askubuntu.com" : "cooking.stackexchange.com", ... "149" : "1561"
    – Cascabel
    Commented Mar 12, 2017 at 14:55
  • 2
    ugh, looks like I've imported a CSV without headers and it took the first row's values as column names... I've created a merged, cleaned up CSV file, but unfortunately the pybossa API doesn't have a bulk delete endpoint... So, instead, I made the UI great again, by forcing it to support the alt-columns...
    – m0sa
    Commented Mar 12, 2017 at 16:26

8 Answers

36

Nice work on finally bringing HTTPS support in!

I want to note something on point 4 (emphasis mine):

Rebake remaining posts with HTTP images, so that images linking to HTTP addresses will become links - most of those are dead anyway.

If most of those links are dead anyway, why not go through the hassle once and check whether they are actually dead? If they are, just remove them! Why would we need to rely on the community to fix them one by one if we can do this automatically in one go?


It seems automating this is difficult, so m0sa set up a crowd sourcing project to let us do the work manually. Please contribute if you can! About 2500 posts to go...

28
  • 2
    we do check, note point 3. - old posts that have HTTP images, accessible via HTTPS, will be edited accordingly, 4. is just cleaning up what's left
    – m0sa
    Commented Mar 6, 2017 at 15:39
  • 10
    we can't just remove them
    – m0sa
    Commented Mar 6, 2017 at 15:42
  • 4
    Oh yes you can :) . Why not? Commented Mar 6, 2017 at 15:42
  • 8
    they have an alt text or something describing what is meant to be there
    – m0sa
    Commented Mar 6, 2017 at 15:43
  • 1
    we do put the alt text in the link text
    – m0sa
    Commented Mar 6, 2017 at 15:49
  • 4
    would you rather dig through revision history than have it right there?
    – m0sa
    Commented Mar 6, 2017 at 15:50
  • 17
    if we edit it out, the next person editing the post won't get the HTTPS only images warning, which might prod them into finding a replacement image
    – m0sa
    Commented Mar 6, 2017 at 15:54
  • 3
    Related analysis of dead links of SO (disclaimer...it's my post)
    – Andy
    Commented Mar 6, 2017 at 16:15
  • 1
    @PatrickHofman see update, TL;DR -> crowdcrafting.org/project/sehttpimagescleanup
    – m0sa
    Commented Mar 7, 2017 at 1:59
  • 4
    @PatrickHofman with the information, somebody who cares can find a different image or dig through the Wayback Machine to get it. Without the information, and (as in most cases) without meaningful alt text or sufficient description in the post body, we've got no idea. Commented Mar 7, 2017 at 3:14
  • 1
    It's a one-off process, so they get processed as well, in case they ever get undeleted... But I guess in that case we're OK with a broken link? You're right though, I could've skipped the deleted ones when adding them to the queue. I'm totally planning to reuse the queue for other sites as well, so I'll exclude deleted posts next time
    – m0sa
    Commented Mar 7, 2017 at 8:08
  • 2
    Dead images can be fixed by providing a working URL, if it's clear what was supposed to be there.
    – Raphael
    Commented Mar 7, 2017 at 9:14
  • 2
    @m0sa Can you make a new list for us containing only visible posts? We are now spending tons of time on deleted posts. I haven't actually seen a visible post in the queue yet. Commented Mar 7, 2017 at 11:01
  • 3
    After 10 deleted posts, I stopped running through the system. Better to exclude them if you do it again.
    – Knossos
    Commented Mar 7, 2017 at 11:34
  • 1
    @AdamKatz not really, since IMHO the steps listed above already get rid of mixed content... the crowdsourcing thing is just an afterthought, a replacement for editing lists of post IDs in a markdown post (see footnote 1). It's basically my attempt at putting lipstick on a pig. Anyone from the community could've done it, even on SO, since the lists of changed post IDs are public there as well. So I apologize if the execution isn't ideal, but it's much better than what we did till now IMO.
    – m0sa
    Commented Mar 8, 2017 at 7:14
6

Please make it so that protocol relative URLs (i.e. //foo.bar/fum.png) will still be possible (you write that you will require https:) and are left alone.
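
To make the behaviour under discussion concrete, here is a small sketch of how a protocol-relative URL resolves against the embedding page, using the standard `URL` constructor. The helper names `resolveImage` and `isMixedContent` and the page URLs are illustrative assumptions, not part of any Stack Exchange code.

```javascript
// Resolve an image URL the way a browser would, relative to the embedding page.
function resolveImage(src, pageUrl) {
  return new URL(src, pageUrl).href;
}

// An image counts as mixed content when an HTTPS page loads it over plain HTTP.
function isMixedContent(src, pageUrl) {
  var page = new URL(pageUrl);
  var image = new URL(src, pageUrl);
  return page.protocol === 'https:' && image.protocol === 'http:';
}

// The protocol-relative URL inherits the scheme of the page it is embedded in.
console.log(resolveImage('//foo.bar/fum.png', 'https://meta.stackexchange.com/q/1'));
// → https://foo.bar/fum.png
console.log(resolveImage('//foo.bar/fum.png', 'http://meta.stackexchange.com/q/1'));
// → http://foo.bar/fum.png
```

So a protocol-relative image never triggers a mixed-content warning on an HTTPS page, although it still breaks if the host does not actually serve the file over HTTPS.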

6
  • 5
    Some images can only be served over HTTP, others only over HTTPS. Using a protocol relative URL might break things. Commented Mar 7, 2017 at 7:50
  • 5
    @PatrickHofman I think Martin is (also?) referring to posts that currently have images with a protocol-relative URL in them. Since these are presumably already available over HTTPS, it would make sense to leave this kind of URI alone (as in: don't convert them to a link) as they'd get loaded over HTTPS if the embedding page is also loaded over HTTPS, anyway. Commented Mar 7, 2017 at 8:28
  • I could agree with that, although it is little work to update them too. Commented Mar 7, 2017 at 8:31
  • 1
    @user2428118: Indeed. Also intra-stackexchange links in comments without the schema are shorter, which enables us to write longer comments. :-) Commented Mar 7, 2017 at 9:20
  • 7
    There is absolutely no sense in using protocol-relative URLs anymore. All clients support (or should) https, and TLS is not slow anymore. The only reason protocol-relative URLs were useful was to speed up load times when the page was loaded over http anyways. Now it's no use to use this hack syntax to explicitly downgrade the security of the asset being linked to (image) if it can be served over https. Plus it breaks the filesystem scheme. joonas.fi/2016/12/27/stop-using-protocol-relative-urls
    – joonas.fi
    Commented Mar 7, 2017 at 16:09
  • 14
    @MartinSchröder Did you even read the link in your answer? Now that SSL is encouraged for everyone and doesn’t have performance concerns, this technique is now an anti-pattern. If the asset you need is available on SSL, then always use the https:// asset.
    – mbomb007
    Commented Mar 7, 2017 at 16:47
4

Is anyone else having problems with imageshack images? They keep coming back with failures.

3
  • 8
    That's pretty stock for Imageshack, they are very fond of deleting vaguely old images as soon as they get an excuse. (One of the reasons we're so insistent on imgur hosting these days, in fact.) Commented Mar 7, 2017 at 21:13
  • Do you have the user script installed that is mentioned halfway through this answer: meta.stackexchange.com/questions/263771/…
    – rene
    Commented Mar 9, 2017 at 12:15
  • Do know that I have gone over all imageshack images on MSE a couple of weeks ago and rescued what was still available.
    – rene
    Commented Mar 9, 2017 at 12:39
3

I've written the following snippet (also posted at Help us fix broken images!) to filter and return the Crowdcrafting tasks by site.

It currently returns up to 100 tasks (the maximum the API allows). It looks like a lot of sites have fewer affected posts than that, but for the ones that have more there's no instant way to get further posts. It is possible to paginate results through the API (see the last note here), so maybe I'll look at adding that later.

I've included links to the SE post (both view and edit links) and the Crowdcrafting task page so that you can hit "Done" on the task, which should eventually get you more tasks (it takes 2 "Done"s to remove the post from the queue I believe).

Just pick a site, hit "Get Tasks" and work through the links...

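// Fetch a URL and pass the parsed JSON response to the callback, along with
// an optional extra argument (e.g. the list element to render into).
function getget(url, callback, extra) {
  var xhr = new XMLHttpRequest();
  xhr.addEventListener('load', function() {
    var response = JSON.parse(xhr.responseText);
    callback(response, extra);
  });
  xhr.open('GET', url);
  xhr.send();
}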

function loadSites() {
  var url = 'https://api.stackexchange.com/2.2/sites?pagesize=300&filter=!*L6Sij27hkbD7Gso';
  getget(url, listSites);
}

function listSites(sites) {
  var goBtn = document.getElementById('getTasks');
  var sitesList = document.getElementById('sites');

  for (var i = 0; i < sites.items.length; i++) {
    var siteUrl = sites.items[i].site_url.replace(/^https?\:\/\//i, "");
    var opt = document.createElement('option');
    opt.value = siteUrl;
    opt.textContent = sites.items[i].name;
    sitesList.appendChild(opt);
  }

  goBtn.innerText = 'Get Tasks';
  goBtn.disabled = false;
}

// Render one list item per task, with links to the SE post and to the
// Crowdcrafting task page. Falls back to the #results list when no target
// element is passed in.
function listTasks(tasks, el) {
  var list = el || document.getElementById('results');

  for (var i = 0; i < tasks.length; i++) {
    var task = tasks[i],
      info = task.info,
      taskID = task.id,
      // borked column headers again...
      postID = info['PostId'] || info['12'] || info['149'] || info['73'] || '',
      siteName = info.BaseHostAddress || info['meta.serverfault.com'] || info['askubuntu.com'] || info['sound.stackexchange.com'] || '';

    var span = document.createElement('span');
    span.innerText = 'Post ' + postID + ':';

    var seViewLink = document.createElement('a');
    seViewLink.href = '//' + siteName + '/questions/' + postID;
    seViewLink.innerText = 'View';

    var seEditLink = document.createElement('a');
    seEditLink.href = '//' + siteName + '/posts/' + postID + '/edit';
    seEditLink.innerText = 'Edit';

    var ccLink = document.createElement('a');
    ccLink.className = 'ccLink';
    ccLink.href = '//crowdcrafting.org/project/sehttpimagescleanup/task/' + taskID;
    ccLink.innerText = 'Crowdcrafting Task ' + taskID;

    var li = document.createElement('li');
    li.appendChild(span);
    li.appendChild(seViewLink);
    li.appendChild(seEditLink);
    li.appendChild(ccLink);
    list.appendChild(li);
  }
}

function init() {
  var goBtn = document.getElementById('getTasks');
  goBtn.addEventListener('click', function() {

    var results = document.getElementById('results');
    results.innerHTML = '';

    var site = document.getElementById('sites').value;
    // Task columns are borked: some imports used the first CSV row as column
    // names, so query each known variant of the site column.
    var siteColumns = ['BaseHostAddress', 'meta.serverfault.com', 'askubuntu.com', 'sound.stackexchange.com'];
    for (var i = 0; i < siteColumns.length; i++) {
      var searchUrl = '//crowdcrafting.org/api/task?project_id=4667&limit=100&info=' + siteColumns[i] + '::' + site;
      getget(searchUrl, listTasks, results);
    }
  });
  loadSites();
}

// go!
init();
ul { list-style: none; margin: 1em 0; padding: 0; }
li { margin: 0; padding: .5em 0; }
span { display: inline-block; width: 6em; }
a { color: #fff; background-color: #03A7DD; border-radius: 4px; padding: .25em .5em; margin: 0 .5em 0 0; text-decoration: none; }
a.ccLink { background-color: #2B9884; }
<label>Site: <select id="sites"></select></label>
<button id="getTasks" disabled>Loading Sites...</button>
<ul id="results"></ul>

Note, links in Stack Snippets don't really work... just open them in a new tab (ctrl+click, middle-click, right-click+"Open in New Tab" or whatever)
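
The pagination mentioned above could be added roughly as follows. This is only a sketch: it assumes the PyBossa task endpoint accepts a `last_id` parameter for keyset pagination (worth checking against the API documentation before relying on it), and `taskPageUrl` and `getAllTasks` are hypothetical helpers written in the same callback style as the snippet.

```javascript
// Build the query URL for one page of tasks. The `last_id` parameter is an
// assumption about the PyBossa API, not something verified here.
function taskPageUrl(site, lastId) {
  var url = 'https://crowdcrafting.org/api/task?project_id=4667&limit=100' +
    '&info=BaseHostAddress::' + encodeURIComponent(site);
  if (lastId) {
    url += '&last_id=' + lastId;
  }
  return url;
}

// Keep requesting pages until one comes back shorter than pageSize.
// getPage(lastId, callback) fetches a single page; done(allTasks) receives
// every task collected along the way.
function getAllTasks(getPage, pageSize, done) {
  var all = [];
  function next(lastId) {
    getPage(lastId, function (tasks) {
      all = all.concat(tasks);
      if (tasks.length < pageSize) {
        done(all);
        return;
      }
      // Continue after the last task ID seen so far.
      next(tasks[tasks.length - 1].id);
    });
  }
  next(null);
}
```

In the snippet, `getPage` could wrap the existing request helper, for example by calling `getget(taskPageUrl(site, lastId), callback)`, with `done` handing the combined array to `listTasks`.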

2

This is nice =).

It would be nicer if someone can write a SEDE query that can find old posts that need fixing - i.e. posts with non-stack.imgur images included.

And also, while we're here, it would be nice if you guys could eliminate dependencies on profile images from cdn.facebook and other tracker-like domains.

5
2

I just decided to try the crowdcrafting.org link in the question above, and the first task I received contained a link to https://meta.security.stackexchange.com/questions/227.

Obviously, that's not going to work (unless I edit the URL or manually add a security exception for the invalid certificate), since it's still using the old meta.security hostname. I guess that's something that should be fixed?

Also, it turns out I can't handle that task anyway, since I'm 22 rep points short of having edit privileges on security.SE meta. But that's not really something the crowdcrafting site could possibly know.

1
  • 1
    thanks, I've fixed the links, they now point to the old HTTP address, which redirects to https by default now.
    – m0sa
    Commented Apr 4, 2017 at 8:13
2

I found a bug: First task I got was http://security.stackexchange.com/questions/17393 which was complaining about an img tag which was in backticks so it wasn't actually an image. This img tag doesn't need fixing, but it ended up in the queue.
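
One way to avoid this kind of false positive would be to blank out code spans before scanning for images. The sketch below is only an illustration of that idea: `findHttpImages` is a hypothetical helper, the actual query used to build the queue is not known here, and indented or fenced code blocks would need similar treatment.

```javascript
// Collect the first capture group of every match of `re` in `text`.
function collect(re, text, out) {
  var m;
  while ((m = re.exec(text)) !== null) {
    out.push(m[1]);
  }
}

// Return the plain-HTTP image URLs in a post body, ignoring anything that
// sits inside backtick code spans.
function findHttpImages(body) {
  // Drop inline code spans: a run of backticks up to the next identical run.
  var withoutCode = body.replace(/(`+)[\s\S]*?\1/g, '');
  var urls = [];
  // HTML images: <img ... src="http://...">
  collect(/<img\b[^>]*\ssrc=["'](http:\/\/[^"']+)["']/gi, withoutCode, urls);
  // Markdown images: ![alt](http://...)
  collect(/!\[[^\]]*\]\((http:\/\/[^)\s]+)/g, withoutCode, urls);
  return urls;
}

console.log(findHttpImages('Use `<img src="http://a.example/x.png">` to embed.'));
// → []
console.log(findHttpImages('![pic](http://a.example/x.png)'));
// → [ 'http://a.example/x.png' ]
```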

0

I just wanted to say thanks for having the editor make it super easy to upload. It saves a lot of time...

I wish I wasn't rate limited on my edits because I was moving thru a lot...
