Background: SE currently uses the Google code-prettify library for syntax highlighting. The possibility of switching to highlight.js has been suggested in the past.

I want to update this discussion for 2016. Here's the scoop on highlight.js:

It's not too big

The current version of prettify served by SE is 30.9kB (12.7kB gzipped).

The current default cdnjs version of highlight.js is 42.0kB (17.5kB gzipped). This includes a default set of languages, but that can of course be customized.

It's way faster

I made a basic performance test in JSFiddle to see how highlight.js does in comparison to prettify.

prettify takes about 4x as much time as highlight.js for some Ruby code (a file from Rails):

[Figure: Ruby histogram]

And about 10x as much time for Objective-C (a file from PLCrashReporter):

[Figure: Objective-C histogram]
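For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness. It is not the code from the JSFiddle; the helper names (`now`, `medianRunTime`, `compare`) are made up for illustration, and it only times string-to-string highlighting, not DOM updates.

```javascript
// Returns a timestamp in milliseconds.
// performance.now() exists in browsers and modern Node; fall back to Date.now().
function now() {
  return (typeof performance !== "undefined" && performance.now)
    ? performance.now()
    : Date.now();
}

// Runs `highlight(source)` `runs` times and returns the median duration in ms.
function medianRunTime(highlight, source, runs) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    highlight(source);
    samples.push(now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2
    ? samples[mid]
    : (samples[mid - 1] + samples[mid]) / 2;
}

// Takes an object of { name: highlighterFunction } and returns { name: medianMs }.
function compare(highlighters, source, runs = 20) {
  const results = {};
  for (const [name, fn] of Object.entries(highlighters)) {
    results[name] = medianRunTime(fn, source, runs);
  }
  return results;
}
```

On a page that loads both libraries, the two entries could be something like `src => PR.prettyPrintOne(src, "rb")` (which expects HTML-escaped input) and `src => hljs.highlight("ruby", src).value`. Results will vary with the browser, the input, and the page the code runs on.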

It's significantly better

highlight.js has a larger set of supported languages than google-code-prettify. (See a demo of several of them.)

The tags/classes it generates are quite extensive, and they are nicely nested, so you can do some cool things with CSS to make a really nice color scheme.

It can detect nested code blocks, such as CSS inside HTML, and highlight both languages correctly in the same snippet.

Here's a simple comparison where you can see highlight.js understands nested PHP, JS, and CSS, and has better knowledge of keywords, builtins, etc. than prettify:

demo='<!DOCTYPE html>\n<head>\n  <title><? echo "Hello $name!"; ?></title>\n\n  <style>\n    body {\n      width: 500px;  /* big enough */\n    }\n  </style>\n\n  <script type="application/javascript">\n    function someFunction() {\n      return true;\n      console.log("hello world!");\n    }\n  <'+'/script>\n\n<body>\n  <p class="something" id=\'12\'>Something</p>\n  <p class=something>Something</p>\n  <!-- comment -->\n  <p class>Something</p>\n  <p class="something" title="p">Something</p>\n</body>';

document.getElementById("prettify").innerText = demo;
document.getElementById("hljs").innerText = demo;

PR.prettyPrint();
hljs.highlightBlock(document.getElementById("hljs"));

pre { padding: 0.5em; background: #F0F0F0; }

/* prettify */
.prettyprint { color: #444; }
.str { color: #880000; }
.kwd { font-weight: bold; }
.com { color: #888888; }
.typ { color: #880000; }
.lit { color: #78A960; }
.tag { font-weight: bold; }
.atn { color: #bdb76b; }
.atv { color: #65b042; }
.dec { color: #3387CC; }

/* highlight.js styles */
.xml .css, .xml .javascript, .xml .php { opacity: 0.6; }
.hljs, .hljs-subst { color: #444; }
.hljs-tag .hljs-string { color: #65b042; }
.hljs-comment { color: #888888; }
.hljs-attr { color: #bdb76b; }
.hljs-keyword, .hljs-attribute, .hljs-selector-tag, .hljs-meta-keyword, .hljs-doctag, .hljs-name { font-weight: bold; }
.hljs-type, .hljs-string, .hljs-selector-id, .hljs-selector-class, .hljs-quote, .hljs-template-tag, .hljs-deletion { color: #880000; }
.hljs-title, .hljs-section { color: #880000; font-weight: bold; }
.hljs-regexp, .hljs-symbol, .hljs-variable, .hljs-template-variable, .hljs-link, .hljs-selector-attr, .hljs-selector-pseudo { color: #BC6060; }
.hljs-literal { color: #78A960; }
.hljs-number { color: #3B719A; }
.hljs-built_in, .hljs-bullet, .hljs-code, .hljs-addition { color: #397300; }
.hljs-meta { color: #1f7199; }
.hljs-meta-string { color: #4d99bf; }
.hljs-emphasis { font-style: italic; }
.hljs-strong { font-weight: bold; }

<script src="https://cdnjs.cloudflare.com/ajax/libs/prettify/r298/prettify.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.3.0/highlight.min.js"></script>

<table>
  <tr><th><tt>google-code-prettify</tt></th><th><tt>highlight.js</tt></th></tr>
  <tr>
    <td><pre id="prettify" class="prettyprint"></pre></td>
    <td><pre id="hljs"></pre></td>
  </tr>
</table>

It's actively developed

The commit activity for highlight.js is consistently high. Not so much for prettify (although it's not completely dead as some might have thought). Of course, SE can't update all the time, but more improvements can be pulled in every time the library is updated.

Some other benefits are discussed in this comparison from 2011 and in this post by the author.

Let's do it!

Given that Stack Overflow and other SE sites exist for the purpose of programming Q&A, good syntax highlighting is crucial. It seems worth the investment to make a switch like this, if it improves the user experience of each of the 48 million monthly visitors, which I'm certain will only keep growing with some of the new (syntax-heavy) features coming up. The community has shown a great deal of interest in syntax highlighting improvements over the years... I just hope the time is right!

  • 22
    It's time for a switch. Let's do it!
    – Daniel
    Commented Apr 9, 2016 at 2:55
  • 14
    We'll be reviewing this idea in our weekly Core (Q&A) team call next week.
    – Haney
    Commented Apr 11, 2016 at 20:14
  • 1
    Excellent, thank you @Haney. I (and many others) look forward to hearing what's discussed. :-)
    – jtbandes
    Commented Apr 11, 2016 at 20:15
  • 1
    @jtbandes no problem. PS: nice use of a Stack Snippet. ;)
    – Haney
    Commented Apr 11, 2016 at 20:16
  • 12
    @Haney don't hesitate to contact us for any assistance, either with a GitHub issue or by email [email protected]
    – isagalaev
    Commented Apr 13, 2016 at 16:48
  • 2
    Update: still trying to find time to review, but we're slammed with projects right now. Gonna try to get to it later this week, if not having a look in the next few weeks.
    – Haney
    Commented Apr 18, 2016 at 20:33
  • 1
    Nice CSS/color changes on the existing syntax highlighter, though ;)
    – jtbandes
    Commented Apr 26, 2016 at 22:03
  • 1
    I don't know how common it is across the network, but on Code Golf we make extensive use of the language tags in answers, e.g. <!-- language-all: lang-ruby -->. Would this still be recognized with Highlight.js? (Not that supporting this syntax would be a dealbreaker for me; it'd just be nice.)
    – Alex A.
    Commented Apr 28, 2016 at 6:13
  • 1
    I'm not sure whether the language tag/hint is something implemented by SO specifically, or built into prettify. It probably wouldn't be hard to parse them in order to inform the way hljs is activated.
    – jtbandes
    Commented Apr 28, 2016 at 6:19
  • 4
    @AlexA. yes, highlight.js can use those. We have a config option where you can dump all the SO tags and we will use those that we recognize as languages to constrain automatic detection. It was specifically designed to support such use case, in fact.
    – isagalaev
    Commented Apr 28, 2016 at 22:41
  • 9
    Update: the team is investigating and assessing the lib this week and next week
    – Haney
    Commented May 5, 2016 at 19:58
  • 2
    @Haney awesome news, thank you for the update! Please test matlab highlighting as well;) (SO support for it has been missing for ever, with more than 60k questions)
    Commented May 5, 2016 at 20:55
  • 1
    @YaakovEllis I wonder what tipped the scales in favour of using this library. The answer by Oded seems a very compelling reason not to.
    – Luuklag
    Commented Aug 26, 2020 at 6:49
  • 7
    There will be an official post in the next couple of weeks that will go into more details regarding the reasons for switching and the plan for the rollout.
    Commented Aug 26, 2020 at 8:07
  • 2
    That blog post, for reference: meta.stackexchange.com/questions/353983/…
    – jtbandes
    Commented Sep 9, 2020 at 17:33
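
The language-hint exchange in the comments above can be sketched roughly as follows. This is an illustration only, not how Stack Exchange or highlight.js parse hints: `parseLanguageHint` and `tagsToLanguages` are hypothetical helpers, and the tag list is made up. The only library calls used are `hljs.getLanguage` and the `languages` option of `hljs.configure`, which restricts automatic detection to the listed languages.

```javascript
// Parses a hint comment such as
//   <!-- language-all: lang-ruby -->   or   <!-- language: lang-js -->
// Returns { scope: "all" | "next", language } or null when no hint is found.
function parseLanguageHint(text) {
  const match = /<!--\s*language(-all)?\s*:\s*([\w#+.-]+)\s*-->/.exec(text);
  if (!match) return null;
  return {
    scope: match[1] ? "all" : "next",
    language: match[2].replace(/^lang-/, ""), // drop the prettify-style prefix
  };
}

// Keeps only the tags that the highlighter recognizes as languages.
function tagsToLanguages(tags, isKnownLanguage) {
  return tags.filter(tag => isKnownLanguage(tag));
}

// Only runs on a page where highlight.js is actually loaded.
if (typeof hljs !== "undefined") {
  const hint = parseLanguageHint("<!-- language-all: lang-ruby -->");
  const candidates = tagsToLanguages(
    ["ruby", "rails", "performance"], // hypothetical question tags
    name => Boolean(hljs.getLanguage(name))
  );
  // An explicit hint wins; otherwise constrain detection to the tag languages.
  hljs.configure({ languages: hint ? [hint.language] : candidates });
}
```

Per the comment from isagalaev, the explicit filtering step may be unnecessary, since unrecognized tags are reportedly ignored; it is kept here to make the idea visible.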

1 Answer

The results of my testing were disappointing - we will not go ahead with highlight.js as a syntax highlighter for Stack Overflow and our other sites.

It is not too big

It is - I generated a custom set of languages that mirrors the exact set we currently support with prettify. Uncompressed it is 57kb, compression takes it down to 22kb - compared with the 42kb (17.4kb compressed) for prettify. That's an extra 5kb minimum for millions and millions of requests a day (which doesn't consider the css file and that css class names used by highlight.js are much longer than those prettify uses).

This size concern only grows with adding more languages. The full set of languages comes in at almost 500k (I did not bother to test gzip with that) - adding more languages is not free.

It's way faster

It isn't, not in my testing. I ran highlight.js on a few examples on my local machine and compared its performance with prettify. Don't forget that we have a highly nested DOM, and many "benchmarks" are done on a very simple page, which is not indicative of performance on Stack Overflow.

In my tests, CPU time for highlight.js was anything between two and four times higher than for prettify (in some cases translating to > 120ms difference). This was actually noticeable with the highlighting flicker (when code changes from normal text to highlighted text) coming in later for highlight.js.

I have also tested by using console.time around our highlighting calls - highlight.js consistently performed worse than prettify.
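
A minimal sketch of that kind of measurement, for readers who want to try it on their own pages. This is not the code used in the test above; `timed` is a hypothetical helper, and the selector `pre code` is an assumption about where the code blocks live.

```javascript
// Wraps a call with console.time/console.timeEnd so the elapsed time is
// printed to the console under `label`. Returns whatever `fn` returns.
function timed(label, fn) {
  console.time(label);
  try {
    return fn();
  } finally {
    console.timeEnd(label); // runs even if fn throws
  }
}

// Only runs in a browser page where both libraries are loaded.
if (typeof document !== "undefined" &&
    typeof PR !== "undefined" && typeof hljs !== "undefined") {
  timed("prettify", () => PR.prettyPrint());
  timed("highlight.js", () => {
    document.querySelectorAll("pre code").forEach(block => {
      hljs.highlightBlock(block);
    });
  });
}
```

Note that running both highlighters over the same elements in one page load would skew the second measurement; in practice each would be timed on a fresh load of the same page.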

It's significantly better / It's actively developed

"better" needs context - sure, it can detect nested languages and yes, it has many more supported languages. But these features come at a cost - it is larger and slower for us - and we are not willing to pay this cost.

I won't argue with "actively developed" - that is true. Hopefully, the developers will be able to make it smaller and faster, so much so that we would be able to replace prettify with it.


An additional concern is that there doesn't seem to be a way to add languages dynamically - we do this with mathematica.se, where we add mathematica highlighting just for that site, as an added feature (this is done as the highlighting file is big and doesn't compress well - mathematica has a ton of keywords).


We're not doing it (at the moment)

These differences mean we will not go forward with highlight.js at this time.

If the size and performance of the library improve and we can find a way to add languages dynamically, we can test again and possibly replace prettify.

  • Thanks for humoring us and for the detailed response. Sorry it didn't pan out — hopefully some of these issues can be eventually resolved, but I won't get my hopes up too much.
    – jtbandes
    Commented May 17, 2016 at 16:03
  • 6
    Thanks for evaluation. Indeed, our top priority is correctness and it does come at the expense of size, in particular. As for dynamically adding languages, it can totally be done by simply including <script src="<language-file>"></script> after the core module. And the core module can be stripped off languages altogether.
    – isagalaev
    Commented May 17, 2016 at 16:35
  • 3
    @isagalaev - thanks for letting me know of how adding languages works. I did not see that in the documentation and wasn't sure it was possible. That's one concern less... still - size and performance are outstanding concerns for us.
    – Oded
    Commented May 17, 2016 at 16:39
  • 2
    @Oded we don't advertise adding languages dynamically as it is slower than having a single package. But I understand your case, too. As for size, it is a deliberate choice. We do want that many keywords/built-ins listed explicitly. Doing it any more generically kills quality (like highlighting CamelCase regardless of semantics, like prettify (and prism?) do.) So it's our feature, and I'm totally fine if it simply doesn't fit SO.
    – isagalaev
    Commented May 17, 2016 at 16:51
  • 2
    Thank you for checking it out. So...any hopes for MATLAB with prettify? Should we give up, or keep nagging you folks?:)
    Commented May 17, 2016 at 17:15
  • 3
    @AndrasDeak - adding the MATLAB prettify script blows up the size to 144kb, 51kb minified, so that's not happening. The author of the answer on that post says there is a version without the keywords, but that doesn't seem to exist on the GitHub repo - not sure how useful something like that would be, but a large part of the increase is in the keywords. Even with such a version, I would need to re-test.
    – Oded
    Commented May 18, 2016 at 10:38
  • @Oded thank you very much for the response. "It won't work" is perfectly fine, in that we now know what's going on, and what to expect:) Thank you once again for your time, I won't be nagging any of you with this in the future.
    Commented May 18, 2016 at 10:46
  • 3
    @AndrasDeak - if the overhead wasn't so high... that's why I am talking about a stripped down version (as mentioned in that answer). If that kind of version were useful to MATLAB posts and were small (for comparison - most language files for prettify are 1kb-4kb in size - before minification), we would certainly consider it.
    – Oded
    Commented May 18, 2016 at 10:53
  • 1
    @Oded the stripped down version is the last link in Amro's answer: github.com/amroamroamro/prettify-matlab/raw/no_functions/js/…
    – beaker
    Commented May 18, 2016 at 16:40
  • 1
    @beaker - ah, thanks - couldn't figure that one out myself. Will test when time allows.
    – Oded
    Commented May 18, 2016 at 16:41
  • what about asvd.github.io/microlight ? 2.2k for any programming language (well, kind of :-))
    – asvd
    Commented Jun 2, 2016 at 15:52
  • "That's an extra 5kb minimum" -- maybe my question is naive, but is that really an issue given local and CDN caching?
    – Raphael
    Commented Jan 11, 2017 at 9:26
  • 14
    @Oded I guess I don't understand why all (?) the JS you serve comes with max-age=0.
    – Raphael
    Commented Jan 11, 2017 at 12:24
  • 4
    Is there any particular reason you don’t do the syntax highlighting as part of rendering the Markdown on the server-side (and caching it alongside)?
    – idmean
    Commented Jul 30, 2018 at 12:53
  • 4
    You include every language on every page? Seems like easy optimization pickings if you're really concerned about 5kb.... It's worth noting that @isagalaev's comment pretty much addresses the size concerns, at the expense of programmer time to dynamically load languages as needed. The performance concerns remain of course :)
    – DylanYoung
    Commented Nov 6, 2018 at 21:21
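
The per-site language loading discussed in the comments above (a separate `<script>` per language, included after the core module) could look roughly like this. It is a sketch under stated assumptions: the site table, the helper names, and the URL layout are all hypothetical and not taken from Stack Exchange or highlight.js.

```javascript
// Hypothetical per-site table: extra language files a given site should load.
const EXTRA_LANGUAGES = {
  "mathematica.stackexchange.com": ["mathematica"],
};

// Returns the extra language names for a host, or an empty list.
function extraLanguagesFor(host) {
  return Object.prototype.hasOwnProperty.call(EXTRA_LANGUAGES, host)
    ? EXTRA_LANGUAGES[host]
    : [];
}

// Appends a <script> element for one language file to `doc.head` and returns
// a promise that settles when the script loads or fails.
// `doc` is normally the global `document`.
function loadLanguageScript(doc, url) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement("script");
    script.src = url;
    script.onload = () => resolve(url);
    script.onerror = () => reject(new Error("failed to load " + url));
    doc.head.appendChild(script);
  });
}

// Only runs in a browser, after the core highlight.js script has been included.
if (typeof document !== "undefined" && typeof location !== "undefined") {
  for (const name of extraLanguagesFor(location.hostname)) {
    // The path below is an assumption, not a real Stack Exchange URL.
    loadLanguageScript(document, "/js/hljs-languages/" + name + ".min.js")
      .catch(err => console.warn(err.message));
  }
}
```

As isagalaev points out, loading languages this way is slower than shipping a single package, so it trades some load time for a smaller default bundle.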
