317

Update 2020-09-24

This is now live network-wide.

Update

This is now live on Meta Stack Exchange and Meta Stack Overflow. Any bugs and feedback can be posted here as an answer.


I’m Ben and I’m a dev on the Teams team here at Stack Overflow - we're the team focused on building the private Teams experience on SO. I’ve recently been working on our post editing experience and I’d like to show off some preliminary work that’s coming to the network soon.

TL;DR

We’re switching our code block highlighting library from Google Prettify to highlight.js. All your favorite languages are still supported and you won’t need to change how you write posts at all. The only major change is how we render highlighted code blocks. In addition, we’re taking this opportunity to introduce our new highlighting theme as well. We’re rolling this out in stages, starting with MSE/MSO, with other sites to follow. (See the FAQ at the bottom of this post for dates.)

Some history on Prettify / code block highlighting

I tried to do some digging on when we first adopted Prettify, but it seems that its history goes allll the way back to the site’s earliest days. The earliest reference I could find was from back in ‘08. I asked around internally too and the best answers I could get were along the lines of:

¯\_(ツ)_/¯ - Everyone

Ask Atwood - Dean

and

If I had to guess, it was something along the lines of "there's not a lot of options, and this is used by Google so it's probably fine" - Kevin

Eventually the wonderful Tim Post pointed me to Stack Overflow Podcast #11, aired June 2008, where Jeff and Joel talk about how incredible it was for the time and how Google uses it themselves for syntax highlighting in Google Code (RIP). They also put out a call for alternatives, which I’d have to assume came up short.

Why the change?

Google Prettify hasn't been under active development for a while, and was officially discontinued by Google in April, as you all have let us know repeatedly. This means that no new language syntaxes¹ are being supported and that existing language syntaxes aren’t getting updated to support all their new features. It’s time to move on to something that supports modern front-end workflows (such as providing an npm package, for starters) and continues to evolve to meet the needs of developers.

What’s changing about how I write posts?

Absolutely nothing :). There is absolutely no change to how posts are written. We still support all the Prettify language aliases you know and love, along with the new aliases from highlight.js. However, we are not adding support for any new languages at this time, instead choosing to keep the initial changeset simple and aiming for current feature parity instead. All the current markdown syntax is still supported, along with determining code highlighting from tags and site defaults.

So what is changing?

The “only” changes are visual. We are updating the client-side code block renderer that styles your code in posts (Questions, Answers, etc) and in the editor preview. Syntax autodetection when a language is not specified should be much better overall, along with syntax highlighting coverage in general. The biggest outward facing change for the typical user is going to be our new theme (see below for details).

Why highlight.js? Why not…

Why did we pick highlight.js over Prettify? Well, first off, you asked for it specifically. More convincingly, it’s open source, actively maintained, and overall just a solid product.

We’re extremely concerned about performance here at SO (both on the client and on the server), so we needed to ensure that this major change on our hottest page on the site didn’t negatively impact our users. There was some prior investigation into highlight.js's performance back in 2016, but I figured we should give it another shot.

In our internal performance benchmarks highlight.js scored better than Prettify consistently across all browsers (except macOS Safari 13.1, where it was actually a bit slower)². It is a tad heavier than Prettify³, weighing in at an extra ~17kB (over the wire) after including all the languages we support across the network. This extra weight gain was acceptable to us as a tradeoff for what we were getting in return.

Why did we pick highlight.js over other contenders? Simply put, it was the best option that served our needs. We needed a library that we could easily control for use in the browser (deferred loading, theming specific elements), while also being simple to consume via an npm package, not needing specific build steps or a special babel plugin to pull in only the parts we need. Additionally, we could run it on the server (via Node.js) to unify our syntax highlighting in our Stacks documentation, giving us a single syntax highlighter across our products. Also a major plus was the ability to tokenize the highlighting result for use in our new editor (stay tuned!).

What are some potential drawbacks?

The most obvious not-quite-a-drawback is that language autodetection is different from Prettify. In general, it will be much more accurate, but will possibly end up with a different result than what Prettify would give us. This isn’t so much a bad thing, as it is just a thing that might take some getting used to if you’re a Prettify power user.
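To make the idea of score-based autodetection concrete, here is a toy sketch in Python. It is not highlight.js's actual algorithm or API: the keyword tables and the `guess_language` helper are made up for illustration. It scores a snippet against each candidate language and keeps the best match, which also hints at why very short snippets are harder to classify.

```python
import re
from typing import Optional

# Hypothetical, heavily abbreviated keyword tables (illustration only).
KEYWORDS = {
    "python": {"def", "import", "elif", "None", "lambda", "self"},
    "c": {"struct", "sizeof", "typedef", "void", "int", "char"},
    "bash": {"echo", "fi", "esac", "then", "done", "export"},
}


def guess_language(snippet: str) -> Optional[str]:
    """Return the language whose keywords occur most often, or None if nothing matches."""
    tokens = re.findall(r"[A-Za-z_]\w*", snippet)
    scores = {
        lang: sum(1 for tok in tokens if tok in words)
        for lang, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("struct List *n = malloc(sizeof(struct List));"))
print(guess_language("x = 1"))  # too little evidence to decide
```

A snippet such as `x = 1` contains no distinguishing tokens, so this sketch declines to guess; a real detector faces the same lack of evidence and has to fall back on a default.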

As mentioned earlier, the overall code bundle size is a bit bigger too. The vast majority of users wouldn’t even notice the change, which would only affect the first fetch since the browser will cache the file locally for subsequent hits anyways.

The last item is a bit of a personal preference. highlight.js tends to not highlight punctuation, which makes it a bit less colorful than other highlighters. This is considered a feature. Not a deal breaker by any means, but something I should mention regardless.

Designing the new theme

To offer some insight into how the new theme was designed, I reached out to the author, principal design systems designer Aaron Shekey.

Since we’re upgrading, we wanted to take this opportunity to design a Stack Overflow-flavored theme that takes advantage of newer tech like CSS variables that are aware of both light and dark modes. While we’ve improved it over the years, it’s highly likely that the current production theme simply used the stock colors provided by Prettify.

We’d need a theme that could work in both light and dark modes, was informed by Stack Overflow’s branded colors, and introduced a bit more contrast throughout.

Thankfully, we weren’t starting from scratch. When we built our Stacks documentation, we’d spent some time making our Jekyll theme display code snippets that got pretty close to accomplishing those goals. However, this was before dark mode was a thing, and we’d only built a single theme that assumed a fixed dark background. We’d have to extend this theme to light mode and revisit contrast along the way.

Using the Stacks documentation as a playground, we’ve now got themes in both light and dark modes that look like Stack Overflow and add or maintain contrast levels. We did our best to accomplish a contrast level of AAA, with a few variables dipping into AA. You can see the exact measurements commented in our colors constants file.
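For readers unfamiliar with the AA and AAA levels mentioned above, they come from the WCAG contrast-ratio formula. The following Python sketch implements that published formula; the helper names are illustrative and the sample colors are arbitrary, not taken from the Stack Overflow theme. For normal-size text, WCAG 2.x asks for a ratio of at least 4.5:1 for AA and 7:1 for AAA.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as 0-255 ints."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, from 1 (identical) up to 21."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def wcag_level(fg, bg):
    """Classify a foreground/background pair for normal-size text."""
    ratio = contrast_ratio(fg, bg)
    if ratio >= 7:
        return "AAA"
    if ratio >= 4.5:
        return "AA"
    return "fail"


# Arbitrary sample: mid grey text on a white background.
print(wcag_level((118, 118, 118), (255, 255, 255)))
```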

Here are a few screencaps of the new theme taken from my local development environment (click on the images to expand them). You can preview more languages (with an easy dark/light mode toggle) over at the Stacks docs.

Before

hljs before

After

hljs after

FAQ

  • Q: When is the rollout happening?

    A: We're planning to roll this out to meta.stackexchange and meta.stackoverflow on Thursday, September 10th. Rollout to the rest of the network is scheduled for September 24th, after the initial testing period. This is a soft rollout date, dependent on any bugs/feedback we get from the community during the testing period.

  • Q: What if I find a bug?

    A: Report bugs in an answer (one per answer) to this question. We'll keep this open for a couple/few weeks (until Friday, October 2nd) to address any immediate issues and then we'll update this post and ask you to post bugs as new questions after that time.

Footnotes

1 I checked, plural of syntax is syntaxes. Take that spell-checker!

2 Client-side benchmarks being what they are, we measured anywhere from a ~49% to a ~60% increase in the rate of ops/second depending on the machine and browser. The outliers were Safari 13.1, which had a ~29% decrease (favoring prettify), and Edge “legacy”, which scored a ~279% increase over prettify!

3 Size comparisons were done comparing the prettify-full.en.js file taken from production vs the new highlight.pack.js bundle. Both were minified and served via a webpack-dev-server instance with the compress flag set (enabling gzip support). They were then included onto a regular html page with script tags and measured using the built-in browser dev tools. At the time of measurement, prettify landed at 23.3kB over the wire (meaning that the file was minified + gzipped) vs highlight.js at 40.7kB. This is a 17.4kB increase, or about a 74.7% increase in file size.
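The arithmetic in footnote 3 can be re-derived in a few lines of Python. The two sizes are the figures quoted in the footnote; nothing is measured here.

```python
prettify_kb = 23.3  # prettify-full.en.js, minified + gzipped (from the footnote)
hljs_kb = 40.7      # highlight.pack.js, minified + gzipped (from the footnote)

increase_kb = hljs_kb - prettify_kb
increase_pct = increase_kb / prettify_kb * 100

# Prints: 17.4 kB larger, 74.7% increase
print(f"{increase_kb:.1f} kB larger, {increase_pct:.1f}% increase")
```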

52
  • 61
    Good to see that the Teams devs are also bringing features to the public network!
    – Luuklag
    Commented Sep 8, 2020 at 18:40
  • 13
    I have some updating to do then, for example on this one: What is the default language for the syntax highlighter? ...
    – rene
    Commented Sep 8, 2020 at 18:56
  • 5
    @rene I have a (incomplete, I'm sure) list of posts that need updating after this goes live. I'll add this one to the list. Thanks for bringing it to my attention
    – Ben Kelly StaffMod
    Commented Sep 8, 2020 at 19:04
  • 7
    I hope the syntax highlighting FAQs (both here and on MSO) are part of that list as well. Commented Sep 8, 2020 at 19:18
  • 2
    @SonictheMaskedWerehog Yup, they both are! Thanks for checking in though. Better safe than sorry with those posts since they're the "source of truth" for this feature.
    – Ben Kelly StaffMod
    Commented Sep 8, 2020 at 19:29
  • 13
    How badly does it choke when asked to highlight the omni-glot? (Yes, that's a single piece of code that's a different valid program in each of 294 languages.)
    – Mark
    Commented Sep 8, 2020 at 20:08
  • 10
    @Mark Honestly, not too bad. Speed was (visually, didn't actually benchmark it) comparable to my test page that had ~16 different languages on it. For the curious, the autohighlight detection marked that code snippet as bash.
    – Ben Kelly StaffMod
    Commented Sep 8, 2020 at 20:25
  • 15
    @SonictheMaskedWerehog why invest resources into something that isn't the core product?
    – Braiam
    Commented Sep 8, 2020 at 22:41
  • 2
    @Braiam I thought SO was their core product. Syntax highlighting is a core part of that. Commented Sep 8, 2020 at 23:03
  • 31
    Q&A is the core product, @Sonic, not syntax highlighters. Syntax highlighting is just a small portion of that, and not in any way critical to the platform's success. It isn't something Stack Overflow needs ownership or control of. Just as you shouldn't develop your own JavaScript framework, you shouldn't develop all your own tools when there's something already out there that does the job well. Commented Sep 9, 2020 at 0:07
  • 4
    Given that highlighting occurs client side, how about a feature that allows the user to specify the highlight.js template(s)? It would be very cool if code on SO looked the same as my IDE. Familiar (color) styling makes code easier to read.
    – Mike
    Commented Sep 9, 2020 at 1:02
  • 6
    Please, ask your questions as answers, as that gives Ben more room to respond individually!
    – Catija
    Commented Sep 9, 2020 at 6:19
  • 13
    Whatever you do, please don't change the line-height for code blocks
    – hkotsubo
    Commented Sep 9, 2020 at 12:53
  • 3
    Don't forget to update this page to mention highlight.js instead of Google Prettify.
    – Clonkex
    Commented Sep 17, 2020 at 1:43
  • 2
    @Clonkex Already good to go! You can see the update on the meta version of that page (happens automatically when turned on). Good looking out though
    – Ben Kelly StaffMod
    Commented Sep 17, 2020 at 19:21

38 Answers

182

Can Stack Exchange please update to newer versions of Highlight.js on a regular cycle, rather than only on request?

As I said in my prior post reporting Google's discontinuation of Prettify, the process of filing bugs and feature requests with syntax highlighting would be quite drawn out and take a needlessly long time. The process was like this:

  1. File a bug with Prettify, which would take 6-8 months to be resolved, if at all. (I filed a bug in 2014 and it still hadn't been resolved by the time Google put the project to rest.)
  2. Once the request was resolved in Prettify, file another feature request here on Meta to request that SE update to the newer version. This would take the typical 6-8 week response time, but would often take longer than most requests because as far as I can best tell, they were only actioned when a developer happened to stumble on them.

As far as I can tell, Highlight.js is very actively maintained and requests with it are resolved fairly quickly, so #1 isn't an issue anymore (at least not in the current term). However, #2 will still remain an issue if SE sticks with their pre-existing model of only updating to newer versions on request.

Can Stack Exchange please actively update to newer Highlight.js versions on a regular cycle (not necessarily immediately after they're released, as I understand that'd be too onerous), rather than only updating to newer highlighter versions upon request? This would eliminate the problem in #2 and make the overall process significantly faster as one need only file the bug or feature request with Highlight.js and it'd be fixed in SE fairly quickly.

1
  • 2
    Comments are not for extended discussion; this conversation has been moved to chat.
    – Ben Kelly StaffMod
    Commented Oct 30, 2020 at 21:29
118

I found it rather frustrating that I couldn't easily see how the before/after pictures differed, so I did a bit of cutting and pasting so I could look at the before/after side by side to compare them more easily. Then it occurred to me that others might like to do the same, so feel free to have a look. Should be the same basic info as in the question, but arranged for more meaningful viewing.

First dark mode:

enter image description here

And then light mode:

enter image description here

Sorry, my cutting wasn't quite perfect, so (especially in light mode) you can see some dark lines that really shouldn't be there. But even if there's a little extra junk, at least you can do a real comparison so the changes are reasonably apparent.

To me, the new color scheme appears to have at least a couple of different general types of problems.

One is technical accuracy (i.e., accuracy in the tokenization itself). For example, looking at the Python example, if is in one color, and None in a different color (which appears to be the same color for 0, 1, and 0b101 and for someFunc and SomeClass). if and None are both keywords, so it would appear reasonable that they both be the same color. It doesn't seem reasonable or useful for two keywords to be in clearly different colors, and one of them in the same color as some identifiers and literal values.

Another is the choices of colors themselves. Generally speaking, for comfortable viewing we want to balance between two extremes. If there's too little difference between colors, it's not always clear whether two things are the same or different colors. When colors can't be distinguished easily, we lose much of the benefit of using coloring to start with.

At the same time, we don't want too much contrast, especially when two things are immediately adjacent to each other. If we do this, viewing simply becomes uncomfortable¹.

In this case, we see what may be some of the first problem. As previously mentioned, in the Python example, None, someFunc, SomeClass, 1, 0 and 0b101 are all shown in what looks like the same color. It's possible that this isn't really a parsing problem--maybe it's assigning a unique color to each, and they just happen to be so similar that we can't distinguish them.

The old color scheme also differentiates between the class name and the function name, where the new one seems to use the same color for both. Given that they're both syntactically identifiers, it's open to argument that this doesn't affect accuracy (as such), but it seems pretty clear to me that the old scheme is providing more useful information.

In the dark mode pictures, we see at least a few clear-cut cases of excessive contrast. The most obvious are the parameters (param1 and param2) shown in bright white against a deep-black background. In this case, we've pretty clearly gone beyond the level of contrast that most people can look at comfortably. As an aside, there are a few cases where it's a bit more reasonable to break or at least bend this rule a bit. For example, if you're coloring something with a very small area (e.g., a period or comma) you can often get away with a bit higher contrast than if the area were larger.

At least in my opinion, the light mode version of the new coloring fares at least somewhat better in this respect. We still have None colored to match the identifiers and literals, and mismatching if. On the other hand, the background in this case is a light grey, and the parameter names are in a somewhat darker grey, so the contrast range is considerably more manageable.

Given a wide audience, we'd also like the color schemes to work well for people with color impaired vision. The most common color vision impairment is called Deuteranomaly. If we run the pictures through a filter, we can see a simulation of approximately how these would look. For example, here's the light-mode Python code with simulated deuteranomaly vision:

enter image description here

Here we see that in the new color scheme, the comment is only barely distinguishable from the preceding code, and even less so from the literals (e.g., 'gre\'ater'). It might not be so close that I'd consider it a clear failure in this regard, but it's enough to make me at least a little uncomfortable (and at least with respect to serving people with color vision deficiencies, pretty close to an outright failure).

The old color scheme is clearly superior in this regard--although contrast is certainly reduced in some cases, everything that started out as a separate color remains quite easily distinct.

There are, of course, other forms of color vision deficiency, up to and including truly complete color blindness. Fortunately, that's pretty rare. Deuteranomaly is the most common, and dealing well with it will frequently also work out well for most of the other somewhat less common cases (e.g., Protanomaly, Tritanomaly, etc.)

Unfortunately, it's fairly difficult to do automated testing of when colors have enough contrast for the difference to be easily visible. There are computations for "delta E" to tell you how much difference there is between two colors, but eyes are easily deceived, so (for example) the surroundings can make two areas with identical colors look obviously different, or areas with different colors difficult to distinguish. About the best we can hope for in a case like this (retrofitting to a system, affecting far too many pages to review each individually) is to get rid of obvious problems.
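As a concrete reference point for the "delta E" computation mentioned above, here is a Python sketch of the simplest variant, CIE76, which is the Euclidean distance between two colors in CIELAB space. The conversion assumes sRGB input and a D65 white point, and the function names are illustrative; newer formulas such as CIEDE2000 weigh the terms differently and are not shown. As the paragraph says, a number like this cannot account for how the surroundings change what the eye perceives.

```python
import math


def srgb_to_lab(rgb):
    """Convert an sRGB color (0-255 ints) to CIELAB, assuming a D65 white point."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in Lab space."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))


# Two arbitrary, similar greens; a small result means they are hard to tell apart.
print(round(delta_e76((0, 128, 0), (0, 136, 0)), 2))
```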


  1. Now rarely relevant, but back in the days of CRTs you could get away with more in this regard, because individual pixels tended to have some degree of gradient at the edges, so even the brightest white against the darkest black still had at least some degree of gradient from one to the other. That's much less true with LCDs though, so we have to be more careful as the technology no longer covers for our mistakes.
13
  • 7
    Looking beyond the individual colors in isolation, however, we see far too high of contrast. The bright white of the variable names on the dark background is literally painful. The Halloween orange used for the function name also contrasts much too strongly with the aforementioned putrid green. Commented Sep 9, 2020 at 6:48
  • 11
    The new color scheme is also problematic from a functional viewpoint. For example, it appears to use the same color for the name of the class, the function, the numeric literals, and the None (in the Python example). Some of it doesn't even make sense syntactically--for example, in Python both if and None are keywords, but the new colorizer has given them different colors. Commented Sep 9, 2020 at 6:54
  • Oh, and just in case the subject should arise: yes, I calibrate my monitors on a regular basis, and I keep the background lighting dim enough to assure that my eyes are adjusted almost exclusively to the white point of the monitors, not the room (though it's at a fairly reasonable approximation of the standard D55 illuminant in any case). Commented Sep 9, 2020 at 7:07
  • 11
    I prefer the old colors, because they look like Visual Studio. The new colors feel really weird (especially the green strings, which I always associate to comments).
    – Métoule
    Commented Sep 9, 2020 at 8:01
  • 7
    I disagree that the new scheme is at all garish, though obviously that is a subjective consideration, and they'll never keep everyone happy with any change (or no change!) in that area. As mentioned by @Métoule though, green for string literals could be an issue given several very common tools use green for comments. I like that you are keeping comments grey though. In any case I find what is highlighted and the contrast levels far more important than the specific colours, I quickly get used to new colour schemes as long as I can see the key things I find helpful to see identified. Commented Sep 9, 2020 at 9:49
  • 11
    Having successfully ranted against a design change - the specific issues probably should be in the questions, especially things like "Some of it doesn't even make sense syntactically--for example, in Python both if and None are keywords, but the new colorizer has given them different colors." One finds things like insipid and garish are often a matter of personal taste though Commented Sep 9, 2020 at 10:37
  • 3
    @Métoule the IDEs I use have green for strings, grey for comments.
    – OrangeDog
    Commented Sep 9, 2020 at 13:48
  • 11
    Thank you for creating the side-by-side before/after shots. I didn't consciously create them in any certain manner other than "I took the old shots at the same time and the new shots at the same time". I'll pass your feedback on the color scheme to the designer. I do see your point about some colors being potentially mismatched or not matching other IDEs. I'll do a bit of cross referencing and see if we can't make some tweaks there for a quick win.
    – Ben Kelly StaffMod
    Commented Sep 9, 2020 at 14:28
  • 11
    Side note: I found the info in your comments to be very helpful. I'd recommend adding it to your post to help flesh out the statement/request a bit more and to help deter drive-by downvotes for "personal taste" vs "specific issues"
    – Ben Kelly StaffMod
    Commented Sep 9, 2020 at 14:30
  • 41
    Re Python specifically, None and if are both keywords, but they have different semantic roles: if is control structure, None is a constant. Highlighting them in different colors is common in IDEs and I actually think it's better that way.
    – zwol
    Commented Sep 10, 2020 at 12:59
  • @zwol: I can see at least some merit to that line of thinking. My personal preference would be to start from a high level and make things more or less hierarchical, so (for example) key words are in shades of green and values in shades of blue. In this case, I can see where it would make sense for None to be more or less a teal color to accurately reflect that it's a keyword that's typically used in a value context, so to speak. Commented Sep 10, 2020 at 18:10
  • In IDLE, repl.it (online IDE), and PyCharm, if and None are the same color by default.
    – Rainbolt
    Commented Sep 14, 2020 at 14:12
  • 1
    If the old colour scheme is better, as it seems to me that it is, is it possible to create a "theme" for the new highlighter to match the old highlighter's colour scheme? Commented Sep 14, 2020 at 22:19
60

I'd like to say that I appreciate this post.

It is clear, very informative, very detailed, and to me shows that person's concern for the community.

Of course, there will always be different opinions on the result ("I prefer the former highlighting", "I prefer the new one!"), but that is inevitable.

I find the reasons to change (and the choice) compelling enough, and the resulting highlighting is pleasing to the eye.

(I see some concerns about several things showing up in the same color: this is inevitable. The highlighting is there to put successive parts in different colors, making transitions visible and letting the overall structure appear, not to give everything its own specific color.)

Thank you, @ben-kelly, for the information

5
  • 5
    Thank you for the kind words! I'm glad the post resonated with you. If you have any specific concerns about certain items looking funny (either overall or with a specific language), I'm happy to field those as well
    – Ben Kelly StaffMod
    Commented Sep 9, 2020 at 14:56
  • @BenKelly: thank you! And about corrections: I'll let others much better than me answer that (and I personally find it fine so far. And it is probably good to keep things as 'default/vanilla' as possible for now until it is validated and fully stress-tested, unless some glaring problem is pointed out) Commented Sep 9, 2020 at 15:07
  • 2
    @BenKelly: I'd just advise (it's probably done): test with several different people having several types of 'color vision deficiencies', as it is a common problem and this affects a lot of people (some webpages or charts rely only on color to convey some information, and they are not usable by all). Whenever possible other means (graphics, signs, italics, etc) help as well (for syntax coloring, there is not much option, but contrast or thickness may help differentiate some colors. I'm quite sure the devs of the .js took this into account: it may even explain their color choices?) Commented Sep 9, 2020 at 15:16
  • 2
    Believe it or not, we did actually do some testing with Chrome's new-ish Emulate vision deficiencies tool. You can see some of my comments (w/ pics) on the public Stacks PR. Obviously, emulation is not perfect, but definitely better than not checking at all.
    – Ben Kelly StaffMod
    Commented Sep 9, 2020 at 15:29
  • 1
    @BenKelly: Good to know!
    – V2Blast
    Commented Sep 11, 2020 at 0:46
30

Current maintainer of Highlight.js here, thought I'd add a few quick comments.

highlight.js tends to not highlight punctuation, which makes it a bit less colorful than other highlighters. This is considered a feature. Not a deal breaker by any means, but something I should mention regardless.

This is something I'm open to improving if someone wants to work on PRs and figure out a good way to go about handling this (work with existing themes, not be invasive, etc). https://github.com/highlightjs/highlight.js/issues/2500

I assure you that Mathematica Stack Exchange will be supported at launch. Due to the large size of the mma language definitions, the language is actually split out from the rest.

Some languages MIGHT also be possible to Highlight with a wildcard vs a list of EVERY keyword under the sun... I'm not sure if Mathematica would be one such language or not. Some of our languages are quite heavy because the keyword approach was just simpler (and more accurate). That said just breaking out the files and loading them (as needed) is probably the best solution for some of the less popular languages. And would also help with auto-detect speed.

For example, looking at the Python example, if is in one color, and None in a different color (which appears to be the same color for 0, 1, and 0b101 and for someFunc and SomeClass). if and None are both keywords,

We've always highlighted literals and keywords differently. For Python False, None, and True are currently defined as literals.
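To illustrate the distinction, here is a small Python sketch that classifies tokens the way this answer describes: `False`, `None`, and `True` are treated as literals, while other reserved words are keywords. It uses Python's standard `keyword` module and a hypothetical `classify` helper; it is not highlight.js's grammar definition.

```python
import keyword

# Per the answer above: these three are defined as literals, not keywords.
PYTHON_LITERALS = {"False", "None", "True"}


def classify(token: str) -> str:
    """Return the highlight class a token falls into in this sketch."""
    if token in PYTHON_LITERALS:
        return "literal"
    if keyword.iskeyword(token):
        return "keyword"
    if token.isidentifier():
        return "identifier"
    return "other"


for tok in ("if", "None", "someFunc"):
    print(tok, "->", classify(tok))
```

Checking the literal set first matters here, because Python itself also reports `None` as a reserved word; the split into two classes is a highlighting decision rather than a language rule.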

the first 5 inline comments are not parsed as comments at all.

Definitely a bug (and should be an easy fix), a GitHub issue would be appreciated. :-)

Language auto-detection for assembly language seems to be broken.

Auto-detect is on a "best effort" basis... the smaller the snippet the worse the auto-detect, but some languages are also much harder to auto-detect than others. If you really think there is an OBVIOUS issue (a huge snippet that is constantly flagged incorrectly, etc) then a GitHub issue would be more than welcome...

Different flavours of assembly language use different comment characters, so this is a somewhat thorny problem.

Indeed, and that's why we have multiple assembly grammars, not just a single one. For exactly this reason, I have no idea if it would even be possible to have a single grammar.

6
  • 2
    Off topic comment: the links to your GH profile on this page are broken, they point to github.com/yyyc514 which does not exist. Can you please fix it? Commented Sep 21, 2020 at 14:45
  • 1
    I'm not sure I can update the website itself (I'll ask - it's still managed by the original author), but I've updated the CHANGES.md file on GitHub and it'll be correct in future releases notes. Also added a placeholder/pointer profile so no one should get too lost in the future. Thanks for pointing it out! Commented Sep 21, 2020 at 15:02
  • 1
    Nice trick, @Josh. Thanks! Commented Sep 21, 2020 at 15:30
  • 2
    @JoshGoebel Thanks so much for checking in! Really excited to be using highlight.js. I've written a couple of projects against it and I can say that I'm really digging it for the most part. Looking forward to giving back to the project as well soon. Once I can find the time, I have a few suggestions / PRs to file against it ;)
    – Ben Kelly StaffMod
    Commented Sep 21, 2020 at 19:35
  • @JoshGoebel I think a fair few SO users would like to see the colouring match that of the usual IDE for whatever language. Currently on SO, it does not (e.g. strings in green instead of purple for C# and VB.NET). Is the colouring due to highlight.js or due to customisation by SO? Commented Sep 30, 2020 at 15:52
  • 1
    @AndrewMorton In general SO controls the colors with their custom theme. I imagine the goal is for it to mostly look consistent across the platform - not for each language to have an entirely custom look. That's my understanding. Commented Oct 1, 2020 at 16:56
28

<!-- language-all: lang-none --> hint doesn't seem to work anymore

This post has a <!-- language-all: lang-none --> hint at the top of the post to prevent all the code blocks in it from being highlighted. I tried changing lang-none to none and it still didn't work. (As you say in your post, Prettify identifiers will still continue to work even after the change.)

We were told at the time of the CommonMark migration that <!-- language-all: [language] --> hints would continue to be supported, unlike the <!-- language: [language] --> syntax which was being deprecated.

This issue seems to be specific to the lang-none and none hints as part of this style of HTML comment; other ones seem to be working fine. As an example, this post contains such a comment to indicate C as the highlighting language, and the below snippet is highlighted in C:

#include <stdio.h>

(To test, I also changed the comment to indicate Python and it highlighted the above as Python.)

It seems to work for individual code blocks, using the code fence notation (i.e. ```none and ```lang-none):

#include <stdio.h>

In summary: <!-- language-all: lang-none --> and <!-- language-all: none --> don't seem to work to disable syntax highlighting for a particular post.

4
  • 5
    @BenKelly One week later, what came of the review? Commented Sep 19, 2020 at 5:08
  • 1
    Sorry, forgot to respond to this one. This is currently working identically to other sites where highlight.js is not turned on yet. For instance, try on Stack Overflow (just the preview, don't need to post!). language-all only works on blocks that do not have an override language set, such as via the code fence string or by a regular language comment.
    – Ben Kelly StaffMod
    Commented Sep 21, 2020 at 19:49
  • @BenKelly FYI our equivalent of none language-wise is "plaintext" and CSS-wise is `no-highlight`. Of course you're free to handle it on the SO side also and simply do nothing. Commented Oct 7, 2020 at 17:02
  • @JoshGoebel Yup! I added an alias from none to plaintext. Easy fix to bridge the gap there
    – Ben Kelly StaffMod
    Commented Oct 7, 2020 at 17:41
22

C is not supported

There is no C syntax highlighter in highlight.js. highlight.js uses the C++ highlighter for C, and it is a nightmare. It actually makes code harder to read than not having any highlighting. I saw a post on Stack Overflow wherein the same two tokens struct List are coloured in 3 different ways:

enter image description here

Yes, I've checked that lang-c is in use.

I presume there is some logic that detects that the clause starting with struct List is a declaration and then colours the entire line brown:

struct List *newnode = (struct List *)malloc(size * sizeof(struct List));

But this is not helpful in any way, and if you actually used a typedef List, then it would be coloured differently:

List *newnode = (struct List *)malloc(size * sizeof(struct List));

Every other C language highlighter I have seen colours token classes, context-free. For example the token struct, a keyword, should always have the same colour.

(Though, since in struct X, X is a tag, it could be distinguished from X that is a typedef, or a variable or function name)
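To make "token classes, context-free" concrete, here is a minimal sketch. It is not how highlight.js or any real highlighter works; the keyword list is a deliberately tiny, assumed subset of C and `classifyTokens` is a hypothetical name.

```javascript
// Assumed, intentionally incomplete subset of C keywords for illustration.
const C_KEYWORDS = new Set([
  "struct", "sizeof", "typedef", "int", "char", "void", "return",
]);

// Classify each token purely by what it is, ignoring surrounding context.
function classifyTokens(line) {
  const tokens = line.match(/[A-Za-z_]\w*|\d+|\S/g) || [];
  return tokens.map((text) => {
    let kind = "punctuation";
    if (C_KEYWORDS.has(text)) kind = "keyword";
    else if (/^[A-Za-z_]/.test(text)) kind = "identifier";
    else if (/^\d/.test(text)) kind = "number";
    return { text, kind };
  });
}

const classified = classifyTokens(
  "struct List *newnode = (struct List *)malloc(size * sizeof(struct List));"
);
```

Because the class depends only on the token text, all three occurrences of `struct` on that line end up in the same class, and so do all three occurrences of `List`.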

8
  • 6
    This is an open issue: splitting C and C++ should happen (and C++ also needs to be drastically improved). As with pretty much all language-level improvements of the highlighter, this is “help wanted”, i.e. probably won’t happen without somebody submitting a pull request. Commented Sep 30, 2020 at 10:38
  • @KonradRudolph Why would SO not want this fixed? Or do you mean it's help wanted from the lib's perspective?
    – Rob Grant
    Commented Oct 2, 2020 at 10:03
  • 1
    @RobGrant Help wanted from the lib’s perspective. Of course SO would want this fixed, but I’m not seeing them devote any development resources towards highlight.js at the moment. Commented Oct 2, 2020 at 10:16
  • Thanks; makes sense. I would definitely say that if SO are using this lib instead of writing their own highlighter, it's on them to add this support if they want their syntax highlighting to work.
    – Rob Grant
    Commented Oct 4, 2020 at 11:07
  • Actually I’ve submitted a new issue for this specific bug, since none of the existing issues directly address it: github.com/highlightjs/highlight.js/issues/2736 — As a further explanation, highlight.js does not try to be context free (and, contrary to your assertion, Antti, nor do many other widespread highlighters). In particular, it attempts to recognise declarations of functions and classes/structs, and highlight them separately. This improves readability, it’s just hard to get right. Currently it merely looks for the presence of a struct keyword, which is insufficient. Commented Oct 5, 2020 at 8:54
  • @KonradRudolph most of those that do, also parse the source code and colour according to the AST not just tokens. In C++ especially it is impossible to get right without parsing the entire source code. int a(x) is a function declaration if x is a type, it is a variable declaration with a constructor call if x is an object, and you know this only by processing all the includes. Commented Oct 8, 2020 at 11:42
  • @Antti I’m well aware of the impossibility of tokenising C++ code. But, contrary to what you’ve said, many modern syntax highlighters try to assign meaning to tokens without parsing. GitHub’s syntax highlighter and VS Code’s (parse-less mode), in particular, fundamentally have the same issue; they do a slightly better job but they still get it wrong (see second link). Commented Oct 8, 2020 at 13:52
  • 1
    Anyway, splitting the C and C++ tokenisers would allow us to get at least C 100% right, and I’m working on it. Commented Oct 8, 2020 at 13:52
20

😄 Thanks for doing this! I'm happy about this outcome, as I was a major proponent of switching to highlight.js back in 2016.

Great! …but what changed?

To satisfy my own curiosity, I'm wondering if you have an explanation or theory for what changed between 2016 and now to make the switch feasible. Oded's performance analysis seemed to raise some major issues, and your post indicates they are no longer issues, but I don't see an explanation for why things changed. For example:

Size in 2016:

It is [too big] … an extra 5kb minimum for millions and millions of requests a day … This size concern only grows with adding more languages.

Size now:

… an extra ~17kB (over the wire) after including all the languages we support across the network. This extra weight gain was acceptable to us as a tradeoff for what we were getting in return.

Speed in 2016:

… (don't forget - we have a highly nested DOM, and many "benchmarks" are done on a very simple page - which is not indicative of performance on Stack Overflow). … In my tests, CPU time for highlight.js was anything between two and four times higher than for prettify … I have also tested by using console.time around our highlighting calls - highlight.js consistently performed worse than prettify.

Speed now:

In our internal performance benchmarks highlight.js scored better than Prettify consistently

Is this size difference acceptable now because of changes in browsers/networks/CDNs, or just because different people were making the decision? Surely the number of requests per day has only increased since 2016?

Do you have information about what performance tests Oded ran in 2016 and why your results now are so different? Is the internal performance testing infrastructure new? Have there been underlying technical changes to the "highly nested DOM" to make highlighting more efficient? Or have there been significant performance improvements in highlight.js itself?

Again, I'm glad the change was made now — I'd just like to know if there was a legitimate reason to wait 4 years and what changed in that time. Was there something we could have done differently to encourage adoption sooner?

6
  • 3
    Well, for one thing, version 10 was released, which uses ES2015, which is not supported by IE. However, as SE has long retired IE support (and removed all IE-specific code last year), that's not an issue. Commented Sep 9, 2020 at 18:00
  • 6
    What also happened is Yaakov. I've managed to get some low-hanging fruit Prettify FR past him in January and that might have triggered him enough to realize they were on a dead-end. We're blessed with him as Community Advocate.
    – rene
    Commented Sep 9, 2020 at 19:22
  • 4
    I assume a very large factor was the matter of prettify not being available as a "modern" package and no longer being maintained. That has a major impact on the maintainability of the application that's not directly obvious. The caching infra also changed between 2016 and now and four years are quite enough time for an OSS team to improve performance... Commented Sep 10, 2020 at 10:51
  • 7
    I've seen this post and have not forgotten about it. I'm touching base with our Architecture team to draft up a satisfying answer since they are the ones that made the final decision on the changes, given the benchmarks and size delta.
    – Ben Kelly StaffMod
    Commented Sep 10, 2020 at 14:13
  • 1
    @rene Don't entirely think that was responsible. My later post in June about Prettify being discontinued by Google wasn't responsible either, in my opinion, as it originally alluded to creating a new home-grown SE syntax highlighter, which probably resulted in that post being internally demoted. I think the real trigger was Ham Vocke's post on CommonMark, on which someone had asked if syntax highlighting would be changed, to which Ham's response was "we'll think about it once CommonMark is fully rolled out", and I think it was placed on their internal radar at that time. Commented Sep 11, 2020 at 19:21
  • Main thing that changed is that Prettify has been discontinued. If that had been the case when I was going through my analysis, it would certainly not have been a choice and I would have had to evaluate a few different options. I trust @BenKelly made the right call here, and I am certain the architecture team assisted with that to ensure minimal impact. Fact is - a discontinued library is something that has to be replaced.
    – Oded
    Commented Oct 29, 2020 at 21:44
18

What to do if highlight.js supports a language but Stack Exchange doesn't?

There's an entire Mathematica StackExchange, and so Mathematica syntax highlighting is clearly very important to us. But when I did some digging to find the highlight.js bundle that SE seems to be serving, Mathematica isn't in the registerLanguage("...", ...) blocks, even though it is in the set of aliases that StackExchange seems to be defining, i.e. this line is in the bundle:

StackExchange.highlightjs=function(){var e={..."mma":"mathematica",...} ...}

It's vaguely annoying to be forgotten by the company to whose platform we've contributed so much, of course, but it'd be way more annoying for our nice syntax highlighting to disappear all of a sudden. We've been getting by with custom support for Google Prettify as written by one of our mods. The lack of Mathematica support is extra confusing when we consider that highlight.js already supports it.

So... what's the protocol for adding highlighting for a language that Stack Exchange, the company, need do nothing extra to support, since highlight.js already has it?
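The mismatch I'm describing (an alias defined, but its target never registered) can be checked mechanically. The sketch below uses made-up sample data: the alias map and the registered list are assumptions for illustration, not the real contents of the bundle, and `findUnregisteredAliases` is a hypothetical helper.

```javascript
// Return every alias whose target language is not in the registered list.
function findUnregisteredAliases(aliases, registeredLanguages) {
  const registered = new Set(registeredLanguages);
  return Object.entries(aliases)
    .filter(([, target]) => !registered.has(target))
    .map(([alias, target]) => ({ alias, target }));
}

// Sample data only; the real bundle's contents are not reproduced here.
const sampleAliases = { mma: "mathematica", js: "javascript" };
const sampleRegistered = ["javascript", "cpp"];

const missing = findUnregisteredAliases(sampleAliases, sampleRegistered);
console.log(missing);
```

On a page where highlight.js is loaded, the registered names could presumably be taken from `hljs.listLanguages()` instead of a hard-coded array.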

Sample Code

For reference, the following block is fenced with lang-mathematica as the spec. As of when I write this, it renders un-highlighted.

pot =
  Evaluate@With[
     {
      n = 4,
      l = 1,
      c = .25,
      s = .075,
      scale = 4,
      broad = 5
      },
     scale*(JacobiP[n, l, l, #/broad] + .2 JacobiP[2, l, l, #/broad])*
       PDF[
        MixtureDistribution[
         {1, 1},
         {
          NormalDistribution[-c, s],
          NormalDistribution[c, s]
          }
         ], 
        #/broad
        ] - PDF[NormalDistribution[0, .35], #](*+(#/broad)^2*)
     ] &
(* Out: *)
-1.1398350868612364/E^(4.081632653061225*#1^2) + 4*(2.659615202676218/E^(88.8888888888889*(-0.25 + #1/5)^2) + 
    2.659615202676218/E^(88.8888888888889*(0.25 + #1/5)^2))*(5 + 0.2*(3 + (15*(-1 + #1/5))/2 + (15*(-1 + #1/5)^2)/4) + 
    35*(-1 + #1/5) + 70*(-1 + #1/5)^2 + (105*(-1 + #1/5)^3)/2 + (105*(-1 + #1/5)^4)/8) &
6
  • 4
    I would say it is a bug that it's not included. I don't understand highlight.js enough myself, but for some reason the MMA language JS file is in a separate additional-langs folder (by itself) that doesn't seem to get included when the file is built out. It seems like it is supposed to be included, especially given it is listed in the keywords.
    – animuson StaffMod
    Commented Sep 18, 2020 at 4:17
  • 3
    Excellent question. I assure you that Mathematica Stack Exchange will be supported at launch. Due to the large size of the mma language definitions, the language is actually split out from the rest. We do this currently with Prettify as well (check the network logs for lang-mma.min.js). I've already ensured this bundle has been created and works on my local machine. Thanks for checking in on this, I appreciate the concern and understand any potential anxiety around the matter.
    – Ben Kelly StaffMod
    Commented Sep 18, 2020 at 17:20
  • 7
    Some extra comments here for "fun": I did actually attempt to add mma support to the main bundle, but it added 31.1k after gzipping! That means a 75% bundle size increase for a single language. Unfortunately, that wasn't going to fly, so we're continuing to serve it to Mathematica.SE as a separate js file.
    – Ben Kelly StaffMod
    Commented Sep 18, 2020 at 17:26
  • 2
    I personally definitely think that's fair enough and I'm glad to hear that it'll work on our site! Do you have a rough idea of what you would be happy with to add it to the main bundle? Our old highlighting makes use of a trie to save as many bytes as possible and I wonder if it would be worth spending the time lowering the impact of the highlight.js implementation. (Also, just a note that MMA adds symbols regularly that need highlighting, so having a regular update schedule would be really great)
    – Carl Lange
    Commented Sep 18, 2020 at 22:28
  • 1
    Would it be possible to lazily load definitions for "exotic" languages? I.e. at least for fenced code blocks that explicitly specify a language. That way, at least "power users" can get highlighting for their "exotic" languages, without putting a serious burden on everybody else?
    – Max Horn
    Commented Oct 1, 2020 at 8:11
  • @Max It's definitely technically possible... SO would have to write some small amount of custom JS to handle things like that... Highlight.js is designed so that grammar modules can be loaded/fetched stand-alone.... so it's definitely possible for a single page of Stack Overflow to load an obscure language just to highlight a single snippet - if SO made that possible. Commented Oct 20, 2020 at 1:35
18

Will the default code markdown be changed to code fences?

Currently if you click the code ({}) icon in the editor, then the selected text is still indented (or unindented) by 4 white spaces.

enter image description here

Since the indent method's way of defining a language for a specific code block (<!-- language: python --> for example) is deprecated¹, shouldn't the default functionality of the button be to wrap the code in a code fence (```) instead?

1

The former method of specifying a highlighting language can still be used for HTML code blocks: place an HTML comment <!-- language: lang-or-tag-here --> before the <pre><code> tags and it will work.

Also, this former method hasn't been completely removed for four-space indented code blocks, but merely deprecated. While it will still work for the time being on four-space indented code blocks, it may/will be removed in the future.

8
  • Hi, thanks for the report! At this time, no changes were made to the editor itself or how you write markdown.
    – Ben Kelly StaffMod
    Commented Sep 25, 2020 at 15:02
  • @BenKelly appreciate that no changes have been made, I'm more specifically asking if they will be, due to the deprecation of the related features (specifically the old language specification). Thanks.
    – Larnu
    Commented Sep 25, 2020 at 15:08
  • 10
    I would say that changes to the current markdown editor are unlikely, considering we have a "new editor" on the public roadmap. I can confirm that fences are the default there. Stay tuned for more updates on the matter.
    – Ben Kelly StaffMod
    Commented Sep 25, 2020 at 15:11
  • Great, thanks @BenKelly that answers the question (answer?). :)
    – Larnu
    Commented Sep 25, 2020 at 15:13
  • Same for Stack Snippets.
    – Bergi
    Commented Sep 30, 2020 at 8:37
  • I feel like Ben Kelly should deliver all announcements from now on.
    – Rob Grant
    Commented Oct 2, 2020 at 10:09
  • 2
    My preference is always to use the code fence (```) followed by the language I want to use (or 'none'). For instance, in Swift I do ```swift<newline>my code<newline>``` whereas for C#, I do ```csharp<newline>my code<newline>``` and if I want to show output that doesn't have formatting, I use ```none<newline>my code<newline>``` because leaving it blank would use inference based on the question and tags. It would be good in the new editor to let us specify it there too since so many don't know you can even do this, letting you mix code in a single post. Commented Dec 23, 2020 at 23:52
  • It's been almost two years. Could I possibly see any evidence of progress on this new editor? Commented Jun 1, 2022 at 3:33
16

Apparently asm / assembly has never(?) been supported as a syntax-highlighting language, and the somewhat decent highlighting we got in the past was from auto-detection (presumably as some other language with # or ; comment characters.)

highlight.js auto-detection happens to produce way worse results for assembly than whatever prettify.js did, so in practice there is a real regression here.

Assembly language really doesn't need much highlighting to be readable; it's already syntactically simple and has a regular line format. But it does benefit significantly from fading comments into a colour that stands out less than the rest of the code.

The rest of this answer was written without realizing asm wasn't (ever?) supported; the highlight.js languages including x86asm are not enabled on Stack Overflow / SE, so of course using them doesn't help.


Language auto-detection for assembly language seems to be broken. For example, note the lack of highlighting in the question on Printing an integer as a string with AT&T syntax, with Linux system calls instead of printf. After editing my answer to use ```lang-assembly on the main code block, that block has highlighting but the others don't. (And does actually look decent.)

Separately, syntax highlighting for NASM (a different asm syntax that uses ; as the comment character) is (was?) broken. In Unexpected result of subtracting a NASM macro in an expression, ```lang-nasm or lang-assembly leads to a mess that's arguably worse than nothing, because of single-quote used as English punctuation in a comment. Same result with <!-- language: lang-assembly -->.

(Update: no longer as bad as a couple weeks ago. An apostrophe in comments seems to only affect the end of the contracted word, not the entire rest of the block. But this NASM-syntax block is still not very usefully highlighted, e.g. comments aren't grayed, and only the 0 in 0x.. is in red. At least it's not clearly, or much, worse than nothing. x86asm is listed in the supported languages and highlight.js's x86 asm highlighter is supposed to be for NASM syntax. x86asm results in no highlighting; I had to use lang-x86asm to get the current highlighting.)

section .rodata           ; groups read-only together inside the .text section
    msg: db "Thank you"
    var: db 0x34, 0x33, 0x32, 0x31   ; dd 0x31323334  ; Since you're passing these bytes to write(2), writing them separately is probably less confusing.  (x86 is little-endian, so they will come out "backwards")

    ;; apparently you want to include var as part of the message, for some reason
    msglen equ $ - msg    ; $ is the current position

Previously, this meta answer wasn't getting any syntax highlighting; that's changed now.


SO asm answers tend to be more heavily commented than you'd do in real life, because the target audience is people that don't understand the basics of asm. And SO code blocks are more cramped horizontally than a normal text editor so it encourages leaving comments closer to the end of the code, making it worse if they're visually harder to ignore. (Especially in some poorly formatted beginner questions and answers where comments are ragged and literally no space is left after instructions.)


Different flavours of assembly language use different comment characters, so this is a somewhat thorny problem. Some use # to decorate numeric literals (e.g. ARM), so treating ;, #, and @ as comment characters won't always work.
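As a rough illustration of how little is needed to fade comments, here is a sketch that splits a line at the first comment character outside a double-quoted string. It is not a highlight.js grammar; `splitComment` is a hypothetical helper, the comment character is passed in per flavour, and single-quoted strings (which NASM also allows) are deliberately not handled.

```javascript
// Split one line of asm into its code part and its trailing comment.
// Only double-quoted strings are tracked, so an apostrophe inside a
// comment cannot confuse it; single-quoted strings are not supported.
function splitComment(line, commentChar) {
  let inString = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      inString = !inString;
    } else if (ch === commentChar && !inString) {
      return { code: line.slice(0, i), comment: line.slice(i) };
    }
  }
  return { code: line, comment: "" };
}

// NASM-style ';' comment, and an ARM-style '@' comment where '#' is a literal prefix.
const nasm = splitComment('msg: db "a;b" ; you\'re done', ";");
const arm = splitComment("    mov r0, #1  @ load one", "@");
```

The comment part could then be wrapped in whatever markup the theme uses for comments, leaving the code part untouched.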

As discussed in comments, highlight.js has highlighters for a few different asm syntaxes, but no generic asm.

By looking at tags like [arm] as well as [assembly], Stack Overflow should (in theory) be able to pick the right asm syntax highlighting.

For cases like explicit lang-asm overrides in the markdown (which isn't explicit enough: doesn't say which flavour), Stack Overflow could (in theory) still auto-detect which syntax to highlight for based on the ISA tag. e.g. for a post with [c] [x86] tags, a lang-asm block could still pick x86 highlighting.

Except that doesn't disambiguate MASM vs. NASM vs. [gnu-assembler] syntax, with GAS using a different comment character (#) than most other x86 assemblers (;). Many questions aren't tagged with a specific assembler syntax name, just x86. (Most non-x86 ISAs only have one syntax in wide usage; this is mostly an x86 problem.)

To make matters more complicated, GAS .intel_syntax noprefix still uses GAS directives and comment character, same as when GAS is in AT&T syntax mode. So [att] syntax questions aren't the only ones where # is the right comment character, even if we could rely on all questions that happen to use AT&T syntax being tagged [att].

But unless / until that happens, I guess we should be tagging asm blocks with one of:

  • lang-x86asm
  • lang-armasm (I guess this is Keil's ARMASM for directive syntax, not GAS? Instruction syntax is the same between both, though.)
  • lang-avrasm

I haven't dug into how Stack Overflow dispatches anything to those internally supported highlight.js things.
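The tag-based idea above could look something like the following sketch. It is purely hypothetical: the tag-to-grammar table is my assumption, `pickAsmGrammar` is an invented name, and it makes no attempt at the harder NASM vs. MASM vs. GAS question for x86.

```javascript
// Assumed mapping from ISA tags to the asm grammar names listed above.
const TAG_TO_GRAMMAR = new Map([
  ["x86", "x86asm"],
  ["x86-64", "x86asm"],
  ["arm", "armasm"],
  ["avr", "avrasm"],
]);

// Pick a grammar for a generic "asm" block from the question's tags.
// Returns null when no ISA tag is present or when the tags disagree.
function pickAsmGrammar(tags) {
  const candidates = new Set();
  for (const tag of tags) {
    const grammar = TAG_TO_GRAMMAR.get(tag.toLowerCase());
    if (grammar) candidates.add(grammar);
  }
  return candidates.size === 1 ? [...candidates][0] : null;
}
```

For a post tagged [c] [x86] this picks `x86asm`; for a question comparing ARM and x86 it returns `null`, so the block would stay unhighlighted rather than be coloured with the wrong flavour.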

27
  • 2
    "SO asm answers tend to be more heavily commented than you'd do in real life..." I see you haven't read much of my real-world assembly code. :-) To be fair, I do often write for a person who doesn't understand the basics, since that person is often me 6-8 months later. Commented Sep 23, 2020 at 0:44
  • Actually, on my system your example code is highlighted, but this highlighting doesn't make any sense at all. See the screenshot: the hex literals aren't recognized; even if I change them to Intel-style 34h, the h is still black. The words like to, of, for, is are blue, which also makes no sense for assembly.
    – Ruslan
    Commented Oct 1, 2020 at 19:49
  • @Ruslan: yeah, it looks that way for me now, but didn't before. (Now matching what I see on SO if I add <!-- language: lang-asm --> to the linked answer in the edit box, which I haven't saved because the highlighting is still worse than nothing for that answer). Perhaps changed when P.Mort. saved an edit, or silently changed rendering at some point without re-saving the question. Clearly highlight.js is being worked on. Commented Oct 1, 2020 at 21:43
  • 1
    FYI: Highlight.js has no generic "assembly" grammar. We currently have ARM asm, AVR asm, MIPS asm, and x86 asm. So for best results SO would have to include all of these (or whichever is popular) or someone would have to create a 3rd party generic "assembly" grammar and SO would have to use that for code tagged "assembly". Commented Oct 7, 2020 at 17:06
  • @JoshGoebel: All questions tagged [assembly] should also be tagged with an ISA, as per assembly tag-usage guidelines. I already edit asm questions that are missing such a tag to add it. (The question volume is low enough to keep up with and see literally every asm question.) A few are tagged with multiple ISAs, for questions about the ARM equivalent of some x86 thing, for example. Commented Oct 7, 2020 at 20:43
  • @JoshGoebel: But for x86 specifically, there are many different syntaxes. Even using different comment characters (gnu-assembler for x86 using # vs. almost everything else using ;) and separately AT&T vs Intel. (GAS .intel_syntax still uses # as a comment character). Questions are often not tagged gnu-assembler or att even if they happen to use that syntax. There are also some syntaxes like goasm that are somewhat different. I don't know how significant some of those differences are for syntax highlighting. But for non-x86, tags should disambiguate nicely. Commented Oct 7, 2020 at 20:46
  • Ah our x86 seems to be NASM syntax, or at least that's what the comment says. github.com/highlightjs/highlight.js/blob/master/src/languages/… @PeterCordes I don't know what you mean with all the talk of tagging... I was just clarifying that Highlight.js does not have a single "asm" or "assembly" grammar... so that label doesn't mean anything unless SO chooses to alias it to one of our individual grammars - which I'm not sure is a great idea. Commented Oct 7, 2020 at 21:01
  • @JoshGoebel: That was referring to how Stack Overflow uses it, not upstream highlight.js. When there's no specific language override for highlighting, Stack Overflow detects languages via tags on the question. By looking at tags like [arm] as well as [assembly], Stack Overflow should (in theory) be able to pick the right asm syntax highlighting. For cases like explicit lang-asm overrides in the markdown (which isn't explicit enough: doesn't say which flavour), Stack Overflow could (in theory) auto-detect which syntax to highlight for. Commented Oct 7, 2020 at 21:30
  • 1
    @PeterCordes that language is not supported by SE. They only support the languages in the other link: meta.stackexchange.com/questions/184108/…. So this is no bug, but a feature request to add a certain language to the highlighter.
    – Luuklag
    Commented Oct 8, 2020 at 9:26
  • 2
    @Luuklag: Wait a minute, so lang-asm / lang-assembly has never been supported on SO? And it just happened to get auto-detection based on syntax and do something which turned out to be useful in most cases with prettify.js, unlike with highlight.js? I guess in hindsight it was probably treating GAS code like some other language with # comments. Anyway, in practice it's still a regression in practical results, however we want to classify it and maybe do something about it in the future. At least that explains why x86 NASM syntax highlighting isn't working. Commented Oct 8, 2020 at 9:40
  • 1
    Assembly code was always randomly coloured on StackOverflow. No one seemed to care though. Despite another answer saying no colour is better than wrong colour most people just seem to want pretty colours. The current everything is green syntax highlighting that assembly usually gets I suppose is technically better, but doesn't give people the pretty colours they want, or provide any useful syntax highlighting.
    – Ross Ridge
    Commented Oct 8, 2020 at 17:00
  • 1
    @PeterCordes It would be super interesting if someone had a way to find out which Prettify grammar was actually being used in the past if that seemed to work so well... SE could always choose to force assembly to any grammar they wanted that provided good highlighting results (if they choose not to build in the explicit assembler grammars). Commented Oct 28, 2020 at 3:07
  • 1
    @PeterCordes I think long-term SE will load more grammars dynamically... (which is the real solution here)... but if not they should probably consider some tiny custom grammars... if all you MOSTLY care about is comments that's only a tiny amount of code to make a grammar that would just only highlight comments, and leave everything else alone with 0 false positives. Commented Nov 6, 2020 at 8:53
  • 1
    @JoshGoebel: As I pointed out in my answer, different assembly languages have different comment characters (and some use # as part of the syntax for numeric literals in some contexts), but yes a couple different simple grammars for different comment chars could work. And yes, doing nothing but comments would be totally fine and vastly better than nothing; good asm style already makes most other things visually easy. A few refinements could include a color for label: labels. Some other easy things I can think of would be anti-helpful in badly-formatted asm (like insns not indented). Commented Nov 6, 2020 at 8:57
  • 1
    I've started an effort to add x86asm as a supported highlight language on SO: meta.stackoverflow.com/questions/412543/….
    – BeeOnRope
    Commented Oct 25, 2021 at 22:47
12

I've just tried the following piece of JavaScript code (from this answer of mine in Code Golf) because Google Prettify was not parsing the regular expression followed by an inline comment correctly. That's why I used alternate slash characters in the original post (I've turned them back into regular slash characters below).

But this is much worse now, as the first 5 inline comments are not parsed as comments at all.

f = (                // f is a recursive function taking:
  [c,                //   c   = next digit character
      ...a],         //   a[] = array of remaining digits
  o = '',            //   o   = output string
  S = new Set        //   S   = set of solutions
) =>                 //
  c ?                // if c is defined:
    f(               //   do a recursive call:
      a,             //     pass a[]
      o + c,         //     append c to o
      o ?            //     if o is non-empty:
        f(           //       do another recursive call
          a,         //         pass a[]
          o + [, c], //         append a comma followed by c to o
          S          //         pass S
        )            //       end of recursive call (returns S)
      :              //     else:
        S            //       just pass S as the 3rd argument
    )                //   end of recursive call (returns S)
  :                  // else:
    S.add(           //   add to the set S:
      o.replace(     //     the string o with ...
        /\d+/g,      //       ... all numeric strings
        n => +n      //       coerced to integers to remove leading zeros
                     //       (and coerced back to strings)
      )              //     end of replace()
    )                //   end of add() (returns S)

I'm sure this is going to be fixed quickly, so here's a picture of the current rendering for later reference. :-)

rendering

5
  • 6
    This looks like a bug upstream in the language processor itself. The first few comments are getting marked as being included in the params of the function. I'd recommend filing a bug or submitting a PR against highlight.js itself
    – Ben Kelly StaffMod
    Commented Sep 14, 2020 at 14:25
  • 7
    @BenKelly At any rate, there are probably a couple of good test cases on Code Golf as you'll find there plenty of unusual (yet valid) ways of writing and commenting code, at least for the few standard languages that are actually recognized by highlight.js. (We obviously don't use syntax highlighting at all for all the esolangs we're using.)
    – Arnauld
    Commented Sep 14, 2020 at 14:46
  • 2
    @Arnauld I made a PR for this and also fixed a long standing Scala issue that was similar while I was at it. Thanks. Will make it into the next minor release. Commented Sep 22, 2020 at 15:15
  • @JoshGoebel Excellent! Thank you.
    – Arnauld
    Commented Sep 22, 2020 at 15:21
  • This should be resolved in 10.3 (just released). Commented Oct 17, 2020 at 17:40
12

In LaTeX code, @ should be treated as a letter. There are any number of examples on tex.stackexchange, but

Undefined control sequence on \beamer@leftmargin indentation

\begin{frame}[fragile]{E}
\makeatletter
\hskip-\beamer@leftmargin
\makeatother
\lipsum[2]
\end{frame}

\beamer@leftmargin is a single token, but beamer is coloured and @leftmargin is left as unstyled text which makes the code very hard to read.

Technically, @ is not always a letter, but it is almost always a letter when appearing in code sections and is a far better default in a syntax highlighter.
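The difference comes down to one character class. The sketch below is not highlight.js's actual LaTeX grammar, just a plain regular expression comparison; `controlWords` is a hypothetical helper.

```javascript
// Find control words in a LaTeX line, optionally treating '@' as a letter.
function controlWords(source, atIsLetter) {
  const pattern = atIsLetter ? /\\[A-Za-z@]+/g : /\\[A-Za-z]+/g;
  return source.match(pattern) || [];
}

const line = "\\hskip-\\beamer@leftmargin";

// Without '@' as a letter the second control word is cut short.
const strict = controlWords(line, false); // ["\\hskip", "\\beamer"]
// With '@' as a letter it is matched as one token.
const lenient = controlWords(line, true); // ["\\hskip", "\\beamer@leftmargin"]
```

With the second pattern, \beamer@leftmargin would be coloured as a single control sequence instead of being split at the @.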

3
  • 3
    This seems like something you should file to the Highlight.js repository. Commented Sep 25, 2020 at 21:25
  • 5
    @SonictheMaskedWerehog I could, although I wasn't sure if stackexchange were collecting things up here and then reporting upstream rather than a million stackexchange users suddenly swamping the highlight.js repo with individual bug reports... Commented Sep 25, 2020 at 22:12
  • 1
    Major thanks for the much improved LaTeX support in 10.3.0 (ground up rewrite) contributed by @schtandard. I just released 10.3.0. Commented Oct 17, 2020 at 17:39
11

There have been times that I've turned off code highlighting with <!-- language: lang-none --> because Prettify was getting it wrong, and no highlighting is better than wrong highlighting. (The example that comes to mind was a Bash snippet where # wasn't a comment indicator, but Prettify thought it was.) After this change goes through, should I go back over those posts and turn code highlighting on again? Is it better?

I suppose I can test it.

10
  • 5
    Honestly, I'd recommend simply adding the language any time you consciously think about it. If you use the code fence syntax (my personal preference) you can easily set the language. That being said, give it a shot! I have a sibling post on Meta Stack Overflow that explains how autohighlighting is changed (at 5000 ft anyways).
    – Ben Kelly StaffMod
    Commented Sep 9, 2020 at 14:33
  • @BenKelly you think it's OK to force us to manually set the lang on each of 10 code snippets in a question, say? even then, it's erratic, works for some snippets, doesn't work for others. examples here.
    – Will Ness
    Commented Sep 26, 2020 at 17:01
  • This was marked as bash, @BenKelly, but Prettify didn't pick it up. I'll try to find the post again now, and edit it. Can't remember where it was.
    – TRiG
    Commented Sep 26, 2020 at 20:08
  • Yup. It works. stackoverflow.com/a/6482403/209139
    – TRiG
    Commented Sep 26, 2020 at 20:11
  • @WillNess I think right now there is some confusion about which languages SO actually supports (with the new Highlight.js support). You can find a complete list on the post where they talk about how highlighting works. meta.stackexchange.com/questions/184108/… If a language isn't listed there, no amount of proper tagging will help. Some people are confused because Highlight.js supports a particular language but SO does not. Commented Oct 20, 2020 at 1:38
  • @JoshGoebel in e.g. pastebin.com it just works. in SO, it doesn't. I'll let the paid stuff worry about the rest. :)
    – Will Ness
    Commented Oct 20, 2020 at 6:06
  • @WillNess Single purpose sites are always simpler. :-) If SO fixes their core implementation a bit and then people pitch in to report remaining issues I imagine things can improve greatly. I don't think going back to Prettify is an option, so unless they want to switch to something else or roll their own the best outcome for all is to improve Highlight.js and fix any obvious bugs. :-) Are you super familiar with Prettify? When I took a look (at the original code) it just looked to be a very simple pattern matcher/highlighter... is that all there is to it? Commented Oct 20, 2020 at 18:20
  • @JoshGoebel pastebin supports lots of languages too; how good, I have no idea. but e.g. Haskell looks super nice there. SO could just use what they are using, and if that's in-house, just buy the thing. no, I know absolutely nothing about implementing this thing. (on a completely tangential note, what are the chances for someone (me) to see a new, somewhat unusual/remarkable surname for the first time in their life, twice in one day! I do mean yours and someone from 1980s named Randy Goebel whom I saw mentioned today in connection with "Waterloo Prolog".... what are the chances!!? :) )
    – Will Ness
    Commented Oct 20, 2020 at 18:57
  • (.... by pure chance, I mean). what a coincidence, huh!?
    – Will Ness
    Commented Oct 20, 2020 at 19:12
  • @WillNess pastebin.com is server-side looks like... I imagine it was very important for SO to let the clients do the heavy work of highlighting vs doing it on their own servers... and for simple (small footprint) JS based highlighting the two big players I know of are Highlight.js and Prism. I assume they would have also considered Prism, though I'd be curious to know what led them to choose us over Prism. (we do try and be a bit more ambitious than Prism in our highlighting) Commented Oct 20, 2020 at 19:33
10

We have been waiting for Verilog and SystemVerilog (SV) highlighting for a long time. Apparently we will have Verilog support with highlight.js, but SV will continue to be unsupported. Still much better than before. I'm happy with the change and appreciate your effort.

Let me put some Verilog code (from highlight.js demo) here to see the result after the roll-out. I assume the language code will be lang-verilog.

EDIT: We haven't got Verilog support as Ben Kelly mentioned in the comments. The following snippet has no language code, thus we see the result of auto detection.

`timescale 1ns / 1ps

/**
 * counter: a generic clearable up-counter
 */

module counter
    #(parameter WIDTH=64, NAME="world")
    (
        input clk,
        input ce,
        input arst_n,
        output reg [WIDTH-1:0] q
    );
    
    string name = "counter";
    localparam val0 = 12'ha1f;
    localparam val1 = 12'h1fa;
    localparam val2 = 12'hfa1;

    // some child
    clock_buffer #(WIDTH) buffer_inst (
      .clk(clk),
      .ce(ce),
      .reset(arst_n)
    );

    // Simple gated up-counter with async clear

    always @(posedge clk or negedge arst_n) begin
        if (arst_n == 1'b0) begin
            q <= {WIDTH {1'b0}};
            end
        else begin
            q <= q;
            if (ce == 1'b1) begin
                q <= q + 1;
            end
        end
    end

    function int add_one(int x);
      return x + 1;
    endfunction : add_one

`ifdef SIMULATION
initial $display("Hello %s", NAME);
`endif
endmodule : counter

class my_data extends uvm_data;
  int x, y;

  function add_one();
    x++;
    y++;
  endfunction : add_one
endclass : my_data
12
  • 2
    Hurrah. About ....ing time. Commented Sep 10, 2020 at 11:17
  • 1
    If you want SV support, I suggest consulting the highlightjs dev documentation and creating a pull request. I’m in the process of doing this for multiple languages. Commented Sep 10, 2020 at 13:37
  • 9
    I hate to be the bearer of bad news, but this update has not added any new language support (officially. Technically sublanguages like Less/scss got snuck in with the css support). I tested this code snippet on my dev environment and it autodetects as scala. That being said, adding more languages to our bundle is technically trivial, but we need to be mindful of bloating the deliverable size. I'll keep this suggestion in mind once we revisit and decide whether to add additional language support.
    – Ben Kelly StaffMod
    Commented Sep 10, 2020 at 14:10
  • @BenKelly Thanks for information. I removed the language code, but there is no highlighting for scala either.
    – user361230
    Commented Sep 11, 2020 at 14:37
  • 2
    @BenKelly To avoid bloated bundles, could new syntaxes not be loaded dynamically depending on what's present on the page? (I'm holding out hope for ```diff and ```elixir support...) Commented Sep 11, 2020 at 15:18
  • 3
    @LionelRowe In theory, yes. That being said, I've not looked into the possibilities there or how it would impact us (from a bandwidth/hosting perspective) or our users (from a UX/performance perspective). This is definitely something we'll keep in mind if we decide to expand our supported languages in the future.
    – Ben Kelly StaffMod
    Commented Sep 11, 2020 at 17:35
  • 4
    I suppose we've been collectively waiting 7 years, what's a few more. ¯\_(ツ)_/¯ Commented Sep 11, 2020 at 19:56
  • @TomCarpenter You must be referring to that question. We have 15 days to the 7th anniversary. :)
    – user361230
    Commented Sep 11, 2020 at 21:34
  • @BenKelly If performance is a concern, perhaps maintain two separate files: highlightjs-compact (contains only core languages, e.g. C++, Java, Python ...) and highlightjs-all (contains a larger selection of languages, including Kotlin, Fortran, SystemVerilog, ...). Commented Sep 19, 2020 at 2:54
  • 1
    @BenKelly There is an issue open for "dynamic loading" of languages on the Highlight.js GitHub, but it's not a huge priority right now. Someone has a plugin also, but I wouldn't call it "solid" yet... (not sure it's the proper way to go about this) The best way to do this is likely to "wrap" HLJS (or at least highlightBlock)... if the language is valid (but not bundled and not loaded yet), issue a fetch... and then on completion just call highlightBlock again in the promise success callback. While loading you'd apply the hljs class (so the blocks appear as unhighlighted code). Commented Sep 21, 2020 at 15:51
  • 1
    I had a demo/prototype of something like this working for our codebase, but the problem isn't just the size of the bundle, but it's about number of network requests too. Adding even a single extra request to our most hit page on our most hit site (Question view on SO) would be murder on our CDN bill. Not insurmountable by any means, but it just needs extra thought/engineering at our scale. We wanted to get the base out first and make sure it works well before we really started messing with the process.
    – Ben Kelly StaffMod
    Commented Sep 21, 2020 at 19:30
  • @BenKelly Someone mentioned a web worker for local caching... of course the same kind of thing would also be possible with just fetch/localStorage or something... Language grammars should be 100% frozen between patch releases (which should greatly simplify their caching)... so once you cached say "10.3.1/obscure_lang.js" on a given client it would never need to be fetched again until your version of Highlight.js was bumped, invalidating the "10.3.1" cache key... To lower the number of total requests you could have a "secondary pack" that loaded multiple languages. Commented Oct 20, 2020 at 1:43
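
The fetch-once-per-version idea from the comments above could look roughly like the sketch below. It is only an outline under assumptions: `createLazyLoader`, `fetchGrammar` and the cache-key layout are hypothetical names, the fetch function is injected rather than tied to any CDN, an in-memory Map stands in for localStorage or a worker cache, and none of this is highlight.js API.

```javascript
// Sketch of a versioned, cached grammar loader. All names are hypothetical.
// fetchGrammar(version, lang) is supplied by the caller and returns a Promise.
function createLazyLoader(version, fetchGrammar, cache = new Map()) {
  return function load(lang) {
    // Grammars are assumed frozen within a release, so version + language
    // is a stable cache key; bumping the version yields fresh keys.
    const key = `${version}/${lang}.js`;
    if (!cache.has(key)) {
      // Store the promise itself so concurrent requests share one fetch.
      const pending = Promise.resolve(fetchGrammar(version, lang)).catch((err) => {
        cache.delete(key); // allow a retry after a failed fetch
        throw err;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}

// Usage with a stand-in fetcher that only counts how often it is called.
let fetches = 0;
const load = createLazyLoader("10.3.1", async (version, lang) => {
  fetches += 1;
  return { version, lang };
});

Promise.all([load("elixir"), load("elixir"), load("diff")]).then(() => {
  console.log(fetches); // one fetch per distinct language
});
```

Once a grammar resolves, the caller would register it and re-run highlighting on the affected blocks. That step is left out here because it depends on how the page wraps the highlighter.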
10

Will lazy loading of syntaxes be supported?

This would allow for syntax highlighting of less common languages that don't need to be eager-loaded on every page load.

Here's a proof-of-concept that doubles as a Tampermonkey user-script:

Highlight.js lazy-loading proof of concept

Naturally it's a little hacky, but it works on all of the following examples:

```lang-diff

- print('failure')
+ print('success')

```lang-elixir

spawn_link(fn ->
  send(current_process, {:msg, "hello world"})
end)
  
receive do
  {:msg, contents} -> IO.puts(contents)
end

```lang-bf

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
11
  • Does it work on CSS? The Google Prettify highlighter has always had trouble recognizing CSS. Commented Sep 21, 2020 at 23:49
  • @user289905 It works only for languages specified with lang-XXX. Shouldn't be required for CSS, though, because that's already eager-loaded by default. Commented Sep 22, 2020 at 8:02
  • When I run this I just end up with a whole bunch of console warnings and no visible effect: Could not find the language '_____', did you forget to load/include a language module?, fill in the blank with none, assembly, diff, elixir, and bf.
    – zcoop98
    Commented Sep 22, 2020 at 22:54
  • The concept is surely nice however, I really like the idea of being able to run a script user side that would highlight more languages.
    – zcoop98
    Commented Sep 22, 2020 at 22:55
  • @zcoop98 Those warnings are generated when hljs does its initial pass, not when the userscript runs. I've amended it to run on the load event for the original hljs script (which is itself loaded dynamically), but this still doesn't suppress the warnings. Can you confirm you're seeing no color at all on these code blocks after running the script? It's a little difficult to see at the moment, especially for the Elixir example, due to the hljs color scheme. Commented Sep 23, 2020 at 9:12
  • 1
    @LionelRowe I can see colors now!
    – zcoop98
    Commented Sep 23, 2020 at 15:01
  • 1
    Fixed to work with the latest update - the eagerly loaded run now highlights unknown but specified languages with what I assume is a best-guess attempt? Anyway, v0.4 of the user-script will now override these. Commented Sep 25, 2020 at 20:02
  • 1
    @LionelRowe This is awesome. Don't think you can suppress the warnings since you probably have no way to load and change anything BEFORE Highlight.js makes its first pass... but they aren't really causing any harm either. Commented Oct 18, 2020 at 2:38
  • @LionelRowe Why is MutationObserver necessary? Commented Oct 28, 2020 at 3:46
  • @JoshGoebel At least when I made this, the hljs script was not present at the time DOMContentLoaded was fired, necessitating the MutationObserver. I left the 3 conditions at the bottom in case the means of loading was changed. Commented Oct 28, 2020 at 7:54
  • 1
    @LionelRowe Here is a proof of concept Chrome extension: github.com/joshgoebel/se_highlightjs :-) Nice thing about an extension is you can bundle the files, no need for any CDN or external fetches at all. Commented Oct 29, 2020 at 0:15
9

Visual Basic code is no longer highlighted

The error in the console is:

Could not find the language 'vb', did you forget to load/include a language module?
Falling back to no-highlight mode for this block. <pre class="lang-vb s-code-block">
13
  • 2
    Seems VBA, VB.Net and VBScript all affected, even though all 3 are uniquely supported in highlight.js (unlike prettify, where I think VBA just mapped to one of the other two)
    – Greedo
    Commented Sep 24, 2020 at 21:33
  • Interestingly vbs, vb, vbnet as tags for code fences all result in some kind of default highlighting that some gibberish tag dfbdfkbk also produces, as if they aren't recognised - however vba suppresses highlighting altogether, so slightly different
    – Greedo
    Commented Sep 24, 2020 at 21:37
  • 5
    Thanks for the report. I may have missed a language alias here. I'll double check and get the changes into my first round of fixes going out today.
    – Ben Kelly StaffMod
    Commented Sep 25, 2020 at 15:05
  • vb and vbs now will set the language to vbscript. I had somehow missed aliases for literally just these two languages :facepalm:. Must be some sort of repression mechanism from the first half of my career writing primarily in VB.Net ;). vbnet and vba (not being supported aliases) will result in the code autohighlighting instead.
    – Ben Kelly StaffMod
    Commented Sep 25, 2020 at 19:44
  • @BenKelly Thank you. However, the current highlight does not look like VB to me at all :(
    – GSerg
    Commented Sep 25, 2020 at 19:53
  • @GSerg Do you have an example post you're looking at that I can use to reproduce what you're seeing?
    – Ben Kelly StaffMod
    Commented Sep 25, 2020 at 20:00
  • @BenKelly Anything, really? E.g. stackoverflow.com/q/64017640/11683. The type declarations (As String) should be blue, including the As. String literals should not be green. True and False should be blue.
    – GSerg
    Commented Sep 25, 2020 at 20:05
  • @GSerg Thank you for the update. The As not getting highlighted is a shortcoming of highlight.js's language definitions for vbscript. As for the colors, what item should be what color is left to the theme. Part of the highlight.js rollout was a new theme, which might be causing the confusion
    – Ben Kelly StaffMod
    Commented Sep 25, 2020 at 20:11
  • 4
    @BenKelly That is not surprising really because vbscript does not have As in the first place. It also does not have other things, which I assume will then also not be highlighted. Arguably, out of the VBA, VB.NET and VBScript, the latter is the least good candidate for representing the other two. It is certainly a step back for all three then :(
    – GSerg
    Commented Sep 26, 2020 at 6:52
  • 4
    @GSerg I've taken your comments into consideration and I've decided to switch the syntax highlighter from vbscript to vbnet instead. There's no specific reason it was vbscript to start, just that's what I chose. In hindsight, vbnet is the obvious choice. Thanks for bringing this up! The change should go live sometime tomorrow.
    – Ben Kelly StaffMod
    Commented Sep 28, 2020 at 21:03
  • @BenKelly One consideration after looking at the highlight files for the different languages (vbscript, vba, vbnet) is the latter two contain "illegal" attributes such as '//|{|}|endif|gosub|variant|wend|^\\$ ', /* reserved deprecated keywords */ in vbnet. I believe these illegal attributes only affect default highlighting by telling the language recogniser to bail early if it finds wend - since that couldn't possibly match Vb.Net - but of course wend is valid in VBA/VbScript. IIUC SO tags override default, so will matter? Some valid VBA is never auto-detected essentially
    – Greedo
    Commented Sep 29, 2020 at 16:30
  • Basically what I mean is, if SO relies on highlight.js auto-detection, then using vbnet (or vba) to highlight all the vb family, will result in some false negatives, eg. where code is valid in VBA but not VB.Net. VBScript is the most lenient in syntax, because it has the least of it (and thus does a poor job highlighting VBA and VB.Net, but a good job detecting them). If SO doesn't need autodetection much then this is a non issue and the most keyword rich vbnet is certainly superior as a catch-all (ofc all 3 individual highlighters would be most optimal)
    – Greedo
    Commented Sep 29, 2020 at 16:40
  • @Greedo Can you tell me in what contexts VBScript (vs VBA or VB.net) is still alive and being used today? Commented Oct 7, 2020 at 17:32
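
The false-negative concern raised in the comments above can be shown with a toy detector. This is a sketch under assumptions, not the real highlight.js scoring: the two grammars below are made-up stand-ins with a handful of keywords, and `detect` simply counts keyword hits while dropping any grammar whose "illegal" pattern matches.

```javascript
// Toy grammars, not the real highlight.js definitions.
const TOY_GRAMMARS = [
  { name: "vbnet", keywords: ["dim", "as", "while", "end"], illegal: /\b(?:wend|gosub|variant)\b/i },
  { name: "vbscript", keywords: ["dim", "while", "wend"], illegal: null },
];

// Scores each grammar by keyword hits; a grammar whose "illegal" pattern
// matches is removed from consideration entirely.
function detect(code, grammars = TOY_GRAMMARS) {
  const words = code.toLowerCase().match(/[a-z]+/g) ?? [];
  let best = null;
  for (const grammar of grammars) {
    if (grammar.illegal && grammar.illegal.test(code)) continue;
    const score = words.filter((word) => grammar.keywords.includes(word)).length;
    if (best === null || score > best.score) best = { name: grammar.name, score };
  }
  return best;
}

// "Wend" is valid VBA/VBScript but listed as illegal for the toy vbnet
// grammar, so vbnet can never win on this snippet.
console.log(detect("Dim i\nWhile i < 3\n  i = i + 1\nWend"));
```

If only the vbnet grammar were bundled, a snippet like this would match nothing during auto-detection, which is the trade-off being described.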
8

Will the <!-- language: [language] --> hint be disappearing?

Back when SE was switching to CommonMark, we were told that the old <!-- language: [language] --> syntax had been deprecated and was subject to removal in the future. (Prior to the implementation of code fences, this was the proper syntax to force a single block of code to be highlighted differently from the rest of the post.) With this change, will that comment style be removed once this is rolled out to all the sites?

It seems to work fine at the moment. The following is specified as a C code block:

#include <stdio.h>

...and here's the same text, but as a Python code block:

#include <stdio.h>

Are there plans to remove it, or will it remain for the foreseeable future? If it is going to be removed, will posts rendered prior to its removal still honor it until they're edited, as we were told at the time?

4
  • 2
    At this moment, there is no plan to completely remove support for the language comment syntax. If this were to change, we'd make an announcement on meta first. As much as I am not a fan of the syntax, we don't have a good replacement for language-all in commonmark or our supported extended syntax. I've not changed anything in the implementation for the highlightjs switch, so any issues are to be considered bugs. (I see the one you reported above, thanks for reporting!)
    – Ben Kelly StaffMod
    Commented Sep 11, 2020 at 14:10
  • 3
    @BenKelly To be clear, language-all would continue to be supported; only language had been deprecated. As you say, there isn't a good replacement for language-all, but code fence hints exist as the preferred alternative to language today. Commented Sep 11, 2020 at 17:17
  • 2
    I should have specified: There is at this time no plan to completely remove either language or language-all. One or both could possibly be removed in the future (with language potentially going first since there's a real, widely accepted alternative). If we do decide to do this in the future, then we'll broadcast the change ahead of time in order to gather feedback from the community.
    – Ben Kelly StaffMod
    Commented Sep 11, 2020 at 17:33
  • @BenKelly I just read further into that post, and it seems like the plan wasn't to remove language entirely, but to remove its support only for four-space indent code blocks and retain it for HTML <pre><code> blocks. I think that may be too technically difficult to implement, so I think the best solution would be to retain it as is. (It seems to me that Ham intended to remove it entirely at that time, but ran into issues with it, so he simply marked it as "deprecated, subject to removal" instead.) Commented Sep 13, 2020 at 21:25
8

PowerShell and batch syntax highlighting is off all around and neither works correctly (left: Stack Exchange; right: Microsoft's Visual Studio Code) Screenshot1

  • It appears batch and PowerShell syntax have been linked to each other, which simply doesn't work for either due to the different ways variables and other characters are used between the two:
    • PowerShell comments use #, whereas batch uses :: Screenshot 2
    • PowerShell variables use $, whereas batch uses %<variable>%
      Screenshot3
    • PowerShell doesn't support linking commands via &||&&, using ; instead, which batch doesn't support
  • PowerShell code is only highlighted if the first letter of a command or parameter is capitalized. Commands or parameters written entirely in lowercase or entirely in uppercase are not highlighted (the uppercase case also affects batch), even though PowerShell is case-insensitive, so a ridiculous number of posts go unhighlighted unless they are edited
    • PowerShell and batch syntax highlights don't apply as they should when code fences are used (also an issue with other languages), regardless whether the syntax is specified after the code fence or not - the only way to reliably have it syntax highlighted is to use HTML syntax comment <!-- language-all: lang-powershell --> or lang-bat (it was also an issue with Prettify)
2
7

SQL Formatting issues

As I almost exclusively stick to SQL Server related tags, I've picked up on a few issues/features with the sql language formatting.

Hash character incorrectly interpreted as comment character in SQL

In the below example, on the first line, everything after the # in VIN# is coloured as a comment. On the third line, everything after the # in #TempTable is. This doesn't, however, occur within the literal string; it does within brackets ([]) (used by T-SQL as a delimited identifier), and doesn't within double quotes (") (the ANSI SQL delimited identifier).

SELECT VIN#, NTT.fID, GETDATE(),
       SYSDATETIME()
FROM #TempTable TT
     JOIN dbo.NonTempTable NTT ON TT.ID = NTT.fID
WHERE Description = 'Hello#there' AND NTT.Val = 3
  AND [VIN#] > 7
   OR "VIN#" < -12;
--This is an actual single line comment
/* 
This is a
Multiline
Comment
*/

# isn't even a comment character in SQL. Single line comments are defined with -- and multiple with /* ... */.

This is actually quite a problem, especially when temporary objects start with a #, and are used frequently with DDL and DML examples.
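
For reference, the comment rules described above can be sketched as a tiny scanner. This is an illustration in plain JavaScript, not the highlight.js SQL grammar; `findSqlComments` is a made-up name, and it deliberately treats only -- and /* ... */ as comments while skipping single-quoted strings, double-quoted identifiers and T-SQL bracketed identifiers.

```javascript
// Hypothetical helper: returns the comments in a SQL string, treating only
// "--" (to end of line) and "/* ... */" as comments. "#" is never a comment.
function findSqlComments(sql) {
  const comments = [];
  // Opening delimiter -> closing delimiter for literals and identifiers.
  const closers = new Map([["'", "'"], ['"', '"'], ["[", "]"]]);
  let i = 0;
  while (i < sql.length) {
    const two = sql.slice(i, i + 2);
    if (two === "--") {
      const end = sql.indexOf("\n", i);
      const stop = end === -1 ? sql.length : end;
      comments.push(sql.slice(i, stop));
      i = stop;
    } else if (two === "/*") {
      const end = sql.indexOf("*/", i + 2);
      const stop = end === -1 ? sql.length : end + 2;
      comments.push(sql.slice(i, stop));
      i = stop;
    } else if (closers.has(sql[i])) {
      // Skip string literals and delimited identifiers so that "--", "/*"
      // or "#" inside them is ignored.
      const end = sql.indexOf(closers.get(sql[i]), i + 1);
      i = end === -1 ? sql.length : end + 1;
    } else {
      i += 1;
    }
  }
  return comments;
}

const sample = [
  "SELECT VIN#, NTT.fID FROM #TempTable TT",
  "WHERE Description = 'Hello#there' AND [VIN#] > 7 --real comment",
  "/* multi",
  "line */",
].join("\n");
console.log(findSqlComments(sample));
```

Under these rules VIN# and #TempTable are ordinary identifiers, and only the last three lines of the sample contain comments.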


Further edit

Brackets ([]) not treated as delimited identifier

In T-SQL (as stated above), brackets ([]) are the default delimited identifier, rather than double quotes ("), which are the ANSI delimited identifier.

If a keyword is within brackets, it is highlighted incorrectly. For example:

SELECT [name]
FROM dbo.[Table] T
     JOIN dbo."VIEW" V ON T.ID = V.IDl

I did decide to check, and there isn't a T-SQL variant option:

SELECT [name]
FROM dbo.[Table] T
     JOIN dbo."VIEW" V ON T.ID = V.IDl

Another edit:

The @ character isn't recognised as a variable identifier

Variable names aren't highlighted, or "immune" to other highlighting. Variable names are prefixed with an @ in SQL. For example:

DECLARE @variable varchar(10),
        @Table table (ID int),
        @Date datetime2(0),
        @1 int,
        @NonReservedWord sysname;

Notice that all the variable names, apart from NonReservedWord, receive incorrect syntax highlighting.

9
  • Highlight.js "SQL" language is intended to be very "baseline". It's not going to handle every possible SQL variant well. Currently it includes too much (and on the list to be scaled back)... if someone wants "better/proper" MS SQL support then the correct solution is for someone to provide a 3rd party grammar module for MS SQL variant. Currently we already have PostgreSQL (core) and Transact-SQL (3rd party) grammars in addition to the plain Jane "SQL". Commented Oct 7, 2020 at 14:50
  • @JoshGoebel that doesn't explain most of the above. Yes the brackets ([]) are T-SQL specific, however, # is a very different story, as only -- and /*...*/ are the ANSI methods for comments. @ is also the common theme for variables. The majority of this is not T-SQL specific. The misinterpretation of # as a comment is especially detrimental in my opinion.
    – Larnu
    Commented Oct 7, 2020 at 14:53
  • Re: Comments. This is because no "MySQL" SQL grammar exists yet, so that support has been squeezed into SQL... MySQL does indeed allow # comments. What needs to happen here is for someone to create a separate MySQL grammar and then the SQL grammar can be simplified (removing # comments). Does ANSI SQL even have variables? Variables would be something pretty trivial to add to the existing SQL support (since it currently includes too much). I'd be open to a PR against SQL adding variables. Commented Oct 7, 2020 at 15:08
  • According to this answer from 2014, it does (did) not, @JoshGoebel. Though, not all of the languages use @. For example, T-SQL (SQL Server) and MySQL use @, whereas PL/SQL (Oracle) and pgSQL (PostgreSQL) do not. SQLite doesn't actually support variables at all, by the looks of it. I do admit that all the RDBMS using (vastly) varying implementations of SQL does pose many problems from the perspective of a generic "SQL" language implementation.
    – Larnu
    Commented Oct 7, 2020 at 15:14
  • 1
    I will add variable support to SQL since that will prevent false keywords also. github.com/highlightjs/highlight.js/pull/2740/commits/… Commented Oct 7, 2020 at 17:14
  • 2
    Sadly the SQL grammar improvements didn't make it into 10.3 because it turns out we're going with a complete rewrite of the SQL grammar. The new grammar should hopefully do much better overall on snippets but will drop vendor-specific (non-standard) things like MySQL comments. We could still use any contributors who might want to create (and help maintain) 3rd party modules for MySQL and Oracle to support the nuances of those vendor's SQL grammars. Commented Oct 17, 2020 at 17:48
  • Thanks for the update @JoshGoebel, appreciate it.
    – Larnu
    Commented Oct 17, 2020 at 18:22
  • I'd be happy to help out with Oracle SQL and PL/SQL, if Oracle Corp can't spare anyone :) Commented Nov 2, 2020 at 23:13
  • 1
    SQL improvements finally landed in 10.5 for a holiday release! SE just updated to 10.4.1 so I'm not sure when the next update will be. CC @BenKelly Commented Dec 23, 2020 at 21:05
7

(Manually changing this from bug to status-bydesign given my discoveries documented below.)

I searched around, but I couldn't find any previous posts referencing regular expressions.
Regular expressions are stated to be currently supported, but it is not in the list of languages supported by highlight.js (it was supported by Prettify).

There are some weird effects when highlighting complex expressions, e.g., from this answer:

(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

It sometimes italicizes the characters between asterisks *, and other times fails to highlight character lists inside square brackets, [].

If it's not supported by highlight.js, where is this highlighting scheme even coming from? (See update.) Are regular expressions included in the FAQ list by mistake1? I notice that the default highlighter for the regex tag on SO is lang-default rather than lang-regex.


Update

So I've done a little digging, and it appears what's really going on here is that the regular expression in this post is getting auto-recognized as Markdown, even when specified as regex.

Setting the identifier of the same snippet as lang-markdown has an identical effect as regex:

(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

This leads into the discovery I made, which largely revolves around the last sentence of my original post:

I notice that the default highlighter for the regex tag on SO is lang-default rather than lang-regex.

As described in this post by @T.J.Crowder, and backed up by the help center, there is a difference between identifying a code block as lang-X vs. just X.

As per the help center (emphasis mine):

You can use either one of the supported language codes, like lang-cpp or lang-sql, or you can specify a tag, and the syntax highlighting language associated with this tag will be used.

This was news to me! I had been under the impression, which I'm sure many others are as well, that ID X was simply a shortcut to lang-X. This is incorrect.

Therefore, ID'ing a snippet as regex is really saying "identify this snippet as the defined identifier for the regex tag". This happens to be lang-default, which is really a shortcut to tell the highlighter to "guess" what the correct highlight should be, which in this specific case, becomes Markdown.

So it's going regex ==> lang-default ==> lang-markdown.

Popping open the console to take a look at the first snippet here will still show class="lang-regex s-code-block hljs", even though it's getting highlighted as Markdown. I believe this is due to how highlight.js works. It appears it never actually changes the identifier class name itself, but rather injects the child syntax classes underneath it regardless.
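
Put as code, the lookup chain described above is roughly the following. This is a sketch of the behaviour as the help center describes it, not Stack Overflow's actual implementation: `TAG_LANGUAGES` and `resolveHint` are hypothetical names, the table holds only the single mapping mentioned in this post, and the fallback for unknown tags is an assumption.

```javascript
// Hypothetical tag -> language-code table; only the mapping from this post.
const TAG_LANGUAGES = new Map([["regex", "lang-default"]]);

// Resolves a code-fence hint to a language code.
// "lang-X" is taken literally; anything else is treated as a tag name.
function resolveHint(hint) {
  if (hint.startsWith("lang-")) {
    return hint;
  }
  // Assumption: tags without an entry fall back to auto-detection.
  return TAG_LANGUAGES.get(hint) ?? "lang-default";
}

console.log(resolveHint("regex"));      // lang-default
console.log(resolveHint("lang-regex")); // lang-regex
```

In other words, the hint regex is a tag lookup that lands on lang-default (auto-detection), while lang-regex is passed through as a language code.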


1 - It looks like it was added back into the list in the FAQ post on Sept. 28 (Rev. 100), and given my discoveries below, the answer is yes, it is a mistake.

1
  • 1
    @BenKelly I really wonder if you should not blocklist some tags completely from highlighting. There is NO good guess for highlighting regex... you shouldn't even try. The result is guaranteed to be poor. This probably also applies to some other more unique languages also, like brainfuck, etc... Commented Oct 20, 2020 at 1:45
6

Does highlight.js support emphasis in blocks formatted as "code" (ie indented 4 spaces)?

Paraphrasing an MSE question:

In-code highlighting (anything will do), would be a great way to emphasise the important parts.

Currently, the best people can do is ASCII art arrows, eg:

printf("%5s", "foo")
         ^--- add a width value

which happens often enough and is probably not done more because it's a pain and ugly.

It would be great to be able to highlight (in this case) the 5 by making it red, bold, or whatever by surrounding it with some special chars, maybe like !5! or whatever works.

Awesome would be highlighting with a comment that isn't selected when copy-pasting the code block.


I'm putting my hand up to donate my time and considerable software engineering skills to make this happen. Let me know when you have a GitHub repo up, you've added me as a contributor, and you have a task management system up (e.g. Trello, please not Jira!)

5
  • There's nothing like this in highlight.js proper that I'm aware of. Technically, we could support diff highlighting, but that brings its own set of drawbacks. highlight.js does support plugins, so this could be something that we could potentially write a plugin for. This is not currently on our radar as far as I know, but it is definitely something to think about for the future.
    – Ben Kelly StaffMod
    Commented Sep 18, 2020 at 17:31
  • It wouldn't help in the code block situation since the HTML tag would be displayed rather than treated as a tag, but it would be nice if <mark> were a supported HTML tag on StackExchange so that relevant portions of text could be highlighted.
    – M. Justin
    Commented Sep 18, 2020 at 18:42
  • 4
    @ben something that both amazes and saddens me is SO not tapping into the 1000’s of top class developers that would willingly donate their time (they already do) to writing code to deliver features. For some reason I can’t fathom, SO dev is a closed shop. In this particular case, surely there are 100’s of devs that are members of this community who could write such a plugin. Why not just ask them to do it, and it would get done pronto.
    – Bohemian
    Commented Sep 18, 2020 at 18:47
  • @M.Justin yes, there must be a way. If we write a plugin, we can pick what we like. Eg <mark comment="some comment that is visible, but not selected when copying code">some.code()</mark>
    – Bohemian
    Commented Sep 18, 2020 at 18:53
  • 1
    <pre><code> blocks can include arbitrary HTML (IF the editor permits AND they're using the traditional highlightBlock)... so you could just wrap it in <span class='highlight'>my code</span> and assuming there was a highlight class the HTML is preserved and will be "passed thru" while the code is highlighted... so when finished that span will still be in place. Requires a CSS class though. Long-term a solution like <mark> via some sort of plugin would probably be better though. Commented Sep 21, 2020 at 15:24
6

Syntax highlighting isn't always present in the entire code block

This is an odd one. I've noticed this in a few languages, not just SQL, but sometimes the highlighting just doesn't work on the entire code block. This appears to happen more when the code snippet isn't complete on its own (and so isn't valid syntax on its own).

Take the below SQL snippet for example:

SUM(CASE WHEN SIPCOD in ('001','500') or (SIPCOD = '013' and SISHCD = 'OTA')
         THEN 1
         ELSE 0
    END) -
SUM(CASE WHEN SIPCOD in ('501','502') and SIHRS >= 3.0
         THEN 0.5
         ELSE 0
    END) as [Days Worked]

Even with the language defined (with either sql or lang-sql), the first line to receive syntax highlighting is the fourth line ("END) -"); the prior lines have no highlighting. The image below is from SO Dark Theme:

Enter image description here

I'll try and repro this with some other languages and edit it in, or if I see other examples (I'm sure I've seen at least one C# and PowerShell example over the weekend on my mobile).

This is SQL again. However, this one doesn't highlight the last line, for some reason:

IF EXISTS (SELECT 1 FROM [135.282.123.12].tempdb.sys.tables WHERE [name] = N'##Tmp1')
    PRINT N'YES';
ELSE
    PRINT N'No';

Enter image description here


Apologies, this is SQL again, but the highlighting is all kinds of wrong in this code block. It starts, then suddenly stops, and then picks up again in the oddest place:

CREATE TABLE dbo.RealTable (ID int IDENTITY);
GO

DECLARE @SQL nvarchar(MAX);
--Good attempt
EXEC dbo.CreateNewColumn @TableName = N'RealTable',
                         @ColumnName = N'SomeString',
                         @sql_dtype = N'nvarchar',
                         @length = '255',
                         @SQL = @SQL OUTPUT;

PRINT @SQL;
--Another good attempt
EXEC dbo.CreateNewColumn @TableName = N'RealTable',
                         @ColumnName = N'SomeInt',
                         @sql_dtype = N'int',
                         @SQL = @SQL OUTPUT;

PRINT @SQL;
GO
DECLARE @SQL nvarchar(MAX);
--A bad attempt
EXEC dbo.CreateNewColumn @TableName = N'RealTable',
                         @ColumnName = N'AChar',
                         @sql_dtype = N'char',
                         @length = N'CREATE USER test WITHOUT LOGIN',
                         @SQL = @SQL OUTPUT;

PRINT @SQL;
GO
DECLARE @SQL nvarchar(MAX);
--Bad parameters
EXEC dbo.CreateNewColumn @TableName = N'RealTable',
                         @ColumnName = N'SomeNumeric',
                         @sql_dtype = N'decimal',
                         @length = 7, --This should be precision and scale
                         @SQL = @SQL OUTPUT;
GO
DECLARE @SQL nvarchar(MAX);
--Good parameters
EXEC dbo.CreateNewColumn @TableName = N'RealTable',
                         @ColumnName = N'SomeNumeric',
                         @sql_dtype = N'numeric',
                         @Precision = 7, --This should be precision and scale
                         @Scale = 2,
                         @SQL = @SQL OUTPUT;

SELECT *
FROM dbo.RealTable;
GO
DROP PROC dbo.CreateNewColumn
DROP TABLE dbo.RealTable
8
  • Here's one for JS: stackoverflow.com/a/16497499/2415524
    – mbomb007
    Commented Oct 1, 2020 at 20:19
  • Quotes containing multiple keywords seem to confuse it. Possibly the same issue as mine - meta.stackexchange.com/a/354944/394486 Commented Oct 4, 2020 at 19:20
  • That’s “by design”: the current SQL definition explicitly only attempts to correctly highlight complete SQL chunks. There’s arguably no good reason for this — it’s a debatable design decision. The same design decision causes the issue observed by @WilliamRobertson. Commented Oct 5, 2020 at 9:10
  • That's possibly ok on sites that include complete code solutions like Code Review, @KonradRudolph; however, Stack Overflow code blocks often don't include full code solutions, as they don't need to. Requiring code blocks to be complete and valid, rather than snippets, is a huge detriment to the syntax highlighting.
    – Larnu
    Commented Oct 5, 2020 at 9:14
  • 1
    @Larnu I obviously agree. My comment is explaining, not defending, the behaviour. Commented Oct 5, 2020 at 9:15
  • Fair enough, @KonradRudolph, I may have misinterpreted the intent of your comment.
    – Larnu
    Commented Oct 5, 2020 at 9:16
  • Let's be clear: "by design" doesn't always mean "intentional". I'm not sure if there is a reason the SQL syntax was written the way it is, but I'm not opposed to improving it to be less context aware (which would improve the highlighting of tiny SQL snippets - IF they are flagged as sql). Small snippets of course are always hard to correctly auto-detect. Commented Oct 7, 2020 at 15:01
  • If the syntax highlighting simply "stops" at some point, that's likely a deficiency with the grammar ruleset. I.e., it's matching the beginning of a rule and never finding the expected end, which results in the remaining content not being highlighted. Commented Oct 7, 2020 at 15:03
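To illustrate the last comment, here is a minimal sketch of how an unterminated begin/end rule can swallow the rest of a block. It is a toy tokenizer written for this example, not highlight.js's actual engine or its SQL grammar; the KEYWORDS set and the tokenize function are hypothetical.

```javascript
// Toy example only: a tiny tokenizer with one begin/end rule (single-quoted
// strings) and one keyword rule. Names here are hypothetical, not highlight.js APIs.
const KEYWORDS = new Set(["select", "from", "print"]);

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    if (source[pos] === "'") {
      // Begin of the string rule: look for the matching end quote.
      const close = source.indexOf("'", pos + 1);
      // If the end never matches, the rule stays open until the end of input,
      // so no other rule gets a chance to run on the remaining text.
      const end = close === -1 ? source.length : close + 1;
      tokens.push({ kind: "string", text: source.slice(pos, end) });
      pos = end;
      continue;
    }
    const word = /^[A-Za-z_]+/.exec(source.slice(pos));
    if (word) {
      if (KEYWORDS.has(word[0].toLowerCase())) {
        tokens.push({ kind: "keyword", text: word[0] });
      }
      pos += word[0].length;
      continue;
    }
    pos += 1; // Anything else is left unhighlighted.
  }
  return tokens;
}

// Terminated string: the keyword after it is still found.
console.log(tokenize("PRINT 'ok'; SELECT 1").map((t) => t.kind));
// Unterminated string: everything after the quote is claimed by the string rule.
console.log(tokenize("PRINT 'oops; SELECT 1").map((t) => t.kind));
```

In the second call the SELECT keyword is never reported, which is the "highlighting stops partway" symptom in miniature.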
5

Questions which do not have any tags associated with any languages do not get their code blocks automatically highlighted at all. Examples:

Preloader is not working on Angular universal SSR app

How to Get a List of Members in a Guild Discord.js

Note that questions will get highlighted if they have at least one tag with a "Highlight Language" in their wiki, even if that language is default - like with regex. Questions with at least one such tag will get their code blocks automatically highlighted. In contrast, a question with only tags like discord.js, which have no highlight language (not even default), will not have any code blocks highlighted.

I think when no tags have languages associated with them, the question's code blocks should be highlighted automatically. Maybe remove the difference between the association with default highlighting and a non-existing language association while you're at it, unless it's needed for something. (Or give all tags a default language association.)

All questions should at least have something like

<div style="display:none" id="js-codeblock-lang">default</div>

but it should never be empty, or auto-highlighting won't work:

<div style="display:none" id="js-codeblock-lang"></div>

This issue is pretty similar to a related standalone question: Improving syntax highlighting language auto-detection.
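The fallback proposed above could be sketched like this. It is only an illustration: resolveCodeblockLang is a hypothetical helper, and how the site's own script actually reads the js-codeblock-lang element is an assumption on my part.

```javascript
// Hypothetical helper, not Stack Overflow's actual client code.
// Takes the text content of the js-codeblock-lang element and never returns
// an empty language hint: a missing or blank hint falls back to "default".
function resolveCodeblockLang(hintText) {
  const hint = (hintText ?? "").trim();
  return hint === "" ? "default" : hint;
}

console.log(resolveCodeblockLang("lang-sql")); // tag supplied a language
console.log(resolveCodeblockLang(""));         // empty div, falls back
console.log(resolveCodeblockLang(null));       // element missing, falls back
```

In a browser the input could come from something like document.getElementById("js-codeblock-lang")?.textContent, so an empty div and a missing div would both end up as default.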

5

Bash highlighting seems to be broken.

echo "$(true)"
echo $(true)

As you can see, the command in the first subshell is not highlighted, presumably due to the quotes, but the second is (no quotes). Both should be highlighted.

Adding a PNG image in case this gets fixed.

enter image description here

3
  • Thanks for the report! This looks like an upstream issue. The language is getting correctly detected as bash, so the issues are due to the highlightjs syntax definitions themselves.
    – Ben Kelly StaffMod
    Commented Oct 6, 2020 at 15:17
  • Correct. Currently, $(...) inside a string is only matched as a "subst" (special code within a string); there is no further recursive processing for subshells... This is something that should be fixable... Commented Oct 7, 2020 at 15:23
  • After Ben's message, I looked at highlight.js issues, and it looks like a similar problem was already reported. Highlighting for this specific case helps users to notice mistakes (even if they should try their code before submitting an answer ;)). Commented Oct 7, 2020 at 23:19
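For anyone curious what "recursive processing" might look like, here is a rough sketch. It is not the highlight.js bash grammar: findCommandSubstitutions is a hypothetical function that just balances parentheses, and it ignores quoting rules, escapes, and arithmetic expansion.

```javascript
// Hypothetical sketch: collect the contents of every $(...) in a piece of
// shell text, descending into nested substitutions. Parentheses are balanced
// naively; quotes, escapes, and $((...)) arithmetic are not handled.
function findCommandSubstitutions(text, found = []) {
  let i = 0;
  while (i < text.length) {
    if (text[i] === "$" && text[i + 1] === "(") {
      let depth = 1;
      let j = i + 2;
      while (j < text.length && depth > 0) {
        if (text[j] === "(") depth += 1;
        else if (text[j] === ")") depth -= 1;
        j += 1;
      }
      if (depth !== 0) break; // Unbalanced input: stop rather than guess.
      const inner = text.slice(i + 2, j - 1);
      found.push(inner);
      // Recurse so a substitution nested inside this one is found as well.
      findCommandSubstitutions(inner, found);
      i = j;
    } else {
      i += 1;
    }
  }
  return found;
}

console.log(findCommandSubstitutions('echo "$(true)"'));
console.log(findCommandSubstitutions('echo "$(dirname "$(pwd)")"'));
```

Each returned string is a chunk that a highlighter could hand back to the same bash rules, whether or not it sits inside double quotes.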
5

There is something a bit odd with PL/SQL (or SQL - I'm not sure whether PL/SQL is actually supported. It seems to be sadly unpopular with syntax highlighting plugins.)

A quoted SQL statement seems to defeat the quoting, but only when an earlier line ends with a semicolon.

select blah into blahblah from blahblahblah;  -- Semicolon here seems to do it

xxx := 'select select';

Quoting is now reversed.

Looking at other SQL-related issues, I see Syntax highlighting isn't always present in the entire code block also has an example where quoting is broken by a quote that includes a SQL keyword.

Screenshot for posterity:

highlight-js glitch with SQL

The actual post where this came up is here: https://stackoverflow.com/a/64183788/230471

Edit: Marking as Lua seems to work better with quoting:

select blah into blahblah from blahblahblah;  -- This is a comment

xxx := 'select select';

Quoting is not reversed.
5
  • 1
    highlight.js doesn’t support PL/SQL, and Stack Overflow in particular only supports SQL, which is a shame, since highlight.js has a separate definition for pgSQL (which includes PL/pgSQL!) that’s of vastly superior quality to the SQL definition. Commented Oct 5, 2020 at 9:01
  • 1
    We support PL/SQL via PL/pgSQL... if StackOverflow wished to provide that variant, they could. "SQL" is intended to be very basic (although sadly it's currently got MySQL mixed in with it). Really the SQL variants are so different that to get the best results you really need to use one specific to the server variant you're using - which is why they should all be broken out into separate grammars. There are also size concerns (which might be why Stack Overflow just doesn't want to add everything in the kitchen sink)... I'm definitely open to making our "simple" SQL support better. Commented Oct 7, 2020 at 15:13
  • 1
    I'm making a small change to SQL that will always match strings (even if outside an apparent SQL statement) that will prevent examples like those above from being totally broken. github.com/highlightjs/highlight.js/pull/2740 Commented Oct 7, 2020 at 15:41
  • I'm guessing the "MySQL mixed" is why only the first declaration after a declare keyword is highlighted. In PL/SQL declare marks the start of a section containing multiple items and ending with begin. Commented Oct 10, 2020 at 9:15
  • Since PL/SQL is a wrapper for SQL (among other things) perhaps there is a way to include a common SQL grammar and extend it with begin/end/loop etc for PL/SQL. But then I've never written a JavaScript syntax parser so I have no idea what I'm talking about (though I did extend my fork of code-prettify to include PL/SQL). Commented Oct 10, 2020 at 9:16
4

What are we supposed to do if syntax highlighting doesn't turn on at all?

In a question with a very simple code block, with only the tag, this is what I see:

enter image description here

For the block of code:

MapperConfiguration config = new MapperConfiguration( cfg => cfg.CreateMap<Source, Dest>()
    .ForMember( k => k.Sector, opt => opt.MapFrom<MyResolver>() ) );

Mapper.Initialize( config );

The only highlight is on new for some reason. The rest of the c# questions have the proper highlighting for me.
If it makes any difference, using latest Firefox on Windows and no console errors.

1
  • This is the expected behavior. There are no other keywords there to highlight. Commented Oct 18, 2020 at 2:40
4

I noticed that on this question the syntax highlighting for some C++ code stops partway thru.

In particular, it gets tripped up by this bit of code:

template <class T>
ostream& operator<< (ostream& os, const skg::Triplet<T>& p_t) ;
void other_stuff_that_isnt_colored();

If the operator is changed from << to something else, the coloring continues

template <class T>
ostream& operator+(ostream& os, const skg::Triplet<T>& p_t) ;
void other_stuff_that_is_colored();

but the color for the operator keyword is the identifier color, and not the keyword color.

If the template <class T> part is removed the coloring is correct.

ostream& operator<< (ostream& os, const skg::Triplet<T>& p_t) ;
void other_stuff_that_is_colored();
3
4

LaTeX highlight in TikZ environments is wrong.

Look at https://tex.stackexchange.com/a/564540/38080:

enter image description here

It seems that a newline in a macro argument desynchronizes the parser...

Thanks!

PS: could be this: https://github.com/highlightjs/highlight.js/issues/2709 ...

1
  • 2
    The problem is not the newlines, the issue is with the nested braces. After spacing}* the highlighter thinks that it is no longer inside an argument and starts outer-level (i.e., orange) highlighting. Compare \foo{\bar} \foo{{}\bar}. For another clear example see tex.stackexchange.com/questions/564420/….
    – Marijn
    Commented Sep 28, 2020 at 20:14
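The nested-brace behaviour described in the comment can be reproduced with a small sketch. This is an assumption about the kind of matching that goes wrong, not the actual highlight.js LaTeX grammar; both function names are hypothetical, and escaped braces such as \{ are ignored.

```javascript
// Hypothetical illustration of two ways to find where a macro argument ends.
// open is the index of the argument's opening brace.

// Naive: stop at the first closing brace, even if it closes an inner group.
function naiveArgumentEnd(text, open) {
  return text.indexOf("}", open + 1);
}

// Depth-aware: count nested braces and stop only when the outer one closes.
function balancedArgumentEnd(text, open) {
  let depth = 0;
  for (let i = open; i < text.length; i += 1) {
    if (text[i] === "{") {
      depth += 1;
    } else if (text[i] === "}") {
      depth -= 1;
      if (depth === 0) return i;
    }
  }
  return -1; // No matching close brace.
}

const flat = String.raw`\foo{\bar}`;
const nested = String.raw`\foo{{}\bar}`;
console.log(naiveArgumentEnd(flat, 4), balancedArgumentEnd(flat, 4));
console.log(naiveArgumentEnd(nested, 4), balancedArgumentEnd(nested, 4));
```

For the flat input both approaches agree. For the nested one the naive version ends the argument at the inner closing brace, so \bar is treated as if it were already outside the argument.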
4

No Objective-C highlighting

I commented to say how disappointing highlighting of Objective C is, but I was told to open a bug as it is not an issue of Highlight.js, but of Stack Overflow not applying it (it applies C highlighting instead, and with that what I see highlighted makes sense).

4
  • 1
    Thanks for the report! This is actually due to the objective-c language being split out from the c-like languages and the tag still pointing to the old language code. I'll get a fix in for this today. Nice catch!
    – Ben Kelly StaffMod
    Commented Oct 5, 2020 at 17:14
  • 2
    This fix is now live. Any post with the objective-c tag will now correctly assign the lang-objectivec syntax instead of lang-c.
    – Ben Kelly StaffMod
    Commented Oct 6, 2020 at 21:06
  • @BenKelly It is better, it now recognises class names and declarations, however it does not highlight methods & properties, so it still looks quite a bit un-highlighted. Could it have a choice of styles perhaps, or is highlight.js just not very good for objective-c?
    – Ecuador
    Commented Oct 7, 2020 at 16:19
  • Traditionally (as a policy) we have not highlighted method dispatch or properties, but we're now more open to this and welcoming PRs in cases where it's simple enough to detect. See github.com/highlightjs/highlight.js/issues/2500 There is work to be done deciding how they should be highlighted, whether we need to add new CSS classes, etc... so if you want to help out or join the discussion (or work on a PR) feel free to do so. :-) Commented Oct 18, 2020 at 2:48
3

There are two problems with Groovy syntax highlighting:

  1. The old syntax was not automatically converted to the new syntax, i.e. thousands of Groovy-related questions and answers lost syntax highlighting.

  2. Groovy syntax highlighting via ```groovy does not work in many cases (e.g. here), only in some.

For details please read this question and its comments.

4
  • Added a comment to the question. Will mark as status-review here as well
    – Ben Kelly StaffMod
    Commented Sep 25, 2020 at 15:00
  • 1
    See response to linked question. Groovy syntax is still not explicitly supported, but it should now at least auto-highlight to whatever similar language highlight.js thinks it might be
    – Ben Kelly StaffMod
    Commented Sep 25, 2020 at 19:40
  • 1
    We do have a grammar for Groovy if you're talking about groovy-lang.org. I assume StackOverflow simply isn't loading that yet? Commented Oct 7, 2020 at 15:14
  • I know that and also mentioned it in my question. I don't know why SO does not just use something that already exists. I also asked in a comment how to go about making a feature request to explicitly support Groovy, but received no answer. Slightly off-topic, IMO syntax highlighting in general (also for Java) became worse after switching the highlighting framework. My own answers are way less readable than they were before, fewer elements are highlighted and where they are they don't stand out much, which also affects Groovy indirectly.
    – kriegaex
    Commented Oct 8, 2020 at 1:19
