142

Prettify is used to make code snippets look good, but unfortunately the current version used by Stack Overflow seems quite old and many language keywords aren't supported (e.g. async and yield in C#). Since support for these keywords was supposedly added over a year ago, could we please get the Prettify lib updated so they get highlighted?

(Unfortunately, Prettify seems somewhat dead, in that there have been no new releases since 2013. However, the repo has been updated since then, so pulling it and running the grunt task should produce the required minified JS file.)

  • 90
    I'd rather see SO move to a new library. One that is a) maintained, b) extensible (add support for new languages easily) and c) small and fast. See the feedback Oded gave last time someone suggested Highlighter.js. Commented Jul 30, 2018 at 12:13
  • 50
    I'm thinking we should see if Prism can meet those criteria. Library size would appear to be comparable: core plus the 32 currently supported language tags comes to about 18 kB gzip-compressed. If rendering speed is comparable, then we’d have a viable replacement! Commented Jul 30, 2018 at 12:14
  • I wonder if they can just host it on cdnjs.
    – Braiam
    Commented Jul 30, 2018 at 13:23
  • 1
    I would prefer to keep Prettify; it might not be actively worked on, but they accept signed commits in pull requests.
    – user692942
    Commented Jul 30, 2018 at 19:59
  • @MartijnPieters Anecdotal, I know, but I'm using Prism and it seems just as "fast" as any other highlighter I've used.
    – Tieson T.
    Commented Jul 30, 2018 at 19:59
  • @MartijnPieters do you think your suggestion would be better as an answer here or a separate FR? The upvotes on your comments would make it seem worthwhile. Commented Jul 30, 2018 at 21:41
  • 1
    @AndrasDeak: not much time to create an FR, but that's what I've been thinking. I'd need to conduct some performance comparisons. Commented Jul 30, 2018 at 23:01
  • 3
    The worst thing about prettify is that it isn't even possible to determine what the latest "release" is... Apart from that, there are more or less regularly commits / PRs merged. Commented Jul 31, 2018 at 7:26
  • 1
    In C#, more than just the coloring of new keywords is broken: well-formed @- and $-prefixed string literals can cause the highlighter to treat all of the following text as a string literal, as happened here before the edit. Commented Jul 31, 2018 at 11:32
  • I'm really curious how much traffic the library actually accounts for. I get that bandwidth is important, but surely the cache time on this stuff is a week/a month?
    – mbrig
    Commented Jul 31, 2018 at 14:54

5 Answers

79

Stack Overflow should start its own fork of code-prettify, or move to a different solution.

  • Most pull requests don't get reviewed or merged in the Google repository anymore.
  • There are more missing C# keywords, as shown by my own pending pull request on the matter, made on 24 Mar 2018.
  • Google's Security Engineering team has a process requirement that makes it impossible to deal with parallel pull requests: submitters must update a distribution ZIP file themselves, so every pull request will inevitably conflict with the others. I also wonder whether they really unzip the file for each review, or whether that leaves room for a Trojan horse.
  • Why isn't C# in its own file, like other languages, and C and Java and ...? My PR for Rust has already been merged, I think because of that.
    – Stargateur
    Commented Apr 5, 2019 at 15:17
14

Whatever update or replacement is chosen, please make sure it deals well with dark corners of the supported languages, because we do get questions about those! Examples that the existing highlighter gets wrong:

var abc = 3; // vars in most languages get differently-coloured
var Def = 6; // names if they start with an uppercase letter
// line comments in the C family \
   are continued by backslash-newline
\u002f\u002f can be used to start a comment in Java
// and \u000a can be used to end one
struct Tree<'id>(Option<(Index<'id>, Index<'id>)>);

(Please edit this to add more examples if you know them.)
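The second example, at least, does not need a full parser. As a hedged TypeScript sketch (both patterns are illustrative, not taken from code-prettify or Prism), a line-comment rule only has to honour C/C++ line splicing, where a backslash immediately followed by a newline joins the next line onto the current one:

```typescript
// Naive pattern: a line comment ends at the first newline.
const naiveLineComment = /\/\/[^\r\n]*/;
// Splice-aware pattern: a backslash may be followed by a newline, in which
// case the comment continues on the next line (C/C++ line splicing).
const splicedLineComment = /\/\/(?:[^\\\r\n]|\\(?:\r?\n)?)*/;

function firstMatch(re: RegExp, src: string): string {
  const m = re.exec(src);
  return m === null ? "" : m[0];
}

const source =
  "// line comments in the C family \\\n" +
  "   are continued by backslash-newline\n" +
  "int x = 1;\n";

const naive = firstMatch(naiveLineComment, source);
const spliced = firstMatch(splicedLineComment, source);

console.log(JSON.stringify(naive));   // stops before the newline
console.log(JSON.stringify(spliced)); // includes the continued line
```

The splice-aware rule still stops at the second newline, so the following declaration is not swallowed. It only applies to C and C++; Java and C# have no line splicing, which is one more per-language switch a highlighter would need.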

  • 3
    I'm not sure that I'd consider case 1 wrong in general. The highlighter doesn't have enough information to know definitively whether something is a type or a variable. In languages like C#, where the convention is that classes start with a capital letter and variables with a lowercase one, that behavior is a reasonable best guess IMO, even if the existence of properties (used like variables but typically capitalized) means there's often unavoidable breakage. Commented Jul 31, 2018 at 15:32
  • 13
    If you're attempting to list the issues with current code-prettify, you'll likely find them at github.com/google/code-prettify/issues.
    – Cœur
    Commented Jul 31, 2018 at 17:23
  • 2
    Try these out with Prism.js; the first is handled correctly, but the second and third I have never seen handled properly by any syntax highlighter, Prism.js included. You could try your luck with a bug report for Prism.js, I suppose. Commented Aug 1, 2018 at 9:19
  • 3
    Don’t ask for these corner cases, when not even the general case is handled, i.e. the highlighted keywords do not reflect the selected programming language at all. E.g. in Java code, you still get auto, delete, extern, union, inline, operator, register, signed, sizeof, struct, typedef, and unsigned highlighted as keywords. And while it simply considers names starting with lowercase letters to be variable names and names starting with uppercase letters to be type names, it takes the extra effort to treat, e.g. wchar_t as a type. In Java. Very helpful…
    – Holger
    Commented Aug 1, 2018 at 13:26
  • 1
    @Holger I feel that we should be looking for accurate handling of corner cases in the syntax highlighter for Stack Overflow, because we do get questions about them and it helps. (FWIW current versions of both emacs and vim get case 2 right, albeit not case 3.)
    – zwol
    Commented Aug 1, 2018 at 16:56
    @zwol don’t get me wrong. I’d really like a formally correct syntax highlighter; I even wrote my own for highlighting Java source code on web pages, and it doesn’t take that much effort. However, for site owners who don’t even bother to highlight the correct keywords, even though copying them from the specification takes a few seconds, correct syntax highlighting doesn’t seem to be a high priority…
    – Holger
    Commented Aug 2, 2018 at 10:19
  • 1
    @zwol Doing that basically requires writing a compiler for that language. That's many orders of magnitude more work than a few regexes that use some common heuristics to make a reasonable guess. As useful as getting it exactly right would be, it's infeasibly difficult in most contexts. Even the performance cost of computing it, even if you already had all of the code, would be a real issue in the context of an SO post. There's also the fact that users often post snippets on SO, so you'll never get those right 100% of the time; you don't have enough information.
    – Servy
    Commented Aug 16, 2018 at 19:27
  • see my answer/comment
    – Amro
    Commented Aug 16, 2018 at 19:33
5

If this does get changed, is there any chance of getting support for fenced code blocks in the same manner as GitHub? The required 4 spaces of indentation per line can become difficult to edit when the original code uses tabs and was copy-pasted.

Also, our current list of supported languages is rather lacking when compared to GitHub... maybe it could be expanded?

  • 3
    Fenced code blocks would be a feature of the Markdown engine, not the code highlighter; this should probably be a separate feature request. I endorse the idea, though.
    – zwol
    Commented Aug 2, 2018 at 0:26
-3

This was meant as a comment on @zwol's post, but it became too long for one...


You seem to be missing the point of code-prettify (and the same goes for prism.js). It is a regex-based syntax highlighter; it does not implement a full parser for each language it supports. This is what makes it lightweight and fast, and it is why, by default, it works passably for a range of languages, even ones not explicitly defined in the library (it should work on a best-effort basis for most C-like or HTML-like languages). But it also means it can fail in certain cases that only a proper parser could handle (think contextual keywords, or constructs like string interpolation that are difficult to get right with regexps alone). Not to mention languages that are notoriously difficult to parse (say, Perl).
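To make the string-literal point concrete, here is a hedged TypeScript sketch of the C# problem raised in a comment under the question (the patterns are illustrative, not code-prettify's). A C-style string rule reads \" inside a verbatim @"..." literal as an escaped quote and keeps going, while a verbatim-aware rule stops at the real closing quote:

```typescript
// C-style string: a backslash escapes the next character.
const cStyleString = /"(?:[^"\\]|\\[\s\S])*"/;
// C# verbatim string: backslashes are literal, "" is an embedded quote.
const verbatimString = /@"(?:[^"]|"")*"/;

// C# source with a well-formed verbatim path that ends in a backslash.
const csharp = 'var dir = @"C:\\temp\\"; var n = 1; Log("done");';

const asCStyle = cStyleString.exec(csharp);
const asVerbatim = verbatimString.exec(csharp);

// The C-style rule treats \" as an escaped quote and swallows code.
console.log(asCStyle === null ? null : asCStyle[0]);
// The verbatim rule stops at the actual end of the literal.
console.log(asVerbatim === null ? null : asVerbatim[0]);
```

Picking the right rule depends on noticing the @ prefix, which is exactly the kind of per-language special case a compact regex highlighter has to add one at a time.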

Keep in mind that in code-prettify there's a "default" handler (selected when you don't specify a language, or set a non-existent one), plus variations in the core lib for languages like "lang-cpp", "lang-java", "lang-py", etc. They are similar but not the same.
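A minimal TypeScript sketch of that lookup, assuming a plain map from language names to handlers. registerHandler and handlerFor are hypothetical names chosen for illustration, not code-prettify's API:

```typescript
// Hypothetical handler registry; names are illustrative only.
interface Handler {
  name: string;
}

const DEFAULT_HANDLER: Handler = { name: "default" };
const registry: Record<string, Handler> = {};

function registerHandler(handler: Handler, langs: string[]): void {
  for (const lang of langs) {
    registry[lang] = handler;
  }
}

// No language given, or one that was never registered: use the default.
function handlerFor(lang?: string): Handler {
  if (lang !== undefined && Object.prototype.hasOwnProperty.call(registry, lang)) {
    return registry[lang];
  }
  return DEFAULT_HANDLER;
}

registerHandler({ name: "java" }, ["java"]);
registerHandler({ name: "python" }, ["py", "python"]);

console.log(handlerFor("py").name);
console.log(handlerFor().name);
console.log(handlerFor("no-such-lang").name);
```

Several names can map to one handler, and anything unknown silently lands on the default, which is why an unsupported or misspelled language tag still gets generic highlighting rather than none.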

In fact, if you look at the implementation, C/C#/Java/Python/JS/etc. are all handled by the same function, sourceDecorator, which accepts options to customize things like:

  • list of keywords
  • type of comments (hash or c-style)
  • type of strings (single/multi-line, verbatim, triple-quoted, etc.), and so on.
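The idea of one decorator configured per language can be sketched in TypeScript as follows. This is a heavily simplified, hypothetical model: the option names echo the knobs listed above, but the code is not taken from prettify.js:

```typescript
// Hypothetical options, loosely echoing the knobs listed above.
interface DecoratorOptions {
  keywords: string[];
  hashComments?: boolean;        // # line comments
  cStyleComments?: boolean;      // // and /* */ comments
  tripleQuotedStrings?: boolean; // """...""" and '''...'''
}

type Kind = "keyword" | "comment" | "string" | "plain";
interface Token {
  kind: Kind;
  text: string;
}

function makeDecorator(opts: DecoratorOptions): (src: string) => Token[] {
  // Ordered rules, each anchored at the current position.
  const rules: Array<[Kind, RegExp]> = [];
  if (opts.tripleQuotedStrings) {
    rules.push(["string", /^(?:"""[\s\S]*?"""|'''[\s\S]*?''')/]);
  }
  rules.push(["string", /^(?:"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')/]);
  if (opts.hashComments) rules.push(["comment", /^#[^\n]*/]);
  if (opts.cStyleComments) rules.push(["comment", /^(?:\/\/[^\n]*|\/\*[\s\S]*?\*\/)/]);

  return (src: string): Token[] => {
    const tokens: Token[] = [];
    let rest = src;
    outer: while (rest.length > 0) {
      for (const [kind, re] of rules) {
        const m = re.exec(rest);
        if (m !== null) {
          tokens.push({ kind, text: m[0] });
          rest = rest.slice(m[0].length);
          continue outer;
        }
      }
      // Identifiers are keywords only if listed; anything else is one plain char.
      const word = /^[A-Za-z_]\w*/.exec(rest);
      const text = word !== null ? word[0] : rest[0];
      const kind: Kind = word !== null && opts.keywords.indexOf(text) >= 0 ? "keyword" : "plain";
      tokens.push({ kind, text });
      rest = rest.slice(text.length);
    }
    return tokens;
  };
}

// Same machinery, two configurations.
const python = makeDecorator({ keywords: ["def", "return"], hashComments: true, tripleQuotedStrings: true });
const java = makeDecorator({ keywords: ["class", "return"], cStyleComments: true });

const highlighted = (tokens: Token[]): string[] =>
  tokens.filter((t) => t.kind !== "plain").map((t) => `${t.kind}:${t.text}`);

console.log(highlighted(python("def f(): return 1  # done")));
console.log(highlighted(java("return 1; # not a comment // but this is")));
```

Adding a language then mostly means choosing flags and a keyword list, which keeps the core small but also means anything the flags cannot express (such as C# verbatim or interpolated strings) falls back to a generic rule.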

Even then, the keyword lists are cleverly built by starting with common ones and layering language-specific ones on top, in order to keep the library size to a minimum. As a result, some keywords are missed or falsely highlighted.
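A hedged TypeScript sketch of that layering shows why Java code ends up with C-only words such as auto or struct highlighted, as noted in a comment on another answer. The lists below are abbreviated and illustrative, not copied from prettify.js:

```typescript
// Illustrative, abbreviated lists: each language's keywords are a shared
// base plus its own additions.
const C_KEYWORDS: string[] = ["auto", "break", "case", "char", "struct", "typedef", "unsigned"];
const COMMON_KEYWORDS: string[] = C_KEYWORDS.concat(["catch", "class", "delete", "new", "try"]);
const JAVA_KEYWORDS: string[] = COMMON_KEYWORDS.concat(["abstract", "extends", "final", "synchronized"]);

const isJavaKeyword = (word: string): boolean => JAVA_KEYWORDS.indexOf(word) >= 0;

// None of these are Java keywords, yet the layered list contains them all.
const notReallyJava = ["auto", "struct", "typedef", "unsigned", "delete"];
const wronglyHighlighted = notReallyJava.filter(isJavaKeyword);

console.log(wronglyHighlighted);
```

Sharing the base list saves bytes because each word is shipped once, at the cost of every derived language inheriting words it does not actually reserve.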

You'll also find a rule in the common highlighter that treats identifiers starting with a capital letter as types (styled differently, as PR_TYPE). Again, this is an ad-hoc rule based on the CamelCase naming convention. Short of actually parsing the code to properly detect class/type names, it is a cheap alternative.
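Here is a hedged TypeScript sketch of such a heuristic. The regular expression is an assumption modelled on code-prettify's default type rule, not quoted from its source: a capitalized CamelCase word, or any word ending in _t, counts as a type:

```typescript
// Assumed approximation of a default "type" rule: a capitalized word that
// contains a lowercase letter, or any identifier ending in "_t".
const TYPE_LIKE = /^(?:[@_]?[A-Z]+[a-z][A-Za-z_$@0-9]*|\w+_t\b)$/;

function looksLikeType(identifier: string): boolean {
  return TYPE_LIKE.test(identifier);
}

for (const id of ["abc", "Def", "wchar_t", "MAX", "Count"]) {
  console.log(id, looksLikeType(id));
}
```

Under this pattern Def and Count both come out as types, although in C# Count is typically a property; wchar_t is a type in any language, Java included; and an all-caps constant such as MAX is not, since the pattern needs a lowercase letter after the leading capitals.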

While you can start adding code to handle special cases for specific languages, it will certainly increase the code size. You might say, let's just do this for C# or insert-your-favorite-language, but then where do you draw the line?

The same idea applies to the whole library. It was designed to work decently for most languages using just the base lib, prettify.js. Yet it also lets you extend it by registering extra handlers (lang-xxx.js) for languages where you want more control, at the expense of a bigger size (more JS to download).

If you look in the directory of contributed modules, you'll find handlers for languages that the base lib mostly fails on (say, Lisp-like languages). But you'll also find somewhat redundant handlers for languages that the base lib can actually deal with by default (to a degree), whose authors wanted a more specific/correct handling of that language's syntax. The biggest offender here is the list of keywords.

For a site like Stack Overflow, where you don't want to bloat pages with JS files, it's up to SO to pick and choose which extra language handlers to include...

  • 1
    ... I'm not sure I believe that an accurate syntax highlighter has to be huge, but even if it did, that could be rendered completely irrelevant by simply running the highlighting process on the server, which would be a good idea anyway, so, excuse not accepted :-P
    – zwol
    Commented Aug 16, 2018 at 19:41
  • Excuse? I was not arguing for or against any particular library for Stack Overflow. I'm simply explaining some of the design choices of code-prettify (which predates SO), since there seems to be a lot of confusion about it. They are free to choose whatever library they want or implement their own, even server-side (like GitHub)...
    – Amro
    Commented Aug 16, 2018 at 19:56
  • Also, SO snippets are not big, because they should be minimal, so I don't think they would be very slow to render even with a "slow", true parser for each language. We are talking about 10-30 lines to parse per post.
    – Stargateur
    Commented Apr 5, 2019 at 16:14
-48

Bring in Monaco! Really fast, and it could open the door to more features in the future. https://microsoft.github.io/monaco-editor/

  • 21
    At least don't make the post look like spam .....?
    – user202729
    Commented Jul 31, 2018 at 6:37
  • 21
    Also consider this line for mobile SO view: "The Monaco editor is not supported in mobile browsers or mobile web frameworks.". Commented Jul 31, 2018 at 7:14
  • 14
    It appears to be 7 MB in size when minified! Here's the previous post where 22 kB was considered too big ... meta.stackexchange.com/questions/278141/…
    – user310988
    Commented Jul 31, 2018 at 7:41
  • 40
    We are talking about code highlighting of code examples in posts, not a full-blown editor. Commented Jul 31, 2018 at 8:14
  • 4
    @user202729 How on earth does this look like spam?
    – user247702
    Commented Jul 31, 2018 at 13:14
  • 25
    @Stijn Bring in Keto Plus Diet! Really fast and could lose you more weight in the future. [Link to a supplement site]
    – iBug
    Commented Jul 31, 2018 at 15:10
  • 5
    The answerer doesn't seem affiliated with Monaco so this must be spam! Commented Jul 31, 2018 at 20:05
