18
$\begingroup$

New users of a language will spend a lot of their time staring at error messages, and realistically so will the language designers. These messages should be an early consideration, given that adding new information might require extensive compiler changes, and good error messages save time.

The range of message quality seems very wide, from the infamous C++ errors (here shown deliberately cryptic):

Crop from a long C++ error message

to Nushell's eye candy:

Error message in the Nu shell programming language showing various colors, text formatting, and Unicode arrows.

Given the effort that even mainstream languages are spending on this topic, and the warm reception to these improvements, what are the best practices for displaying error messages?


Note that this question is about the user interface of the compiler. Suggesting "report as many errors as possible" is ok, but specifics on how to implement such a compiler are out-of-scope. Syntax and runtime error messages can be included if there are special considerations.

$\endgroup$
5

5 Answers

21
$\begingroup$

Unique Error Code

Often overlooked, but very necessary for non-English speakers like me.

Many problems already have detailed solutions written up in widely spoken languages, but searching for a localized error message in a local search engine often turns up nothing.

With a unique error code, you can search for, say, Rust "E0308" and find the cause of the error and its solution even when they are written in another language.

An error code also makes it easier to pass errors across a C FFI boundary.

Where the Error Occurred

Ideally the location is a hyperlink that jumps to where the error occurred when clicked, which is convenient for debugging.

The format is generally

mono_project/sub_project/module/code.c:line:col

Note that whenever a path appears, it should be relative to the project root.

Otherwise, it is very easy to leak private information or secrets, such as a username embedded in an absolute path.
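As a rough illustration of the relative-path rule, here is a minimal Python sketch. The function name `format_location` and the fallback of showing only the file name for files outside the project are my own assumptions, not something a particular compiler does.

```python
from pathlib import PurePosixPath


def format_location(project_root: str, file_path: str, line: int, col: int) -> str:
    """Return 'relative/path:line:col' without exposing anything above the project root."""
    root = PurePosixPath(project_root)
    path = PurePosixPath(file_path)
    try:
        shown = path.relative_to(root)
    except ValueError:
        # Assumption: for files outside the project, show only the file name
        # rather than leak the full absolute path.
        shown = PurePosixPath(path.name)
    return f"{shown}:{line}:{col}"


print(format_location("/home/alice/work/app",
                      "/home/alice/work/app/src/module/code.c", 12, 5))
```

The home directory (and with it the username) never reaches the message, which is the point of the rule.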

When the Error Occurred

If it is a runtime error, show the time at which the error occurred in a human-friendly format, such as RFC 3339.
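For example, an RFC 3339 timestamp can be produced with the Python standard library alone. This is only a sketch: the helper names and the `[timestamp] error: message` layout are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 timestamp in UTC (second precision)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def runtime_error_line(message: str, moment: datetime) -> str:
    # Hypothetical layout: timestamp first, so that lines sort chronologically.
    return f"[{rfc3339(moment)}] error: {message}"


example = datetime(2023, 7, 1, 2, 31, 0, tzinfo=timezone.utc)
print(runtime_error_line("division by zero", example))
```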

Why the Error Occurred and How to Fix

The main cause of the error needs to be given, so that the programmer can quickly understand the problem.

If possible, give possible fixes, or templates for fixes.

In an HTML environment such as Jupyter, the suggested fix can support click-to-copy.

For simple errors internal to the language, the IDE will usually offer a quick-fix.

But if it is an external error, the IDE generally does not support automatic repair. For example, if there is a regex syntax problem, then the library author needs to provide a suggested fix.

This is not as convenient as a quick-fix, but so far I haven't encountered a language that lets libraries define IDE behavior.

Stack Backtrace

In debug mode a stack trace should be generated, printing each step of the call.

I actually think print should also generate a stack trace in debug mode.

But the overhead is huge, so don't use it as the default behavior, and don't print it in release unless specified by the user.

$\endgroup$
2
  • $\begingroup$ The point about fixes is a potentially risky one. For that to actually be useful information, the person writing the error detection needs to be certain about where the error actually is, what the error is, and what the developer actually meant to do. This is easy for syntax errors if you write your parser correctly, but can be very difficult for runtime errors, and if the ‘fix’ is incorrect it’s counterproductive. $\endgroup$ Commented Jul 1, 2023 at 2:31
  • $\begingroup$ One thing I'd note there. Once an error code has been allocated it should never, under any circumstances, be changed: doing so would render all archived documentation including textbooks and all web searches obsolete. In principle, even a change to an error should result in the existing code being retired, i.e. "don't expect to see this after version x.y". All of which makes organising them by topic and keeping them short a tricky undertaking. $\endgroup$ Commented Jul 2, 2023 at 9:48
8
$\begingroup$

To complement Aster's answer:

Include code location excerpt

Saying error_code.cpp:8:89 is useful for tools and links, but the user shouldn't have to search for every code mention. Include an excerpt of the code location, preferably with a bit of context.

Rust example:

 --> src/main.rs:4:48
  |
3 | fn main() {
4 |     let greeting_file = File::open("hello.txt")?;
  |                                                ^

Note the ^, used to highlight the specific location of the error. This is important for long lines, or ambiguous locations.

If possible, include the line where the surrounding function was declared, or at least its name.

Colors are nice, but make sure that the output is understandable without them.
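To show how little is needed for a plain-text excerpt, here is a small Python sketch that mimics the layout above. `render_excerpt` and its parameters are hypothetical names, and it assumes 1-based line and column numbers and no tab characters in the source line.

```python
def render_excerpt(source: str, path: str, line: int, col: int, context: int = 1) -> str:
    """Render the offending line, `context` lines before it, and a caret under column `col`."""
    lines = source.splitlines()
    first = max(1, line - context)
    width = len(str(line))          # gutter is as wide as the largest line number shown
    gutter = " " * width
    out = [f"{gutter}--> {path}:{line}:{col}", f"{gutter} |"]
    for number in range(first, line + 1):
        out.append(f"{number:>{width}} | {lines[number - 1]}")
    # The caret line uses the same gutter, so column `col` lines up with the code above it.
    out.append(f"{gutter} | {' ' * (col - 1)}^")
    return "\n".join(out)


source = 'use std::fs::File;\n\nfn main() {\n    let greeting_file = File::open("hello.txt")?;\n}\n'
print(render_excerpt(source, "src/main.rs", 4, 48))
```

Nothing here depends on color, so the output stays readable when it is piped to a file or shown in a terminal without ANSI support.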

A flag to report errors as structured data

The compiler output might be consumed by automated tools and tests, so it's important to give them an output format that is parsable and stable.

See Rust's --error-format={human,json,short} CLI option.
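A minimal sketch of the idea in Python: one diagnostic record, several renderings. The field names and the JSON layout are invented for illustration and are not rustc's actual JSON schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Diagnostic:
    code: str        # stable, searchable error code
    severity: str
    message: str
    file: str        # path relative to the project root
    line: int
    column: int


def render(diag: Diagnostic, fmt: str = "human") -> str:
    """Render one diagnostic as 'human', 'short', or 'json' (hypothetical formats)."""
    location = f"{diag.file}:{diag.line}:{diag.column}"
    if fmt == "json":
        # sort_keys keeps the key order stable across runs, which helps tools and tests.
        return json.dumps(asdict(diag), sort_keys=True)
    if fmt == "short":
        return f"{location}: {diag.severity}[{diag.code}]: {diag.message}"
    if fmt == "human":
        return f"{diag.severity}[{diag.code}]: {diag.message}\n  --> {location}"
    raise ValueError(f"unknown format: {fmt}")


diag = Diagnostic("E0308", "error", "mismatched types", "src/main.rs", 4, 48)
for fmt in ("human", "short", "json"):
    print(render(diag, fmt))
```

Keeping the record separate from its rendering means the human-readable text can be improved without breaking tools that consume the JSON.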

List as many errors as possible

It's tempting to stop parsing/compilation when the first error is found, but this leads to a frustrating dev experience. Include as many errors as possible in the report, taking care to avoid duplication.
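The "avoid duplication" part can be sketched with a toy checker in Python. It assumes the program has already been reduced to a list of `(line, name)` uses, and the rule of reporting each undefined name only at its first use is one possible policy, not the only one.

```python
def check_undefined_names(uses, defined):
    """Collect an error for every undefined name instead of stopping at the first one.

    `uses` is a list of (line, name) pairs. Each undefined name is reported once,
    at its first use, so one mistake does not flood the report.
    """
    errors = []
    reported = set()
    for line, name in uses:
        if name in defined or name in reported:
            continue
        reported.add(name)
        errors.append(f"line {line}: undefined name '{name}'")
    return errors


uses = [(1, "x"), (2, "y"), (3, "x"), (4, "z")]
for error in check_undefined_names(uses, defined={"z"}):
    print(error)
```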

Help typos by suggesting similar names

If the compilation error is that variable "user_pos" is not defined, and there's a variable userpos in that scope, that's worth mentioning in the error message. Levenshtein distance provides an easy-to-compute way of finding the similarity of two strings, be they variables, functions, or attributes.
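A minimal Python sketch of such a suggestion, using the classic dynamic-programming form of Levenshtein distance. The `suggest` helper and its cutoff of 2 edits are assumptions; real compilers tune the threshold, often relative to the name's length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        previous = current
    return previous[-1]


def suggest(unknown: str, candidates, max_distance: int = 2):
    """Return the closest in-scope name, or None if nothing is close enough."""
    best = min(candidates, key=lambda name: levenshtein(unknown, name), default=None)
    if best is not None and levenshtein(unknown, best) <= max_distance:
        return best
    return None


name = suggest("user_pos", ["userpos", "count", "index"])
if name is not None:
    print(f"error: 'user_pos' is not defined; did you mean '{name}'?")
```

The cutoff matters: without it, the message would suggest some name even when nothing in scope is remotely similar.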

$\endgroup$
7
$\begingroup$

It depends on the language

When running with a debugger, or the language is interpreted and in interactive mode, less information is necessary upfront. For example, there's no need to dump the stack to screen, or what the source file was, if this information is available upon request. In my opinion, it is better to keep the initial information clean and concise, with just enough information to tell me what went wrong and where. As an example, take APL, which (for over half a century!) has been printing a brief error message, the function name and line number, with the code line, and an indicator as to where on the line the error happened:

      foo
DOMAIN ERROR
foo[42] 1÷0
         ∧

Newer systems add a bit more help in the message:

DOMAIN ERROR: Divide by zero
foo[42] 1÷0
         ∧

Sometimes, it pinpoints which part of the data was bad:

DOMAIN ERROR: JSON export: item "bad[4]" of the right argument (⎕IO=1) cannot be converted
      1 ⎕JSON ns
        ∧

Or conveys OS errors:

FILE NAME ERROR: /tmp/badfile.txt: Unable to open file ("The system cannot find the file specified.")

Either way, the programmer can now navigate up and down the stack, and ask for more information about the error from a special "extended diagnostic message" object. This can contain the specific error codes, who is responsible for the erroring code, the source location in the implementation language, and even a URL where the user can get help.

$\endgroup$
4
$\begingroup$

Recognise that source code might trigger a compiler internal error

The message should make clear whether it is an internal error in the compiler or an error in the user's source code.

While most errors are expected to be in the user's source code, sometimes the input causes an internal compiler error which needs to be fed back to the compiler writer.

An internal error in a compiler could potentially be caused by invalid or valid source code.
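One way to keep the two kinds of failure apart is at the top level of the compiler driver. The following Python sketch is only an illustration: `UserError`, `run_compiler`, the exit statuses, and the wording of the note are all assumptions, not what any particular compiler does.

```python
import traceback


class UserError(Exception):
    """A problem in the user's source code (hypothetical exception type)."""


def run_compiler(compile_fn, source: str):
    """Run compile_fn and return (exit_status, message), separating user errors from compiler bugs."""
    try:
        compile_fn(source)
        return 0, ""
    except UserError as err:
        # Expected failure: the input is wrong, so report it as an ordinary error.
        return 1, f"error: {err}"
    except Exception as err:
        # Anything else is a bug in the compiler, whether or not the input was valid.
        return 101, (
            f"internal compiler error: {err!r}\n"
            "note: this is a bug in the compiler, not necessarily in your code; please report it\n"
            + traceback.format_exc()
        )


def broken_pass(source: str):
    raise KeyError("register")  # stands in for a failed internal invariant


status, message = run_compiler(broken_pass, "x = 1")
print(status)
print(message.splitlines()[0])
```

The distinct exit status lets build scripts and bug-report tooling tell a compiler crash apart from a normal compilation failure.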

The Texas Instruments compilers give a message to contact customer support if an internal error occurs. Some examples are:

  1. "Incorrect use of _symval causes TI MSP430 v15.12.2.2.LTS internal compiler error", where the internal error was caused by an error in the source code.
  2. "Compiler/TMS320F28377D", where, according to the compiler bug report "Using --vcu_support may lead to INTERNAL ERROR: Register allocation failed", the internal error was caused by valid source code.

Another example is AdaCore documenting, in "GNAT Abnormal Termination or Failure to Terminate", how a user can investigate the language construct which triggers an internal error in the compiler.

$\endgroup$
3
$\begingroup$

As someone who has to read a lot of errors, the most useful words are a severity indicator in one line.

  • DEBUG: ....details....
  • INFO: blah
  • WARNING: blah
  • ERROR: but continuing
  • ERROR: but continuing (but with significant component disabled)
  • FATAL ERROR: terminating (and why)

Don't confuse an error message with a backtrace. They're different: an error message should show succinctly what's wrong, while a backtrace is more "how we got to this point".

(aside) Yes, these severity levels map onto syslog's severity hierarchy. That's a well-defined and well-understood scale which can be applied to any importance ranking.

(aside2) Why should all the important info be on one line? Log analyser software has an easier job if it doesn't have to keep state across multiple lines of output.
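As a sketch of the severity scale and the one-line rule in Python: the numeric values, the `one_line` helper, and the whitespace-flattening policy are assumptions made for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Numeric values are arbitrary; only their order matters.
    DEBUG = 10
    INFO = 20
    WARNING = 30
    ERROR = 40
    FATAL = 50


def one_line(severity: Severity, message: str) -> str:
    """Prefix the severity and squeeze the message onto a single line."""
    flat = " ".join(message.split())   # collapses newlines and repeated spaces
    label = "FATAL ERROR" if severity is Severity.FATAL else severity.name
    return f"{label}: {flat}"


def at_least(records, threshold: Severity):
    """Keep only records at or above the threshold, formatted one per line."""
    return [one_line(sev, msg) for sev, msg in records if sev >= threshold]


records = [
    (Severity.INFO, "loaded 3 modules"),
    (Severity.ERROR, "plugin failed to load\nbut continuing"),
    (Severity.FATAL, "out of memory, terminating"),
]
print("\n".join(at_least(records, Severity.WARNING)))
```

Because each record is one line beginning with its severity, a tool can filter the output without tracking state between lines.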

$\endgroup$
4
  • 2
    $\begingroup$ How often are compiler errors going into a syslog rather than being read immediately on the console output, though? $\endgroup$
    – kaya3
    Commented Jul 1, 2023 at 1:16
  • $\begingroup$ @kaya3-supportthestrike the syslog reference was just about the well-known relative scale of "debug-info-warn-error"... etc I didn't mean that running syslog itself was part of the answer. $\endgroup$
    – Criggie
    Commented Jul 1, 2023 at 1:26
  • $\begingroup$ @kaya3-supportthestrike I don't think this question is specific to compilers, and for interpreted languages, you could easily have errors popping up in something running in the background logging to something other than a console $\endgroup$ Commented Jul 1, 2023 at 3:17
  • $\begingroup$ @RydwolfPrograms The question says: "Note that this question is about the user interface of the compiler." That said, you have a point, but I still think this answer needs to be clarified, since the part about log analyser software is probably not relevant to compilers. $\endgroup$
    – kaya3
    Commented Jul 1, 2023 at 13:42
