251

A lot of people claim that "comments should explain 'why', but not 'how'". Others say that "code should be self-documenting" and comments should be scarce. Robert C. Martin claims that (rephrased to my own words) often "comments are apologies for badly written code".

My question is the following:

What's wrong with explaining a complex algorithm or a long and convoluted piece of code with a descriptive comment?

This way, instead of other developers (including yourself) having to read the entire algorithm line by line to figure out what it does, they can just read the friendly descriptive comment you wrote in plain English.

English is 'designed' to be easily understood by humans. Java, Ruby or Perl, however, have been designed to balance human-readability and computer-readability, thus compromising the human-readability of the text. A human can understand a piece of English much faster than he/she can understand a piece of code with the same meaning (as long as the operation isn't trivial).

So after writing a complex piece of code written in a partly human-readable programming language, why not add a descriptive and concise comment explaining the operation of the code in friendly and understandable English?

Some will say "code shouldn't be hard to understand", "make functions small", "use descriptive names", "don't write spaghetti code".

But we all know that's not enough. These are mere guidelines - important and useful ones - but they do not change the fact that some algorithms are complex. And therefore are hard to understand when reading them line by line.

Is it really that bad to explain a complex algorithm with a few lines of comments about its general operation? What's wrong with explaining complicated code with a comment?

22
  • 15
    If it's that convoluted, try refactoring it to smaller pieces. Commented Sep 1, 2014 at 0:37
  • 153
    In theory, there is no difference between theory and practice. In practice, there is. Commented Sep 1, 2014 at 2:13
  • 5
    @mattnz: more directly, at the time you write the comment you are steeped in the problem this code solves. Next time you visit, you will have less capability with this problem. Commented Sep 1, 2014 at 9:13
  • 27
    "What" the function or method does should be obvious from its name. How it does it is obvious from its code. Why it is done this way, what implicit assumptions were used, which papers one needs to read in order to understand the algorithm, etc. - should be in comments.
    – SK-logic
    Commented Sep 1, 2014 at 9:36
  • 11
    I feel many of the responses below are purposefully misinterpreting your question. There's nothing wrong with commenting your code. If you feel you need to write an explanatory comment, then you need to.
    – Tony Ennis
    Commented Sep 1, 2014 at 13:48

16 Answers

421

In layman's terms:

  • There's nothing wrong with comments per se. What's wrong is writing code that needs that kind of comment, or assuming that it's OK to write convoluted code as long as you explain it in friendly, plain English.
  • Comments don't update themselves automatically when you change the code. That's why comments are often out of sync with the code.
  • Comments don't make code easier to test.
  • Apologizing is not bad. What you did that requires an apology (writing code that isn't easily understandable) is bad.
  • A programmer who is capable of writing simple code to solve a complex problem is better than one who writes complex code and then writes a long comment explaining what the code does.

Bottom line:

Explaining yourself is good, not needing to do so is better.
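A minimal Python sketch of what "not needing to" can look like. The Employee class, the HOURLY_FLAG bit and both function names are hypothetical, invented only for this illustration: the first function needs a comment to say what the condition means, while the second moves that meaning into a name.

```python
from dataclasses import dataclass

HOURLY_FLAG = 0x2  # hypothetical bit flag, made up for this sketch


@dataclass
class Employee:
    flags: int
    age: int

    def is_eligible_for_full_benefits(self) -> bool:
        """The name carries the explanation the comment below has to supply."""
        return bool(self.flags & HOURLY_FLAG) and self.age > 65


def check_with_comment(employee: Employee) -> bool:
    # Check whether the employee is eligible for full benefits.
    return bool(employee.flags & 0x2) and employee.age > 65


def check_without_comment(employee: Employee) -> bool:
    return employee.is_eligible_for_full_benefits()
```

Both functions compute the same result; only the second stays readable if the comment is deleted or goes stale.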

28
  • 96
    It's frequently impossible to justify spending employer's money rewriting code to be more self-explanatory, when a good comment can do the job in much less time. A dutiful programmer must use her/his judgment each time.
    – aecolley
    Commented Sep 1, 2014 at 1:27
  • 35
    @aecolley Writing self-explanatory code to begin with is better yet. Commented Sep 1, 2014 at 1:30
  • 133
    Sometimes self-explanatory code isn't efficient enough to solve a problem with today's HW&SW. And business logic is notoriously ... twisty. The subset of problems that have elegant software solutions is significantly smaller than the set of problems that are economically useful to solve. Commented Sep 1, 2014 at 2:16
  • 66
    @rwong: conversely I often find myself writing more comments in business logic, because it's important to show exactly how the code lines up with the stated requirements: "this is the line that prevents us all going to jail for wire fraud under Section Whatever of the penal code". If it's just an algorithm, well, a programmer can figure the purpose out from scratch if absolutely necessary. For business logic you need a lawyer and the client in the same room at the same time. Possibly my "common sense" is in a different domain from the average app programmer's ;-) Commented Sep 1, 2014 at 9:17
  • 29
    @user61852 Except that what's self-explanatory to the you that just wrote that code and spent the last $period immersed in it might not be self-explanatory to the you that has to maintain or edit it five years from now, let alone all the possible people that aren't you that may have to look at it. "Self-explanatory" is a nebulous holy grail of definitions. Commented Sep 1, 2014 at 9:38
112

There's a bunch of different reasons for code to be complicated or confusing. The most common reasons are best addressed by refactoring the code to make it less confusing, not by adding comments of any kind.

However, there are cases where a well-chosen comment is the best choice.

  • If it is the algorithm itself that is complicated and confusing, not just its implementation (the kind that gets written up in math journals and is ever after referred to as Mbogo's Algorithm), then you put a comment at the very beginning of the implementation, reading something like "This is Mbogo's Algorithm for refrobnicating widgets, originally described here: [URL of paper]. This implementation contains refinements by Alice and Carol [URL of another paper]." Don't try to go into any more detail than that; if someone needs more detail they probably need to read the entire paper.

  • If you have taken something that can be written as one or two lines in some specialized notation and expanded it out into a big glob of imperative code, putting those one or two lines of specialized notation in a comment above the function is a good way to tell the reader what it's supposed to do. This is an exception to the "but what if the comment gets out of sync with the code" argument, because the specialized notation is probably much easier to find bugs in than the code. (It's the other way around if you wrote a specification in English instead.) A good example is here: https://dxr.mozilla.org/mozilla-central/source/layout/style/nsCSSScanner.cpp#1057 ...

    /**
     * Scan a unicode-range token.  These match the regular expression
     *
     *     u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?
     *
     * However, some such tokens are "invalid".  There are three valid forms:
     *
     *     u+[0-9a-f]{x}              1 <= x <= 6
     *     u+[0-9a-f]{x}\?{y}         1 <= x+y <= 6
     *     u+[0-9a-f]{x}-[0-9a-f]{y}  1 <= x <= 6, 1 <= y <= 6
    
  • If the code is straightforward overall, but contains one or two things that look excessively convoluted, unnecessary, or just plain wrong, but have to be that way because of reasons, then you put a comment immediately above the suspicious-looking bit, in which you state the reasons. Here's a simple example, where the only thing that needs explaining is why a constant has a certain value.

    /* s1*s2 <= SIZE_MAX if s1 < K and s2 < K, where K = sqrt(SIZE_MAX+1) */
    const size_t MUL_NO_OVERFLOW = ((size_t)1) << (sizeof(size_t) * 4);
    if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
        nmemb > 0 && SIZE_MAX / nmemb < size)
      abort();
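
The arithmetic behind that one-line comment can be checked with a small Python sketch. It assumes a 64-bit size_t made of 8-bit bytes; SIZE_BITS and product_overflows are names made up for this illustration, and the function mirrors the C condition above rather than reproducing the original code.

```python
SIZE_BITS = 64                            # assumption: 64-bit size_t
SIZE_MAX = 2**SIZE_BITS - 1
MUL_NO_OVERFLOW = 1 << (SIZE_BITS // 2)   # K = sqrt(SIZE_MAX + 1)


def product_overflows(nmemb: int, size: int) -> bool:
    """Mirror of the C test: True when nmemb * size would exceed SIZE_MAX."""
    # If both factors are below K, the product is at most (K - 1)**2,
    # which is less than K**2 == SIZE_MAX + 1, so the division is skipped.
    return (
        (nmemb >= MUL_NO_OVERFLOW or size >= MUL_NO_OVERFLOW)
        and nmemb > 0
        and SIZE_MAX // nmemb < size
    )
```

Because Python integers do not wrap, the result can be compared directly against `nmemb * size > SIZE_MAX`, which is exactly the claim the comment makes about the constant.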
    
19
  • 28
    That's an outrage, 4 should be CHAR_BIT / 2 ;-) Commented Sep 1, 2014 at 9:06
  • @SteveJessop: Would anything preclude an implementation where CHAR_BIT was 16 and sizeof(size_t) was 2, but the maximum value of size_t was e.g. 2^20 [size_t containing 12 padding bits]?
    – supercat
    Commented Sep 2, 2014 at 13:29
  • 3
    @supercat I don't see anything that obviously precludes that in C99, which means that example is technically incorrect. It happens to be taken from (a slightly modified version of) OpenBSD's reallocarray, and OpenBSD generally does not believe in catering to possibilities that don't happen in their ABI.
    – zwol
    Commented Sep 2, 2014 at 13:37
  • 3
    @Zack: If the code is designed around POSIX assumptions, using CHAR_BIT might give the impression that the code could work with values other than 8.
    – supercat
    Commented Sep 2, 2014 at 13:45
  • 2
    @Zack: For exact-width unsigned types to be useful, their semantics would need to be defined independent of the size of int. As it is, given uint32_t x,y,z;, the meaning of (x-y) > z depends upon the size of int. Further, a language designed for writing robust code should allow programmers to distinguish between a type where computations are expected to exceed the range of the type and should silently wrap, versus one where computations exceeding the range of the type should trap, versus one where computations aren't expected to exceed the range of the type, but...
    – supercat
    Commented Sep 3, 2014 at 16:04
62

So what's wrong with explaining complicated code with a comment?

It's not a question of right or wrong, but of the 'best practice', as defined in the Wikipedia article:

A best practice is a method or technique that has consistently shown results superior to those achieved with other means, and that is used as a benchmark.

So the best practice is to try to improve the code first, and to use English if that is not possible.

It's not a law, but it's much more common to find commented code that requires refactoring than refactored code that requires comments; the best practice reflects this.

3
  • 44
    +1 for "it's much more common to find commented code that requires refactoring than refactored code that requires comments"
    – Brandon
    Commented Sep 1, 2014 at 1:42
  • 8
    Okay, but how often is that comment: //This code seriously needs a refactor Commented Sep 5, 2014 at 11:21
  • 2
    Of course, any so-called best practice not backed up by a rigorous scientific study is merely an opinion.
    – Blrfl
    Commented Feb 7, 2015 at 12:57
59

A day will come when your beautiful, perfectly crafted, well structured and readable code won't work. Or it won't work well enough. Or a special case will arise where it doesn't work and needs adjusting.

At that point, you will need to do something that changes things so it works correctly. Particularly in the case where there are performance problems, but also often in scenarios where one of the libraries, APIs, web services, gems or operating systems you are working with doesn't behave as expected, you can end up making changes that are not necessarily inelegant, but are counter-intuitive or non-obvious.

If you don't have some comments to explain why you have chosen that approach, there is a very good chance that someone in the future (and that someone may even be you) will look at the code, see how it could be "fixed" to something more readable and elegant, and inadvertently undo your fix, because it doesn't look like a fix.

If everyone always wrote perfect code then it would be obvious that code that looks imperfect is working around some tricky intervention from the real world, but that isn't how things work. Most programmers often write confusing or somewhat tangled code so when we encounter this it is a natural inclination to tidy it up. I swear my past self is an actual idiot whenever I read old code I have written.

So I don't think of comments as an apology for bad code, but maybe as an explanation for why you didn't do the obvious thing. Having // The standard approach doesn't work against the 64 bit version of the Frobosticate Library will allow future developers, including your future self, to pay attention to that part of the code and test against that library. Sure, you might put the comments in your source control commits too, but people will only look at those after something has gone wrong. They will read code comments as they change the code.
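
A minimal Python sketch of such a "why I didn't do the obvious thing" comment. The round_to_cents function and the half-up rounding requirement are hypothetical, chosen only to show a fix that a tidy-minded maintainer might otherwise undo.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_cents(amount: str) -> Decimal:
    """Round a decimal amount, given as text, to two places, halves rounding up."""
    # Deliberately NOT round(float(amount), 2): a binary float cannot store
    # most decimal fractions exactly, so round(2.675, 2) gives 2.67.
    # The (hypothetical) requirement here is 2.68, so do not "simplify" this.
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Without the comment, the Decimal detour looks like needless ceremony; with it, the next reader knows which case to test before changing anything.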

People who tell us that we should always be writing theoretically perfect code are not always people with a lot of experience of programming in real-world environments. Sometimes you need to write code that performs to a certain level, sometimes you need to interoperate with imperfect systems. That doesn't mean that you can't do this in elegant and well written ways, but non-obvious solutions need explanation.

When I am writing code for hobby projects that I know nobody else will ever read I still comment parts that I find confusing - for example, any 3D geometry involves maths which I'm not entirely at home with - because I know when I come back in six months I will have totally forgotten how to do this stuff. That's not an apology for bad code, that's an acknowledgement of a personal limitation. All I would do by leaving it uncommented is create more work for myself in future. I don't want my future self to have to relearn something unnecessarily if I can avoid it now. What possible value would that have?

7
  • 5
    @Christian is it? The first line references that statement, certainly, but beyond that it is a little broader as I understand it.
    – glenatron
    Commented Sep 1, 2014 at 13:11
  • 9
    "I swear my past self is an actual idiot whenever I read old code I have written." Four years into my development career and I find this is an occurrence that happens whenever I look at anything older than 6 months or so.
    – Ken
    Commented Sep 4, 2014 at 18:18
  • 6
    In many cases, the most informative and useful historical information relates to things which are considered but decided against. There are many cases where someone chooses approach X for something and some other approach Y would seem better; in some of those cases, Y will "almost" work better than X, but turn out to have some insurmountable problems. If Y was avoided because of those problems, such knowledge can help prevent others from wasting their time on unsuccessful attempts to implement approach Y.
    – supercat
    Commented Sep 4, 2014 at 19:39
  • 4
    On a day-to-day basis I use work-in-progress comments a lot too; they aren't there for the long run, but dropping in a TODO note or a short section to remind me what I was going to do next can be a useful reminder in the morning.
    – glenatron
    Commented Sep 5, 2014 at 9:08
  • 1
    @Lilienthal, I don't think that last para is restricted to personal projects—he said "...I still comment parts that I find confusing."
    – Wildcard
    Commented Nov 30, 2015 at 18:38
31

The need for comments is inversely proportional to the abstraction level of the code.

For example, Assembly Language is, for most practical purposes, unintelligible without comments. Here's an excerpt from a small program that calculates and prints terms of the Fibonacci series:

main:   
; initializes the two numbers and the counter.  Note that this assumes
; that the counter and num1 and num2 areas are contiguous!
;
    mov ax,'00'                     ; initialize to all ASCII zeroes
    mov di,counter                  ; including the counter
    mov cx,digits+cntDigits/2       ; two bytes at a time
    cld                             ; initialize from low to high memory
    rep stosw                       ; write the data
    inc ax                          ; make sure ASCII zero is in al
    mov [num1 + digits - 1],al      ; last digit is one
    mov [num2 + digits - 1],al      ; 
    mov [counter + cntDigits - 1],al

    jmp .bottom         ; done with initialization, so begin

.top
    ; add num1 to num2
    mov di,num1+digits-1
    mov si,num2+digits-1
    mov cx,digits       ; 
    call    AddNumbers  ; num2 += num1
    mov bp,num2         ;
    call    PrintLine   ;
    dec dword [term]    ; decrement loop counter
    jz  .done           ;

    ; add num2 to num1
    mov di,num2+digits-1
    mov si,num1+digits-1
    mov cx,digits       ;
    call    AddNumbers  ; num1 += num2
.bottom
    mov bp,num1         ;
    call    PrintLine   ;
    dec dword [term]    ; decrement loop counter
    jnz .top            ;
.done
    call    CRLF        ; finish off with CRLF
    mov ax,4c00h        ; terminate
    int 21h             ;

Even with comments, it can be quite complicated to grok.

Modern Example: Regexes are often very low abstraction constructs (lower case letters, number 0, 1, 2, new lines, etc). They probably need comments in the form of samples (Bob Martin, IIRC, does acknowledge this). Here is a regex that (I think) should match HTTP(S) and FTP URLs:

^(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*$
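
One way to attach comments and samples to a pattern like this is a verbose regex. The Python sketch below is a deliberately simplified pattern, not a translation of the one above: the name URL_RE, the shortened domain list and the sample URLs are all assumptions made for illustration.

```python
import re

# Samples that should match:      http://www.example.com
#                                 ftp://ftp.example.org:21/pub/file.txt
#                                 example.com
# Samples that should not match:  mailto:bob@example.com
#                                 https://example.xyz
URL_RE = re.compile(
    r"""
    ^
    (?:(?:https?|ftp)://)?          # optional scheme: http, https or ftp
    [a-z0-9-]+(?:\.[a-z0-9-]+)*     # host labels separated by dots
    \.(?:com|edu|gov|mil|net|org)   # a short, hard-coded list of domains
    (?::[0-9]+)?                    # optional port
    (?:/\S*)?                       # optional path with no whitespace
    $
    """,
    re.VERBOSE | re.IGNORECASE,
)
```

The per-line comments say what each piece is for, and the samples at the top give a reader something concrete to check the pattern against.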

As the languages progress up the abstraction hierarchy, the programmer is able to use evocative abstractions (variable names, function names, class names, module names, interfaces, callbacks, etc) to provide built-in documentation. Neglecting to take advantage of this, and using comments to paper over the gap, is lazy, and both a disservice to and disrespectful of the maintainer.

I am thinking of Numerical Recipes in C translated mostly verbatim to Numerical Recipes in C++, which I infer began as Numerical Recipes (in FORTRAN), with all the variables a, aa, b, c, cc, etc maintained through each version. The algorithms may have been correct, but they did not take advantage of the abstractions the languages provided. And they p*** me off. Sample from a Dr. Dobbs article - Fast Fourier Transform:

void four1(double* data, unsigned long nn)
{
    unsigned long n, mmax, m, j, istep, i;
    double wtemp, wr, wpr, wpi, wi, theta;
    double tempr, tempi;

    // reverse-binary reindexing
    n = nn<<1;
    j=1;
    for (i=1; i<n; i+=2) {
        if (j>i) {
            swap(data[j-1], data[i-1]);
            swap(data[j], data[i]);
        }
        m = nn;
        while (m>=2 && j>m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }

    // here begins the Danielson-Lanczos section
    mmax=2;
    while (n>mmax) {
        istep = mmax<<1;
        theta = -(2*M_PI/mmax);
        wtemp = sin(0.5*theta);
        wpr = -2.0*wtemp*wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for (m=1; m < mmax; m += 2) {
            for (i=m; i <= n; i += istep) {
                j=i+mmax;
                tempr = wr*data[j-1] - wi*data[j];
                tempi = wr * data[j] + wi*data[j-1];

                data[j-1] = data[i-1] - tempr;
                data[j] = data[i] - tempi;
                data[i-1] += tempr;
                data[i] += tempi;
            }
            wtemp=wr;
            wr += wr*wpr - wi*wpi;
            wi += wi*wpr + wtemp*wpi;
        }
        mmax=istep;
    }
}
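
For contrast, here is a rough Python sketch of only the "reverse-binary reindexing" idea, written with names that say what they hold. It is not a port of four1: that function works in place on interleaved real/imaginary pairs with 1-based indexing, whereas this version returns a reordered copy of a plain list, and both function names are made up for illustration.

```python
def reverse_bits(index: int, bit_count: int) -> int:
    """Return index with its lowest bit_count bits in reverse order."""
    reversed_index = 0
    for _ in range(bit_count):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def bit_reversal_permutation(samples: list) -> list:
    """Return a copy of samples reordered by bit-reversed index."""
    sample_count = len(samples)
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    if sample_count == 0 or sample_count & (sample_count - 1):
        raise ValueError("length must be a power of two")
    bit_count = sample_count.bit_length() - 1
    return [samples[reverse_bits(i, bit_count)] for i in range(sample_count)]
```

With eight samples, index 1 (binary 001) swaps with index 4 (binary 100), and index 3 (011) with index 6 (110); the names make that traceable without a separate comment.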

As a special case about abstraction, every language has idioms / canonical code snippets for certain common tasks (deleting a dynamic linked list in C), and regardless of how they look, they shouldn't be documented. Programmers should learn these idioms, as they are unofficially part of the language.

So the takeaway: Non-idiomatic code built from low-level building blocks that can't be avoided needs comments. And this is necessary WAAAAY less than it happens.

6
  • 2
    Nobody should really be writing a line like this in assembly language: dec dword [term] ; decrement loop counter. On the other hand, what your assembly language example is missing is a comment before each "code paragraph" explaining what the next block of code does. In that case, the comment would typically be equivalent to a single line in pseudocode, such as ;clear the screen, followed by the 7 lines it actually takes to clear the screen. Commented Sep 2, 2014 at 16:57
  • 2
    Yes, there are what I would consider some unnecessary comments in the assembly sample, but to be fair, it is pretty representative of 'Good' Assembly style. Even with a one or two line paragraph prologue, the code would be really hard to follow. I understood the ASM sample better than the FFT example. I programmed an FFT in C++ in grad school, and it didn't look anything like this, but then we were using the STL, iterators, functors and quite a few method calls. Not as fast as the monolithic function, but a lot easier to read. I will try to add it to contrast to the NRinC++ sample.
    – Kristian H
    Commented Sep 2, 2014 at 17:09
  • Did you mean ^(((ht|f)tps?)\:\/\/)?(www\.)*[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(\/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*$ ? Be aware of numeric addresses.
    – izabera
    Commented Sep 4, 2014 at 13:51
  • More or less my point: some things built from very low level abstractions aren't easy to read or verify. Comments (and, not to get too far off track, TESTS) can be useful, and not a detriment. At the same time, not using higher level abstractions that are available (:alpha: :num: where available) makes it harder to understand, even with good comments, than using the higher level abstractions.
    – Kristian H
    Commented Sep 4, 2014 at 15:18
  • 3
    +1 : "The need for comments is inversely proportional to the abstraction level of the code." Pretty much sums up everything right there.
    – Gerrat
    Commented Sep 4, 2014 at 21:16
22

I don't believe there's anything wrong with comments in code. The idea that comments are somehow bad in my opinion is due to some programmers taking things too far. There's a lot of bandwagoning in this industry, particularly towards extreme views. Somewhere along the way commented code became equivalent to bad code and I'm not sure why.

Comments do have problems - you need to keep them updated as you update the code they refer to, which happens far too infrequently. A wiki or something is a more appropriate resource for thorough documentation about your code. Your code should be readable without requiring comments. Version control or revision notes should be where you describe code changes you made.

None of the above invalidates the use of comments, however. We don't live in an ideal world, so when any of the above fail for whatever reason, I'd rather have some comments to fall back on.

19

I think you're reading a little bit too much into what he's saying. There are two distinct parts to your complaint:

What's wrong with explaining (1) a complex algorithm or (2) a long and convoluted piece of code with a descriptive comment?

(1) is inevitable. I don't think that Martin would disagree with you. If you're writing something like the fast inverse square root, you're going to need some comments, even if it's just "evil floating point bit level hacking." Barring something simple like a DFS or binary search, it's unlikely that the person reading your code will have experience with that algorithm, and so I think there should be at least a mention in the comments about what it is.

Most code isn't (1), however. Rarely will you write a piece of software that's nothing but hand-rolled mutex implementations, obscure linear algebra operations with poor library support, and novel algorithms known only to your company's research group. Most code consists of library/framework/API calls, IO, boilerplate, and unit tests.

This is the kind of code that Martin is talking about. And he addresses your question with the quote from Kernighan and Plauger at the top of the chapter:

Don’t comment bad code—rewrite it.

If you have long, convoluted sections in your code, you have failed to keep your code clean. The best solution to this problem isn't to write a paragraph-long comment at the top of the file to help future developers muddle through it; the best solution is to rewrite it.

And this is exactly what Martin says:

The proper use of comments is to compensate for our failure to express ourself in code...Comments are always failures. We must have them because we cannot always figure out how to express ourselves without them, but their use is not a cause for celebration.

This is your (2). Martin agrees that long, convoluted code does need comments -- but he puts the blame for that code on the shoulders of the programmer who wrote it, not some nebulous idea that "we all know that's not enough." He argues that:

Clear and expressive code with few comments is far superior to cluttered and complex code with lots of comments. Rather than spend your time writing the comments that explain the mess you’ve made, spend it cleaning that mess.

8
  • 3
    If a developer I was working with simply wrote "evil floating point bit level hacking" to explain the fast square-root algorithm - they'd get a talking to by me. So long as they included a reference to somewhere more useful I'd be happy though. Commented Sep 1, 2014 at 3:56
  • 9
    I disagree in one way - a comment explaining how something bad works is a lot quicker. Given some code that is likely not to be touched again (most code I guess) then a comment is a better business solution than a big refactoring, that often introduces bugs (as a fix that kills relied-upon bug is still a bug). A perfect world of perfectly understandable code is not available to us.
    – gbjbaanb
    Commented Sep 1, 2014 at 7:42
  • 2
    @trysis haha, yes but in a world where the programmers are responsible and not businesspeople, they'll never ship as they're forever gold-plating a constantly refactored codebase in a vain quest for perfection.
    – gbjbaanb
    Commented Sep 1, 2014 at 14:26
  • 4
    @PatrickCollins nearly everything I read on the web is about doing it right first time. Almost nobody wants to write articles on fixing up messes! Physicists say "given a perfect sphere..." Comp.Scientists say "given a greenfield development..."
    – gbjbaanb
    Commented Sep 2, 2014 at 7:26
  • 2
    The best solution is to rewrite it given infinite time; but given someone else's code base, typical corporate deadlines, and reality; sometimes the best thing to do is comment, add a TODO: Refactor and get that refactor into the next release; and that fix that needed to be done yesterday done now. The thing about all of this idealistic talk about just refactoring is it doesn't account for how things really work in the work place; sometimes there are higher priorities and soon enough deadlines that will preempt fixing legacy poor-quality code. That's just how it is.
    – hsanders
    Commented Sep 4, 2014 at 18:17
8

What's wrong with explaining a complex algorithm or a long and convoluted piece of code with a descriptive comment?

Nothing as such. Documenting your work is good practice.

That said, you have a false dichotomy here: writing clean code vs. writing documented code - the two are not in opposition.

What you should focus on is simplifying and abstracting complex code into simpler code, instead of thinking "complex code is fine as long as it is commented".

Ideally, your code should be simple and documented.

This way, instead of other developers (including yourself) having to read the entire algorithm line by line to figure out what it does, they can just read the friendly descriptive comment you wrote in plain English.

True. This is why all your public API algorithms should be explained in the documentation.

So after writing a complex piece of code written in a partly human-readable programming language, why not add a descriptive and concise comment explaining the operation of the code in friendly and understandable English?

Ideally, after writing a complex piece of code you should (not an exhaustive list):

  • consider it a draft (i.e. plan to re-write it)
  • formalize the algorithm entry points/interfaces/roles/etc (analyze and optimize interface, formalize abstractions, document preconditions, postconditions and side effects and document error cases).
  • write tests
  • clean up and refactor

None of these steps are trivial to do (i.e. each can take a few hours) and the rewards for doing them are not immediate. As such, these steps are (almost) always compromised on (by developers cutting corners, managers cutting corners, deadlines, market constraints/other real world conditions, lack of experience etc).
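
As a rough Python sketch of the "document preconditions, postconditions and side effects and document error cases" step, here is a small lookup function whose docstring states its contract. The function name index_of and the choice of a sorted-sequence search are assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Sequence


def index_of(sorted_items: Sequence[int], target: int) -> int:
    """Return the index of the first occurrence of target.

    Precondition:  sorted_items is in ascending order (not checked).
    Postcondition: sorted_items[result] == target, and no element
                   before result equals target.
    Side effects:  none; the input is not modified.
    Errors:        raises ValueError if target is not present.
    """
    position = bisect_left(sorted_items, target)
    if position == len(sorted_items) or sorted_items[position] != target:
        raise ValueError(f"{target!r} is not in the sequence")
    return position
```

A caller can rely on the docstring alone, for example `index_of([1, 3, 3, 7], 3)` for the first 3, without reading how the search is done.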

[...] some algorithms are complex. And therefore are hard to understand when reading them line by line.

You should never have to rely on reading the implementation to figure out what an API does. When you do that, you are implementing client code based on the implementation (instead of the interface) and that means your module coupling is already shot to hell, you are potentially introducing undocumented dependencies with every new line of code that you write, and are already adding technical debt.

Is it really that bad to explain a complex algorithm with a few lines of comments about its general operation?

No - that is good. Adding a few lines of comments is not enough though.

What's wrong with explaining complicated code with a comment?

The fact that you shouldn't have complicated code, if that can be avoided.

To avoid complicated code, formalize your interfaces, spend about 8 times more time on API design than you spend on the implementation (Stepanov suggested spending at least 10x on the interface, compared with the implementation), and go into developing a project with the knowledge that you are creating a project, not just writing some algorithm.

A project involves API documentation, functional documentation, code/quality measurements, project management and so on. None of these processes are one-off, fast steps to make (they all take time, require forethought and planning, and they all require that you come back to them periodically and revise/complete them with details).

1
  • 3
    "You should never have to rely on reading the implementation to figure out what an API does." Sometimes this is inflicted on you by an upstream that you're committed to using. I had a particularly unsatisfying project littered with comments of the form "the following ugly Heath Robinson code exists because simpleAPI() does not work properly on this hardware despite what the vendor claims".
    – pjc50
    Commented Sep 1, 2014 at 14:56
6

instead of other developers (including yourself) having to read the entire algorithm line by line to figure out what it does, they can just read the friendly descriptive comment you wrote in plain English.

I would consider this a slight abuse of "comments". If the programmer wants to read something instead of the entire algorithm, then that's what function documentation is for. OK, so the function documentation might actually appear in comments in the source (perhaps for extraction by doc tools), but although syntactically it's a comment as far as your compiler is concerned, you should consider them separate things with separate purposes. I don't think "comments should be scarce" is necessarily intended to mean "documentation should be scarce" or even "copyright notices should be scarce"!

Comments in the function are for someone to read as well as the code. So if you have a few lines in your code that are difficult to understand, and you can't make them easy to understand, then a comment is useful for the reader to use as a placeholder for those lines. This could be very useful while the reader is just trying to get the general gist, but there are a couple of problems:

  • Comments aren't necessarily true, whereas the code does what it does. So the reader is taking your word for it, and this is not ideal.
  • The reader doesn't understand the code itself yet, so until they come back to it later they still aren't qualified to modify or re-use it. In which case what are they doing reading it?

There are exceptions, but most readers will need to understand the code itself. Comments should be written to assist that, not to replace it, which is why you're generally advised that comments should say "why you're doing it". A reader who knows the motivation for the next few lines of code has a better chance of seeing what they do and how.
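
A small sketch of the difference, in Java. The class, the constants and the retry scenario are all made up for illustration; only the contrast between the two kinds of comment matters.

```java
// Hypothetical example: the names and the back-off rule are illustrative assumptions.
class RetryPolicy {
    static final long BASE_DELAY_MS = 100;
    static final long MAX_DELAY_MS = 5_000;

    // A "what" comment would merely restate the code:
    //   "shift BASE_DELAY_MS left by attempt and take the minimum with MAX_DELAY_MS"
    //
    // A "why" comment gives the motivation the code cannot express:
    // back off exponentially so that a struggling server is not hammered by
    // retries, but cap the delay so that a caller never waits longer than
    // MAX_DELAY_MS between attempts.
    static long delayBeforeAttempt(int attempt) {
        // Clamp the shift so that large attempt numbers cannot overflow the long.
        int shift = Math.min(Math.max(attempt, 0), 20);
        return Math.min(BASE_DELAY_MS << shift, MAX_DELAY_MS);
    }
}
```

A reader who knows the motivation (protect the server, bound the wait) can check the two lines of arithmetic against it, which is the point of the "why" comment.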

6
  • 6
    One useful place for comments: in scientific code, you can often have computations that are quite complex, involving lots of variables. For the sanity of the programmer, it makes sense to keep variable names really short, so you can look at the maths, rather than the names. But that makes it really hard to understand for the reader. So a short description of what is going on (or better, a reference to the equation in a journal article or similar), can be really helpful.
    – naught101
    Commented Sep 1, 2014 at 8:50
  • 1
    @naught101: yes, especially since the paper you're referring to also probably used single-letter variable names. It's usually easier to see that the code does indeed follow the paper if you use the same names, but that's in conflict with the goal of the code being self-explanatory (it's explained by the paper instead). In this case, a comment where each name is defined, saying what it actually means, substitutes for meaningful names. Commented Sep 1, 2014 at 8:56
  • 1
    When I am searching for something specific in code (where is this specific case handled?), I don't want to read and understand paragraphs of code just to discover that it is not the place after all. I need comments that summarize in a single line what the next paragraph is doing. This way, I will quickly locate the parts of code related to my problem and skip over uninteresting details.
    – Florian F
    Commented Sep 1, 2014 at 11:57
  • 1
    @FlorianF: the traditional response is that variable and function names should indicate roughly what the code is about, and hence let you skim over things that certainly aren't about what you're looking for. I agree with you that this doesn't always succeed, but I don't agree so strongly that I think all code needs to be commented to aid searching or skim-reading. But you're right, that's a case where someone is reading your code (sort of) and legitimately doesn't need to understand it. Commented Sep 1, 2014 at 12:01
  • 2
    @Snowman People could do that with variable names. I have seen code where the variable listOfApples contained a list of Bananas. Someone copied the code processing the list of Apples and adapted it for Bananas without bothering to change the variable names.
    – Florian F
    Commented Sep 1, 2014 at 19:42
6

I forget where I read it but there is a sharp and clear line between what should appear in your code and what should appear as a comment.

I believe you should comment your intent, not your algorithm. That is, comment on what you meant to do, not on what you do.

For example:

// The getter.
public <V> V get(final K key, Class<V> type) {
  // Has it run yet?
  Future<Object> f = multitons.get(key);
  if (f == null) {
    // No! Make the task that runs it.
    FutureTask<Object> ft = new FutureTask<Object>(
            new Callable<Object>() {

              public Object call() throws Exception {
                // Only do the create when called to do so.
                return key.create();
              }

            });
    // Only put if not there.
    f = multitons.putIfAbsent(key, ft);
    if (f == null) {
      // We replaced null so we successfully put. We were first!
      f = ft;
      // Initiate the task.
      ft.run();
    }
  }
  try {
    /**
     * If code gets here and hangs due to f.status = 0 (FutureTask.NEW)
     * then you are trying to get from your Multiton in your creator.
     *
     * Cannot check for that without unnecessarily complex code.
     *
     * Perhaps could use get with timeout.
     */
    // Cast here to force the right type.
    return (V) f.get();
  } catch (Exception ex) {
    // Hide exceptions without discarding them.
    throw Throwables.asRuntimeException(ex);
  }
}

Here there is no attempt to state what each step performs; the comments state only what the code is supposed to do.

PS: I found the source I was referring to - Coding Horror: Code Tells You How, Comments Tell You Why

6
  • 8
    The first comment: Has it run yet? Has what run yet? Same for the other comments. For someone not knowing what the code does, this is useless.
    – gnasher729
    Commented Sep 1, 2014 at 14:48
  • 1
    @gnasher729 - Taken out of context almost any comment will be useless - this code is a demonstration of adding comments that indicate intent rather than attempting to describe. I am sorry that it does nothing for you. Commented Sep 1, 2014 at 14:55
  • 2
    A maintainer of that code won't have a context. It's not particularly difficult to figure out what the code does, but the comments are not helping. If you write comments, take your time and concentrate when you write them.
    – gnasher729
    Commented Sep 1, 2014 at 22:51
  • BTW - The Has it run yet comment is referring to the Future and indicates that a get() followed by a check against null detects whether the Future has already been run - correctly documenting the intent rather than the process. Commented Sep 5, 2014 at 9:54
  • 1
    @OldCurmudgeon: Your reply is close enough to what I was thinking, that I'll just add this comment as an example of your point. While a comment isn't needed to explain clean code, a comment IS good to explain why coding was done ONE WAY OVER ANOTHER. In my limited experience, comments are often useful to explain idiosyncrasies of the data set the code is working upon, or the business rules the code is meant to enforce. Commenting code that is added to fix a bug is a good example, if that bug happened because an assumption about the data was wrong. Commented Feb 14, 2017 at 18:37
5

Often we have to do complicated things. It's certainly right to document them for future understanding. Sometimes the right place for this documentation is in the code, where the documentation can be kept up to date with the code. But it's definitely worth considering separate documentation. This can also be easier to present to other people, and it can include diagrams, colour pictures, and so on. Then the comment is just:

// This code implements the algorithm described in requirements document 239.

or even just

void doPRD239Algorithm() { ...

Certainly people are happy with functions named MatchStringKnuthMorrisPratt or encryptAES or partitionBSP. More obscure names are worth explaining in a comment. You could also add bibliographic data and a link to a paper that you've implemented an algorithm from.
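
As a sketch of what that can look like in Java: the function name says which algorithm it is, and the doc comment carries the bibliographic data. The class name StringSearch and the exact signature are my own assumptions for the example.

```java
// Illustrative sketch; the class name and signature are assumptions.
class StringSearch {
    /**
     * Returns the index of the first occurrence of pattern in text, or -1.
     *
     * Implements the Knuth-Morris-Pratt algorithm:
     * D. E. Knuth, J. H. Morris, V. R. Pratt, "Fast Pattern Matching in
     * Strings", SIAM Journal on Computing 6(2), 1977.
     */
    static int matchStringKnuthMorrisPratt(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        // failure[i] = length of the longest proper prefix of pattern[0..i]
        // that is also a suffix of pattern[0..i].
        int[] failure = new int[pattern.length()];
        for (int i = 1, k = 0; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        // Scan the text once; on a mismatch, fall back via the failure table
        // instead of re-reading text characters.
        for (int i = 0, k = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1;
            }
        }
        return -1;
    }
}
```

Anyone who wants the derivation can follow the reference; the inline comments only need to say which part of the published algorithm each loop corresponds to.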

If an algorithm is complex and novel and not obvious, it's definitely worth a document, even if only for internal company circulation. Check the document into source control if you're worried about it getting lost.

There is another category of code which isn't so much algorithmic as bureaucratic. You need to set up parameters for another system, or interoperate with someone else's bugs:

/* Configure the beam controller and turn on the laser.
The sequence is timing-critical and this code must run with interrupts disabled.
Note that the constant 0xef45ab87 differs from the vendor documentation; the vendor
is wrong in this case.
Some of these operations write the same value multiple times. Do not attempt
to optimise this code by removing seemingly redundant operations.
*/
1
  • 3
    I'd argue against naming functions/methods after their internal algorithm; most of the time the method used should be an internal concern. By all means document the top of your function with the method used, but don't call it doPRD239Algorithm: that tells me nothing about the function without having to look up the algorithm. The reason MatchStringKnuthMorrisPratt and encryptAES work is that they start with a description of what they do, then follow up with a description of the methodology.
    – scragar
    Commented Sep 2, 2014 at 14:35
4

But we all know that's not enough.

Really? Since when?

Well designed code with good names is more than enough in the vast majority of cases. The arguments against using comments are well known and documented (as you refer to).

But these are guidelines (like anything else). In the rare case (in my experience, about once every 2 years) where things would be worse when refactored into smaller legible functions (due to performance or cohesion needs) then go ahead - put in some lengthy comment explaining what the thing is actually doing (and why you're violating best practices).
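
For the common case, here is a rough Java sketch of names doing the work of a comment. The discount rule, the class and every identifier are invented for illustration.

```java
// Hypothetical example: the business rule and all names are assumptions.
class Pricing {
    static final int MIN_ITEMS_FOR_BULK_DISCOUNT = 5;
    static final long BULK_DISCOUNT_PERCENT = 10;

    // Before: the intent lives only in the comment.
    static long priceBefore(long totalCents, int itemCount, boolean member) {
        // members get 10% off orders of at least 5 items
        if (member && itemCount >= 5) {
            return totalCents - totalCents * 10 / 100;
        }
        return totalCents;
    }

    // After: the condition and the adjustment have names, so the comment
    // above has nothing left to add.
    static long priceAfter(long totalCents, int itemCount, boolean member) {
        return qualifiesForBulkDiscount(itemCount, member)
                ? applyBulkDiscount(totalCents)
                : totalCents;
    }

    static boolean qualifiesForBulkDiscount(int itemCount, boolean member) {
        return member && itemCount >= MIN_ITEMS_FOR_BULK_DISCOUNT;
    }

    static long applyBulkDiscount(long totalCents) {
        return totalCents - totalCents * BULK_DISCOUNT_PERCENT / 100;
    }
}
```

Both versions compute the same price; the second simply moves the explanation from a comment that can go stale into identifiers that the compiler keeps in sync with the code.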

8
  • 7
    I know it is not enough.
    – Florian F
    Commented Sep 1, 2014 at 11:47
  • 2
    Since when? Apparently, you already know the answer to that. "Well designed code with good names is more than enough in the vast majority of cases." So, it's probably not enough in a minority of cases, which is exactly what the asker is asking.
    – Ellesedil
    Commented Sep 2, 2014 at 20:01
  • 3
    I am forever trying to decipher code from other people who I wish had added comments more often than once every two years. Commented Sep 4, 2014 at 20:15
  • @OgrePsalm33 - Do they have small methods and use good names? Bad code is bad, regardless of comments.
    – Telastyn
    Commented Sep 4, 2014 at 20:26
  • 2
    @Telastyn Unfortunately, when working on a large code base, "small" methods and "good" names are subjective to each developer (so is a good comment, for that matter). A developer writing Flarbigan graphical processing algorithm code for 7 years, can write something perfectly clear to him and similar developers, but would be cryptic to the new guy who spent the last 4 years developing Perbian grid infrastructure code. Then, 2 weeks later, the Flarbigan expert quits. Commented Sep 4, 2014 at 21:04
2

The principal purpose of code is commanding a computer to do something, so a good comment is never a substitute for good code because comments can't be executed.

That being said, comments in the source are one form of documentation for other programmers (including yourself). If the comments are about more abstract issues than what the code is doing at every step, you're doing better than average. That level of abstraction varies with the tool you're using. Comments accompanying assembly language routines generally have a lower level of "abstraction" than, for example, this APL A←0⋄A⊣{2⊤⍵:1+3×⍵⋄⍵÷2}⍣{⍺=A+←1}⎕. I think that would probably merit a comment about the problem it's intended to solve, hmmm?

2

If the code is trivial, it doesn't need an explanatory comment. If the code is non-trivial, the explanatory comment will most likely also be non-trivial.

Now, the trouble with non-trivial natural language is that many of us are not very good at reading it or writing it. I'm sure your written communication skills are excellent, but nevertheless someone with a lesser grasp of written language might misunderstand your words.

If you try very hard to write natural language that cannot be misinterpreted you end up with something like a legal document (and as we all know those are more verbose and difficult to understand than code).

Code should be the most concise description of your logic, and there shouldn't be much debate about the meaning of your code because your compiler and platform have the final say.

Personally I wouldn't say that you should never write a comment. Only that you should consider why your code needs a comment, and how you might fix that. This seems to be a common theme in answers here.

1
  • Exactly what I was thinking when I disagreed with the statement "A human can understand a piece of English much faster that he/she can understand a piece of code with the same meaning (as long as the operation isn't trivial)" Code is always less ambiguous and more concise. Commented Sep 2, 2014 at 16:49
0

One point not yet mentioned is that sometimes commenting precisely what a piece of code does can be helpful in cases where a language uses a particular syntax for multiple purposes. For example, assuming all variables are of type float, consider:

f1 = (float)(f2+f3); // Force result to be rounded to single precision
f4 = f1-f2;

The effect of explicitly casting a float to float is to force the result to be rounded to single precision; the comment could thus be viewed as simply saying what the code does. On the other hand, compare that code to:

thing.someFloatProperty = (float)(f2*0.1); // Divide by ten

Here, the purpose of the cast is to prevent the compiler from squawking at the most efficient way of accurately computing (f2/10) [it's more accurate than multiplying by 0.1f, and on most machines it's faster than dividing by 10.0f].

Without the comment, someone who was reviewing the former code might think the cast was added in a mistaken belief that it would be needed to prevent the compiler from squawking and that it wasn't needed. In fact, the cast serves the purpose of doing exactly what the language spec says it does: force the result of the computation to be rounded to single-precision even on machines where the rounding would be more expensive than keeping the result in higher precision. Given that a cast to float can have a number of different meanings and purposes, having a comment specify which meaning is intended in a particular scenario can help make clear that the actual meaning lines up with intent.
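
To make the second case concrete, here is a minimal Java sketch comparing the three ways of dividing by ten mentioned above. The class and method names are my own; note also that in Java a float + float sum is already a float, so the rounding cast from the first example would be redundant there and is not reproduced.

```java
// Illustrative sketch; class and method names are assumptions.
class DivideByTen {
    // Multiply by the double constant 0.1, then round once to float.
    // The cast is only there to satisfy the compiler; the double constant
    // is deliberate, because it approximates one tenth far more closely
    // than 0.1f does.
    static float viaDoubleConstant(float f2) {
        return (float) (f2 * 0.1);
    }

    // Multiply by the float constant 0.1f: no cast needed, but the coarser
    // constant can push the result to a neighbouring float.
    static float viaFloatConstant(float f2) {
        return f2 * 0.1f;
    }

    // Reference: correctly rounded float division.
    static float viaDivision(float f2) {
        return f2 / 10.0f;
    }
}
```

For f2 = 9.0f, viaDoubleConstant agrees with viaDivision while viaFloatConstant does not, which is exactly the kind of detail a comment next to the constant can record.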

7
  • I'm not sure that J. Random Programmer, looking at the second example, will realize that the constant is written 0.1 for a good reason, rather than because the original programmer forgot to type an 'f'.
    – David K
    Commented Sep 7, 2014 at 15:45
  • Especially during debugging, you never assume that anything has been done for a good reason.
    – gnasher729
    Commented Sep 8, 2014 at 10:16
  • @DavidK: The purpose of my second example code was to contrast it with the first piece of code. In the second piece of code, the programmer's intention is probably to have someFloatProperty hold the most accurate representation of f2/10 that it can; the primary purpose of the second cast is thus simply to make the code compile. In the first example, however, the cast clearly isn't needed for its normal purpose (changing one compile-time type to another) since the operands are already float. The comment serves to make clear that the cast is needed for a secondary purpose (rounding).
    – supercat
    Commented Sep 8, 2014 at 15:29
  • I agree with the notion that you don't need to make any comment about the (float) cast in the second example. The question is about the literal constant 0.1. You explained (in the next paragraph of text) why we would write 0.1: "it's more accurate than multiply by 0.1f." I'm suggesting that those are the words that should be in the comment.
    – David K
    Commented Sep 8, 2014 at 16:54
  • @DavidK: I would certainly include the comment if I knew that 0.1f would be unacceptably imprecise, and would use 0.1f if I knew that the loss of precision would be acceptable and that 0.1f would in fact be materially faster than 0.1. If I don't know either of those things to be true, my preferred coding habit would be to use double for constants or intermediate calculations whose value may not be representable as float [though in languages that require annoying explicit double-to-float casts, laziness may push me to use float constants not for speed, but to minimize annoyance].
    – supercat
    Commented Sep 8, 2014 at 17:07
-1

Comments that explain what the code does are a form of duplication. If you change the code and then forget to update the comments this can cause confusion. I am not saying don't use them, just use them judiciously. I subscribe to the Uncle Bob maxim: "Only comment what the code can't say".
