The need for comments is inversely proportional to the abstraction level of the code.
For example, assembly language is, for most practical purposes, unintelligible without comments. Here's an excerpt from a small program that calculates and prints terms of the Fibonacci series:
main:
; initializes the two numbers and the counter. Note that this assumes
; that the counter and num1 and num2 areas are contiguous!
;
mov ax,'00' ; initialize to all ASCII zeroes
mov di,counter ; including the counter
mov cx,digits+cntDigits/2 ; two bytes at a time
cld ; initialize from low to high memory
rep stosw ; write the data
inc ax ; make sure ASCII zero is in al
mov [num1 + digits - 1],al ; last digit is one
mov [num2 + digits - 1],al ;
mov [counter + cntDigits - 1],al
jmp .bottom ; done with initialization, so begin
.top
; add num1 to num2
mov di,num1+digits-1
mov si,num2+digits-1
mov cx,digits ;
call AddNumbers ; num2 += num1
mov bp,num2 ;
call PrintLine ;
dec dword [term] ; decrement loop counter
jz .done ;
; add num2 to num1
mov di,num2+digits-1
mov si,num1+digits-1
mov cx,digits ;
call AddNumbers ; num1 += num2
.bottom
mov bp,num1 ;
call PrintLine ;
dec dword [term] ; decrement loop counter
jnz .top ;
.done
call CRLF ; finish off with CRLF
mov ax,4c00h ; terminate
int 21h ;
Even with comments, it can be quite complicated to grok.
A modern example: regexes are often very low-abstraction constructs, built from individual characters (lowercase letters, the digits 0, 1, 2, newlines, etc.). They probably need comments in the form of sample inputs (Bob Martin, IIRC, does acknowledge this). Here is a regex that (I think) should match HTTP(S) and FTP URLs:
^(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*$
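To make "comments in the form of samples" concrete, here is a minimal sketch in C++. It assumes a deliberately simplified pattern (scheme, host, optional port) rather than the regex above, and the name `looksLikeUrl` is made up for illustration. The point is only that the sample inputs in the comment tell the reader more than the pattern does.

```cpp
#include <regex>
#include <string>

// Optional scheme, then a host of dot-separated labels, then an optional port.
// (Simplified on purpose: no paths, no query strings.)
//
// Matches:
//   http://example.com
//   https://www.example.org:8080
//   ftp://files.example.net
//   example.com
//
// Does not match:
//   mailto:bob@example.com      (unknown scheme)
//   http://                     (no host)
//   http://example.com:port     (port must be digits)
inline bool looksLikeUrl(const std::string& text)
{
    static const std::regex urlPattern(
        R"(^((https?|ftp)://)?[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+(:[0-9]+)?$)");
    return std::regex_match(text, urlPattern);
}
```

The samples double as test cases, so they are cheap to keep honest.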
As languages move up the abstraction hierarchy, the programmer can use evocative abstractions (variable names, function names, class names, module names, interfaces, callbacks, etc.) to provide built-in documentation. Neglecting these and using comments to paper over the gap is lazy, and it is a disservice to, and disrespectful of, the maintainer.
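A small sketch of what that looks like in practice, in C++. Both functions are hypothetical and do the same thing; the first needs its comment, the second mostly doesn't.

```cpp
// Before: terse names, so a comment has to carry the meaning.
// Returns true if y is a leap year in the Gregorian calendar.
bool chk(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

// After: the names say what the comment used to say.
bool isLeapYear(int year)
{
    const bool divisibleBy4 = year % 4 == 0;
    const bool divisibleBy100 = year % 100 == 0;
    const bool divisibleBy400 = year % 400 == 0;
    return (divisibleBy4 && !divisibleBy100) || divisibleBy400;
}
```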
I am thinking of Numerical Recipes in C translated mostly verbatim to Numerical Recipes in C++, which I infer began as Numerical Recipes (in FORTRAN), with all the variables a, aa, b, c, cc, etc. maintained through each version. The algorithms may have been correct, but they did not take advantage of the abstractions the languages provided. And they p*** me off. Sample from a Dr. Dobb's article - Fast Fourier Transform:
void four1(double* data, unsigned long nn)
{
    unsigned long n, mmax, m, j, istep, i;
    double wtemp, wr, wpr, wpi, wi, theta;
    double tempr, tempi;

    // reverse-binary reindexing
    n = nn<<1;
    j=1;
    for (i=1; i<n; i+=2) {
        if (j>i) {
            swap(data[j-1], data[i-1]);
            swap(data[j], data[i]);
        }
        m = nn;
        while (m>=2 && j>m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    };

    // here begins the Danielson-Lanczos section
    mmax=2;
    while (n>mmax) {
        istep = mmax<<1;
        theta = -(2*M_PI/mmax);
        wtemp = sin(0.5*theta);
        wpr = -2.0*wtemp*wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for (m=1; m < mmax; m += 2) {
            for (i=m; i <= n; i += istep) {
                j=i+mmax;
                tempr = wr*data[j-1] - wi*data[j];
                tempi = wr * data[j] + wi*data[j-1];
                data[j-1] = data[i-1] - tempr;
                data[j] = data[i] - tempi;
                data[i-1] += tempr;
                data[i] += tempi;
            }
            wtemp=wr;
            wr += wr*wpr - wi*wpi;
            wi += wi*wpr + wtemp*wpi;
        }
        mmax=istep;
    }
}
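For contrast, here is a rough sketch of how just the "reverse-binary reindexing" step might read if it used what C++ offers. This is my own rewrite, not anything from the article: it assumes `std::complex<double>` samples instead of the interleaved real/imaginary array, zero-based indexing, and a power-of-two input size, and every name in it is made up.

```cpp
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Sample = std::complex<double>;

// Moves each sample to the index whose binary digits are the reverse of its
// current index. Assumes samples.size() is a power of two.
void reorderByBitReversedIndex(std::vector<Sample>& samples)
{
    const std::size_t sampleCount = samples.size();
    std::size_t reversedIndex = 0;

    for (std::size_t index = 0; index < sampleCount; ++index) {
        // Swap each pair once, when we reach the smaller of its two indices.
        if (reversedIndex > index) {
            std::swap(samples[index], samples[reversedIndex]);
        }

        // Increment reversedIndex "backwards": carry from the top bit down.
        std::size_t bit = sampleCount >> 1;
        while (bit >= 1 && (reversedIndex & bit)) {
            reversedIndex ^= bit;
            bit >>= 1;
        }
        reversedIndex |= bit;
    }
}
```

The two comments that remain explain the non-obvious trick; the names handle the rest.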
As a special case of abstraction, every language has idioms / canonical code snippets for certain common tasks (e.g. deleting a dynamically allocated linked list in C), and regardless of how they look, they shouldn't need comments. Programmers should learn these idioms, as they are unofficially part of the language.
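The linked-list idiom, sketched here in C++ (`new`/`delete` rather than C's `malloc`/`free`) to match the other samples. The `Node` type and the `prepend` helper are hypothetical and exist only so there is a list to delete. Note that the loop itself carries no comment, and anyone who knows the language doesn't miss one.

```cpp
struct Node {
    int value;
    Node* next;
};

// Hypothetical helper, only here so the example has a list to delete.
Node* prepend(Node* head, int value)
{
    return new Node{value, head};
}

void deleteList(Node*& head)
{
    while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}
```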
So the takeaway: non-idiomatic code built from low-level building blocks that can't be avoided needs comments. And this is necessary WAAAAY less often than it happens.