9

As far as I know, a backslash (\) at the end of a line in C just joins the next line to it, as if there were no line break.

Consider the following code:

main(){\
return 0;
}

When I look at the preprocessed code (gcc -E), it shows

main(){return
       0;
}

and not

main(){return 0;
}

What is the reason for this kind of behaviour? Also, how can I get the code I expected?

11
  • 1
    what do you mean by "the pre-processed code"? Commented Jun 25, 2013 at 6:53
  • 3
    @DavidBrown: Presumably the output of gcc -E (or equivalent). Commented Jun 25, 2013 at 6:54
  • 2
    @Zane: I wouldn't use it for long strings, since C already joins adjacent string literals (so e.g. "foo" /* bar */ "baz" is equivalent to "foobaz"). The only thing I think the backslash-before-newline is really useful for is macro definitions (where it lets you put the definition across multiple lines).
    – ruakh
    Commented Jun 25, 2013 at 6:56
  • 2
    This is a question about some specific standalone C preprocessor, not a question about the C language. Expanding: C compilers produce executable programs, they don't produce C code ... so you can't get the code you expected out of a C compiler.
    – Jim Balter
    Commented Jun 25, 2013 at 6:57
  • 1
    Out of curiosity: Why are you interested in how the preprocessor works at all? What is the context? Or are you just curious by yourself? Commented Jun 25, 2013 at 7:00

3 Answers

10

Yes, your expected result is the one required by the C and C++ standards. The backslash simply escapes the newline, i.e. the backslash-newline sequence is deleted.

GCC 4.2.1 from my OS X installation gives the expected result, as does Clang. Furthermore, adding a #define to the beginning and testing with

#define main(){\
return 0;
}
main()

yields the correct result

}
{return 0;

Perhaps gcc -E does some extra processing after preprocessing and before outputting it. In any case, the line break seen by the rest of the preprocessor seems to be in the right place. So it's a cosmetic bug.

UPDATE: According to the GCC FAQ, -E (or the default setting of the cpp command) attempts to put output tokens in roughly the same visual location as input tokens. To get "raw" output, specify -P as well. This fixes the observed issues.

Probably what happened:

  1. In preserving visual appearance, tokens not separated by spaces are kept together.
  2. Line splicing happens before spaces are identified for the above.
  3. The { and return tokens are grouped into the same visual block.
  4. 0 follows a space and its location on the next line is duly noted.

PLUG: If this is really important to you, I have implemented my own preprocessor with correct implementation of both raw-preprocessed and whitespace-preserving "pretty" modes. Following this discussion I added line splices to the preserved whitespace. It's not really intended as a standalone tool, though. It's a testbed for a compiler framework which happens to be a fully compliant C++11 preprocessor library, which happens to have a miniature command-line driver. (The error messages are on par with GCC, or Clang, sans color, though.)

6
  • 3
    " your expected result is the one required by the C and C++ standards. " -- Not at all -- the standards do not require that implementations output C code, and they say nothing about the format of C code generated by programs that do. "So it's a cosmetic bug." -- It's not a bug at all, because the format of the output of gcc -E is not specified, other than that it will compile correctly.
    – Jim Balter
    Commented Jun 25, 2013 at 8:07
  • 1
    @JimBalter The output of phase 4 is well-defined enough to be observable. It consists of spaces, newlines, and preprocessing tokens. The standards don't require it to be observable, but if the preprocessor output is indeed what he's observing, the standard specifies what it should be, and where the newline goes. (There is no textual indication of token boundaries, so the text representation certainly falls short a bit.) I'm sure this is documented, and since GCC is supposed to conform to its own documentation, it might be more serious as a bug. gcc.gnu.org/bugzilla/show_bug.cgi?id=57714 Commented Jun 25, 2013 at 20:39
  • I've reconsidered this and agree that it's a GCC bug, because running the output of gcc -E through the compiler will give the wrong line number for an error if, say, the 0 in the OP's code is replaced by x or something else not allowed at that point. There's a response from Andrew Pinski to your bugzilla report claiming that it's the right output, but he's clearly wrong.
    – Jim Balter
    Commented Jun 25, 2013 at 20:59
  • 1
    It's possible that the GCC folks have misinterpreted "each instance of a backslash character (\) immediately followed by a new-line character is deleted" to mean that the backslash is deleted but the newline is not. The language is ambiguous, but the intent is certainly that "instance of" refers to the two characters.
    – Jim Balter
    Commented Jun 25, 2013 at 21:03
  • @JimBalter It looks like a simple accident, or poor test coverage. Andrew was my buddy at Apple back ~10 years ago but he doesn't seem to remember me… and he has better things to do :P . Judging by the clock, he might be on his way home now. Commented Jun 25, 2013 at 21:10
10

From K&R section A.12 Preprocessing:

A.12.2 Line Splicing

Lines that end with the backslash character \ are folded by deleting the backslash and the following newline character. This occurs before division into tokens.
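As a rough illustration of "before division into tokens", the sketch below splits a string literal and an identifier across lines. The names greeting, spliced_value and splice_demo are made up for this example, and the continuation lines deliberately start in column 0, since any leading spaces would become part of the spliced text.

```c
#include <assert.h>
#include <string.h>

/* Splice inside a string literal: the backslash-newline pair is deleted,
   so the literal is "hello" with no embedded newline. */
static const char *greeting = "hel\
lo";

/* Splice inside an identifier: the two halves are joined into the single
   token spliced_value before the tokenizer ever runs. */
static int spliced_\
value = 42;

/* Hypothetical helper for this example: checks both splices and returns
   the value of the spliced identifier. */
int splice_demo(void)
{
    assert(strcmp(greeting, "hello") == 0);
    assert(strlen(greeting) == 5);
    return spliced_value;
}
```

Splitting tokens like this is legal but hard to read; it is shown here only to make the ordering of the translation phases visible.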

4
  • This is especially useful when people want macros to extend across multiple lines
    – Enigma
    Commented Jun 25, 2013 at 7:00
  • @MarounMaroun, Oli added an online reference for the PDF. Commented Jun 25, 2013 at 7:02
  • @OliCharlesworth The question "What does '\' do in C?" is perfectly answered by this, I think. Commented Jun 25, 2013 at 7:04
  • 3
    @TobiMcNamobi: Sure. But the real question appears to be "why am I seeing this particular behaviour?". Commented Jun 25, 2013 at 7:06
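The two uses mentioned in the comments, multi-line macro definitions and adjacent string literals, might look like the following minimal sketch. SWAP_INT, joined and macro_demo are hypothetical names invented for this example, not anything from the question.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: every line except the last ends in backslash-newline,
   so the whole definition is one logical line to the preprocessor. */
#define SWAP_INT(a, b)  \
    do {                \
        int tmp_ = (a); \
        (a) = (b);      \
        (b) = tmp_;     \
    } while (0)

/* Adjacent string literals are concatenated in a later translation phase,
   so long strings do not need a backslash at all. */
static const char *joined = "foo" /* bar */ "baz";

/* Hypothetical helper: exercises the macro and the joined literal. */
int macro_demo(void)
{
    int x = 1, y = 2;
    SWAP_INT(x, y);
    assert(x == 2 && y == 1);
    assert(strcmp(joined, "foobaz") == 0);
    return x;
}
```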
4

It doesn't matter :/ The tokenizer will not see any difference. 1

Update In response to the comments:

There seems to be a fair amount of confusion as to what the expected output of the preprocessor should be. My point is that the expectation /seems/ reasonable at a glance but doesn't actually need to be specified in this way for the output to be valid. The amount of whitespace present in the output is simply irrelevant to the parser. What matters is that the preprocessor should treat the continued line as one line while interpreting it.

In other words: the preprocessor is not a text transformation tool, it's a token manipulation tool.
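One way to see this from inside the language is the # stringizing operator, which works on tokens rather than on source text. This is only a sketch; STR and whitespace_demo are hypothetical names chosen for the example.

```c
#include <string.h>

/* Hypothetical helper macro: # turns the argument's tokens into a string. */
#define STR(x) #x

/* Returns 1 if both strings come out as expected, 0 otherwise. */
int whitespace_demo(void)
{
    /* Runs of whitespace between tokens collapse to a single space. */
    const char *a = STR(a   +   b);

    /* The backslash-newline is deleted first, leaving the tokens
       return, 0 and ; separated by ordinary whitespace. */
    const char *b = STR(return\
          0;);

    return strcmp(a, "a + b") == 0 && strcmp(b, "return 0;") == 0;
}
```

The layout of the source (extra spaces, a spliced line) leaves no trace in either string, which matches the point that only the token sequence matters to later phases.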

If it matters to you, you're probably

  • using the preprocessor for something other than C/C++
  • treating C++ code as text, which is a ... code smell. (libclang and various less complete parser libraries come to mind).

1 (The preprocessor is free to achieve the specified result in whichever way it sees fit. The result you are seeing is possibly the most efficient way the implementors have found to implement this particular transformation)

24
  • 3
    Or you're defining macros, in which case it's pretty important ;) Commented Jun 25, 2013 at 6:56
  • 3
    Oli I know. My point is that the amount of whitespace generated between tokens in the output is irrelevant (and likely unspecified)
    – sehe
    Commented Jun 25, 2013 at 6:57
  • 5
    @OliCharlesworth that proves nothing. He's expecting the wrong thing (on the wrong level of abstraction). See here: coliru.stacked-crooked.com/… GNU cpp is not blatantly broken.
    – sehe
    Commented Jun 25, 2013 at 7:02
  • 4
    @OliCharlesworth There's no bug. I get the same result as the OP, but it does the right thing for macros ... all of which is within what is permitted by the C standard. " the OP is expecting the right thing" -- as has been explained to you, this is wrong ... there is no "right thing" to expect; the C standard says nothing about any -E flag or any other way to observe the results of individual phases.
    – Jim Balter
    Commented Jun 25, 2013 at 7:11
  • 2
    @JimBalter: It's ok, I concede that the standard in no way constrains the implementation here. But I maintain that this GCC behaviour is misleading (and appears to have been changed over time; 4.2 acts differently to 4.7.2), and that this could be viewed as a "bug" (bugs aren't restricted solely to standards compliance). Commented Jun 25, 2013 at 7:22
