10
$\begingroup$

When coding in Python, I found that defining a list of strings without separating the strings with a comma is not a syntax error. When running this code:

x = ['a', 'b', 'c'] # This is the proper one
y = ['a' 'b' 'c'] # Commas missing!
print(x) # results in ['a', 'b', 'c']
print(y) # results in ['abc']

However, if I run this code, Python complains about it.

a = 'a'
b = 'b'
c = 'c'
z = [a b c] # SyntaxError: invalid syntax. Perhaps you forgot a comma?

Instead of reporting an error, Python simply assumes that the three non-comma-separated strings should be combined into the same string. What are the reasons for this? Is it common for people to write lists of strings without comma separators and intend for them to be the same string?

$\endgroup$
  • 5
    $\begingroup$ Relevant StackOverflow question $\endgroup$
    – lyxal
    Commented Aug 6, 2023 at 4:06
  • 11
    $\begingroup$ I provided input to the original ANSI X3J11 committee in the early 1980s, objecting to "a" "b" being treated like "a" + "b" rather than as a syntax error, as it would lead to undetectable typos and bugs. ¶ Over the years since then I've encountered more than a few instances of arrays of strings, one per line, being reordered leaving the string from the last line (no comma at the end) somewhere in the middle and silently concatenated to the following element. It's very difficult to notice (until things break weeks, months, or years later). I don't know why Python copied such a bad idea. $\endgroup$ Commented Aug 6, 2023 at 12:56
  • 2
    $\begingroup$ "Is it common for people to write lists of strings without comma separators and intend for them to be the same string?" Probably not lists, but individual strings, yes. And the syntax can't really have a different meaning just because it's in a list literal. $\endgroup$
    – kaya3
    Commented Aug 6, 2023 at 14:10
  • 3
    $\begingroup$ I’m voting to close this question because questions about programming in specific languages are more appropriate for Stack Overflow. $\endgroup$
    – Barmar
    Commented Aug 7, 2023 at 20:06
  • 7
    $\begingroup$ @Barmar This question isn't about programming in Python, it's about a decision made by the designers of Python. Questions about the design or implementation of specific languages are on-topic here, and we have several other questions also tagged python. $\endgroup$
    – kaya3
    Commented Aug 7, 2023 at 20:52
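
The reordering hazard described in the comments above can be reproduced with a small sketch. The list contents here are made up for illustration; the point is only that moving a last line (which had no trailing comma) into the middle silently merges two elements instead of raising an error.

```python
# Hypothetical list, one string per line; the last line has no trailing comma.
original = [
    'alpha',
    'beta',
    'gamma'
]

# The 'gamma' line is moved up without adding a comma after it.
# Python joins 'gamma' and 'beta' into one literal and reports no error.
reordered = [
    'alpha',
    'gamma'
    'beta',
]

print(len(original), original)
print(len(reordered), reordered)
```

Ending every line with a comma, including the last one, is one way to make such edits safe.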

1 Answer

24
$\begingroup$

The first example is not an error because Python automatically concatenates adjacent string literals. This applies only to literals, not to names such as a, b and c, and it happens both inside and outside of lists.
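
As a rough illustration (a sketch, assuming Python 3.8 or later, where string literals appear as `ast.Constant` nodes), the snippet below shows the concatenation in a few contexts outside of lists, and uses the standard `ast` module to check that the parser already produces a single string for `['a' 'b' 'c']`. The variable names are chosen only for this example.

```python
import ast

# Adjacent literals are joined wherever an expression is allowed,
# not only inside list literals.
s = 'a' 'b' 'c'          # plain assignment
t = ('a' 'b', 'c')       # tuple with two elements, not three
n = len('ab' 'cd')       # function argument

# Parse the list from the question and inspect its elements.
tree = ast.parse("['a' 'b' 'c']", mode="eval")
elements = tree.body.elts      # the elements of the list literal
merged = elements[0].value     # value of the single string constant

print(s, t, n, len(elements), merged)
```

The list has a single element after parsing, so the joining has happened before any code runs; nothing is concatenated at run time.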

As mentioned in the Python documentation, this:

can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings[.]
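
A short sketch of those three uses follows. The identifier pattern mirrors the kind of example the documentation gives; the other strings and all variable names are made up for illustration.

```python
import re

# Splitting a long string across lines; the parentheses allow the line breaks.
message = (
    "Adjacent string literals are joined at compile time, "
    "so this is a single str object."
)

# Adding comments to parts of a string, here a regular expression.
pattern = re.compile(
    r"[A-Za-z_]"         # first character: letter or underscore
    r"[A-Za-z0-9_]*"     # remaining characters: letters, digits, underscores
)

# Reducing backslashes by switching quote style between the parts.
quote = 'He said "hi" ' "and she didn't reply."

print(message)
print(bool(pattern.fullmatch("_name1")), bool(pattern.fullmatch("1name")))
print(quote)
```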

This feature was once considered for removal, but "[t]here wasn’t enough support in favor, the feature to be removed isn’t all that harmful, and there are some use cases that would become harder."

As to why implicit string concatenation exists:

Many Python parsing rules are intentionally compatible with C. This is a useful default, but Special Cases need to be justified based on their utility in Python. ...

In C, implicit concatenation is the only way to join strings without using a (run-time) function call to store into a variable.

source

$\endgroup$
  • 12
    $\begingroup$ C needs this feature so that concatenation can be done in macro expansions or with macros that expand into string literals. That need doesn't exist in Python, so this really does seem like an unnecessary compatibility. $\endgroup$
    – Barmar
    Commented Aug 7, 2023 at 20:09
  • $\begingroup$ @Barmar: In C, this has the benefit that tokens never span multiple lines, so a tokeniser can report lexical errors per line, and always recover at a line break. To my knowledge, Python has always had multiline string tokens, so it doesn’t make use of this. But overall it helps with error reporting to prevent runaway syntax. $\endgroup$
    – Jon Purdy
    Commented Jun 26 at 22:36
  • $\begingroup$ @JonPurdy Python has had multiline string tokens (triple-quoted strings) for at least a very long time, but indenting them is tricky because they incorporate any whitespace you might use to indent the noninitial lines. A series of juxtaposed single-line string literals (especially inside brackets to make use of implicit line joining) is a way to avoid that. $\endgroup$
    – chepner
    Commented Jul 5 at 22:00
  • $\begingroup$ @chepner: Sure, they’re both useful and all. I just meant Python is missing out on one of the benefits of this syntax because it doesn’t have the C feature that newlines are recovery points when you don’t have multiline tokens, because it does have them—and has for a long time, so evidently it was never a goal to be lexically compatible with C, just to be somewhat familiar-looking to C users. $\endgroup$
    – Jon Purdy
    Commented Jul 7 at 2:32
