6

There are common pairings of key sequences with ASCII control characters, such as Ctrl-C and Ctrl-Z with ETX and SUB, respectively.

The Wikipedia page on control codes lists most of the pairings, but cites no reference.

Are the control character and key sequence pairings part of a standard?

Where is that listed for Linux and other OSes?

Are there man pages listing these pairings?

Are they purely decades of unwritten convention?

References

2

3 Answers

10

The pairings are "the Latin alphabet", 1 through 26 (and then the relevant rest of ASCII too).

Ctrl-C gives ETX, byte value 3 (0x03, 00000011); C is ASCII 67 (0x43, 01000011). Flip bit 7 (the 0x40 bit, counting bits from 1; that is, add or subtract 64) to get from one to the other. SUB is byte value 26, and so on, as listed on the Wikipedia page you mention, in order from 1-26 and A-Z.

The other C0 controls correspond to Ctrl and other non-alphabetic characters: NUL is Ctrl-@, since @ is ASCII 64, and [ (91) corresponds to ESC (27), and so on until you hit space.
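The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 64-offset rule, not how any particular terminal implements it; the function name `ctrl` and the `C0_NAMES` table are mine.

```python
# Standard ASCII names of the C0 controls, indexed by byte value 0..31.
C0_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]

def ctrl(key: str) -> int:
    """Return the control code for Ctrl+key under the flip-the-0x40-bit rule."""
    if len(key) != 1 or not key.isascii():
        raise ValueError("expected a single ASCII character")
    code = ord(key.upper())          # treat Ctrl-c and Ctrl-C alike
    if not 0x40 <= code <= 0x5F:     # only '@' through '_' pair up this way
        raise ValueError(f"{key!r} has no C0 partner under this rule")
    return code ^ 0x40               # clear the 0x40 bit (subtract 64)

if __name__ == "__main__":
    for key in "@CDVZ[":
        print(f"Ctrl-{key} -> {ctrl(key):2d} ({C0_NAMES[ctrl(key)]})")
```

Folding lowercase to uppercase first is a choice made for this sketch; a real terminal may handle lowercase letters and characters outside `@` to `_` differently.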

ASCII defines these bytes with those labels and (to some extent) their meanings, as does Unicode, and as do numerous other encoding standards. The use of Ctrl to flip that bit is determined by the terminal or input driver, but the name "control characters" is fairly suggestive of how that pairing comes about. They will have the same correspondence between letter and byte on any system following this tradition.

On the other hand, many of the ASCII controls and their key sequences are either not used or used for different purposes than originally envisaged, in modern Unix-like systems at least. Ctrl-C and Ctrl-D are still reasonably parallel in the effect they have, but Ctrl-V is usually used to initiate literal input these days rather than synchronous idle, for example, and I've never seen a group separator in the wild.

8
  • this is a cool system for this! I wish the wikipedia page or a man page or some *nix docs explained this. So common but never really explained Commented May 13, 2018 at 4:10
  • Are the pairings in the ASCII standard? Commented May 13, 2018 at 4:33
  • 2
    The character names and values are. ASCII doesn't concern itself with how you input things, it's just an encoding, so the Ctrl-x combinations build on it, rather than the other way around. Commented May 13, 2018 at 4:36
  • 3
    None of the encoding standards deal with how you input things and so none of them will be related. Commented May 13, 2018 at 4:55
  • 1
    I wouldn't say it's determined by the input driver. In my answer I show the kernel-side mapping is in the VT layer (which sits between the input driver and the TTY layer). Specifically it's controlled by the keymap, so it should even be user-configurable :-).
    – sourcejedi
    Commented May 13, 2018 at 12:24
6

I wrote a document in 1984 that summarizes ANSI Codes X3.64-1979, ANSI X3.4-1977, and ANSI X3.41-1974. This ansicode.txt describes how the control codes affect DEC LA-series hardcopy terminals and the VT-series video terminals.

1
  • 1
    Thus showing that it is definitely not "unwritten". This stuff has been written down by a lot of people. Did you ever think of updating yours for the VT42x and VT52x and the 1986 and 1991 editions of ECMA-48? It doesn't have SIMD and suchlike later additions, has LNM which was later deprecated, and DECKEYS is known as DECFNK of which there are now (since at least the VT420) more than those that you listed.
    – JdeBP
    Commented May 13, 2018 at 14:51
3

De-jure standardization - maybe no

We can rule out that it is part of POSIX. POSIX tries to avoid requiring either the full ASCII set, or the ASCII encodings (numeric values), on which this mapping is based. For example, the ETX character you mention is not required by POSIX. Nor does POSIX mention anything about Control-C / ETX or Control-Z / SUB as defaults when discussing the characters used for INTR and SUSP, for example.
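Since POSIX leaves the defaults open, the bytes actually bound to INTR and SUSP are best read from the terminal itself. Here is a rough Python sketch using the standard `termios` module (Unix only). The helper names `caret` and `special_chars` are mine, and the output depends entirely on how your terminal is configured.

```python
import os
import sys
import termios

def caret(byte: int) -> str:
    """Render a byte in caret notation (^C for 3, ^? for 127)."""
    if byte < 0x20 or byte == 0x7F:
        return "^" + chr(byte ^ 0x40)   # same 0x40 flip, in reverse
    return chr(byte)

def special_chars(fd: int) -> dict:
    """Return the current INTR and SUSP characters of the terminal on fd."""
    cc = termios.tcgetattr(fd)[6]       # index 6 is the list of special characters
    result = {}
    for name in ("VINTR", "VSUSP"):
        entry = cc[getattr(termios, name)]
        value = entry if isinstance(entry, int) else entry[0]
        result[name] = caret(value)
    return result

if __name__ == "__main__" and os.isatty(sys.stdin.fileno()):
    print(special_chars(sys.stdin.fileno()))
```

On a typical Linux terminal I would expect this to report `^C` and `^Z`, but that comes from the terminal's settings (see `stty -a`), not from anything POSIX requires.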

As pointed out by others, the Control key behaviour is not inherently part of the definition of a character set / character encoding. It looks like the Control key mapping was not specified as part of ASCII. Nor does it seem to be part of the series of ANSI terminal standards.

De-facto standardization - the VT100?

I think you can partly explain the expectation of this behaviour in terms of the VT100 / VT102 having become the de-facto standard, though I expect the behaviour pre-dates it. See "What protocol/standard is used by terminals?"

See VT102 User Guide, "Transmitted Characters" -> "Function Keys" -> "Control Character Keys".

Figure 4-3 shows the keys that generate control characters. You can generate control characters in two ways.

  • Hold down CTRL and press any unshaded key in Figure 4-3.
  • Press any shaded key in Figure 4-3 without using CTRL. These dedicated keys generate control characters without the use of CTRL.

Table 4-2 lists the control characters generated by the keyboard. Different computer systems may use each control character differently.

NOTE: The VT102 generates some control characters differently than previous DIGITAL terminals. Table 4-3 lists the changes.

I found this last note particularly interesting. The VT102 uses Control-space for NUL, whereas "previous terminals" from Digital used Control-@. It also changes the escapes for the last two C0 controls, RS and US. (I wonder how this fits into the pattern of flipping bit 7). The VT100 also uses Control-space, so I assume "previous terminals" refers to the VT52 family.
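One way to look at the bit-flipping question: Control-space does not fit "flip the 0x40 bit", but it does fit "keep only the low five bits". The sketch below compares the two rules. The rule names are mine, and the choice of `~` and `?` as the keys for RS and US is from my memory of the VT100 table, so treat it as an assumption rather than a quote from the manual.

```python
def ctrl_flip(ch: str) -> int:
    """Rule 1: flip the 0x40 bit of the character code."""
    return ord(ch) ^ 0x40

def ctrl_mask(ch: str) -> int:
    """Rule 2: keep only the low five bits of the character code."""
    return ord(ch) & 0x1F

if __name__ == "__main__":
    # '~' and '?' are assumed here to be the VT100-style keys for RS and US.
    for ch in "@C ~?":
        print(f"{ch!r}: flip -> {ctrl_flip(ch):#04x}, mask -> {ctrl_mask(ch):#04x}")
```

For `@` through `_` the two rules agree. For space the mask rule gives 0 (NUL), where the flip rule would give 0x60, which is not a control character at all.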

Linux kernel

This is not replicated in different hardware drivers, e.g. PS/2 keyboard vs. USB keyboard. Instead it is handled in the VT layer. See vt/keyboard.c. The state of the keyboard modifiers, including Control, is maintained in shift_state. The shift state is then used to select a keymap.

    param.shift = shift_final = (shift_state | kbd->slockstate) ^ kbd->lockstate;
    param.ledstate = kbd->ledflagstate;
    key_map = key_maps[shift_final];

https://elixir.bootlin.com/linux/v4.16.8/source/drivers/tty/vt/keyboard.c#L1393

So for more information I think you would have to look into keymaps as used by the VT layer. I assume the key map for Control is set up to produce the pairing you describe.
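To make the two-level lookup concrete, here is a toy Python model of it. It is not the kernel's data: the tables are built by hand for four keys and hold plain characters, whereas real keymap entries also encode a key type. The modifier bit positions and keycodes are the values I recall from linux/keyboard.h and input-event-codes.h, and the Control map uses the low-five-bits rule as an assumption about what the default keymap produces.

```python
KG_SHIFT, KG_CTRL = 0, 2                          # modifier bit positions
KEY_A, KEY_Z, KEY_C, KEY_SPACE = 30, 44, 46, 57   # Linux keycodes (QWERTY order)

PLAIN = {KEY_A: "a", KEY_Z: "z", KEY_C: "c", KEY_SPACE: " "}

def build_key_maps(plain: dict) -> dict:
    """Build one keymap per modifier combination, indexed by the shift state."""
    shifted = {code: ch.upper() for code, ch in plain.items()}
    control = {code: chr(ord(ch.upper()) & 0x1F) for code, ch in plain.items()}
    return {0: plain, 1 << KG_SHIFT: shifted, 1 << KG_CTRL: control}

KEY_MAPS = build_key_maps(PLAIN)

def lookup(keycode: int, shift_state: int, slockstate: int = 0, lockstate: int = 0):
    """Mirror the quoted logic: compute shift_final, pick a map, index by keycode."""
    shift_final = (shift_state | slockstate) ^ lockstate
    key_map = KEY_MAPS.get(shift_final)
    if key_map is None:               # no keymap allocated for this combination
        return None
    return key_map.get(keycode)

if __name__ == "__main__":
    print(repr(lookup(KEY_C, 1 << KG_CTRL)))      # Control map entry for the C key
    print(repr(lookup(KEY_SPACE, 1 << KG_CTRL)))  # Control map entry for space
```

The point of the design is that Control is not applied as arithmetic at lookup time: it only selects which table is consulted, so the pairing is whatever that table says and can be changed with loadkeys.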

The loadkeys man page also mentions the kernel default key map. This has been moved since the man page was written; it is now located at drivers/tty/vt/defkeymap.c_shipped. To read these tables, you must know the Linux keycodes used to index it. They are based on a QWERTY keyboard, so the letters are neither alphabetical nor contiguous. See include/uapi/linux/input-event-codes.h. Or better, this table which shows both keycodes and the default Control mapping.

3
  • 1
    You have to look further back than the VT100, and not at systems that use a scancode/keycode and a modifier word to index into a 2-dimensional array. There were systems whose keyboards actually did force the upper bits in response to a control key modifier. Although this takes us well out of the realm of Unix and Linux and into the likes of superuser.com/questions/763879 .
    – JdeBP
    Commented May 13, 2018 at 15:11
  • 1
    @JdeBP question included request for Linux docs. This seems a pretty cromulent explanation for why control-space on the console generates ^@, heh heh (at least when typing into xxd): we follow the rules from VT100 :). (same for xterm and gnome-terminal). I wonder if there's anything more to know on the caret notation echoed by the TTY. Amusing that Linux VT is a fresh implementation that uses control-space, but Linux TTY seems to echo ^@. I suppose "^ " is hard to tell apart though :)
    – sourcejedi
    Commented May 13, 2018 at 16:00
  • I see the echoed ^@ rubs out as a single character in each case as well... deep magic.
    – sourcejedi
    Commented May 13, 2018 at 16:03
