The graphic characters are the ones for which the isgraph()/iswgraph() standard functions return true, or the ones matched by the [[:graph:]] regular expression, that is, the ones in the graph character class in the locale.
Per POSIX, the print class must be a superset of graph and be disjoint from cntrl, and graph must be a superset of upper, lower, alpha, digit, xdigit, and punct, and must not include the space (U+0020) character (with no mention of other whitespace characters).
The idea is that the graphic characters are the ones for which ink would be used to draw them, while the printable ones are simply the non-control ones.
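The POSIX constraints above can be checked mechanically for the single-byte characters of the current locale. What follows is only a minimal sketch in C: the helper names class_violations and count_class_violations are hypothetical, and it covers only the byte-oriented <ctype.h> functions, not the wide-character ones.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: number of POSIX class constraints that the single
   byte value c breaks in the current LC_CTYPE locale (0 means none). */
int class_violations(int c)
{
    /* The <ctype.h> functions are only defined for EOF and for values
       representable as unsigned char. */
    assert(c >= 0 && c <= UCHAR_MAX);

    int bad = 0;
    if (isgraph(c) && !isprint(c))
        bad++;                      /* print must be a superset of graph */
    if (isprint(c) && iscntrl(c))
        bad++;                      /* print must be disjoint from cntrl */
    if ((isupper(c) || islower(c) || isalpha(c) || isdigit(c) ||
         isxdigit(c) || ispunct(c)) && !isgraph(c))
        bad++;                      /* graph must contain these classes */
    if (c == ' ' && isgraph(c))
        bad++;                      /* graph must not contain the space */
    return bad;
}

/* Hypothetical helper: total violations over every single-byte value. */
int count_class_violations(void)
{
    int total = 0;
    for (int c = 0; c <= UCHAR_MAX; c++)
        total += class_violations(c);
    return total;
}
```

A program starts in the "C" locale, where no violations are expected. Calling setlocale(LC_CTYPE, "") from <locale.h> before count_class_violations() would check the user's locale instead. For multi-byte charsets the same checks would have to be redone with iswgraph(), iswprint() and friends.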
In practice, on GNU systems (such as Ubuntu) at least, print is graph plus the non-control characters from the space class. Here with glibc 2.35 (as used on Ubuntu 22.04) and in UTF-8 locales, that includes:
U+0020 SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
While the space class has:
U+0009 CHARACTER TABULATION
U+000A LINE FEED
U+000B LINE TABULATION
U+000C FORM FEED
U+000D CARRIAGE RETURN
U+0020 SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
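Lists like the two above can be regenerated by walking a code point range and querying iswspace() and iswprint(). This is a minimal sketch in C, assuming that wchar_t values are Unicode code points (the case with glibc, but not guaranteed by the C standard); count_space_chars is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the code points in [first, last] that are in
   the space class of the current LC_CTYPE locale. If printable_only is
   nonzero, only those that are also in the print class are counted.
   If out is not NULL, each match is written to it as U+XXXX. */
long count_space_chars(unsigned long first, unsigned long last,
                       int printable_only, FILE *out)
{
    assert(first <= last);

    long n = 0;
    for (unsigned long cp = first; cp <= last; cp++) {
        wint_t wc = (wint_t)cp;     /* assumption: wchar_t is a code point */
        if (!iswspace(wc))
            continue;
        if (printable_only && !iswprint(wc))
            continue;
        if (out != NULL)
            fprintf(out, "U+%04lX\n", cp);
        n++;
    }
    return n;
}
```

To compare against the lists above, one would call setlocale(LC_CTYPE, "") from <locale.h> in a UTF-8 locale, then count_space_chars(0, 0x10FFFF, 1, stdout) for the printable spaces and count_space_chars(0, 0x10FFFF, 0, stdout) for the whole space class. The results depend on the C library and its version, so other systems may print different sets.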
See https://www.gnu.org/software/gawk/manual/gawk.html#Bracket-Expressions. However, one significant issue is that the definitions may depend on the current locale settings. Multi-byte characters (e.g. UTF-8) are a whole new game.