3

What are the downsides of splitting /proc/pid/stat on Linux by whitespace? For example using bash one can access the third column via

$ cat /proc/$$/stat
14198 (bash) S 14195 14198 14198 34816 ...
$ x=($(< /proc/$$/stat)); echo ${x[2]}
S
$ 

and all seems well?

2 Answers

5

The chief problem is that the space character (0x20) is used as the delimiter between fields and may also appear within a field. If a local user is able to set the process name

$ perl -e '$0="like this"; sleep 999' &
[1] 14343
$ 

then splitting on whitespace returns the wrong field

$ x=($(< /proc/14343/stat)); echo ${x[2]}
this)
$ 

as the command name contains a space.

$ cat /proc/14343/stat
14343 (like this) S 14198 14343 ...
$ 

How bad could this be? According to proc(5), field 7, the "controlling terminal of the process", is an interesting one

          tty_nr %d   (7) The controlling terminal of the  process.   (The
                      minor  device number is contained in the combination
                      of bits 31 to 20 and 7 to 0; the major device number
                      is in bits 15 to 8.)

so if a program acts on controlling terminal information that it parsed incorrectly from /proc/pid/stat, because someone chose a process name that shifts the fields, you may get a security vulnerability.
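To make the bit layout quoted from proc(5) concrete, here is a minimal sketch that decodes a tty_nr value into its major and minor numbers. The function name decode_tty_nr is made up for illustration, and the masks follow the man page excerpt above rather than any kernel header.

```shell
# decode_tty_nr N
# Prints "MAJOR MINOR" for a tty_nr value, using the layout quoted from
# proc(5): major in bits 15 to 8, minor in bits 31 to 20 plus bits 7 to 0.
# The function body is a subshell, so its variables do not leak out.
decode_tty_nr() (
    n=$1
    major=$(( (n >> 8) & 255 ))
    # 1048320 is 0xfff00: it keeps bits 31..20 after they are shifted
    # down by 12 to sit directly above the low 8 minor bits.
    minor=$(( (n & 255) | ((n >> 12) & 1048320) ))
    printf '%s %s\n' "$major" "$minor"
)
```

With the value 34816 from the sample output above, `decode_tty_nr 34816` prints `136 0`. Major 136 is normally a pseudo-terminal on Linux, so that would be /dev/pts/0.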

Parsing is further complicated by the fact that a ) can appear in the process name, though the name is truncated to 15 characters

$ perl -e '$0="lisp) a b c d e f g h i"; sleep 999' &
[4] 14493
$ cat /proc/14493/stat
14493 (lisp) a b c d e) S 14198 14493 14198 34816 ...
$ 

Ideas to Parse this Wart of an Interface

Since the process name can vary somewhere between the empty string and 15 bytes of almost any contents

1234 () S ...
4321 (xxxxxxxxxxxxxxx) S ...

one idea would be to split on the first space to obtain the pid, then work backwards from the end of the line to find the first ) encountered, which is the last ) in the line. Everything between the opening ( and that ) is the process name, and everything to its right is the remaining fields, which can then be split on whitespace safely. Unit tests for the code would be highly advisable...
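The idea above can be sketched in plain shell parameter expansion. This is only an illustration: parse_stat_line is a hypothetical helper that takes the contents of a stat file as a string and prints just the first four fields.

```shell
# parse_stat_line LINE
# Prints pid, comm, state and ppid from the contents of a /proc/pid/stat
# file passed as one string. The body is a subshell, so variables and the
# positional parameters set below stay local to the function.
parse_stat_line() (
    line=$1
    pid=${line%% *}         # text before the first space
    rest=${line#*'('}       # drop everything up to the first "("
    comm=${rest%')'*}       # cut at the LAST ")" (shortest suffix match)
    fields=${rest##*') '}   # text after the last ") "
    # The remaining fields are a state letter and numbers, so plain
    # word splitting is safe here.
    set -- $fields
    printf 'pid=%s\ncomm=%s\nstate=%s\nppid=%s\n' "$pid" "$comm" "$1" "$2"
)
```

For example, `parse_stat_line "$(cat /proc/$$/stat)"` reports the current shell. A name such as "lisp) a b c d e" comes back intact because only the last ) in the line is treated as the end of the name.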

1
  • It’s often better to parse one of the other files in /proc/pid, if all the information needed is in a single file, or race conditions aren’t an issue. (So if anyone thought of reading /proc/pid/comm to help with parsing /proc/pid/stat, no, it isn’t a good idea.)
    Commented Dec 7, 2017 at 15:14
3

If you need to even think about it, why not just read /proc/$pid/status instead? It gives the same information on clearly labeled lines, and escapes newlines and backslashes that appear in the process name:

$ perl -e '$0="foo\nbar\n"; system "head -3 /proc/$$/status";'
Name:   foo\nbar\n
Umask:  0022
State:  S (sleeping)
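Because each line of the status file is a label, a colon and a value, picking out one value takes only a short loop. A minimal sketch follows. The helper name status_field is hypothetical, it reads the file on standard input, and it leaves the \n and \\ escapes in the name as they are.

```shell
# status_field KEY
# Reads "Label:<whitespace>value" lines on stdin and prints the value of
# the first line whose label is KEY. Exit status is 1 if KEY is not found.
# The body is a subshell, so "exit" leaves only the function.
status_field() (
    key=$1
    while IFS= read -r line; do
        case $line in
            "$key:"*)
                value=${line#"$key:"}
                # Remove the leading tab/space padding after the colon.
                value=${value#"${value%%[![:space:]]*}"}
                printf '%s\n' "$value"
                exit 0
                ;;
        esac
    done
    exit 1
)
```

Usage would look like `status_field State < /proc/$$/status`, which for the output shown above would print `S (sleeping)`.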
2
  • that's probably easier for perl or such but likely more difficult for C to deal with ( github.com/Microsoft/ProcDump-for-Linux/issues/8 )
    – thrig
    Commented Dec 8, 2017 at 22:14
  • @thrig, eh, that code there reads /proc/$pid/stat, not status. So no wonder it has trouble. Reading a single-datum-per-line file in C is just a loop over fgets() and strcmp (for the headers). Though I do think I'll go to sleep now instead of coding the un-escaping.
    – ilkkachu
    Commented Dec 8, 2017 at 22:34
