11

I understand that "everything is a file" is not entirely true, but as far as I know, every process gets a directory in /proc with lots of files. Read/write operations are often major speed bottlenecks, and having to read from and write to files all the time can slow processing down significantly.

Does having to keep a bunch of files in /proc slow things down? If not, how is having to do so many I/O operations not a huge design flaw in Linux?

10
  • 37
    /proc is not a real filesystem; it's a view into the kernel's process tables. Any overhead in maintaining this view falls somewhere between "nonexistent" and "negligible".
    – cas
    Commented May 1, 2021 at 9:49
  • 5
    @cas: I would say it's almost entirely on the "nonexistent" end of the spectrum, because with very few exceptions, the kernel already has to keep track of all that information anyway, just to keep the system running correctly.
    – Kevin
    Commented May 1, 2021 at 22:22
  • 3
    Welcome to the Unix philosophy of "everything is a file": many things that do not exist on storage are represented using the same interface. Commented May 1, 2021 at 22:36
  • 1
    @Kevin I agree, the overhead is very much on the "non-existent" side of the spectrum. But it's > zero, so non-existent isn't quite right.
    – cas
    Commented May 3, 2021 at 13:13
  • 1
    @KonradRudolph do you honestly have difficulty understanding, or are you just being tiresomely pedantic about your own artificially narrow and contorted definition of the word "real" and how it relates to the word "filesystem"? If the former, then the comments and answers here will tell you everything you need to know. If the latter, then please go quibble somewhere else, in private where no one else has to be exposed to it.
    – cas
    Commented May 4, 2021 at 11:51

6 Answers

47

Files in /proc and /sys exist purely dynamically, i.e. when nothing is reading them, they aren't there at all and the kernel spends no time generating them.

You could think of /proc and /sys files as API calls. If you don't execute them, the kernel doesn't run any code.
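
For example (a minimal sketch; /proc/uptime is just a convenient read-only example and the numbers shown are made up), reading the same "file" twice shows the kernel formatting fresh content on every read, because nothing is stored anywhere:

    $ cat /proc/uptime    # the kernel generates "uptime idle_time" at the moment of the read
    84266.81 335690.35
    $ cat /proc/uptime    # a moment later: new content, because there is no stored file to re-read
    84270.12 335703.58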

9
  • 38
    Even when reading and writing them, they are still not there at all. When you do a directory listing, the kernel tells you they are there. And when you open one, the kernel tells you it's a file. But it isn't. In some sense, it never exists. Commented May 1, 2021 at 18:01
  • 1
    And in some cases, directory entries exist and can be opened but don’t appear when directories are listed. Some of the information produced when reading files does take a little time to generate, e.g. /proc/meminfo (with special handling to limit race conditions). Commented May 1, 2021 at 20:58
  • 2
    @DreamConspiracy You can use a bind mount for this without having to unplug anything: mount --bind / /somewhere/else will show you a view of / mounted somewhere else, but without all the virtual filesystems like /proc overlaid on it.
    – Boann
    Commented May 2, 2021 at 1:50
  • 1
    @jamesqf, for the ones that do some relatively trivial lookup, a dedicated system call might well be faster just because it would only be one system call. Reading a file from /proc or /sys requires three: open(), read() and close() (or two, if you're sloppy and don't close)
    – ilkkachu
    Commented May 2, 2021 at 20:23
  • 5
    @JörgWMittag do not attempt to read /proc/$SPOON—that would be impossible. Instead try to realize the truth: there is no /proc/$SPOON
    – jez
    Commented May 2, 2021 at 22:34
29

I think you're taking the whole "everything is a file" saying just a bit too literally. What that really means is that "everything can be accessed as though it were a file".

Despite what some of the other answers may say, in point of fact, nothing in /proc actually exists as a file on disk. The /proc filesystem doesn't take up any disk space at all, and so no I/O is spent maintaining it. This can be confirmed by unmounting the /proc filesystem, and looking at the disk usage numbers.*

Rather, its contents are generated dynamically when a program tries to access it. For instance, when you type ls /proc, what actually happens is that the kernel gets the current list of process IDs from the process table and sends them to the ls program as though they were regular directory names. The same thing happens with the contents of each "directory", plus all the other "directories" and "files" in /proc.

*On a normal system, the /proc filesystem is probably going to be in use, which will keep you from unmounting it. So if you actually want to perform this test, you might need to boot to single-user mode, and even then there's no guarantee of success.
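
A gentler check that avoids unmounting anything (a sketch assuming GNU coreutils; exact output differs per system) is to ask what /proc is mounted as and how much space it claims:

    $ stat -f -c %T /proc    # filesystem type is "proc", not ext4/btrfs/xfs
    proc
    $ df -h /proc            # procfs advertises zero size: nothing on disk backs it
    Filesystem      Size  Used Avail Use% Mounted on
    proc               0     0     0    - /proc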

6
  • 22
    It's more like, a file in Unix is an abstraction, and disk files are just one realization of that abstraction.
    – chepner
    Commented May 2, 2021 at 2:34
  • 1
    This can be confirmed by unmounting the /proc filesystem, and looking at the disk usage numbers. - what would that tell you? It's a separate mounted filesystem anyway, so it's already not included in the usage info for / in df -h output. The fact that it has no disk device behind it at all is more revealing: mount | grep proc will show something like proc on /proc type proc (...), as opposed to /dev/nvme0n1p2 on / type btrfs. It's not part of /, and you didn't have to make space for it when partitioning your disk. And a busy loop like while /bin/true; do :; done doesn't cause constant I/O. Commented May 3, 2021 at 6:10
  • 1
    @PeterCordes The OP assumed that the files in /proc were real disk files, in which case, yeah, they would affect the disk usage numbers somewhere. It's also true that the /proc filesystem doesn't have an associated partition, but I figured the lack of change in disk used would be an easier concept to grasp. Commented May 3, 2021 at 14:05
  • 1
    Right, I understand what you're trying to show. But that teaches a wrong idea about disk usage: that unmounting one FS could change reported disk usage on another. That's never how it works. Unless you were trying to disprove the idea that a procfs mount on /proc would work by creating real disk files on the underlying FS. I guess that's a possible misconception, too, which I hadn't considered, and your experiment is a way for someone to rule out that misunderstanding. Still, I think it's important to point out that no mounts ever work that way. Commented May 3, 2021 at 14:21
  • FWIW du -cs /proc returns zero bytes total size.
    – moooeeeep
    Commented May 4, 2021 at 9:26
13

Another way to think about /proc files is that while 'everything is a file', not every file is bytes stored on a disk.

For /proc files, reads are satisfied by the kernel dynamically generating the necessary bytes, based on the status of running processes, rather than by retrieving bytes from disk storage.

Likewise, writes on /proc files (where permitted) are interacting with the kernel, not with bytes on a disk.
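
For instance (a sketch borrowing the swappiness example from the comments below; the write needs root and the value shown is just the common default), a read hands you a number the kernel formats on the fly, and a write updates a kernel variable, with no disk involved either way:

    $ cat /proc/sys/vm/swappiness                  # the kernel prints its current vm.swappiness setting
    60
    $ echo 10 | sudo tee /proc/sys/vm/swappiness   # the "write" goes straight into the kernel variable
    10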

3
  • interacting with processes - more usually interacting with the kernel, e.g. echo 6 | sudo tee /proc/sys/vm/swappiness, or more normally in a boot script via sysctl(8). Are there any writable files under /proc/<PID>/? (/proc/<PID>/fd/... are writable, but are actually symlinks to the real files, not a way to interact with the process.) Commented May 3, 2021 at 6:13
  • "interacting with processes" >> "not a way to interact with the process " Fair call. There are some writable files under /proc/<pid>, but I think they're all(?) to do with process specific values held by kernel. You might say that, for some of them, you're acting on a process, but that's different to saying you're interacting with a process. I'll update... Commented May 3, 2021 at 11:45
  • 1
    Oh you're right, there are some writeable files. I should have checked. Like /proc/<PID>/coredump_filter, or oom_score_adj, or even comm (argv[0]). You're interacting with kernel attributes of that process, so that's fair. Commented May 3, 2021 at 12:00
8

Doing a lot of I/O can be a bottleneck, especially if the underlying device (e.g. a hard disk) is slow and/or busy.

However, most parts of proc do not actually refer to a physical device. Moreover, proc contains a representation of all the processes in the system. Some of the files do refer to data, but most of that data resides in RAM, which is blazing fast to access.

Indeed, accessing files can be slow. However, just maintaining the directory tree, without actually accessing the files and directories, costs hardly any performance.

Also keep in mind that processes only rarely access their own or other processes' representation in proc. A process can access most information directly without ever going through proc.

Example: A process may acquire memory on the heap via malloc. The kernel knows the process has allocated memory and represents this information via /proc/…/maps. The process will use that memory directly by accessing the pointer, not by doing I/O on /proc/…/mem.
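
You can see that bookkeeping for the shell you are typing into (a sketch; $$ expands to the shell's own PID and the addresses are illustrative):

    $ grep heap /proc/$$/maps    # the kernel renders its memory-map bookkeeping as text on demand
    55d6c2a4f000-55d6c2b92000 rw-p 00000000 00:00 0      [heap]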

8
  • Which parts of proc refer to a physical device? Commented May 1, 2021 at 10:07
  • @StephenKitt I know file handles exist in /proc/…/fd (which in turn can refer to files on physical devices). I am not perfectly sure if there is absolutely no connection between /proc and physical devices.
    – Hermann
    Commented May 1, 2021 at 20:36
  • 6
    @Hermann The contents of /proc/.../fd are symlinks pointing to the actual paths being accessed, not direct references to the open file descriptors, so they’re still not physical devices, and there is no actual I/O going on through /proc when accessing them. Commented May 1, 2021 at 20:49
  • 2
    @AustinHemmelgarn That's not entirely true. They do look like symlinks, but they actually work partially like hardlinks, in that they refer to a particular inode, not a path. cat /proc/PID/fd/X will output the contents of the exact file that PID keeps open, even if it got deleted or replaced by something different in the meantime.
    – TooTea
    Commented May 3, 2021 at 7:22
  • 1
    @AustinHemmelgarn You say they're symlinks, but you can actually open them even if the symlink target is inaccessible or nonexistent. For example, I believe you can open a copy of a socket through /proc/.../fd, even though the link says it points to something like "socket:25474624342" which doesn't exist. Commented May 3, 2021 at 12:33
7

The term “file” is heavily overloaded here. It might mean a file stored on an actual disk, but in this case it means anything accessed using the file API: open, read, write and close.

These four functions are a very generic API, and Unix has always shoehorned various other things into it. Character devices, block devices and named pipes are all accessed through the same four basic functions, but the thing being read and written is not a file on a disk.

Traditionally the device files did have entries on the disk to keep the path lookup simple, but that would be inefficient for the proc and sys filesystems, so they have custom lookup too and don't write anything to the disk at all. Nor, for that matter, does a tmpfs filesystem, which simply keeps the data in the page cache (and possibly swaps it out to the generic swap space).

So when you are not accessing them, there is no overhead at all. When you do, all it takes is allocating the dentry, inode and file structures in the kernel (to tie the file descriptor to) and formatting the information from the internal kernel structures. It is a bit slower than a dedicated API would be, but it avoids having to add more entry points to the kernel and allows existing utilities to be reused for processing the information.
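
You can watch exactly that happening (a sketch assuming strace is installed; -P limits the trace to system calls touching the given path, and the load values are illustrative):

    $ strace -P /proc/loadavg -e trace=openat,read,close cat /proc/loadavg
    openat(AT_FDCWD, "/proc/loadavg", O_RDONLY) = 3
    read(3, "0.52 0.58 0.59 1/977 31337\n", 131072) = 27
    read(3, "", 131072)                     = 0
    close(3)                                = 0
    0.52 0.58 0.59 1/977 31337

No disk I/O appears anywhere: the content of the read comes straight out of the kernel's load-average bookkeeping.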

3
  • Not everything can be shoehorned into open/read/write/close. That's why we have the ioctl(2) call. Commented May 3, 2021 at 16:17
  • @roaima, I didn't say everything. And you still need a filehandle for ioctl.
    – Jan Hudec
    Commented May 3, 2021 at 16:26
  • Fundamentally I like your answer (+1). I was being picky about your o/r/w/c paradigm Commented May 3, 2021 at 17:16
3

There is some unnecessary overhead in doing the open and close system calls. This is one reason for the proposed introduction of the readfile system call.

The overhead of using a filesystem with ASCII names for /proc and /sys is completely worth it when you consider the alternatives. In the very bad old days programs had to be SUID root so that they could read kernel memory and parse the binary process structures. That was horribly buggy if userspace and kernel got out of sync. Also incredibly buggy once systems had multiple CPUs and the kernel process structures could change while being read without locking.

I believe that some of the BSDs decided to use a form of ioctl or netlink access which gives them binary access to the process data without reading the kernel directly. That is more efficient than procfs but makes it very difficult to use from shell and scripting languages.
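
To illustrate the scripting convenience (a sketch; $$ is the shell's own PID, the values shown are illustrative, and it assumes the process name in field 2 contains no spaces), pulling a few fields out of /proc/<PID>/stat is a one-liner:

    $ awk '{print "pid", $1, "state", $3, "threads", $20}' /proc/$$/stat   # field 20 is num_threads
    pid 4242 state S threads 1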

2
  • 1
    Whether the data is encoded as text or binary is independent of whether it is sent over file API (proc/sys) or socket API (netlink). Text nicely abstracts away the differences between versions, platforms and configurations and is easier to use from scripts.
    – Jan Hudec
    Commented May 3, 2021 at 6:16
  • I must add that parsing text (it is not uncommon to condense a lot of data into a single /proc file) cannot be as efficient as a syscall that accesses some indexed struct in the kernel. Commented Aug 20, 2023 at 15:56
