
I have been the developer/maintainer of an open-source IRC bot since the late '90s. The goal has always been to make it as versatile and useful as possible within a small memory footprint.

During the 2000s I also wrote some proof-of-concept code to squeeze useful programs down to just 4 kB RSS, which wasn't too hard to do on the 2.4 kernel. I managed it with both init and agetty: each of them stayed resident and performed its duties inside a single 4 kB page of memory.

Now, color me surprised when one day I asked my bot to report on its memory usage and it responded with this:

[Mar 27 2018] <bot> VM 1000 kB (Max 2988 kB), RSS 4 kB [ Code 212 kB, Data 68 kB, Libs 556 kB, Stack 132 kB ]

To get 4 kB RSS on kernel 2.4 I had to map all code, rodata and stack segments to the same page. Since I'm not doing that with the bot, even the theoretical limit should be 12 kB (one 4 kB page each for code, rodata and stack). And with later kernels there seem to be some extra accelerator mappings, so that even after unmapping the stack and rodata, 12 kB still stays mapped.

The bot is linked with musl, so its "sane" standard RSS while running was 54 kB. I did create an ld script to reorder functions into blocks, from rarely used to core essential, but still, 4 kB isn't reasonable even in theory. The system is a Xeon with plenty of physical memory, no swap and no system load, so there was no pressure to swap pages out.

Any idea what happened here? I'm still interested in the possibility of remapping everything into a single 4 kB page, although to date I have only gotten it down to 12 kB reproducibly, and to 8 kB in a result I could not reproduce.

The bot reads the RSS from /proc and just reports what it reads, unaltered. ps aux displayed the same VSZ and RSS as the bot reported.

1 Answer

This is not an answer (the text was just too complicated to format as a comment).


I think answering your question would first require pinning down the exact meaning of the value reported as RSS in the 4.x series. What precisely does it represent?
Because one thing I hold for certain is that the calculation has (vastly) changed since 2.4.
The first change I can remember was in the 2.6 days, when Andrew Morton et al. wanted to implement an RLIMIT on RSS.
Of course, any limit on something is meaningless unless that something is precisely defined, and this triggered a series of discussions (which you could almost certainly find by digging into the LKML) about what should be taken into account, and in particular what had been taken into account in previous versions:

  • I/O-mmapped device areas
  • Non-linear mappings
  • Hugetlb memory
  • Shared normal memory
  • SysV IPC shared memory
  • Non-shared normal memory

As I recall, everybody agreed that the VM_IO part should no longer be taken into account, and there was a lot of pressure to avoid counting the full size of shared libraries.
Consensus was so hard to reach on the other parts (hugetlb and shared normal memory) that they had to be split into several subcategories.

It's something of a shaggy-dog story that I have not followed closely. I just wanted to point out that the meaning of RSS in 4.x is certainly not identical to the one it had in the 2.4 days, and consequently the value reported is likely to differ by a significant amount for the very same executable.

Good Luck to you.

I'll of course delete this post as soon as a good answer gets posted.
