
First of all, yes, I have read LinuxAteMyRAM, which doesn't explain my situation.

# free -tm
             total       used       free     shared    buffers     cached
Mem:         48149      43948       4200          0          4         75
-/+ buffers/cache:      43868       4280
Swap:        38287          0      38287
Total:       86436      43948      42488
#

As shown above, the -/+ buffers/cache: line indicates that the used memory is very high. However, in the output of top, I don't see any process using more than 100 MB of memory.

So, what used the memory?

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28078 root      18   0  327m  92m  10m S    0  0.2   0:25.06 java
31416 root      16   0  250m  28m  20m S    0  0.1  25:54.59 ResourceMonitor
21598 root     -98   0 26552  25m 8316 S    0  0.1  80:49.54 had
24580 root      16   0 24152  10m  760 S    0  0.0   1:25.87 rsyncd
 4956 root      16   0 62588  10m 3132 S    0  0.0  12:36.54 vxconfigd
26703 root      16   0  139m 7120 2900 S    1  0.0   4359:39 hrmonitor
21873 root      15   0 18764 4684 2152 S    0  0.0  30:07.56 MountAgent
21883 root      15   0 13736 4280 2172 S    0  0.0  25:25.09 SybaseAgent
21878 root      15   0 18548 4172 2000 S    0  0.0  52:33.46 NICAgent
21887 root      15   0 12660 4056 2168 S    0  0.0  25:07.80 SybaseBkAgent
17798 root      25   0 10652 4048 1160 S    0  0.0   0:00.04 vxconfigbackupd

This is an x86_64 machine (not a common-brand server) running x86_64 Linux; it is not a container or a virtual machine. Kernel (uname -a):

Linux 2.6.16.60-0.99.1-smp #1 SMP Fri Oct 12 14:24:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Content of /proc/meminfo:

MemTotal:     49304856 kB
MemFree:       4066708 kB
Buffers:         35688 kB
Cached:         132588 kB
SwapCached:          0 kB
Active:       26536644 kB
Inactive:     17296272 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     49304856 kB
LowFree:       4066708 kB
SwapTotal:    39206624 kB
SwapFree:     39206528 kB
Dirty:             200 kB
Writeback:           0 kB
AnonPages:      249592 kB
Mapped:          52712 kB
Slab:          1049464 kB
CommitLimit:  63859052 kB
Committed_AS:   659384 kB
PageTables:       3412 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    478420 kB
VmallocChunk: 34359259695 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

df reports no large consumption of memory from tmpfs filesystems.
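For reference, summing the categories that /proc/meminfo itemizes and comparing against MemTotal quantifies how much memory is unaccounted for. The awk sketch below uses the numbers pasted above (on a live system you would read /proc/meminfo directly); it leaves roughly 42 GB attributed to nothing visible in userland:

```shell
# Sum the page categories meminfo itemizes and compare with MemTotal.
# Whatever is left over is unaccounted: memory meminfo does not break
# down per category (e.g. kernel-module allocations).
awk '/^(MemFree|Buffers|Cached|SwapCached|AnonPages|Slab|PageTables):/ { sum += $2 }
     /^MemTotal:/ { total = $2 }
     END { printf "accounted %d kB, unaccounted %d kB (~%.1f GB)\n",
                  sum, total - sum, (total - sum) / 1048576 }' <<'EOF'
MemTotal:     49304856 kB
MemFree:       4066708 kB
Buffers:         35688 kB
Cached:         132588 kB
SwapCached:          0 kB
AnonPages:      249592 kB
Slab:          1049464 kB
PageTables:       3412 kB
EOF
# -> accounted 5537452 kB, unaccounted 43767404 kB (~41.7 GB)
```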

  • 2
    What's the output of ps -eo pid,user,args,pmem --sort pmem?
    – Braiam
    Commented Jun 12, 2014 at 16:58
  • Pasted the link here; tried a few times, got the same output.
    – Jason
    Commented Jun 14, 2014 at 12:51
  • 3
    Do not use head! I want the complete output of the complete command. If I wanted you to use head I would have put it in my command. Please always provide the complete output of the command people ask for.
    – Braiam
    Commented Jun 14, 2014 at 13:05
  • 3
    On a phone, don't remember the syntax off the top of my head, but check for sysv shared memory. Command is ipcs, I think.
    – derobert
    Commented Jun 15, 2014 at 4:53
  • 5
    Did you ever find a solution for this? - I am having a similar issue here: superuser.com/questions/793192/…
    – Hackeron
    Commented Aug 23, 2015 at 13:49

4 Answers


Memory on Linux can be a strange beast to diagnose and understand.

Under normal operation most, if not all, your memory will be allocated to one task or another. Some will be allocated to the currently running foreground processes. Some will be storing data cached from disk. Some will be holding data associated with processes that aren't actively executing at that one specific moment in time.

A process in Linux has its own virtual address space (VIRT in the output of top). This contains all the data associated with the process and can be considered how "big" the process is. However, it is rare for all of that memory to be an active part of the "real" memory map (RES in the output of top). RES, or resident memory, is the data which is directly accessible in RAM at that point in time. On top of that there is also shared memory (SHR), which can be shared between multiple instances of the same process. So the memory in use by a process at any one point in time is RES plus SHR; but if there is more than one instance of the process using the shared memory, the usage is RES plus RES plus RES ... plus SHR.
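To see that split per process, ps can print both sizes side by side (a sketch; in procps ps, VSZ and RSS correspond roughly to top's VIRT and RES, both in kB):

```shell
# Largest resident processes first: virtual size vs resident size.
# VSZ (kB) ~ top's VIRT, RSS (kB) ~ top's RES.
ps -eo pid,vsz,rss,comm --sort=-rss | head -n 10
```

The gap between the two columns is the part of each process's address space that is not currently resident.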

So why the difference between RES and VIRT? Surely if a process has a block of allocated memory it's allocated memory, isn't it? No. Memory is allocated in pages, and pages can be either Active or Inactive. Active ones are what make up RES. Inactive ones are "the rest". They can be pushed to one side as they aren't being accessed at the moment, which means they can be swapped out to disk if memory gets tight. But they don't just go straight to disk. First they sit in a cache. You don't want to be swapping all the time, so there's a buffer between the application and the swap space. Those buffers are constantly changing as the swapper selects a different process to execute and different pages become active and inactive. And all that's happening way too fast for a mere human to keep up with.

And then, on top of all that, there are the disk buffers. Not only does inactive memory go to a cache, but when that cache gets swapped to disk it first goes to a disk buffer to be queued up for writing. So that's a second layer of cache in the mix. And those disk buffers are also used by other parts of the system for general IO buffering, so they are constantly changing too.

So what you are seeing in things like top and free etc are either instantaneous snapshots of the current state of the machine, or aggregated statistics over a period of time. By the time you have read the data it's out of date.

Any one process can access large amounts of the memory, but it's seldom sensible to do so. It can't be accessing all the memory at once anyway, so memory that it's not currently looking at gets moved to cache unless it's specifically flagged as being "locked in core".

So the amount of memory "used" by an application and the amount of memory it "has" are two completely different things. Much of an application's data space is actually in the cache, not in "core" memory, but since the cache is in RAM most of the time, it's instantly available and just needs "activating" to become "core" memory. That is, unless it's been swapped out to disk, in which case it needs swapping back in (which might be fast if it's still in the buffer).

Due to the high-speed nature of the beast, and the fact that the figures are always changing, the numbers may even change partway through being calculated, so it's never possible to say exactly "this is how much memory is in use" from a user's perspective. The meminfo output is a snapshot in time provided by the kernel, but since it's the kernel that's executing at that moment, it's not necessarily showing the real state of any one process's memory usage - no process is actively executing at that instant; it's between processes.

Like I said, it's all very confusing.

But at the end of the day it really doesn't matter. What matters is not how much memory you have "free", but how much swap space you have used, and how often swap space is being accessed. It's swapping that slows a system down, not lack of memory (though lack of memory causes excess swapping). If you have lots of used memory, but you're not using any (or very little) swap space, then things are normal. Free memory in general isn't desirable, and is often purely transitional anyway, in that it was in use for one purpose, but hasn't yet been allocated for another - for instance it was cache memory, and it's been swapped to disk, but it hasn't yet been used for anything else, or it was disk buffers, the buffers have been flushed to disk, but no application has requested it for cache yet.
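One way to check swap traffic directly is the kernel's own counters (a sketch; pswpin/pswpout in /proc/vmstat are cumulative pages swapped since boot, so take two readings a few seconds apart and compare):

```shell
# Cumulative swap-in/swap-out page counts since boot.
# If these are not growing between readings, the box is not actively
# swapping, however "used" the free(1) output may look.
grep -E '^pswp(in|out) ' /proc/vmstat
```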

  • 6
    This is really interesting but doesn't answer the question of why the OP is observing this specific discrepancy.
    – terdon
    Commented Jun 13, 2014 at 13:31
  • I think the only real discrepancy lies between the OP's expectations and what Linux is providing. I.e., the values that Linux gives just don't add up, and that's because they have already changed.
    – Majenko
    Commented Jun 13, 2014 at 13:35
  • As the OP doesn't really seem to understand the question he is asking, I don't see how a "right" answer can be chosen. We can explain how the system works till we're blue in the face, but if he fails to grasp those basics and to realize that his question is actually meaningless, we will never have a "right" answer.
    – Majenko
    Commented Jun 13, 2014 at 15:53
  • I appreciate you writing this, but honestly I don't like the agnostic tone behind it. I agree with the "snapshot" theory, but if the snapshot keeps giving the same number, saying RAM usage is high while you can't find out what caused it, wouldn't you be curious?
    – Jason
    Commented Jun 14, 2014 at 13:04
  • 5
    You should post this on your blog. It's good, but it isn't relevant here. There's something weird going on (and I mean weird coming from someone who understands what you wrote), since the processes' VIRT don't account for all RAM usage, and the system isn't swapping despite pressure to do so. Commented Jun 14, 2014 at 13:18

This is one part of the answer:

There is a difference between what is designated as "used" memory (in the free command) and "memory allocated to active (user) processes" (in /proc/meminfo). OK, so your system has 48149 MB total (approx 47 GB).

If you look at your /proc/meminfo you see: Inactive: 17296272 kB (approx 16.5 GB). Inactive memory could be from processes which have terminated. It can also be memory which has not been used for a long time by a process which is active. The memory is not "freed" just because the process terminated. Why? Because it's more work. The same page of memory might be used again, so the Linux kernel just leaves the data there on the "inactive" list until a process needs it.

This page explains some of that: http://careers.directi.com/display/tu/Understanding+and+optimizing+Memory+utilization. Read the section on the PFRA (page frame reclaiming algorithm) used by the Linux kernel: "Pages included in disk and memory caches not referenced by any process should be reclaimed before pages belonging to the User Mode address spaces of the processes". "Reclaiming" means moving them out of "used" (inactive + active) and into "free".

This explains Memory management in more detail: How active and inactive lists work, and how pages move between them https://www.cs.columbia.edu/~smb/classes/s06-4118/l19.pdf

I believe that there is also memory used by the kernel for its own data structures, and that this shows up as "Slab: 1049464 kB" (~1 GB), but I am not positive whether it is counted separately.
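The Slab figure can be read without root from /proc/meminfo; a quick sketch:

```shell
# Kernel slab usage as a share of total RAM, from /proc/meminfo.
# (For a per-cache breakdown, slabtop -o or /proc/slabinfo need root.)
awk '/^Slab:/     { slab = $2 }
     /^MemTotal:/ { total = $2 }
     END { printf "slab: %d kB (%.1f%% of RAM)\n", slab, 100 * slab / total }' /proc/meminfo
```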

  • Just wanted to add that in the past I had experience with a system running out of memory due to a poorly written application that allocated shared memory segments but did not release them. The shared memory segments persisted even after all processes using them had died. This was not Linux, but it might be true on Linux also. As mentioned above, see ipcs for info on this. See makelinux.net/alp/035 - it says that you need to explicitly deallocate shared memory.
    – ssl
    Commented Jun 16, 2014 at 23:54
  • 1
    I don't understand everything that your answer is about, but “Inactive memory could be from processes which have terminated” is definitely wrong. Userland memory comes in two flavors: mapped or anonymous. Mapped memory can always be reclaimed because the data can be reloaded from a file. Anonymous memory can be reclaimed if it's swapped out. Inactive memory is memory that is a good candidate for reclaiming; however the content must be in a file or swap somewhere, because that memory is still in use. When a process dies, its memory becomes free, and is no longer accounted in active+inactive. Commented Jun 19, 2014 at 17:49
  • 1
    Some references: What can cause an increase in inactive memory and how to reclaim it? on Server Fault; old but still mostly applicable tips from Red Hat. And the article by Bhavin Turakhia that you cite, as well; it's not explicit on the matter, but it does explain about anonymous and mapped pages in the section “Understanding the PFRA”. Commented Jun 19, 2014 at 17:52
  • I got the thought on inactive pages not referenced by a process from this article: kernel.org/doc/gorman/html/understand/understand013.html Though I suppose it could be pages freed by a process which is still running. section "Reclaiming Pages from the LRU Lists"
    – ssl
    Commented Jun 25, 2014 at 20:21
  • But possibly that just refers to pages in the swap cache?
    – ssl
    Commented Jun 25, 2014 at 20:53

Do you use NFS at all?
It might be worth running slabtop -o either way; the nfs_inode_cache can get out of hand.


The figure you should be looking at is used swap; in your output that is "0", which means you have NOT run out of RAM. As long as your system is not swapping memory, you should not worry about the other figures, which are very hard to interpret anyway.

Edit: Ok, it seems that my answer is being considered cryptic rather than concise. So let me elaborate.

I guess the main problem here is in interpreting the output of top/ps, which is not very accurate. E.g., multiple uses of the same shared libraries are not accounted for the way you would expect; see e.g. http://virtualthreads.blogspot.ch/2006/02/understanding-memory-usage-on-linux.html

What is, however, dead accurate: if the swap size is exactly zero, then your system has not run out of memory (yet). Of course, that's a very coarse statement, but for profiling your system's actual memory usage, top will not be the right tool. (And if you do look at top, at least sort the output by VIRT or %MEM.)

See also http://elinux.org/Runtime_Memory_Measurement
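As a rough cross-check of top's per-process figures, you can sum RSS over every process and compare with free's used-minus-cache figure. Shared pages are counted once per process here, so the sum overstates real userland usage; if it still falls far below "used", the memory is being held by the kernel rather than by any process:

```shell
# Total resident set size across all processes, in kB.
# Compare with the "-/+ buffers/cache" used figure from free(1):
# a large shortfall points at kernel-side memory (slab, modules, ...).
ps -eo rss= | awk '{ sum += $1 } END { printf "total RSS: %d kB\n", sum }'
```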

  • 1
    You shouldn't worry if your system is swapping either, that's normal. You should worry if your system is swapping too often (which isn't the same thing as having a large used swap space). The fact that the used swap is 0 is in itself weird, with so little free physical memory. Commented Jun 14, 2014 at 13:19
  • well, his output indicates that his system did no swapping at all. That's surely the optimal swapping rate. I did not say a small swap size is a good thing, but a zero size surely is. And as long as the system does not actually run out of free memory, why should it start swapping?
    – Echsecutor
    Commented Jun 15, 2014 at 19:07
  • No, the absence of swapping is far from optimal. The memory of programs that aren't getting used at the moment should be swapped to make room for disk cache for frequently-used files. As for the bit you just added about the output of free, I think you meant top — but even then the sum can only be more than the total (because shared memory is counted multiple times), not less. Commented Jun 15, 2014 at 19:22
  • what do you mean by the sum can only be more not less? top only shows as many processes as fit on the screen, I am pretty sure that the above is not all running processes, hence them being not sorted by memory usage, that piece of output is pretty useless for the 'what used the memory' question.
    – Echsecutor
    Commented Jun 15, 2014 at 19:29
  • oh, and I do not want to enter a debate about when the optimal time to start swapping is, but the default on a Linux server is not to swap memory out merely because it's "not getting used at the moment".
    – Echsecutor
    Commented Jun 15, 2014 at 19:33
