
In a related question I asked about the benefit of a dual-CPU system in terms of doubling the L3 cache.

However, I have noticed that the Xeon E5-2600 series of CPUs has exactly 2.5 MB of L3 cache per core.

This leads me to believe that the operating system reserves 2.5 MB of L3 cache per core. However, I also have the contradictory impression that the L3 cache is shared among all cores. There is surprisingly little information or discussion about this.

My major concern is whether low-priority background applications might "hog" the L3 cache and slow down performance for higher-priority foreground applications. Two specific performance problems motivate this question.

  1. Compiling a certain C++ program takes 25 minutes in VS 2008 on my current development system, whereas another system with identical settings compiles it in only 5 minutes - despite the fact that I have a near high-end i7-970 CPU and sufficient RAM.

  2. Programs often take up to 20 seconds to launch (i.e., display their main window) on my system; on a related note, the Windows shell requires up to 10 seconds to display the Windows Explorer context menu (related behaviors take about as long), despite my attempts to limit the context menu entries (there are currently perhaps 10 additional ones beyond the default).

My system is certainly loaded with a very large number of applications that I have installed (and uninstalled) over the years, but I do my best to streamline the system nonetheless.

I also have many low-priority background applications running - in particular redundant cloud backup software such as CrashPlan - which together typically consume about 25% of total CPU on this 6-core, 12-thread system.

I will be getting a new computer. I know that I will continue to be running many background applications, and installing/uninstalling many programs. If I thought that getting a dual-CPU system that doubles not only the cores but the L3 cache would assist with overcoming the horrible C++ compiler performance and the general system slow-down, I would gladly do it.

There should be no reason why a high-end system operates so slowly, even with many programs and background applications. But if my problems will occur no matter how much CPU power and L3 cache I give the system, simply because I have so many programs and background applications installed and running, I don't want to waste an additional $2,500 on a dual-CPU system that won't solve my problem.

Any suggestions, in particular regarding my question about whether the L3 cache is shared among all cores (such that low-priority background applications might conceivably be hogging the L3 cache, slowing down higher-priority programs), or rather if it is tied to individual cores, would be appreciated.

  • Good question that I don't personally have a good answer for except to say that I was also under the impression L3 was shared. I would just ask why on earth you're calling these '2nd generation' Xeons when 'Xeon' has been an Intel product for a decade now. (If this is by analogy to Sandy Bridge i3/5/7 chips being '2nd generation' then it's a bad analogy)
    – Shinrai
    Commented Apr 16, 2012 at 22:02
  • Intel refers to the i7-2600 line of CPUs as "2nd-generation" (ark.intel.com/products/family/59136/…). By "2nd-generation Xeon" I mean the equivalent release of the Xeon Sandy Bridge-E architecture CPUs on March 6, 2012 (en.wikipedia.org/wiki/…). Commented Apr 16, 2012 at 22:10
  • That's the analogy I thought you were making. It's a bad one (those are 2nd-gen i7s, but these are not 2nd-gen Xeons), and I'd change the title IMO... I was expecting to find a question about 12-year-old processors, and that might keep a lot of people from clicking into here. Maybe change '2nd generation' to 'Sandy Bridge-E'.
    – Shinrai
    Commented Apr 16, 2012 at 22:13

2 Answers


On these CPUs, each physical core has its own L2 cache. The L3 cache is shared by all cores and is inclusive -- that is, any data that resides in any core's L2 cache also resides in the L3 cache.

While this may seem a waste of L3 space, it actually makes the L3 invaluable for accelerating inter-core memory operations. The primary purpose of the L3 cache is to act as a switchboard and staging area for the cores. For example, if one core wants to know whether a region of memory might be cached by another core, it can check the L3 cache. And if data processed by one core next needs to be processed by another core, the hand-off goes through the L3 cache rather than through slower off-chip memory. Beyond that, its performance impact is modest except for unusual algorithms -- the L2 cache is big enough for small working sets, and the L3 cache is too small for big ones.

So while each core does have its own 256KB L2 cache and effectively 256KB reserved in the L3 cache, the balance is shared by all cores. Less important activity in other cores can harm the performance of a more important task that benefits from using L3 space. But for the reasons I mentioned, it's generally not a significant effect in practice and it's generally not worth worrying about beyond optimizing "bulk data" operations (such as compression and scanning) to minimize cache pollution. (For example, using non-temporal operations.)


It's my understanding that all levels of cache are implemented directly on the chip and that L2 and L3 are one and the same (only Intel distinguishes between them; AMD combines them). With this in mind, I would imagine that the L3 cache is not shared between the CPUs on a dual-socket motherboard. This also makes sense given that each CPU typically has its own memory channels to RAM.

Someone correct me if I am wrong.

  • L2 and L3 are not at all the same thing. On recent Intel designs, L1/L2 are per-core and small (32k L1 I$ & D$ / 256k unified L2), while L3 is inclusive and shared by the GPU and all cores. L1/L2 are physically separate, but kind of serve similar purposes (i.e. making memory access fast for a single core). The inclusive L3 has another purpose: coherency between cores (and the GPU). See @DavidSchwartz's answer. Commented Jul 11, 2015 at 4:17

