
I am trying to implement a small RISC CPU (without branch prediction or register renaming), and I want to give it fully associative instruction and data caches.

Everything is going well, but now I need to decide which entry to discard on a miss. I use a CAM to store the tags (page numbers) and to look up cache entries; on a miss I can handle the fill through the MMU.

I already know how to do direct mapped cache.

How can I implement an LRU replacement policy for this cache?

The best solution I have found needs N clock cycles (N = number of cache lines) to compare every line and find the least recently used entry to evict. That is far too slow.

The time to evict the page to external RAM plus the time to choose the victim adds up to too many clock cycles.

Is there any way to decide which page is the least recently used in a single clock cycle?


3 Answers


Real fully associative instruction and/or data caches are rare; when they exist they are sometimes hidden as a loop buffer or as combining write buffers.

What is common is 2- to 16-way set-associative caches. Up to 4 ways, true LRU is practical; beyond that, pseudo-LRU is usually used.
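Pseudo-LRU usually means the tree variant: one bit per internal node of a binary tree over the ways, each bit pointing toward the half holding the (approximate) least recently used way. As a hedged illustration, here is a minimal Python model of tree pseudo-LRU for a 4-way set (the class and method names are my own, not from the answer):

```python
class TreePLRU4:
    """Tree pseudo-LRU for a 4-way set: 3 state bits instead of the
    5 bits true LRU needs. Each bit points toward the half that holds
    the (approximate) least recently used way; 0 = left, 1 = right."""

    def __init__(self):
        # b[0] = root, b[1] = left pair (ways 0/1), b[2] = right pair (2/3)
        self.b = [0, 0, 0]

    def touch(self, way):
        """Mark `way` (0-3) as used: set the bits on its path to point
        away from it. In hardware this is one parallel register write."""
        if way < 2:
            self.b[0] = 1          # victim side is now the right half
            self.b[1] = 1 - way    # point away from way 0 or 1
        else:
            self.b[0] = 0          # victim side is now the left half
            self.b[2] = 3 - way    # point away from way 2 or 3

    def victim(self):
        """Follow the bits to the pseudo-LRU way; purely combinational,
        so the victim is available in the same cycle as the miss."""
        if self.b[0] == 0:
            return 0 if self.b[1] == 0 else 1
        return 2 if self.b[2] == 0 else 3
```

Note that pseudo-LRU only approximates true LRU: after touching ways 0, 1, 2, 3 and then 0 again, it evicts way 2 even though way 1 is the true LRU. In exchange, both operations are single-cycle.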

There is little benefit in going beyond 4-8 ways for an L1/L2 CPU cache; more ways may be needed in L2-L3 caches shared between several CPUs.

For 4-way true LRU, there are 24 possible orderings: 1-2-3-4, 1-2-4-3, 1-3-2-4, and so on. The current ordering can be stored as a 5-bit code (since 2^5 = 32 >= 24) which directly indicates the least recently used way.
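A software sketch of that encoding, to make it concrete (the particular enumeration of the 24 orderings below is my own arbitrary choice; in hardware the update would be a small next-state lookup table written in one cycle):

```python
from itertools import permutations

# The 24 orderings of 4 ways, most recently used first. The index of
# the current ordering in this table is the 5-bit state; this
# enumeration is arbitrary but fixed.
PERMS = list(permutations(range(4)))

def touch(state, way):
    """Next-state function: move `way` to the front of the ordering
    and return the encoded index of the resulting ordering."""
    order = [w for w in PERMS[state] if w != way]
    return PERMS.index(tuple([way] + order))

def victim(state):
    """The least recently used way is the last element of the ordering,
    so decoding the victim is a simple lookup."""
    return PERMS[state][-1]
```

Because there are only 24 states and 4 possible accesses, the whole `touch` function is a 24x4 table, easily evaluated in a single cycle.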


I think you need a "heap" data structure. This gives you O(log n) or O(1) time to insert a new page reference (depending on the implementation) and O(log n) time to delete the oldest page. Finding the oldest page is O(1) (it is always at the top of the heap).

https://en.wikipedia.org/wiki/Heap_(data_structure)


One direct way to do this is to keep the cache lines in a doubly-linked circular list. This would allow both moving a line to the head of the list (most recently used) and finding the tail of the list (least recently used) to be O(1) operations.

Unfortunately, moving a line to the head of the list, assuming the links are kept in block memories of some sort, requires a minimum of about four clock cycles (one read and three writes), which could slow down your cache accesses unless you find a way to decouple this bookkeeping from the CPU's accesses.
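For reference, a Python sketch of that circular-list bookkeeping (the names are my own; in hardware the prev/next pointers would live in small block RAMs indexed by way number):

```python
class Node:
    """One cache line's links; a fresh node links to itself."""
    __slots__ = ("way", "prev", "next")
    def __init__(self, way):
        self.way, self.prev, self.next = way, self, self

class ListLRU:
    """Circular doubly-linked list with a sentinel: head.next is the
    most recently used line, head.prev the least recently used, so
    both touch() and victim() are O(1)."""

    def __init__(self, nways):
        self.head = Node(None)            # sentinel node
        self.nodes = []
        for w in range(nways):            # way 0 inserted first => oldest
            n = Node(w)
            self._insert_front(n)
            self.nodes.append(n)

    def _unlink(self, n):
        n.prev.next, n.next.prev = n.next, n.prev

    def _insert_front(self, n):
        n.prev, n.next = self.head, self.head.next
        self.head.next.prev = n
        self.head.next = n

    def touch(self, way):
        """Move the accessed line to the MRU position."""
        n = self.nodes[way]
        self._unlink(n)
        self._insert_front(n)

    def victim(self):
        """The LRU line is always at the tail of the list."""
        return self.head.prev.way
```

Each `touch()` performs one unlink and one re-insert, i.e. the handful of pointer reads and writes the answer counts.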

