
I am trying to implement a small RISC CPU (without branch prediction or register renaming), and I want to give it fully associative instruction and data caches.

Everything is going well, but now I need to decide which entry to discard on a miss. I use a CAM to store the tags (page numbers) and to look up cache entries; on a miss I can handle the fill through the MMU.

I already know how to do direct mapped cache.

How can I implement an LRU replacement policy for this cache?

The best solution I have found needs N clock cycles (N = number of cache lines) to compare every line and find the least recently used entry to evict. That is far too slow.

The time to evict the page to external RAM plus the time to choose the victim adds up to too many clock cycles.

Is there any way to decide which page is the least recently used in a single clock cycle?


3 Answers


Real fully associative instruction and/or data caches are rare; when they exist they are sometimes hidden as a loop buffer or as combining write buffers.

What is common is 2- to 16-way set-associative caches. Up to 4 ways, true LRU is practical; beyond that, pseudo-LRU is usually used.
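Pseudo-LRU usually means the tree variant: one bit per internal node of a binary tree over the ways, each bit pointing toward the half holding the (approximate) least recently used way. As a hedged illustration, here is a minimal Python model of tree pseudo-LRU for a 4-way set (the class and method names are my own, not from the answer):

```python
class TreePLRU4:
    """Tree pseudo-LRU for a 4-way set: 3 state bits instead of the
    5 bits true LRU needs. Each bit points toward the half that holds
    the (approximate) least recently used way; 0 = left, 1 = right."""

    def __init__(self):
        # b[0] = root, b[1] = left pair (ways 0/1), b[2] = right pair (2/3)
        self.b = [0, 0, 0]

    def touch(self, way):
        """Mark `way` (0-3) as used: set the bits on its path to point
        away from it. In hardware this is one parallel register write."""
        if way < 2:
            self.b[0] = 1          # victim side is now the right half
            self.b[1] = 1 - way    # point away from way 0 or 1
        else:
            self.b[0] = 0          # victim side is now the left half
            self.b[2] = 3 - way    # point away from way 2 or 3

    def victim(self):
        """Follow the bits to the pseudo-LRU way; purely combinational,
        so the victim is available in the same cycle as the miss."""
        if self.b[0] == 0:
            return 0 if self.b[1] == 0 else 1
        return 2 if self.b[2] == 0 else 3
```

Note that pseudo-LRU only approximates true LRU: after touching ways 0, 1, 2, 3 and then 0 again, it evicts way 2 even though way 1 is the true LRU. In exchange, both operations are single-cycle.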

There is little benefit in going beyond 4-8 ways for an L1/L2 CPU cache; more ways may be needed in L2-L3 caches shared between several CPUs.

For 4-way true LRU, there are 24 possible orderings: 1-2-3-4, 1-2-4-3, 1-3-2-4, and so on. The current ordering can be stored as a 5-bit code (since 2^5 = 32 >= 24) which directly indicates the least recently used way.
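A software sketch of that encoding, to make it concrete (the particular enumeration of the 24 orderings below is my own arbitrary choice; in hardware the update would be a small next-state lookup table written in one cycle):

```python
from itertools import permutations

# The 24 orderings of 4 ways, most recently used first. The index of
# the current ordering in this table is the 5-bit state; this
# enumeration is arbitrary but fixed.
PERMS = list(permutations(range(4)))

def touch(state, way):
    """Next-state function: move `way` to the front of the ordering
    and return the encoded index of the resulting ordering."""
    order = [w for w in PERMS[state] if w != way]
    return PERMS.index(tuple([way] + order))

def victim(state):
    """The least recently used way is the last element of the ordering,
    so decoding the victim is a simple lookup."""
    return PERMS[state][-1]
```

Because there are only 24 states and 4 possible accesses, the whole `touch` function is a 24x4 table, easily evaluated in a single cycle.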


I think you need a "heap" data structure. This gives you O(log n) or O(1) time to insert a new page reference (depending on the implementation) and O(log n) time to delete the oldest page. Finding the oldest page is O(1) (it is always at the top of the heap).

https://en.wikipedia.org/wiki/Heap_(data_structure)


One direct way to do this is to keep the cache lines in a doubly-linked circular list. This would allow both moving a line to the head of the list (most recently used) and finding the tail of the list (least recently used) to be O(1) operations.

Unfortunately, moving a line to the head of the list, assuming the links are kept in block memories of some sort, requires a minimum of about four clock cycles (one read and three writes), which could slow down your cache accesses unless you find a way to decouple this bookkeeping from the CPU's accesses.
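For reference, a Python sketch of that circular-list bookkeeping (the names are my own; in hardware the prev/next pointers would live in small block RAMs indexed by way number):

```python
class Node:
    """One cache line's links; a fresh node links to itself."""
    __slots__ = ("way", "prev", "next")
    def __init__(self, way):
        self.way, self.prev, self.next = way, self, self

class ListLRU:
    """Circular doubly-linked list with a sentinel: head.next is the
    most recently used line, head.prev the least recently used, so
    both touch() and victim() are O(1)."""

    def __init__(self, nways):
        self.head = Node(None)            # sentinel node
        self.nodes = []
        for w in range(nways):            # way 0 inserted first => oldest
            n = Node(w)
            self._insert_front(n)
            self.nodes.append(n)

    def _unlink(self, n):
        n.prev.next, n.next.prev = n.next, n.prev

    def _insert_front(self, n):
        n.prev, n.next = self.head, self.head.next
        self.head.next.prev = n
        self.head.next = n

    def touch(self, way):
        """Move the accessed line to the MRU position."""
        n = self.nodes[way]
        self._unlink(n)
        self._insert_front(n)

    def victim(self):
        """The LRU line is always at the tail of the list."""
        return self.head.prev.way
```

Each `touch()` performs one unlink and one re-insert, i.e. the handful of pointer reads and writes the answer counts.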

