
First of all, is this actually true? My feeling is that reads will always be faster than writes, and this guy here runs some experiments to "prove" it. He doesn't explain why, just mentions "caching issues" (and his experiments don't seem to account for prefetching).
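
For concreteness, here's roughly the kind of experiment I mean, as a sketch of my own (not the code from that post; the buffer size, timing method, and loop structure are all my assumptions):

    /* Read vs. write bandwidth sketch in C. Compile with -O2; the
       volatile pointer and the printed sum keep the loops from being
       optimized away. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define N (64 * 1024 * 1024)    /* 64 MiB, well beyond any L3 */

    static double seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        volatile char *buf = malloc(N);
        memset((void *)buf, 1, N);              /* fault the pages in */

        long sum = 0;
        double t0 = seconds();
        for (size_t i = 0; i < N; i++) sum += buf[i];   /* pure reads */
        double t1 = seconds();
        for (size_t i = 0; i < N; i++) buf[i] = 2;      /* pure writes */
        double t2 = seconds();

        printf("read: %.3fs  write: %.3fs  (sum=%ld)\n",
               t1 - t0, t2 - t1, sum);
        return 0;
    }

(Even in my own sketch, whether the "write" loop really is a pure write depends on exactly the cache behavior I'm asking about below.)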

But I don't understand why. If it matters, let's assume we're talking about the Nehalem architecture (like the i7), which has per-core L1 and L2 caches and a shared inclusive L3 cache.

This is probably because I don't correctly understand how reads and writes work, so I'll write out my understanding. Please tell me if anything is wrong.

If I read some memory, the following steps should happen (assume cache misses at every level):

    1. Check if already in L1 cache, miss
    2. Check if in L2 cache, miss
    3. Check if in L3 cache, miss
    4. Fetch from memory into (L1?) cache

Not sure about the last step. Does the data percolate down through the caches, meaning that on a miss, memory is read into L3, then L2, then L1, and only then read from there? Or can it "bypass" all the caches, with the caching happening in parallel for later use? (reading = access all caches + fetch from RAM into cache + read from cache?)
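
Here's the pointer-chasing sketch I'd use to watch those latency steps directly: the loads form a dependent chain through a random cycle, so prefetching can't help, and the cost per load should jump as the working set outgrows each cache level. (The sizes are my guesses; clock_gettime is assumed available.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        /* 16 KiB up to 64 MiB working sets: inside L1 out to RAM */
        for (size_t n = 2048; n <= 16u * 1024 * 1024; n *= 8) {
            size_t *next = malloc(n * sizeof *next);
            for (size_t i = 0; i < n; i++) next[i] = i;
            /* Sattolo's algorithm: a single random cycle, so every
               load's address depends on the previous load's result */
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = rand() % i;
                size_t t = next[i]; next[i] = next[j]; next[j] = t;
            }
            size_t p = 0;
            const size_t hops = 1 << 24;
            double t0 = seconds();
            for (size_t h = 0; h < hops; h++) p = next[p];
            double t1 = seconds();
            printf("%8zu KiB: %6.1f ns/load (p=%zu)\n",
                   n * sizeof *next / 1024,
                   (t1 - t0) / hops * 1e9, p);
            free(next);
        }
        return 0;
    }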

Then a write:

    1. All the caches have to be checked (read) in this case too
    2. If there's a hit, write there, and since Nehalem has write-through caches, write to memory immediately and in parallel
    3. If all caches miss, write to memory directly?

Again, not sure about the last step. Can a write be done "bypassing" all the caches, or does writing always involve reading the line into the cache first, modifying the cached copy, and letting the write-through hardware actually write to the memory location in RAM? (writing = read all caches + fetch from RAM into cache + write to cache, with the write to RAM happening in parallel ==> writing is almost a superset of reading?)
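
(One partial answer I did find for the bypass question: x86 has "non-temporal" store instructions that are documented to write around the caches instead of reading the line in first. A sketch, assuming SSE2, a 16-byte-aligned destination, and a count that's a multiple of 4; the function name is mine:)

    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stddef.h>

    /* Fill dst with `value` using non-temporal stores, which go to
       memory via write-combining buffers without allocating cache
       lines. dst must be 16-byte aligned, n_ints a multiple of 4. */
    void fill_bypassing_caches(int *dst, size_t n_ints, int value) {
        __m128i v = _mm_set1_epi32(value);
        for (size_t i = 0; i < n_ints; i += 4)
            _mm_stream_si128((__m128i *)(dst + i), v);
        _mm_sfence();   /* make the streaming stores globally visible */
    }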

  • Please don't cross-post between SE sites. Either flag for a moderator to request migration, or wait for a moderator to migrate your other question here. If you want it here and not there, since you've already posted in both places, please consider deleting it from SO. Commented Nov 12, 2013 at 20:02
  • Reading something is passive; writing (changing) something is active. Activity is almost always harder than passivity. ;) Commented Nov 12, 2013 at 20:17
  • @user2898278 - Do you have any sources more reliable than a random blog?
    – Ramhound
    Commented Nov 12, 2013 at 20:23
  • You've got something elementary wrong here. Every bit of data is addressed... there's no trickling down cache levels looking for data as if you were guessing.
    – M.Bennett
    Commented Nov 12, 2013 at 21:15

2 Answers


Memory must store its bits in two states separated by a large energy barrier, or else the smallest influence would flip the bit. But when writing to that memory, we must actively overcome that energy barrier.

Overcoming the energy barrier in RAM requires waiting while energy is moved around. Simply looking to see what the bit is set to takes less time.

For more detail, see MSalters' excellent answer to a somewhat similar question.

I'm not certain enough of the details of how caching interacts with RAM to answer that part of the question with any authority, so I'll leave it to someone else.

  • Thank you for this. I now better understand why "pure" writes would be slower than pure reads. But how much difference do the electronic factors make? Would the difference between read and write bandwidth, due purely to electronic factors, be around 1.5x? Any ideas? (By "pure" I mean excluding caches.) Commented Nov 12, 2013 at 22:16

Write Case: If you have something to write to memory and you have a good memory controller, then, ignoring all caching, all you have to do is send a transaction to the memory controller with the data you want written. Because of memory-ordering rules, as soon as the transaction leaves the core you can move on to the next instruction, since you can assume the hardware is taking care of the write. This means that, from the core's point of view, a write takes virtually no time at all.

Read Case: On the other hand, a read is an entirely different operation and is greatly assisted by caching. If you need to read data in, you can't go on to the next step of your program until you actually have the data in hand. That means you need to check the caches first and then memory to see where the data is, and your latency suffers accordingly depending on where it's found. In a non-threaded, non-pipelined, non-prefetching system, you're just burning core cycles waiting for the data to come back so you can move on. Caches and memory are orders of magnitude slower than core speed/register space. This is why reading is so much slower than writing.
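
In code terms, the asymmetry looks something like this (a sketch; the function names are mine). Each load in the first function produces the address of the next one, so the core must wait on every access; nothing consumes the stored values in the second, so each store simply retires into the store buffer and the core moves on:

    #include <stddef.h>

    /* Every load's result is needed to compute the next address, so
       the core stalls for the full memory latency on each miss. */
    unsigned long dependent_reads(const unsigned long *a, size_t n) {
        unsigned long x = 0;
        for (size_t i = 0; i < n; i++)
            x = a[x % n];           /* address depends on last load */
        return x;
    }

    /* No later instruction needs the stored values, so the stores
       are fire-and-forget from the core's perspective. */
    void independent_writes(unsigned long *a, size_t n, unsigned long v) {
        for (size_t i = 0; i < n; i++)
            a[i] = v;
    }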

Going back to the write transaction: the only speed issue you may run into is doing reads after a write to the same address. In that case, your architecture needs to ensure that the read doesn't hop over the write; if it does, you'll get stale data back. If you have a really smart architecture, then as that write is propagating out toward memory, a read to the same address can be satisfied by the hardware well before the write ever reaches memory. Even in this read-after-write case, it's not the write that takes a while from the core's perspective; it's the read.
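
That "really smart architecture" feature is commonly called store-to-load forwarding; conceptually it looks like this (illustrative sketch):

    /* The read on the second line can be satisfied from the store
       buffer long before the write reaches cache or DRAM. */
    int store_then_load(int *p) {
        *p = 42;     /* store enters the store buffer */
        return *p;   /* forwarded from the store buffer */
    }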

From a RAM perspective: even if we're not talking about a core and only about the RAM and memory controller, doing a write to the MC results in the MC storing it in a buffer and sending back a response saying the transaction is complete (even though it isn't yet). Thanks to those buffers, we don't have to worry about the actual DIMM/RAM write speed, because the MC takes care of it. The only exception is when you're doing large blocks of writes that exceed the capacity of the MC's buffers. In that case you do have to start worrying about RAM write speed, and that's what the linked article is referring to: at that point you hit the physical limitations of reading versus writing that David's answer touches on. Having a core do that is usually a dumb thing anyway; that's why DMA was invented. But that's a whole other topic.

