23
\$\begingroup\$

This article shows that DDR4 SDRAM has approximately 8x more bandwidth than DDR1 SDRAM. But the time from setting the column address to when the data is available has only decreased by 10% (13.5ns). A quick search shows that the access time of the fastest async. SRAM (18 years old) is 7ns. Why has SDRAM access time decreased so slowly? Is the reason economic, technological, or fundamental?

\$\endgroup\$
4
  • 1
    \$\begingroup\$ Could another possible reason be that it simply isn’t that necessary? \$\endgroup\$ Commented Feb 20, 2019 at 6:08
  • \$\begingroup\$ For example, a low access time is necessary to make searching for data in memory faster. \$\endgroup\$
    – Arseniy
    Commented Feb 20, 2019 at 7:16
  • \$\begingroup\$ I realize that extra speed is always nice, but from a software developer's perspective, perhaps compared to all other IO and architecture (including microservices that can literally run in different data centers), RAM speed just isn't that much of a bottleneck anymore. Sometimes 'good enough' is good, or at least doesn't warrant the extra R&D into speeding it up. I would consider adding that as a potential reason in your question too. \$\endgroup\$ Commented Feb 20, 2019 at 7:21
  • 1
    \$\begingroup\$ According to Wikipedia DDR3-2200 has a First Word latency of 6.36 ns, that is how long it takes a signal to propagate around 3ft on FR4, I would say we are pretty close to the physical limits \$\endgroup\$
    – Mark Omo
    Commented Feb 20, 2019 at 21:23

3 Answers

34
\$\begingroup\$

It's because it's easier and cheaper to increase the bandwidth of DRAM than to decrease its latency. To get data from an open row of RAM, a nontrivial amount of work is necessary.

The column address needs to be decoded, the muxes selecting which lines to access need to be driven, and the data needs to move across the chip to the output buffers. This takes a little bit of time, especially given that SDRAM chips are manufactured on a process tailored to high RAM densities and not high logic speeds. To increase the bandwidth, say by moving to a newer DDR generation (DDR1, 2, 3, or 4), most of the logic can be either widened or pipelined, and can operate at the same speed as in the previous generation. The only thing that needs to be faster is the I/O driver for the DDR pins.

By contrast, to decrease the latency, the entire operation needs to be sped up, which is much harder. Most likely, parts of the RAM would need to be made on a process similar to that used for high-speed CPUs, increasing the cost substantially (the high-speed process is more expensive, plus each chip would need to go through 2 different processes).
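The widening argument can be sketched with a toy model. This is only an illustration under stated assumptions: the 200 MHz core clock, the 15 ns column access time, and the 64-bit bus are hypothetical round numbers, not datasheet values, and the function names are made up for this example. The internal array runs at the same speed in every case; only the number of bits fetched per pin per core cycle (and therefore the pin data rate) changes.

```python
# Toy model: a wider internal fetch raises bandwidth but leaves latency alone.
# All numeric values below are illustrative assumptions, not datasheet figures.

CORE_CLOCK_MHZ = 200.0    # assumed internal array clock, same in every case
COLUMN_ACCESS_NS = 15.0   # assumed column access latency, same in every case
BUS_WIDTH_BITS = 64       # assumed module data bus width


def pin_data_rate_mtps(core_clock_mhz: float, prefetch: int) -> float:
    """Transfers per pin in MT/s: each core cycle fetches `prefetch` bits per
    pin, and the I/O stage serializes them onto the pin."""
    return core_clock_mhz * prefetch


def peak_bandwidth_gbps(core_clock_mhz: float, prefetch: int,
                        bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth in GB/s for the whole data bus."""
    transfers_per_s = pin_data_rate_mtps(core_clock_mhz, prefetch) * 1e6
    return transfers_per_s * bus_width_bits / 8 / 1e9


if __name__ == "__main__":
    for prefetch in (2, 4, 8):
        bw = peak_bandwidth_gbps(CORE_CLOCK_MHZ, prefetch)
        print(f"prefetch={prefetch}: {bw:.1f} GB/s peak, "
              f"column access still {COLUMN_ACCESS_NS:.1f} ns")
```

In this model, going from a 2-bit to an 8-bit fetch per pin quadruples the peak bandwidth while the latency term never appears in the bandwidth formula at all, which is the asymmetry the answer describes.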

If you compare CPU caches with RAM and hard disks/SSDs, there's an inverse relationship between storage being large and storage being fast. An L1 cache is very fast, but can only hold between 32 and 256kB of data. It is so fast because it is small:

  • It can be placed very close to the CPU using it, meaning data has to travel a shorter distance to get to it
  • The wires on it can be made shorter, again meaning it takes less time for data to travel across it
  • It doesn't take up much area or many transistors, so making it on a speed optimized process and using a lot of power per bit stored isn't that expensive

As you move up the hierarchy, each storage option gets larger in capacity, but also larger in area and farther away from the device using it, meaning it must get slower.
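To put a rough number on the distance factor, here is a minimal sketch of the round-trip wire delay. The 10 cm trace length and the velocity factors (one third to one half of the speed of light) are assumptions chosen for illustration; real boards add routing and length-matching overhead that this ignores.

```python
# Rough round-trip propagation delay over a PCB trace.
# Distance and velocity factor are illustrative assumptions.

C_M_PER_NS = 0.299792458  # speed of light in vacuum, metres per nanosecond


def round_trip_ns(distance_m: float, velocity_factor: float) -> float:
    """Time for a signal to travel `distance_m` and back, in nanoseconds,
    when it propagates at `velocity_factor` times the speed of light."""
    return 2 * distance_m / (velocity_factor * C_M_PER_NS)


if __name__ == "__main__":
    for vf in (0.5, 1 / 3):
        print(f"10 cm at {vf:.2f} c: {round_trip_ns(0.10, vf):.2f} ns round trip")
```

Under these assumptions the wire alone accounts for somewhere between 1 and 2 ns, which no process improvement inside the chip can remove.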

\$\endgroup\$
4
  • 22
    \$\begingroup\$ Great answer. I just want to emphasise the physical distance factor: at maybe 10cm for the furthest RAM stick, 1/3 to 1/2 of the speed of light as the signal speed, plus some extra length to route & match the PCB tracks, you could easily be at 2ns round trip time. If ~15% of your delay is caused by the unbreakable universal speed limit... you're doing real good in my opinion. \$\endgroup\$
    – mbrig
    Commented Feb 19, 2019 at 20:22
  • 1
    \$\begingroup\$ L1 is also organized uniquely, is directly in the core that uses it, and uses SRAM. \$\endgroup\$
    – forest
    Commented Feb 20, 2019 at 4:20
  • \$\begingroup\$ @forest And also has a pretty strict size limit - make it too large, and there's no way to keep it so fast. \$\endgroup\$
    – Luaan
    Commented Feb 20, 2019 at 9:12
  • \$\begingroup\$ L1d cache can also be heavily optimized for latency, e.g. fetching tags and data in parallel for all ways in a set, so that a tag match just muxes the data to the output instead of needing to fetch it from SRAM. This can also happen in parallel with the TLB lookup on the high bits of the address, if the index bits all come from the offset-within-page part of an address. (So that's one hard limit on size, like @Luaan mentioned: size / associativity <= page-size for this VIPT = PIPT speed hack to work. See VIPT Cache: Connection between TLB & Cache?) \$\endgroup\$ Commented Feb 20, 2019 at 19:18
6
\$\begingroup\$

C_Elegans provides one part of the answer — it is hard to decrease the overall latency of a memory cycle.

The other part of the answer is that in modern hierarchical memory systems (multiple levels of caching), memory bandwidth has a much stronger influence on overall system performance than memory latency, and so that's where all of the latest development efforts have been focused.

This is true both in general computing, where many processes/threads are running in parallel, and in embedded systems. For example, in the HD video work that I do, I don't care about latencies on the order of milliseconds, but I do need multiple gigabytes/second of bandwidth.
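One way to see why DRAM latency matters less once caches are in place is the standard average memory access time estimate: each level contributes its hit time, weighted by the fraction of accesses that get that far. The sketch below is only an illustration; the hit times, miss rates, and DRAM latencies are made-up numbers, not measurements of any real system.

```python
# Average memory access time (AMAT) for a multi-level cache hierarchy.
# All hit times, miss rates and DRAM latencies here are hypothetical.

def amat_ns(levels, dram_ns):
    """`levels` is a list of (hit_time_ns, local_miss_rate) from L1 outward.
    Returns the average access time in nanoseconds."""
    total = 0.0
    reach = 1.0  # fraction of all accesses that reach the current level
    for hit_ns, miss_rate in levels:
        total += reach * hit_ns
        reach *= miss_rate
    return total + reach * dram_ns


if __name__ == "__main__":
    caches = [(1.0, 0.05), (4.0, 0.30), (12.0, 0.40)]  # assumed L1, L2, L3
    slow = amat_ns(caches, dram_ns=80.0)
    fast = amat_ns(caches, dram_ns=40.0)
    print(f"AMAT with 80 ns DRAM: {slow:.2f} ns")
    print(f"AMAT with 40 ns DRAM: {fast:.2f} ns")
    print(f"Improvement from halving DRAM latency: {100 * (1 - fast / slow):.0f}%")
```

With these assumed miss rates, only 0.6% of accesses ever reach DRAM, so even halving its latency moves the average by a small fraction. Workloads with poor locality would of course look very different.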

\$\endgroup\$
4
  • \$\begingroup\$ And it should definitely be mentioned that software can be designed for the "high" latency pretty easily in most cases, compared to the difficulty and cost of decreasing the latency. Both CPUs and their software are very good at eliminating the effective latency in most cases. In the end, you don't hit the latency limit as often as you might think, unless you have no idea how the memory architecture, CPU caching, prefetching, etc. work. The simple approach usually works well enough for most software, especially single-threaded. \$\endgroup\$
    – Luaan
    Commented Feb 20, 2019 at 9:15
  • \$\begingroup\$ On modern Intel CPUs, memory latency is the limiting factor for single-core bandwidth: bandwidth can't exceed max_concurrency / latency, and a single core has limited capacity for off-core requests in flight at once. A many-core Xeon (with higher uncore latency from more hops on the ring bus) has worse single-core bandwidth than a quad-core desktop chip, despite having more DRAM controllers. See: Why is Skylake so much better than Broadwell-E for single-threaded memory throughput? It takes many more threads to saturate memory B/W on a many-core Xeon. \$\endgroup\$ Commented Feb 20, 2019 at 19:54
  • \$\begingroup\$ Overall your main point is correct: most accesses hit in cache for low latency to avoid stalling the out-of-order back-end. HW prefetch mostly just needs bandwidth to keep up with sequential accesses and have data ready in cache before the core needs it. DRAM latency is hundreds of core clock cycles, so efficient software has to be tuned to use access patterns that don't cause cache misses by defeating both spatial/temporal locality and HW prefetching. Especially for loads, because store buffers can decouple store latency from the rest of the out-of-order backend. \$\endgroup\$ Commented Feb 20, 2019 at 20:02
  • \$\begingroup\$ For disk I/O, latencies of milliseconds would matter if we didn't have readahead prefetch to hide it for sequential accesses. But the higher the latency, the harder it is to hide. (The better your prefetch algorithms need to be, and the more predictable your access patterns need to be.) And the more requests / data bytes you need to keep in-flight to get the bandwidth you want. \$\endgroup\$ Commented Feb 20, 2019 at 20:06
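The bound mentioned in the comments above (bandwidth can't exceed max_concurrency / latency) is an application of Little's law, and can be sketched in a few lines. The figures used here (10 outstanding requests, 64-byte cache lines, 80 to 100 ns latency) are hypothetical and not taken from any specific CPU; the function name is made up for this example.

```python
# Latency-bound bandwidth: with a fixed number of requests in flight,
# throughput is capped at (requests in flight * bytes each) / latency.
# All numbers below are hypothetical.

def max_bandwidth_gbps(outstanding_requests: int, bytes_per_request: int,
                       latency_ns: float) -> float:
    """Upper bound on bandwidth in GB/s (bytes per nanosecond equals GB/s)."""
    return outstanding_requests * bytes_per_request / latency_ns


if __name__ == "__main__":
    for latency in (80.0, 100.0):
        bw = max_bandwidth_gbps(10, 64, latency)
        print(f"{latency:.0f} ns latency, 10 x 64 B in flight: <= {bw:.1f} GB/s")
```

The same relation read the other way says how many bytes must be kept in flight to reach a target bandwidth at a given latency, which is why higher latency is harder to hide.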
2
\$\begingroup\$

I don't have that much insight, but I expect it is a bit of all three.

Economic

For the majority of computers/telephones, the speed is more than enough. For faster data storage, SSDs have been developed. People can play video/music and run other speed-intensive tasks in (almost) real time. So there is not much need for more speed (except for specific applications like weather prediction etc.).

Another reason is that taking advantage of very fast RAM requires fast CPUs, and these use a lot of power. The trend toward battery-powered devices (like mobile phones) prevents the use of very fast RAM (and CPUs), which also makes it less economically worthwhile to produce them.

Technical

As the feature size of chips/ICs decreases (nm level now), the speed goes up, but not significantly. The shrink is more often used to increase the amount of RAM, which is needed more (also an economic reason).

Fundamental

As an example (both are circuits): the easiest way to get more speed (used by SSDs) is to spread the load over multiple components, so that their 'processing' speeds add up. Compare reading from 8 USB sticks at the same time and combining the results with reading the same data from 1 USB stick piece by piece (which takes 8 times as long).
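The USB stick comparison can be written out as a small calculation. This is a sketch under assumed figures (800 MB of data, 100 MB/s per device, 1 ms access latency), none of which come from real devices. It also shows the limit of the approach: the transfer time divides by the number of devices, but the initial access latency does not.

```python
# Striping a read across several devices: throughput adds up, latency does not.
# Sizes, rates and latencies are illustrative assumptions.

def read_time_s(total_bytes: float, devices: int,
                per_device_bytes_per_s: float, access_latency_s: float) -> float:
    """Time to read `total_bytes` split evenly over `devices` working in
    parallel, each paying the same initial access latency."""
    transfer_s = total_bytes / (devices * per_device_bytes_per_s)
    return access_latency_s + transfer_s


if __name__ == "__main__":
    size = 800e6   # 800 MB, assumed
    rate = 100e6   # 100 MB/s per device, assumed
    lat = 1e-3     # 1 ms access latency, assumed
    for n in (1, 8):
        print(f"{n} device(s): {read_time_s(size, n, rate, lat):.3f} s")
```

For a large transfer the 8-device case is close to 8 times faster, while a tiny read is dominated by the unchanged latency term.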

\$\endgroup\$
11
  • 1
    \$\begingroup\$ What exactly do SSDs have to do with SDRAM latency? \$\endgroup\$
    – C_Elegans
    Commented Feb 19, 2019 at 17:01
  • \$\begingroup\$ @C_Elegans they are both circuits, for this 'generic' question I don't think there is so much difference. \$\endgroup\$ Commented Feb 19, 2019 at 17:04
  • 2
    \$\begingroup\$ The amount of time to open a page hasn't really decreased that much due to the precharge cycle; the amount of energy required is not significantly different today than it was a decade ago. That dominates the access time in my experience. \$\endgroup\$ Commented Feb 19, 2019 at 17:08
  • 6
    \$\begingroup\$ @MichelKeijzers While they are both circuits, SSDs and SDRAM serve very different use cases, and make use of different techniques for storing data. Additionally, saying that CPUs don't really need faster RAM doesn't make much sense: the entire reason most modern CPUs have 3 levels of caches is that their RAM can't be made fast enough to serve the CPU. \$\endgroup\$
    – C_Elegans
    Commented Feb 19, 2019 at 17:17
  • 1
    \$\begingroup\$ You said for bigger storage there are SSDs. Did you mean faster? It's more expensive to get the same amount of storage in an SSD than in an HDD. The main selling point of SSDs is the speed, and perhaps the noise and reliability. For capacity, HDDs are still better. \$\endgroup\$
    – user198712
    Commented Feb 20, 2019 at 7:09
