Back in the times of magnetic core memory, the core drivers and receivers had to be adjusted for the core temperature, or the core had to be kept at a fairly steady operating temperature. This was because the memory operated close to the physical limits imposed by its design. As a counterexample, the 6116 static RAM operates nowhere near those limits: the die temperature doesn't affect its operation enough that anything has to measure it and compensate. Alas, modern SDRAM is again back to where the magnetic core memory was: the performance of silicon is temperature dependent, and also dependent on the layout of the circuit board, and it operates with such small margins that various tuning ("training") procedures have to be performed during normal operation of the RAM to maintain proper operation. We've gone full circle: housekeeping the SDRAM is more like dealing with a very fast analog circuit :)
Generally, the standard DRAM interfaces are JEDEC standards - you can register a free user account on jedec.org, and use it to download those standards. So, let's talk about something concrete: LPDDR4 (low power double data rate 4) DRAM standard JESD209-4 (dated August 2014). Older DRAM gets obsolete rather quickly, so if you want to design-in some SDRAM, it's best to focus on the current generation of chips. Since the standard is readily accessible, I won't cover it in much detail, but just expose the "alien" nature of the SDRAM interfaces (when compared to RAM devices in common use in the 70s/80s).
Pinout
> The data and address buses are parallel instead of serial.
The data and address busses are parallel indeed, but the interface is not something that you can directly attach to a microcontroller. First of all, the logic is HS_LVCMOS (high speed low voltage CMOS), and the I/O ring (the silicon that drives the pins) runs at 1.1V nominal (1.06V minimum, 1.17V maximum). So you definitely need logic level translators, since few low-speed MCUs can support such levels natively.
Furthermore, the "parallel" bus organization is designed to make sense for high-speed computer memory, and is not like that of a typical "slow" DRAM chip from the 1970s that had /RAS and /CAS lines, a chip enable (maybe), row/column address inputs, read/write select, and data I/O lines.
All SDRAM families have a bus designed to send commands to the RAM, and many of those commands (in fact - the vast majority of them!) have nothing to do with reading or writing the data. There are several mode control registers that you can access - they are not "memory mapped", i.e. they are not visible in the address space of the memory, but are accessed using dedicated commands.
LPDDR4 chips each have one or two completely independent "channels", i.e. they are effectively split into one or two independent SDRAM devices, and each one has a synchronous parallel I/O bus. The signals are:
Control/command/address section
- CK_t/CK_c - differential clock input for the address, command and control signals (i.e. all signals other than DQ). This section is always single-data-rate, i.e. the signals are sampled by the RAM on the rising clock edge only.
- CKE - clock enable, used to inhibit the clock internally without having to externally turn it off
- CS - chip select/command nibble select - the deselect command is activated when CS is low on the rising clock edge, and then the CA bus is ignored until the next clock cycle; otherwise CS must be high on the rising clock edge and low on the falling clock edge
- CA[5:0] - command/address input bus - yes, you only provide 6 bits at a time
- ODT_CA - on-die termination enable for the CA bus - can be overridden in the MR22 register
- /RESET - used to bring all channels on the die to the RESET state
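The CS/CA sampling rule above can be illustrated with a toy software model (this is a sketch for illustration only, not driver code - the types and function names are made up): on each rising clock edge, a low CS means DESELECT and the CA word is ignored; otherwise the 6-bit CA word is captured as part of a command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the command/address sampling rule described above:
 * on each rising clock edge, if CS is low the cycle is a DESELECT
 * and CA[5:0] is ignored; otherwise the 6-bit CA word is captured
 * as part of a command. Hypothetical sketch, not a real driver. */

typedef struct {
    bool    cs;   /* CS level at the rising clock edge */
    uint8_t ca;   /* CA[5:0] driven during that cycle  */
} ca_cycle;

/* Collect the CA words that form commands, skipping DESELECT cycles.
 * Returns the number of words captured into 'out'. */
size_t capture_ca_words(const ca_cycle *cycles, size_t n, uint8_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (cycles[i].cs)               /* CS high: CA word is valid */
            out[k++] = cycles[i].ca & 0x3F;
        /* CS low: DESELECT - the RAM ignores CA this cycle */
    }
    return k;
}
```

The actual command encodings (which CA words make up which command) are given by the command truth table in the standard, so they're left out here.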
Data section
- DQS1_t/DQS1_c - differential clock I/O for the upper data bus byte (DQ[15:8])
- DQS0_t/DQS0_c - differential clock I/O for the lower data bus byte (DQ[7:0])
During the READ command, the memory chip drives the DQS clocks. During the WRITE command, the user drives the DQS clocks. The data bus is double-data-rate, i.e. on each clock cycle, new DQ values are clocked by both the rising and the falling clock edges.
- DQ[15:0] - 16-bit data I/O bus
Power and miscellaneous
- VDDQ, VDD1, VDD2 - 1.1V supply voltages
- ZQ - calibration reference, connect to VDDQ through a 240Ω ± 1% resistor.
Clock Speed
> Can standard PC memory be brought down to more "comfortable" clock speeds?
Yes. In the standard, table 88 - Clock AC Timings - specifies a maximum average clock period tCK(avg) of 100ns. Thus, to run the RAM, you need to provide it with at least a 10MHz clock. This clock can be turned off according to the Input Clock Stop protocol (ibid., sec. 4.37). This minimum clock applies to all speed grades of LPDDR4 memory.
The jitter specification tJIT(cc) (ibid., table 88) is 140ps maximum for LPDDR4-1600, and gets tighter for faster devices. It only becomes critical at high clock speeds: for the most part, as long as the clock period plus/minus jitter doesn't exceed the specs, things will be OK. However, DDR2+ devices contain delay locked loops (DLLs) that need to synchronize with the clock period, and those loops won't achieve a good lock if the clock jitter is excessive. Using an externally-derived clock and synchronizing the microcontroller to it is thus preferable. See the 3-part article "Dealing with clock jitter in embedded DDR2/DDR3 DRAM designs" for more insights into clock jitter's effects on DDR2+ memories.
The 10MHz clock doesn't mean that the RAM has to be "doing something" all the time: on many of the clock cycles, the DESelect command can be given (CS = 0 on a rising clock edge).
Refresh
The LPDDR4 chips with 4Gb, 6Gb and 8Gb (gigabit) dies have 8 banks, and each of the banks has to be refreshed periodically. The refresh rate requirement depends on the die temperature: the higher the temperature, the faster the refresh rate has to be. The RAM helps you here: it has a built-in temperature sensor, and as a user you have to periodically read the MR4 register to determine the refresh rate the RAM requires at its current operating temperature. The fastest refresh rate is 0.9us per bank. The banks can be refreshed individually - while one of the banks is being refreshed, the others can be used. There's also a Refresh All command, to refresh all banks at the same time.
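To make the per-bank bookkeeping concrete, here's a minimal round-robin refresh scheduler sketch. The 8 banks and the 0.9us worst-case per-bank rate come from the description above; in a real controller the interval would be derived from the MR4 readout, and `refresh_tick` would issue an actual per-bank refresh command - both of those are stubbed out here.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BANKS 8

/* Minimal round-robin per-bank refresh bookkeeping. In a real
 * controller, interval_ns would be derived from the MR4 register
 * readout (temperature-dependent), and the caller would issue a
 * per-bank refresh command whenever a bank number is returned. */
typedef struct {
    uint32_t next_bank;    /* bank to refresh next (0..7)         */
    uint64_t next_due_ns;  /* time the next refresh is due, in ns */
} refresh_sched;

/* Returns the bank to refresh now, or -1 if no refresh is due yet. */
int refresh_tick(refresh_sched *s, uint64_t now_ns, uint64_t interval_ns)
{
    if (now_ns < s->next_due_ns)
        return -1;                        /* nothing due yet */
    int bank = (int)s->next_bank;
    s->next_bank = (s->next_bank + 1) % NUM_BANKS;
    s->next_due_ns += interval_ns;        /* schedule the next one */
    return bank;                          /* issue per-bank refresh here */
}
```

At the worst-case 0.9us per-bank rate (interval_ns = 900), all 8 banks get refreshed every 7.2us, while the 7 banks not currently being refreshed stay usable.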
Addressing
All memory data accesses - reads or writes - proceed as follows:
1. The row is activated using the Activate-1 and Activate-2 commands. Those provide the row address to the RAM.
2. A column within the row is then selected for reading or writing by the Read-1 or Write-1 command, followed by the Read-2 or Write-2 command - also called CAS-2, since it is identical for reading and writing. CAS-2 must be sent no earlier than tRCD (RAS-to-CAS delay) after Activate-2 - i.e. no earlier than 18ns or 4 tCK cycles, whichever is later.
3. Until now, the DQS clock has been dormant. It becomes active once the read or write latency and then the preamble time have passed. For reads, the RAM supplies the DQS clock; for writes, the user supplies it. The preamble can be configured via the MR1 register: it can be a "static" preamble where DQS doesn't toggle, or a dynamic one where DQS toggles one cycle before the data gets exchanged.
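The tRCD constraint converts to a cycle count as sketched below (simple arithmetic only; the 18ns figure and the 4-cycle floor are the numbers quoted above, and integer picoseconds are used to avoid floating point):

```c
#include <assert.h>

/* Earliest point, in clock cycles after Activate-2, at which CAS-2
 * may be issued: tRCD is 18 ns or 4 clock cycles, whichever is later.
 * tck_ps is the command clock period in picoseconds. */
int trcd_cycles(int tck_ps)
{
    int by_time = (18000 + tck_ps - 1) / tck_ps;  /* ceil(18 ns / tCK) */
    return by_time > 4 ? by_time : 4;             /* at least 4 cycles */
}
```

At the slowest legal 10MHz clock (tCK = 100ns) the 4-cycle floor dominates, so tRCD costs 400ns of wall time; at high clock rates the 18ns term dominates instead.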
Everything Else
As the clock rates go above the minimum, and as the operating temperature changes, various training routines have to be performed.
And the final kicker: the power-up and power-down supply voltage sequencing is critical. The RAM is only guaranteed to survive 400 unplanned power down cycles, i.e. shutdown while the device was in the middle of an active command, and even those have to observe proper supply ramp-down. So, if you plan on using LPDDR4, you have to detect power loss before the supply voltages go out of regulation, the RAM must be brought to an idle state (this generally takes less than 1us, so not a big deal), and then all the inputs have to be "parked" in valid low logic states, and the regulators have to ramp down the voltages in a controlled fashion.
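The shutdown choreography can be sketched as an ordered sequence of hooks (entirely hypothetical function names - each real implementation would drive its own hardware; here each hook just records its step so the ordering can be checked):

```c
#include <assert.h>

/* Hypothetical sketch of the power-loss sequence described above.
 * The hook functions are stand-ins for real hardware actions; here
 * each one just records its step so the ordering can be verified. */

enum step { STEP_IDLE_RAM, STEP_PARK_INPUTS, STEP_RAMP_DOWN };

static enum step step_log[8];
static int nsteps;

static void bring_ram_to_idle(void)  { step_log[nsteps++] = STEP_IDLE_RAM; }
static void park_inputs_low(void)    { step_log[nsteps++] = STEP_PARK_INPUTS; }
static void ramp_supplies_down(void) { step_log[nsteps++] = STEP_RAMP_DOWN; }

/* Called from the power-loss detector, before the supplies fall out
 * of regulation. Order matters: idle the RAM first (takes under 1 us),
 * then park the inputs in valid low logic states, then ramp the
 * supply rails down in a controlled fashion. */
void on_power_loss(void)
{
    bring_ram_to_idle();
    park_inputs_low();
    ramp_supplies_down();
}
```

The time budget is set by the hold-up time of the supply rails: the detector has to fire early enough that all three steps complete before the rails leave regulation.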
I've been running LPDDR3 (according to JESD209-3 standard) with a rather "slow" 100MHz microcontroller that doesn't have a DDR controller - one of the two auxiliary low-complexity cores in the microcontroller performs the RAM controller functions and voltage monitoring and bring-up/shut-down according to the standard.
It's not the most complex thing to do, so all those other answers claiming that it's "out of reach" of a beginner are overblowing it a bit.
Even DDR4 is something that you could have interfaced with in the late 1970s - sure, it wouldn't run anywhere near its full speed, but it would easily keep pace with any and all digital systems of that age, with no wait states and next to no overhead (it'd work almost like SRAM would). All it would take to make it work would be a bit of discrete glue logic, some analog level shifters, and power supply controllers.
I'm confident that if you gave any of the DDR specs, up to and including DDR4, to, say, a DEC engineer in the late 1970s/early 1980s, they could make a working interface in a couple of months at most. And that would be to run the memory fully in accordance with the JEDEC standard. If all you wanted was just a proof of concept, it'd probably take no more than a couple of weeks of work. When the system starts up, the various command sequences can be written into bipolar SRAM (in 1979, that would have been e.g. the Fairchild 13ns F10415A 1kbit ECL 10k-family SRAM), and then "streamed" to the SDRAM using a state machine. During read and write cycles, the appropriate row and column address, as well as the write data, would be latched on the computer's bus and driven towards the SDRAM at the appropriate times, and the read results would be similarly latched and driven back onto the computer's bus. Given the otherworldly DDR4 performance (vs. what was usual back in the late 70s), all the latencies in the SDRAM would be of little consequence, and the memory would be essentially a zero-wait-state DRAM for the vast majority of systems. Even the fastest systems of that time could access it in a single bus cycle, easily.
In practice, a DDR4 controller IP (intellectual property - a core you buy and stick into your ASIC, or that comes standard in the SoC or FPGA you are using) sure saves time, but it's not too hard to implement one if you aren't running at the full speed of the RAM chip. The main reason to favor fast SDRAM over slow SDRAM is that fast SDRAM is cheap and widely available, whereas older SDRAM becomes more expensive and eventually only available on the secondary market as time goes on.