
This is an overview of how one might use the Linux DMA Engine to engage the mainline Xilinx driver for its FPGA IP block. https://forums.xilinx.com/xlnx/attachments/xlnx/ELINUX/13011/2/Linux-DMA-In-Device-Drivers.pdf

I found an example that basically implements that; I even see matching function names, e.g. the completion handling on PDF page 22. It is a small proxy driver in the kernel that bridges the mainline Xilinx driver to user space. What the example does:

  • it allows setting up one transfer with a scatter-gather list of just 1 element,
  • triggered from user space by a call to ioctl();
  • it uses the DMA slave interface;
  • after submitting a transfer and asking the DMAengine to process pending transfers,
  • it waits for a completion to learn that the transfer is over (apparently that is mapped to the Xilinx XDMA block's TLAST event).

I wonder now: what would be the "correct" way of implementing continuously running DMA transfer(s) from the FPGA DMA block to Linux user space? Would it be feasible, assuming adequate buffer sizes for the speeds used, to keep the example driver as it is, i.e. single transfers are triggered from user space and a double buffer is used, so that while the DMAengine has been triggered for the next transfer and is performing it, the buffer from the previous one is being processed in user space?

So basically, a user-space loop that initiates single transfers in the kernel proxy driver, which in turn uses the DMAengine.
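To make the ordering of that loop concrete, here is a minimal user-space sketch of the ping-pong scheme. It does not talk to any real driver: `start_transfer()` and `wait_transfer()` are hypothetical stand-ins for what the proxy driver's ioctl() would do (the example driver may well do both in one blocking call), and the "DMA" is simulated by filling the buffer with a sequence number. Buffer size and all names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN 4096u  /* assumed buffer size, purely illustrative */

/* Simulated DMA channel: at most one transfer in flight. */
struct fake_dma {
    uint8_t *pending;   /* buffer of the in-flight transfer, NULL if idle */
    unsigned next_seq;  /* value the next simulated transfer writes */
};

/* Hypothetical "submit" step: queue one transfer and return at once. */
static int start_transfer(struct fake_dma *d, uint8_t *buf)
{
    if (d->pending)
        return -1;      /* a transfer is already in flight */
    d->pending = buf;
    return 0;
}

/* Hypothetical "wait" step: block until the in-flight transfer is done.
 * The simulation fills the buffer here instead of real hardware. */
static int wait_transfer(struct fake_dma *d)
{
    if (!d->pending)
        return -1;      /* nothing was submitted */
    memset(d->pending, (int)(d->next_seq++ & 0xffu), BUF_LEN);
    d->pending = NULL;
    return 0;
}

/* Placeholder for the user-space processing: sum all bytes. */
static unsigned long process(const uint8_t *buf)
{
    unsigned long sum = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < BUF_LEN; i++)
        sum += buf[i];
    return sum;
}

/* Run n transfers through two buffers; returns the byte sum, -1 on error. */
long run_ping_pong(unsigned n)
{
    static uint8_t bufs[2][BUF_LEN];
    struct fake_dma dma = { NULL, 0 };
    unsigned long total = 0;

    if (n == 0)
        return 0;

    /* Prime the pipeline: the first transfer has nothing to overlap with. */
    if (start_transfer(&dma, bufs[0]) || wait_transfer(&dma))
        return -1;

    for (unsigned i = 1; i <= n; i++) {
        unsigned cur = i & 1u, prev = cur ^ 1u;
        int in_flight = 0;

        if (i < n) {    /* kick off the next transfer first ... */
            if (start_transfer(&dma, bufs[cur]))
                return -1;
            in_flight = 1;
        }
        total += process(bufs[prev]);   /* ... then work on the old buffer */
        if (in_flight && wait_transfer(&dma))
            return -1;
    }
    return (long)total;
}
```

Calling `run_ping_pong(8)` pushes eight simulated transfers through the two buffers. The point of the sketch is only the order of operations: the next transfer is submitted before the previous buffer is processed, and a buffer is never touched by user space while it is still pending.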

Or is something else necessary? I saw this: https://www.kernel.org/doc/html/latest/driver-api/dmaengine/provider.html

So, besides DMA_SLAVE, there is also DMA_CYCLIC. Cyclic kind of sounds like the thing to use for continuous transfers, but I don't know what it was designed for or what its limits are. The documentation mentions "typically used for audio"; what I'm looking for is more like 50 MB/s, which far exceeds typical audio rates.
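For a rough feel of what 50 MB/s means per buffer, the following small helper computes how long one buffer lasts and how many completions per second result. The numbers plugged in are assumptions, not measurements: 50 MB/s taken as 50,000,000 bytes/s, buffer sizes of 1 MiB and 4 KiB, and 48 kHz 16-bit stereo (192,000 bytes/s) as a representative audio rate.

```c
#include <assert.h>

/* Time one buffer lasts at the given data rate, in whole microseconds. */
unsigned long long buffer_period_us(unsigned long long bytes_per_sec,
                                    unsigned long long buf_bytes)
{
    assert(bytes_per_sec > 0);
    return buf_bytes * 1000000ULL / bytes_per_sec;
}

/* Whole buffer completions (interrupts/callbacks) per second. */
unsigned long long completions_per_sec(unsigned long long bytes_per_sec,
                                       unsigned long long buf_bytes)
{
    assert(buf_bytes > 0);
    return bytes_per_sec / buf_bytes;
}
```

Under those assumptions, a 1 MiB buffer at 50 MB/s lasts about 21 ms (roughly 47 completions per second), while a 4 KiB buffer would mean about 12,207 completions per second. The assumed audio stream fills a 4 KiB buffer only about 46 times per second, so the data rate here is around 260 times higher, and the buffer size largely decides how much per-transfer overhead the CPU sees.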

I did try the naive "user space loop" approach described above, but only the first transfer (I only read from the FPGA) completes successfully; after that I get timeouts, even though the FPGA apparently has further TLAST events in store. I could imagine causes like timing issues, e.g. having to do some specific thing at a precise moment or close to it, although I did try very low FIFO fill rates on the FPGA side (single-digit Hz per buffer). Before I continue down a possible dead end out of a lack of knowledge, I'd like to hear from someone who knows: how is it supposed to be done?

  • Cyclic transfer is for that. The traditional user is audio cards. The transfer bandwidth is limited by hardware; there is nothing special about it in software. Of course, you will get interrupts and you need to do something in callbacks. It all takes time and it can be a bit asynchronous, which makes this unreliable for RT-like operations.
    – 0andriy
    Commented Jun 22, 2020 at 22:24
  • Ah, ok. I thought maybe it's somehow optimized for audio, which favors small buffers / low latency at kHz rates. Something RT-ish would actually be good, so I guess that's not it then. It is basically working now with the scheme described above (there were tricky issues in the FPGA config), using quite a bit of CPU, but I expected that.
    – sktpin
    Commented Jun 26, 2020 at 10:40
