This is an overview of how one might use the Linux DMA Engine to drive the mainline Xilinx driver for its FPGA IP block: https://forums.xilinx.com/xlnx/attachments/xlnx/ELINUX/13011/2/Linux-DMA-In-Device-Drivers.pdf
I found an example that basically implements this; I even see matching function names, e.g. the completion handling on PDF page 22. It is a small proxy driver in the kernel that bridges the main Xilinx driver to user space. What the example does:
- It allows setting up one transfer with a scatter-gather list of just one element.
- The transfer is triggered from user space by a call to ioctl().
- It uses the DMA slave interface.
- After submitting the transfer and asking the DMA engine to process pending requests, it waits for a completion to learn that the transfer is over (that is apparently mapped to the Xilinx XDMA block's TLAST event).
I wonder now: what would be the "correct" way to implement continuously running DMA transfers from the FPGA DMA block to Linux user space? Assuming adequate buffer sizes for the speeds involved, would it be feasible to keep the example driver as it is, i.e. single transfers are triggered from user space and a double buffer is used, so that while the DMA engine is performing the next transfer, user space processes the buffer from the previous one?
So basically, a user-space loop that initiates single transfers in the kernel proxy driver, which in turn uses the DMA engine.
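To make the ordering of such a loop concrete, here is a minimal sketch under stated assumptions. It assumes the proxy driver offers a non-blocking "start" and a blocking "wait" per buffer, which the example driver as described does not necessarily provide. All names (`xfer_ops`, `pingpong_run`, the trace backend) are hypothetical; in a real setup `start` and `wait` would wrap ioctl() calls on the proxy device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical backend interface; none of these names come from the
 * example driver. */
struct xfer_ops {
    int  (*start)(void *ctx, int buf);   /* queue a transfer into buffer buf, do not block */
    int  (*wait)(void *ctx, int buf);    /* block until that transfer completes, <0 on timeout */
    void (*process)(void *ctx, int buf); /* consume the data in buffer buf */
};

/* Run count transfers, alternating between buffer 0 and buffer 1.
 * Returns the number of buffers processed, or -1 on the first error. */
int pingpong_run(const struct xfer_ops *ops, void *ctx, int count)
{
    assert(ops != NULL);
    if (count <= 0)
        return 0;
    if (ops->start(ctx, 0) < 0)           /* prime the first buffer */
        return -1;
    for (int i = 0; i < count; i++) {
        int cur = i & 1;
        if (ops->wait(ctx, cur) < 0)      /* current buffer is now full */
            return -1;
        /* Re-arm the other buffer before touching the data, so the
         * transfer and the processing overlap. */
        if (i + 1 < count && ops->start(ctx, cur ^ 1) < 0)
            return -1;
        ops->process(ctx, cur);
    }
    return count;
}

/* Stand-in backend that only records the call order, for illustration. */
struct trace {
    char   log[128];
    size_t len;
};

static void trace_add(struct trace *t, char op, int buf)
{
    /* Each entry is two characters plus the terminating NUL. */
    if (t->len + 3 <= sizeof(t->log))
        t->len += (size_t)snprintf(t->log + t->len,
                                   sizeof(t->log) - t->len, "%c%d", op, buf);
}

static int  trace_start(void *ctx, int buf)   { trace_add(ctx, 'S', buf); return 0; }
static int  trace_wait(void *ctx, int buf)    { trace_add(ctx, 'W', buf); return 0; }
static void trace_process(void *ctx, int buf) { trace_add(ctx, 'P', buf); }

const struct xfer_ops trace_ops = { trace_start, trace_wait, trace_process };
```

Running `pingpong_run(&trace_ops, &t, 3)` on a zeroed `struct trace t` records the sequence `S0W0S1P0W1S0P1W0P0`. Note that in this ordering no transfer is queued between `wait` returning and the next `start`, so the FIFO on the FPGA side has to absorb incoming data for that interval.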
Or is something else necessary? I saw this: https://www.kernel.org/doc/html/latest/driver-api/dmaengine/provider.html
So, besides DMA_SLAVE, there is also DMA_CYCLIC. Cyclic sounds like the thing to use for continuous transfers, but I don't know what it was designed for or what its limits are. It mentions "typically used for audio"; what I'm looking for is more like 50 MB/s, which far exceeds typical audio rates.
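As far as I understand the dmaengine API, a cyclic transfer is described by a total buffer length and a period length, and the completion callback fires once per period. If that is right, the load scales with the period rate rather than with the byte rate. The following user-space model is only a sketch of that arithmetic and of the ring bookkeeping a consumer would need. It is not kernel code, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callbacks per second for a given throughput and period length. */
double period_rate_hz(double bytes_per_sec, size_t period_len)
{
    assert(period_len > 0);
    return bytes_per_sec / (double)period_len;
}

/* Bookkeeping for a cyclic buffer split into equally sized periods. */
struct cyclic_ring {
    size_t   periods;   /* buffer length divided by period length */
    uint64_t produced;  /* periods completed by the DMA so far */
    uint64_t consumed;  /* periods read and released by the consumer */
};

/* To be called once per completed period, e.g. from the callback. */
void ring_period_done(struct cyclic_ring *r)
{
    r->produced++;
}

/* Index of the oldest unread period, or -1 if nothing is pending. */
long ring_next(const struct cyclic_ring *r)
{
    if (r->consumed == r->produced)
        return -1;
    return (long)(r->consumed % r->periods);
}

/* Hand the oldest period back to the ring after processing it. */
void ring_release(struct cyclic_ring *r)
{
    r->consumed++;
}

/* Non-zero once the DMA has wrapped onto a period that was not yet
 * released, i.e. unread data is being overwritten. */
int ring_overrun(const struct cyclic_ring *r)
{
    return r->produced - r->consumed >= r->periods;
}
```

With an assumed period of 1 MiB, `period_rate_hz(50e6, 1 << 20)` comes out at roughly 48 callbacks per second for 50 MB/s. The number of periods then determines how long user space may fall behind before `ring_overrun` reports lost data.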
I did try the naive "user-space loop" approach described above, but only the first transfer (I only read from the FPGA) completes successfully; after that I get timeouts, even though the FPGA apparently has further TLAST events in store. I could imagine causes such as timing issues, e.g. having to do some specific thing at a precise moment or close to it, although I did try very low FIFO fill rates on the FPGA side (single-digit Hz per buffer). Before I continue down a possible dead end for lack of knowledge, I'd like to hear from someone who knows: how is it supposed to be done?