In “traditional” digital signal processing, complexity is computed as the number of multiplications an operation requires; e.g., the computation of the $N$-point DFT via the decimation-in-frequency FFT requires $(N/2)\log_2 N$ complex multiplications (and the same number of additions, but in DSP the additions are assumed to take less time, so their contribution is traditionally omitted unless specifically required). However, we typically do not specify which kinds of numbers we multiply.

My question is how we define complexity in the DSP sense if we apply some quantization rule to the numbers being multiplied: e.g., we go from multiplying two real numbers in every operation of the processing device to multiplying pairs of int8, float16, etc. The latter operations are evidently “simpler” than multiplying two reals, but how can we quantify and express this complexity reduction for the DSP element?
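To make the question concrete, here is a minimal sketch of the kind of weighted count I have in mind. The multiplication count is the standard $(N/2)\log_2 N$; the weight `bits**2` is only an assumed cost model (a schoolbook fixed-point multiplier's area/energy grows roughly with the product of its operand widths), not an established DSP convention:

```python
import math

def fft_mul_count(n):
    # Complex multiplications for a radix-2 decimation-in-frequency FFT.
    return (n // 2) * int(math.log2(n))

def weighted_cost(n, bits):
    # Hypothetical cost model: weight each multiplication by bits**2,
    # on the assumption that a b-bit x b-bit schoolbook multiplier
    # costs roughly b*b in area/energy.
    return fft_mul_count(n) * bits ** 2

n = 1024
print(fft_mul_count(n))                          # 5120 multiplications
print(weighted_cost(n, 32) / weighted_cost(n, 8))  # 16.0: int8 ~16x cheaper here
```

Under this (assumed) model, going from 32-bit to 8-bit operands reduces the cost per multiplication by $32^2/8^2 = 16\times$, while the multiplication *count* stays the same.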


2 Answers

First, if there's a more formal way to do this than what I'm going to show, I don't know it.

  1. Call out the MAC operations along with data path widths. I.e., "the FFT algorithm used takes $4 n \log n$ operations with 24-bit wide data paths"
    • A corollary to this would be that sometimes you may have different precision in your coefficients and your actual data. So you may call out "24-bit signal path, 16-bit coefficients".
  2. Call out the LUTs used (in an FPGA). This isn't direct, and has a lot to do with the cleverness of the implementation, but within a specific FPGA technology it has meaning.

Note that these days, unless you're doing full-custom ASIC design, data paths come in chunks - processors give you multiples of their native word width, and FPGAs come with "DSP blocks" that come in fixed widths. I'm only guessing, but I'd bet money that if you go to design an ASIC then at the first level of design complexity there will be signal processing libraries with their own fixed-width data paths, and that libraries with variable-width data paths would take more engineering resources to use.
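The "fixed-width chunks" point can be sketched numerically. Assuming a hypothetical DSP block whose multiplier takes `block_bits`-wide operands (18 bits is a common FPGA figure, but the exact width is vendor-specific), a wider multiply decomposes into partial products, each consuming one block:

```python
import math

def dsp_blocks_needed(operand_bits, block_bits=18):
    # A b-bit x b-bit multiply decomposes into ceil(b/w)**2 partial
    # products, each fitting one w-bit x w-bit hardware multiplier
    # (plus extra adder logic to combine them, ignored here).
    pieces = math.ceil(operand_bits / block_bits)
    return pieces ** 2

print(dsp_blocks_needed(16))  # 1: fits in a single block
print(dsp_blocks_needed(32))  # 4: 2x2 decomposition
```

So shrinking the data path below the block width buys you nothing in block count, while crossing a block boundary roughly quadruples it, which is why fixed-width libraries dominate.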


The execution time of an algorithm on a given piece of hardware is going to be whatever it is going to be. I assume that you want a simplified/theoretical guideline for picking either the hardware or the algorithm.

If you are using a DSP or SoC, I guess that you are going to use SIMD. The general picture is that a SIMD unit is $n$ bits wide (e.g. 128 bits), and the cost of an add, subtract, or multiply is the same, while a multiply-accumulate may or may not be more expensive.

So: an add is as expensive as a multiply. An 8-bit operation is half as expensive as a 16-bit operation. If you are multiplying two 8-bit integers and keeping only (e.g.) the «high» 8 bits of the result, you will often find an instruction that does just this, spread out over the full SIMD width, issuing once every clock tick.

Some architectures may be capable of issuing more than one SIMD operation per clock tick, and some have several SIMD units with differing capabilities. Some may offer 256-bit or 512-bit SIMD instructions that for all intents and purposes are serialized half-length SIMD instructions (i.e. one 512-bit op takes twice the time and does twice the work of one 256-bit op).
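The resulting throughput model is just lane counting. A sketch, assuming one SIMD instruction issues per clock tick as described above (real cores complicate this with multiple issue ports and load/store costs):

```python
def elements_per_op(simd_bits, element_bits):
    # Number of lanes a single SIMD instruction processes.
    return simd_bits // element_bits

def instructions_needed(n_samples, simd_bits, element_bits):
    # Instructions (~clock ticks under the one-op-per-tick assumption)
    # to process n_samples elements.
    lanes = elements_per_op(simd_bits, element_bits)
    return -(-n_samples // lanes)  # ceiling division

print(instructions_needed(1024, 128, 16))  # 128 ticks for 16-bit data
print(instructions_needed(1024, 128, 8))   # 64: int8 halves the cost
```

So under this model, quantizing from 16-bit to 8-bit buys a factor of 2 in throughput, not the naive "8-bit multipliers are cheaper" factor: the SIMD width, not the multiplier, is the fixed resource.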

-k
