I would suggest that even if you want to use analogue filtering stages (they can give sound a warmth that can be hard to achieve by other means), it may be a good idea to generate the starting waveforms digitally. Many Williams Electronics arcade machines of the 1980s generated sound using a board that contained a 6800 microprocessor, a small amount of RAM and ROM, and a little I/O including a DAC. All of the sound effects were produced by tight program loops that computed samples and fed them to the DAC. Since the processor did nothing but generate sound, loop execution time could serve as the timing reference.
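The following is a rough illustration of that loop-timed style. It is a sketch in C rather than 6800 assembly, and `dac_write` is a hypothetical stand-in for the DAC port that records samples so the output can be inspected. The square-wave generator emits one sample per pass and uses no timer at all, so its pitch depends only on `half_period` and on how long each pass takes to execute:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the DAC: on real hardware this would be a write
   to a memory-mapped port. Here it records samples into a buffer. */
static uint8_t captured[1024];
static size_t captured_len = 0;

static void dac_write(uint8_t value)
{
    if (captured_len < sizeof captured)
        captured[captured_len++] = value;
}

/* Loop-timed square wave: every pass of the inner loop emits one sample, so
   the output pitch is set by half_period and by the execution time of one
   pass. No timer is consulted. */
static void square_wave(unsigned half_period, unsigned cycles)
{
    uint8_t level = 0x00;
    for (unsigned c = 0; c < cycles * 2; c++) {
        for (unsigned i = 0; i < half_period; i++)
            dac_write(level);
        level ^= 0xFF;   /* toggle between 0x00 and 0xFF each half period */
    }
}
```

Any extra instructions added inside such a loop lower the pitch, which is why this kind of code had to be kept tight.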
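In practice, even the simplest microcontrollers have some sort of timer, which is helpful if you want to change audio parameters while sounds are playing: sample timing then no longer depends on how long each code path takes. Using 6805 code as an example, one would start by writing a poll routine for each voice. These routines would live in RAM so that their immediate operands (marked PATCH below) can be rewritten on the fly; this self-modifying approach is faster than fetching phase and frequency values from separate variables. Something like: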
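poll1:
        brclr TMR_READY,TMR_CONTROL,poll1 ; Wait for start of next 'tick'
        bclr  TMR_READY,TMR_CONTROL       ; Acknowledge the tick
FRQ1L:  lda   #PATCH        ; Frequency word, LSB (patched by main loop)
PH1L:   add   #PATCH        ; Phase, LSB
        sta   PH1L+1        ; Store new phase LSB into the operand above
FRQ1M:  lda   #PATCH        ; Frequency word, middle byte (lda leaves C intact)
PH1M:   adc   #PATCH        ; Phase, middle byte, plus carry
        sta   PH1M+1        ; Store new phase middle byte
FRQ1H:  lda   #PATCH        ; Frequency word, MSB
PH1H:   adc   #PATCH        ; Phase, MSB, plus carry
        sta   PH1H+1        ; Store new phase MSB
        sta   FETCH+2       ; Phase MSB becomes LSB of the table address
FETCH:  lda   TABLE_BASE    ; 16-bit address; TABLE_BASE must be 256-byte aligned
        clr   DAC_ENABLES   ; Disable all voice outputs while the DAC changes
        sta   DAC_OUTPUT
        lda   #ENABLE_1     ; Enable voice 1's output
        sta   DAC_ENABLES
        rts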
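The arithmetic the poll routine performs is a 24-bit phase accumulator. A C model of one voice is sketched below. The struct, its field names, and the 256-entry `table` are illustrative assumptions, not part of the original code. Masking the sum to 24 bits gives the same result as the add/adc/adc chain, including carries from the low bytes into the top byte:

```c
#include <stdint.h>

/* One voice of a phase-accumulator oscillator, modelled on the 6805 poll
   routine: a 24-bit phase is advanced by a 24-bit frequency word on every
   update, and the top byte of the phase selects a wave-table entry. */
typedef struct {
    uint32_t phase;        /* only the low 24 bits are used */
    uint32_t freq;         /* tuning word, low 24 bits */
    const uint8_t *table;  /* hypothetical 256-entry wave table */
} voice_t;

/* Called once per update; returns the sample to write to the DAC. */
static uint8_t voice_tick(voice_t *v)
{
    /* Equivalent to add/adc/adc: carries ripple upward, and the result
       wraps modulo 2^24. */
    v->phase = (v->phase + v->freq) & 0xFFFFFFu;
    return v->table[v->phase >> 16];   /* high byte indexes the table */
}
```

The output frequency is f_tick × freq / 2^24, where f_tick is the rate at which this voice is updated. At an assumed 10 kHz update rate, for example, one step of the tuning word is 10000 / 2^24, or about 0.0006 Hz.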
Next, one would write a main loop that repeatedly calls each voice's poll routine in turn and, between calls, performs whatever other work is needed (e.g. checking whether any voice parameters need updating and patching the new values into the poll routines), as long as that work finishes before the next timer tick. With this approach it's possible to update a fair number of voices at a high sample rate.
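A minimal C sketch of that main loop follows, under some stated assumptions. `wait_for_tick` is a hypothetical stand-in for the brclr/bclr pair that simply counts ticks here. One parameterised `poll_voice` replaces the per-voice poll routines. The top phase byte is used directly as a sawtooth instead of a table lookup. `housekeeping` applies a requested frequency change. Because every poll waits for its own tick, each voice is updated once every `NUM_VOICES` ticks:

```c
#include <stdint.h>

#define NUM_VOICES 3

typedef struct {
    uint32_t phase, freq;   /* 24-bit phase accumulator and tuning word */
    uint8_t last_sample;    /* most recent value sent to this voice's output */
} voice_t;

static voice_t voices[NUM_VOICES];
static unsigned long ticks;                 /* simulated timer ticks */
static uint32_t pending_freq[NUM_VOICES];   /* 0 = no change requested */

/* Hypothetical: on hardware this spins until the timer flag is set and then
   clears it. The simulation just counts ticks. */
static void wait_for_tick(void) { ticks++; }

/* Equivalent of one poll routine: wait for a tick, advance the phase, and
   produce a sample (the top phase byte, i.e. a sawtooth, to keep it short). */
static void poll_voice(voice_t *v)
{
    wait_for_tick();
    v->phase = (v->phase + v->freq) & 0xFFFFFFu;
    v->last_sample = (uint8_t)(v->phase >> 16);
}

/* Work done between polls; it must finish before the next tick is due. */
static void housekeeping(unsigned i)
{
    if (pending_freq[i] != 0) {
        voices[i].freq = pending_freq[i];   /* the "patch" step */
        pending_freq[i] = 0;
    }
}

/* Round-robin: poll each voice in turn, with housekeeping in between. */
static void main_loop(unsigned rounds)
{
    for (unsigned r = 0; r < rounds; r++)
        for (unsigned i = 0; i < NUM_VOICES; i++) {
            poll_voice(&voices[i]);
            housekeeping(i);
        }
}
```

A frequency change requested for a voice takes effect from that voice's next poll, so parameter updates never disturb the sample timing of the other voices.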
While it's possible to do the initial wave generation entirely with analogue circuitry, it's difficult to build multiple independent analogue generators whose frequency characteristics match within a fraction of a percent. The human ear is very sensitive to variations in pitch (far more so than to variations in amplitude), so whatever generates the signal must be very consistent. Using a simple microcontroller as the starting point is a good way to get such consistency, since every digitally generated voice takes its pitch from the same crystal clock, even if the generated signal is then fed through analogue shaping circuitry.
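The following puts a number on that consistency. With a phase accumulator like the one in the poll routine, a voice's frequency is fixed entirely by its integer tuning word and the crystal-derived update rate, so two voices given the same word are identical in pitch. The sketch below assumes a 24-bit accumulator and a hypothetical 10 kHz update rate. It computes the nearest tuning word for a target pitch and the frequency that word actually produces:

```c
#include <stdint.h>

#define PHASE_STEPS 16777216.0   /* 2^24: size of the 24-bit phase accumulator */

/* Nearest 24-bit tuning word for a target frequency at a given update rate. */
static uint32_t tuning_word(double f_target, double f_tick)
{
    return (uint32_t)(f_target * PHASE_STEPS / f_tick + 0.5);
}

/* Frequency actually produced by a given tuning word. */
static double actual_frequency(uint32_t word, double f_tick)
{
    return f_tick * (double)word / PHASE_STEPS;
}
```

For 440 Hz at 10 kHz this gives a word of 738198 and a relative error well under 0.001%. Any drift in the crystal shifts all voices by the same proportion, so they stay in tune with one another.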