$\begingroup$

I have a machine learning algorithm that takes speech audio recordings collected from Mechanical Turk. During processing I found that audio from certain OS/microphone combinations has already been run through a noise suppression algorithm by default, which affects the results I am getting.

As part of my writeup I need to ensure that OS and microphone preprocessing is accounted for. Ideally I would like my algorithm to be independent of the recording source. However, I am not even sure how to go about detecting the kinds of preprocessing being applied, or how to give lower and upper error bounds for how it affects my algorithm.

My question is: how do other people deal with preprocessing on audio files that is outside their control? Are there common kinds of preprocessing that I should know about (echo cancellation, noise suppression, etc.)?

One thing I could do, for example, is apply noise suppression myself to some reference files and report +/- bounds from the maximum and minimum deviation from the results on the unprocessed reference file. However, that is a bit unsatisfactory, as I do not know how different my preprocessing is from what other devices apply.
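A rough sketch of that idea, just to make it concrete. Everything named here is hypothetical: `simple_noise_gate` is only a crude stand-in for whatever a device really does, `score_fn` stands for whatever scalar my algorithm outputs, and the threshold and frame size are arbitrary.

```python
import numpy as np

def simple_noise_gate(x, threshold=0.02, frame=256):
    """Crude stand-in for device-side suppression (an assumption, not a
    real device algorithm): zero every frame whose RMS is below threshold."""
    y = np.array(x, dtype=float)  # work on a copy
    for start in range(0, len(y), frame):
        seg = y[start:start + frame]
        if np.sqrt(np.mean(seg ** 2)) < threshold:
            y[start:start + frame] = 0.0
    return y

def deviation_bounds(reference, score_fn, preprocessors):
    """Return (min, max) change in score_fn over the given preprocessors,
    relative to the score on the unprocessed reference signal."""
    base = score_fn(reference)
    deltas = [score_fn(p(reference)) - base for p in preprocessors]
    return min(deltas), max(deltas)
```

The bounds this gives are only as representative as the list of preprocessors passed in, which is exactly the weakness I am worried about.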

$\endgroup$

1 Answer

$\begingroup$

There isn't a universal pre-processing chain, but here are some common stages that I'd look out for:

1) Noise Gating: If the signal falls below a certain threshold, the mic is muted. To identify: you'll see a VERY quiet noise floor, followed by a sudden spike in level.

2) Automatic Gain Control (AGC): The mic gain is modulated based on the signal level. This will be difficult to identify unless you have multiple recordings from the same system, in which case you'll see the noise floor rise and fall with the signal level.

3) Compression: This one should be self-evident. You'll see a very steady signal level that makes the waveform look like a "block", i.e. the level changes very little throughout the entire recording.

These are all relatively simple real-time processing examples. There may also be more sophisticated post-processing applied in some cases, such as noise suppression.
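The gating and compression signatures above can be checked roughly from frame-level RMS. This is only a minimal sketch using NumPy, assuming mono float samples scaled to [-1, 1]; the function names and every threshold (-80 dB, 30 dB, -50 dB) are illustrative guesses that would need tuning on real recordings. AGC is deliberately left out, since it is hard to spot from a single file.

```python
import numpy as np

def frame_levels_db(x, frame=512, eps=1e-12):
    """RMS level of each non-overlapping frame, in dB relative to full scale."""
    n = len(x) // frame
    frames = np.asarray(x[:n * frame], dtype=float).reshape(n, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(rms + eps)

def looks_gated(levels_db, silence_db=-80.0, jump_db=30.0):
    """Heuristic: near-digital silence plus an abrupt level jump between
    adjacent frames suggests a noise gate. Thresholds are assumptions."""
    silent = levels_db < silence_db
    jumps = np.abs(np.diff(levels_db)) > jump_db
    return bool(silent.any() and jumps.any())

def dynamic_range_db(levels_db, active_db=-50.0):
    """Spread (95th minus 5th percentile) of levels over active frames.
    A very small spread hints at heavy compression."""
    active = levels_db[levels_db > active_db]
    if active.size == 0:
        return 0.0
    return float(np.percentile(active, 95) - np.percentile(active, 5))
```

A recording where `looks_gated` is true, or where `dynamic_range_db` is only a few dB over a long stretch of speech, is worth inspecting by eye before trusting either flag.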

$\endgroup$
