I have a machine learning algorithm that takes speech audio recordings collected through Mechanical Turk. During processing it turned out that audio from certain OS/microphone combinations has already been passed through a noise suppression algorithm by default, which affects the results I am getting.
As part of my writeup I need to ensure that OS and microphone preprocessing is accounted for. Ideally I would like my algorithm to be independent of the recording source. However, I am not even sure how to go about detecting which kinds of preprocessing are being applied, or how to give lower and upper error bounds for how they affect my algorithm.
My question is: how do other people deal with preprocessing on audio files that is outside their control? Are there common kinds of preprocessing that I should know about (echo cancellation, noise suppression, etc.)?
One thing I could do, for example, is apply noise suppression myself to some reference files and report +/- bounds based on the maximum and minimum deviation from the results on the unprocessed reference files. However, that is a bit unsatisfactory, as I do not know how different my preprocessing is from what other devices apply.
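
To make the reference-file idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: `spectral_gate` is a crude stand-in suppressor I made up (it is not what any OS or microphone driver actually does), `rms_score` is a placeholder for my real algorithm's output, and the strength values and the synthetic reference signal are arbitrary.

```python
import numpy as np

def spectral_gate(signal, strength, frame=256):
    """Crude stand-in noise suppressor (assumption, not a real device's algorithm).

    Splits the signal into non-overlapping frames and attenuates frequency
    bins whose magnitude is close to an estimated noise floor.
    strength = 0.0 leaves the audio unchanged, 1.0 zeroes the gated bins.
    """
    n = len(signal) // frame * frame          # drop the incomplete last frame
    frames = signal[:n].reshape(-1, frame)
    spec = np.fft.rfft(frames, axis=1)
    mag = np.abs(spec)
    # Noise floor per frequency bin: a low percentile across all frames.
    floor = np.percentile(mag, 20, axis=0, keepdims=True)
    gain = np.where(mag > (1.0 + strength) * floor, 1.0, 1.0 - strength)
    return np.fft.irfft(spec * gain, n=frame, axis=1).reshape(-1)

def rms_score(signal):
    """Placeholder for the real algorithm's output (hypothetical)."""
    return float(np.sqrt(np.mean(signal ** 2)))

def deviation_bounds(signal, score_fn, strengths=(0.25, 0.5, 0.75, 1.0)):
    """Return the baseline score and the min/max deviation over all strengths."""
    # The baseline goes through the same framing so lengths match.
    baseline = score_fn(spectral_gate(signal, 0.0))
    deltas = [score_fn(spectral_gate(signal, s)) - baseline for s in strengths]
    return baseline, min(deltas), max(deltas)

# Synthetic "reference recording": a tone plus white noise (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
reference = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(t.size)

baseline, lo, hi = deviation_bounds(reference, rms_score)
print(f"baseline score: {baseline:.4f}")
print(f"min deviation:  {lo:.4f}")
print(f"max deviation:  {hi:.4f}")
```

The resulting bounds only describe sensitivity to this particular suppressor, which is exactly the limitation I am worried about.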