Take a single moment in your audio without the snare. Say the peak value at that moment is +1 (voltage scaled to [-1...+1]). The snare at that same moment has a value of -1.
When rendering, those values are summed, so:
(-1) + 1 = 0 (silence)
If you take the snare away, the result is +1 (the loudest possible peak).
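Here is the same arithmetic as a tiny Python sketch. The variable names are just made up for illustration, and it only looks at one sample, not a real render:

```python
# One sample moment, values scaled to [-1...+1] (illustrative numbers from above).
music_without_snare = 1.0   # everything except the snare
snare = -1.0                # the snare at the same moment

# Rendering sums the tracks sample by sample.
mix_with_snare = music_without_snare + snare
mix_without_snare = music_without_snare

print(abs(mix_with_snare))     # 0.0, the two values cancel
print(abs(mix_without_snare))  # 1.0, full-scale peak
```

So removing a track can make the peak higher, not lower.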
MP3 uses spectral processing, so it doesn't care about the output level, only about the levels of the spectral bins. After decoding, the phase of the spectral components can differ from what it was before, and some frequencies are reduced. As a result, the spectral bins sum differently than they did in the original.
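To show why phase matters for the peak, here is a rough Python sketch. It is not an MP3 codec, just an assumed toy signal: a fundamental plus a third harmonic at 1/3 amplitude. Both versions have the same level in each frequency component; only the phase of the harmonic is different.

```python
import math

N = 1000  # samples over one period of the fundamental

def peak(phase_of_third):
    """Peak absolute value of a fundamental plus a third harmonic at 1/3 amplitude."""
    return max(
        abs(math.sin(x) + math.sin(3 * x + phase_of_third) / 3)
        for x in (2 * math.pi * n / N for n in range(N))
    )

peak_original = peak(0.0)      # harmonic in its original phase
peak_shifted = peak(math.pi)   # same harmonic, phase shifted by 180 degrees

print(round(peak_original, 3), round(peak_shifted, 3))
```

The peak goes from about 0.94 to about 1.33 even though no component got louder, which is the same kind of effect you can see after encoding and decoding.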
Very stupidly explained (I'm not good at English), but I hope you get the idea.
Last edited by mpl; 05-21-2017 at 06:22 AM.