Questions about downsampling and pitch detection.

Alkamist · 10-29-2019, 04:45 PM

I'm trying to speed up my script that does pitch detection.

An idea I have to achieve that goal is to downsample the audio so the pitch detection script has less data to process. After all, most things you are interested in finding the pitch of have fundamental pitches far below what the normal sampling rate provides.

I have extremely minimal DSP knowledge however, so I'm not sure how to do this.

I've read that you need to use a steep lowpass filter before decimating the samples in the buffer. The decimation is straightforward, but for me, filtering is not.

What kind of filter do I need to use? Is there any example code anyone has?

Also, when you feed a sample rate into an audio accessor that doesn't match up with the source's sample rate, does it do some sort of filtering or does it just decimate?

Xenakios · 10-29-2019, 05:07 PM

Quote:

Originally Posted by Alkamist

Also, when you feed a sample rate into an audio accessor that doesn't match up with the source's sample rate, does it do some sort of filtering or does it just decimate?

AFAIK it uses the project's default resampling mode (point sampling, linear interpolation, sinc, etc.). So you don't really have to do anything in your own code to resample to a lower sampling rate; just pass the sample rate you want into the AudioAccessor function.

Alkamist · 10-29-2019, 05:54 PM

Quote:

Originally Posted by Xenakios

AFAIK it uses the project's default resampling mode (point sampling, linear interpolation, sinc, etc.). So you don't really have to do anything in your own code to resample to a lower sampling rate; just pass the sample rate you want into the AudioAccessor function.

For whatever reason I'm getting very strange results. I tried doing a simple algorithm with filtering and decimation and it seems to work well, but if I try to feed an AudioAccessor a different sample rate it is not working well at all. I'm getting octave shifts everywhere and at lower sample rates my samples seem to be disappearing entirely.

I'm not sure what I'm doing wrong. I probably am not taking something into account.

What are all of the factors I would need to consider? You have to feed a lot of things to the AudioAccessor function, is feeding a different sample rate the only thing I need to do?

Code:

GetAudioAccessorSamples(AudioAccessor accessor, int samplerate, int numchannels, starttime_sec, int numsamplesperchannel, buffer_ptr samplebuffer)

I tried both keeping my 'numsamplesperchannel' the same and scaling it to match the new sample rate, and neither works. I'm also not sure whether 'starttime_sec' needs to be scaled to fit the new sample rate.

I'll likely test all these things out, I'm just asking here in case anyone knows off hand.
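A minimal sketch of the bookkeeping in question, not a confirmed description of how the accessor behaves. It assumes, as the parameter names suggest, that 'starttime_sec' is a position in seconds and that 'numsamplesperchannel' counts samples at the requested 'samplerate'. Under that assumption the start time does not change with the rate and only the sample count scales. `accessor_block_args` is a hypothetical helper for illustration, not part of the REAPER API.

```python
def accessor_block_args(start_sec, window_sec, samplerate):
    """Return (starttime_sec, numsamplesperchannel) for one analysis window.

    Assumption: starttime_sec is in seconds (independent of the rate) and
    numsamplesperchannel is counted at the requested samplerate.
    """
    num_samples = int(round(window_sec * samplerate))
    return start_sec, num_samples


# The same 50 ms window starting at 1.5 s, requested at two different rates.
full_rate = accessor_block_args(1.5, 0.05, 44100)
low_rate = accessor_block_args(1.5, 0.05, 8000)
print(full_rate, low_rate)
```

Whatever the accessor does internally, any step in the pitch detector that converts a lag or bin index to Hz has to use the new rate as well, otherwise every result is off by the ratio of the two rates.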

Alkamist · 10-31-2019, 01:53 PM

Is there anyone who knows what kinds of filters are generally used for antialiasing when downsampling? I'm having trouble finding the right information online. If anyone could point me to some code examples or something I would be grateful.

Hypex · 11-06-2019, 09:49 PM

I wonder if you could simplify the math somewhat to speed it up. A number of years ago I was talking about frequencies in samples with a musician friend and how to calculate them. His explanation was very simple, with the difference between each sample specifying the frequency at that point in time. A low difference meant a low curve in the wave and a low frequency. Likewise, a higher difference meant a higher curve in the wave and therefore a higher frequency. So these numbers in combination with the sample rate can calculate the frequencies of the sounds within. Sounds logical.

Mathematically it's a fairly simple way of looking at it. I don't know how accurate it is for actual real world use. Nor if it can be used as base for determining pitch.

It can be good to simplify the math when speed is desired.

BTW downsampling is easy since you just skip every second sample to halve it. I don't know if antialiasing will help to determine pitch since you are smoothing it over and modifying it. It's not a font, lol. But one easy way would be to take two points, split the difference, that is find the difference then halve it to find the one between them and add it in. It's more suitable for upsampling when you need to plug in samples that aren't there. As with downsampling it needs more analysis to determine the sweet spot for each point.

Alkamist · 11-07-2019, 08:42 AM

Quote:

Originally Posted by Hypex

I wonder if you could simplify the math somewhat to speed it up. A number of years ago I was talking about frequencies in samples with a musician friend and how to calculate them. His explanation was very simple, with the difference between each sample specifying the frequency at that point in time. A low difference meant a low curve in the wave and a low frequency. Likewise, a higher difference meant a higher curve in the wave and therefore a higher frequency. So these numbers in combination with the sample rate can calculate the frequencies of the sounds within. Sounds logical.

Mathematically it's a fairly simple way of looking at it. I don't know how accurate it is for actual real world use. Nor if it can be used as base for determining pitch.

It can be good to simplify the math when speed is desired.

BTW downsampling is easy since you just skip every second sample to halve it. I don't know if antialiasing will help to determine pitch since you are smoothing it over and modifying it. It's not a font, lol. But one easy way would be to take two points, split the difference, that is find the difference then halve it to find the one between them and add it in. It's more suitable for upsampling when you need to plug in samples that aren't there. As with downsampling it needs more analysis to determine the sweet spot for each point.

As nice as that would be, I don't think anything like that would work. Detecting the pitch of monophonic audio is more complicated than that. The difference in value between two samples isn't only dependent on the frequency of the underlying wave, but also the amplitude and its complex harmonic structure.

Take an 80 Hz sine wave for example. If you select any random sample and calculate the difference between it and its adjacent samples, that value will be different depending on how loud the sine wave is, yet the frequency never changed. Sure, if you knew it was a sine wave to begin with, and you knew its amplitude, you could probably use that information to come up with the frequency, but while analyzing pitch realistically we never have that kind of information and the waveforms are usually not that simple.
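The 80 Hz example can be checked numerically. This is only an illustrative sketch: the function name, the 44100 Hz rate and the two amplitudes are arbitrary choices, not anything taken from the script being discussed.

```python
import math


def max_adjacent_difference(freq_hz, amplitude, samplerate=44100, n=2048):
    """Largest difference between neighbouring samples of a sine wave."""
    samples = [
        amplitude * math.sin(2 * math.pi * freq_hz * i / samplerate)
        for i in range(n)
    ]
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


# Same 80 Hz frequency, two different loudness levels.
quiet = max_adjacent_difference(80.0, 0.25)
loud = max_adjacent_difference(80.0, 1.0)
print(quiet, loud, loud / quiet)
```

The sample-to-sample difference scales directly with the amplitude even though the frequency is unchanged, so the difference alone cannot identify the pitch.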

As far as downsampling is concerned, it's not that easy either unfortunately. What you are describing is called decimation. If you do that alone without any filtering beforehand, there will be aliasing artifacts reflected down into the audible range that will mess with the pitch detection algorithm. It leads to rampant octave shifts at least with the algorithm I'm using.
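A small sketch of what that aliasing looks like, using made-up numbers rather than the actual script: a 9000 Hz tone at 48000 Hz is decimated by 4 with no filtering. The new rate is 12000 Hz, so anything above 6000 Hz folds back, and the 9000 Hz tone lands on 12000 - 9000 = 3000 Hz.

```python
import math


def sine(freq_hz, samplerate, n):
    """n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / samplerate) for i in range(n)]


SRATE = 48000
FACTOR = 4  # 48000 Hz -> 12000 Hz, so the new Nyquist limit is 6000 Hz

tone_9k = sine(9000, SRATE, 4800)
decimated = tone_9k[::FACTOR]  # naive decimation, no lowpass filter first

# A genuine 3000 Hz tone generated directly at the new rate.
tone_3k = sine(3000, SRATE // FACTOR, len(decimated))

# The decimated 9000 Hz tone is the 3000 Hz tone with inverted polarity,
# so adding the two signals should cancel to (almost) zero.
worst = max(abs(d + r) for d, r in zip(decimated, tone_3k))
print(worst)
```

After decimation the two signals are indistinguishable apart from polarity, which is why content above the new Nyquist limit has to be filtered out before the samples are thrown away.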

SaulT · 11-08-2019, 08:27 PM

Resampling, one of my favorite topics!

There are two main ways to downsample: with stacks of IIR filters (biquads etc.) or with FIR filters. Both introduce delay, but a symmetric (linear-phase) FIR filter delays every frequency by the same amount, so it preserves the shape of the waveform, and I think that's probably important here. The procedure is simple: filter first, then throw away the samples you don't need (decimate).
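The filter-then-decimate procedure can be sketched as follows. This is a generic Blackman-windowed sinc design written for illustration, not the code from the JSFX includes mentioned in this thread; the function names, the 63-tap length and the cutoff are all assumptions.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Symmetric lowpass FIR taps. cutoff is a fraction of the sample rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        phase = 2 * math.pi * n / (num_taps - 1)
        window = 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2 * phase)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def downsample(samples, factor, taps):
    """Lowpass filter, then keep every factor-th output sample.

    Only the outputs that survive decimation are computed, and only
    positions where the filter fully overlaps the input are used.
    """
    out = []
    for start in range(0, len(samples) - len(taps) + 1, factor):
        out.append(sum(t * samples[start + k] for k, t in enumerate(taps)))
    return out


SRATE, FACTOR = 48000, 4
# Cutoff at 0.1 * 48000 = 4800 Hz, below the new 6000 Hz Nyquist limit.
taps = windowed_sinc_lowpass(63, 0.1)
low = [math.sin(2 * math.pi * 200 * i / SRATE) for i in range(4800)]
high = [math.sin(2 * math.pi * 9000 * i / SRATE) for i in range(4800)]
kept = max(abs(v) for v in downsample(low, FACTOR, taps))
rejected = max(abs(v) for v in downsample(high, FACTOR, taps))
print(kept, rejected)
```

Computing only every fourth output is where most of the saving comes from: the filter runs at the low rate even though it reads samples at the high rate.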

Nyquist filters, aka L-th band filters, are a type of symmetric FIR filter where every L-th tap away from the centre is zero. The cut-off is then fixed at 1/L of Nyquist, but it's probably the most computationally efficient option. The most popular version is probably the halfband filter, L=2.
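To see the zero taps of a halfband filter, here is a rough sketch using a Hamming-windowed sinc with the cutoff at a quarter of the sample rate. This is one simple way to obtain halfband taps and is an assumption for illustration; it is not the design used in the oversampler code referred to in this thread.

```python
import math


def halfband_taps(num_taps):
    """Hamming-windowed sinc with cutoff at srate / 4 (L = 2)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 0.5
        else:
            ideal = math.sin(math.pi * x / 2) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


taps = halfband_taps(11)
# Taps at an even distance from the centre are zero up to rounding error.
zero_taps = [n for n, t in enumerate(taps) if abs(t) < 1e-12]
print(zero_taps)
```

For 11 taps the zeros fall at indices 1, 3, 7 and 9, two steps and four steps either side of the centre tap at index 5. A filter loop can skip those multiplications entirely, which is where the efficiency comes from.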

You can find examples in ReaJS in both code that I've released (look for st-oversampler.jsfx-inc) and code that Tale has released (you can google his mono_synth page to find his oversampler.jsfx-inc code, my code is an extension of his). The part to look at would be the os2_down() code I suppose, where the y values are the samples at the higher samplerate.

Generating these is pretty easy in Matlab, Octave, etc., and there are even online calculators that can generate more general-case FIR filters, say if you want to set a cutoff at 0.2 of srate instead of 0.25. Of course, the tradeoff there is that these filters wouldn't have the zero at every L-th tap.

If phase isn't important then do a 6th or 8th order biquad cascade and call it good.
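A 6th order cascade of that kind might look like the sketch below. It assumes a Butterworth response built from three biquads using the well-known RBJ Audio EQ Cookbook lowpass formulas; the function names and the 4800 Hz cutoff are illustrative choices, not anything prescribed in this thread.

```python
import math


def butterworth_lowpass_sections(order, cutoff_hz, samplerate):
    """Biquad coefficients (b0, b1, b2, a1, a2) for an even-order lowpass."""
    w0 = 2 * math.pi * cutoff_hz / samplerate
    cos_w0, sin_w0 = math.cos(w0), math.sin(w0)
    sections = []
    for k in range(order // 2):
        # Q of each second-order section of a Butterworth filter.
        q = 1 / (2 * math.cos(math.pi * (2 * k + 1) / (2 * order)))
        alpha = sin_w0 / (2 * q)
        a0 = 1 + alpha
        b1 = (1 - cos_w0) / a0
        a1 = -2 * cos_w0 / a0
        a2 = (1 - alpha) / a0
        sections.append((b1 / 2, b1, b1 / 2, a1, a2))
    return sections


def run_cascade(samples, sections):
    """Run the samples through each biquad in turn (transposed direct form II)."""
    for b0, b1, b2, a1, a2 in sections:
        z1 = z2 = 0.0
        out = []
        for x in samples:
            y = b0 * x + z1
            z1 = b1 * x - a1 * y + z2
            z2 = b2 * x - a2 * y
            out.append(y)
        samples = out
    return samples


SRATE, FACTOR = 48000, 4
sections = butterworth_lowpass_sections(6, 4800, SRATE)
low = [math.sin(2 * math.pi * 200 * i / SRATE) for i in range(4800)]
high = [math.sin(2 * math.pi * 9000 * i / SRATE) for i in range(4800)]
# Filter at the full rate, decimate, then skip the start-up transient.
kept = max(abs(v) for v in run_cascade(low, sections)[::FACTOR][100:])
rejected = max(abs(v) for v in run_cascade(high, sections)[::FACTOR][100:])
print(kept, rejected)
```

Unlike the FIR approach, every sample has to pass through the filter before decimation, because each output depends on the previous outputs.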

Alkamist · 11-09-2019, 04:56 PM

Quote:

Originally Posted by SaulT

Resampling, one of my favorite topics!

There are two main ways to downsample: with stacks of IIR filters (biquads etc.) or with FIR filters. Both introduce delay, but a symmetric (linear-phase) FIR filter delays every frequency by the same amount, so it preserves the shape of the waveform, and I think that's probably important here. The procedure is simple: filter first, then throw away the samples you don't need (decimate).

Nyquist filters, aka L-th band filters, are a type of symmetric FIR filter where every L-th tap away from the centre is zero. The cut-off is then fixed at 1/L of Nyquist, but it's probably the most computationally efficient option. The most popular version is probably the halfband filter, L=2.

You can find examples in ReaJS in both code that I've released (look for st-oversampler.jsfx-inc) and code that Tale has released (you can google his mono_synth page to find his oversampler.jsfx-inc code, my code is an extension of his). The part to look at would be the os2_down() code I suppose, where the y values are the samples at the higher samplerate.

Generating these is pretty easy in Matlab, Octave, etc., and there are even online calculators that can generate more general-case FIR filters, say if you want to set a cutoff at 0.2 of srate instead of 0.25. Of course, the tradeoff there is that these filters wouldn't have the zero at every L-th tap.

If phase isn't important then do a 6th or 8th order biquad cascade and call it good.

Thanks for the advice! I'll do some research on all the things you posted as soon as I get the chance!

Hypex · 11-11-2019, 12:00 AM

Quote:

Originally Posted by Alkamist

As nice as that would be, I don't think anything like that would work. Detecting the pitch of monophonic audio is more complicated than that. The difference in value between two samples isn't only dependent on the frequency of the underlying wave, but also the amplitude and its complex harmonic structure.

It would be nice but it's just as I expected. Still I do wonder if the difference could be used to calculate a waveform angle which could be multiplied with the base frequency. Just not useful in the real world. I've spent my life looking at waveforms. And it's easy to see all the patterns that appear. But harder to see how the patterns can be separated into single components.

Quote:

Take an 80 Hz sine wave for example. If you select any random sample and calculate the difference between it and its adjacent samples, that value will be different depending on how loud the sine wave is, yet the frequency never changed. Sure, if you knew it was a sine wave to begin with, and you knew its amplitude, you could probably use that information to come up with the frequency, but while analyzing pitch realistically we never have that kind of information and the waveforms are usually not that simple.

In real use a sound wouldn't be that simple. Being a mix of sounds and harmonics. And other complex varieties.

I asked another friend about how sounds can be mixed together, and the answer was somewhat complicated and involved logarithms and such. However, in this case I think that was overcomplicated, since a computer just adds the samples together mathematically. I've read about this and know it from experience, since I wrote code to add samples and it worked. Plus Reaper would do the same thing, just in float these days.

Quote:

As far as downsampling is concerned, it's not that easy either unfortunately. What you are describing is called decimation. If you do that alone without any filtering beforehand, there will be aliasing artifacts reflected down into the audible range that will mess with the pitch detection algorithm. It leads to rampant octave shifts at least with the algorithm I'm using.

That's a good point. It would shift the octave since the playback rate is changed. That would need to be taken into account.

Either way, it will be complicated. Well, mathematically complicated, which increases the workload of the CPU. Most computers are fairly fast today, so it shouldn't be a problem.

Given that downsampling while retaining pitch would be like time stretching or time shrinking, I'm wondering if a time stretch algorithm could help. I recall this old sample software called Audio Master that did time stretching, which became popular in the 90s or thereabouts. Given that computers were way less powerful back then, perhaps an old algorithm like this could help. Not exactly Elastique but close enough.

Of course at any rate, changing the speed will require computations, so the best method for your time may be a trade off between the pitch finder and reducing playrate if it makes it worthwhile.