Old 03-30-2021, 02:37 PM   #1
ladron
WDL convolution and very small buffers

I've been successfully using WDL_ConvolutionEngine_Div for short impulses for a while now. Recently, I've needed to handle longer (4 seconds plus) IRs. Switching to Tale's WDL_ConvolutionEngine_Thread works on my reasonably fast Windows desktop, but I'm having trouble getting it to run without underruns on a less powerful Raspberry Pi 4.

At first I thought the Raspberry Pi just didn't have enough processing power, but it isn't maxing out the CPU. And it works if I increase the processing buffer size to something large like 1024 (rather than my normal 32-sample buffer), but then the latency is unacceptable.

Any ideas for other things I could try? Thanks!
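
For reference, the per-block usage pattern looks roughly like this (a mono sketch; the Add/Avail/Get/Advance names follow the stock WDL convoengine.h, and the exact signatures may differ between WDL versions):

Code:
// Rough sketch of the per-block pattern (mono, simplified).
#include <string.h>
#include "WDL/convoengine.h"

static WDL_ConvolutionEngine_Div g_engine;

// Called once when an IR is (re)loaded. 'irSamples'/'irLen' are placeholders
// for however the impulse actually gets loaded.
void InitConvolution(const WDL_FFT_REAL *irSamples, int irLen, double srate)
{
  WDL_ImpulseBuffer imp;
  imp.samplerate = srate;
  imp.SetNumChannels(1);
  imp.SetLength(irLen);
  memcpy(imp.impulses[0].Get(), irSamples, irLen * sizeof(WDL_FFT_REAL));
  g_engine.SetImpulse(&imp);  // the engine keeps its own processed copy
}

// Called from the audio callback with each 32-sample block.
void ProcessBlock(WDL_FFT_REAL *in, WDL_FFT_REAL *out, int nframes)
{
  WDL_FFT_REAL *bufs[1] = { in };
  g_engine.Add(bufs, nframes, 1);          // feed the new input samples

  if (g_engine.Avail(nframes) >= nframes)  // enough convolved output ready?
  {
    WDL_FFT_REAL **o = g_engine.Get();
    memcpy(out, o[0], nframes * sizeof(WDL_FFT_REAL));
    g_engine.Advance(nframes);             // consume what was copied out
  }
  else
  {
    memset(out, 0, nframes * sizeof(WDL_FFT_REAL)); // engine still priming
  }
}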

Old 04-01-2021, 07:01 AM   #2
Ric Vega

I've been using the convolution engine with up to 250,000 points per channel on a 2018 MacBook Pro (2.3 GHz dual-core Intel Core i5), and it works perfectly, although it uses a big chunk of CPU. My buffer size is usually 128, which has negligible latency (roughly 3 ms at 44.1 kHz). But a buffer size of 32 is way too small for this kind of algorithm.

Old 04-02-2021, 11:08 AM   #3
ladron

Quote:
Originally Posted by Ric Vega
But a buffer size of 32 is way too small for this kind of algorithm.
A buffer size of 32 samples is aggressive, but not at all unachievable on modern computers and audio interfaces.

I'm running guitar FX processing, including convolution for guitar cabinet impulse responses, with a 32-sample buffer and no dropouts on a humble Raspberry Pi 4 with a USB audio interface.

Old 04-02-2021, 11:39 AM   #4
Ric Vega

Quote:
Originally Posted by ladron
A buffer size of 32 samples is aggressive, but not at all unachievable on modern computers and audio interfaces.

I'm running guitar FX processing, including convolution for guitar cabinet impulse responses, with a 32-sample buffer and no dropouts on a humble Raspberry Pi 4 with a USB audio interface.
Oh I see. But in that case you certainly don't need such a long IR. I'd say 10 ms is more than enough to completely profile a guitar cabinet. I've been probing some digital cabs with a delta function, and after 100 or 200 points (at 44.1 kHz) the IR is pretty much zero.
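
In case it's useful, here is roughly what that delta probe looks like in code (a sketch; CabSimProcess is a stand-in for whatever renders a block through the cab sim being measured):

Code:
// Feed a unit impulse through the cab sim and see where the tail dies out.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

void CabSimProcess(float *buf, int n);  // placeholder: process n samples in place

int MeasureEffectiveIrLength(int maxLen = 4096, float floorDb = -80.0f)
{
  std::vector<float> buf(maxLen, 0.0f);
  buf[0] = 1.0f;                      // unit impulse (delta function)
  CabSimProcess(buf.data(), maxLen);  // buf now holds the impulse response

  // Find the last sample still above a noise floor relative to the peak.
  float peak = 0.0f;
  for (float v : buf) peak = std::max(peak, std::fabs(v));
  const float floorLin = peak * std::pow(10.0f, floorDb / 20.0f);

  int last = 0;
  for (int i = 0; i < maxLen; ++i)
    if (std::fabs(buf[i]) > floorLin) last = i;

  printf("IR is effectively %d samples long\n", last + 1);
  return last + 1;
}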

Old 04-02-2021, 11:46 AM   #5
ladron

Quote:
Originally Posted by Ric Vega
Oh I see. But in that case you certainly don't need such a long IR.
Yes - the cab IRs work fine. I'm trying to add spring reverb, though, so I need longer ones...

Old 04-02-2021, 12:08 PM   #6
Ric Vega

Quote:
Originally Posted by ladron
Yes - the cab IRs work fine. I'm trying to add spring reverb, though, so I need longer ones...
Unfortunately I'm not very technical, so I don't know how different our setups are, but I carried out a test and was able to process a stereo 250,000 point IR in my setup at buffer size 32 without problems. Maybe you could check your ProcessBlock and FFTConvolution functions to see that there isn't anything slowing the algorithm down?

Old 04-02-2021, 12:31 PM   #7
ladron

Quote:
Originally Posted by Ric Vega
I carried out a test and was able to process a stereo 250,000 point IR in my setup at buffer size 32 without problems.
It works fine for me on a fast computer, too. My issues are on a much slower Raspberry Pi 4.

For longer IRs, I'm getting inconsistent performance from one buffer pass to the next. The worst case takes over 3x the CPU of the best case, which is enough to push me into dropout territory on the Raspberry Pi.
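
To quantify that, a quick way to see the spread is to time each block and track the best/worst case (the names here are illustrative, not from my actual code; 32 samples is roughly a 725 us budget at 44.1 kHz, or 667 us at 48 kHz):

Code:
// Illustrative timing wrapper around the real per-block processing.
#include <algorithm>
#include <chrono>
#include <cstdio>

void ProcessBlock(const float *in, float *out, int nframes);  // the real convolution work

static double g_minUs = 1e9, g_maxUs = 0.0;

void TimedProcessBlock(const float *in, float *out, int nframes)
{
  const auto t0 = std::chrono::steady_clock::now();
  ProcessBlock(in, out, nframes);
  const auto t1 = std::chrono::steady_clock::now();

  const double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
  g_minUs = std::min(g_minUs, us);
  g_maxUs = std::max(g_maxUs, us);

  // Printing from the audio callback isn't great practice, but it's fine
  // for a quick test; only report every 1000 blocks.
  static int n = 0;
  if (++n % 1000 == 0)
    printf("block time: best %.1f us, worst %.1f us\n", g_minUs, g_maxUs);
}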

Old 04-02-2021, 12:41 PM   #8
Ric Vega

Quote:
Originally Posted by ladron
It works fine for me on a fast computer, too. My issues are on a much slower Raspberry Pi 4.

For longer IRs, I'm getting inconsistent performance from one buffer pass to the next. The worst case takes over 3x the CPU of the best case, which is enough to push me into dropout territory on the Raspberry Pi.
Try a 128-sample buffer size, or even 256, in that case. Latency shouldn't be a problem there.

Old 04-03-2021, 01:30 AM   #9
Tale

Assuming the Raspberry Pi 4 is fast enough to pull this off (which I don't know, but it might), then maybe tweaking thread priorities would help? Ideally the worker thread needs to be of lower priority than the main audio thread, but higher than any GUI threads.
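
On Linux / Raspberry Pi OS that could mean giving the worker thread a SCHED_FIFO priority a few steps below the audio thread, something like the sketch below (the priority numbers are only examples, and real-time scheduling needs the right privileges, e.g. an rtprio limit for the user):

Code:
// Give a thread a real-time priority below the audio callback thread.
#include <pthread.h>
#include <sched.h>
#include <cstdio>

bool SetRealtimePriority(pthread_t thread, int priority)
{
  sched_param sp = {};
  sp.sched_priority = priority;  // e.g. 80 for the audio thread, 70 for the worker
  const int err = pthread_setschedparam(thread, SCHED_FIFO, &sp);
  if (err) fprintf(stderr, "pthread_setschedparam failed: %d\n", err);
  return err == 0;
}

// e.g. from inside the convolution worker thread itself:
//   SetRealtimePriority(pthread_self(), 70);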

Old 04-05-2021, 02:38 PM   #10
ladron

I'm pretty sure the issue is that the Raspberry Pi is only fast enough to handle a WDL_ConvolutionEngine_Div up to a certain size; past that, it can no longer finish the larger FFT chunks in time for a 32-sample buffer.

Forcing maxfft_size to 2048 in WDL_ConvolutionEngine_Thread::SetImpulse() gives me a workable WDL_ConvolutionEngine_Div, but it also limits the threaded convolution to a smaller 4096 FFT size, so it is less efficient. Still, it lets me get stable 1-second convolutions at a 64-sample buffer, which is an improvement. Much past 1 second, though, the threaded convolution gets too expensive.
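
For reference, here is the same cap expressed at the call site, assuming your copy of convoengine.h exposes maxfft_size as the second SetImpulse() parameter the way the stock WDL_ConvolutionEngine_Div does (in my case I simply hard-coded the value inside WDL_ConvolutionEngine_Thread::SetImpulse() instead):

Code:
// Cap the largest real-time FFT partition at 2048 when setting the impulse.
#include "WDL/convoengine.h"

void LoadReverbIr(WDL_ImpulseBuffer *imp);  // placeholder for the actual IR loading

WDL_ConvolutionEngine_Thread g_reverb;      // Tale's threaded engine

void SetupReverb()
{
  WDL_ImpulseBuffer imp;
  LoadReverbIr(&imp);
  g_reverb.SetImpulse(&imp, 2048);  // second parameter assumed to be maxfft_size
}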

Does the threaded convolution's FFT size have to be twice the realtime WDL_ConvolutionEngine_Div size? I've tried making it a different (larger) size, but the code stopped working...

Old 04-05-2021, 04:20 PM   #11
ladron

I think the best solution may be to have a WDL_ConvolutionEngine_Div engine for both real-time and background processing, with an FFT size threshold to cut over from one to the other.

That seems to be the way this implementation is working:

https://chromium.googlesource.com/ch...bConvolver.cpp
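
Something along these lines, structurally (a simplified mono sketch, run synchronously; in a real version the tail engine's larger FFTs would be computed on a worker thread the way the ReverbConvolver does, and the WDL_ImpulseBuffer field names here follow stock WDL, so they may differ slightly):

Code:
// Head/tail split: a low-latency engine for the start of the IR and a
// second engine, free to use big FFTs, for the rest.
#include <string.h>
#include "WDL/convoengine.h"

class SplitConvolver
{
public:
  // splitLen = number of IR samples handled by the low-latency head engine.
  void SetImpulse(WDL_ImpulseBuffer *ir, int splitLen)
  {
    const int irLen = ir->GetLength();

    // Head: first splitLen samples, capped to small FFT partitions.
    WDL_ImpulseBuffer head;
    head.samplerate = ir->samplerate;
    head.SetNumChannels(1);
    head.SetLength(splitLen);
    memcpy(head.impulses[0].Get(), ir->impulses[0].Get(),
           splitLen * sizeof(WDL_FFT_REAL));
    m_head.SetImpulse(&head, 256 /* maxfft_size for the head */);

    // Tail: the rest of the IR, left-padded with zeros so both engines can be
    // fed the same input and their outputs simply summed. (The padding wastes
    // some small-partition work on zeros; a real implementation would delay
    // the tail output instead.)
    WDL_ImpulseBuffer tail;
    tail.samplerate = ir->samplerate;
    tail.SetNumChannels(1);
    tail.SetLength(irLen);
    memset(tail.impulses[0].Get(), 0, splitLen * sizeof(WDL_FFT_REAL));
    memcpy(tail.impulses[0].Get() + splitLen,
           ir->impulses[0].Get() + splitLen,
           (irLen - splitLen) * sizeof(WDL_FFT_REAL));
    m_tail.SetImpulse(&tail);  // large FFT partitions are fine here
  }

  // Process one block in place; in a real version m_tail's FFT work would be
  // handed off to a background thread, with enough head length to cover the
  // hand-off latency.
  void ProcessBlock(WDL_FFT_REAL *buf, int nframes)
  {
    WDL_FFT_REAL *bufs[1] = { buf };
    m_head.Add(bufs, nframes, 1);
    m_tail.Add(bufs, nframes, 1);

    if (m_head.Avail(nframes) >= nframes && m_tail.Avail(nframes) >= nframes)
    {
      WDL_FFT_REAL **h = m_head.Get();
      WDL_FFT_REAL **t = m_tail.Get();
      for (int i = 0; i < nframes; ++i) buf[i] = h[0][i] + t[0][i];
      m_head.Advance(nframes);
      m_tail.Advance(nframes);
    }
  }

private:
  WDL_ConvolutionEngine_Div m_head, m_tail;
};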