processSingleReplacing()


olilarkin
09-03-2010, 03:11 AM
Has anyone modded iPlug so that it does a 32-bit single-precision processReplacing instead of, or as well as, the 64-bit processDoubleReplacing? I want to be able to choose, because at the moment my synth uses doubles everywhere and is a CPU hog. I have a global olfloat typedef, so I can switch it all to single precision.

I tried to work it out but IPlugBase is quite confusing, wondered if anyone had already done it.

thanks,

oli

Tale
09-03-2010, 04:37 AM
On a modern CPU (supporting SSE2 instructions) doubles are generally faster than single precision floats. However, doubles do take twice as much memory, so if you are using large wave tables you might consider storing only these as single precision floats.

I would first try to figure out which part of the code of your synth is so CPU hungry, before doing anything. Perhaps there is a type conversion or a denormal in there somewhere.

cc_
09-03-2010, 04:56 AM
Yes, I would try and figure out why it is faster with floats, sounds like something funny is going on.

This might also be relevant: http://forum.cockos.com/showpost.php?p=381261&postcount=4 .

olilarkin
09-04-2010, 07:05 AM
thanks, i don't think i explained well...

I don't yet know if the fact I am using doubles everywhere is the problem, that's why I wanted to have a processSingleReplacing so I could just change my olfloat typedef to float and check the difference. I am using one of the very first Apple Intel laptops from 2006, so I don't know how its processor compares to the latest ones.

Perhaps there is a type conversion or a denormal in there somewhere.

How do you actually find denormals? My synth uses lots of CPU but it doesn't spike at all.

oli

bvesco
09-05-2010, 01:49 AM
Re: denormals

I wouldn't say you "find" denormals as much as you "prevent" denormals. I put a denormal squasher at the end of every single one of my processing units that involves division (or multiplication by a number less than 1.0).
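
For illustration, a denormal squasher of the kind described above might look like the following C++ sketch. The names SquashDenormal and OnePole and the 1e-15 threshold are assumptions made for the example, not code from this thread or from IPlug/WDL.

```cpp
#include <cmath>

// Hypothetical helper: flush very small values to zero so that a decaying
// state variable never drifts into the (slow) denormal range.
// The 1e-15 threshold is an assumed value, far below anything audible.
inline double SquashDenormal(double x)
{
  return (std::fabs(x) < 1e-15) ? 0.0 : x;
}

// Hypothetical one-pole lowpass: its state is multiplied by a number less
// than 1.0 on every sample, so it decays toward zero once the input stops.
struct OnePole
{
  double a = 0.99; // feedback coefficient, less than 1.0
  double z = 0.0;  // filter state

  double Process(double in)
  {
    // Equivalent to z = (1 - a) * in + a * z, squashed at the end of the unit.
    z = SquashDenormal(in + a * (z - in));
    return z;
  }
};
```

With silence at the input, the state reaches exactly 0.0 after a few thousand samples instead of lingering at ever smaller values.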

cc_
09-06-2010, 12:11 AM
I don't yet know if the fact I am using doubles everywhere is the problem, that's why I wanted to have a processSingleReplacing so I could just change my olfloat typedef to float and check the difference.

Apologies if you're an old hand at this :) ... but in my experience guessing where the performance problem is and trying stuff out is never the way to improve performance. Better to first measure using a profiler - then you know where it's worth trying to improve things.

olilarkin
01-21-2011, 05:31 PM
are you guys so sure about doubles being faster/as fast?

it seems to me that with SSE __m128 you can do 4 floats in the same time as two doubles, and the memory thing must also be a big issue. most hosts still use 32bit float internally. I for one would like the option of processSingleReplacing, so will investigate adding it, if only to easily test between float and double and not waste time going from float > double > float.

I need to do some testing on windows, but so far on OSX i'm shocked at the performance of some basic synths i have made compared to more complex things i used to do on a lesser processor using synthedit. Of course i do not know exactly what was going on inside those synthedit modules i was using, but still. maybe i need to learn assembler.

using the shark profiler doesn't seem to tell me that much apart from that my linear interpolation and phase wrapping code is a weak point.
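
For reference, a fairly conventional way to write phase wrapping and a linearly interpolated table read is sketched below in C++. This is not the code from the synth being discussed; WrapPhase and ReadTable are hypothetical names, the phase is assumed to be normalized to [0, 1), and the table is assumed to hold one cycle as floats while the math is done in doubles.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Wrap a normalized phase into [0, 1). std::floor also handles negative
// phases, e.g. -0.25 becomes 0.75.
inline double WrapPhase(double phase)
{
  return phase - std::floor(phase);
}

// Read a single-cycle table at a normalized phase with linear interpolation.
// The table must not be empty.
inline double ReadTable(const std::vector<float>& table, double phase)
{
  const std::size_t n = table.size();
  const double pos = phase * static_cast<double>(n);
  // The modulo guards against phase rounding up to exactly 1.0.
  const std::size_t i0 = static_cast<std::size_t>(pos) % n;
  const std::size_t i1 = (i0 + 1) % n; // wrap the upper neighbour to the table start
  const double frac = pos - std::floor(pos);
  return table[i0] + frac * (table[i1] - table[i0]);
}
```

A typical per-sample loop would call ReadTable(table, phase) and then phase = WrapPhase(phase + increment). Whether std::floor and the modulo are the hot spot is something only the profiler can confirm.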

by the way, can anyone recommend a free profiler on windows?

Tale
01-22-2011, 01:56 AM
are you guys so sure about doubles being faster/as fast?
Yes. Well, on Windows anyway.

If you compile with SSE2 support, then doubles are generally faster than floats. Without SSE2 support doubles are probably somewhat slower. However, since doubles take more memory you could expect more cache misses when using large arrays.
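
One way to check this on a particular machine and compiler is a small timing harness templated on the sample type, sketched below in C++. RunWorkload and TimeWorkload are hypothetical names, and the multiply-accumulate loop is only a placeholder for real DSP code; the numbers it produces say nothing about any specific synth.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Placeholder workload: a multiply-accumulate chain over a buffer.
// Swap in the real inner loop to get meaningful numbers.
template <typename Sample>
Sample RunWorkload(const std::vector<Sample>& buf, int passes)
{
  Sample acc = 0;
  for (int p = 0; p < passes; ++p)
    for (Sample x : buf)
      acc = acc * Sample(0.5) + x;
  return acc;
}

// Times the workload for one sample type and returns elapsed seconds.
// The result is written to *result so the compiler cannot discard the loop.
template <typename Sample>
double TimeWorkload(std::size_t frames, int passes, Sample* result)
{
  std::vector<Sample> buf(frames, Sample(0.25));
  const auto t0 = std::chrono::steady_clock::now();
  *result = RunWorkload(buf, passes);
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling TimeWorkload<float> and TimeWorkload<double> with the same arguments, in an optimized build and over several runs, gives a rough comparison. A profiler is still the better tool for finding where the time actually goes.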

I use floats to store e.g. large wave tables, because of the lesser memory usage, but I use doubles for everything else.

maybe i need to learn assembler.
I know a little assembler, but thus far I haven't been able to beat Microsoft's C++ compiler (with optimizations enabled) when it comes to performance.

schwa
01-22-2011, 06:12 AM
I agree with the above comments that float vs double processing is unlikely to be a significant performance issue, but if you want to implement VST processReplacing directly, you will need to re-implement the float versions of these functions: AttachInputBuffers, AttachOutputBuffers, and ProcessBuffers. You will need new versions of the inchannel/outchannel containers that can hold float buffers, and those float functions can be implemented just like the double versions, but using the float buffer containers.
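
As a rough, standalone illustration of that idea (not IPlug's actual code), the buffer containers and the processing routine can be templated on the sample type so that the float and double paths share one implementation. ChannelBuffers, Attach and ProcessBlock below are hypothetical names invented for this sketch, and the gain stage stands in for real DSP.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container: one pointer per channel for a given sample type,
// playing the role of the inchannel/outchannel containers mentioned above.
template <typename Sample>
struct ChannelBuffers
{
  std::vector<Sample*> ptrs;

  // Remember the host's channel pointers for the current block.
  void Attach(Sample** data, int nChans)
  {
    ptrs.assign(data, data + nChans);
  }
};

// The DSP is written once against Sample; a float entry point and a double
// entry point can both forward to it without any conversion pass.
template <typename Sample>
void ProcessBlock(const ChannelBuffers<Sample>& in, ChannelBuffers<Sample>& out,
                  int nFrames, Sample gain)
{
  const std::size_t nChans =
      in.ptrs.size() < out.ptrs.size() ? in.ptrs.size() : out.ptrs.size();
  for (std::size_t c = 0; c < nChans; ++c)
    for (int s = 0; s < nFrames; ++s)
      out.ptrs[c][s] = in.ptrs[c][s] * gain; // placeholder for the real DSP
}
```

In this arrangement a processReplacing-style callback would attach the host's float** buffers to ChannelBuffers<float>, and a processDoubleReplacing-style callback would do the same with ChannelBuffers<double>.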

schwa
01-22-2011, 06:15 AM
If your performance badness is specific to OSX, the problem could be struct alignment. OSX is unforgiving of floats/doubles falling on anything but 8-byte boundaries. For example this will raise an exception on OSX every time you do floating-point math with Myclass::myvar:

class Myclass
{
  void Myfunc();
  double myvar;
};

This can be easily fixed by using the WDL_FIXALIGN macro in wdltypes.h (see other places in WDL for example usage).
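
For comparison, standard C++11 can request and verify the same kind of alignment with alignas and alignof. The sketch below is a stand-in that uses only the standard language, not the WDL_FIXALIGN macro itself (see wdltypes.h for how that is defined and applied); AlignedVoice is a hypothetical class.

```cpp
#include <cstddef>

// Hypothetical class: alignas(8) asks the compiler to place every instance
// on an 8-byte boundary, so the double member is never misaligned.
struct alignas(8) AlignedVoice
{
  void Reset() { phase = 0.0; }
  double phase = 0.0;
};

// Compile-time checks: the build fails if the layout is not what we expect.
static_assert(alignof(AlignedVoice) >= 8, "AlignedVoice must be 8-byte aligned");
static_assert(offsetof(AlignedVoice, phase) % 8 == 0, "phase must sit on an 8-byte offset");
```

The static_assert lines cost nothing at run time and catch a layout change as soon as the class is edited.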

olilarkin
01-22-2011, 06:25 AM
thanks for the info. that applies to a huge number of my classes. I guess i'll try it on windows first and see how it compares

oli

Tale
01-22-2011, 03:27 PM
by the way, can anyone recommend a free profiler on windows?
I have tried Very Sleepy (http://www.codersnotes.com/sleepy) for the first time earlier today, which seems to do the job.