04-07-2012, 04:03 PM   #2
liteon
Human being with feelings
 
Join Date: Apr 2008
Posts: 510
soft-float performance

this is a bit of a side note:

i was curious how software floating point performs in comparison to hardware, so i had to run some tests. not having a real ARM device, and only being able to run in a simulator or a VM, the only adequate way in my case to get at least somewhat accurate measurements was to see what happens when x86 handles optimized software floating point and draw some conclusions from that.

instead of looking for the GNU build of their soft-float library, i wrote a quick version of floating point addition that takes into consideration everything the FPU might do, such as checking for NaN and infinity, and using round to nearest as the default rounding mode. in the actual measurement code i've used some compensation trickery to cancel out any small deviations caused by compiler optimizations, pipelining or OOE (out-of-order execution, if that is even possible). this is greatly simplified on a single-core x86 with the TSC (time stamp counter), if you can get the OS into a passive mode.
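to give an idea of what such a routine involves, here is a minimal sketch of a single-precision software add with NaN/infinity checks and round to nearest (ties to even). this is not the test code from this post: the names soft_fadd, soft_fadd_bits, f2u, u2f and shr_sticky are made up for illustration, and it assumes float is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret float <-> bits via memcpy (assumes float is IEEE 754 binary32). */
static uint32_t f2u(float f) { uint32_t u; assert(sizeof f == sizeof u); memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Shift right by n, ORing any bits shifted out into bit 0 (the "sticky" bit). */
static uint32_t shr_sticky(uint32_t x, uint32_t n) {
    if (n == 0) return x;
    if (n >= 32) return x != 0;
    return (x >> n) | ((x << (32 - n)) != 0);
}

/* Hypothetical soft-float add working on raw binary32 bit patterns. */
uint32_t soft_fadd_bits(uint32_t a, uint32_t b) {
    uint32_t abs_a = a & 0x7FFFFFFFu, abs_b = b & 0x7FFFFFFFu;

    /* NaN in either operand: return it as a quiet NaN. */
    if (abs_a > 0x7F800000u) return a | 0x00400000u;
    if (abs_b > 0x7F800000u) return b | 0x00400000u;
    /* Infinities: inf + (-inf) is NaN, otherwise the infinity wins. */
    if (abs_a == 0x7F800000u)
        return (abs_b == 0x7F800000u && a != b) ? 0x7FC00000u : a;
    if (abs_b == 0x7F800000u) return b;

    /* Order the operands so that |a| >= |b|; the result takes a's sign. */
    if (abs_a < abs_b) { uint32_t t = a; a = b; b = t; }
    uint32_t sign = a & 0x80000000u;
    int subtract = (int)((a ^ b) >> 31);

    int32_t ea = (a >> 23) & 0xFF, eb = (b >> 23) & 0xFF;
    uint32_t ma = a & 0x7FFFFFu, mb = b & 0x7FFFFFu;
    /* Make the implicit leading 1 explicit; subnormals use exponent 1. */
    if (ea) ma |= 0x800000u; else ea = 1;
    if (eb) mb |= 0x800000u; else eb = 1;

    /* Keep 3 extra low bits (guard, round, sticky) and align b to a. */
    ma <<= 3;
    mb = shr_sticky(mb << 3, (uint32_t)(ea - eb));

    uint32_t m = subtract ? ma - mb : ma + mb;
    if (m == 0) return subtract ? 0u : sign;   /* x + (-x) = +0 */

    int32_t e = ea;
    if (m & (1u << 27)) { m = shr_sticky(m, 1); e++; }      /* carry out */
    if (e >= 0xFF) return sign | 0x7F800000u;               /* overflow */
    while (e > 1 && !(m & (1u << 26))) { m <<= 1; e--; }    /* cancellation */

    /* Round to nearest, ties to even, using the 3 extra bits. */
    uint32_t grs = m & 7u;
    m >>= 3;
    if (grs > 4 || (grs == 4 && (m & 1u))) m++;

    /* Adding m to (e - 1) << 23 lets a set leading bit, or a carry from
       rounding, bump the exponent field (up to infinity if needed). */
    return sign | (((uint32_t)(e - 1) << 23) + m);
}

float soft_fadd(float x, float y) {
    return u2f(soft_fadd_bits(f2u(x), f2u(y)));
}
```

calling soft_fadd(1.5f, -1.0f) goes through the alignment, subtraction and renormalization steps. the number of branches involved is a good hint at why an unoptimized build takes so many more cycles than a single FADD.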

the results are:
no test code - ~0 cycles
x87 FADD - ~24 cycles
SOFT-FADD - ~140 cycles
SOFT-FADD with -O3 - ~40 cycles
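
cycle counts like these could be collected roughly as follows. this is only a sketch of the general idea (time an empty test, time the real one, subtract), not the actual measurement code: read_ticks, min_ticks, net_ticks and the test functions are hypothetical names, __rdtsc() is the GCC/Clang intrinsic from x86intrin.h, and the clock() fallback exists only so the sketch compiles on non-x86 targets.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the x86 time stamp counter. Note: no serializing instruction is
   used here, so out-of-order execution can still blur single readings. */
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Fallback for other targets (assumption: much coarser than the TSC). */
static uint64_t read_ticks(void) { return (uint64_t)clock(); }
#endif

typedef void (*test_fn)(void);

/* Smallest tick count seen over `runs` timed calls of fn. Taking the
   minimum filters out runs disturbed by interrupts or cache misses. */
static uint64_t min_ticks(test_fn fn, int runs) {
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = read_ticks();
        fn();
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best) best = t1 - t0;
    }
    return best;
}

/* Baseline: measures only the call and timer overhead ("no test code"). */
static void empty_test(void) {}

/* volatile keeps the compiler from optimizing the addition away. */
volatile float sink = 1.0f;
void hw_fadd_test(void) { sink = sink + 1.5f; }

/* Net cost of fn: its best time minus the empty baseline, clamped at 0. */
uint64_t net_ticks(test_fn fn, int runs) {
    uint64_t base = min_ticks(empty_test, runs);
    uint64_t t = min_ticks(fn, runs);
    return t > base ? t - base : 0;
}
```

something like net_ticks(hw_fadd_test, 1000) would then give the hardware figure, and a wrapper around the software add the other ones. the absolute values depend heavily on the CPU, the compiler flags and how quiet the OS is.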

GCC -O3 does a great job optimizing the function into something that might be considered "difficult to follow" x86 assembly (not that x86 normally is), but the performance is excellent: going by the numbers above, the software version drops from roughly 6 times slower than the x87 FADD to under 2 times slower. while these numbers will be completely different on ARM CPUs (and overall the code will be much slower), i don't think i can confirm that hardware floating point arithmetic is thousands of times faster than software, a claim i took from various small articles and more explicit hardware documentation. i would speculate a 10-30 times faster execution for VFP's FADD over an unoptimized software version on ARM.

if anyone is interested, i can post the test code.

p.s.
i managed to fry something on my MB/AGP port, so currently my graphics card only runs in VGA mode, but i guess i will slowly continue the ARM port once i have a better platform to work on (unfortunately this affects my job as well). to my surprise, watching a low-res "modern" video in a native player and low-res flash (e.g. youtube) works ok even without hardware acceleration and high AGP transfer rates.
