Old 03-21-2012, 01:14 AM   #1
liteon
Human being with feelings
 
Join Date: Apr 2008
Posts: 510
ARM port of EEL2

following this thread:
http://forum.cockos.com/showthread.php?t=98337

i did some work on giving this a quick go...

here are the initial results:
https://github.com/neolit123/wdl/commits/eel2-arm

the glue code took most of my time, as i first had to understand what (the hell?) is going on in there. there are some major problems when function calls are made, for example when calling something from libc inside the "virtual machine". my current solution, which is basically passing an address table around in assembly, may provoke some facepalm-like gestures in certain developers.
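To illustrate the address-table idea in plain C (this is only a sketch, not the code from the branch): the generated code never holds the address of an external function itself; it receives the base of a table and calls through a fixed slot. The names `eel_fn1`, `fn_table`, `generated_stub` and the two helpers are hypothetical, and in the real port the slots would presumably point at libc/libm routines and the stub would be emitted ARM assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature shared by every callable in the table. */
typedef double (*eel_fn1)(double);

/* Stand-ins for external routines (assumption: the real table would
   hold libc/libm entry points instead). */
static double helper_sqr(double x) { return x * x; }
static double helper_neg(double x) { return -x; }

enum { FN_SQR, FN_NEG, FN_COUNT };

/* The address table that gets handed to the generated code. */
static const eel_fn1 fn_table[FN_COUNT] = { helper_sqr, helper_neg };

/* Plays the role of a generated code block: it only knows the table
   base and a slot index, never an absolute function address. */
static double generated_stub(const eel_fn1 *table, size_t slot, double arg)
{
    return table[slot](arg);
}

/* Host-side entry point: validates the slot, then runs the stub. */
double call_via_table(size_t slot, double arg)
{
    assert(slot < FN_COUNT);
    return generated_stub(fn_table, slot, arg);
}
```

The indirection costs one extra load per call, but it keeps the generated code position independent with respect to the host's functions.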

mind that this is a soft-float port to ARM, which will run slowly, but on pretty much everything. a VFP version could possibly be branched out in the same build, while FPA and FPE do not make much sense to implement in my opinion, since their support is minimal (afaik).

only some basic operators and functions are implemented at this point, but the semantics are in place.

test:

Code:
ret = 3.1415926535897932384626433832795;
ret = (sqr(ret - 3.0) / 2 + 1.5)*ret;
ret = (sqr(ret - 3.0) / 2 + 1.5)*ret;
ret = (sqr(ret - 3.0) / 2 + 1.5)*ret;

// goes something like
ret = 3.14159265358979323
ret = 4.7438810584205937
ret = 14.3291800878725635
ret = 941.0712054248509730
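For cross-checking the numbers above on a desktop machine, the same recurrence can be written in C. This is a small sketch that assumes `sqr(x)` means `x * x` (which is consistent with the listed results); `eel_test_step` and `eel_test_run` are made-up names, not part of EEL2.

```c
/* One iteration of the test expression:
   ret = (sqr(ret - 3.0) / 2 + 1.5) * ret
   Assumption: sqr() is the square, not the square root. */
double eel_test_step(double ret)
{
    double d = ret - 3.0;
    return (d * d / 2.0 + 1.5) * ret;
}

/* Applies the step n times, starting from pi as in the test script. */
double eel_test_run(int n)
{
    double ret = 3.1415926535897932384626433832795;
    int i;
    for (i = 0; i < n; i++)
        ret = eel_test_step(ret);
    return ret;
}
```

Calling `eel_test_run(1)`, `eel_test_run(2)` and `eel_test_run(3)` should land close to the three values listed above; the last digits may differ between a hardware FPU and a soft-float build.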
Old 04-07-2012, 04:03 PM   #2
liteon
Human being with feelings
 
Join Date: Apr 2008
Posts: 510
soft-float performance

this is a bit of a side note:

i was curious about how software floating point performs in comparison to hardware, so i had to run some tests. since i don't have a real ARM device and would otherwise be running in either a simulator or a VM, the only adequate way to get at least somewhat accurate measurements in my case was to see what happens when x86 handles optimized software floating point and draw some conclusions from that.

instead of looking for the GNU build of their soft-float library, i wrote a quick version of floating point addition that takes into consideration everything that the FPU might do, such as checking for NaN and infinity, and rounding to nearest as the default rounding mode. i've used some compensation trickery in the actual measurement code to cancel out any small deviations caused by compiler optimizations, pipelining or out-of-order execution (if that is even possible). this is greatly simplified on a single core x86 with the TSC if you can get the OS into a passive mode.
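The test code itself was not posted, so the following is only a minimal sketch of what such a software addition can look like, written for IEEE 754 single precision and operating purely on integer bit patterns. It handles NaN, infinity, signed zero and subnormals, and rounds to nearest (ties to even) using three extra low bits (guard, round, sticky). All names (`soft_fadd`, `shr_sticky`, `f2u`, `u2f`, `soft_fadd_f`) are hypothetical, and it does not raise exception flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its binary32 bit pattern and back. */
uint32_t f2u(float f) { uint32_t u; assert(sizeof f == sizeof u); memcpy(&u, &f, sizeof u); return u; }
float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Shift right by n, OR-ing every bit shifted out into bit 0 ("sticky"). */
static uint32_t shr_sticky(uint32_t v, int n)
{
    if (n == 0) return v;
    if (n >= 32) return v != 0;
    return (v >> n) | ((v & ((1u << n) - 1u)) != 0);
}

/* a + b on raw binary32 bit patterns, round to nearest even. */
uint32_t soft_fadd(uint32_t a, uint32_t b)
{
    uint32_t sa = a >> 31, sb = b >> 31;
    int ea = (int)((a >> 23) & 0xFF), eb = (int)((b >> 23) & 0xFF);
    uint32_t ma = a & 0x7FFFFFu, mb = b & 0x7FFFFFu;
    uint32_t m, s, round;
    int e;

    /* NaN and infinity: quiet any NaN, inf + (-inf) gives the default NaN. */
    if (ea == 0xFF) {
        if (ma) return a | 0x400000u;
        if (eb == 0xFF && (mb || sa != sb)) return mb ? (b | 0x400000u) : 0x7FC00000u;
        return a;
    }
    if (eb == 0xFF) return mb ? (b | 0x400000u) : b;

    /* Add the implicit leading 1; subnormals use exponent 1 without it. */
    if (ea) ma |= 0x800000u; else ea = 1;
    if (eb) mb |= 0x800000u; else eb = 1;
    ma <<= 3; mb <<= 3;               /* room for guard/round/sticky bits */

    /* Make a the operand with the larger exponent, then align b to it. */
    if (ea < eb) {
        uint32_t t; int te;
        t = ma; ma = mb; mb = t;
        t = sa; sa = sb; sb = t;
        te = ea; ea = eb; eb = te;
    }
    mb = shr_sticky(mb, ea - eb);
    e = ea;

    /* Same signs add magnitudes, different signs subtract the smaller. */
    if (sa == sb)      { m = ma + mb; s = sa; }
    else if (ma >= mb) { m = ma - mb; s = sa; }
    else               { m = mb - ma; s = sb; }
    if (m == 0) return (sa == sb) ? (sa << 31) : 0;   /* x + (-x) = +0 */

    /* Normalize so the leading 1 sits at bit 26 (or stay subnormal). */
    if (m >= (1u << 27)) { m = shr_sticky(m, 1); e++; }
    while (m < (1u << 26) && e > 1) { m <<= 1; e--; }

    /* Round to nearest, ties to even, on the three extra bits. */
    round = m & 7u;
    m >>= 3;
    if (round > 4u || (round == 4u && (m & 1u))) m++;
    if (m == (1u << 24)) { m >>= 1; e++; }

    if (e >= 0xFF) return (s << 31) | 0x7F800000u;    /* overflow to inf */
    if (m < 0x800000u) e = 0;                         /* subnormal result */
    return (s << 31) | ((uint32_t)e << 23) | (m & 0x7FFFFFu);
}

/* Convenience wrapper taking and returning float. */
float soft_fadd_f(float a, float b) { return u2f(soft_fadd(f2u(a), f2u(b))); }
```

A quick sanity check is to compare `soft_fadd(f2u(x), f2u(y))` against `f2u(x + y)` for a range of inputs on a machine with a hardware FPU. Even this small version shows why the unoptimized cycle count is so much higher than a single FADD: the alignment, normalization and rounding steps all become explicit branches and shifts.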

the results are:
no test code - ~0 cycles
x87 FADD - ~24 cycles
SOFT-FADD - ~140 cycles
SOFT-FADD with -O3 - ~40 cycles

GCC -O3 does a great job optimizing the function into something that might be considered "difficult to follow" x86 assembly (not that x86 normally is), but the performance is excellent. while these numbers will be completely different on an ARM CPU (and overall the code will be much slower), i cannot confirm that hardware floating point arithmetic is thousands of times faster than software, a claim i took from various small articles and more explicit hardware documentation. i would speculate a 10-30 times faster execution for VFP's FADD over an unoptimized software version on ARM.

if someone is interested i can post the test code.

p.s.
i managed to fry something on my MB/AGP port, so currently my graphics card only runs in VGA mode, but i guess i will slowly continue the ARM port once i have a better platform to work on (unfortunately this affects my day job as well). to my surprise, watching a low-res "modern" video in a native player and low-res flash (e.g. youtube) works ok even without hardware acceleration and high AGP transfer rates.

Old 04-11-2012, 01:49 PM   #3
IXix
Human being with feelings
 
Join Date: Jan 2007
Location: mcr:uk
Posts: 3,889

Go on!
Old 05-08-2012, 03:03 PM   #4
liteon
Human being with feelings
 
Join Date: Apr 2008
Posts: 510

here is an initial merge into the refactored eel2. it compiles, but has to be adapted/tested later on:
http://github.com/neolit123/wdl/comm...3f57d229e22ddb

the github diff does not have ignore-* i believe.

Old 05-23-2012, 12:08 PM   #5
Justin
Administrator
 
Join Date: Jan 2005
Location: NYC
Posts: 15,721

Very cool! I'm about to push some new EEL changes online, including a bytecode interpreted mode (that is portable)... Now I'm tempted to go find a Raspberry Pi to help port the native ARM version (with FPU I hope?). Sorry if all of our EEL changes cause merge hell :/

Last edited by Justin; 05-23-2012 at 12:15 PM.
Old 05-24-2012, 12:51 AM   #6
Tale
Human being with feelings
 
Join Date: Jul 2008
Location: The Netherlands
Posts: 3,646

We have a Raspberry Pi at work, but I haven't had a chance to play with it yet.
Old 05-24-2012, 09:31 PM   #7
liteon
Human being with feelings
 
Join Date: Apr 2008
Posts: 510

Quote:
Originally Posted by Justin View Post
Very cool! I'm about to push some new EEL changes online, including a bytecode interpreted mode (that is portable)... Now I'm tempted to go find a Raspberry Pi to help port the native ARM version (with FPU I hope?). Sorry if all of our EEL changes cause merge hell :/
no problem,
there isn't much trouble merging, really...

the CPU in the Raspberry Pi is a bit outdated (ARM1176JZF-S), but it has a VFP unit and is good enough for development. i wanted to get soft-float support in because, unlike the x87, which will probably be there for quite some time, ARM might decide to deprecate the VFP unit at some point to save die space (and thus force use of the newer NEON SIMD only, or come up with something else). there are a lot of ARM CPUs that have different floating point logic and are simply not compatible with each other (VFP, NEON, FPA, FPE).

if we neglect that, the VFP control word has a field that puts the co-processor into scalar mode which is suitable for EEL2, i think.
https://www.scss.tcd.ie/~waldroj/3d1/arm_arm.pdf
page 885.
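As a rough illustration of the scalar-mode setting, here is a sketch that only manipulates the bits of a control word value in plain C. It assumes the control word meant here is the VFP FPSCR, with the vector length in the LEN field (bits 18:16, encoded as length minus one) and the STRIDE field in bits 21:20, so that LEN = 0 selects scalar operation; please check the field positions against the manual linked above. The function names are made up, and actually reading or writing the register on hardware would need the FMRX/FMXR instructions, which are not shown.

```c
#include <stdint.h>

/* Assumed FPSCR field layout (verify against the ARM ARM). */
#define FPSCR_LEN_SHIFT    16
#define FPSCR_LEN_MASK     (0x7u << FPSCR_LEN_SHIFT)     /* vector length - 1 */
#define FPSCR_STRIDE_SHIFT 20
#define FPSCR_STRIDE_MASK  (0x3u << FPSCR_STRIDE_SHIFT)  /* vector stride */

/* Returns the control word with vector mode switched off (LEN = 0).
   STRIDE has no effect in scalar mode; it is cleared for tidiness.
   All other bits (rounding mode, flags, ...) are left untouched. */
uint32_t fpscr_set_scalar(uint32_t fpscr)
{
    return fpscr & ~(FPSCR_LEN_MASK | FPSCR_STRIDE_MASK);
}

/* Vector length currently encoded in the control word (1..8). */
unsigned fpscr_vector_len(uint32_t fpscr)
{
    return (unsigned)((fpscr & FPSCR_LEN_MASK) >> FPSCR_LEN_SHIFT) + 1u;
}

/* Non-zero when every VFP data-processing instruction is scalar. */
int fpscr_is_scalar(uint32_t fpscr)
{
    return fpscr_vector_len(fpscr) == 1u;
}
```

A JIT like EEL2 would presumably set this once on entry (read, modify, write back) and restore the caller's value on exit, rather than touching the register around every operation.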

the register exchange (CPU-COP) and overall the instruction sets are pretty straightforward.

i wouldn't consider working on a mobile device, unless it's possible to attach a real monitor, mouse and keyboard to it. also, i don't think serious programmers can be convinced that Android or iOS are better than something like Debian for development.

for the sake of running on a mobile device i did run a previous build of EEL2 on an Android phone, but then the build broke at some point. :\
