27. December, 2007
by Michael Neumann

Over the last couple of days I’ve been improving the performance of my neural net simulator, which I want to embed in Ruby for easier scripting. The pure C++ version is very fast, but as soon as I embed it into Ruby, performance drops significantly. Here are some numbers:

20.98 seconds, pure C++
24.16 seconds, C++ embedded in Ruby
26.16 seconds, C++ embedded in Ruby, with the C++ objects wrapped in Ruby objects
36.90 seconds, C++ embedded in Ruby, with the loading code written in Ruby (implies wrapping)

Note that in the last case the net is loaded by code written in Ruby. Loading itself takes under 1 second, so it cannot account for the extra time!
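To put these timings in relative terms, here is a small C++ helper that computes each setup’s slowdown against the pure C++ baseline of 20.98 seconds. The names `Timing`, `kTimings`, `slowdown_percent` and `print_report` are made up for this sketch; only the numbers come from the measurements. They work out to roughly 15%, 25% and 76% slower than pure C++, respectively.

```cpp
#include <cassert>
#include <cstdio>

// Wall-clock timings as measured in this post, in seconds.
struct Timing {
    const char* setup;
    double seconds;
};

constexpr double kPureCppSeconds = 20.98;

constexpr Timing kTimings[] = {
    {"C++ embedded in Ruby", 24.16},
    {"... plus C++ objects wrapped in Ruby objects", 26.16},
    {"... plus loading code written in Ruby", 36.90},
};

// How much slower `seconds` is than the baseline, in percent.
double slowdown_percent(double seconds, double baseline = kPureCppSeconds) {
    assert(baseline > 0.0);  // a non-positive baseline makes the ratio meaningless
    return (seconds / baseline - 1.0) * 100.0;
}

// Prints one line per setup with its absolute time and relative slowdown.
void print_report() {
    for (const Timing& t : kTimings) {
        std::printf("%-46s %6.2f s  +%4.1f%%\n", t.setup, t.seconds,
                    slowdown_percent(t.seconds));
    }
}
```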

So why does the same code run slower just because it is embedded in Ruby? At first I thought it had to do with compiler flags like -fPIC or -shared. But no, that’s not the reason!

My latest theory is memory fragmentation: loading a net in Ruby allocates a lot of objects, which makes further memory allocations slower. That’s the only explanation I can think of right now. It would be nice to have two separate memory areas, one for Ruby and one for my C++ application, so that fragmenting one doesn’t hurt the other.

My C++ application has to allocate a lot of memory for its priority queues, mostly to grow them so they can hold more entries. That’s the only downside of the priority queues I use: the underlying array has to be expanded.

Maybe I’ll switch to a more advanced data structure, something I found under the name DSplay. A DSplay combines three structures: a splay tree that holds the near-term items, a calendar queue covering one “year” (one full pass over its buckets) for the medium-term items, and a linked list for all the “far future” items. The good thing about this combination is that it doesn’t need much allocation; a free list allocator is enough. Of course I’d use an implicit heap instead of the splay tree, which IMO would make it a lot faster.
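Here is a minimal sketch of the kind of free list allocator that could serve the nodes of such a queue, assuming fixed-size, trivially destructible nodes. `EventNode` and `FreeListAllocator` are hypothetical names, not taken from the simulator. Nodes are carved out of large chunks, and released nodes go back onto an intrusive free list for reuse, so in steady state enqueueing and dequeueing perform no general-purpose allocation. The chunks themselves still come from the global heap (the one shared with Ruby), but only once per `ChunkSize` nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical node for the "far future" linked list of a DSplay-like queue.
struct EventNode {
    double time;      // scheduled firing time
    int target;       // index of the receiving neuron
    EventNode* next;  // next node in the list
};

// Fixed-size free list allocator: memory is obtained in chunks of ChunkSize
// slots, and released slots are threaded onto a singly linked free list.
template <typename T, std::size_t ChunkSize = 4096>
class FreeListAllocator {
    static_assert(ChunkSize > 0, "ChunkSize must be positive");
    // Live objects are not destroyed when the allocator goes away.
    static_assert(std::is_trivially_destructible<T>::value,
                  "sketch assumes trivially destructible nodes");

    // A slot holds either a live T or, while free, a pointer to the next free slot.
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

public:
    FreeListAllocator() = default;
    FreeListAllocator(const FreeListAllocator&) = delete;
    FreeListAllocator& operator=(const FreeListAllocator&) = delete;

    // Constructs a T in a free slot; touches the global heap only when
    // the free list is empty.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_ == nullptr) grow();
        Slot* slot = free_;
        free_ = slot->next;
        return ::new (static_cast<void*>(slot->storage)) T{std::forward<Args>(args)...};
    }

    // Destroys the T and pushes its slot onto the front of the free list.
    void destroy(T* p) {
        assert(p != nullptr);
        p->~T();
        Slot* slot = reinterpret_cast<Slot*>(p);
        slot->next = free_;
        free_ = slot;
    }

    // Number of chunks requested from the global heap so far.
    std::size_t chunks() const { return chunks_.size(); }

private:
    // Allocates one chunk and threads all of its slots onto the free list.
    void grow() {
        chunks_.push_back(std::make_unique<Slot[]>(ChunkSize));
        Slot* chunk = chunks_.back().get();
        for (std::size_t i = 0; i < ChunkSize; ++i) {
            chunk[i].next = free_;
            free_ = &chunk[i];
        }
    }

    Slot* free_ = nullptr;
    std::vector<std::unique_ptr<Slot[]>> chunks_;
};
```

Typical use is `FreeListAllocator<EventNode> nodes; EventNode* e = nodes.create(12.5, 42, nullptr);` followed later by `nodes.destroy(e);`. Freed slots are reused in LIFO order, so a node that is released and immediately requested again gets the same memory back.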