A few minutes ago I announced version 1.0.0 of Cplus2Ruby. I use it extensively in the pulsed neural net simulator called Yinspire that I am currently developing.
Have you ever wondered why operating system kernels are compiled without full compiler optimizations turned on? The reason is that it's impossible to implement proper threading as a C/C++ library, because the execution model of C/C++ is single-threaded. If the compiler doesn't know about threads, it might optimize a global variable access into a register access, and if another thread then writes to that global variable, we have a race. That's just one example of what can happen. A good introduction to the problem is given here (in German), or in the video Getting C++ Threads Right by Hans Boehm.
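To make that race concrete, here is a minimal C++ sketch (the function names are mine, purely for illustration). With a plain int flag, a thread-unaware optimizer may hoist the load out of the wait loop, so the loop spins on a register copy forever. The atomics that C++0x eventually standardized (std::atomic in C++11) force a real memory access with the required ordering:

```cpp
#include <atomic>
#include <thread>

// With a plain `int done`, the compiler is free to turn the wait loop below
// into `if (done == 0) for (;;);` because nothing in the single-threaded
// execution model can change `done`.  std::atomic forbids that optimization.
std::atomic<int> done{0};

void worker() {
    done.store(1);            // signal the waiting thread
}

int wait_for_worker() {
    done.store(0);
    std::thread t(worker);
    while (done.load() == 0)
        ;                     // a real load on every iteration, by contract
    t.join();
    return done.load();
}
```

Without the atomic, whether this terminates depends entirely on optimization flags, which is exactly why a library-only threading story is not enough.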
The next version of C++, called C++0x, will include facilities to overcome this limitation. That would be a nice addition to the C standard as well, since C is used quite extensively in systems programming, where this matters even more.
What I am dreaming of is a combination of features from UPC (unified message passing/shared memory), Cyclone (abstract data types, region analysis, fat pointers), C (performance) and some concepts from D (modules, templates, closures). I’d call this language ASYL - Advanced Systems Language.
The last couple of days I've spent improving the performance of my neural net simulator, which I wanted to embed in Ruby (for easier scripting). The pure C++ version is very fast, but once I embed it into Ruby, performance drops significantly. Here are some numbers:
- 20.98 seconds: pure C++
- 24.16 seconds: C++ embedded in Ruby
- 26.16 seconds: C++ embedded in Ruby, with C++ objects wrapped in Ruby objects
- 36.90 seconds: C++ embedded in Ruby, with the loading code written in Ruby (implies wrapping)
Note that in the last case the loading code is written in Ruby, yet the time spent loading the net is not significant (under one second)!
So how come the same code runs slower when it's embedded in Ruby? At first I thought it had to do with compiler flags like -fPIC or -shared. But no, that's not the reason!
My latest theory is memory fragmentation. Loading a net in Ruby allocates a lot of objects, making further memory allocations slower. That's the only explanation I can think of right now. It would be nice to have two separate memory arenas, one for Ruby and one for my C++ application, so that fragmenting one doesn't hurt the other.

In my C++ application I have to allocate a lot of memory for the priority queues, mainly to expand them so they can hold more entries. That's the real downside of the priority queues I use: the need to grow an array. Maybe I should use a more advanced data structure, something I found under the name DSplay. A DSplay combines three structures: a splay tree that holds the soonest items, a calendar queue (covering one year) for the medium-term items, and a linked list for all the "far future" items. The nice thing is that this combination doesn't need much allocation; a free-list allocator is enough. Of course I'd use an implicit heap instead of the splay tree, which would make it a lot faster IMO.
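Since I haven't seen actual DSplay code, here is only a rough C++ sketch of that three-tier idea. All the names and bucket parameters are my own assumptions, and std::multiset stands in for the splay tree (or implicit heap):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <set>
#include <vector>

// Three-tier event queue sketch:
//   near:     sorted structure for the soonest events (multiset as a
//             stand-in for the splay tree / implicit heap)
//   calendar: fixed buckets covering one "year" of simulated time
//   far:      unsorted list for events beyond the current year
struct TieredQueue {
    static constexpr int kBuckets = 8;
    double year_start = 0.0;
    double year_len   = 8.0;     // each bucket spans year_len / kBuckets
    int    cur        = 0;       // index of the current bucket
    std::size_t count = 0;

    std::multiset<double> near;
    std::vector<std::list<double>> calendar;
    std::list<double> far;

    TieredQueue() : calendar(kBuckets) {}

    double bucket_width() const { return year_len / kBuckets; }
    double bucket_end(int b) const { return year_start + (b + 1) * bucket_width(); }

    void push(double t) {
        ++count;
        if (t >= year_start + year_len)
            far.push_back(t);                 // beyond this year: no sorting at all
        else if (t < bucket_end(cur))
            near.insert(t);                   // due soon: keep sorted
        else
            calendar[int((t - year_start) / bucket_width())].push_back(t);
    }

    double pop_min() {
        assert(count > 0);
        while (near.empty()) {
            if (++cur >= kBuckets) {          // year exhausted: roll over
                year_start += year_len;
                cur = 0;
                std::list<double> pending;
                pending.swap(far);
                count -= pending.size();      // push() re-counts them
                for (double t : pending) push(t);
            } else {                          // drain the next bucket into `near`
                for (double t : calendar[cur]) near.insert(t);
                calendar[cur].clear();
            }
        }
        double t = *near.begin();
        near.erase(near.begin());
        --count;
        return t;
    }
};
```

The appeal for my use case is that the calendar buckets and the far list are plain linked lists, so a free-list allocator covers them, and nothing ever needs the array-doubling reallocations that fragment the heap.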
As a C programmer I often used the int type without thinking much about whether the value could actually be negative. Take indices as an example: they usually can't be negative, yet I regularly use int for them (int is signed by default).
From my old assembly days I know an optimization trick which comes into play when you want to divide by a power of two. This can be optimized by using a bit shift right operation. In the same way you can speed up multiplications by using bit shift left.
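As a quick sanity check, the equivalence (and the caveat for negative signed values, which is exactly what forces the compiler to emit extra fix-up code) can be sketched like this:

```cpp
// Shifting vs. dividing/multiplying by powers of two.
// For unsigned values the two forms are identical.  For negative signed
// values they differ: C/C++ division truncates toward zero, while an
// arithmetic right shift rounds toward minus infinity (and right-shifting
// a negative value is implementation-defined before C++20 anyway).
unsigned int half_u(unsigned int x) { return x >> 1; }  // same as x / 2
unsigned int times8(unsigned int x) { return x << 3; }  // same as x * 8
int          half_s(int x)          { return x / 2;  }  // needs fix-up code when x < 0
```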
In my code I always write “x / 2” instead of “x >> 1” for readability reasons, and I always hoped the compiler was clever enough to do this optimization on its own. Is it really? Let's see what the compiler generates:
```
; %eax contains the value of "i"

; C code: (int)i / 2
movl %eax, %edx
shrl $31, %edx
addl %edx, %eax
sarl %eax

; C code: (unsigned int)i / 2
shrl %eax
```
You can see that in both cases the gcc compiler generates a bit shift operation instead of an expensive division. But for signed integers it generates three more instructions than for unsigned integers: the extra code adjusts the result of the arithmetic shift so that the division rounds toward zero for negative values, as C requires. While I couldn't measure the performance improvement in my algorithms (memory is the bottleneck there), I now use unsigned int whenever possible.
Every real programmer should definitely know C. It's an (arguably) nice, small, and certainly understandable language which runs on most if not all machines (or at least a cross-compiler exists). Great operating systems like Linux or BSD are written in C. Languages like Ruby or Python are implemented in C as well. So what about C++?
Well, C++ is a really big beast! It has a lot of features not found in C:
- Strong(er) typing (I think C99 has this as well)
- Classes and namespaces
But as I said, C++ is big, and I've heard it said that there is not a single compiler around that implements every little detail of C++. If not even one compiler fully supports C++, how can any one person understand every little detail of it? Except Bjarne of course, the creator :).
While C++ has a lot of features, it misses something very important: A garbage collector! That’s an absolute MUST HAVE for any real application.
Another big annoyance, in my opinion, is that in C++ you usually split each class into a header (.h) and an implementation file (.cc), which doesn't help readability.
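For illustration, the typical split looks like this (a made-up Point class, with both "files" shown in one listing):

```cpp
// point.h -- the interface; every user of the class includes this
#ifndef POINT_H
#define POINT_H

class Point {
public:
    Point(double x, double y);
    double length() const;      // declared here ...
private:
    double x_, y_;
};

#endif // POINT_H

// point.cc -- the implementation, in a separate file
#include <cmath>

Point::Point(double x, double y) : x_(x), y_(y) {}

double Point::length() const {  // ... but defined way over there
    return std::sqrt(x_ * x_ + y_ * y_);
}
```

Reading a class means jumping between two files, and every signature is written twice.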
If you use Cplus2Ruby you get everything that C++ can do, with plain C and Ruby, plus a garbage collector and a lot more features. In the next version I'll implement templates, which is something I really need to improve performance. So stay tuned.