2. February, 2008
2. February, 2008
in
»
»
by Michael Neumann

Have you ever wondered why operating system kernels are compiled without full compiler optimizations turned on? The reason is that it’s impossible to implement proper threading in C/C++ in a library, because the execution model of C/C++ is single-threaded. If the compiler doesn’t know about threads, it might optimize a global variable access into a register access and if now another thread writes to that global variable we have a race. That’s just one example what can happen. A good introduction into the problem is given here (in German) or in the video Getting C++ Threads Right by Hans Boehm.

The next version of C++ called C++0x will include approaches to overcome this limitation. That would be a nice addition to the C standard as well, as C is used quite extensivly in systems programming where this matters even more.

If you are into high-performance number crunching then you should consider Unified Parallel C (UPC). It unifies both message passing (as used by MPI) and the shared memory model (as used by OpenMP).

What I am dreaming of is a combination of features from UPC (unified message passing/shared memory), Cyclone (abstract data types, region analysis, fat pointers), C (performance) and some concepts from D (modules, templates, closures). I’d call this language ASYL - Advanced Systems Language.

8. February, 2007
27. June, 2007
in by Michael Neumann

This article is about the down-sides of Software Transactional Memory (STM). STM uses concepts from databases and applies them to main memory. Databases allow concurrent reads and writes to a single resource (a table for example) and implement techniques to fullfill the ACID properties. Now imagine multiple threads accessing a single resource (a variable in memory). STM does the same as a database, but for main memory.

But the question is, does it scale? Is it good to have this kind of shared-state concurrency? Of course, shared-state concurrency doesn’t scale per-se. You always have a single resource, and multiple accessors that concur for the resource. Whereas in a shared-nothing architecture, you have one resource per accessor, so there is true parallelism and no conflicts. It’s much easier to program in a shared-nothing paradigm than it is for shared-state. It’s less error-prone and it scales.

Whenever you have multiple threads accessing the same shared state (the same memory locations), you have to coordinate them when accessing the state. You have to gain locks, release locks, be careful to not dead-lock or live-lock the threads. It’s very hard to predict what is going on.

I’ve been thinking about a simple database implementation lately, that is single-threaded and event-driven like modern web-servers (Lighttpd or Nginx). The whole index is in main-memory, whereas the data lives on the disk. Queries can only contain elements that live in the index in memory. Because whenever you go to disk, you have to wait a very long time. That’s why it is event-driven. To avoid locking, the index is traversed in memory and those rows that match or are of interest are marked and at the end a request to read those rows from the disk is issued in the background (asynchronous). Maybe it is handed-over to a second thread, that handles disk-access, so that the other thread is free to continue with the next request. This method is very simple. But of course it has the requirement that every column that occurs in a “WHERE” clause has to be available in the in-memory index (more complex queries could be build at the application layer). There is no concurrency, everything is sequential. But usually the access to the index is very fast (as it is in memory). And this doesn’t matter, as you’d partition your database into small pieces, each of which is a database as described above. Of course, there are no referential constraints across partitions (inside it is possible).

Would be interesting how fast such a database would be. The key point is that it avoids complexity by avoiding shared-state and concurrency at all.