19 January 2013

I loved Damien Katz' blog on The Unreasonable Effectivenes of C.

I also really liked commentor Mark Peskin's input on the news re-cap of Damien's blog over on InfoQ. I just wanted to offer my 2c from the perspective of someone who's done enough C to have 2c.

One thing that Damien noted is that it's rare to see really large, comprehensive framework-like APIs in C. As long you as build small, singly-focused APIs with light domains (a few typedefs / structs and functions as the 'contract') then it's very easy to 'export' the library and re-use it. This, I think, is a sweet spot for C. I would never write something in C for performance, but instead because it's simply easier to do certain things in C (kernel programming, embedded programming, anything working with the hardware, anything relying on APIs that aren't both established and widespread enough as to have warranted an cross-platform 'abstraction' in a higher level language like Java, Ruby, Python etc).

I would also try to not write a full blown system in C, simply because it doesn't 'scale' for large projects. The large projects that do use C end up re-inventing a lot of things (like 'objects' and namespacing). IMHO, there are very few domains that truly need to be in C the whole way through. Some very notable exceptions are systems level components like operating systems (Linux) or UIs like GNOME, of course. But for applications, it's easier to build out in a higher level language and 'integrate' with lower level APIs where the higher level language and platform has gaps. Java has many such 'gaps,' though some've been slowly addressed in the last decade as APIs have become commonplace across many different operating systems: event driven IO, file system notifications, file permissions and metadata, etc.

Mark makes a great point about how to integrate C libraries and modules. He suggests JNI sucks, and to use messaging. I am a big fan of messaging. Fundamentally, successful use of messaging and successful use of JNI both require the same thing: you need to simplify the exported API drastically.

When using JNI, I try to never 'leak' any complex C types into my java API and vice versa, always communicating through numeric types and char* -> jstrings. Even if the native code I'm exposing is C++, I'll still use the C-flavor of JNI (not C++) because it forces this canonicalization that's favorable to interop. If you keep the surface area of the C API very light, and try to avoid threading, it's easy to make it work as a native extension via Java JNI or CPYthon or MRI Ruby, etc.

Once you've gone through this process, then exposing the C API via messaging is easier because the message payloads between two systems can't, by definition, be much more complicated than the surface area of the C library. Of course, if you're using messaging, that means either writing messaging code in C, or exposing the C library to some higher level language and doing the messaging there. The nice part about messaging is that insulates your higher-level language code from your C code which - let's be honest - might be flakier than your Java code. I still won't link code written against the imagemagick C APIs directly to my application! There is some black magic in that library...! If the C code dies, the messaging system absorbs the requests until another node running the C code can pick up the slack. On the other hand, if you really are using C for performance, then messaging introduces at least a network hop, not to mention another component in the system, and you might lose any gains you made by writing it in C. In this case, it is possible to use write stable, well-behaved JNI or native extensions, but this again requires keeping the surface area small and understanding the pre-conditions nad post-conditions. No threads. Don't pass around pointers to complex objects between C and Java. Make sure you understand who's supposed to clean up memory, and when.