REALbasic and LLVM

In my previous blog post, one of the commenters asked me about my thoughts on the new LLVM backend compiler that RS is implementing for REAL Studio.  I’ll start by saying that I am not a backend compiler expert, nor do I want to be.  Much of what I’ll write about comes from sources that are in the know, as they say.

What exactly is LLVM other than a fancy acronym?  It stands for Low Level Virtual Machine and is a compiler infrastructure.  It was started in 2000 at the University of Illinois and in 2005 Apple hired some of those original authors for their development system.

RB currently uses a custom-written backend compiler.  As you can imagine, maintaining a compiler is a lot of work and writing optimizations is not easy.  LLVM was designed for multiple layers of optimization including compile-time, link-time, run-time and even idle-time.
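
To give a feel for what a compile-time optimization is, here is a minimal sketch of constant folding, one of the simplest passes a backend can run.  The tuple-based expression format is invented for this example; it is not how LLVM or the RB compiler actually represents code.

```python
# Toy constant-folding pass: a sketch of one compile-time optimization.
# The tuple-based expression format is hypothetical, made up for this
# example; it is not LLVM's or REALbasic's internal representation.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Return expr with every constant-only subexpression precomputed."""
    if not isinstance(expr, tuple):
        return expr  # a literal number or a variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # both operands are known at compile time
    return (op, left, right)


# x * (60 * 60 * 24): the constant part collapses to a single number.
seconds = fold(("*", "x", ("*", ("*", 60, 60), 24)))
print(seconds)  # ('*', 'x', 86400)
```

The multiplication by `x` has to stay because `x` is only known when the program runs, but the rest of the arithmetic is done once by the compiler instead of every time the code executes.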

What does LLVM mean for REALbasic applications?  It means that applications should be smaller and faster.  Code that isn’t used will be removed (dead-code stripping).  RB can do this today, and while it’s really not all that bad at it, it’s not exceptionally efficient either, which leaves some unused code compiled into your application.  For most of us this isn’t a big deal, but those who start every project from a standard toolset might be introducing some inefficiencies into the final product.
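
Dead-stripping boils down to a reachability question: starting from the application’s entry point, which functions can ever be called?  The sketch below shows the idea on a hypothetical call graph.  The function names are invented, and a real linker works on compiled symbols rather than a Python dictionary.

```python
# Toy dead-stripping pass: keep only functions reachable from the entry point.
# The call graph and every function name in it are hypothetical.


def reachable(call_graph, entry):
    """Return the set of functions reachable from entry by following calls."""
    keep, pending = set(), [entry]
    while pending:
        name = pending.pop()
        if name in keep:
            continue  # already visited
        keep.add(name)
        pending.extend(call_graph.get(name, []))
    return keep


call_graph = {
    "Main": ["OpenWindow", "LoadPrefs"],
    "OpenWindow": ["DrawControls"],
    "LoadPrefs": [],
    "DrawControls": [],
    "ExportPDF": ["DrawControls"],  # part of the toolset, never called
    "SendEmail": [],                # likewise unused
}

used = reachable(call_graph, "Main")
stripped = sorted(set(call_graph) - used)
print(stripped)  # ['ExportPDF', 'SendEmail']
```

Anything the walk never reaches can be left out of the final build, which is how a project built on a large standard toolset could still ship only the code it actually uses.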

Switching to LLVM is great news, though.  Smaller, faster, better are all things to strive for.  Without giving away too much information, the RBScript compiler is being reworked to use LLVM.  It’s a good first step.

But that’s all it is.  If you’ve not used RBScript, let me tell you, it’s not REALbasic.  One of the most powerful features of REALbasic is the cross-platform debugger.  Run in one environment while debugging in the other.  Even when you are debugging locally (on the same machine in the same environment) you’re ‘remote debugging’.  If my source is correct, switching to LLVM will require rewriting the entire debugger because the LLVM metadata system is different than RB’s debugger map system.

Since LLVM uses different metadata properties, the current introspection system will break and have to be rewritten.  Plugins will also break.  The IDE, the plugin system and even the reporting engine are heavy consumers of introspection.  So going to LLVM means a huge portion of the product needs to get updated.

Rewriting the debugger is not a trivial task.  To do this the RIGHT way is a huge job that could take a couple of strong backend compiler developers 6 to 12 months to fully implement.  Even hacking it together is probably a 4 to 8 month project and it would still take some time getting the remote debugger to work properly.

If all this sounds daunting and kind of scary, well, it is.  Switching backend compilers is not something to do on a whim.  It has to be well thought out and resources have to be allocated properly.  The roadmap for features that will not be added during the transition time needs to be thought out properly as well, since limited resources are a fact of life in all organizations.  A small organization needs smart long-term planning.

Make no mistake, though, the backend compiler switch promises to take a while.  It basically rewrites major portions of the product, which will lead to some subtle and not-so-subtle changes.  The risk is high and it will affect EVERYONE.

There’s been some talk about all the ‘freebies’ we will get when using LLVM (such as 64 bit support).  Not true.  Using LLVM gets you ‘closer’ but it still won’t be free.  Switching to LLVM removes one of the many tasks involved with such support.

With all that said, I think moving to LLVM is a must.  If RS wants to support different platforms it’s a smart move going forward.  The optimization capabilities look very promising and will be welcome to many users.  It IS the right thing to do.  Just be prepared for some bumps along the way.

One final word.  I’m a passionate RB user.  I’ve also been a project manager where I think about the bumps in the road before they happen.  I’m not an expert at this stuff, though, so any misconceptions or mistakes are entirely my own.

11 thoughts on “REALbasic and LLVM”

  1. My understanding of the current compiler is that it can dead-strip framework code written in RB but not the majority that is written in C/C++.
    I’m happy to be corrected if I’m wrong, though.

  2. Hi Bob, and thanks for your commentary and your free RB instructional videos. Keep up the good work.

    The RS move to LLVM sounds excellent but also a bit scary during the transition period; as you’ve noted this is no small task.

    Obviously smaller builds that also run faster are something we’ve been wanting for years, so this is great news.

    In my opinion having less bugs and more reliability in RB is more important than smaller, faster builds.

    Hopefully this move to LLVM also means less bugs in RB?

    In theory since RS would no longer be writing the compiler shouldn’t there be less bugs as the LLVM team is much larger than the RS team and we can bet quality is very important to those LLVM folks (reading the list of LLVM users is noteworthy)?

    Additionally, shouldn’t the RS team have more time to fix the bugs in RB, unless switching to LLVM is creating more work for the REAL team?

    I don’t think I’ve read anything about the LLVM switch reducing the RB bug count which was disappointing.

    Bob can you please contribute your thoughts on this?

    Thanks.

  3. I’ve read this and GP’s blog about LLVM and I don’t know if I understand it or not.
    LLVM language possibly/probably has a relatively small number of commands.
    Valid LLVM code is presumably ‘valid’ for all supported platforms but you’re going to need a different sequence of those instructions to produce a pleasing/meaningful result on an iphone as opposed to a PC.
    So for every platform the LLVM supports RB has to enable the developer to produce a suitable sequence of LLVM language instructions for later compilation to that platform’s machine language.
    And this still means lots of work at level of the IDE/front-end compiler/etc?
    When REAL talk about their framework is that what it is, the IDE and the front end compiler or is there more to it than that?

  4. When we talk about the framework there are two pieces. One is written in C++ and constitutes things like FolderItems, Pictures, Graphics, and anything else that is “built in to RB”. There are some portions that are written in RB as well. Those pieces may call into the C++ based framework or may be subclasses of things in the C++ framework: things like HTTPSockets, POP3Sockets, SMTPSockets and quite a number of others.

    The “Framework” is all of these things, but some can be dead stripped (the stuff built in REALbasic itself) and some can’t (the C++ based stuff).

    Fortunately for us, LLVM already knows how to emit machine code for a wide variety of processors, so REAL does not need to spend time creating that. LLVM also includes other bits, like optimizations and the infrastructure to add many more.

    What REAL has to focus on is the bits that they need to do whether they write the actual compiler or not: converting RB code into some intermediate representation (the part known as the “front end”). This front end has to exist for any compiler; it’s just a matter of what the front end turns the RB code into. Currently it’s one form, and for LLVM it will be LLVM’s virtual instruction set (see http://llvm.org/docs/LangRef.html).

    LLVM then takes that, optimizes it etc and eventually a back end portion emits real machine code from the optimized LLVM code.

    Then it all gets linked – I’m not sure if LLVM provides native linkers for all platforms RB targets or not. That may be something REAL needs to create – I’m not that familiar with all the bits that LLVM has.

  5. In theory since RS would no longer be writing the compiler shouldn’t there be less bugs as the LLVM team is much larger than the RS team and we can bet quality is very important to those LLVM folks (reading the list of LLVM users is noteworthy)?

    RS will still be responsible for its own compiler. LLVM is not a whole compiler, but a toolkit for optimization and code generation. A compiler uses LLVM to generate code; it is only one piece of the compiler toolchain. The LLVM project RS is working on will replace the existing backend code generator components with a module that calls out to LLVM.

    The LLVM code is certainly better tested than anything RS has ever produced, but I do not expect that switching to it will have any noticeable effect on the already-very-rare occurrence of backend bugs. The existing backends are a small fraction of the overall compiler size, and they are some of the most reliable modules in all of REALbasic.

    The primary thing you should expect from a switch to LLVM is that performance-sensitive code will likely perform better. The existing backends do very little in the way of optimization, while LLVM offers a wide array of well-tested optimizations. In most situations this won’t matter, but if your app has some tight data-processing loop, you will probably be very happy about what LLVM does to its performance. It is also likely that your executables will be somewhat smaller.

  6. Maybe the move to LLVM sounds excellent, but I doubt whether they are able to do it. The transition to Cocoa looks too heavy and is taking too long. The team is getting smaller and the tasks are enormous. I have not heard about any new developers at RS. I do not expect LLVM before 2013. I think I would do better to learn Objective-C before RS presents a stable version with support for Cocoa and LLVM…

  7. The move to Cocoa has taken a really, really long time. It’s taken more than twice as long as we had originally expected. And there are a lot of factors involved with that. A big one is that a lot of modernization of the underlying code has taken place. For instance, REAL Studio started out on Classic Mac OS, which handles events entirely differently from Windows or Linux. Next we supported Windows, which meant adapting the event model (designed for Classic Mac OS) to Windows. Next we supported Mac OS X via Carbon (which, conveniently, gave us a way to use the old event model) and finally Linux. Like Windows, Linux has a much more modern event model than Classic Mac OS. But we were still supporting Classic Mac OS so the easiest thing to do was to use the event model we already had.

    Then it comes time to move to Cocoa. Trying to use the old event model requires jamming a square peg into a round hole. Since Windows and Linux have event models that are modern like Cocoa, we ripped out all the plumbing left over from the Classic Mac OS days and modernized the event model. This refactoring was expensive but worth it in the long run.

    As for LLVM, we already have it working for RBScript which was the biggest part of the job. The move to LLVM is nothing like the move to Cocoa and you won’t be required to deal with both of those at the same time because we will still be supporting Carbon for a while and as of today, I don’t see any reason why you would be forced to use Cocoa with LLVM.

    Lastly, I don’t know where Bob got his information about how long it might take to get LLVM working with our debugger, but our engineers that are working on this have told me that getting the debugger working is not going to be a big job at all. It’s not going to be trivial, but it’s not going to be months of work either.

    So don’t worry about it. These are all steps in the right direction. Yes, sometimes they take longer than any of us would like but they are still progress.

  8. @Geoff Perlman wrote, “As for LLVM, we already have it working for RBScript which was the biggest part of the job.”

    90% of the job takes 10% of the time, the remaining 10% of the job takes 90% of the time.

    No offense, but REAL Software’s track record hasn’t been particularly stellar when it comes to estimating how long projects are going to take.

  9. Project estimating is never easy and the bigger a project is, the more difficult estimating it accurately becomes. However, the last time we made a big compiler change we estimated it would take 2 years and that’s exactly how long it took. Now that was changing the entire compiler. This is NOTHING like that. But the results will speak for themselves. There’s really little point in debating this now.
