R, spray-can and Docker

Control F1 Lead Architect Phil Kendall gives some advice on performing R calculations in microservices.

Back in January this year, Control F1 started work as the lead member of the i-Motors consortium, a UK Government and industry funded* project working towards viable, commercially sustainable Smart Mobility applications for connected and autonomous vehicles. One of the key elements we will be delivering as part of the project is the capability to add predictive and contextual intelligence to connected vehicles, allowing all individual drivers, fleet managers and infrastructure providers to make better decisions about transport in the UK. At a coding level, this means we need to get some data science / machine learning / AI code written and deployed. This post gives a quick run through of the technology choices we made, why we made them and how we implemented it all.

Why R?

There are effectively two choices for doing “small scale” (i.e. fits into memory on one machine) data science: R and Python (with scikit-learn). It just so happens that I’m much more an R guy than a Python guy, and the algorithms we wanted to deploy here were written in R.

Why Docker?

For i-Motors, we’ve gone down the microservices route for a lot of the common reasons, including the ability to independently improve the various components of our system without needing to do high risk “Big Bang” deployments where we have to change every critical part of the system at once. There are obviously alternatives to Docker for running microservices – while this post is Docker-specific, it shouldn’t be too hard to adapt what’s here to another container platform.

Why spray-can?

This is where it gets a bit more complicated! Leaving aside Docker for Windows Server 2016, which is right out on the cutting edge, running Docker means running Linux. At Control F1 we’re mostly a .NET house on the server side, so a number of the i-Motors components have been written in .NET Core and very happily deploy themselves on Docker. However, the .NET to R bridge hasn’t yet been ported to .NET Core, so there’s no simple way for a .NET Core application to talk to R at the moment. I investigated a couple of other options for bridging to R, including using node.js and the rstats package. Unfortunately, the official release of rstats doesn’t work with the latest versions of node, and while there are forks out there which fix the issue, basing a long-term project on a package without official releases didn’t seem like the wisest solution. The one option which did present itself was JRI, the Java/R Interface, which I’d made some use of before when running on the JVM.

When it comes to JVM languages, I’m a big fan of Scala and the spray.io toolkit – again, the solution here isn’t particularly tied to Scala and spray.io and should be relatively easy to adapt to any other JVM language and/or web API framework.

Implementation

All the code for this blog post is available from Bitbucket. I’ll give a brief overview of the code here.

Startup

The web API is set up in RSprayCanDockerApp and RSprayCanDockerActor. This is pretty much a straight copy of the spray-can “Getting Started” app, with the notable exception that we bind the listener to 0.0.0.0 rather than localhost – this is important as the requests will be coming from an unknown source when deployed in Docker.

R integration

The core of the R integration lives in the SynchronizedRengine class and its associated companion object. There are two non-trivial bits of behaviour here:

  • The guts of R are inherently a singleton object – there is one and only one underlying R engine per JVM. SynchronizedRengine.performCalculation() has a simple lock around the call into the R engine so that we have one and only one thread accessing the R engine.
  • The error handling is “a bit quirky”. If the R engine encounters an error, it calls the rWriteConsole() function in the RMainLoopCallbacks interface. The natural thing to do here would be to throw an exception, but unfortunately the native code between the Rengine.eval() call and the callback silently swallows the exception, so we can’t do that; instead we stash the exception away in a variable. If the evaluation failed (indicated by it returning null), we then retrieve the stashed away exception. In Scala, we wrap this into a Try object, but in a less functional language you could just re-throw the exception at this point.
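As a rough illustration of those two points, here is a minimal Java sketch of the same pattern: a lock around the engine, plus a stashed error that is re-thrown once the evaluation has returned. The Engine and ConsoleCallback interfaces are hypothetical stand-ins invented for this example, not the real JRI API, and the actual project code is Scala and returns a Try rather than throwing.

```java
/** Hypothetical stand-in for the native engine (not the real JRI API). */
interface Engine {
    /** Returns null on failure, reporting any error text via the callback. */
    Double eval(String expression, ConsoleCallback callback);
}

/** Hypothetical stand-in for the console callback the engine invokes. */
interface ConsoleCallback {
    void writeConsole(String text, boolean isError);
}

final class SynchronizedEngine {
    private final Engine engine;
    private final Object lock = new Object();
    private RuntimeException stashedError; // only touched while holding lock

    SynchronizedEngine(Engine engine) {
        this.engine = engine;
    }

    double performCalculation(String expression) {
        synchronized (lock) { // one thread in the engine at a time
            stashedError = null;
            Double result = engine.eval(expression, (text, isError) -> {
                // Throwing from the callback would be swallowed, so stash instead.
                if (isError) {
                    stashedError = new RuntimeException(text.trim());
                }
            });
            if (result == null) { // null signals a failed evaluation
                throw stashedError != null
                        ? stashedError
                        : new RuntimeException("evaluation failed with no error text");
            }
            return result;
        }
    }
}
```

Clearing the stashed error at the start of each call matters: without it, a failure from an earlier evaluation could be reported against a later one.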

Docker integration

The Docker integration is done via SBT Native Packager and is pretty vanilla; three things to note:

  • The Docker image is based on our “OpenJRE with R” image – this is the standard OpenJDK image but with R version 3.3 installed, and the JRI bridge library installed in /opt/lib. The minimal source for this image is also on Bitbucket.
  • We pass the relevant option to the JVM so that it can find the JRI bridge library: -Djava.library.path=/opt/lib
  • We set the appropriate environment variable so that the JRI bridge library can find R itself: R_HOME=/usr/lib/R
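If either of those last two settings is missing, the failure tends to show up only when the bridge is first loaded. One way to catch it earlier is a small startup check along these lines; this is a sketch of my own rather than anything in the project, and the JriPreflight class and its method names are made up for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class JriPreflight {
    /**
     * Returns human-readable problems with the two settings; an empty list
     * means the library path mentions jriDir and rHome is a directory.
     */
    static List<String> problems(String libraryPath, String rHome, String jriDir) {
        List<String> problems = new ArrayList<>();
        List<String> entries = libraryPath == null
                ? new ArrayList<>()
                : Arrays.asList(libraryPath.split(Pattern.quote(File.pathSeparator)));
        if (!entries.contains(jriDir)) {
            problems.add("java.library.path does not include " + jriDir);
        }
        if (rHome == null || rHome.isEmpty()) {
            problems.add("R_HOME is not set");
        } else if (!new File(rHome).isDirectory()) {
            problems.add("R_HOME does not point to a directory: " + rHome);
        }
        return problems;
    }

    /** Checks the running JVM, assuming the bridge library lives in /opt/lib. */
    static List<String> problemsForCurrentProcess() {
        return problems(System.getProperty("java.library.path"),
                System.getenv("R_HOME"), "/opt/lib");
    }
}
```

Calling problemsForCurrentProcess() before creating the R engine, and logging anything it returns, gives a clearer message than a native library load failure.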

If you just want a play with the finished Docker container, it’s available from Docker Hub; just run it up as “docker run -p 8080:8080 controlf1/r-spraycan-docker”.

Putting it all together

For this demo, the actual maths I’m getting R to do is very simple: just adding two numbers. Obviously, we don’t need R to do that but in the real world you should be able to substitute your own algorithms easily – we’ve already deployed four separate machine learning algorithms into i-Motors based on this pattern. But as demos are always good:

$ curl http://localhost:8080/add/1.2/3.4

4.6
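For anyone adapting this to another JVM stack, the shape of that endpoint can be sketched with nothing but the JDK’s built-in HttpServer. To be clear, this swaps in com.sun.net.httpserver for spray-can and does the addition in Java rather than calling R; the AddService class is a made-up name for the example. It does keep the one important detail from the real service, which is binding to 0.0.0.0 rather than localhost.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class AddService {
    /** Parses "/add/<a>/<b>" and returns the sum as text, or null if the path does not match. */
    static String handlePath(String path) {
        String[] parts = path.split("/");
        // Expected pieces: "", "add", "<a>", "<b>"
        if (parts.length != 4 || !parts[1].equals("add")) {
            return null;
        }
        try {
            double sum = Double.parseDouble(parts[2]) + Double.parseDouble(parts[3]);
            return String.valueOf(sum);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Starts the server on the given port, bound to all interfaces. */
    static HttpServer start(int port) throws IOException {
        // 0.0.0.0 rather than localhost, so requests from outside the container arrive.
        HttpServer server = HttpServer.create(new InetSocketAddress("0.0.0.0", port), 0);
        server.createContext("/add", exchange -> {
            String body = handlePath(exchange.getRequestURI().getPath());
            int status = (body == null) ? 404 : 200;
            byte[] bytes = (body == null ? "not found" : body).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }
}
```

In the real service, the body of handlePath would hand the two numbers to the R engine instead of adding them directly.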

Where next?

What we’ll be working on in the near future is investigating how this solution scales with the load on the system – a single instance of the microservice will obviously be limited by the single-threaded nature of R, but we should be able to bring up multiple instances of the microservice (“scale out” rather than “scale up”) to handle the level of requests we expect i-Motors to produce. I’m not foreseeing any problems with this approach, but we’ll certainly be keeping an eye on the performance numbers of our “intelligence services” as we increase the number of vehicles in the system.

* i-Motors is jointly funded by government and industry. The government’s £100m Intelligent Mobility fund is administered by the Centre for Connected and Autonomous Vehicles (CCAV) and delivered by the UK’s innovation agency, Innovate UK.

How to use the MATLAB Compiler Runtime with AWS Elastic Beanstalk

One of Control F1’s current projects is working alongside the RAC on their RAC Advance platform, a revolutionary new technology that uses the latest diagnostic software to deliver an enhanced breakdown service for customers. 
As is so often the case when working with innovative new technologies, our team have uncovered a number of solutions and fixes that haven’t been documented in the past. And because we’re all about sharing, lead developer Phil Kendall explains here (for the technically minded amongst you) some of what the team has learned about using the MATLAB Compiler Runtime with AWS Elastic Beanstalk.

One of the components of the RAC Advance system (an ASP.NET web application) we’re working on makes use of MATLAB to perform some of the advanced calculations that make the platform a success.

In order to provide scalability and reliability, the component is deployed via AWS Elastic Beanstalk. When Control F1 began their work with the RAC, the component was hosted on a custom AMI which had to have the MATLAB Compiler Runtime manually installed, and then the AMI had to be maintained over time. One of the improvements we were hoping to make to the system was to reduce the number of manually maintained components in the system, so we began looking at whether it was possible to install the MATLAB Compiler Runtime automatically via Elastic Beanstalk’s configuration mechanism (.ebextensions).

To my slight surprise, this didn’t seem to be anything anyone had ever done before (or at least, had ever publicly documented how to do). Fortunately, the solution turned out to be not too complicated, although there are a couple of rough edges I’d like to smooth off:

  1. Download the appropriate version of the MATLAB Compiler Runtime from Mathworks’ website, and put this into an S3 bucket you control. You’ll need to make the file publicly readable.
  2. Create the following file and save it as “matlab.config” in a “.ebextensions” folder of your web application (note that the spacing is crucial here, and that it’s all spaces, not tabs):

sources:
  c:\\MatlabCompilerRuntime: https://s3-eu-west-1.amazonaws.com/your-bucket-name-goes-here/MCR_R2014b_win64_installer.exe
commands:
  01_install_matlab:
    command: setup.exe -agreeToLicense yes -mode silent
    cwd: c:\\MatlabCompilerRuntime\\bin\\win64
  02_modify_path:
    command: setx PATH "%PATH%;C:\Program Files\MATLAB\MATLAB Compiler Runtime\v82\runtime\win64"
  03_reset_iis:
    command: iisreset

(Note that the config files within the .ebextensions folder are run in alphabetical order so if you’ve already got other extensions in there, you may want to rename the file so that it’s run in the correct order).

To some extent, that’s all there is to it, but it’s probably worth an explanation as to how that’s working. Essentially, there are two main steps: the first, indicated by the “sources” stanza, downloads a ZIP file from the specified location (our S3 bucket) and expands it into the specified folder. While the MATLAB Compiler Runtime installer has an “.exe” extension, it’s actually a self-extracting ZIP file, and the Elastic Beanstalk functionality is perfectly happy to deal with this.

The second step is to actually run the installer – this is what is accomplished by the “01_install_matlab” stanza, which uses the silent install functionality of the installer. (If you’re using the 32-bit runtime, you’ll need to modify the path specified in the “cwd” line.) Finally, we kick IIS to pick up a modified PATH which includes the native MATLAB DLLs (“03_reset_iis”).

While this solution works, as noted above there are a couple of things I’d like to improve:

  1. Ideally, you wouldn’t have to make the file in the S3 bucket publicly readable. However, the “sources” functionality supports only publicly readable files at the moment, so there’s no easy way round this. It would be possible to install other components onto the box which would let you authenticate and download a protected file, but that seems like overkill. Hopefully Amazon will add authentication support for “sources” at some point.
  2. The observant will note I skipped over the “02_modify_path” stanza – what’s that for? As noted, when the MATLAB installer finishes, it modifies PATH to include the location of the native MATLAB DLLs. However, the installer runs as a background task, so the actual command returns instantly, and crucially before it has modified PATH. As far as I know, there’s no way of knowing when the installer has actually completed, so we bodge around this by manually adding what we know is going to be added to PATH, which means that IIS will be able to find the DLLs once they’ve been installed. This is obviously not the nicest solution in the world, but it works.

Hopefully this little guide helps anyone else who’s looking to do this sort of thing – please leave a comment if there’s anything you’d like to ask, or if you can help with those improvements!

Phil Kendall
Lead Developer