Just how innovative is the UK?

Our Product Development Director Dale Reed shares his thoughts from the 2015 Innovate UK Conference. 

Innovate UK is the UK’s innovation agency: an executive non-departmental public body sponsored by the Department for Business, Innovation & Skills. It hosts an annual event to highlight the best and brightest of British innovation, with exhibitors and seminars held over two days in London.

What was constantly highlighted throughout the event was just how innovative we actually are in this country. Consider these statistics:

The UK represents around 1% of the global population, and yet we produce 16% of the world’s published scientific papers and host four of the world’s top ten universities.

Then consider some of the inventions that have really shaped the world we live in today:

Computers? Charles Babbage, British.

Telephone? Alexander Graham Bell, British.

World Wide Web? Tim Berners-Lee, British.

Television? John Logie Baird, British.

You can also add to that list radar, the endoscope, the zoom lens, holography, in vitro fertilisation, animal cloning, magnetically levitated trains, the jet engine, antibiotics and, indeed, Viagra!

Some years ago, Japan’s Ministry of International Trade and Industry made a study of national inventiveness and concluded that modern-era Britain had produced around 55% of the world’s ‘significant’ inventions, compared with 22% for the US and 6% for Japan. The point is that the Brits have a long history of innovation, and it’s something we should be mightily proud of.

The downside is that however good we’ve been at inventing things, we’ve not been so great at commercialising them. Almost all of the inventions mentioned above have been commercialised largely by businesses outside the UK (really only jet engines and antibiotics contribute anything significant to our GDP). We also lose a great many of our brightest minds to businesses overseas.

Fortunately this seems to be one of the areas that’s changing, as evidenced by some of the talks I sat in on at the event. Many universities are now teaming up with businesses to place students and undergraduates – something which benefits all parties. Despite some difficulties around IP protection, it’s a huge boon for students to learn some business sense and commercial ability before being employed full time, and the employer gets some very bright minds to help them think around their problems. Many students go on to work with the business full time on graduation, and many businesses continue with the scheme year on year because it’s been so successful for them.

There is also now a network of Catapult centres here in the UK (https://www.catapult.org.uk/): world-leading physical centres designed to transform the UK’s capability for innovation in specific areas and help drive future economic growth. In them, the very best of the UK’s businesses, scientists and engineers work side by side on late-stage research and development, transforming high-potential ideas into new products and services.

By bringing together the right teams who can work together and innovate, and just as importantly commercialise, the centres are ensuring the UK can continue to be at the forefront of innovation, particularly in technology and the sciences.

Graphene, of course, is a well-known British discovery which I think will join the list of the world’s most life-changing innovations in fairly short order. The number of applications seems almost limitless at the moment. We already have the National Graphene Institute, built as part of the University of Manchester, and fortunately the UK is working hard to ensure we are capable of commercialising graphene’s potential. Work on another £60 million building – the Graphene Engineering Innovation Centre – is currently underway, which will look at how to move the research into actual production.

We also have a lot of expertise in quantum mechanics, and again companies in the UK are now working towards commercialising highly accurate quantum sensors – for example an accelerometer based on the quantum interference of ultracold atoms. These will be able to provide highly accurate location and acceleration information without any need for GPS or other external references. Although quite large at the moment, they’re expected to be microchip-sized within the next two years, which could be a huge boon to mobile, telematics and asset tracking systems. The technology is currently being developed for use in submarines, so they can determine their position accurately without having to surface to use GPS.

Overall I came away from the event feeling extremely positive and excited to be here in the UK at a time when there is so much potential for new technologies and innovation. I’m very much looking forward to Control F1 being a part of it!


A Sparkling View from the Canals

Control F1 sent Lead Developer Phil Kendall and Senior Developer Kevin Wood over to Amsterdam for the first European edition of Spark Summit. Here’s their summary of the conference.

One of the themes from Strata + Hadoop World in London earlier this year was the rise of Apache Spark as the new darling of the big data processing world. If anything, that trend has accelerated since May, but it has also moved in a slightly different direction: while the majority of the companies talking about Spark at Strata + Hadoop World were innovative, disruptive small businesses, at Spark Summit there were a lot of big enterprises either building their big data infrastructure on Spark, or moving their infrastructure from “classical” Hadoop MapReduce to Spark. From a business point of view, that’s probably the headline for the conference, but here are some more technical takeaways:

The Future of MapReduce

MapReduce is dead. It’s going to hang on for a few years yet due to the number of production deployments which exist, but I don’t think you would have been able to find anyone at the conference who was intending to use MapReduce for any of their new deployments. Of course, it should be remembered that this was the Spark Summit, so this won’t have been a representative sample, but when some of the biggest players in the big data space, like Cloudera and Hortonworks, are jumping on the bandwagon, I certainly think this is the way things are going.

In consequence, the Lambda Architecture is on its way out as well. Nobody ever really liked having to maintain two entirely separate systems for processing their data, but at the time there really wasn’t a better way. This is a movement which started to gain momentum with Jay Kreps’ “Questioning the Lambda Architecture” article last year, but now that we have an enterprise-ready framework which can handle both the streaming and batch sides of the processing coin, it’s time to move on to something with less overhead – quite possibly the SMACK stack of Spark, Mesos, Akka, Cassandra and Kafka, something which Helena Edelson implored us to do during her talk. Just hope your kids don’t go around saying “My mum/dad works with Smack”.
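
As a minimal sketch of that idea (the paths, hostname, port and ten-second batch interval are all illustrative assumptions), the same transformation can serve both the batch and streaming sides, where Lambda needed two separate systems:

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object UnifiedWordCount {
  // One transformation, shared by the batch and streaming paths
  def countWords(lines: RDD[String]): RDD[(String, Long)] =
    lines.flatMap(_.split("\\s+")).map((_, 1L)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("UnifiedWordCount"))

    // Batch side: run the logic over historical data
    countWords(sc.textFile("hdfs:///data/archive/*.log"))
      .saveAsTextFile("hdfs:///out/batch")

    // Streaming side: apply the identical logic to each micro-batch
    val ssc = new StreamingContext(sc, Seconds(10))
    ssc.socketTextStream("localhost", 9999)
      .foreachRDD((rdd, time) =>
        countWords(rdd).saveAsTextFile(s"hdfs:///out/stream/${time.milliseconds}"))
    ssc.start()
    ssc.awaitTermination()
  }
}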

The Future of Languages for Spark

Scala is the language for interacting with Spark. While the conference was pretty much split down the middle between folks using Scala and folks using Python, the direction the Spark world is heading was perhaps most obviously demonstrated by the spontaneous round of applause which Vincent Saulys got for his “Please use Scala!” comment during his keynote presentation. The theme here was very much that while there were people moving from Python to Scala, nobody was going the other way. The newcomer on the block, though, is SparkR, which has the potential to open up Spark to the large community of data scientists out there who already know R. The support in Spark 1.5 probably isn’t quite there yet to really open the door, but improvements are coming in Spark 1.6, and the developers are definitely looking for feedback from the R community as to which features should be a priority – so it won’t be long before you see a lot of people using Spark and R.

The Future of Spark APIs

DataFrames are the future for Spark applications. As with MapReduce, nobody’s going to kill off the low-level way of working directly with resilient distributed datasets (RDDs), but the DataFrames API (which is essentially equivalent to Spark SQL) is going to be where a lot of the new work gets done. The major initiative here at the moment is Project Tungsten, which brings a whole host of nice optimisations at the DataFrame level. Why is Spark moving this way? Because it’s easier to optimise when you’re higher up the stack – if you have a holistic view of what the programmer is attempting to accomplish, you can generally optimise that a lot better than if you’re looking at individual atomic operations (the maps, sorts, reduces and whatever else of RDDs). SQL showed the value of introducing a declarative language for “little” data problems in the 1970s and 1980s; will DataFrames be that solution for big data? Given their position in all of R, Python (via Pandas) and Spark, I’d be willing to bet a small amount of money on “yes”.
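
To illustrate why the declarative level optimises better, here’s a sketch (the Sale class and sample data are made up) of the same aggregation at both API levels: at the RDD level Spark executes our lambdas as opaque black boxes, while at the DataFrame level Catalyst and Tungsten see a plan they can rearrange:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object AverageByRegion {
  case class Sale(region: String, amount: Double)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AverageByRegion"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val sales = sc.parallelize(Seq(Sale("north", 10.0), Sale("north", 30.0), Sale("south", 5.0)))

    // RDD level: running sums and counts via functions the optimiser can't see inside
    val rddAvg = sales
      .map(s => (s.region, (s.amount, 1)))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      .mapValues { case (sum, n) => sum / n }

    // DataFrame level: a declarative plan Catalyst/Tungsten can optimise
    val dfAvg = sales.toDF().groupBy("region").avg("amount")

    rddAvg.collect().foreach(println)
    dfAvg.show()
  }
}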

On a related topic, if you’ve done any work with Spark, you’ve probably seen the Spark stack diagram by now. However, I thought Reynold Xin’s “different take” on the diagram during his keynote was really powerful – as a developer, this expresses what matters to me – the APIs I can use to interact with Spark. To a very real extent, I don’t care what’s happening under the hood: I just need to know that the mightily clever folks contributing to Spark are working their black magic in that “backend” box which makes everything run super-fast.

The Future of Spark at Control F1

I don’t think it will come as a surprise to anyone who has been following our blog that we’re big fans of Spark here at Control F1. Don’t be surprised if you see it in some of our work in the near future 🙂

Implementing best practice application design principles in Kentico 8

 

In this post Lead CMS Developer Chris Parkinson discusses how we use n-tier architecture and IoC at Control F1 to write abstracted, testable custom code within Kentico, and how we then consume this custom code in the CMSApp and CMSApp_AppCode projects.

NB This post assumes a good working knowledge of Kentico 8 and an understanding of SOLID principles, including n-tier and dependency injection / inversion of control. If the latter is new to you, here is a good place to start.

At Control F1 we like to apply good application design principles to any development project, be it greenfield, mobile or CMS integration. These typically include the use of dependency injection, n-tier, and separation of concerns to allow custom code to be more easily tested.

Sometimes, however, particularly when working with an off-the-shelf product, we have to come up with workarounds. Kentico is a brilliant tool for quickly developing feature-rich websites, but as with any proprietary software it isn’t always the easiest to extend. We typically use Kentico web application projects at Control F1, as this allows us to integrate Kentico with our build processes using MSBuild more easily (another article entirely).

One of the quirks of web application projects relates to the continued use of an App_Code folder, which is a hangover from the older website project type. In website projects, code added and edited within App_Code is compiled on the fly, allowing server-side code to be developed without the need to re-compile. This concept doesn’t really exist in web applications. Here is a good post from Microsoft explaining the differences between websites and web applications in .NET.

Kentico has split out App_Code into its own VS project called CMSApp_AppCode, with any code/folders under it moved under a folder called Old_App_Code.

[Screenshot: the CMSApp_AppCode project with the Old_App_Code folder]

This causes two fundamental problems:

  1. Any custom objects or providers generated from custom page types or custom classes in Kentico are generated here.
  2. Any custom code that needs to inherit from CMSLoaderAttribute (custom events, extending the order provider etc) HAS to go here.

Remember these two points; we’ll quickly move on and discuss how we architect our solution.

Historically, Control F1 Kentico solutions consisted of the following projects:

[Screenshot: the projects in a typical Control F1 Kentico solution]

  • Core (Utilities, helpers etc.)
  • Data (data access layer)
    • Extensions (Extensions to CMS ‘entities’ – out of the box info, objects etc.)
    • Mappings (Classes that map ‘entities’ to ‘domain’ objects)
    • Repositories (Data repository classes to wrap up Kentico providers)
  • Domain
    • DTOs / DTO Extensions
  • Services
    • Service classes that typically wrap up the repository layer but return a ServiceResult<T> object, allowing consistent consumption from the client

The Service and Data projects typically contain both the interface and the implementation, meaning that to use a service you need to reference the Services project.

This is absolutely fine from the CMSApp project – you can happily reference Services and use them as you wish with your IoC container of choice. But it’s time to revisit the two fundamental problems with Kentico caused by the CMSApp_AppCode project:

1. Any custom objects or providers generated from custom classes in Kentico are generated here.

Going back to our application architecture, we have a Data project which we use to wrap up the out-of-the-box Kentico info and provider objects. We also want to do this with any custom objects generated from custom classes in the CMS, meaning that we need to add a reference to CMSApp_AppCode from our Data project. This leads us nicely on to problem 2…

2. Any custom code that needs to inherit from CMSLoaderAttribute (custom events, extending the order provider etc.) HAS to go here.

Consider an eCommerce website where we might want to trigger custom code when the ‘Paid’ event is fired. We’d typically do this by creating a custom provider object that extends OrderInfoProvider, and by overriding the ProcessOrderIsPaidChangeInternal method to check for the IsPaid property.

public class MyCustomOrderProvider : OrderInfoProvider
{
    protected override void ProcessOrderIsPaidChangeInternal(OrderInfo orderInfo)
    {
        if (orderInfo.OrderIsPaid)
        {
            // TODO - our custom code here
        }
        base.ProcessOrderIsPaidChangeInternal(orderInfo);
    }
}

Now say we want to execute some custom code written in an IOrderService from within this provider – we’d have to add a reference to the Services project.

Eek, fail!

[Screenshot: Visual Studio circular reference error]

When we try to add a reference to our Services project from CMSApp_AppCode, we get a circular reference error. This makes complete sense: CMSApp_AppCode is referenced by Data, which is referenced by Services, so with our current architecture there’s no obvious way to use custom services in CMSApp_AppCode. This is obviously no good – we’ve already written and tested custom code, so we don’t want to recreate it using Kentico’s provider objects; the whole point was to abstract this.

The solution is actually fairly simple, albeit one that requires a bit of refactoring and the use of reflection. Reflection, used with care, is a great tool that allows you to create an instance of a type at runtime – which is perfect for this scenario. Here is another good post on CodeProject explaining reflection in .NET.

Firstly, consider the good design principles we discussed at the start – particularly separation of concerns. If we’re using inversion of control, our consuming application doesn’t really need to know implementation details. All it needs to know is that we have an IOrderService which contains a method called DoStuff(). With this in mind we can refactor our application to split interfaces and implementations into separate projects. The solution now looks like:

[Screenshot: the refactored solution structure]

  • Core
  • Data
  • Domain
  • Services
  • Interfaces
    • Services
    • Data

Our CMSApp_AppCode and CMSApp projects can now reference the interfaces project. However, we still need to create new implementations of these interfaces in order to use them. As we discussed earlier, CMSApp can happily reference the services directly as there are no dependencies on CMSApp from the n-tier layer. From CMSApp_AppCode we can’t reference services directly, but because we’ve abstracted interfaces from implementations we can load the implementations dynamically using reflection and map them to the correct interface.

Steps to achieve this are as follows:

  1. We need to load the assemblies – in this instance we need both Data and Services, because we’re using dependency injection in our services and need to pass in the repositories from the Data project. We’ve created a helper method to get the correct directory without specifying the full file path.
    var dataAssembly = Assembly.LoadFile(string.Concat(IoHelper.AssemblyDirectory, "\\Data.dll"));
    var servicesAssembly = Assembly.LoadFile(string.Concat(IoHelper.AssemblyDirectory, "\\Services.dll"));

  2. Next we need to search the loaded assembly for an exported type that is assignable to our interface. In this case we’ve created an extension method that returns the matching type (with an optional name parameter for when we have multiple implementations of an interface).
    public static Type GetType<T>(this Assembly assembly, string name = null)
    {
        // Find the single exported type implementing T (optionally matching by name)
        return (from type in assembly.GetExportedTypes()
                where typeof(T).IsAssignableFrom(type)
                && (name == null || type.Name == name)
                select type).Single();
    }
    var orderMapperType = dataAssembly.GetType<IMapper<Order, OrderInfo>>();
    var orderRepositoryType = dataAssembly.GetType<IOrderRepository>();
    var orderServiceType = servicesAssembly.GetType<IOrderService>();

  3. And finally we create new instances, passing in any dependencies.
    var orderMapper = (IMapper<Order, OrderInfo>)Activator.CreateInstance(orderMapperType);
    var orderRepository = (IOrderRepository)Activator.CreateInstance(orderRepositoryType, orderMapper);
    var orderService = (IOrderService)Activator.CreateInstance(orderServiceType, orderRepository);

It’s probably also worth mentioning IoC in a bit more detail. Kentico 8 is primarily a WebForms application, meaning that whilst we could integrate an off-the-shelf library such as Ninject, Unity or StructureMap, these libraries are quite fiddly to get working unless you’re using an MVP pattern – which Kentico doesn’t. We want a well-architected solution for our custom code that follows the good design principles we’ve previously discussed, but we don’t want to spend too much time fighting the way Kentico works.

In this instance we decided to create our own simple IServiceContainer. This is an interface that sits in the Interfaces project under the Ioc namespace and contains public properties for the services we want to expose. The repositories and mappings remain private: they are used internally, and we don’t want the client to have access to them.

public interface IServiceContainer
{
    IOrderService OrderService { get; }
}

Our solution contains two implementations: one within CMSApp that creates new instances of the implementations directly, and one within CMSApp_AppCode that uses reflection to create instances as previously discussed. We’re also using lazy loading, meaning instances are only created when we actually need them. We typically then create abstract base classes for CMSWebParts, modules, loaders etc. that contain a protected property holding the IServiceContainer implementation, allowing us to call ServiceContainer.OrderService.DoStuff() and so on.

CMSApp
public class CMSAppServiceContainer : IServiceContainer
{
    private IOrderService _orderService;

    public IOrderService OrderService
    {
        get
        {
            // Lazily create the concrete implementation on first access
            if (_orderService == null)
            {
                _orderService = new OrderService();
            }
            return _orderService;
        }
    }
}
CMSApp_AppCode
public class CMSAppAppCodeServiceContainer : IServiceContainer
{
    private IOrderService _orderService;

    public IOrderService OrderService
    {
        get
        {
            if (_orderService == null)
            {
                // Load the implementation via reflection, passing in its dependencies
                // (orderServiceType and orderRepository are resolved as shown in the steps above)
                _orderService = (IOrderService)Activator.CreateInstance(orderServiceType, orderRepository);
            }
            return _orderService;
        }
    }
}

In summary, I hope you find this useful. I also hope that I’ve successfully demonstrated that with a bit of effort it’s fairly straightforward to write abstracted, testable code within your Kentico solutions that can be used in both the CMSApp and CMSApp_AppCode projects.

 

Control F1 wins an Examiner Business Award!

Last night the Control F1 team were suited and booted for the Examiner Business Awards, and we’re delighted to share that we were the proud recipients of the University of Huddersfield’s Innovation and Enterprise Award.

We fought off stiff competition from worthy finalists Wellhouse Leisure and The Flood Company Commercial Ltd. to win the accolade.

Our Co-founder and Technical Director Carl said:

“We’ve put a lot into research and development – to the point of really pushing the boundaries – and it’s wonderful to see it paying off. Innovation is a core value for us and we’re delighted to have this recognised through tonight’s award.”

[Photo: the Control F1 team at the Examiner Business Awards]

Control F1 named a Little British Battler!

We are extremely pleased to announce that Control F1 has been named a Little British Battler as part of TechMarketView’s “Magnificent Seventh” Little British Battlers Day. This accolade is awarded to innovative SMEs who are punching above their weight in business – a category we are very proud to fit into!

Out of hundreds of applications, we secured one of the 12 highly sought-after places. The other 11 SMEs from across the nation include enterprise auction platform Perfect Channel, analytics provider Aquila Insight and nanotechnology experts Memset. As the only winner headquartered in Yorkshire, we’re extremely proud to be distinguished as an example of the innovation, expertise and drive of the Northern Powerhouse.

The award provides a platform to showcase the skills and ability of our fantastic – and ever growing – team at Control F1. We are currently expanding at an extraordinary rate, working in many different sectors and innovating in all.

The Control F1 team will head to London on 12th November to receive bespoke feedback from TechMarketView Research Directors and Senior Partners from London’s technology merchant bank MXC Capital.

Our Managing Director Andy Dumbell commented:

“We’re delighted to have been named a Little British Battler – we may be relatively small in size, but our ambitions are big! In fact, it has been a big year for us all round – we’ve quadrupled our turnover, created over 30 job opportunities and secured external investment. Winning this accolade is the icing on the cake!”

Knowing the success of previous winners of the Little British Battler programme, this takes us a step closer to achieving our goals, and we are extremely honoured to have won this coveted title.

TechMarketView will be publishing highlights from the day, as well as the Little British Battler Report, due out in early December. Keep an eye on our social media channels – we’ll be live tweeting from the event, as well as covering the release of the highlights and subsequent report from TechMarketView.

Internet of Things World: Europe 2015 – it’s not just about robots

This post is an extract taken from Control F1 MD Andy Dumbell’s piece for Internet of Things (IoT) World News, following the IoT Europe conference in Berlin. Read the full piece here.

I’m writing this post whilst flying home from Berlin, feeling enthused, excited and inspired, after attending the first “Internet of Things World: Europe” conference. The show itself brought together thought leaders, alliances, and companies big and small from all parts of the evolving IoT (Internet of Things) sector.

Why did we go? Apart from the usual reasons for attending a conference – to learn, network, and pick up free t-shirts – we hoped to gain a better understanding of the IoT ecosystem; to crystallise where we can add value, and to find communities to collaborate with.

So, what exactly do we mean by IoT? This question was raised throughout the event, and I felt quite reassured by the lack of consensus, as we often debate the issue here at Control F1 – “it’s not just about robots!” Everyone had their own definition. One speaker’s presentation started with “IoT = Big Data”. Another view was that the IoT is a less organised version of M2M (machine to machine). Others pondered over whether it’s simply the next generation of M2M.

Here’s my stab at it: the IoT is connecting everyday objects across digital networks – such as the internet – to infer meaningful information and create value. Connected things can include just about any asset: clothing, appliances, vehicles, parcels, people, pets, buildings, planets – the list goes on. The IoT enables communication with such assets, to monitor them through sensing solutions, create intelligence, and manage and control them remotely.

However, for me the more important question is: why does it matter? The simple answer is that the IoT can make our lives better, but it is only worthwhile when it creates real value. For example, IoT innovations can save lives! By generating information and enabling timely communication, we can solve problems and make informed decisions, which leads to intelligence, convenience, efficiencies, effectiveness, smart socks and so on.

One of the highlights of the show was Katja von Raven’s talk on opening doors. Her business, Chamberlain, a manufacturer of smart home control products sold worldwide, has embraced the IoT to create a market-leading smart garage door opener. The obvious benefit is increased convenience over traditional products – you can ask your iPhone “did I leave the door open again?”, and then close it remotely. And Chamberlain has created new value for its customers through an alerting service – more than 70% of them use this feature, and 40% of subscribers say they could not live without it. A simple and effective solution made possible through the IoT.

I always enjoy hearing an inspiring success story – especially a technology driven one. Chamberlain took the brave decision to adopt the IoT and rethink its business model, transforming into a manufacturing and digital tech company. This was driven through consumer-guided decisions to create a useful product, rather than a misguided attraction to shiny new toys adding to the Internet of Pointless Things.

Advancements in connectivity also provided for interesting discussions. We heard about 5G. We heard about LoRa’s mission to standardise low power wide area networks to meet the IoT market needs. And we heard about SIGFOX’s low-cost, low-throughput, low-energy-consumption network – which can literally see through walls!

I was, however, surprised that Bluetooth didn’t have a stronger presence. I attended the Bluetooth Europe conference in London last month where they presented their planned roadmap, which includes mesh network capability, IPv6 support, as well as other interesting advancements that the IoT community could benefit from. The conference would have also been a great place for Amazon to showcase their new AWS IoT services.

Unsurprisingly Big Data and Analytics were also part of the theme, with insights drawn from various verticals on how to get value from billions of connected things. For example, the automotive sector is providing near real-time intelligence to motorists through connected vehicles, interpreting data from sensing solutions and broadcasting updates on congestion, road risk and better route options.

The European Commission talked about their continued support for IoT innovation and future deployment, with hundreds of millions of euros committed to funding research and experimentation, from smart farming and food security, to autonomous vehicles in a connected environment.

The IoT still feels a bit like the Wild West – fast, risky, but an exciting place to be. Past scepticism has subsided, with developments from major players making the IoT a tangible business opportunity. The pace of innovation is incredible, catalysed by major advancements in connectivity, cloud tech and hardware, and driven by a generation of enthusiastic startups, innovators, forward-thinking businesses and communities pushing the industry forward.

As a company we have worked in the IoT since our inception in 2010, providing innovative software solutions and consultancy for big brands and startups alike. These have ranged from high-end fashion accessories that double as a personal security device, to the technology that allowed Nestlé to launch a competition with hidden tracking devices in its chocolate bars (lucky winners were hunted down and handed a briefcase containing thousands of pounds!)

In summary, the IoT Europe conference served to reaffirm our strategy, and inspired us to continue innovating. There is no doubt that the IoT is changing our lives for the better, emerging as the third wave of development for the internet. The future will be quite different from the world we know today. We want to be part of the driving force that gets us there.

Configuring Elastic MapReduce 4 applications from the AWS console

Lead Developer Phil Kendall recently blogged about getting started with Spark on EMR. In this follow up post he explains how to configure EMR 4 applications from the AWS console.

Update 12th November: Jon Fritz, one of the Elastic MapReduce PM team, has let me know that this bug is now fixed in the console.

Back in July, Amazon released “v4” of their Elastic MapReduce platform, which introduced some fairly big changes to how applications are configured. While there are some nice examples on that page, those examples don’t work if you try them in the AWS console: if you copy and paste an example into the “Edit software settings” box and then try to create a cluster, you get the following error:
ClassificationNullIsNotValid
…which is perhaps not the world’s most informative error ever, and definitely a bit disappointing when all you’ve done is taken an AWS-supplied example. After much frustration, I finally discovered that it’s the capitalisation of the keys that is significant: if you change the supplied example to

[
  {
    "classification": "core-site",
    "properties": {
      "hadoop.security.groups.cache.secs": "250"
    }
  },
  {
    "classification": "mapred-site",
    "properties": {
      "mapred.tasktracker.map.tasks.maximum": "2",
      "mapreduce.map.sort.spill.percent": "90",
      "mapreduce.tasktracker.reduce.tasks.maximum": "5"
    }
  }
]

…then everything works just fine – note the lower case “c” and “p” in “classification” and “properties” as opposed to the upper case versions used in AWS’s example. I’ve sent feedback to the AWS team on this one so I suspect it may end up getting fixed pretty soon, but if anyone else is suffering from the same problem then hopefully this gets you out of a hole!

Adventures in Spark on Elastic MapReduce 4

Lead Developer Phil Kendall on getting started with Spark on EMR.

In June, Spark, the up-and-coming big data processing framework, became a first-class citizen on Amazon Elastic MapReduce (EMR). Last month, Amazon announced EMR release 4.0.0, which “brings many changes to the platform”. However, some of those changes lead to a couple of “gotchas” when trying to run Spark on EMR, so this post is a quick walk through the issues I found when getting started with Spark on EMR, and (mostly!) solutions to those issues.

Running the demo

Jon Fritz’s blog post announcing the availability of Spark on EMR contained a nice simple example of getting a Spark application up and running on EMR. Unfortunately, if you try to run through that demo on the EMR 4.0.0 release, you get an error when trying to fetch the flightsample jar from S3:

Exception in thread "main" java.lang.RuntimeException: Local file does not exist.
	at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.fetchFile(ScriptRunner.java:30)

This one turns out to be not too hard to fix – the EMR 4.0.0 release has simply moved the location of the hdfs utility, so it’s now on the normal PATH rather than being installed in the hadoop user’s home directory. That can trivially be fixed by removing the absolute path, and while we’re in the area, we can also upgrade to the new command-runner rather than script-runner. Once you’ve made both those changes, the Custom JAR step should look like this:

[Screenshot: the updated Custom JAR step]

…and you can then happily run through the rest of the demo.

Spark Streaming on Elastic MapReduce

The next thing you might try is to get Spark Streaming running on EMR. On the face of it, this looks nice and easy – just push the jar containing your streaming application onto the cluster and away you go. And your application starts… and then just sits there, steadfastly refusing to do anything at all. Experienced Spark Streaming folk will quite possibly recognise this as a symptom of the executors not having enough cores to run their workloads – each receiver you create occupies a core, so you need to ensure that there are enough cores in your cluster to run the receivers and to process the data. You’d hope this isn’t a problem, as the m3.xlarge instances you get by default when creating an EMR cluster each have 4 cores, so there must be something else going on here.

The issue turns out to be the default Spark configuration when running on YARN, which is what EMR uses for its cluster management: each executor is by default allocated only one core, so your nice cluster with two 4-core machines was actually sitting there with three quarters of its processors doing nothing. Getting around this is what the “-x” option mentioned in Jon Fritz’s blog post did – it ensured that Spark used all the available resources on the cluster – but that setting isn’t available with EMR 4.0.0. The equivalent option for the new version is mentioned in the “Additional EMR Configuration Options for Spark” section of the EMR 4.0.0 announcement: you need to set the “maximizeResourceAllocation” property. To do that, select “Go to advanced options” when creating the cluster, expand the “Edit software settings (optional)” section and then add in the appropriate configuration string: “classification=spark,properties=[maximizeResourceAllocation=true]”. This does unfortunately mean that the “quick options” route for creating a cluster is pretty much useless when using Spark, as you’re always going to want to set this option or a variant of it.
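
For reference, if you’d rather paste JSON into the “Edit software settings” box, the equivalent configuration should look something like this (with the lower-case keys required by the console, as discussed in the previous post):

[
  {
    "classification": "spark",
    "properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]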

Getting to the Spark web UI

When you’re running a Spark application, you may well be used to using the Spark web UI to keep an eye on your job. However, getting to the web UI on an EMR cluster isn’t as easy as it might first appear. You can happily point your web browser at http://<cluster master DNS address>:4040/ as usual, but that returns a redirect to http://ip-<numbers>.<region>.compute.internal:20888/proxy/application_<n>_<n>/, which references the internal DNS name of the machine – not too helpful if you’re outside the VPC inside which the cluster is running. I haven’t found a perfect solution to this one yet, but you can just replace “ip-<numbers>.<region>.compute.internal” with the external DNS name of the master – so you’re pointing at something like http://<cluster master DNS address>:20888/proxy/application_<n>_<n>/ – and then you can happily browse around the web UI from there.

Onward and upward

With all that, I’ve pretty much got up and running with Spark on Elastic MapReduce 4. Now, it’s back to the actual Spark applications again…

Enterprise Apps World 2015 – mobile makes the world go round

Last week Control F1 Lead Mobile Developer Gabrielle Littler and Marketing Manager Florence attended Enterprise Apps World 2015 at the Cloud World Forum to find out more about the latest mobile trends. In this post, Florence summarises her top five takeaways from the event.

  1. Mobile tech is changing the way we work.

As work becomes increasingly project-based, the word “office” takes on a new dimension – a bedroom, a café perhaps, or a remote beach on the other side of the globe – as long as it has Wi-Fi, that is. We’re seeing a big rise in flexible working, and the 9 to 5 will soon be no more. As meetings become more organic and work streams more fluid, even corporates are starting to engage in co-working and space-sharing practices.

Mobile technology has played a big part in making this transition achievable. A case in point comes from Adidas’ Marc van der Heijden, who described how the entire Adidas business transitioned from traditional servers to the cloud-based storage system Dropbox. “The majority of our employees were already using Dropbox – either in their personal or professional lives – so it became a no-brainer”, Marc told us.

  2. The future of mobile is “always on”.

Amazon is leading the way with its Echo – a wireless speaker connected to Amazon Prime which is able to anticipate what you want and deliver it to you, before you even have to log on to your iPad and place an order.

Similarly Google Now – designed to give Apple’s Siri a run for its money – comes with an added layer of intelligence, proactively delivering information that it thinks might be of interest through what it gleans by analysing your texts.

Apple has responded by making Siri accessible to our apps. In the future advertisers, for instance, will be able to integrate Siri and make their brand pop out of their apps through harnessing Apple’s voice recognition technology.

  3. Empathy is key to good design.

Kelly Manthey of Solstice Mobile, a huge proponent of human-centred design, described how mobile technologists must acquire a deep understanding of their customers in order to deliver the best possible mobile experience. “Interview, shadow, walk a mile in people’s shoes – you need to gain empathy with your users to give them a truly frictionless experience”, Kelly told us.

  4. Technology is becoming part of the fabric of our day-to-day lives.

On the subject of frictionless experiences, Levi’s and Google have created a smart thread, which can be woven into clothing. Used in conjunction with a button-sized computer, the fabric can communicate with a mobile phone, functioning as a touchpad or joystick to control your handset without you so much as taking it out of your pocket.

Meanwhile, the Smile Suggest Chrome extension takes responsive design to a whole new level, basing whether or not to bookmark a page on how much you are smiling, all monitored through your webcam.

  5. Mobile should enhance a physical experience, not replace it.

Songza – acquired by Google last year – provides a good example of this. Listening to music is an experience most people enjoy, but Songza takes the experience up a notch. The app allows users to select music based on their moods, through unique playlists compiled by “music experts”.

Most of what we learnt last week wasn’t about new technology per se; rather, the focus was on new and inventive uses of existing mobile tech to improve our daily lives and enhance the way in which we interact with the world. After all, what is technology for if not to build a better future for us all? Here at Control F1, that’s certainly something we can get on board with.

Big data or better data?

Much has been said and written about ‘big data’ over the past few years, but how much of all the information that we’re keeping hold of do we actually use? Control F1 Product Development Director Dale Reed discusses.

The term ‘big data’ – the ability to compile a mass of data on each of your customers or prospects – has become the latest buzzword of the digital age. But the reality is that much of the data we hold on our contacts has little or no relevance to their purchasing patterns, or to when they are in the market to buy our products, and the data we gather can fast become obsolete, bloating our systems with useless – or worse, used but inconsequential and often misleading – data.

Because servers and storage are relatively cheap these days, the temptation, like a hoarder stacking their newspapers or other treasures, is to accumulate every microscopic bit of information you can and add it to the pile, ‘just in case’. But if we carry that analogy forward, our hoarder will soon find it difficult to put their finger on an important article they wanted to read, when it’s buried under thousands of other pieces of information, and soon the obsession becomes the collection and storage of data, rather than the information it contains.

Conventional wisdom tells us that when dealing with big data, we should concentrate on the three Vs: volume, velocity and variety. Volume, obviously, is the amount of data we can expect to have to store. Velocity is the speed at which that information will accumulate. And variety is the variability of that data, as each source will often have a different format and purpose. By looking at these three factors, we can calculate the storage space and bandwidth we need, as well as the layout of our big data warehouse.

However, perhaps even before looking at the conventional three Vs, we should focus on a fourth: vision. What do we want to use the data for? In our efforts to keep up with big data, this simple and obvious point is often overlooked. By determining the purpose of the data, and the end result we want to achieve, we can drive everything else in the process – including, crucially, what information we actually need to store to achieve our goal.

By spending some time on our vision, we can decide what data actually helps us identify and target customers for marketing purposes. Should we be capturing information from social media? How much transactional data should be kept before it becomes inconsequential or even misleading? Can some data be summarised, and the underlying detail therefore eliminated?

As a simple example: for transactions in the last three months, we can keep all of the basket and individual product purchases; for transactions over three months old, keep just the dates and basket totals; and for transactions over 12 months old, keep just a daily, weekly or monthly total.
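
To make that concrete, here’s a minimal sketch of the tiering with Spark DataFrames (the column names and the 90- and 365-day boundaries are illustrative assumptions, not a real schema):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object TransactionTiering {
  // Assumed columns: txn_date, basket_id, product_id, basket_total
  def tier(txns: DataFrame): (DataFrame, DataFrame, DataFrame) = {
    val age = datediff(current_date(), col("txn_date"))

    // Last three months: keep full basket and product detail
    val recent = txns.where(age < 90)

    // Three to twelve months: keep only dates and basket totals
    val summary = txns.where(age >= 90 && age < 365)
      .select("txn_date", "basket_id", "basket_total")
      .distinct()

    // Over twelve months: keep only a daily total
    val archive = txns.where(age >= 365)
      .groupBy("txn_date")
      .agg(sum("basket_total").as("daily_total"))

    (recent, summary, archive)
  }
}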

By keeping a consolidated, clean, and trim marketing database that contains only the data that we know from experience influences customer decision making, we can expect to reap the rewards of the data we hold, and not get bogged down in searching through terabytes of useless or out of date information for the one nugget that can actually help us achieve our targets.

So take the time to analyse your existing data. Find the information that will be useful. Make the painful decision to let go of data that you know will lose its usefulness over a period of time. Hoarding information will not, in the long term, improve your ability to qualify customers for a campaign. Only storing the right information will allow you to do that.

Let’s put aside the ‘big data’ terminology for now, and try for ‘better data’ instead.

This post is an extract, taken from a piece originally written for dmbgroup. View Dale’s original piece here.