Reflections from Strata + Hadoop World: how big data is moving up in the business

As part of our commitment to new and disruptive technologies, we sent Lead Developer Phil Kendall to Strata + Hadoop World in London. As well as taking over our Twitter feed for the week, Phil’s written up a summary of his experience at the conference – enjoy!

One of the themes from the keynotes at Strata + Hadoop World was the idea that big data is moving up in the business. For a lot of companies, big data has been a tactical solution, deployed only to fit a specific need. There were obviously some companies – for example those in the online advertising and mobile gaming industries – who had built their business around a big data model, but for the more traditional industries, big data wasn’t “a part of their DNA”. What we saw at Strata + Hadoop World was some big, risk-averse companies starting to restructure around a big data model: we had both Maite Agujetsas, CTO of Santander Group, and Phillip Radley, Chief Data Officer at BT (watch his keynote here), explaining how they were rebuilding their businesses with big data as a first-class citizen.

One of the advantages of being a tactical solution is the freedom to innovate – and this was the theme of an excellent keynote from Tim Harford of the Financial Times. While making marginal improvements is a valuable activity (to steal from Tim’s keynote, it’s how Team GB won 70% of the track cycling gold medals at the 2012 Olympics), we also need to invest in those ideas which go “outside the box” and can make a radical change to a business, an industry or whatever else. Those ideas are risky – the vast majority of them aren’t going to give a good return on investment. How can we ensure that we’re still going to invest in those ideas when big data has become a core, business-critical part of a big business?

If you haven’t watched Tim’s keynote yet (and you really should), then don’t expect Tim to provide any answers here – his keynote was more a plea to ensure that we do keep providing that kind of investment, along with some great statistics from medical research in the US which showed that risky research can be as valuable as those small changes in the long run.

Some other themes I picked up on from the conference:

  • Hadoop has been around since 2005 or so. In the past 10 years, we’ve learnt a lot more about what we want to do with big data, which means that Hadoop is starting to show its age. Is the time now right to move to a “2nd generation” product for big data, and is that product Spark? (I may be slightly biased on this one having been to the fantastic Spark Camp by Paco Nathan. The kool-aid was very tasty.)
  • How are the Internet of Things and the enormous quantities of data it will generate going to affect how we build systems? Ted Dunning of MapR and Maarten Ectors of Canonical (the people behind Ubuntu) gave different talks on this – Ted’s focused on how to get our existing databases to scale up massively (and you can view a variant of it here), whereas Maarten’s focused on how to push the processing of that data closer to the sensors so that we don’t actually need to send and store all that data in the first place. A different take on the problem came from Simon Elliston Ball of Hortonworks – once we have all these devices pushing out data, who’s going to write the code to interpret the information from each and every one of them? For even a relatively simple project like making his house work out when to turn the lights on, Simon was already writing large amounts of code – so he turned to machine learning to do it all for him. Did it work? Not so far, but the idea of reducing the amount of code that developers need to write in order to deploy a solution is an interesting one.
  • How are we going to deal with the inevitable ethics questions that come out of big data? Majken Sander and Joerg Blumtritt gave a really powerful example in their talk on “Algorithm ethics: The inevitable subjective judgments in analytics”: when we have fully automated cars, who decides what happens in a crash? Should the car swerve out of the way to avoid hitting a 30-year-old entrepreneur if that involves hitting a 70-year-old grandmother? Perhaps more importantly, how can the general public know which decisions the algorithm is making? The theme was repeated in Francine Bennett and Duncan Ross’s talk “Using data for EVIL” (watch Francine’s summary of the talk here) – which helpfully let us know all the ways in which we can be unethical with our data. But of course they didn’t want us to do that… Aside: if you’re a data scientist actually looking to do good rather than evil, check out DataKind, the charity of which Francine and Duncan are directors.
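To make Maarten’s point about pushing processing closer to the sensors a little more concrete, here’s a minimal sketch of edge aggregation in plain Python. It isn’t from any of the talks – the function name, the windowing scheme and the summary fields are all my own illustrative choices – but it shows the basic trade: instead of shipping every raw sample upstream, the device sends one small summary per window.

```python
from statistics import mean

def summarise_window(readings, window=10):
    """Aggregate raw sensor readings at the edge.

    Rather than transmitting every sample, emit one (min, mean, max,
    count) summary per fixed-size window of readings.
    """
    summaries = []
    for i in range(0, len(readings), window):
        chunk = readings[i:i + window]
        summaries.append({
            "min": min(chunk),
            "mean": round(mean(chunk), 2),
            "max": max(chunk),
            "count": len(chunk),
        })
    return summaries

# 100 raw temperature samples collapse to 10 small messages
raw = [20 + (i % 7) * 0.5 for i in range(100)]
summaries = summarise_window(raw, window=10)
print(len(raw), "->", len(summaries), "messages")
```

A real deployment would obviously need to decide what information it can afford to throw away – which is exactly the kind of judgement call that, per Simon’s talk, someone has to encode for every single device.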

Where does that leave us all? We know that big data is changing the technology landscape, but in the next few years we’re going to see it change industries outside the high tech sector – and that’s a great opportunity either for new players to come in and disrupt markets, or for incumbents with the vision to invest to reestablish their dominance. In the longer term, the Internet of Things is coming – while it’s very much in its infancy right now, we’re starting to see the beginnings of ubiquitous connectivity. I don’t think anyone has a real handle on how that’s going to change things, either at a technological level or an ethical one. About the one thing I am sure of is that it’s going to be an interesting ride – jump on board, or soon your business will be looking around and saying “where did our market share go?”

