Deconstructing Big Data for Travel #1: What is Big Data?

We are witnessing a shift that will completely transform travel. The term given to this shift is Big Data and it will change everything, from the way we travel to the way we interact with travel suppliers to our experience at airports.

No matter what part of travel you are involved with and no matter what job you work in, Big Data will transform it. However, there is a problem with Big Data. The problem is not Big Data itself but rather the hype around it.

The hype around Big Data and the term itself may eventually disappear but the phenomenon is here to stay – and it’s just starting. Tim Harford summed this up in a recent Financial Times article by saying “Big Data has arrived, but big insights have not.”

What we call Big Data today will become a ‘new normal’, and all businesses and governments will have large volumes of data with which to improve what they do and how they do it. Most travel businesses already create a large volume of data and have access to a lot of customer information, but they don’t really know how to leverage it to make good strategic decisions. Without this foundation, adding Big Data into the mix often adds little value.

More importantly, Big Data on its own often lacks the actionable information that travel businesses need to make effective decisions that benefit their customers and their bottom line. Big Data can reveal much about what is going on, when it happens and where it happens. But travel businesses haven’t yet arrived at the day when Big Data can reliably tell them why customers behave in a certain way.

But we are getting ahead of ourselves. Let’s start out with the basics: what exactly does Big Data mean?

What is Big Data?
The term Big Data speaks to the fact that we can now collect and analyse data in ways and in volumes that were simply impossible only a decade ago. The field has grown rapidly since then, with many businesses, especially in the Internet sector, creating completely new business models and hugely profitable enterprises based on the principles of Big Data. Analyst firms now also actively track the market, and there are formal definitions that separate Big Data solutions from traditional databases and applications.

The most popular of these was provided by Gartner in 2012, which defines Big Data as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing”.

These “3 V’s” of Big Data are very useful in identifying what a Big Data system looks like, but they don’t reveal the pace of change and progress we are experiencing. Only 10 years ago, a Terabyte was seen as the biggest achievable unit of storage. Today, we can all buy a laptop or PC for a few thousand dollars with a Terabyte of storage, or better still purchase a Terabyte of storage from a Cloud provider like Dropbox for $99 a year. So Big is not what it used to be…!

What does ‘Big’ look like? By 2050 we will have 45 Zettabytes of data stored globally across 100 billion interconnected devices.

Now, you can store an awful lot of data in a Terabyte of storage capacity, and if you scale that up to 480 Terabytes, you could store a digital catalogue of all the world’s books in all languages. That’s almost half a Petabyte. Scale that up to 5,000 Petabytes, or 5 Exabytes, and you can store all the words ever spoken by every person who ever lived. And we’re still a long way from a Zettabyte, which would take a staggering 250 billion DVDs to store.
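To make these orders of magnitude concrete, here is a minimal Python sketch of the unit arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1,000, and a DVD capacity of 4 GB, which is the figure that reproduces the 250 billion estimate. The function names are purely illustrative.

```python
# Decimal (SI) storage units expressed in bytes (assumption: powers of 1,000).
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(value, from_unit, to_unit):
    """Convert a storage quantity from one unit to another."""
    return value * UNITS[from_unit] / UNITS[to_unit]


def dvds_needed(num_bytes, dvd_capacity_bytes=4 * 10**9):
    """Count whole DVDs filled by num_bytes (assumed capacity: 4 GB per disc)."""
    return num_bytes // dvd_capacity_bytes


print(convert(480, "TB", "PB"))    # all the world's books, in Petabytes
print(convert(5000, "PB", "EB"))   # all words ever spoken, in Exabytes
print(dvds_needed(UNITS["ZB"]))    # DVDs needed to hold one Zettabyte
```

With the more common single-layer capacity of 4.7 GB, passed as `dvd_capacity_bytes=4_700_000_000`, the count drops to roughly 213 billion discs, so the headline figure depends on the capacity you assume.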

So if we have 4.4 Zettabytes of data globally today, where is it all coming from?

Connections: Generating more data on everything
Until 2010, the primary source of data growth in the world of computing was PCs. Today, that is different. Everything we do in our increasingly digitised world leaves a “data trail”. This means the amount of data available globally is growing dramatically and rapidly. So much so, that we have created more data in the past couple of years than in the entire previous history of mankind.

Most of the data is coming from the billions of connections we have been creating with each other since the advent of the World Wide Web. This includes the messages we send each other every second via email, WhatsApp, Facebook Messenger, WeChat, Twitter etc., but also the one trillion digital photos and videos we take and send to each other each year. What we search for on search engines like Google and Bing is also very important, as is what we actually purchase on e-commerce websites like Amazon. All of these things together represent a sort of collective consciousness that captures our preferences, desires and behaviours, both as individuals and as groups or populations of people.

Looking forward, this is only going to grow exponentially as the Internet of Things (IoT) really takes off. Think of all the data from the sensors we are already surrounded by. The latest smartphones have sensors to tell where we are (GPS), how we are moving (accelerometer), what the air pressure is around us (barometer), what part of the screen we are touching (touch sensor) and much more. We now have smart TVs, smart watches, smart meters, smart kettles, smart cars and even smart light bulbs. Of course, we also have interconnected laptops, mobile phones, tablets, Fitbits, semi-autonomous (self-parking) and networked (Bluetooth) cars, home appliances and energy management systems. A more recent development is ubiquitous imaging at scale, for example from large numbers of low-earth-orbit satellites operated by new entrants like Terra Bella and Planet Labs, which challenges even the most scalable data storage infrastructure.

Cloud: Big infrastructure for Big Data
With all of these connections generating such large volumes of data, Internet companies like Yahoo, Amazon and Google needed to find more scalable, low-cost and “elastic” solutions to store this data. The solution was to purchase large quantities of commodity servers and storage hardware and hook them together into a single cluster in a data centre. To make this work, software like MapReduce and Hadoop was written, with Hadoop released as open source software through the Apache Software Foundation. This trend continued, with many more contributions to this open source initiative, including Spark, which brought this work into the realm of real-time processing and machine learning.
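For readers who want a feel for the MapReduce idea, here is a minimal, single-machine sketch in plain Python. It is not Hadoop's API: the function names and the search-log records are hypothetical, and each inner list simply stands in for the slice of data held on one server in the cluster. The job counts how often each destination appears in a set of travel searches.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (destination, 1) pair for one search record."""
    yield (record["destination"], 1)


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all the values emitted for one key."""
    return key, sum(values)


def run_job(partitions):
    """Run map, shuffle and reduce over every partition of the data."""
    pairs = [
        pair
        for partition in partitions
        for record in partition
        for pair in map_phase(record)
    ]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical search logs, split across two "servers".
partitions = [
    [{"destination": "LIS"}, {"destination": "DUB"}],
    [{"destination": "LIS"}, {"destination": "JFK"}, {"destination": "LIS"}],
]

counts = run_job(partitions)
print(counts)  # {'LIS': 3, 'DUB': 1, 'JFK': 1}
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data across the network, which is the part that frameworks like Hadoop take care of.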

All of the large tech companies saw this trend and started to build administration software around these clusters, while also offering access to their clusters as a Cloud service. This essentially moved these clusters into the realm of enterprise computing, and it enabled not just Big Data but other important shifts in the IT sector, such as Software as a Service. The key to the success of the Cloud is its scalability, but also its elasticity (the ability to ‘spin up’ computing resources quickly, with no fixed costs). This capability makes the Cloud very attractive for every business that does not need to treat running the hardware behind its software as a core competence. And this includes all travel businesses.

Next time:
The difference between Descriptive, Predictive and Prescriptive Analytics, and Predictive propensity models for Travel.


