Finding the small in big data

CSW Editor, Martin Banks, discusses finding the valuable small data in the morass of big data now starting to swamp business with Tibco’s European CTO, Mark Darbyshire.

Arguably, big data is boring, and most of it is totally irrelevant. And that, of course, is the real point of it all, for the key function of all big data developments is to tease out those items of small data, the anomalies and exceptions that identify the transient events which are the basis of all useful predictions.

It is often those transient events that provide the main clues to understanding the entirety of the experience customers have in dealing with a business, and understanding that experience is now becoming one of the key goals for businesses. It is essential, especially in the fast-moving world of cloud-based services, to making a business ‘sticky’ to its customers.

Finding those events, however, can be difficult, and as Mark Darbyshire, European CTO of Tibco, pointed out, the real trick now is being able to combine different sets of data so that different transient events can be correlated.

One example of this that has gained prominence over the last year is the amount of real-time metrics data produced by a modern jet engine. For example, the General Electric GE90 engine used on the Boeing 777 produces as much data about itself in a day of operations as all the Tweets on Twitter in the same timescale. And each of those planes has two engines, so one plane produces twice as much data a day as Twitter.

But in the end, the vast majority of that data can be seen as irrelevant – except of course as a confidence builder in the continued reliability of the engine. It is still producing ‘X’ lbs of thrust and spinning at ‘Y’ RPM.

“If I’m just looking at the same data source and it’s 17,500 rpm, and two hours later it’s the same, then all I can deem from that is high confidence,” Darbyshire said. “Whereas what I’d like is a lot of disparate sources of data, so I want the engine data, I want the outside temperature, I want how much caffeine the pilot has had. The correlation between these shows me a much deeper insight into the overall system.”

This idea translates well to business applications, of course. Businesses now need a set of data sources, probably unique to their operations, that could be used to feed a similar type of analysis, so that users are not just looking for the ‘click’ at the end of the process. There is now a growing need to look at the whole transaction process to get a deeper insight.

For example, Darbyshire suggested that it could easily be possible to track how long it took a customer to fill out their credit card details and estimate their age from that.

“You could gain a lot of data by looking at the end to end, how many clicks per purchase, how many times does a purchase fail to that person, there’s a lot of information along the way,” he said. “Why did a process take a minute for one person but a lot longer for the next? They’re both ordering the same product, so from the business perspective it should be the same.”
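As a rough illustration of the end-to-end view Darbyshire describes, the sketch below counts clicks and failed purchases per customer and flags anyone who took much longer than is typical to complete the same purchase. The event format, the sample data, the function names and the "twice the median" threshold are all assumptions made for the example, not anything described by Tibco.

```python
from collections import defaultdict
from statistics import median

# Hypothetical clickstream records:
# (customer_id, event_type, seconds_since_session_start)
events = [
    ("a", "click", 2.0),
    ("a", "click", 9.0),
    ("a", "purchase", 61.0),
    ("b", "click", 3.0),
    ("b", "purchase_failed", 40.0),
    ("b", "click", 95.0),
    ("b", "click", 180.0),
    ("b", "purchase", 240.0),
    ("c", "click", 1.0),
    ("c", "purchase", 55.0),
]


def summarise(events):
    """Return per-customer clicks, failed purchases and time to first purchase."""
    summary = defaultdict(
        lambda: {"clicks": 0, "failures": 0, "seconds_to_purchase": None}
    )
    for customer, kind, seconds in events:
        row = summary[customer]
        if kind == "click":
            row["clicks"] += 1
        elif kind == "purchase_failed":
            row["failures"] += 1
        elif kind == "purchase" and row["seconds_to_purchase"] is None:
            row["seconds_to_purchase"] = seconds
    return dict(summary)


def slow_customers(summary, factor=2.0):
    """List customers whose time to purchase exceeds `factor` times the median."""
    times = [
        row["seconds_to_purchase"]
        for row in summary.values()
        if row["seconds_to_purchase"] is not None
    ]
    if not times:
        return []
    typical = median(times)
    return sorted(
        customer
        for customer, row in summary.items()
        if row["seconds_to_purchase"] is not None
        and row["seconds_to_purchase"] > factor * typical
    )


summary = summarise(events)
print(summary["b"])
print(slow_customers(summary))
```

The flagged customers are the "small data" here: everyone else simply confirms that the process is behaving normally.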

Another example might be tracking the customers that return the most goods bought via online services. This can have multiple commercial impacts for any business, ranging from carrying the costs associated with the return, such as rechecking the integrity of the product and re-packaging it, through to possible reputation damage via social media. Identifying those customers means a business can either work to understand them better or simply block further transactions with them.

“I don’t want to have to be resending it to you every three weeks because you can’t be bothered to open your front door, because it turns out you’re deaf and can’t hear your front door,” he observed. “Increasingly, it’s not just about the large amounts of data, it’s also about being clever getting the right sort of data sources to then analyse and correlate.”

In Darbyshire’s view big data analytics is now less about the engineering of data and more about data artistry. It is now possible for data scientists to determine who your friends are, where you go to work, who your parents are, and who your children are, all from your call records.

While rich in content from the business perspective, it does start to raise almost moral questions that need to be addressed, and when it comes to Europe at least, Darbyshire expects to see some fairly clear-cut privacy laws.

“I have this exact scenario with a mobile phone service provider,” he said. “It has a global gateway that collects all this data and then anonymises all the phone numbers before it hits the log file. So I can consistently see the behaviour on individual users because I’ve got a consistent token, but I don’t know their phone number.

“The question then is: is that immoral? How is this any different to the corner shop not knowing your name? They don’t know where you live but they know that you come in at 9am every morning, and buy a copy of the Times and then have a cup of coffee. So they keep back a copy of the Times, and as they know how you want your coffee they pre-prepare it... it’s no different to this scenario is it? I know nothing about you but I know everything about your behaviour, does that get you around the immoral dilemma?”
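The article does not say how the provider's gateway produces its consistent token. One common way to get that property is a keyed hash, sketched below under that assumption: the same phone number always maps to the same token, but the token cannot be turned back into the number without the secret key. The key value, function names and record fields are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret key held only by the gateway; this value is a placeholder.
SECRET_KEY = b"replace-with-a-randomly-generated-secret"


def pseudonymise(phone_number: str, key: bytes = SECRET_KEY) -> str:
    """Map a phone number to a stable token that does not reveal the number."""
    # Keep digits only, so differently formatted copies of a number match.
    normalised = "".join(ch for ch in phone_number if ch.isdigit())
    digest = hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability in log files


def anonymise_record(record: dict) -> dict:
    """Return a copy of a call record with the phone number replaced by a token."""
    cleaned = {k: v for k, v in record.items() if k != "phone_number"}
    cleaned["user_token"] = pseudonymise(record["phone_number"])
    return cleaned


# Hypothetical call record; the number is from a range reserved for fiction.
record = {"phone_number": "+44 7700 900123", "duration_s": 42}
print(anonymise_record(record))
```

A keyed hash is used rather than a plain one because the set of possible phone numbers is small enough that an unkeyed hash could be reversed by simply hashing every number and comparing.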

This is, on some basic level, how every business should operate, taking advantage of customer behaviour patterns by modifying its own behaviour. Darbyshire is well aware of the risks of having the ‘wrong’ sort of data, which is fast becoming one of the key issues for companies looking to use big data analytics. It is one of the key reasons that having data scientists as part of the data collection/aggregation/exploitation loop is now becoming ever more important.

“Behaviour and identity are very different; your identity is where you live, your phone number, what car you have. Your behaviour would be what sort of cars you want to buy,” he observed.

“It’s very unlikely that under normal circumstances a mechanic would go, right I’ve got the licence plate so I’ll get onto the DVLA site and see the address... people have got better things to do with their time. Unless of course he is already evil, but you can see how preposterous the example appears.

“It is much more likely that mechanics will observe that, while stamping your log page they notice that the last three stamps have all been in this year. That should tell them that your car needs a lot of servicing, so why don’t they offer you a service programme running over the next five years, which will be 30 percent cheaper than you coming in four times a year.”

That is a good example of what big data is all about: finding the right information and exploiting it in order to offer customers something advantageous to them. Sifting those nuggets from the morass of data ‘normality’ is, however, still a trick that many businesses need to master.