Big Data: separating the hype from the major opportunities 

‘Big Data’ is not a new concept. The term was first coined in a 1998 paper, but its roots can be traced back to the idea of the ‘information explosion’ in the 1940s. Its potential is seemingly limitless; the ability to make the most of huge amounts of data can give a boost to businesses looking to understand their audiences, and even help solve global crises. By Daniel Beazer, Senior Analyst, Peer 1 Hosting.


Despite some widely publicised use cases, Big Data suffers from its reputation as a solution in search of a problem. For now, adoption remains small and, if you leave data warehousing out of the tally, minuscule. Only 13 per cent of respondents to a recent Gartner survey had Big Data projects in production (and the vast majority of those were data warehousing projects).

The number one reason given for the lack of adoption is the immaturity of the platforms. Big Data in its open source flavour (i.e. Apache Hadoop) is known for being ‘difficult’, and using a vendor like Cloudera to deploy and support it can be extremely expensive. It’s the Linux/Red Hat issue all over again: going it alone is feasible, and worth it, only if you have a massive engineering team.

But the platforms out there will mature, and no doubt at a faster rate than Linux did. The enterprise is also becoming more and more data driven. Like any other disruptive technology trend, Big Data requires a complete rethink of the systems already in place to make the most of it. Plenty of analytical tools have appeared to accompany Big Data’s growing popularity, but arguably less focus is being placed on the infrastructure requirements.

Impact on the data centre
The first step should be to look at the potential impact that Big Data can have on businesses. Data analysis tools and business intelligence software used to visualise and analyse huge sets of data will require all of the actual ‘number crunching’ to be carried out in the data centre itself. This places an unprecedented amount of operational pressure on the data centre’s existing capacity, expanding already sizeable energy requirements.

In general, physical data centre architecture tends to respond to growing demand by scaling up, adding more operational capacity. Scaling up may provide added capacity, but it doesn’t consider the implications for performance and responsiveness; as physical servers scale up in capacity, processing power is often compromised. Smaller enterprises with limited budgets should take note before investing in Big Data tools such as business intelligence platforms, as their existing physical servers are likely to be overwhelmed by the ‘data deluge’.

The great advantage a data centre has as a home for Big Data applications is plentiful and scalable connectivity, both of the LAN and WAN kind. If petabytes of data are being replicated in real time across multiple Hadoop nodes, you are going to need that fibre cross connect. A second advantage is that Big Data analytics is generally a joint endeavour: projects are run in conjunction with partners who manage various parts of the work, be it integration, analysis or re-purposing. It’s handy for those partners to be able to colocate their servers in the same data centre, with only a fibre cross connect between them and the production site. Even if the application stack of the future is unbundled, the data centre is still going to be a popular spot for customers.
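
To get a feel for why that fibre cross connect matters, here is a minimal back-of-the-envelope sketch in Python. The one-petabyte figure and the link speeds are illustrative assumptions, not measurements from any particular deployment.

```python
# Rough sketch: how long does it take to move one petabyte of HDFS
# replication traffic over links of different speeds?
# All figures are illustrative assumptions, not benchmarks.

PETABYTE_BITS = 1e15 * 8  # one petabyte expressed in bits

# Hypothetical link speeds, in bits per second
links = {
    "1 Gbps WAN link": 1e9,
    "10 Gbps fibre cross connect": 10e9,
    "100 Gbps fibre cross connect": 100e9,
}

for name, bits_per_second in links.items():
    hours = PETABYTE_BITS / bits_per_second / 3600
    print(f"{name}: roughly {hours:,.0f} hours to shift 1 PB")
```

Even on a dedicated 10 Gbps link, a single petabyte takes over nine days to move, which is why the heavy replication traffic needs to stay on short, fat connections inside the data centre.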

Moving to the cloud
Cloud hosting is an alternative option. Businesses are increasingly opting to rent virtual compute and storage rather than invest in their own physical servers, cutting their operational costs by outsourcing the responsibility of managing their data to an external provider. This is particularly useful for businesses that may not have the budget flexibility to acquire their own physical space.

IT departments and business owners should also consider that Big Data doesn’t necessarily require a constant level of high capacity in the data centre; projects can be short, perhaps lasting only a month or two.

As such, it’s vital that businesses look to implement a highly scalable solution in which capacity and processing power can be adjusted to match peaks in demand. This approach is likely to be the best option for businesses unsure how to manage the ‘data deluge’ that Big Data presents, particularly those currently running their own server space. Set against these advantages are the problems associated with a shared platform: network constraints, and the fact that you are tied to your cloud provider’s partner ecosystem rather than one of your own choosing.
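
As a purely hypothetical illustration of that point, the short sketch below compares renting a cluster for the life of a two-month project with buying equivalent hardware outright. Every figure in it (cluster size, hourly rate, hardware cost) is an assumption chosen for illustration rather than a quoted price.

```python
# Illustrative sketch of why short-lived Big Data projects favour elastic capacity.
# All prices and durations are hypothetical assumptions, not real quotes.

cluster_nodes = 20            # size of the analytics cluster
hourly_rate_per_node = 0.50   # assumed cloud price per node-hour (USD)
project_months = 2            # the kind of short project described above
hours_per_month = 730

# Renting the cluster only for the life of the project
rented_cost = cluster_nodes * hourly_rate_per_node * project_months * hours_per_month

# Buying equivalent hardware that then sits idle once the project ends
capex_per_node = 5000         # assumed purchase cost per physical node (USD)
purchased_cost = cluster_nodes * capex_per_node

print(f"Rented for the project: ${rented_cost:,.0f}")
print(f"Purchased outright:     ${purchased_cost:,.0f}")
```

On these assumptions the rented cluster costs a fraction of the outright purchase, and the gap widens the shorter the project; the sums are only worth redoing with real quotes, but the shape of the calculation explains the appeal of elastic capacity.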

The future of the data centre
Cloud hasn’t killed the data centre model, but it is having an impact. The colocation industry, for instance, is still growing at a rate of eight per cent annually despite concerns that the cloud would undercut its core business model. Many businesses with the budget and expertise to operate their own facilities will still be able to adapt their existing infrastructure to the growth of Big Data, most likely with a virtualised storage model.

Cloud is not the only technology that has dropped dramatically in price in recent years. The cost of building an in-house storage platform has fallen so rapidly that it is now a viable option for a medium-sized enterprise. Falling hardware costs, storage vendors’ growing embrace of software-based approaches and the standardisation of components have all contributed. Making sure IT infrastructure is ready to handle data requirements not just today but in five years’ time should be a priority for every business owner and IT department.