Given the rapid and spectacular advance of most data centre technologies, the way that we deal with data itself is something of an anachronism. Just look at the great strides that the industry has made in response to the problems of the day. When data volumes began to skyrocket and underutilised servers couldn’t cope, along came virtualisation. The far more complex networks required by virtualised environments led in their turn to Ethernet fabrics. Rising energy costs and environmental concerns gave rise to hot aisle / cold aisle configurations and Power Usage Effectiveness (PUE) metrics. It seems that no problem in the data centre has been too complicated for the industry to solve – except those caused by data itself.
Data centres are still managing and storing information with essentially the same antediluvian technology as they used before the data deluge – the humble database. With the exponential growth in data volumes, these huge, creaking repositories of information have become the weak link in the data centre ecosystem. Poor data management creates bottlenecks that slow down projects, drive up costs, and drag down quality. Development teams in large firms often wait weeks to receive a new development or test environment, and wait even longer to refresh the data in those environments. As schedules slip, test cycles are cut, bugs slip into production, and everyone loses.
Getting the right project data to the right people at the right time requires data centre administrators to get to grips with the data itself. The following five tips will help any administrator to start unblocking the data bottlenecks, and to begin solving problems whose repercussions reach far beyond the data centre itself.
Top tips:
1) Block-level caching and sharing
Databases retrieve data blocks from storage, assemble the blocks, and return query results. However, storage is comparatively slow, and even tier-one storage often sits on spinning disks. Each call to the array burns time, and across millions of calls that latency adds up to noticeably slower database performance. Intelligent caching and block-sharing technology can eliminate calls to the storage array. Recent benchmarking that Delphix and IBM completed showed that intelligent caching can eliminate over 50 per cent of the calls to the storage array. In fact, transactions per minute for data warehousing workloads were five times faster with block-level caching and sharing than with standard physical databases and storage.
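To make the caching idea concrete, here is a minimal sketch in Python of a shared block cache: identical blocks are stored once, keyed by a content hash, so reads that hit the cache never touch the array. It is an illustration of the technique, not any vendor's implementation, and the read_from_array callback (assumed to return the raw bytes of a block) is a placeholder for the slow path to storage.

```python
import hashlib
from collections import OrderedDict


class SharedBlockCache:
    """Caches database blocks in memory and shares identical blocks across
    copies, so repeated reads never have to go back to the storage array."""

    def __init__(self, capacity_blocks, read_from_array):
        self.capacity = capacity_blocks
        self.read_from_array = read_from_array  # fallback to the (slow) array
        self.index = OrderedDict()              # (copy_id, block_no) -> content hash
        self.blocks = {}                        # content hash -> block bytes
        self.refcount = {}                      # content hash -> entries sharing it

    def read(self, copy_id, block_no):
        key = (copy_id, block_no)
        if key in self.index:                   # cache hit: no array call at all
            self.index.move_to_end(key)
            return self.blocks[self.index[key]]

        data = self.read_from_array(copy_id, block_no)  # cache miss: one array call
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:           # identical content is stored only once
            self.blocks[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.index[key] = digest

        while len(self.index) > self.capacity:  # evict the least recently used entry
            _, old_digest = self.index.popitem(last=False)
            self.refcount[old_digest] -= 1
            if self.refcount[old_digest] == 0:  # drop content only when nothing shares it
                del self.blocks[old_digest]
                del self.refcount[old_digest]
        return data
```

Because virtual copies of the same database share most of their content, a block pulled in for one copy is very often exactly the block another copy needs next, which is where the reduction in array calls comes from.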
2) Virtualised backup and restore
Data volumes have grown to the point where they overload batch windows, and many organisations can only afford (in terms of time) to back up key databases once per week, due to the hit on networks, servers, and storage. When an outage does occur, recovery of these large data sets can take days, halting key data centre operations in the meantime. Virtualising backup and restore at the block level can eliminate the delay and the hit to systems during backup. For some customers, the backup window has been reduced from eight hours to eight minutes by avoiding full or incremental backups and using block virtualisation instead.
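The reason a block-level approach collapses the backup window is that a "backup" becomes a record of which block versions were live at a point in time, rather than a copy of the blocks themselves. The Python sketch below is a deliberately simplified illustration of that copy-on-write idea; the class and method names are invented for the example.

```python
import itertools


class CopyOnWriteVolume:
    """Toy copy-on-write volume: a 'backup' records which block versions
    were live at a point in time, instead of copying every block."""

    def __init__(self):
        self._clock = itertools.count()  # logical timestamps for versions
        self.blocks = {}                 # block_no -> list of (tick, data) versions
        self.snapshots = {}              # snapshot name -> tick it captures

    def write(self, block_no, data):
        # Writes append a new version; older versions stay untouched, so
        # existing snapshots remain valid without copying a single block.
        self.blocks.setdefault(block_no, []).append((next(self._clock), data))

    def snapshot(self, name):
        # 'Backing up' is just remembering a point in time: near-instant,
        # however large the database is.
        self.snapshots[name] = next(self._clock)

    def restore(self, name):
        # Restore materialises the newest version of each block that
        # existed at the snapshot point.
        cutoff = self.snapshots[name]
        image = {}
        for block_no, versions in self.blocks.items():
            live = [data for tick, data in versions if tick <= cutoff]
            if live:
                image[block_no] = live[-1]
        return image


vol = CopyOnWriteVolume()
vol.write(0, b"ledger v1")
vol.snapshot("pre-close")       # an instant 'backup'
vol.write(0, b"ledger v2")      # production keeps changing afterwards
assert vol.restore("pre-close")[0] == b"ledger v1"
```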
3) A/B index testing
Improvements to the indexes on large databases can yield significant performance gains. However, conventional database design makes it very difficult to A/B test index changes on large databases, so this is rarely done. Jonathan Lewis, a prominent Oracle performance tuning consultant in the UK, has said that index optimisation is a common technique he uses to extract additional performance from critical databases. In practice, creating multiple test copies of a 10TB database, each with a slightly different set of indexes, is too costly and time consuming. Intelligent virtualisation of the data blocks enables admins to create multiple copies cheaply, try different index changes against each, and then roll the winning changes into production, with noticeable gains.
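As a rough illustration of what the A/B workflow looks like once test copies are cheap, the sketch below times the same query against two copies that differ only in a candidate index. It uses an in-memory SQLite database with made-up data purely so the example is self-contained; in practice the two copies would be thin virtual clones of the production database rather than freshly built tables.

```python
import random
import sqlite3
import time


def build_copy(index_sql=None):
    """Stand-in for a test copy of the database: same data, different index."""
    random.seed(42)  # identical data in both copies, so only the index differs
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    rows = [(i, random.choice("NESW"), random.random() * 100) for i in range(200_000)]
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    if index_sql:
        db.execute(index_sql)  # the only difference between copy A and copy B
    return db


def time_query(db, sql, params):
    start = time.perf_counter()
    db.execute(sql, params).fetchall()
    return time.perf_counter() - start


QUERY = "SELECT SUM(amount) FROM orders WHERE region = ?"

copy_a = build_copy()  # A: indexes as they stand today
copy_b = build_copy("CREATE INDEX idx_orders_region ON orders (region)")  # B: candidate

print("A (no index):   %.4f s" % time_query(copy_a, QUERY, ("N",)))
print("B (with index): %.4f s" % time_query(copy_b, QUERY, ("N",)))
```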
4) Offloading reporting
A common performance drag on transaction systems is operational reporting. Perhaps someone in finance wants to run reports near quarter end to check revenue. Over time, more people are given access to the transaction system for reporting until, very commonly, the reporting queries bog down the system and transaction response times suffer. Offloading operational reporting to a synchronised copy frees up CPU and I/O on the production system, and virtualisation can create these reporting systems quickly and easily.
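The routing side of this is straightforward: read-only reporting queries go to the synchronised copy, and everything else stays on production. The Python sketch below shows one simplistic way to express that rule over two DB-API connections; the reporting tag is an invented convention for the example, not a real optimiser hint.

```python
class QueryRouter:
    """Sends tagged, read-only reporting queries to a synchronised copy and
    keeps writes and latency-sensitive reads on the production system."""

    REPORTING_TAG = "/* reporting */"  # invented convention for this example

    def __init__(self, production_conn, reporting_conn):
        self.production = production_conn  # the live transaction system
        self.reporting = reporting_conn    # synchronised (virtual) copy

    def execute(self, sql, params=()):
        target = self.reporting if self._is_reporting(sql) else self.production
        cursor = target.cursor()
        cursor.execute(sql, params)
        return cursor

    def _is_reporting(self, sql):
        # Only tagged SELECTs are offloaded; everything else stays on production.
        text = sql.lstrip().upper()
        return text.startswith("SELECT") and self.REPORTING_TAG.upper() in text


# Usage (with any two DB-API connections):
#   router = QueryRouter(production_conn, reporting_conn)
#   router.execute("SELECT /* reporting */ region, SUM(amount) FROM orders GROUP BY region")
#   router.execute("UPDATE orders SET amount = ? WHERE id = ?", (99.0, 7))  # stays on production
```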
5) Upgrade to SSD storage
Solid state/flash storage provides a clear performance boost over traditional spinning disks and results in snappier application response. Typically, price/performance considerations restrict the use of flash in the data centre, but with virtualisation-driven consolidation of databases it is possible to get a 10x improvement in price/performance for flash, which supports a much broader rollout across the data centre.
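The arithmetic behind that claim is easiest to see with deliberately round, illustrative numbers (none of the figures below are real prices or capacities): ten physical copies of a 2TB database need 20TB of flash, whereas ten virtual copies sharing one block image plus small per-copy deltas might need nearer 3TB.

```python
# Illustrative numbers only - not real prices or capacities.
DB_SIZE_TB = 2.0           # one production database
COPIES = 10                # dev, test, reporting, backup copies and so on
DELTA_FRACTION = 0.05      # assumed unique changes per virtual copy
FLASH_COST_PER_TB = 2000   # placeholder cost per TB of flash

physical_tb = DB_SIZE_TB * COPIES
virtual_tb = DB_SIZE_TB + DB_SIZE_TB * DELTA_FRACTION * COPIES

print(f"Physical copies: {physical_tb:.1f} TB of flash (~${physical_tb * FLASH_COST_PER_TB:,.0f})")
print(f"Virtual copies:  {virtual_tb:.1f} TB of flash (~${virtual_tb * FLASH_COST_PER_TB:,.0f})")
print(f"Consolidation ratio: {physical_tb / virtual_tb:.1f}x")
```

The exact multiple depends on how far each copy diverges from production, but the shape of the saving is the same: consolidation shrinks the amount of flash that has to be bought for a given set of environments.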
This advice, if followed, will certainly improve overall data performance, but it does not solve the underlying problem of unnecessary data being created in the first place. This is where virtualisation, the technology that enabled us to store these vast volumes of data, is being repurposed to address the data deluge itself.
One of the biggest contributors to ‘information proliferation’ is the way that organisations are forced to create multiple copies of big databases. For every live database, there is an extended family of anywhere between two and 30 copies, maintained by employees in different departments for any number of purposes, such as testing and development, back-up, reporting, business continuity and so forth.
Today we’re beginning to see the emergence of technologies that harness the power of virtualisation and apply it to databases themselves. These new techniques go beyond simply running a database in a virtualised environment; instead, they virtualise the database itself. Instead of employees creating multiple copies of the same database - each with its own additions and amendments, and varying degrees of data ‘freshness’ and accuracy - the technology keeps a single copy of each database and presents each user with a virtual instance whenever one is needed.
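Conceptually, a virtual instance is little more than a private view over a shared, read-only block image plus a small set of blocks that the instance itself has changed. The Python sketch below illustrates that shared-image-plus-delta idea; it is a conceptual toy, not a description of any particular product's internals.

```python
class VirtualDatabase:
    """A virtual instance: reads fall through to the shared block image unless
    this instance has written its own version of the block (copy-on-write)."""

    def __init__(self, shared_image):
        self.shared = shared_image  # the one golden copy, shared read-only
        self.delta = {}             # this instance's private changes only

    def read(self, block_no):
        return self.delta.get(block_no, self.shared.get(block_no))

    def write(self, block_no, data):
        self.delta[block_no] = data  # the shared image is never touched


# One set of blocks serves every user; each gets a private virtual instance.
golden = {0: b"customers", 1: b"orders", 2: b"invoices"}
test_env = VirtualDatabase(golden)
report_env = VirtualDatabase(golden)

test_env.write(1, b"orders (masked for testing)")
assert report_env.read(1) == b"orders"    # other instances are unaffected
assert test_env.read(0) == b"customers"   # unchanged blocks come from the shared copy
```

Freshness then becomes a property of the one shared image rather than of each employee's private copy, which is what keeps the proliferation in check.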
If the tips above are the remedy for an overindulgence in data, then database virtualisation is one of the most powerful preventative measures. It’s crucial that data centre administrators take a holistic approach to their information – and this means both prevention and cure if they are to tame the problem of rampant data and deliver faster processes throughout their organisation.