When a virtualized organization runs into an I/O ceiling whereby the application is simply demanding more performance than the backend storage infrastructure can deliver, the common remedy seems to be, “just add flash.” And if that isn’t enough, the prescription is “just add more flash.”
However, what many virtualized organizations may not understand is that flash isn’t a cure. It is a treatment, and a very expensive hardware treatment at that, one that masks the symptoms of an inherent software problem without addressing its cause. When organizations focus on the root of their I/O problems, they can cure those inefficiencies and ensure they are getting the most from the flash or spindles they already have in place before unnecessarily purchasing more.
Hardware can process I/O, but it can’t optimize I/O. The first place to start when examining performance in a virtual environment is the I/O stream itself. Is the I/O small, fractured, and random, or is it large and sequential? If it is small, fractured, and random, optimizing the I/O stream to behave more like large, sequential reads and writes will do more for VM performance than any other optimization approach.
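As a rough illustration of that first step, the sketch below (a simplified, hypothetical example, not a production tool) characterizes a trace of (offset, size) I/O operations by average size and sequentiality. The 64K threshold and the back-to-back offset test are arbitrary assumptions chosen only to make the distinction concrete.

# Illustrative sketch: classify an I/O trace as small/random vs. large/sequential.
# The trace format, 64 KB threshold, and sequentiality test are assumptions.

def characterize(trace):
    """trace: list of (offset_bytes, size_bytes) tuples in arrival order."""
    if len(trace) < 2:
        return "not enough samples"
    avg_size = sum(size for _, size in trace) / len(trace)
    # Count operations that start exactly where the previous one ended.
    sequential = sum(
        1 for (prev_off, prev_size), (off, _) in zip(trace, trace[1:])
        if off == prev_off + prev_size
    )
    seq_ratio = sequential / (len(trace) - 1)
    if avg_size < 64 * 1024 and seq_ratio < 0.5:
        return "small, fractured, random: a candidate for I/O optimization"
    return "large, sequential: storage is already being used efficiently"

# Example: a 32K file read as eight scattered 4K pieces.
fractured = [(4096 * i * 7, 4096) for i in range(8)]
print(characterize(fractured))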
Most virtualized organizations suffer death by a thousand cuts: a surplus of small, random, fractured I/O that steals bandwidth, makes them more dependent on IOPS than they need to be, and drives overspending on flash to compensate.
The problem of I/O traffic that is smaller, more fractured, and more random than it needs to be has its roots in two places: where I/O is created, on the virtual machine, and where I/O is mixed and randomized, at the hypervisor.
At the virtual machine layer, the Windows operating system is unaware of file sizes, which leads to inefficient free space allocation at the logical disk layer. As files are written, rewritten, extended, and erased, the relationship between I/O and data breaks down significantly over time. Whereas a 32K file might have been written and read with a single I/O operation on Day 1, when systems were brand new out of the box, over time Windows fractures files into smaller and smaller pieces, each piece requiring its own I/O operation. Instead of needing one I/O to process 32K, the system may need eight I/O in 4K chunks. It is very common to see single files across virtualized organizations that have been broken into hundreds or even thousands of pieces at the logical disk layer, requiring hundreds or thousands of unnecessary I/O operations.

This IOPS inflation means systems have to work much harder than necessary to process any given workload. Organizations lose up to half their throughput from server to storage and are led to believe the answer is more IOPS, when in fact the answer is to increase I/O density with more contiguous writes so that more data is carried with every write and subsequent read. It’s akin to choosing how to move a gallon of water across a room: with 300 Dixie cups or with a gallon jug.
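The arithmetic behind that IOPS inflation is easy to check. The sketch below reuses the 32K file and 4K fragments from the example above; the 10,000-IOPS budget is a hypothetical figure used only to show how much effective throughput is lost when every I/O carries less data.

# The 32K file and 4K fragment size come from the paragraph above; the IOPS
# budget is a hypothetical assumption for illustration.

file_size_kb = 32
fragment_kb = 4

ios_contiguous = 1                            # one contiguous 32K write or read
ios_fractured = file_size_kb // fragment_kb   # eight 4K operations for the same data

iops_budget = 10_000  # hypothetical IOPS the backend can sustain

throughput_contiguous_mb = iops_budget * file_size_kb / 1024
throughput_fractured_mb = iops_budget * fragment_kb / 1024

print(f"I/O per file: {ios_contiguous} vs {ios_fractured}")
print(f"MB/s at {iops_budget} IOPS: {throughput_contiguous_mb:.0f} vs {throughput_fractured_mb:.0f}")

At the same IOPS budget, the fractured profile moves roughly one eighth of the data per second, which is why increasing I/O density recovers throughput without adding hardware.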
What’s the one factor that hampers application performance the most in virtual environments? Without a doubt, it’s the problem of increasingly small, fractured, and random I/O. Systems are unnecessarily taxed when processing I/O with these characteristics, because bandwidth is squandered all the way from virtual machine (VM) to storage. When virtualized organizations end up dealing with I/O that is smaller, more fractured, and more random than it needs to be, they will find themselves battling dampened performance as their most I/O-intensive applications suffer.
Many IT administrators respond to such application performance issues by simply throwing more expensive server and storage hardware at them—more spindles or flash—without understanding what’s causing the real problem. But simply masking the problem with hardware wastes resources, because I/O inefficiencies at the Windows OS and hypervisor layers continue to rob systems of optimal performance.
What’s the solution? I/O optimization software presents a compelling argument for server-side DRAM caching by targeting small, random I/O—the culprit that penalizes performance the most. But before we learn more about the solution, let’s first dig deeper into what’s behind the problem.
I/O Inefficiencies
In virtual environments, there are two primary I/O inefficiencies that force systems to work much harder than they need to, which leads to slower system performance whether you use a solid-state drive (SSD) or hard disk drive (HDD):
· Small, fractured I/O. The first I/O tax generated by the Windows OS is small, fractured I/O, which results from inefficient free space allocation. Windows OS does not come equipped with intelligence about file size; instead, it scans for the next available allocation at the logical disk layer when writing files. In effect, since the system isn’t capable of selecting the optimum allocation available, it fails to properly manage free space. The result? One file becomes fragmented into many pieces at multiple logical disk addresses. This hurts performance because each piece of the file requires its own dedicated I/O operation to process as either a read or a write. For example, instead of being able to process a single 32K file with one I/O operation, the system may break the file down into eight 4K chunks. When this happens, you end up with too much I/O overhead—eight fractured I/O to process a file that really should have been processed with a single I/O. The added burden affects not only the server but also—more crucially—the backend storage device.
· The “I/O blender” effect. It’s bad enough for systems to be taxed with the overhead of small, fractured I/O, but it’s even worse when all those I/O streams become blended and essentially randomized. This is the second I/O tax in a virtual environment, known as the “I/O blender” effect. This storage performance problem occurs when disparate VMs on a single host send otherwise sequential I/O traffic down to the hypervisor. Those I/O streams become “blended,” and a completely random I/O pattern is sent out to storage. Flash may perform well with random reads but not with random writes, which force additional cycles of reading, erasing, and rewriting blocks simply to write data to flash memory. The sketch following this list shows how the blending happens.
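To make the “I/O blender” effect concrete, the sketch below interleaves the I/O streams of three hypothetical VMs, each writing sequentially to its own region, the way a shared hypervisor queue might. The VM count, block size, and round-robin scheduling are assumptions for illustration only.

from itertools import zip_longest

BLOCK = 4096
# Three hypothetical VMs, each issuing perfectly sequential writes to its own region.
vm_streams = {
    f"vm{v}": [(v * 1_000_000 + i * BLOCK, BLOCK) for i in range(4)]
    for v in range(3)
}

# Interleave the per-VM streams the way a shared hypervisor queue might.
blended = [
    io for batch in zip_longest(*vm_streams.values())
    for io in batch if io is not None
]

for offset, size in blended:
    print(f"offset={offset:>8}  size={size}")
# Consecutive offsets now jump by roughly 1 MB instead of 4 KB: sequential
# per-VM traffic has become effectively random by the time it reaches storage.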
A Software Solution
Many organizations mask these I/O inefficiencies rather than actually solving the issues behind them, overspending on expensive hardware based on the false assumption that the answer is more flash or spindles. The truth is that, besides the fact that many IT budgets can’t afford that kind of brute-force approach, relying on hardware only masks the problem while squandering much of an organization’s new investment in flash. When administrators try I/O reduction software first, they can significantly boost storage performance and avoid expensive hardware upgrades.
I/O optimization software exists today that can more effectively solve the application performance issues for virtualized environments. Such transparent, “set-and-forget” software operates with almost no overhead, using only idle, available resources. This software has been built from the ground up to help solve the toughest application performance challenges by better utilizing server-side DRAM for caching—without requiring any new hardware. It works by optimizing the I/O profile for greater throughput while also targeting the smallest, random I/O to be cached from available DRAM. This not only reduces latency but also clears the infrastructure of the kind of I/O that diminishes performance.
Leveraging the DRAM that’s already available in an organization is very different from leveraging a dedicated flash resource for cache, whether that resource is PCIe or SSD. DRAM is limited in capacity, but it’s much faster than a PCIe or SSD cache positioned below it. For this reason, it’s ideal as the first caching tier in the infrastructure.
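A minimal sketch of that tiering, assuming a simple dictionary-backed cache at each level rather than any vendor’s actual implementation, shows why DRAM sits on top: it is consulted first, and only a miss falls through to the flash tier and, ultimately, to backend storage.

# Illustrative read path for the tiering described above; an assumed structure,
# not any vendor's implementation.

def read_block(block_id, dram_cache, flash_cache, backend):
    if block_id in dram_cache:           # fastest, smallest tier
        return dram_cache[block_id]
    if block_id in flash_cache:          # larger but slower cache tier
        data = flash_cache[block_id]
        dram_cache[block_id] = data      # promote the hot block into DRAM
        return data
    data = backend[block_id]             # slowest path: the backend array
    flash_cache[block_id] = data
    return data

dram, flash, backend = {}, {}, {7: b"hot block"}
print(read_block(7, dram, flash, backend))  # served from backend, cached in flash
print(read_block(7, dram, flash, backend))  # served from flash, promoted to DRAM
print(read_block(7, dram, flash, backend))  # now served straight from DRAM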
How can administrators best leverage a capacity-limited but extremely fast storage medium? Commodity caching algorithms that consider only characteristics such as access frequency don’t make the best use of DRAM. I/O optimization software instead determines the best use of DRAM for caching by collecting data across a wide range of data points: storage access frequency, I/O priority, process priority, type of I/O, nature of I/O (sequential or random), and time between I/Os. The software then leverages its analytics engine to identify which storage blocks will benefit the most from caching, reducing “cache churn,” the repeated recycling of cache blocks.
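To illustrate the idea of weighing several data points rather than access frequency alone, the sketch below scores caching candidates using the kinds of signals just listed. The BlockStats fields, weights, and formula are invented for illustration; they are not the software’s actual analytics engine.

from dataclasses import dataclass

@dataclass
class BlockStats:
    accesses: int          # how often the block has been read
    avg_io_bytes: float    # average size of the I/O touching it
    random_ratio: float    # 0.0 = always sequential, 1.0 = always random
    avg_gap_s: float       # average time between accesses, in seconds

def cache_score(s: BlockStats) -> float:
    # Favor frequently re-read, small, random blocks that recur quickly,
    # the kind of I/O described above as penalizing storage the most.
    smallness = 1.0 / max(s.avg_io_bytes / 4096, 1.0)
    recurrence = 1.0 / max(s.avg_gap_s, 0.001)
    return s.accesses * (1.0 + s.random_ratio) * smallness * recurrence

# Keep only the highest-scoring blocks in DRAM to limit cache churn.
candidates = {
    "blk-a": BlockStats(accesses=500, avg_io_bytes=4096, random_ratio=0.9, avg_gap_s=0.2),
    "blk-b": BlockStats(accesses=40, avg_io_bytes=262144, random_ratio=0.1, avg_gap_s=5.0),
}
ranked = sorted(candidates, key=lambda k: cache_score(candidates[k]), reverse=True)
print(ranked)  # blk-a ranks first: small, random, and frequently re-read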
Part of the magic of I/O reduction software is that it runs on dual engines: one engine increases I/O density and sequentializes writes, while a DRAM read caching engine focuses on “caching effectiveness” rather than “cache hits.” These two very different engines, one optimizing writes and the other reads, allow the software to address the I/O taxes in a virtual environment with an effective one-two punch:
· Write I/O optimization engine. Automated write I/O optimization technology prevents small, fractured I/O and sequentializes I/O streams by understanding when the Windows OS is about to fracture a file into multiple pieces. The technology provides the OS with file size intelligence to help it choose the best available allocation at the logical disk layer, instead of the next available allocation that would likely result in multiple, fractured I/O to process the file as a write or subsequent read (see the allocation sketch after this list). With file size intelligence, the OS becomes capable of making much smarter decisions when writing files, allowing files to be written (and read) in a clean, contiguous, sequential state. This not only increases I/O density, but it also prevents I/O fracturing so that systems can reclaim degraded throughput and process more data in a shorter time frame. With fewer I/O being mixed and randomized at the hypervisor for every gigabyte of data, this engine also helps combat the “I/O blender” effect.
· Read I/O optimization engine. On the read side, a server-side DRAM read caching engine leverages available DRAM to target the I/O that penalizes storage performance the most—small, random I/O. This behavioral analytics engine makes the best use of DRAM for caching by collecting usage data and I/O characteristics across a wide range of data points. By servicing I/O at the top of the technology stack from the fastest storage media possible, organizations reduce latency and further reduce the amount of I/O reaching storage, complementing the I/O reduction of the write I/O optimization engine. As a dynamic cache, the read I/O optimization engine throttles its use of DRAM based on the application’s needs, avoiding resource contention and memory starvation.
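The allocation sketch referenced in the first bullet above contrasts a “next available” policy with a size-aware “best fit” policy against a hypothetical free-space map. The map and the 32K file are invented, and real NTFS allocation is far more involved, but the sketch shows why file size intelligence yields one contiguous extent instead of several fragments.

# Hypothetical free-space map; units are KB. Real Windows allocation is far
# more involved; this only contrasts the two policies described above.

def next_available(free_extents, need):
    """Grab free extents in address order until the request is satisfied."""
    placed, remaining = [], need
    for start, length in free_extents:
        if remaining <= 0:
            break
        take = min(length, remaining)
        placed.append((start, take))
        remaining -= take
    return placed

def best_fit(free_extents, need):
    """Prefer a single extent large enough to hold the whole file."""
    fits = [e for e in free_extents if e[1] >= need]
    if fits:
        start, _ = min(fits, key=lambda e: e[1])
        return [(start, need)]
    return next_available(free_extents, need)  # fall back if nothing fits

free_space = [(0, 8), (100, 4), (200, 12), (300, 64)]  # (start, length) in KB
print(next_available(free_space, 32))  # four pieces -> four I/O to write the file
print(best_fit(free_space, 32))        # one contiguous piece -> a single I/O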
By prioritizing the smallest, random I/O to be served from DRAM, I/O optimization software keeps the most performance-robbing I/O from ever reaching the infrastructure. Not only does this software approach protect IT’s investment in the existing hardware infrastructure, but it also solves performance bottlenecks without disruption and ensures that organizations reap the most benefit from any future storage investment, whether SSD or HDD. Administrators no longer need to worry about carving out precious DRAM for caching, as the software dynamically leverages available DRAM. Depending on the I/O profile, I/O optimization software commonly yields performance gains from 50 percent to well over 600 percent on existing systems with only 4GB of RAM per VM. While that level of improvement is typical, some organizations may opt to add memory for caching if they are looking for the fastest performance possible.