Uptime Institute has reported on data center outages for several years, surveying operators on their experiences with outages and closely tracking publicly recorded incidents. Even with the inherent difficulty of collecting and assessing this information, clear trends emerge from our research. Surveys from 2018 and 2019, now supported by our 2020 survey, show that outages occur with disturbing frequency, that bigger outages are becoming more damaging and expensive, and that what has been gained in improved processes and engineering has been partially offset by the challenges of maintaining ever more complex systems. Avoiding unplanned downtime remains a top technical and business challenge for all owners and operators.
“Our 2020 survey results reflect a strong, growing sector facing increased change and complexity,” said Andy Lawrence, Executive Director of Research, Uptime Institute. “The growing complexity, along with the greater consequences of failure, creates the need for more vigilance and more sophisticated approaches to resiliency, performance and operations.”
Uptime Institute annually conducts its comprehensive global survey across the data center industry. This year’s survey was conducted March-April 2020 and includes responses from nearly 850 managers at organizations that own and operate data centers in more than 50 countries. This group is the focus of Uptime Institute’s new report. A second survey was conducted among a group of over 500 suppliers, designers, and advisors. Those survey results will be released in September 2020.
Outages Are Bigger, More Frequent, and More Painful to Business; Operators Admit Most Were Preventable
Outages continue to occur with disturbing frequency, and they are becoming bigger, more damaging, and more expensive, a finding supported by Uptime Institute surveys for three years running. In each of these surveys, about one-third of all respondents said they had been affected in the past year by a significant, serious, or severe outage, the kind that can cause substantial financial and reputational damage and affect organizations in a tangible way. Over the previous three-year period, more than three-quarters said they had experienced such an outage.
For 2020, Uptime Institute also delved deeper into the impact of outages, including smaller service outages that are often not officially recorded. (See Uptime Institute’s Outage Severity Rating for more insight into impacts from various outage categories.) Outages of this type are troubling more for their frequency than for their singular impact, and because they may signal bigger problems.
Three-quarters of organizations admit that, upon reflection, most recent significant outages were preventable. With additional attention and investment, the number of outages will most likely decrease. Power problems continue to be the largest single cause of major outages.
Transparent Clouds are Good for Business
Not surprisingly, organizations surveyed are increasingly embracing public cloud; as a venue for IT workloads, public cloud is expected to increase from 8% of all workloads today to 12% within a two-year period. This growth accounts for the biggest increase of usage for an IT venue in the survey. However, public cloud usage is minimal as a percentage of total enterprise IT workloads, and this limited adoption appears to be a strong growth opportunity for public cloud providers.
But there are obstacles. According to respondents, the lack of visibility, transparency, and accountability in public cloud services is clearly a major issue for enterprises considering public cloud for business-critical applications. A fifth of managers said they would be more likely to run their critical workloads in a public cloud if there were a higher level of visibility into the operational resiliency of the service.
Average Site Energy Efficiency Has Flatlined; Rack Densities Are Rising, but Facilities Are Not Stretched
Last year’s survey showed that data centers had become marginally less efficient in the preceding year (average power usage effectiveness, or PUE, of 1.67 in 2019, compared with 1.58 in 2018). In 2020, the average PUE for a data center was 1.59, a slight improvement. (Most operators strive for a PUE ratio as close to 1.0 as possible.) Because more work is now done in big, efficient facilities, the overall energy efficiency of IT has improved.
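For readers less familiar with the metric, the following is a minimal sketch of how a PUE figure is commonly computed and read. It assumes the standard definition (total facility energy divided by IT equipment energy), which the text above does not spell out. The function names are illustrative only. The yearly values are the survey averages quoted above, so applying a per-site formula to them is purely for illustration.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy.

    Assumes the commonly used definition; a value of 1.0 would mean
    every kilowatt-hour drawn by the site reaches the IT equipment.
    """
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value: float) -> float:
    """Fraction of total facility energy that does not reach IT equipment
    (cooling, power distribution losses, lighting, and so on)."""
    if pue_value < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 - 1.0 / pue_value


# Average PUE values as quoted in the survey text above.
SURVEY_AVERAGE_PUE = {2018: 1.58, 2019: 1.67, 2020: 1.59}

if __name__ == "__main__":
    for year, value in sorted(SURVEY_AVERAGE_PUE.items()):
        share = overhead_share(value)
        print(f"{year}: PUE {value:.2f}, roughly {share:.0%} of facility energy is overhead")
```

Read this way, the move from 1.67 to 1.59 means a somewhat smaller share of site energy goes to overhead rather than to IT equipment, which is why values closer to 1.0 are the goal.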
The mean rack density in 2020 was 8.4 kilowatts per rack. Densities are rising, but not enough to drive wholesale site-level changes in power distribution or cooling.
· The enterprise data center is neither dead nor dying, while the Edge is still on the edge. The migration of critical loads to a public cloud is happening slowly, with more than half of all IT workloads expected to remain in traditional on-premises data centers through at least 2022. Edge computing requirements are expected to increase slightly in 2020, but fewer than a fifth of all respondents expect a significant increase.
· Artificial intelligence will not take over – yet. Most respondents do not expect artificial intelligence and automation to reduce data center operations staffing requirements in the next five years. After that, however, most think they will.
· The data center staffing crisis is getting worse. The number of managers stating they are having difficulty finding qualified candidates for open infrastructure positions is rising steadily. Women continue to be under-represented. More effort is needed to address the workforce gender imbalance and take advantage of a larger, more diverse pool of skilled talent.