However, while these tools may be sophisticated, it’s not always necessary, or even desirable, to set up an entirely automated closed-loop control environment. In Kohler Uninterruptible Power’s experience with UPS installations, for example, an element of manual intervention can be desirable, if not essential.
In this article, Alex Emms, Operations Director at Kohler Uninterruptible Power, discusses the role of data centre systems management and explains why one-way electronic communication is sometimes the best option.
What do we need to monitor?
To optimise a remote monitoring and control management strategy, we first need to identify the tactical and strategic benefits that we want. Examples of tactical information include reported battery temperature, voltage and resistance values; any excessive levels warn users of impending fault conditions, allowing corrective action to be taken. More strategic information would include reports on power consumption, where power is being consumed, and variations in processing load. This informs longer-term planning for the allocation and distribution of power within the data centre, and can provide opportunities for reducing wasted capacity.
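To illustrate the threshold-checking principle behind this kind of tactical monitoring, the sketch below is a minimal Python example. The alarm limits and the sample reading are hypothetical, chosen only for illustration; real limits would come from the battery manufacturer’s data, and the readings would come from the UPS’s own monitoring interface.

```python
# Minimal sketch of threshold-based battery monitoring.
# The limits and the sample reading are illustrative assumptions,
# not values or interfaces from any specific UPS product.

from dataclasses import dataclass


@dataclass
class BatteryReading:
    temperature_c: float     # block temperature in degrees Celsius
    voltage_v: float         # block float voltage in volts
    resistance_mohm: float   # internal resistance in milliohms


# Hypothetical alarm bands: (low, high) acceptable range per parameter.
LIMITS = {
    "temperature_c": (15.0, 30.0),
    "voltage_v": (12.3, 13.8),
    "resistance_mohm": (0.0, 6.0),   # rising resistance suggests battery ageing
}


def check_reading(reading: BatteryReading) -> list[str]:
    """Return a warning for every value that falls outside its alarm band."""
    warnings = []
    for field, (low, high) in LIMITS.items():
        value = getattr(reading, field)
        if not (low <= value <= high):
            warnings.append(f"{field} out of range: {value} (expected {low}-{high})")
    return warnings


if __name__ == "__main__":
    # Example reading with an elevated internal resistance.
    sample = BatteryReading(temperature_c=24.0, voltage_v=13.5, resistance_mohm=7.2)
    for warning in check_reading(sample):
        print("WARNING:", warning)
```

In practice the same pattern extends to strategic metrics such as power consumption, where trends rather than single out-of-range readings drive the response.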
Monitoring of environmental variables, particularly temperature and humidity, is essential to ensure that cooling strategies remain effective. It’s also important to monitor less obviously process-related factors, such as security and access, to protect the well-being of the UPS and other data centre equipment.
Why the data centre environment is so challenging
While reasons such as these make a management strategy desirable, achieving it isn’t necessarily straightforward; operators don’t always know what equipment is currently deployed, or what its status is. The data centre’s equipment population is usually the result of ad hoc growth to meet developing demand, with older equipment being exchanged for newer, more powerful upgrades. Poorly managed additions and normal employee turnover can erode knowledge of the equipment that’s installed; there can be ‘zombie servers’ that consume power and space while contributing little or nothing to the data centre’s productivity.
With the advent of virtualisation, processing loads as well as hardware become highly variable. They can rapidly change and switch location within the data centre as virtual machines are deployed to meet variations in user demand.
The role of DCIM
While these issues create barriers to understanding the status of a data centre and its equipment, overcoming them is essential. The consequences of an IT failure can be extremely serious, with interruptions to service, possible hardware damage, loss of data and impact on reputation. Apart from the risk of failures, a lack of visibility denies users the opportunity to improve power efficiency – an increasingly critical requirement for both commercial and social responsibility reasons.
Fortunately, this situation, though challenging, is widely recognised, and solutions, in the form of data centre infrastructure management (DCIM) systems, exist. These provide access to accurate, actionable data about a data centre’s current state and future needs; critically, they can also exchange information with building management systems to provide a more comprehensive, higher-level overview of data centre status.
Operational sustainability and its impact on data centre availability
The Uptime Institute (UI) is best known for its development of the Tier Classification system, which allows stakeholders to efficiently and accurately align their data centre’s availability level with its business requirements. However, the UI recognises that the long-term availability of a data centre infrastructure is not guaranteed by Tier level alone; it also depends on the facility’s operational sustainability. Like DCIM as described above, the UI’s operational sustainability concept extends to the data centre building as well as its ICT equipment, and defines the behaviours and risks, beyond Tier classification, that affect data centre uptime.
According to the UI, the three elements of operational sustainability, in order of decreasing impact on operations, are Management & Operations, Building Characteristics, and Site Location. Its Abnormal Incident Reports (AIR) database reveals that the leading causes of reported data centre outages are directly attributable to shortfalls in management, staff activities and operations procedures.
This poses the question: how can you set up a remote management system to eliminate or minimise these problems of human behaviour? As we shall see, in Kohler Uninterruptible Power’s experience, the answer doesn’t necessarily lie in higher levels of automation.
Impact on UPS remote monitoring and management strategies
Monitoring and control of UPSs must be part of any DCIM strategy. This increasingly involves an element of remote communications, as many organisations’ data infrastructure now includes ‘edge’ micro data centres, so-called because they are located out at the edge of an enterprise, geographically distant from any operations centre.
These locations may be chosen to place the data centres close to the point at which data is being generated, avoiding the need to send large volumes of raw data over long distances. However, sustainability and green energy can also be factors. The WindCORES project, for example, has deployed data centres inside wind turbine structures; these data centres draw over 90 percent of their power from wind. With plenty of space inside many wind turbine towers for IT and infrastructure equipment, this sets a path for low-emissions distributed data centres of the future.
Irrespective of the reason, the result is a proliferation of widely distributed small or micro data centres, many of which are unmanned. A remote monitoring and control management system, which improves data centre reliability and efficiency through a comprehensive DCIM as described, may appear to be an ideal solution in such circumstances. This also applies where a remote support resource is being used to monitor a larger data centre hub.
However, in Kohler Uninterruptible Power’s experience, this shouldn’t necessarily include automating the control aspect. Firstly, some organisations mistrust two-way communications systems and regard them as a security risk. There has been more than one instance of communications equipment manufacturers being blacklisted over concerns about data misuse and associated security risks.
Even without this concern, KUP’s experience has shown that if remote access and control of a UPS is too widely available, damage can be inflicted either through carelessness or malign intent. Better security can be achieved by using one-way communications solutions. The UPSs should remain closely monitored, but the reaction to a fault should be a phone call or email alerting an authorised technician located near the UPS. This makes it easier to limit access to the appropriate personnel.
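The outbound-only pattern behind this approach can be very simple. The sketch below, written in Python using the standard library’s smtplib, shows the idea: when monitoring flags a fault, an email is sent to a nominated technician, and no inbound control channel is exposed. The SMTP host, addresses and fault details are placeholder assumptions, not part of any particular product.

```python
# Minimal sketch of one-way (outbound-only) fault alerting by email.
# The SMTP relay, addresses and fault payload are placeholder assumptions.

import smtplib
from email.message import EmailMessage

SMTP_HOST = "mail.example.com"               # placeholder relay on the monitoring network
ALERT_FROM = "ups-monitor@example.com"       # placeholder sender address
ALERT_TO = "duty-technician@example.com"     # placeholder authorised technician


def send_fault_alert(ups_id: str, fault_description: str) -> None:
    """Send a one-way email alert; no remote control commands are accepted."""
    msg = EmailMessage()
    msg["Subject"] = f"UPS fault reported: {ups_id}"
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content(
        f"UPS {ups_id} has reported a fault:\n\n{fault_description}\n\n"
        "Please attend site and investigate within the agreed response time."
    )
    with smtplib.SMTP(SMTP_HOST) as server:
        server.send_message(msg)


if __name__ == "__main__":
    # Example: monitoring has detected an over-temperature condition.
    send_fault_alert("UPS-EDGE-07", "Battery temperature above alarm threshold")
```

Because the monitoring host only ever initiates outbound messages, there is no inbound control interface for an attacker or careless operator to misuse.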
Nevertheless, once such a warning has been flagged, an appropriate response is essential. Technicians need to arrive on site, even at remote locations, within an agreed timeframe, equipped with the training, documentation, equipment and parts needed to effect any repairs.
This means that, when evaluating potential UPS vendors, it’s essential not to look only at the system’s functionality, performance and reliability. A review of their service team is equally important. Does it have the geographical coverage needed, and can it meet the criteria given above?
Although not strictly part of a remote monitoring or control strategy, provision of an effective scheduled maintenance strategy is also essential. Keeping the batteries and other UPS components in top condition extends UPS life. It also reduces dependence on any remote control strategy, however it’s implemented, along with exposure to failure.
Conclusion
In this article, we have seen how gaining accurate visibility and control of your data centre equipment can be challenging. This is because larger, hub data centres may have large quantities of equipment that’s poorly understood for one reason or another, while smaller, edge facilities are typically remotely located and unstaffed. Nevertheless, achieving visibility and control is essential, to spot latent problems, prevent disastrous failures, and optimise energy efficiency.
A popular answer is to deploy data centre infrastructure management (DCIM) systems, as these are designed to deliver the information that’s needed, in time to allow effective responses. By linking to building management systems (BMSs), users can obtain a holistic view of how the entire data centre is performing, to allow better-informed responses and strategic planning.
Kohler Uninterruptible Power recognises the critical requirement for a level of monitoring and control, whether it’s a full DCIM system or something simpler. However, based on their experience, they sound one note of caution: running a full monitoring and control system with two-way communications can pose a security risk. It may be better to deliberately build manual intervention into the control loop to mitigate this issue.