With the emergence of cloud computing, networks and the services they provide are increasingly virtualized, transforming the way IT is delivered and consumed. Gartner projects that by 2016 the bulk of new IT spend will be on cloud computing platforms, with almost half of large enterprises having some form of cloud deployment by the end of 2018.[1] This transformation presents an opportunity for network operations and management teams to evolve from traditional manual provisioning of network devices and services to agile orchestration and an operations-centric environment, integrated with a digital performance strategy that supports their journey to the cloud.
Acquiring a unified view of network, server and application performance has always been difficult, but working across both physical and virtual infrastructures while supporting a wide range of applications, services and device types means that gaining accurate, timely performance data in a single view has never been more challenging.
However, as the challenge grows, so too does the necessity of having appropriate systems in place to ensure stability. A host of high-profile outages have been reported in the media in recent months, causing major disruption across industries: the well-publicized grounding of United Airlines flights, financial services shutdowns affecting high street banks and even the New York Stock Exchange, and multiple instances of telecoms networks failing under crippling loads. As cloud adoption grows, so too does the potential for downtime, which can have a catastrophic impact on business efficiency and customer loyalty.
IT departments can get ahead of these inevitable challenges by acknowledging them, planning for the issues they will face on the journey to virtualized infrastructures and ultimately using that knowledge as the foundation for a robust performance visibility strategy.
So what do these challenges look like, and how can they be managed so that teams are not left fighting outages with tools that are ill-equipped to handle a combined physical and virtual infrastructure?
Managing the complexity of multiple performance-management tools
Traditional tools employed for managing performance have typically been point solutions supporting a specific area – i.e. applications, network infrastructure, server infrastructure and virtualization. This means that many organizations today have four or five different tools employed as discrete components. Because cloud services are delivered as one offering, the ability to collect data from different sources, overlay that data in one interface and then view performance metrics as a whole becomes extremely important for identifying and addressing service-impacting issues. Further, many of the new virtualization tools lack the standard operations center functionality required to proactively assure application performance and reliability.
A single performance management platform is the best-fit solution if operators are to avoid the cost and complexity of having to deploy and configure multiple systems and then aggregate performance data back to a single point to achieve end-to-end visibility.
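To make the overlay step concrete, here is a minimal sketch that merges records from several point tools into one view keyed by device and timestamp. The record layout, field names and values are hypothetical, not any particular vendor's schema.

```python
from collections import defaultdict

# Illustrative records as they might arrive from separate point tools.
# The field names (device, metric, value, ts) are hypothetical.
app_tool = [{"device": "web-01", "metric": "response_ms", "value": 240, "ts": 1700000000}]
net_tool = [{"device": "web-01", "metric": "if_util_pct", "value": 91, "ts": 1700000000}]
virt_tool = [{"device": "web-01", "metric": "cpu_ready_pct", "value": 12, "ts": 1700000000}]

def unify(*sources):
    """Overlay records from several tools into one view keyed by device and timestamp."""
    view = defaultdict(dict)
    for source in sources:
        for rec in source:
            view[(rec["device"], rec["ts"])][rec["metric"]] = rec["value"]
    return view

for (device, ts), metrics in unify(app_tool, net_tool, virt_tool).items():
    print(device, ts, metrics)
```

The point of the exercise is that a single platform does this correlation natively, rather than leaving operators to stitch together exports from four or five consoles.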
Securing total performance visibility
Cloud computing provides companies with access to resources over a variety of devices and locations. This presents a challenge as it requires a performance management platform that can integrate a variety of data types across a range of infrastructures.
Traditionally, network managers use SNMP polling to monitor this data, but performance visibility gaps often appear when that method is used alone. To extend performance visibility, providers should leverage more than one method of data collection and analysis. By utilizing NetFlow, IP SLA, and IaaS- and PaaS-based APIs to monitor response times, IT departments can gain insight into the health of the network and infrastructure, identify where problems originate and act when necessary.
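As a simple illustration of supplementing SNMP with API-based measurements, the sketch below times HTTP calls against hypothetical IaaS and PaaS health endpoints. The URLs and service names are assumptions for illustration, not real provider APIs.

```python
import time
import urllib.request

# Hypothetical health endpoints exposed by IaaS/PaaS services; replace with real URLs.
ENDPOINTS = {
    "object-storage": "https://storage.example.com/health",
    "app-platform": "https://paas.example.com/status",
}

def measure_response_times(endpoints, timeout=5.0):
    """Time a simple HTTP GET against each endpoint as one extra visibility signal
    alongside SNMP polling, NetFlow and IP SLA data."""
    results = {}
    for name, url in endpoints.items():
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[name] = {"status": resp.status, "latency_s": time.monotonic() - start}
        except OSError as err:  # covers timeouts and connection failures
            results[name] = {"status": None, "error": str(err)}
    return results

print(measure_response_times(ENDPOINTS))
```

Feeding measurements like these into the same platform that holds the SNMP and flow data is what closes the visibility gaps.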
Detecting abnormalities across the infrastructure
Where companies utilize on-demand self-service to allow their customers to interact with the service provider, they should consider the impact those end-users could be having on the infrastructure. Without complete visibility, IT departments may not be able to tell when a consumer is inadvertently creating increased traffic.
A robust performance plan should be able to identify an abnormal spike in consumption, or unusual behavior, and provide a clear path to the offending site or end user. One way of doing this is utilizing metrics-to-flow analysis, which will reveal the composition of traffic.
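A minimal sketch of that spike detection is shown below, using a rolling baseline over hypothetical per-site traffic counters; the window size, threshold and sample values are illustrative.

```python
from statistics import mean, pstdev

def find_spikes(samples, window=12, threshold=3.0):
    """Flag samples that sit more than `threshold` standard deviations above the
    rolling baseline built from the previous `window` samples."""
    spikes = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), pstdev(baseline)
        if sigma and samples[i] > mu + threshold * sigma:
            spikes.append((i, samples[i]))
    return spikes

# Hypothetical per-site traffic counters (e.g. Mbps sampled every 5 minutes).
site_traffic = [42, 40, 45, 43, 41, 44, 46, 42, 43, 45, 44, 43, 180, 44]
print(find_spikes(site_traffic))   # -> [(12, 180)]
```

Once a spike is flagged, drilling from the metric into the underlying flow records reveals which site or end user is generating the traffic.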
Additionally, it is worth considering the automation systems that onboard new users and devices as the network expands and changes. Putting robust APIs in place to interface with compute, network and service management automation systems helps maintain performance visibility over a rapidly expanding and changing environment.
Ensuring complete transparency
Most cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction. This usage can be monitored, organized, and reported, providing complete infrastructure visibility for the provider and consumer. In order to maximize this transparency, IT departments should look for a performance visibility partner who can measure cloud consumption. This can ultimately save money for companies utilizing a usage-based service cloud, as they can ensure that they are being billed the right amount, allowing them to issue a robust challenge if it seems they may have been overcharged.
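As a rough sketch of that billing check, the example below prices hypothetical metered usage against an assumed rate card and compares the result to an invoice figure; all of the numbers are made up for illustration.

```python
# Hypothetical metered usage and unit rates; real figures come from the
# provider's metering feed and the signed rate card.
usage = {"vm_hours": 1440, "storage_gb_month": 500, "egress_gb": 120}
rates = {"vm_hours": 0.05, "storage_gb_month": 0.02, "egress_gb": 0.09}

def expected_charge(usage, rates):
    """Price each metered dimension independently and total the result."""
    return sum(usage[item] * rates[item] for item in usage)

invoice_total = 98.40   # figure taken from the provider's bill
calculated = expected_charge(usage, rates)
print(f"expected {calculated:.2f}, invoiced {invoice_total:.2f}, "
      f"difference {invoice_total - calculated:+.2f}")
```

A discrepancy between the metered figure and the invoice is exactly the evidence needed to raise a credible challenge with the provider.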
Effectively managing resources
Clouds pool network, compute and storage resources that are shared across a range of users, often in multi-tenant environments. Enabling performance visibility for multi-tenant cloud environments requires tenant-by-tenant isolation, with careful attention to user and data access permissions. An enterprise supporting multiple business units will likely want options to share network, compute and storage resources across each of its “customers” in a multi-tenant environment.
An effective performance visibility strategy will have the flexibility to report segregated experiences for different end-users in a controlled manner, securing visibility of relevant devices and objects. It will allow a team to define policies that automatically group and classify discovered devices, ensuring restricted access via authentication.
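The sketch below illustrates tenant-by-tenant isolation in miniature, assuming a hypothetical device inventory and a simple role-based classification policy; the names and fields are illustrative rather than any product's data model.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    tenant: str
    role: str          # e.g. "edge-router", "hypervisor", "vm"

# Hypothetical inventory discovered by the monitoring platform.
INVENTORY = [
    Device("edge-rtr-1", "finance", "edge-router"),
    Device("esx-03", "finance", "hypervisor"),
    Device("vm-web-7", "marketing", "vm"),
]

def visible_devices(inventory, authenticated_tenant, role_filter=None):
    """Return only the devices the authenticated tenant is allowed to see,
    optionally narrowed by a classification policy on device role."""
    return [d for d in inventory
            if d.tenant == authenticated_tenant
            and (role_filter is None or d.role == role_filter)]

print([d.name for d in visible_devices(INVENTORY, "finance")])
# -> ['edge-rtr-1', 'esx-03']
```

In a real deployment the tenant would be established by authentication and the grouping policies applied automatically at discovery time, but the access boundary works the same way.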
Expanding monitoring capabilities
As IT silos continue to converge, especially in virtualized data centers and with cloud-based services, the need to monitor more than the network infrastructure increases. This includes monitoring what is happening inside the ESX server, the virtual machines (VMs), and across all network devices. Expanding monitoring capabilities in this way allows operations to determine where a service-impacting performance degradation is occurring: in the network, on the ESX server, or in the VM itself.
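As a rough illustration of isolating the affected layer, the sketch below compares hypothetical per-layer metrics against assumed thresholds; in practice the thresholds would come from baselining each environment rather than fixed values.

```python
# Hypothetical point-in-time metrics gathered from the network devices,
# the ESX host and the guest VM for a single slow service.
metrics = {
    "network":  {"if_util_pct": 35, "errors_per_min": 0},
    "esx_host": {"cpu_ready_pct": 18, "mem_ballooning_mb": 0},
    "vm":       {"cpu_pct": 55, "disk_latency_ms": 4},
}

# Illustrative thresholds; real ones come from baselining each layer.
THRESHOLDS = {
    "network":  {"if_util_pct": 80, "errors_per_min": 10},
    "esx_host": {"cpu_ready_pct": 10, "mem_ballooning_mb": 1},
    "vm":       {"cpu_pct": 90, "disk_latency_ms": 25},
}

def suspect_layers(metrics, thresholds):
    """List the layers where any metric breaches its threshold."""
    return [layer for layer, values in metrics.items()
            if any(values[m] > thresholds[layer][m] for m in values)]

print(suspect_layers(metrics, THRESHOLDS))   # -> ['esx_host']
```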
Conclusion
Ultimately, whether an organization is automating the provisioning of virtual machines, networks, and storage in a private cloud, migrating its enterprise applications to a public cloud, or leveraging the growing set of Software, Platform, or Infrastructure as a Service models, the performance monitoring platform should support the journey to the cloud.
Being cognizant of these challenges will help ensure that an effective, scalable performance management strategy is in place, providing a business with the ‘virtual sanity’ necessary to confidently incorporate virtualization into its infrastructure with minimal risk of outages.
[1] Gartner, ‘Gartner Says Cloud Computing Will Become the Bulk of New IT Spend by 2016’