PayGo ensures high availability of SQL Server in the AWS Cloud

PayGo is using SIOS DataKeeper on Amazon Web Services (AWS), running on Elastic Compute Cloud (EC2) virtual servers with solid-state drive (SSD)-only storage, to provide the rapid, automatic failover needed to ensure high availability (HA) for the company's mission-critical SQL Server applications.

PayGo is an integrated utility payment solution provider that manages the largest energy company prepay programs in the United States. The company currently runs four production environments in AWS, with another coming online soon. Each runs SQL Server 2017 Standard Edition on Windows Server 2012 R2, and PayGo plans to migrate to Windows Server 2019 after testing is completed.

 

The Challenge

"Our backend SQL Servers hold terabytes of data that must be available 24x7," explained Chad Gates, senior director of infrastructure and security at PayGo, a private, non-profit organization. "As a Windows shop, we prefer to use Windows Server Failover Clustering (WSFC) for data protection and continuous operation in case of any failures. But WSFC requires some form of shared storage, like a storage area network (SAN), and that isn't natively available in AWS."

 

With AWS's lack of shared storage, PayGo was forced to use SQL Server's transaction logging and log shipping to protect the data. Although it required manual intervention, this approach was acceptable for disaster recovery (DR) purposes. But it could not provide the rapid, automatic failover capability needed to ensure high availability (HA) for the company's mission-critical applications. "We had another option, but we believed there were more cost-effective solutions," according to Chad. "We could use the Always On Availability Groups feature in SQL Server Enterprise Edition, but that would cost us hundreds of thousands of dollars that could be spent on other mission-critical initiatives. We felt there must be a better solution, so we started looking for other options."

 

The Evaluation

In its search for a capable and cost-effective HA solution, PayGo established four criteria: seamless integration with Windows Server Failover Clustering; high disk throughput performance to satisfy demanding recovery point and time objectives; ease of implementation and dependable ongoing operation; and responsive technical support from the vendor.

 

After receiving a recommendation to look at SIOS, Chad concluded, "SIOS DataKeeper Cluster Edition overcame the problem caused by the lack of shared storage. Its use of a mirrored drive looks like shared storage to the WSFC. It was exactly what we wanted." SIOS DataKeeper also met PayGo's other three criteria better than any other solution considered.

 

The Solution

PayGo first installed SIOS DataKeeper SANless Clustering software in its own private cloud, and later migrated the configuration to AWS. "Because SIOS DataKeeper supports private, public and hybrid cloud environments, we migrated the entire configuration, including all application software and data, easily and without any issues," Chad recalled. PayGo currently has two SQL Server nodes in each of its four SANless HA clusters. To protect against localized failures, the two servers in each cluster are deployed in separate Availability Zones. And to ensure high transactional throughput, each server has two network interfaces, one of which is dedicated to SIOS data replication. The SANless clusters employ synchronous data replication over the sub-millisecond (ms) latency connectivity AWS delivers between Availability Zones.
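
To make the idea of a synchronous mirror concrete, the following is a minimal toy model in Python. It is only a sketch of the general technique, not SIOS DataKeeper's implementation: the `Node` and `SyncMirror` classes, the node names and the zone labels are all hypothetical. It assumes that a write is acknowledged only after both nodes hold the data, which is why the surviving node can take over without losing acknowledged writes.

```python
# Toy model of synchronous block mirroring between two cluster nodes.
# All names here are hypothetical; this is not SIOS DataKeeper code.

class Node:
    def __init__(self, name, zone):
        self.name = name
        self.zone = zone
        self.blocks = {}  # block number -> data

    def apply(self, block, data):
        self.blocks[block] = data


class SyncMirror:
    """Acknowledge a write only after both nodes have applied it."""

    def __init__(self, source, target):
        # Mirrors the deployment rule described above: one node per zone.
        if source.zone == target.zone:
            raise ValueError("nodes should sit in separate Availability Zones")
        self.source = source
        self.target = target

    def write(self, block, data):
        self.source.apply(block, data)
        self.target.apply(block, data)  # in a real system, a network round trip
        return True  # acknowledgement returned to the application

    def failover(self):
        # Swap roles: the former target becomes the active node.
        self.source, self.target = self.target, self.source
        return self.source


primary = Node("sql-a", "zone-1")
secondary = Node("sql-b", "zone-2")
mirror = SyncMirror(primary, secondary)
mirror.write(0, b"page-0")
mirror.write(1, b"page-1")

active = mirror.failover()
print(active.name, active.blocks == primary.blocks)
```

Because every acknowledged write already exists on both nodes, the node that becomes active after `failover()` holds the same blocks as the node it replaced.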

 

The Results

SIOS DataKeeper met and exceeded PayGo's high expectations for a high availability solution, including ease of installation and operation, and responsive support. "We have been using SIOS DataKeeper for several years now, and it has proven to be the most rock-solid piece of software we have," Chad claimed.

 

Given its proven operation, including during actual failures, the IT team has minimized the ongoing testing needed for its production SANless clusters. The clusters are now tested only after changes are made to any of the hardware or software, which are scheduled on a monthly basis, and the test itself consists of a simple failover and failback. PayGo also upgrades only one node at a time in each cluster to simplify roll-back, if needed. With SIOS DataKeeper performing so well, the only reason PayGo would now have for upgrading to SQL Server Enterprise Edition would be outgrowing the Standard Edition's database size limitation.

 

Looking Towards the Future

The IT team at PayGo is currently considering adding DR protection to the HA clusters by deploying a third node in a separate AWS region. The distance involved in this case (between datacenters in Virginia and Ohio) results in a latency of 12-13 ms. While that requires asynchronous replication to ensure high throughput performance in the active node, the combined HA/DR solution would recover much more quickly than is possible with log shipping.
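
A rough calculation shows why the 12-13 ms figure matters. The Python sketch below assumes a deliberately simplified model in which each synchronous write waits for one full network round trip before the next one starts, so real systems that overlap writes will do better. The function names, the 0.5 ms stand-in for "sub-millisecond" latency and the 1 ms cutoff are illustrative assumptions, not values from PayGo or SIOS.

```python
def max_serialized_writes_per_second(round_trip_ms):
    """Upper bound on writes per second if each write waits one round trip."""
    return 1000.0 / round_trip_ms


def choose_replication_mode(round_trip_ms, sync_budget_ms=1.0):
    """Pick a mode from latency alone; the 1 ms budget is an assumed cutoff."""
    return "synchronous" if round_trip_ms <= sync_budget_ms else "asynchronous"


# 0.5 ms is an assumed example of sub-millisecond latency between zones;
# 12.5 ms is the midpoint of the 12-13 ms quoted for Virginia to Ohio.
for label, rtt in [("between zones", 0.5), ("between regions", 12.5)]:
    ceiling = max_serialized_writes_per_second(rtt)
    print(label, rtt, "ms:", ceiling, "writes/s,", choose_replication_mode(rtt))
```

Under these assumptions the ceiling drops from 2,000 serialized writes per second between zones to 80 between regions, which is why an asynchronous link is the natural choice for the third node.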

 

"Whether you need to protect applications on a physical server, a private cloud, a public cloud or a hybrid cloud, you need to meet the same SLAs for application availability regardless of location. Applications running in clouds also need to be protected against the inevitable cloud outage through the use of availability zones and regions with automated intelligent failover," said Frank Jablonski, VP of global marketing, SIOS Technology. "PayGo is using SIOS to provide a fast, easy way to deploy applications in a high availability environment in the AWS cloud while continuing to use Windows Server Failover Clustering."