Migrating Microsoft SQL Server Always-On Availability Group Clusters to Azure
By: David Patterson, Principal Technical Consultant
When organizations decide to migrate workloads to a cloud provider, there is often an initial ‘lift and shift’ approach that moves the workload as-is onto a virtual machine. And while there are many arguments for refactoring the applications during this process, the reality is that time constraints can make this a challenge. SQL Server workloads that are stand-alone can be migrated using a variety of tools, but when you approach SQL Server Clusters, the choices are greater, and the complexity can increase significantly.
Here, we’ll explore some of the primary considerations and the approach for migrating SQL Always-On Availability Group clusters to Azure in a cloud-friendly manner.
SQL Server offers two high-availability options: Failover Cluster Instances (FCI) and Always-On Availability Groups (AGs). FCI clusters use older technology dating back to the late 1990s, where shared disks were the common High Availability (HA) option. Since the release of SQL 2012, AGs have taken the lead in database resiliency – this article will focus on only this type of cluster. But how should an SQL AG cluster residing in a datacenter be migrated as an IaaS solution to the cloud with minimal effort and near-zero downtime?
The basic premise is to create new SQL Servers in the target cloud environment and join those to the current Windows Failover Cluster Service (WFCS) that is being used in the datacenter. This will cause the cluster to become a multi-subnet cluster (if it wasn’t already). Additional work will need to be done in the Cluster Manager to identify the IPs for the cluster service on the new VMs and the AG Listeners, which are used by applications to connect to their databases.
At a high level, the steps are:
- Create new SQL VMs and domain-join them to your current Active Directory domain (including the Windows Clustering feature)
- Add the new VMs to the Cluster Manager on an existing SQL Server
- Join the new VMs to the Availability Groups and replicate the data
This list is very simplistic, and each step involves many options and configuration settings. For example, if you wish to automatically replicate the databases, then the underlying disk layout must match the current servers in both capacity and drive lettering (manually restoring databases can remove this requirement).
SQL AG Options
Within Azure, there are two main options for creating an AG cluster. The first uses a single subnet for all the SQL Virtual Machines, and this method requires that an Azure Load Balancer be created to act as the front-end Listener for each availability group. This is also true in AWS or GCP, as the major cloud providers do not allow Address Resolution Protocol (ARP) to broadcast within their environments. In a traditional datacenter, ARP would signal to the DNS system when a failover occurs, but in the cloud, this function is replaced with a load balancer that is configured with health-probes to monitor the underlying VMs.
The second option is to place each SQL VM in a different subnet, which eliminates the need for a load balancer. This functions by adding extra IP addresses to the NIC of each VM, one per AG (and if using SQL 2016 or older, one IP for the cluster service). Windows 2019 and newer use a Distributed Server Name (DSN) for the cluster service, which is automatically registered with the Cluster Name Object in Active Directory. In either scenario, consideration should be given to placing the VMs in different Availability Zones or at the least in an Availability Set (more details).
Another consideration is whether to use WFCS at all. It is possible to create an Availability Cluster without using a WFCS-based installation – however this option is used only for creating read-only replicas at scale and offers no high-availability failover options. For clusters that rely on automatic failover, a WFCS-based cluster is needed (more details).
Database encryption will affect how databases are migrated to the new servers. Databases that are not using database-level or column encryption can be replicated automatically by SQL Server using a seeding method (which again requires an identical disk layout on the target VM). Databases that are encrypted cannot use this feature and will instead need to be backed up and restored to the new servers. The Certificate (or Asymmetric Key) used to encrypt the database would need to be restored to the new servers.
When using multi-subnet SQL Clusters, WFCS will unbind the IP from an AG when a failover occurs, and DNS will be updated – although the Time-to-Live (TTL) value may need to be shortened. When using Microsoft DNS (stand-alone or AD-integrated), this process is transparent. When using other DNS solutions, such as Infoblox, some additional settings may be needed. If Infoblox is configured to include the IP of each host VM in the Update ACL, then everything should failover correctly. If the hosts are not in that ACL, then DNS time-outs can occur, as the Hosts will be sending the DNS updates to Infoblox and will need permissions in that ACL to effect the change in DNS records. The Infoblox Grid Control Service would need to be restarted when the host VMs are first added to the Update ACL.
When the time comes to move the databases into Azure, the replication type will come into play. Replication is either synchronous or asynchronous. In the current versions of SQL Server, a single AG Cluster can support up to nine host VMs, but only three of them can be synchronous. This then necessitates a type of dance when migrating. Below is an example of one process that includes five host VMs, two original datacenter VMs, two Azure primary VMs, and a single Disaster Recovery (DR) VM. This step assumes that the servers are all configured with SQL Server, joined to the same domain and WFCS, and that data is replicating to at least the first Azure VM.
The Azure VMs are joined to the cluster, and at the time of migration, the first Azure VM should be changed to synchronous replication. Some additional time should be allowed to account for any latency in the replication as the Azure VM synchronizes any remaining transactions. The SQL AG monitor screen in SQL Server Management Studio (SSMS) will show how many pending transactions are left to bring the Azure VM into sync with the datacenter VMs.
The databases in each AG can then be failed over to the primary Azure VM.
The next step is to change one of the datacenter’s VMs to asynchronous, freeing up one of the synchronous slots for the 2nd Azure VM, which will then be converted to synchronous replication. Availability Groups can then be load balanced as desired between these servers.
Next, the DR SQL VM can be joined to the cluster and brought into asynchronous replication. This VM should stay as an asynchronous copy to reduce the latency of applications using the primary servers. Keep in mind that data loss is possible with asynchronous replication, and the amount of data loss is roughly equal to the network latency between the regions.
In the end, the other datacenter server should be set to asynchronous replication (if keeping online), as the network latency between Azure and the datacenter may impact application performance. Otherwise, disconnecting the datacenter VMs would complete the migration to Azure.
There are many technical steps in configuring a SQL IaaS Cluster solution, but careful planning and understanding of the database options and their configurations can lead to a smooth transition into cloud-based workloads. While this is not the only method to migrate SQL Server to a cloud provider, this process minimizes downtime, reducing the migration’s impact on the business.
To learn more, reach out to AHEAD today.