High-Availability and Redundancy
Clustering, Mirroring and Log Shipping
The SQL Server features most often mentioned for high-availability and redundancy solutions are clustering, log shipping and database mirroring. However, grouping these technologies under the same label is a bit misleading because they each address different aspects of the problem. None of them alone provides a complete solution for both high availability and redundancy.
Understanding the strengths and weaknesses of each technology will help you design the best overall solution for your organization.
There are a wide variety of terms used to describe the same concepts. To avoid confusion, here are definitions of the terms we will use:
- High-Availability: A database system that is configured to require very little or no scheduled downtime and also has technology in place to minimize the risk of unplanned downtime.
- Redundancy: Having alternative hardware, software and data resources that can be used in case of failure of any production resource.
- Primary: A production SQL Server instance or individual database that contains the data that is to be replicated.
- Secondary: A server or database that is a repository for the replicated production data and which is configured to take over the role of the Primary when a failure occurs.
- Failover: The action of transferring control from the primary server/database to the secondary. A hot failover is one that requires no manual intervention and is invisible to database users. A warm failover is one that requires some degree of manual intervention and often involves some amount of server down-time.
Let’s look at the details.
Clustering is a server-wide solution as opposed to mirroring or log shipping which operate on individual databases. Clustering applies to all databases on the server.
Clustering is very useful as far as it goes. It is the best choice for rapid, fully automated failover in case of hardware failure. Users seldom notice when a failover occurs. Since clustering handles continuous availability so well it is widely used for 24x7 database servers. Clustering is generally much more costly than other solutions but very often it is money well-spent.
Clustering protects against failure of the server hardware. It does not provide data redundancy. There is only one copy of the data and both servers in the cluster use the same database files. If the storage hardware fails or some other data-related problem makes the data files unusable, a failover to the other server in the cluster won't help. If the data files are not usable it will take just as long to get your system back on-line as it would in a non-clustered environment.
Log Shipping and Mirroring
Unlike clustering, both log shipping and mirroring are database-level options. You can pick and choose what databases you need to set up for warm failover. Also unlike clustering, both technologies provide data redundancy.
Log shipping is perhaps the simplest technology to set up and maintain. All transactions occurring on the primary server are automatically saved into a backup file. The file is sent to the secondary server on whatever schedule you choose. Those transactions are automatically applied to the secondary database, bringing it into close synchronization with the primary. When the primary server encounters a problem that takes it off-line you can switch to this already-running copy of the primary database.
The failover is much faster than fixing whatever caused the primary server to fail, but still it is a manual process and more disruptive to users than a cluster failover. The database will be unavailable for a period of time ranging from a few minutes to a half hour or more. That is assuming that the technician who has to repond to the situation is immediately available and is aware of the outage.
Determining the interval for log shipping is a critical decision because that determines the maximum amount of possible data loss in the case of failure of the primary server. If you are log shipping on a 15 minute interval, for example, you have the possibility of losing up to 15 minutes worth of transactions that have occurred on the primary server but have not yet been sent to the secondary. In many cases, those transactions can be recovered and no data loss occurs, but the bottom line is that you are at some risk to lose up to 15 minutes of data.
An important and often overlooked benefit of log shipping is that the secondary database is accessible to users in a read-only mode while on stand-by. Since the secondary is a recent copy of the production data it can be used to off-load the reporting burden from your production database.
There are some limitations on the reporting capabilities of the secondary server. The secondary server is always slightly behind the primary. Reports that need absolutely current data should not use the secondary server. If your reports can accept data that is a few minutes old, reporting from the secondary database is a good way to spread the processing load around.
Another limitation is that during the few seconds when the new transactions are being applied to the secondary server, it is not available for reporting. That means that sometimes a user will have to re-run their report. You will need to decide whether that is acceptable to your users.
Mirroring keeps a warm copy of your database running on another server. The movement of changes from the primary database to the secondary is near enough to real-time that the small difference can usually be ignored. Like log shipping, mirroring does have data redundancy because the primary and secondary databases have individual copies of the same data.
With mirroring, the vulnerability to data loss after a disaster is much less than with log shipping because of the speed at which data changes are applied to the mirror. There are few unreplicated transactions in the primary database.
Failover of a mirror secondary database can be automated with the addition of a third server called a witness. (A free version of SQL Server Express can handle this part of the job). However that does not mean the whole failover process can be automated as it is in a clustered environment. After failover, the applications that use the database must be connected to the new database server which has a different name and a different IP address. It is difficult to automate this step so most mirror failovers have some degree on manual intervention required. However, it is possible to automate the entire process in some situations. It is worth looking into whether your environment could support automated failover of applications in a mirrored environment.
Another limitation to mirroring is that the secondary database is not accessible to users while in stand-by mode. There are a number of papers available on the web describing how to use snapshot views of the secondary mirror database for reporting, but there are issues that make this difficult. If you would like to look further into this possibility, here is a paper that describes a method to work around most of the difficulties.
Reporting from Mirror Snapshots
Since none of these methods offer the complete package, the ideal solution often involves combinations of technologies. for example, you might choose to implement clustering to provide fast failover in case of hardware failure. In addition to clustering, implementing log shipping on the same database will keep a warm copy of the database running on another server. That will give you both rapid, unattended failover and data redundancy. This solution also gives you an opportunity to move the reporting load off of your production server. Add a good backup plan to the mix and you have all your bases covered.
What About Replication?
SQL Server replication is often considered to be a tool for data redundancy and high availibility. In some scenarios it makes sense to use it for those purposes but in our opinion it is not very often the best solution. We chose not to include it in this article.
One size does not fit all. You must carefully consider your budget, your organization’s tolerance for data loss, the importance of unattended failover, etc. Then you can match up your requirements to the capabilities of these options and come up with an architecture that meets your needs.
As always, if you have questions about this article email me or give me a call at 503.914.6442.
SQL Consulting, Inc.