Double-Take Software products can handle many different situations that crop up between a production server and a target device. For example, if connectivity between two servers is temporarily cut and then restored, the Double-Take Software product line can take appropriate corrective action and resume normal operations without administrative intervention. That said, there are some things we just can’t do automatically, but that can be fixed quickly by hand.
A common example is when the management consoles used by Double-Take Software products report that a connection is “Retrying” or “Retrying Ops”, with a warning or error icon attached. The underlying cause is usually easy to fix, but for safety it requires the Administrator to take a few manual steps. Those steps let you (the Administrator) choose which of several valid paths to take back to full protection.
Retrying occurs when a Double-Take Software product finds that a file we are attempting to write to on a target device is locked. While the Replication Engine can protect any file on a production server (open, locked, or otherwise), it cannot write to a locked file on a target server. Because the Engine maintains I/O write order at all times, a single write it cannot make blocks every write queued behind it: the Engine has to wait for that file to be freed before any other write can be made. If the file is not freed quickly, you will see the error in the management consoles and through any other alerting you have configured. To find out exactly which file is locked, examine the Double-Take logs on the servers in question; they will identify the file or files causing the problem.
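To make the write-order constraint concrete, here is a deliberately simplified Python model. It is not how the Replication Engine is implemented; the `Write` class, the `apply_in_order` function, and the file names are all hypothetical. It shows only that, when writes must be applied strictly in order, one locked file on the target holds up every write queued behind it, including writes to files that are not locked.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Write:
    path: str    # hypothetical file on the target that this write modifies
    data: bytes


def apply_in_order(queue, locked_paths):
    """Apply queued writes strictly in order (simplified model).

    Returns the paths that were written. Stops at the first write whose
    target file is locked, leaving that write and every later one queued.
    """
    applied = []
    while queue:
        op = queue[0]
        if op.path in locked_paths:
            # Skipping ahead would break write order, so everything waits
            # here until the file is freed (the "Retrying" state).
            break
        queue.popleft()
        applied.append(op.path)
    return applied


ops = deque([
    Write("orders.db", b"..."),
    Write("app.log", b"..."),
    Write("orders.db", b"..."),   # not locked, but still stuck behind app.log
])

# A process on the target holds app.log open.
print(apply_in_order(ops, {"app.log"}))  # ['orders.db']
print(len(ops))                          # 2

# Once that process releases the file, the queue drains.
print(apply_in_order(ops, set()))        # ['app.log', 'orders.db']
```

The model simply stops rather than reordering, which is the point: preserving write order means the only way forward is to free the locked file on the target.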
Correcting the issue is as simple as stopping whatever process is using the file on the target device. In some cases, an application running on the target is actively using the file in question. In other cases, you may be attempting to replicate system files without using the appropriate Double-Take Software tools.
If an application is running on the target server and doesn’t need to be, shutting it down will typically correct the problem: the Replication Engine can then make the write and move on. If the application does need to run, consider excluding its files from replication instead. After all, if a service must run on the target, you most likely do not want its data overwritten by the copy from the production server.
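As a rough illustration of what an exclusion does, the Python sketch below filters a list of files down to the ones that would still be replicated. It assumes shell-style wildcard patterns matched case-insensitively; it does not reproduce the rule syntax Double-Take products actually use, and the paths, the monitoring-agent example, and `filter_replication_set` are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_replication_set(paths, exclude_patterns):
    """Return the paths that should still be replicated.

    A path is dropped if it matches any exclude pattern. Both sides are
    lowercased first so matching is case-insensitive, as on Windows.
    """
    patterns = [p.lower() for p in exclude_patterns]
    kept = []
    for path in paths:
        lowered = path.lower()
        if any(fnmatchcase(lowered, pat) for pat in patterns):
            continue  # owned by a service that must keep running on the target
        kept.append(path)
    return kept


replication_set = [
    "D:/Data/orders.db",
    "D:/Data/reports/q1.xlsx",
    "D:/MonitorAgent/cache/state.dat",
    "D:/MonitorAgent/agent.log",
]

# Hypothetical: a monitoring agent that must keep running on the target.
excludes = ["D:/MonitorAgent/*"]

print(filter_replication_set(replication_set, excludes))
# ['D:/Data/orders.db', 'D:/Data/reports/q1.xlsx']
```

Excluded files are never written on the target, so the running service keeps its own copies and the Replication Engine never waits on them.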
If you are trying to replicate system information (such as Windows or application binaries), you may not be using the correct Double-Take Availability tools. Server-to-server system state replication requires the Full Server Failover Manager. So if you’re using the Replication Console alone, or some other wizard, chances are you’ll run into a Retrying error. The good news is that the Full Server Failover Manager is a component of Double-Take Availability, so you already own everything you need to protect the system properly. Just stop whatever connection you have in place, set up protection again with the Full Server Failover Manager, and you will be all set.
Retrying errors can be a trying thing when you’re getting protection configured. Following these guidelines will correct them quickly and get you back to a fully protected state.