Chapter 12. Implementing Snapshot, Merge, and Transactional Replication

Data replication allows you to distribute data from a source database to one or more destination databases. The source and destination databases can be on different Microsoft SQL Servers or on other database systems as long as an OLE DB provider is available for each destination database. You have precise control over when replication occurs, what data is replicated, and how other aspects of replication are handled. For example, you can configure replication to happen continuously or periodically. Before we examine how to implement replication, let’s look at why you would want to use replication and review the main concepts it involves.

An Overview of Replication

You use replication to copy data on one server and distribute it to other servers. You can also use replication to copy data, transform it, and then distribute the customized data to multiple servers. You generally use replication when you need to manage data on multiple servers on a recurring basis. If you need to create a copy of a database just once, you do not need replication—instead you should copy the database as discussed in the section titled "Tips and Techniques" in Chapter 7, or in the subsection titled "Restoring a Database to a Different Location" in Chapter 14. If you need to copy and transform data from one server to another server, you do not need replication either; instead you should use the import and export procedure discussed in Chapter 10. Some reasons to use replication include:

  • To synchronize changes to remote databases with a central database. For example, if the sales team uses remote laptops, you may need to create a copy of data for their sales region on the laptop. Later, a salesperson in the field may add information or make changes when they are disconnected from the network. By using replication, these modifications could be synchronized with the central database.

  • To create multiple instances of a database so that you can distribute the workload. For example, if you have a central database that is updated regularly, you may want to push changes out to departmental databases as they occur. Employees can then access data through these departmental databases instead of trying to connect to the central database.

  • To move specific data sets from a central server and distribute them to several other servers. For example, you would use replication if you had a central database and needed to distribute sales data to all the databases in your company’s department stores.

  • To customize data and distribute it to multiple subscribers. For example, if your company sold subscriptions to your consumer credit database, you could replicate the data for subscribers, customizing the data for each subscriber.

Replication is designed to meet the needs of a wide variety of environments. Replication architecture is divided into several different processes, procedures, and components, each of which is used to customize replication for a particular situation. The replication architecture includes:

  • Replication components. The server and data components used in replication

  • Replication agents. Applications that assist in the replication process

  • Replication variants. The types of replication you can configure

Replication Components

Before working with replication, you will need to know the main components of the process and how you use them. Servers in the replication model can have one or more of the following roles:

  • Publisher. Publishers are servers that make data available for replication to other servers. Publishers also track changes to data and maintain other information about source databases. Each data grouping has only one publisher.

  • Distributor. Distributors are servers that distribute replicated data. Distributors store the distribution database, metadata, historical data, and (for transactional replication) transactions.

  • Subscriber. Subscribers are the destination servers for replication. These servers store the replicated data and receive updates. Subscribers can also make changes to data. You can publish data to multiple subscribers.

The data being published for replication are referred to as articles and publications. Articles are the basic units for replication and can consist of a table, a subset of a table, or other database objects. Publications are collections of articles that subscribers can receive. You should associate articles with a publication and then publish the publication. Articles can contain:

  • An entire table.

  • Only certain columns from a table, obtained by using a vertical filter.

  • Only certain rows from a table, obtained by using a horizontal filter.

  • A table subset containing certain rows and columns.

  • A view, indexed view, or user-defined function.

  • A stored procedure.

You can also specify whether schema objects are replicated. Schema objects include constraints, indexes, triggers, collation, and extended properties. When you alter tables, views, procedures, functions, or triggers using DDL statements like ALTER TABLE or ALTER VIEW on a published object, the changes are propagated by default to all SQL Server subscribers. You cannot publish any of the following for replication:

  • The model, tempdb, and msdb databases

  • System tables in the master database

In the publication and subscription model, setting up replication involves the following steps:

  1. Selecting a replication type and model

  2. Performing any necessary preliminary tasks

  3. Configuring a distributor and enabling publishers and publication databases

  4. Creating a publication

  5. Creating subscriptions to the publication and designating subscribers

Replication Agents and Jobs

SQL Server uses various helper applications to assist in the replication process. These applications are called replication agents and they include:

  • Snapshot Agent (snapshot.exe). Creates snapshots of data. It includes schema and data, which are stored for distribution. The Snapshot Agent is also responsible for updating status information in the distribution database. The Snapshot Agent runs on the distributor. Each published database has its own Snapshot Agent that runs on the distributor and connects to the publisher. Snapshot Agents are used with all types of replication.

  • Distribution Agent (distrib.exe). Applies data from snapshot replication or transactions from transaction replication to subscribers. The Distribution Agent can run on the distributor or on subscribers. It runs on distributor for push subscriptions, and on subscriber for pull subscriptions. This agent is not used with merge replication.

  • Merge Agent (replmerg.exe). Synchronizes changes that occur after the initial snapshot is created. If any conflicts occur when the changes are being synchronized, the conflicts are resolved using the rules set with the conflict resolver. Depending on the configuration, Merge Agents run on the publisher or on subscribers. Merge Agents are used only with merge replication.

  • Log Reader Agent (logread.exe). Moves transactions marked for replication from the transaction log on the publisher to the distributor. Each database that is published using transactional replication has its own Log Reader Agent that runs on the distributor and connects to the publisher. Log Reader Agents are used only with transactional replication.

  • Queue Reader Agent (qrdrsvc.exe). Stores database changes in a queue where the updates can be asynchronously propagated to the publisher. This allows subscribers to modify published data and synchronize those changes without having an active network connection to the publisher. Queue Reader Agents are used only with transactional replication with the queued updating option.

SQL Server 2005 does not have a separate cleanup agent. Instead, the following replication maintenance jobs are used to perform cleanup tasks:

  • Agent History Clean Up: DistributionDBName. Removes replication agent history from the distribution database. By default, this runs every ten minutes.

  • Distribution Clean Up: DistributionDBName. Deactivates subscriptions that have not been synchronized within the maximum distribution retention period and removes replicated transactions from the distribution database. By default, this runs every ten minutes.

  • Expired Subscription Clean Up. Removes expired subscriptions from publication databases. By default, this runs every day at 1:00 A.M.

  • Reinitialize Subscriptions Having Data Validation Failures. Flags all subscriptions that have data validation failures. The next time the Merge Agent or Distribution Agent runs, a new snapshot is applied at the subscriber. By default, this job is not enabled.

  • Replication Agents Checkup. Detects replication agents that are not actively logging history and writes an error to the Windows event logs if a job step fails. By default, this runs every ten minutes.

  • Replication Monitoring Refresher For DistributionDBName. Refreshes cached queries used by the Replication Monitor. By default, this starts automatically when SQL Server Agent starts and runs continuously.

Replication Variants

SQL Server supports several different types of replication. These replication variants are as follows:

  • Snapshot replication. Takes a snapshot of current data and replaces the entire copy of the data on one or more subscribers. With subsequent snapshots, the entire copy of the data is again distributed to subscribers. Although exact copies are a benefit of snapshot replication, this technique increases the amount of overhead and traffic on the network. Another disadvantage of snapshot replication is that it only runs periodically, which usually means that subscribers do not have the most current information. In SQL Server 2005, snapshot preparation has been enhanced to allow processing of multiple articles while scripting schema or bulk copying data. This technique, referred to as parallel processing, is used automatically when possible. SQL Server 2005 also features resumable snapshot delivery, which allows an interrupted snapshot delivery to be resumed automatically. When the delivery is resumed, only files that have not been transferred or are partially transferred are transferred—any snapshot files that have been completely transferred are not re-transferred.

  • Transactional replication. Uses transactions to distribute changes and primarily, in server-server environments. When replication starts, a snapshot of the data is sent to subscribers. After the snapshot is sent, selected transactions in the publisher’s transaction log are marked for replication and then distributed to each subscriber separately. Snapshots are then taken periodically to ensure that the databases are synchronized. Distributed transactions are used to ensure that incremental changes are applied consistently. A benefit of transactional replication is that you replicate individual transactions rather than an entire data set. Transactional replication can also occur continuously or periodically, which makes the procedure more versatile than snapshot replication by itself. To allow for easier implementation of transactional replication of large databases, SQL Server 2005 allows you to initialize a transactional subscription from backup. Thus, rather than using a snapshot to initialize a subscription, you can restore any backup taken after the creation of the publication on a subscriber. SQL Server 2005 also creates concurrent snapshots whenever possible to reduce the amount of time that locks are held during snapshot generation. This limits the impact on users that are working with the database while the snapshot is being generated.

  • Merge replication. Allows subscribers to make changes to replicated data independently, and primarily in a server-client environment. Later, you can merge these changes into all of the related source and destination databases. The snapshots needed to initialize merge replication can be pre-generated for each subscriber, or you can specify that subscribers can initiate snapshot generation during the initial synchronization. Merge replication does not use distributed transactions and cannot guarantee transactional consistency. Instead, merge replication uses a conflict resolver to determine which changes are applied. By default, merge replication processes changes on a row-by-row basis. You can also group sets of related rows as a logical record. This ensures that related sets of records are always processed in their entirety at the same time on a subscriber. SQL Server 2005 provides article-level statistics during merge replication. This gives better tracking of the merge phase and also allows you to define the order of article processing during merge synchronization. This technique, referred to as declarative ordering, is useful if you use triggers or rely on triggers firing in a specific order.

In snapshot and transactional replication, subscribers normally do not change data. However, with transactional replication, you have several options for allowing subscribers to change data:

  • Immediate updating. Allows subscribers to make changes and then immediately update the publisher. The publisher then replicates these changes to other subscribers.

  • Queued updating. Allows subscribers to make changes and then store the changes in a queue until they can be applied to the publisher. The publisher then replicates the changes to other subscribers. Immediate and queued updating is only supported in snapshot and transactional publications.

Queued updating provides fault tolerance that may be needed when databases are geographically separated. Immediate updating requires an active connection to the publisher, but queued updating does not. By using queued updating, subscribers can asynchronously apply changes, which means that they can store changes when a link is inactive, and then when the link is active, they can submit the changes to the publisher.

You can also use immediate updating, with queued updating as a failover, when you expect publishers and subscribers to be connected but do not want to lose the ability to make updates if a link fails. Here, you configure both updating options, using immediate updating as the primary update mechanism and then switching to queued updating when needed. You can invoke failover at any time. However, you cannot fail back afterward until the subscriber and publisher are connected and the Queue Reader Agent has applied all pending updates in the queue.

Both immediate updating and queued updating use transactions and the standard two-phase commit process to apply updates to the publisher. Transactions ensure that the update can be committed if it is successfully applied or rolled back if there is a problem. The transactions are applied from a specific subscriber to the publisher. After changes are made to the publisher, the publisher replicates the changes to other subscribers.

Transactions are completed automatically through the update process and are managed by the Distributed Transaction Coordinator. Custom applications that modify subscriber data can be written as though they were updating a single database. In a standard (default) configuration, updates to the subscriber are applied only when they can be replicated through a transaction. If the update cannot be replicated through a transaction, the subscriber will not be able to modify the subscription data.

SQL Server detects subscriber changes that would conflict with changes on the publisher. If it detects a conflict, it rejects the transaction and does not allow the data changes. Usually, a rejection means that the subscriber needs to synchronize with the publisher before attempting to update the data locally.

When you include stored procedures as articles in a snapshot publication, SQL Server replicates the entire stored procedure from the publisher to the subscribers. Changes caused by the execution of the stored procedures are replicated with new snapshots. If you use transactional replication, however, you can replicate execution of the stored procedure instead of replicating the changes that the execution causes. By sending an execute command rather than data changes, you reduce the amount of data that needs to flow across the network and improve the performance of SQL applications that use replication.

If you replicate the execution of stored procedures, you have two configuration settings. You can use standard procedure execution, or you can use serialized procedure execution. With standard procedure execution, procedure execution is replicated to all subscribers, even if those procedures are executed in different transactions. Because multiple transactions may be executing at a particular time, subscribers’ data cannot be guaranteed to be consistent with the publisher’s data. With serialized stored procedures, procedures are executed in sequence if they are referenced within serialized transactions. If the procedures are executed outside of serialized transactions, changes to the data are replicated instead. This behavior guarantees that the subscribers’ data is consistent with the publisher’s data.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset