Planning for Replication

As you have learned, the architecture for the replication process is extensive. This ensures that the architecture is versatile enough to meet the needs of almost any replication situation. Unfortunately, this versatility also makes replication tricky to configure. To make replication go smoothly, you should do a bit of planning, which involves selecting a specific replication model and performing any necessary preliminary tasks before you start configuring replication.

Replication Models

The main decision to make when you select a replication model involves the physical layout of the publisher, distributor, and subscriber databases. Replication models you may want to use include:

  • Peer-to-peer model. Allows replication between identical participants in the topology. The advantage of this model is that it permits roles to move between replicated nodes dynamically for maintenance or failure management. The disadvantage is the additional administration overhead involved with moving roles.

  • Central publisher model. Maintains the publisher and distributor databases on the same server, with one or more subscribers configured on other servers. The advantages of this model are manageability and ease of maintenance. The disadvantages include the extra workload and resource usage on the publication server.

    Tip

    The central publisher model is the most commonly used replication model. Unfortunately, you will often find that the extra load on the publication server slows down server performance. To reduce the server load, you should put the distributor on its own server. Be aware, however, that doing this will not entirely eliminate the workload on the publication server. The publisher and distributor still need to communicate, and they still need to pass data back and forth.

  • Central publisher with remote distributor model. Maintains the publisher and distributor databases on different servers, with one or more subscribers configured on other servers. The advantage of this model is that the workload is more evenly distributed. The disadvantage is that you have to maintain an additional server.

  • Central subscriber model. Uses a single subscriber database that collects data from several publishers. For example, if you have ServerA, ServerB, and ServerC, ServerA and ServerB would act as central publishers, and ServerC would act as the central subscriber. In this configuration, when updates are distributed from ServerA and ServerB, they are collected on ServerC. A central subscriber could then republish the combined data to other servers. To use this model, all tables used in replication must have a unique primary key; otherwise, the replication model will not work properly.

  • Publishing subscriber model. Relays the distribution of data to other subscribers; you can use this with any of the other models. For example, if you have two geographically separated sites, a publisher could replicate data to servers at site A and then have a publishing subscriber at site B that distributes the data to servers at site B.

Preliminary Replication Tasks

After selecting the replication type and model you want to use, you prepare for replication by performing preliminary tasks. The following subsections describe the main tasks involved, according to replication type.

Preparing for Snapshot Replication

If you use snapshot replication, the data being replicated is copied in full to data files on the distributor. Normally these snapshot files are the same size as the data you are replicating, and they are stored in the SQL Server Repldata folder by default. You should make sure that the drive on which the replication data is stored has adequate free space. For example, if you are using snapshot replication to distribute publication A with 500 MB of data, publication B with 420 MB, and publication C with 900 MB, you should have at least 2 GB of free space. Some of the free space is needed for processing overhead; the rest is required for the actual data.

You can also store snapshot data in an alternate location where subscribers can retrieve it at a later time. If you use an alternate location, you have the option of compressing the snapshot files, which reduces the disk space requirements only for the files you are compressing; it does not change the overall space requirements and does not always reduce the initial and final space requirements, because the files must still be generated in full before they are compressed and decompressed in full at the subscriber. With compression, the Snapshot Agent generates the necessary data files and then uses the Microsoft CAB utility to compress the files. When the subscriber receives compressed snapshot files, the files are written to a temporary location, which is either the default client working directory or an alternate location specified in the subscription properties. The subscriber uses the CAB utility to decompress the files before reading them.
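
If you want to see how this might be configured with Transact-SQL, the following sketch changes an existing publication to use an alternate snapshot folder and compressed snapshot files by calling sp_changepublication. The publication name and folder path are assumptions used only for illustration.

    -- Run at the publisher, in the publication database (sketch; names are assumed).
    EXEC sp_changepublication
        @publication = N'SalesPub',                -- hypothetical publication name
        @property = N'alt_snapshot_folder',
        @value = N'\\FileServer\ReplSnapshots',    -- hypothetical alternate location
        @force_invalidate_snapshot = 1;

    EXEC sp_changepublication
        @publication = N'SalesPub',
        @property = N'compress_snapshot',
        @value = N'true',
        @force_invalidate_snapshot = 1;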

Real World

When you create snapshot files in the default location and in an alternate location on different drives, the files are created separately. This means that the total disk space required typically is what you would expect, based on the size of the files. However, when you create snapshot files in the default location and in an alternate location on the same drive, both files are initially created in the default location and then the alternate location file is copied to its final destination. This means that the total disk space required in the default location is twice what you might expect. Compression does not help because the Snapshot Agent generates the necessary data files and then compresses them.

Replication timing is another important consideration in snapshot replication. When the Snapshot Agent creates a snapshot of a published table, the agent locks the entire table while it bulk copies the data from the publisher to the distributor. As a result, users cannot update any data in the table until the lock is released. To reduce the impact on operations, you should carefully schedule when replication occurs. Some actions that may help include:

  • Identifying times when operations are at their lowest levels or users do not need write access to the tables you are replicating

  • Identifying times when snapshots must be made and, during those times, scheduling work that does not require write access to the tables you are replicating

SQL Server 2005 processes multiple articles in parallel while scripting schema and bulk copying data. This parallel processing can increase the speed and efficiency of the snapshot generation process. SQL Server 2005 also features resumable snapshot delivery, which allows an interrupted snapshot delivery to be automatically resumed. When delivery is resumed, only files that have not yet been transferred, or that were only partially transferred, are transferred; any snapshot files that have already been transferred completely are not retransferred.

Preparing for Transactional Replication

Because transactional replication builds on the snapshot replication model, you will want to prepare for both snapshot and transactional replication. When you use transactional replication, an initial snapshot is sent to the distributor, and this snapshot is then updated on a periodic basis, such as once a month. Between snapshots, transactions are used to update subscribers. These transactions are logged in the distributor’s database and are cleared out only after a new snapshot is created.

Transaction logs for published databases are extremely important to successful replication. As long as replication is enabled, pending transactions cannot be cleared out of a published database until they have been passed to the distributor. Because of this, you may need to increase the size of a published database’s transaction log. Furthermore, if the publisher cannot contact the distributor or if the Log Reader Agent is not running, transactions will continue to build up in the publisher’s transaction logs.
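
If a published database's transaction log needs more room, a minimal Transact-SQL sketch might look like the following; the database name, logical log file name, and sizes are assumptions you would replace with your own values.

    -- Check how full each database's transaction log is.
    DBCC SQLPERF (LOGSPACE);

    -- Enlarge the log file of a published database (names and sizes are hypothetical).
    ALTER DATABASE SalesDb
    MODIFY FILE
    (
        NAME = SalesDb_Log,      -- logical name of the log file
        SIZE = 4096MB,
        FILEGROWTH = 512MB
    );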

With transactional replication, all published tables must have a declared primary key. You can add a primary key to an existing table using the ALTER TABLE statement (see Chapter 9). Additionally, if a publication uses very large data types, you must keep the following limitations in mind (a brief Transact-SQL sketch follows this list):

  • When updating varchar(max), nvarchar(max), and varbinary(max) data types, you should use the .WRITE clause of the UPDATE statement to perform a partial or full update. If you are performing a partial update of a varchar(max) column, you might update the first 100 characters of a column. If you are performing a full update, you might modify all the data in a column. @Offset and @Length values for the .WRITE clause are specified in bytes for varbinary(max) and varchar(max) data types and in characters for the nvarchar(max) data type.

    Best Practices

    For best performance, you should insert or update data in multiples of 8,040 bytes, which ensures that the data is written using whole data pages. Any .WRITE updates that insert or append new data are minimally logged when the database recovery model is set to Simple or Bulk Logged. Minimal logging is not used when you update existing values.

  • When modifying text, ntext, or image data types, UPDATE initializes the column, assigns a valid text pointer to it, and allocates at least one data page unless the column is being updated with NULL. To replace or modify large blocks of text, ntext, or image data, Microsoft recommends that you use WRITETEXT or UPDATETEXT instead of the UPDATE statement. However, support for WRITETEXT and UPDATETEXT is deprecated and may be removed in future versions of SQL Server.

  • The SQL Server configuration option MAX TEXT REPL SIZE controls the maximum byte size of the text and image data that can be replicated. Operations that exceed this limit will fail. Set the maximum text replication size with the sp_configure system stored procedure.
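
The following sketch pulls these points together: it adds a declared primary key, performs a partial .WRITE update on an nvarchar(max) column, and raises the maximum text replication size. The table, column, and size values are hypothetical and used only for illustration.

    -- Add a declared primary key to a table you plan to publish (hypothetical table).
    ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY (OrderID);

    -- Partially update an nvarchar(max) column with the .WRITE clause:
    -- replace the first 100 characters (offset 0, length 100 in characters).
    UPDATE dbo.Orders
    SET Notes.WRITE(N'Updated order notes', 0, 100)
    WHERE OrderID = 1001;

    -- Raise the maximum size, in bytes, of text and image data that can be replicated.
    EXEC sp_configure 'max text repl size', 2147483647;
    RECONFIGURE;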

The snapshot process can be modified for transactional replication in several important ways. To allow for easier implementation of transactional replication of large databases, SQL Server 2005 allows you to initialize a transactional subscription from backup. Thus, rather than using a snapshot to initialize a subscription, you can restore on a subscriber any backup taken after the creation of the publication. SQL Server 2005 also creates concurrent snapshots whenever possible to reduce the amount of time that locks are held during snapshot generation. This limits the impact on users that are working with the database while the snapshot is being generated.
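
As a rough sketch of initializing a transactional subscription from a backup, you might enable the publication option and then add the subscription as shown here. The publication, subscriber, database, and backup path names are assumptions for illustration.

    -- At the publisher: allow subscriptions to be initialized from a backup.
    EXEC sp_changepublication
        @publication = N'SalesPub',
        @property = N'allow_initialize_from_backup',
        @value = N'true';

    -- At the publisher: add a subscription that is initialized from a backup
    -- taken after the publication was created (names and path are hypothetical).
    EXEC sp_addsubscription
        @publication = N'SalesPub',
        @subscriber = N'SUBSCRIBER1',
        @destination_db = N'SalesDb',
        @subscription_type = N'push',
        @sync_type = N'initialize with backup',
        @backupdevicetype = N'disk',
        @backupdevicename = N'\\FileServer\Backups\SalesDb.bak';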

Preparing for Merge Replication

For merge replication, all published tables must have primary keys. If a table contains foreign keys or is used in validation, you must include the referenced table in the publication. Otherwise, update operations that add new rows will fail because SQL Server cannot find the required primary key. Additionally, merge replication affects time-stamp column usage. Time stamps are generated automatically and are guaranteed to be unique only in a specific database. Because of this, SQL Server replicates time-stamp columns but does not replicate the literal time-stamp values contained in the columns. These values are regenerated when the initial snapshot rows are applied at the subscriber.

Like transactional replication, merge replication has a few limitations when it comes to text and image columns. For example, you must explicitly update text and image columns with an UPDATE statement. When using merge replication, subscribers can make changes to replicated data independently, and these changes can be merged into all of the related source and destination databases. The Merge Agent watches for changes that conflict with other changes. If it detects a conflict, a conflict resolver is used to determine which change is applied and which change is rolled back. The Merge Agent can track changes at a column level or at a row level. In column-level tracking, a conflict exists when changes are made to the same column in a table in more than one copy. In row-level tracking, a conflict exists when changes are made to the same row in a table in more than one copy.
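
Whether conflicts are tracked at the column level or the row level is set per article when the article is added to the merge publication. The following minimal sketch uses column-level tracking; the publication and table names are hypothetical.

    -- At the publisher: add an article with column-level conflict tracking
    -- (use @column_tracking = N'false' for row-level tracking).
    EXEC sp_addmergearticle
        @publication = N'FieldSalesPub',    -- hypothetical merge publication
        @article = N'Customers',
        @source_owner = N'dbo',
        @source_object = N'Customers',
        @type = N'table',
        @column_tracking = N'true';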

Normally, subscribers to merge publications only synchronize updates with the publisher. Subscribers can also synchronize with other servers, and they do this by designating alternate synchronization partners. It is useful to have an alternate synchronization partner when you want to ensure that updates can be made even if the primary publisher is offline or otherwise unavailable.
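
One way to permit this is to enable the publication's alternate synchronization partners option, as in the following sketch. The publication name is hypothetical, and you should verify the property name against the documentation for your version.

    -- At the publisher: allow subscribers to synchronize with alternate partners.
    EXEC sp_changemergepublication
        @publication = N'FieldSalesPub',
        @property = N'allow_synctoalternate',
        @value = N'true';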

Note

By default, merge replication processes changes on a row-by-row basis. You can group sets of related rows as a logical record. This ensures that related sets of records are always processed in their entirety at the same time on a subscriber. You can also use declarative ordering to define the order of article processing during merge synchronization.
