C H A P T E R  16

Parent-Child Design Pattern

Having one child makes you a parent; having two, you are a referee.

—British journalist David Frost

With SSIS, there are several ways to introduce modular programming into your ETL processes. One of the most flexible methods is by using the parent-child design. This pattern enables you to create packages to perform very specific tasks. If these tasks should be disabled, this design pattern allows you to utilize parameter bindings to disable functionality at the package level. As the quote suggests, the more complex your ETL process becomes, the more of an organizational role the parent package plays, rather than just the role of a package that calls another.

ETL processes can become increasingly transparent when a clear design is utilized. This chapter covers some of the benefits of utilizing the parent-child design pattern and shows you how to implement it in your own processes. You will also see the different ways to implement the design pattern, allowing you to pick the one that best suits your needs.

Understanding the Parent-Child Design Pattern

The parent-child design pattern in a nutshell refers to any ETL process that utilizes an SSIS package to execute another package. This can be done by using the Execute Package task or even the Execute Process task by calling the dtexec.exe utility. Our recommendation is to stick with the Execute Package task because it will allow you to configure the execution of the child package much easier than the Execute Process task. Figure 16-1 shows you what a static parent package would look like. We call this a static package because any changes in the ETL structure or the addition and removal of packages would require a modification to this package.

Image

Figure 16-1. Example of a static parent package

The static package is developed with each of the tasks having their own individual settings. This implementation of the parent-child design pattern is recommended only if the execution process will not change. Maintaining and updating this implementation can be become extremely challenging very quickly. The third executable, EPT_CH16_Apress_DisabledChildPackage, has different configurations than the other three executables. In order to disable the package during execution, we pass a Boolean variable from the parent package, CH16_Apress_StaticParentPackage.dtsx, to a parameter bound to the child’s Disable package property. We cover the parameter passing later in this chapter. The static implementation of the parent-child design pattern is very similar to the steps in a SQL agent job.

All the packages contain one Script task that displays a simple message to show that the package was executed. Listing 16-1 shows the C# that is contained with the nondisabled child packages. It shows a simple message that informs you of the name of the child package that was executed. As you can infer from the variable that is utilized, we listed System::PackageName as the only variable available to the ReadOnlyVariables property of the Script task.

Listing 16-1. Child Package Script Task

MessageBox.Show("The current package's name is: " +
    Dts.Variables["System::PackageName"].Value.ToString()+" ");

The script for the disabled child package is slightly different, in order to alert you that this package should not be executed. This example is included to demonstrate the ease of configuring your ETL process by using a dynamic parent-child design pattern. Listing 16-2 provides the script present in the disabled child package. There is short message contained within this message box, alerting you that you should not be seeing this message.

Listing 16-2. Disabled Child Package Script Task

MessageBox.Show("The current package's name is: " +
        Dts.Variables["System::PackageName"].Value.ToString()+" "+
        "This package is supposed to be disabled. If you are seeing this message,
        the parameters were not assigned the proper values.");

The modularity behind the parent-child design packages can be described in the following terms:

The parent package controls the entire ETL process. Its main duty is to execute all the necessary packages.

Wrapper packages control a smaller subset of packages whose focus should be handling one particular process. Wrapper packages are either called directly by the parent package or by other wrapper packages. For example, a wrapper package can be used to execute packages that are responsible for preprocessing the sources or performing post-load processing.

Child packages control the processing of the actual tasks. These packages are the workhorses of the parent-child design pattern. They are called by wrapper packages.

In the example, we combined the concept of the parent and wrapper package for simplicity. The idea behind the wrapper package is to segregate the different processes that make up your ETL requirements.

Using Parameters to Pass Values

Parameters are absolutely essential for the parent-child design pattern. In prior versions of SSIS, values were passed along by using configurations. With this version’s deployment model, we encourage you to use parameters for passing values between parent and child packages. Figure 16-2 demonstrates the parameter binding we use to disable the child package. We bind a parameter that we defined in the package, DisablePackage, to the Disable package property. We can even bind parameters to task properties within the child package.

Image

Figure 16-2. Parameter bindings of the disabled child package

After we parameterize the child package, any Execute Package task that is configured to execute this package will be able to identify the parameters that are defined within it. The data type of the parameter must match the property’s data type. Figure 16-3 demonstrates the mapping of the parameter when using the Execute Package task.

Image

Figure 16-3. Parameter bindings for the child package

The User::DisableChild variable is a Boolean variable. It must match the data type of the parameter of the child package. Multiple bindings can be added to the package execution. We provided True as the default value of the User::DisableChild variable. This mapping is present only in packages that are not supposed to be executed.

Image NOTE: Even if the package property Disable is set to True, during execution in Debug mode, Visual Studio will open the package and validate it. The package itself will not execute, but it will be opened.

Working with Shared Configuration Information

Connection managers will abound in a design pattern such as the parent-child. In order to keep the clutter to a minimum, you can utilize the project connection managers so that keeping the connection strings in sync becomes much easier. Creating a project connection manager will create the connection manager in every package within the project.

Image CAUTION: Deleting any project connection manager within a package will delete it from the project and all the packages within the project. If you find that you have deleted a project connection manager and cannot undo the action, you have an option to retrieve it. You first need to create the same connection manager type with the same name. After the project connection manager is created, you have to right-click it and select View Code. This opens an XML script that shows all the properties of the connection manager. The property you need to focus on is DTS:DTSID. When you open any package with a task or source component that referred to the deleted project connection manager, you will notice that the name is replaced by a GUID. You can copy that GUID and replace the DTS:DTSID of the new project connection manager with the GUID. After this, save the Code page of the project manager and close it. Reopen any of the open packages, and their contents should be able to recognize their connection manager as the newly re-created project connection manager.

Certain information can be stored within project parameters. Because parameter values can be changed only by T-SQL, parameter values are more or less read-only. Catalog views and stored procedures within the SSISDB can be used to view the values. The following objects in the SSISDB database allow you to access parameter values during execution:

catalog.execution_parameter_values is a view that displays the values that will be used during specific executions.

catalog.get_parameter_values is a stored procedure that displays and resolves parameter values of specified packages within projects stored in the Integration Services catalog.

catalog.object_parameters is a view that shows the design and server defaults for the parameters.

Overriding Properties

With the new deployment model, overriding properties by using parameters instead of configurations streamlines development. Instead of creating configurations by using variables to inherit values and then using the variable values to overwrite properties, you can now simply bind package parameters to those properties. The configurations were required by the child package in anticipation of being passed in by the parent. This often led to the issue of executing child packages on their own for testing purposes or even as a one-off in the ETL process.

Even the individual tasks and containers within a package can have their properties bound to parameters. This allows parent packages even further control of the execution of the child packages. The parent can pass information directly to the tasks, overriding the design-time property configuration. The process is the same as binding parameters to package properties. You have to right-click the object you want to parameterize and select the Parameterize option.

Logging

Logging the execution process with the parent-child design pattern can be a little tricky. The out-of-the-box package logging that is available will simply treat each message separately. The parent package messages and execution events will surround the messages and events from the child package, but there is no real way to tie their executions together other than by looking at their timing and knowing the ETL process. We discuss logging in great detail in Chapter 13.

Implementing Data-Driven ETL

One of the most flexible implementations of the parent-child design pattern is data-driven ETL. The easiest way to implement this is to store the data in a table so that it can be queried. Storing the data in a table allows you to quickly add or remove packages from your process. It also allows you to disable certain packages’ execution if you do not want to permanently remove them from the process. Listing 16-3 shows the structure of a table that can be used to drive your ETL processes.

Listing 16-3. Table CH16_Apress_PackageExecution

CREATE TABLE dbo.CH16_Apress_PackageExecution
(
        Package NVARCHAR(250) NOT NULL,
        PackagePath NVARCHAR(200) NOT NULL,
        ParentPackage NVARCHAR(250) NULL,
        ExecuteOrder INT NOT NULL,
        DisablePackage BIT NOT NULL,
);
GO

CREATE CLUSTERED INDEX CIX_CH16_Apress_PackageExecution ON dbo.CH16_Apress_PackageExecution
(ParentPackage);
GO

This table is designed to show a hierarchical view of the ETL process. You can see the parent-child relationships and even write queries that will give you detailed information about your processes. A primary key can be defined on this table if, for instance, you do not need to execute a package multiple times. The following provides a brief explanation of what each column represents:

Package provides the name of the child package of the current package. The wrapper package will be listed as ParentPackage, and all of the worker packages will be listed in this column. The root-level parent package will list the wrapper packages as its child packages.

PackagePath provides the folder path of the child packages.

ParentPackage provides the name of the package that is currently being executed.

ExecuteOrder provides the order in which the packages will be executed. As tempting as it may be to keep the order neat and sequential, we recommend keeping large gaps so that you can add packages to the execution list without the headache of updating several rows.

DisablePackage determines whether the package property, Disable, will be set to True or False.

We will re-create the example we showed earlier with this dynamic approach. In order to populate the table with the proper execution information, we run the script shown in Listing 16-4. The insert statements populate the table we just created with the same type of information that was hard-coded into CH16_Apress_StaticParentPackage.

Listing 16-4. Populate dbo.CH16_Apress_PackageExecution

INSERT INTO dbo.CH16_Apress_PackageExecution
        (
                Package
                ,PackagePath
                ,ParentPackage
                ,ExecuteOrder
                ,DisablePackage
        )
        VALUES
        (
                'CH16_Apress_DynamicParentPackage'
                ,'C:UsersSQL11DesktopIntegration Services Project1
                                Integration Services Project1'
                ,NULL
                ,0
                ,0
        ),
        (
                'CH16_Apress_ChildPackage'
                ,'C:UsersSQL11DesktopIntegration Services Project1Integration Services
Project1'
                ,'CH16_Apress_DynamicParentPackage'
                ,10
                ,0
        ),
        (
                'CH16_Apress_ChildPackage1'
                ,'C:UsersSQL11DesktopIntegration Services Project1Integration Services
Project1'
                ,'CH16_Apress_DynamicParentPackage'
                ,20
                ,0
        ),
        (

                'CH16_Apress_DisabledChildPackage'
                ,'C:UsersSQL11DesktopIntegration Services Project1Integration Services
Project1'
                ,'CH16_Apress_DynamicParentPackage'
                ,30
                ,1
        ),
        (
                'CH16_Apress_ChildPackage2'
                ,'C:UsersSQL11DesktopIntegration Services Project1Integration Services
Project1'
                ,'CH16_Apress_DynamicParentPackage'
                ,40
                ,0
        );
GO

Image NOTE: The file system path is not so important if the packages are stored on the server. In that case, the process will rely on the project reference to locate the package. If your deployment strategy is to store the packages on the file system, you will have modify this example to include a Script task that will modify the connection string of the Connection Manager used to locate the child package.

As you can see, this is the same order of execution as we had in CH16_Apress_StaticParentPackage. We kept a gap of 10 between all the values of ExecuteOrder. With the data in place, we can take a look at how the dynamic package works. Figure 16-4 shows the variables we will use to control the execution process of the packages.

Image

Figure 16-4. CH16_DynamicParentPackage variables

As you can see, we provided a default value for the Package variable. This is so that when we create the expression for the Execute Package task, it can validate. We basically have to provide the name of the first package that will execute. We also have a variable that contains the name of the current package as its name. This is to assist us with querying the table so that we get only the immediate child packages of the current package. Figure 16-5 shows the control flow of the dynamic version of the example.

Image

Figure 16-5. CH16_Apress_DynamicParentPackage

The Execute SQL task, SQL_GetPackageList, is used to retrieve the data from dbo.CH16_Apress_PackageExecution. It contains a parameterized query that is shown in Listing 16-5. The Foreach Loop container, FELC_PackageExecution, loops through the data set that is returned by the Execute SQL task. The Execute Package task simply executes as the loop iterates through the table.

Listing 16-5. SQL_GetPackageList query

SELECT pe.Package,
        pe.PackagePath+''+pe.Package+'.dtsx',
        pe.ExecuteOrder,
        pe.DisablePackage
FROM dbo.CH16_Apress_PackageExecution pe
WHERE pe.ParentPackage = ?
ORDER BY pe.ExecuteOrder;

The ORDER BY clause is absolutely crucial for this process. It will ensure that the packages are executed in the proper order. Without it, there is no guarantee that they will. Figure 16-6 shows the parameter mapping configuration of the Execute SQL task. Note that the data type and parameter name need to match the query supplied.

Image NOTE: An additional WHERE clause could be pe.DisablePackage <> 1. We did not add this in order to demonstrate that the parameter bindings are preserved with this implementation of the parent-child design pattern. With that clause, you would not have to worry about packages being opened and validated by Visual Studio. They would simply be skipped over.

Image

Figure 16-6. Parameter mapping SQL_GetPackageList

Figure 16-7 demonstrates the Result Set page of the SQL_GetPackageList task. The result set is stored within the Object variable that was shown earlier. With this result set, we can now supply the Foreach Loop container with the enumerator it requires.

Image

Figure 16-7. Result Set SQL_GetPackageList

Figure 16-8 shows the Collection page of FELC_PackageExecution. The Control::Packages variable is used as the ADO enumerator. It will contain the names of the packages and other vital information that can be used for the ETL processes.

Image

Figure 16-8. Collection for FELC_PackageExecution

Figure 16-9 demonstrates the configuration of the variables that will accept the values from each record in the table set. All five columns have been mapped to five different variables in the order that they appear in the SELECT statement. The container’s configuration will not execute unless all the columns are mapped to variables of the appropriate data type.

Image

Figure 16-9. Variable mappings FELC_PackageExecution

The Execute Package task, EPT_ChildPackage, is configured almost identically to packages in the static example. The only difference is that all the executions will have the configurations, regardless of whether the package is to be disabled during execution or not. Figure 16-10 shows the Package page of the task. The Password property is left to its default because none of the packages actually have a password.

Image

Figure 16-10. Package page EPT_Package Execution

The parameter bindings, shown in Figure 16-11, will be the same for all the packages. You need to ensure that the packages at least have a parameter with the same name in all of the child packages. The parameter binding to the package property also should be in place for all the child packages. This parameter mapping is the same as the one shown previously in Figure 16-2.

Image

Figure 16-11. Parameter bindings EPT_PackageExecution

Figure 16-12 demonstrates the expression that really makes all this possible. The expression bound to the PackageName property refreshes with every iteration of the Foreach Loop container. The variable Control::Package is constantly updated with the values in the Package column from the result set of the Execute SQL task. The expression, @[Control::Package]+".dtsx", performs a string concatenation that is acceptable for the property so that the appropriate package can be found within the project.

We provide a default value to the variable because the parent package needs to validate the existence of such a package. When you open the parent package within Visual Studio, it will attempt to look for a package with the name that corresponds to evaluation of the expression. Without a default value, the parent package will attempt to connect to a package named .dtsx. At runtime this will result in a failure during the validation phase.

Image

Figure 16-12. Expressions page EPT_PackageExecution

After this configuration is set up, all you need to do is to execute CH16_Apress_DynamicParentPackage. It will automatically execute the child packages just like the static example. The only difference between the two methods is that the management of the execution process and the addition of packages becomes easier. When you execute this package in Debug mode within Visual Studio, you will see message boxes similar to the one in Figure 16-13.

Image

Figure 16-13. Child package message box

Summary

With designing ETL processes, it is very important to bear in mind the maintainability of the code. The parent-child design pattern offers a flexible approach that will modularize you ETL packages, allowing you to add or remove packages or executables from the process with little difficulty or code change. The static implementation will modularize your code but is not ideal if packages are constantly being developed. The dynamic parent-child implementation allows you to let the data drive your ETL processes. We used the example of a table-driven process. This concludes the design patterns section of the book. The next section covers the more advanced capabilities and functionalities of SSIS.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset