Chapter 12. SQL Server Integration Services Overview

IN THIS CHAPTER

New SSIS Features

The SSIS Architecture

The SSIS Toolkit

Designing SSIS Packages

The World of Wizards

SSIS and Data Warehousing

In a typical enterprise environment, data is often stored in different formats and schemas in disparate sources. It is often necessary to extract and convert this data from different data types and formats into a unified format to be stored in a single destination or processed further for reporting. This task of moving, manipulating, integrating, cleaning, validating, and summarizing data in SQL Server 2005 can be achieved by using the technology called SQL Server Integration Services (SSIS). Formerly known as Data Transformation Services (DTS), SSIS has undergone a complete overhaul to improve performance, to provide better control over data flow, and to introduce conditional flow and looping. The basic architecture and programming object model have changed in SSIS to separate the different processes, such as management, the runtime, and data flow.

This chapter explores the SSIS platform by introducing you to the new SSIS architecture, the new control flow and data flow tasks, the new package authoring environment (the designer and wizards), and SSIS command-prompt utilities. Let’s begin with an overview of new features introduced in SSIS.

New SSIS Features

SSIS includes a complete set of data transformation and integration services, graphical tools, programmable objects, and APIs. You can use SSIS to perform tasks such as integrating data from varied data stores, cleansing data, refreshing data into business intelligence data warehouses and data marts, automating administrative functions, and so on. Some of this functionality was available with DTS in SQL Server 2000. The current release takes the DTS functionality to the next level by introducing the following new features:

Architectural enhancements—The new SSIS architecture is divided into four core components: clients, the SSIS service, the runtime engine, and the data flow engine. The SSIS client component includes the graphical and command-line tools and wizards that can be used for package design and creation. The SSIS service is responsible for managing the storage of packages and tracking packages that are running. The SSIS runtime engine manages and sets appropriate properties during runtime and provides support for logging and debugging. The SSIS data flow engine controls the flow of data from source to destination. The SSIS data flow components include source adapters, transformations, and destination adapters. The separation of the different components to be managed by different services allows for more manageability and efficiency. The new architecture allows you to extract data from multiple sources and write data to multiple destinations.

Performance enhancements—The new SSIS architecture uses in-memory buffers as much as possible to move data from the source to the destination. SSIS is designed to leverage increased amounts of memory. For example, in the case of a table lookup task, the lookups are stored in the memory cache to avoid expensive trips to the disk. This results in improved task execution performance.

New transformations and tasks—SSIS introduces several new tasks that aid in data cleansing and transformation. Examples of these new tasks include Derived Column, Data Conversion, File Extractor, and File Inserter. In addition, tasks such as Multicast, Conditional Split, Sort, Aggregation, Merge, and Merge Join have been introduced; they can be very useful for redirecting output based on defined conditions and for performing various functions on the data. The new tasks specific to Analysis Services and data mining include Slowly Changing Dimensions, Data Mining Query, and so on. The new looping containers, such as the For Loop and Foreach Loop, allow you to iterate over files in a folder, iterate over an item list, and so on. The new Script task allows you to use .NET languages such as C# or Visual Basic .NET to author scripts.

SSIS designer—The new graphical designer includes better tools and templates to enable easy creation of packages. The designer offers better project management, new tasks and templates for easier package creation, tools for better deployment, and storage and debugging tools to monitor package execution. The designer includes Package Explorer, which allows you to browse and access different features of the package; Solution Explorer, to access projects and their associated packages; and different debugging windows, such as output windows, breakpoints, watches, and so on. The designer separates control flow, data flow, and event handling onto multiple tabs/screens, allowing complete control and flexibility to author complex packages and to provide better control flow semantics.

Extensibility—If none of the built-in components satisfy your needs, you can create your own custom sources, transformations, enumerators, and other control flow and data flow elements by using the SSIS extensibility object model and any .NET programming language, such as Visual Basic .NET or C#.

Import/Export Wizard enhancement—The Import/Export Wizard has been enhanced to include better support for data in flat files, including the ability to manipulate data at the column level. The wizard allows you to create a new database if the destination database does not exist and allows you to map the source and destination columns. You can even omit the columns that are not required. In addition, the wizard allows you to preview the data in real time.

Let’s now take a closer look at the new SSIS architecture.

The SSIS Architecture

With DTS in SQL Server 2000, control flow and data flow were tightly coupled, making it difficult to create and manage complex packages. The new SSIS architecture breaks this tight coupling and separates control flow from data flow. You will see this change when you start designing packages in the SSIS designer inside Business Intelligence Development Studio. Control Flow and Data Flow are two tabs or screens in the SSIS package designer. The other change in the SSIS architecture is the division of package creation, storage, execution, and data flow into four core components: clients, the SSIS service, the runtime engine, and the data flow engine.

Before further exploring these four core components, here are some SSIS terms that you should become familiar with:

Task—A package contains one or more tasks, which perform various data manipulation and management functions.

Container—Tasks are grouped by entities known as containers. Containers support iterative control flow in packages and group tasks into meaningful units of work. The SSIS architecture is based on containers. The package container organizes containers, tasks, connections, transformations, variables, configurations, and precedence constraints into an executable unit of work. Containers, except for the TaskHost container, can hold other containers, and they provide scope for variables, transactions, and events.

Precedence constraint—Precedence constraints organize the tasks and containers in a package workflow and specify the conditions for execution of the tasks.

Data Flow task—Tasks that extract data from different sources, provide the facility to transform, cleanse, and modify data, and store data in the appropriate destination are known as Data Flow tasks.

Package—A collection of different tasks and containers, connections, precedence constraints controlling data flow, variables, and different configurations is combined together into a unit of work called a package. The different components of SSIS manage, store, and execute this unit of work.

Figure 12.1. An SSIS package is a collection of components such as tasks, containers, connections, variables, and configurations.

image

The following sections look at the core SSIS components in more detail.

SSIS Clients

The SSIS clients include the built-in tools, wizards, and command-prompt utilities, as well as the custom applications that you create for designing and creating packages. The SSIS designer graphical tool allows you to create SSIS packages without writing a single line of code. The designer contains a collection of built-in tasks to choose from, or you can program a custom task by using .NET managed or native code. In addition to the designer, SSIS contains several wizards and command-prompt utilities that assist in configuring the packages by creating configuration files, deploying packages, and creating package dependencies. These are discussed later in this chapter.

SSIS Service

SSIS has separated the management and design of packages, for better manageability, into a Windows service (MsDtsSrvr.exe). This service provides management support for SSIS package storage and execution. The service is off by default, with its startup type set to Disabled, but it is started automatically when the first SSIS package runs. You can manage this service by using the SQL Server Configuration Manager tool. In addition to monitoring local and remote running packages, this service also controls the storage of packages. Packages can be stored either in the sysdtspackages90 table in the msdb SQL Server system database or as XML files (.dtsx) on the file system.


Note

Business Intelligence Development Studio allows you to save SSIS packages to disk files only as .dtsx XML-formatted files. You can export the packages to the msdb database by using SQL Server Management Studio.


The SSIS Runtime Engine

The SSIS runtime engine executes packages and provides support for logging, debugging, configuration, connections, and transactions. It provides support for the following:

• Execution of each task specified by the workflow and consideration of different properties, such as concurrent execution of multiple tasks.

• Handling of the scope and namespace of the variables passed between tasks, containers, or packages.

• Logging of events raised during package execution. Table 12.1 lists the SSIS events raised during package execution.

Table 12.1. SSIS Events

images

• Debugging of packages by storing information about enabled breakpoints.

The runtime engine is also responsible for establishing the connection to any external data source used by the package.

The Data Flow Engine

The SSIS data flow engine manages the tasks that move data between the source and destination. SSIS uses in-memory buffers for better performance. The Data Flow task provides and manages these buffers, calls the appropriate data flow components, and provides transformations to modify and cleanse data. The Data Flow task is a set of connected data flow components consisting of data adapters and pipelines. These are the different data flow components (see Figure 12.2):

Source adapters to extract the data—Source adapters are usually the first components in a data flow; they do not have an input but can have one or more outputs.

Transformations—These are the pipeline processes that are used to modify and cleanse data. They have an input and an output to receive data from upstream components and send processed data downstream. Transformations can be synchronous or asynchronous. For a synchronous transformation, every input row has an associated output row; in an asynchronous transformation, there can be multiple outputs for a single input.

Destination adapters—These are used to connect and store data in the data sources. They have at least one input and are usually the last component in the graph.

Pipeline paths—The components are connected together by pipeline paths. Paths connect the output of one component to the input of the second. They are different from precedence constraints because they only connect the components and do not constrain the next data flow component.

Figure 12.2. The Data Flow task is a set of connected data flow components consisting of source adapters, transforms, destination adapters, and pipelines.

image

As shown in Figure 12.2, the data flow task can contain one or more source adapters to read the data, one or more transformations to process and cleanse the data, and one or more destinations to store the data.

The data flow engine manages interdependency between the different components of a data flow task and also supports parallel execution of the non-connected tasks. The data flow engine supports merging, splitting, and multicasting of data in a data flow and can also be used to redirect an error to a destination output.

The SSIS Toolkit

SSIS offers a user-friendly integrated environment called Business Intelligence Development Studio (BIDS), which can be used to create, debug, and deploy SSIS packages. This tool leverages the Visual Studio .NET GUI foundation to offer features such as tabbed windows, solution and project management, enhanced designer interfaces, debugging, SourceSafe integration, help integration, and so on. BIDS combines a graphical editor to design packages and several tools and templates to create and debug packages. Several built-in tasks are available in BIDS to allow you to create complex packages without writing a single line of code. The SQL Server Management Studio tool, on the other hand, offers an environment to manage the existing packages.

Let’s now look at some built-in containers and tasks that you can use while designing SSIS packages.

Containers

Containers provide structure and meaning to a package by grouping various tasks and containers together. Containers can include other containers in addition to tasks. Packages use containers for purposes such as to group tasks and containers that must succeed or fail as a unit, to repeat tasks for each element in a collection (such as files in a folder), and to repeat tasks until a specified expression evaluates to false.

SSIS provides four types of containers for building packages: the Foreach Loop container, the For Loop container, the Sequence container, and the Task Host container. Let’s look at each of these types in a little more detail.

The Foreach Loop Container

The Foreach Loop container allows a package to loop repetitively, based on an enumerator. The following different enumerator types are available:

Foreach File Enumerator—This type is used to traverse files and subfolders on the disk drive.

Foreach ADO Enumerator—This type is used to traverse rows in an ADO rowset.

Foreach ADO.NET Schema Rowset Enumerator—This type is used to enumerate schema information about a data source.

Foreach Item Enumerator—This type is used to traverse items in a collection such as documents.

Foreach From Variable Enumerator—This type is used to traverse the values of a variable.

Foreach Nodelist Enumerator—This type is used to enumerate the resultset of an XML Path Language (XPath) expression.

Foreach SMO Enumerator—This type is used to enumerate SQL Server Management Objects (SMO) objects such as tables and databases in a SQL Server instance.

The For Loop Container

The For Loop container evaluates a specified expression, and the repetitive flow is continued until the expression evaluates to false. The loop is defined by an initial expression that is optional, an evaluating expression that should result in false for the loop to stop, and an optional iterative expression to increment or decrement the loop counter.

As shown in Figure 12.3, the initial expression sets the variable @var to 0. The tasks in the For Loop container are executed in a loop as long as @var is less than 10. AssignExpression increments the @var variable each time the loop runs.

Figure 12.3. The For Loop container defines a repeating control flow in a package.

image
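For reference, the three For Loop properties shown in Figure 12.3 might be configured as follows (a sketch; @var is assumed to be an integer package variable):

InitExpression:   @var = 0
EvalExpression:   @var < 10
AssignExpression: @var = @var + 1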

The Sequence Container

While designing SSIS packages, often it is necessary to run multiple tasks in parallel or to group together several tasks in a collection such that the collective property of all the tasks in the collection affects the next task downstream. Such tasks and containers can be grouped in the Sequence container. The Sequence container creates a subset of the package control flow to allow you to manage properties on multiple tasks by grouping them in a single container and allows debugging the tasks as a single unit of work. In addition, you can define scope on the variables over the entire Sequence container. An example of the Sequence container is shown in Figure 12.4.

Figure 12.4. Sequence containers group a package into multiple separate control flows, each containing one or more tasks and containers that run within the overall package control flow.

image

The Task Host Container

The Task Host container essentially encapsulates a single task. Setting the properties of the encapsulated task automatically sets the properties of the Task Host container. This container is useful when you’re creating SSIS tasks programmatically.
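As a minimal sketch of that programmatic use, the following VB .NET fragment adds a built-in task to a package by its STOCK moniker and receives back a Task Host that wraps it (the moniker and names here are illustrative):

Imports Microsoft.SqlServer.Dts.Runtime

Module TaskHostExample
    Sub Main()
        Dim pkg As New Package()
        ' Adding an executable by its STOCK moniker returns a TaskHost
        ' that encapsulates the newly created task
        Dim taskHost As TaskHost = _
            CType(pkg.Executables.Add("STOCK:BulkInsertTask"), TaskHost)
        ' Properties set on the TaskHost flow through to the wrapped task
        taskHost.Name = "Bulk Insert Products"
    End Sub
End Module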

Tasks

Tasks define the units of work to be performed as part of a package for process communication and data flow. The tasks in SSIS can be divided into the following categories:

Control-flow tasks—These tasks are used for process-oriented tasks such as communicating with other processes, preparing data for the data flow, working with SQL or Analysis Services objects, or extending package functionality through programming.

Data flow tasks—These tasks are used for data control and manipulation (for example, extracting, transforming, and loading data from a source to a destination).

Database maintenance tasks—These tasks are used to perform administrative and maintenance processes for SQL Server to ensure performance, high availability, and disaster recovery.

Custom tasks—If the required functionality is not available in the three other kinds of tasks, you can extend the functionality by implementing custom tasks. The SSIS programming model exposes a set of APIs and base classes for implementing custom tasks. You can inherit from these classes and override the required methods to achieve the desired functionality.

Let’s now look at each of these task types in a little more detail.

Control-Flow Tasks

The different control-flow tasks can be further divided into SQL Server tasks, process-oriented workflow tasks, data-oriented tasks, Analysis Services tasks, and scripting tasks.

SQL Server Tasks

SQL Server tasks allow access, configuration, and administration of SQL Server objects. SQL Server tasks include the following:

The Execute SQL task—The Execute SQL task allows execution of Transact-SQL (T-SQL) statements and stored procedures. In addition, it allows the running of parameterized queries. Input variables for stored procedures and queries are mapped to the ? placeholders at runtime. Parameters are order dependent: in the case of a stored procedure with several parameters, the SSIS engine follows the order in which the input variables are assigned, mapping the first input variable to the first ?, and so on (see the example following this list). The return values can be a single row, multiple rows, or XML.

The Bulk Insert task—This task allows you to copy data from flat files to SQL Server tables. It exposes various properties of the bulk insert operation, such as the format file, batch size, and so on.

The other SQL Server tasks—including the Transfer SQL Server Objects task, the Transfer Database task, the Transfer Error Messages task, the Transfer Jobs task, the Transfer Logins task, and the Transfer Master Stored Procedures task—allow you to transfer different SQL Server objects to different database instances.
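As an illustration of the Execute SQL task parameter mapping described in the list above, the SQLStatement below calls a hypothetical stored procedure with two parameters; the first input variable in the parameter mapping is bound to the first ?, and the second to the second:

EXEC usp_LoadSales ?, ?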

Process-Oriented Workflow Tasks

The process-oriented workflow tasks include the following:

The Execute Package task—This task allows the execution of other SSIS packages as part of the workflow.

The Execute DTS 2000 Package task—This task allows the execution of DTS packages developed in SQL Server 2000 as is, without migration to SSIS. Not all the tasks available to the DTS packages in SQL Server 2000 are available in SSIS. In addition, the architecture and object model have changed tremendously. Hence, not all DTS tasks can be migrated to SQL Server 2005 SSIS. This is discussed in more detail in the section “Migrating DTS 2000 Packages to SSIS,” later in this chapter.

The Execute DTS 2000 Package task requires the DTS 2000 runtime to be present. The DTS runtime is part of SQL Server 2000. However, if SQL Server 2000 is not installed on the server, you can install the DTS 2000 runtime while installing SQL Server 2005: during the SQL Server 2005 setup, you click the Advanced Options button and select the DTS 2000 runtime for installation. The connections associated with packages to be executed by the Execute DTS 2000 Package task are maintained in the SQL Server 2000 format and are not migrated. Figure 12.5 shows the Execute DTS 2000 Package Task Editor window.

Figure 12.5. The Execute DTS 2000 Package task runs packages that were developed by using SQL Server 2000 tools.

image

The Execute Process task—This task allows execution of applications and Windows batch files as part of the SSIS workflow.

The Message Queue task—This task allows you to use message queuing to send and receive messages between SSIS packages or to send a message to an external application queue.

The Send Mail task—As the name suggests, this task can be used to send email messages over SMTP. You can use this task to send an email message if tasks in the package workflow succeed or fail or in response to an event that the package raises at runtime. The task uses an SMTP connection manager to connect to a mail server.

The WMI Data Reader task—This task runs Windows Management Instrumentation (WMI) Query Language (WQL) queries to retrieve information from Windows event logs, application or performance information about the different server resources (including hardware), and so on (see the example query following this list). The WMI Data Reader task uses a WMI connection manager to connect to the server from which it reads WMI information.

The WMI Event Watcher task—The WMI Event Watcher task watches for specific configured WMI events. The alert events raised could be an application installation on a production server or degrading performance of hardware resources.
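As an example of the WQL queries mentioned for the WMI Data Reader task above, the following sketch (assuming the default root\cimv2 WMI namespace) retrieves error entries from the Application event log:

SELECT * FROM Win32_NTLogEvent
WHERE LogFile = 'Application' AND EventType = 1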

Data-Oriented Tasks

Data-oriented tasks include the File System task, the File Transfer Protocol task, the Web Services task, and the XML task. These tasks are used for copying and modifying files, downloading files from different locations, executing web methods, and working with XML documents.

Analysis Services Tasks

Analysis Services tasks include the Analysis Services Processing task, which allows processing of Analysis Services objects such as cubes, dimensions, and mining models; the Analysis Services Execute DDL task, which allows creating, dropping, and altering of mining models and multidimensional objects such as cubes and dimensions; and the Data Mining Query task, which allows running of prediction queries based on data mining models built in Analysis Services.

Scripting Tasks

Scripting tasks allow you to program code to extend the functionality of a package. The two tasks that allow you to program code for better functionality are the Script task and the ActiveX Script task. The ActiveX Script task allows you to code in scripting languages such as VBScript or JScript. To code in .NET languages such as Visual Basic .NET or C#, SSIS has introduced a new task called the Script task, which allows you to perform complex tasks by writing VB .NET or C# code.
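The following minimal Script task sketch shows the structure of the VB .NET code; the FileName variable referenced here is a hypothetical package variable:

Imports System
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    Public Sub Main()
        ' Raise an informational event that includes the value of a
        ' package variable (FileName is assumed to exist in the package)
        Dts.Events.FireInformation(0, "Script Task", _
            "Processing file: " & CStr(Dts.Variables("FileName").Value), _
            String.Empty, 0, True)
        ' Report success back to the SSIS runtime
        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class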

Data Flow Tasks

Data flow tasks control the flow of data from the source to the destination and allow you to cleanse and transform data and redirect portions of the data to different destinations.

The data flow tasks are divided into source adapters, transformations, and destination adapters.

Source Adapters

Source adapters allow you to connect to different data sources, using different connection managers, and to extract and read data. The supported source connections include OLE DB, flat files, Excel files, raw files, XML files, and the DataReader.

Transformations

Different transformations can be applied to cleanse, modify, and transform data. These are the different transformations:

The Conditional Split transformation—This transformation can redirect data rows to different outputs based on certain conditions. Each input row is redirected to an output based on an expression (see the example expressions following this list).

The Multicast transformation—The Multicast transformation is similar to the Conditional Split transformation but directs every row to every output. This task is useful when multiple sets of transformations need to be applied to the same data and logical copies of the data need to be created to be processed further, using different transformations.

The Derived Column transformation—This transformation is useful for cleansing and modifying data because it replaces old column values or creates new column values by applying user-defined expressions to the column values. You can also use it to create new columns based on expressions applied to variables.

The Data Conversion transformation—This transformation converts the data type of a column to a different data type, so it is similar to the Cast/Convert function in T-SQL.

The Merge transformation—The Merge transformation provides an output by merging two sorted datasets based on the values in their key columns, resulting in sorted merged output.

The Merge Join transformation—Similarly to the Merge transformation, this transformation merges two datasets but joins them by using FULL, LEFT, or INNER joins.

The Aggregate transformation—You can use this transformation to perform aggregation operations such as an average, a sum, or a count.

The Character Map transformation—You can use this transformation to apply string functions against character data.

The Copy/Map transformation—You can use this transformation to create new columns by copying input columns and adding the new columns to the transformation output.

The File Extractor transformation—You can use this transformation to insert data from data flow into a file. This transformation is useful in moving data of specific formats to separate files.

The File Inserter transformation—This transformation reads data from a file and adds it to a data flow. This transformation is useful for adding more information to processed data.

The OLE DB Command transformation—This transformation executes a SQL command for each row in a data flow.

The Data Mining Model Training transformation—This transformation trains data mining models by passing the data it receives in the data flow to the data mining model algorithms.

The Data Mining Query transformation—You can use this transformation to run data mining prediction queries.

The Dimension Processing transformation—As the name suggests, you can use this transformation to process Analysis Services dimensions.

The Fuzzy Grouping transformation—This transformation performs data cleansing tasks by identifying rows of data that are likely to be duplicates and choosing a canonical row of data to use in standardizing the data.

The Fuzzy Lookup transformation—This transformation performs data cleansing tasks such as standardizing data, correcting data, and providing missing values by looking up values in a reference table, using a fuzzy match.

The Logged Lineage transformation—This transformation allows extraction of environment variable values and includes them as part of the data flow.

The Lookup transformation—You can use this transformation to perform lookups in a reference dataset, which can be an existing table or view, a new table, or the result of a SQL statement.

The Partition Processing transformation—You can use this transformation to process Analysis Services partitions.

The Row Count transformation—This transformation counts rows and stores the total in a variable.

The Row Sampling transformation—You can use this transformation to create a sample dataset by selecting a specified number of the transformation input rows.

The Script Component—This transformation uses script to extract, transform, or load data.

The Slowly Changing Dimension transformation—You can use this transformation to coordinate the updating and inserting of records in data warehouse dimension tables.

The Sort transformation—You can use this transformation to sort data in ascending or descending order.

The Union All transformation—As the name suggests, this transformation merges multiple datasets.

The UnPivot transformation—You can use this transformation to create a normalized version of an unnormalized table.
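As an example of the expressions mentioned for the Conditional Split transformation earlier in this list, each output is defined by a Boolean SSIS expression that is evaluated against every row; the column and output names below are hypothetical:

HighValueSales output:   SalesAmount > 1000
MissingCustomer output:  ISNULL(CustomerID)

Rows that satisfy neither condition are sent to the default output.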

Destination Adapters

Destination adapters allow you to load processed data into different destinations. The different destination adapters available in the Toolbox are the Flat File destination adapter, the OLE DB Destination adapter, the Raw File destination adapter, the Recordset destination adapter, the SQL Mobile destination adapter, and the SQL Server destination adapter.

Database Maintenance Tasks

Maintenance tasks, such as backing up the database, monitoring index fragmentation, checking database integrity, and executing custom scripts to monitor database performance and high availability, are some of the common tasks that database administrators perform on a regular basis. SSIS offers several useful database maintenance tasks for DBAs to include in a maintenance plan without writing custom scripts or coding. The different database maintenance tasks available in the Toolbox include those described in the following sections.

The Backup Database Task

The Backup Database task allows you to back up a single or multiple databases. You can perform a full backup, a differential backup, or a transaction log backup based on the recovery model. In addition, this task also allows you to create a file and filegroup backup for a database with multiple filegroups and data files. The backup device can be specified or created in this task, and it contains all the properties associated with backup creation using T-SQL or SQL Server Management Studio.

The Reorganize Index Task

You use the Reorganize Index task to defragment fragmented indexes in a single or multiple databases. The task also includes an option to defrag large object data such as text, ntext, varchar(max), nvarchar(max), varbinary(max), or xml data. To configure defragmentation of large objects, you check Compact Large Objects in the task dialog box or set the CompactLargeObjects option to True in the task properties box. This option is off by default.

The Check Database Integrity Task

The Check Database Integrity task encapsulates the DBCC CHECKDB T-SQL command to check the allocation and structural integrity of all the objects in a single or multiple databases. To check the integrity of indexes, you select Include Indexes in the dialog or set the IncludeIndexes custom property in the properties box.

The Rebuild Index Task

The Rebuild Index task allows you to rebuild indexes on SQL Server objects such as tables and views. The task contains different custom properties, such as allocation of free space, the FillFactor percentage, PAD_INDEX to allocate the free space specified by the fill factor to intermediate index pages, SORT_IN_TEMPDB to store intermediate sort results in tempdb, and IGNORE_DUP_KEY to allow multiple-row insert operations to continue when duplicate keys are encountered. In addition, it provides the option to keep the table available for queries during the rebuild by setting the ONLINE option to ON.

Other database maintenance tasks available in the Toolbox include the following:

The History Cleanup task—You use this task to delete historical data, such as backup and restore, SQL Server Agent job, and maintenance plan history, from tables in the msdb database.

The Execute SQL Server Agent task—You use this task to run SQL Server Agent jobs.

The Maintenance Cleanup task—You use this task to remove old files related to maintenance plans.

The Notify Operator task—You use this task to send notification messages via email, pager, or net send to SQL Server Agent operators.

The Shrink Database task—You use this task to shrink the size of the database and logs to a user-defined size.

The Update Statistics task—You use this task to update information about the distribution of key values for one or more statistics groups.

Business Intelligence Development Studio

Business Intelligence Development Studio (BIDS) is an integrated environment for designing, building, and deploying business intelligence applications. BIDS integrates the features previously available in Analysis Manager and DTS in SQL Server 2000. BIDS contains windows described in the following sections.

Solution Explorer

The Solution Explorer provides an organized, solution- and project-based view of packages, data sources, data source views, and other files. You can use it to view the various packages and access their properties. In addition, the Solution Explorer allows versioning of packages using Microsoft Visual SourceSafe.

The SSIS Designer Window

The SSIS designer window provides a nice graphical view of packages. It allows you to create, modify, debug, and save packages. The designer contains the following tabs:

Control Flow—This tab contains the process-oriented tasks of the SSIS package, arranged in their order of execution, along with the looping structures that allow repetitive execution of tasks.

Data Flow—This tab contains the data-oriented part of the SSIS package and allows you to create packages that control the flow of data from the source to the destination.

Event Handlers—This tab allows you to create custom event handlers to manage errors and other events raised during package execution. Event handlers can be created to send emails, retrieve system information, clean up after a particular task completes, and so on.

Package Explorer—This tab displays package objects, including variables, executables, precedence constraints, event handlers, connection managers, and log providers, in a hierarchical view. The Package Explorer tab reflects the SSIS object hierarchy, with the package container being the top of the hierarchy. As this container is expanded, you can view and access the variables, executables, precedence constraints, event handlers, connection managers, and log providers associated with the package. The Package Explorer allows you to view and delete package objects.

Execution Results—This tab displays the package execution results.

Connection Managers—You can use the Connection Managers tab to create, edit, and delete various connection manager types, including OLE DB, ADO.NET, flat files, SMTP, WMI, and so on.

The Toolbox Window

The Toolbox contains all the built-in tasks, arranged in proper order and categorized by functionality. The Toolbox categories and items change depending on whether the Control Flow or the Data Flow tab is active. For instance, if you are on the Control Flow tab, the Toolbox shows maintenance plan tasks and control flow items; if you are on the Data Flow tab, the Toolbox lists data flow source, transformation, and destination tasks.

The Properties Window

The Properties window contains the properties of each task selected. You can specify the custom properties associated with each task in this window.

The Debug Windows

The Debug windows give you the ability to set breakpoints and debug the SSIS package and watch the data as it flows through the package workflow. This provides tremendous value in catching and fixing errors and in troubleshooting SSIS packages. The following sections describe some of the debug windows available in BIDS.

The Breakpoint Window

A breakpoint suspends package execution at the point where the breakpoint is defined. Breakpoints can be set on the different tasks in the SSIS designer to suspend a package on the different events specified in Table 12.1. You can customize breakpoint behavior by setting the hit count type. The hit count type can be specified as follows:

Always—Execution is always suspended when the breakpoint is hit.

Equals—Execution is suspended when the number of times the breakpoint has been hit equals the hit count.

Greater or equal—Execution is suspended when the number of times the breakpoint has been hit is equal to or greater than the hit count.

Multiple—Execution is suspended when the number of times the breakpoint has been hit is a multiple of the hit count.

Enabled breakpoints can be viewed in the Breakpoints window, which also displays the various supported breakpoints and allows you to enable or disable them.

The Command Window

The Command window allows you to specify execution commands by bypassing the menu system.

The Immediate Window

The Immediate window allows you to debug, evaluate expressions, print variables, and so on.

The Call Stack Window

The Call Stack window lists the SSIS containers that are currently running.

The Locals Window

The Locals window provides the status on the current executables and variables. It lists all the variables within the scope of the current context.

The Output Window

The Output window displays various status messages during the course of package compilation and execution. It displays validation errors when a package is opened, build errors when a package is compiled, and progress/status messages on execution of a package.

The Watch Window

The Watch window is used for viewing, editing, and saving variable values and expressions. It displays the various variables and expressions with their associated data types and evaluated values.

Progress Reporting

SSIS uses two methods to report the progress of a package during the execution:

• The Progress tab

• Color-coding

The Progress tab displays the order of the task execution and the start and finish times of each task. In addition, it displays any errors encountered during the execution of the package.

Color-coding is also used to display the status of package execution. Depending on color, you can distinguish whether a package is running or whether it has successfully completed or failed. This color-coding is displayed only during package execution. Table 12.2 describes the color-coding.

Table 12.2. Execution Status Color-Coding

images

SQL Server Management Studio

SQL Server Management Studio allows you to manage existing packages. It allows importing and exporting of packages from one kind of storage format to another; a package can be stored in the file system in XML format, in SQL Server, or in the SSIS package store. Management Studio cannot be used for designing or debugging packages, but it can be used to execute a package, view running packages, and import and export packages. When the SSIS service is running, you can use Management Studio to connect to an SSIS instance in the Object Explorer window to see existing and running packages.

Command-Prompt Utilities

SQL Server 2005 includes several command-prompt utilities that you can use to manage and execute the SSIS packages. They include the following:

DTExecUI—This utility allows execution of a package by opening a user interface to set the execution properties.

DTExec—This utility allows execution of existing packages from the command line.

DTUtil—This utility allows management of existing packages from the command line.

The following sections describe these utilities in more detail.

DTExecUI

You can invoke this utility from the command prompt by typing DTExecUI (see Figure 12.6). It opens up a user interface where you can set the properties to execute the package.

Figure 12.6. The DTExecUI utility can be used to run an SSIS package.

image

The DTExecUI utility allows execution of SSIS packages stored in SQL Server, the file system, or the SSIS package store. You can specify the configuration file to be loaded to set the object properties of the package during runtime, or you can set values of individual objects. In addition, you can specify the commands to load, and in what order, during runtime. You can set the connection properties if the package needs to connect to an external data source. You can also set different execution options, such as setting validation warnings or enabling package checkpoints. Log files can be specified to log the execution of a package or set package verification. Alternatively, you can choose to report execution information to the console, and you can specify the level of information to display.

DTExec

DTExec is similar to DTExecUI except that it does not contain a user interface. All the settings are passed as command-line parameters. Based on the exit code, the user can determine whether the command was executed successfully. Table 12.3 lists the return values for DTExec.

Table 12.3. Return Values for DTExec

images
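For illustration, the following command lines are sketches that assume hypothetical package names and paths; the first runs a package stored on the file system with a configuration file, and the second runs a package stored in the msdb database on the local server:

dtexec /F "C:\SSIS\LoadSalesData.dtsx" /Conf "C:\SSIS\Sales.dtsConfig"
dtexec /SQL LoadSalesData /Ser localhost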

DTUtil

The DTUtil command-line utility allows you to manage existing packages by allowing you to perform operations such as copying, deleting, moving, and signing a package. You can also use DTUtil to verify whether a package exists. Table 12.4 lists the return values for DTUtil.

Table 12.4. Return Values for DTUtil

images
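For example, the following sketches (with hypothetical names and paths) copy a file system package into the msdb database and then verify that the copy exists:

dtutil /FILE "C:\SSIS\LoadSalesData.dtsx" /COPY SQL;LoadSalesData
dtutil /SQL LoadSalesData /EXISTS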

Designing SSIS Packages

As described earlier, SSIS bundles a suite of built-in workflow and data flow tasks that you can use to create SSIS packages. The SSIS extensibility feature enables you to create your own tasks and transformations. You can use the SSIS object model to execute a package programmatically or to change package properties dynamically at runtime. The following sections walk you through the process of designing and executing an SSIS package. But before that, let’s look at the SSIS object model, expressions, and logging features.

The SSIS Object Model

As with DTS in SQL Server 2000, SSIS includes an object model, which you can leverage to do the following:

• Modify SSIS package properties at runtime

• Develop, load, and execute SSIS packages programmatically, in lieu of using the designer

• Extend SSIS by coding customized tasks, log providers, enumerators, connection managers, and data flow tasks

You can code against this object model inside a package itself by using the Script task or externally in .NET managed code. The Script task is best suited for single-use, package-specific situations, such as modifying SSIS package properties at runtime. External .NET programs are usually implemented to create reusable customized code to be invoked in multiple SSIS packages, or to load, modify, or execute SSIS packages. The use of the object model to extend the functionality of SSIS (to create custom tasks and transformations) is beyond the scope of this book. An easy way to visualize the SSIS object model is to look at the Package Explorer in BIDS. The hierarchy should be traversed in code, much as it appears onscreen.
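As a brief sketch of the external approach, the following VB .NET fragment loads a package from an assumed file path and executes it:

Imports System
Imports Microsoft.SqlServer.Dts.Runtime

Module RunPackage
    Sub Main()
        Dim app As New Application()
        ' Load the package from the file system (no events listener)
        Dim pkg As Package = _
            app.LoadPackage("C:\SSIS\LoadSalesData.dtsx", Nothing)
        ' Execute returns Success, Failure, or Completion
        Dim result As DTSExecResult = pkg.Execute()
        Console.WriteLine("Package result: " & result.ToString())
    End Sub
End Module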

An SSIS Object Model Example

Let’s look at an example of a simple package that loads the contents of a flat file into a SQL Server table. It is not uncommon for a flat file name, or even path, to change from one execution to another, or even within a loop in the same invocation. To resolve such issues, you can access the SSIS object model and assign task properties, using tasks such as the Script task. As shown in Figure 12.7, the tasks in this sample package include a Script task to change the source text file path, a SQL task to truncate the destination table, and a Data Flow task to copy the text file contents to the SQL Server table.

Figure 12.7. This simple package uses the Script task to change the source text file path to a value specified in a package variable.

image

You can follow these steps to try out this sample:

  1. Unzip the ObjectModelExamplePackage.zip file provided with the source code download for this book.
  2. Copy the testfile2.txt text file to C:\.
  3. Execute CreateTable.sql by using SQLCMD or Management Studio to create a table named dbo.TestTable in the tempdb database.
  4. Double-click the Integration Services Project1.sln solution file to open it in BIDS.
  5. Open the LoadSQLTable 1.dtsx package, if it is not already open in the designer.
  6. Double-click the JUNECTP.tempdb connection on the Connection Managers tab and update the server name to connect to your SQL Server 2005 instance.
  7. Double-click the Flat File connection and note the filename. It should be C:\TestSSIS.txt, a file that does not exist.
  8. Right-click the package designer and select Variables. Ensure that the FilePath variable is set to C:\TestFile2.txt, the file that you copied to C:\ in step 2.
  9. Execute the package by clicking the green arrow on the toolbar or by selecting Debug | Start Debugging. Notice that the Script task updates the Flat File connection to point to C:\TestFile2.txt instead of C:\TestSSIS.txt, the Execute SQL task truncates dbo.TestTable, and the Data Flow task copies the data from the flat file to the OLE DB destination (that is, into the dbo.TestTable table).
  10. After the package executes successfully, view the data in the dbo.TestTable table by using Management Studio or SQLCMD. You should see the data from the flat file C:\TestFile2.txt inserted into the dbo.TestTable table.

The Script task code should appear as follows:

Imports System
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()

        ' Modify the flat file connection string based on the value
        ' of the "FilePath" package variable.
        Dim DTSConnectionMgr As ConnectionManager

        DTSConnectionMgr = Dts.Connections("Flat File")
        DTSConnectionMgr.ConnectionString = CStr(Dts.Variables("FilePath").Value)

        ' Return the execution status of the task to the runtime.
        Dts.TaskResult = Dts.Results.Success

    End Sub

End Class

This script first declares a ConnectionManager variable and initializes it by referencing the name of the flat file connection manager in the Dts.Connections property. Next, it assigns the ConnectionString property of the ConnectionManager variable to the value of the FilePath package variable. The final statement returns the status of the task execution, much as was done in the ActiveX scripts used in DTS. During the package execution, the source filename is read from the package variable instead of what is currently designated in the connection property of the flat file connection manager. Other properties of an SSIS task can be modified dynamically at runtime in the same way.

SSIS Expressions

While creating packages, often it is necessary to combine variable values or constants to set a property value or to use functions to cleanse or extract a meaningful value from data before loading it to the destination. You can use expressions for such purposes.

An expression is a combination of operators, functions, and literals that yields a single data value. Here’s an example of an expression on a variable with the string data type:

SUBSTRING (variable_name, 1, 4) = "SSIS"

This expression compares the first four characters of the string to SSIS and returns a Boolean value. Based on the return value, you can perform other operations.

The tasks and containers that use expressions include the Derived Column transformation, the Conditional Split transformation, and the For Loop container. The elements for which the SSIS expressions are useful include precedence constraints, variables, and property expressions. Property expressions are used to set the properties of tasks and containers at runtime by using expressions on variables and return values.
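For example, a property expression on a connection manager's ConnectionString property might build a file path at runtime; this sketch assumes a string package variable named DataFolder:

@[User::DataFolder] + "\\Sales_" + (DT_WSTR, 4) YEAR(GETDATE()) + ".txt"

Because the backslash is an escape character in SSIS string literals, it is doubled in the expression.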

Logging

SSIS allows you to configure logging for quite an elaborate list of events, and it supports a diverse set of log providers for capturing commonly required information regarding package execution. You can configure logging from the SSIS designer and from utilities such as the dtexecui.exe and dtexec.exe command-prompt utilities. The different log providers available include text files, the Profiler, the Windows Event log, SQL Server, and XML. In addition, you can write custom log providers. You can right-click the designer window in BIDS and select Logging to configure SSIS logs.

Log options can be set at the package level, at the container level, or at the level of any individual task. You can tune the LoggingMode property at the package, container, and task levels to customize the logging behavior. A package can write to multiple logs, and you can choose different events and logged information for different tasks and containers within a package. Hence, you can set logging at a granular level. The SSIS log schema defines the set of information that you can log. You can select any event specified in Table 12.1 to log information in the log file.
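When packages are built or modified through the object model, logging can be configured programmatically as well. The following VB .NET sketch assumes a Package object named pkg and a file connection manager named MyLogFile already exist:

' Attach the text file log provider and point it at a
' file connection manager defined in the package
Dim provider As LogProvider = _
    pkg.LogProviders.Add("DTS.LogProviderTextFile.1")
provider.ConfigString = "MyLogFile"
pkg.LoggingOptions.SelectedLogProviders.Add(provider)

' Log only the OnError and OnWarning events
pkg.LoggingOptions.EventFilterKind = DTSEventFilterKind.Inclusion
pkg.LoggingOptions.EventFilter = New String() {"OnError", "OnWarning"}

' Turn logging on for the package
pkg.LoggingMode = DTSLoggingMode.Enabled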

SSIS Package to Import Flat Files into Tables

Now let’s look at designing and executing an SSIS package to import flat files into SQL Server tables. Here is the scenario: You have implemented a data warehouse database and need to load Sales and Product data into various dimensions tables. The Sales and Product data is available in flat files in respective folders. The Sales data contains a SalesID column, with the first three numbers representing the Sales category. The sales date field needs to be converted into the appropriate data type before it is loaded into the Sales table. Here is what needs to be done to load the Sales and Product data from the flat files into the database tables:

To load Sales data—You need to implement a Data Flow task that will read the Sales data from the flat files, use the Derived Column transformation to transform and extract the required values using SSIS expressions, and load the data into the SQL Server table destination by using the OLE DB connection manager. Then you use the Foreach File enumerator to loop through the files. The file path can be stored in a variable, and the variable value can be used to modify the connection string of the connection manager for the source adapter.

To load Product data—Because no transformation is required on the data, you use the Bulk Insert task to load the data from the flat files into the Products table. Here also, you use the Foreach File enumerator to loop through the files.

Because the two preceding tasks are independent, they can be executed in parallel. You need to implement these processes as separate packages and execute both packages in parallel from a main package.

Loading the Sales Data

You can unzip the BookSample.zip file provided with the source code download for this book and follow these steps to load the sales data:

  1. Run the Preparation.sql SQL script file to create a database named BookDW and two tables, named Sales and Products, in this database.
  2. Launch BIDS and create a new Integration Services project. Rename the default Package.dtsx file LoadSalesData.dtsx.
  3. Drag and drop a Foreach Loop Container task from the Toolbox onto the Control Flow window. Double-click the Foreach Loop Container task to launch the Foreach Loop Editor dialog.
  4. In the Foreach Loop Editor dialog, select the Collections page and ensure that the enumerator type is set to For Each File Enumerator. In the Enumerator Configuration panel, select the folder that contains the Sales data flat files and select all the files (*.*). Keep the Fully Qualified radio button selected to retrieve the fully qualified filenames. Click OK to close the Foreach Loop Editor dialog. Figure 12.8 shows the Collection page on the Foreach Loop Editor dialog.

    Figure 12.8. The Foreach Loop Container is used to iterate over all the files in the Sales folder.

    image

  5. Right-click anywhere on the Control Flow window and then select Variables to open the Variables window.
  6. In the Variables window, add the variable FileName of data type String within the Foreach File Container scope. To add a variable in the Foreach Loop Container scope, select the container object in the designer and then click the Add Variable toolbar button on the Variables window.
  7. Assign the collection value of the Foreach File to the new variable. To do this, double-click the Foreach Loop Container on the designer, select the Variable Mapping page, and select the User::FileName variable with the index 0. Click OK to close the dialog.
  8. Create a connection manager for the Sales data flat files. Right-click the Connection Manager window and select New Flat File Connection. Name the connection SalesFileConnection and point it to any text file under the Sales folder. Click the Columns page and then click the OK button. To assign the file path dynamically at runtime, you need to create a Script task to access the Flat File Connection Manager and set its properties programmatically.
  9. Drag a Script task from the Toolbox onto the Foreach Loop Container. Double-click the Script task, select the Script page on the Script Task Editor dialog, and set ReadOnlyVariables to FileName.
  10. Click the Design Script button to open the Microsoft Visual Studio for Applications script editor and write the following Visual Basic .NET script:

    Imports System
    Imports System.Data
    Imports System.Math
    Imports Microsoft.SqlServer.Dts.Runtime

    Public Class ScriptMain
        Public Sub Main()

          Dim DTSConnectionMgr As ConnectionManager

          DTSConnectionMgr = Dts.Connections("SalesFileConnection")

          DTSConnectionMgr.ConnectionString = _
                   CStr(Dts.Variables("FileName").Value)
          Dts.TaskResult = Dts.Results.Success

        End Sub
    End Class

  11. Save the script, close the script editor, and return to the package designer.
  12. Create a Data Flow task to cleanse and load the data. Drag and drop the Data Flow task to the Foreach Loop Container. Connect the precedence constraint (that is, the green arrow) from the Script task to execute the Data Flow task.
  13. Double-click the Data Flow task, and it opens in the Data Flow window.
  14. Drag and drop a Flat File Source from the Toolbox to the Data Flow designer. Double-click Flat File Source, select SalesFileConnection as the Flat file connection manager, and then click OK.
  15. Drag and drop a Derived Column transformation task from the Toolbox to the Data Flow designer. Use the green arrow to connect the Flat File Source to this Derived Column task.
  16. Double-click the Derived Column task. Add two new columns named SalesCategory and SalesDate, as shown in Figure 12.9. The SalesCategory column uses the SUBSTRING([Column 0], 1, 3) expression to extract the first three characters from the first column in the input text files. The SalesDate column casts the third column to the date data type.

    Figure 12.9. The Derived Column transformation task is used to apply expressions to transformation input columns.

    image

  17. To redirect the error rows to an error log file, click the Configure Error Output button on the Derived Column Transformation Editor dialog. For both the derived columns, change the Error column value from Fail component to Redirect row. Click OK to close the Configure Error Output dialog and then click OK to close the Derived Column Transformation Editor dialog. Figure 12.10 shows the Configure Error Output dialog.

    Figure 12.10. The Configure Error Output dialog is used to redirect error rows instead of failing the component.

    image

  18. Drag and drop a Flat File Destination component from the Toolbox to the Data Flow designer and link the error output (by using the red arrow) from the Derived Column to the flat file destination. Double-click the flat file destination and create a new flat file connection. Provide the path and name of the text file that will receive the error rows. After creating a new file connection, select the Mapping page on the Flat File Destination Editor and click OK.
  19. To load the successful rows into the destination Sales table, drag and drop an OLE DB Destination adapter to the Data Flow designer. Link the Derived Column success path (by using the green arrow) to the OLE DB Destination. Next, double-click the OLE DB Destination and connect it to the BookDW database created in step 1. To do this, click the New button next to OLE DB Connection Manager and create a new connection manager that uses Native OLE DB/SQL Native Client to connect to SQL Server 2005 instance containing the BookDW database. Set Data Access Mode as Table or View and then select the Sales table from the Name of the Table or the View list. Click the Mappings page. Map Column 0 to Sales ID, Column 1 to TranID, Column 2 to Customer ID, and SalesCategory and SalesDate derived columns to table columns with the same names. Click OK.
  20. Save the package. Figure 12.11 shows the package design in progress.

    Figure 12.11. The package contains a flat file source, a derived column transformation, an OLE DB destination for successful rows, and a flat file destination for error rows.

    image

  21. In case you need to deploy this package to a remote machine, you need to define the file path for the Sales data directory and the destination SQL Server database connection when the package is loaded; the solution is to configure these properties in a configuration file. To configure the properties, right-click anywhere on the Control Flow window and select Package Configurations. Check the Enable Package Configurations box to enable configurations. Click the Add button, and the Package Configuration Wizard starts. Click Next on the Welcome screen.
  22. On the Select Configuration Type screen, select XML Configuration File type and specify the configuration file name. Click Next.
  23. On the Select Properties to Export screen, under the enumerator properties for the Foreach Loop container, select the Directory property. Under the connection manager properties, select ServerName and Initial Catalog. Provide a name for the configuration, click Finish, and then close the Package Configurations Organizer dialog. Figure 12.12 shows the SSIS Package Configuration Wizard being used to export properties to a configuration file. Save the package.

    Figure 12.12. The Package Configuration Wizard can be used to export properties to a configuration file.

    image

Let’s now create another SSIS package to load product data from flat files into the Products table:

  1. Open Solution Explorer, right-click SSIS Packages, and click New SSIS Package.
  2. As with the Load Sales Data package, use a Foreach Loop container to loop through the product directory and select the flat files to load. The initial steps for this package are similar to those of the earlier package. Therefore, repeat steps 3 through 11 from the earlier exercise, but specify the product directory and select a product flat file for the flat file connection.
  3. Because no transformation needs to be applied to the data, use the Bulk Insert task to load the data to the destination SQL Server machine. Drag and drop the Bulk Insert task into the Foreach Loop container. Double-click the Bulk Insert task, select the Connection page, and specify the source and destination connection details as shown in Figure 12.13. Remember to change the ColumnDelimiter property from Tab to comma (,). The equivalent T-SQL statement is sketched after the figure.

    Figure 12.13. The Bulk Insert task provides the quickest way to copy large amounts of data into a SQL Server table or view.

    image
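
    Under the covers, the Bulk Insert task wraps the T-SQL BULK INSERT statement. A minimal sketch of the equivalent statement, assuming a hypothetical file path and the comma delimiter set in step 3:

        BULK INSERT dbo.Products
        FROM 'C:\Data\Products\Products1.txt'  -- hypothetical path; the task supplies the current file
        WITH
        (
            FIELDTERMINATOR = ',',  -- matches the ColumnDelimiter setting
            ROWTERMINATOR   = '\n'
        );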

Figure 12.14 shows how this package looks in the designer.

Figure 12.14. A Foreach Loop Container, a Script task, and a Bulk Insert task are used inside a package to bulk load data from files into SQL Server.

image

To execute these packages in parallel, you need to combine both packages into a main package, like this:

  1. Open Solution Explorer, right-click SSIS Packages, and click New SSIS Package.
  2. Drag and drop an Execute Package task from the Toolbox to the Control Flow designer. Double-click the Execute Package task and select the Package page. Specify Location as File System and create a new connection that points to LoadSalesData.dtsx.
  3. Repeat step 2 to add one more Execute Package task to the Control Flow window and point it to the LoadProductData.dtsx package.
  4. To notify the administrator when the package starts, use the Send Mail task. Drag and drop the Send Mail task to the Control Flow window. Double-click the Send Mail task and set the properties as shown in Figure 12.15.

    Figure 12.15. The Send Mail task is used to send an email message when the package starts.

    image

  5. Set precedence constraints from the Send Mail task so that, on success, the Load Sales Data and Load Product Data packages execute in parallel. Figure 12.16 shows what the main package looks like.

    Figure 12.16. The main package uses the Execute Package task to run two other packages in parallel to load sales and product data, after sending an email using the Send Mail task.

    image

  6. Save the package. Right-click the package in Solution Explorer and select Set as StartUp Object.
  7. To track package execution, it is a good idea to log events and monitor the progress. To log the execution results, right-click the Control Flow window and select Logging. The Configure SSIS Logs window opens.
  8. In the Configure SSIS Logs window, configure the events for each of the tasks as required (see Figure 12.17). You can also select the column details to be included when capturing the log events. To configure the columns, click the Advanced button in the Configure SSIS Logs window. If you log to SQL Server, you can query the captured events, as sketched after the figure.

    Figure 12.17. The Configure SSIS Logs dialog allows you to create and configure a new log to capture log-enabled events that occur at runtime.

    image
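
    If you choose the SSIS log provider for SQL Server as the provider type, events are written to the sysdtslog90 table in the database that the logging connection points to. A quick T-SQL sketch for reviewing the captured events, assuming the default table name:

        SELECT source, event, starttime, endtime, message
        FROM dbo.sysdtslog90
        ORDER BY starttime;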

  9. Select Debug | Start to start the package execution in debug mode. While the package is executing, the color of the tasks indicates whether the package execution succeeds or fails. As shown in Figure 12.18, the two packages are executed in parallel.

    Figure 12.18. The Load Sales Data and Load Product Data packages are executed in parallel after an email message is sent.

    image

  10. In the Data Flow task, you can create data viewers on the data flow path to monitor the output rows. Open the LoadSalesData.dtsx package and select the Data Flow tab. Right-click the data path (that is, the green arrow) from the Derived Column task to the OLE DB Destination, select Data Viewers, and add a new grid viewer. When the package is executing, SSIS shows a grid of rows being copied to the destination, as shown in Figure 12.19.

    Figure 12.19. SSIS provides a powerful debugging capability called data viewers that can be used to display data between two data flow components during package execution.

    image

  11. To check whether the error rows are written to the error log output file, change the dates in one of the sales data files so that they cannot be converted to the date data type, and execute the package again. Before executing the package again, truncate the rows from the Sales and Products tables (a T-SQL sketch follows this list).

    If the package is executed successfully, there will be green color-coding on all the tasks, indicating successful execution.
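
    A minimal T-SQL sketch of the cleanup mentioned in step 11, assuming the destination tables are dbo.Sales and dbo.Products in the BookDW database:

        USE BookDW;
        TRUNCATE TABLE dbo.Sales;    -- remove previously loaded sales rows
        TRUNCATE TABLE dbo.Products; -- remove previously loaded product rows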

The World of Wizards

As in previous releases of SQL Server, if you do not want to use the package designer, you can simply step through the Import/Export Wizard to create an SSIS package. The following sections discuss the Import/Export Wizard, the SSIS Configuration Wizard, the Package Installer Wizard, and the SSIS Migration Wizard.

The SSIS Import/Export Wizard

The SSIS Import/Export Wizard guides you through the steps to copy data between different data sources. It walks you through connecting to the source and destination and lets you set various options, such as copying constraints or copying the results of a SQL query. You can access this wizard from BIDS and from the command prompt. To launch the wizard from BIDS, you open the Solution Explorer window, right-click the SSIS Packages folder, and select the Import and Export Wizard menu item. To launch the Import/Export Wizard from the command prompt, you run DTSWizard.exe.

This wizard allows you to select the source and destination data sources. The data sources supported are Microsoft SQL Server, flat files, Microsoft Office Access, Microsoft Office Excel, and other OLE DB providers. You are prompted to copy all the data or a portion of the data, based on a user-defined query, from one or more tables or views. The wizard gives you the option to either execute the package immediately or save the package. The package can be saved either to the msdb database in SQL Server storage or in XML format as a .dtsx file.
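
When a package is saved to the msdb database, it is stored in the sysdtspackages90 table. A quick T-SQL sketch for listing saved SSIS packages, assuming you have sufficient permissions on msdb:

    SELECT name, description, createdate
    FROM msdb.dbo.sysdtspackages90
    ORDER BY createdate DESC;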

The SSIS Configuration Wizard

Configurations are loaded at the start of process runtime to dynamically set the values of variables and object properties. This feature is useful if the same package has to be deployed on different servers and you want the properties to be assigned automatically when the package starts running. The SSIS Configuration Wizard guides you through creating the configurations to be deployed. To launch the SSIS Configuration Wizard from BIDS, you open the package in the designer and then select SSIS | Package Configurations to launch the Package Configurations Organizer. Then you check the Enable Package Configurations check box and click Add to start the Configuration Wizard.

On the first page of the wizard, you are prompted to select the configuration type. The different types of configurations available are the XML configuration file, environment variables, registry entries, parent package variables, INI files, and SQL Server.

On the Select Properties to Export page, you are prompted to select the package objects whose properties you want to set. Based on the selected properties, the configuration file is created. You can edit the file later to change the property values, if required.

The SSIS Migration Wizard

The SSIS Migration Wizard allows migration of existing DTS packages in SQL Server 2000 to SSIS in SQL Server 2005. In SQL Server 2000, packages could be stored in structured storage files, SQL Server databases, and the metadata services. Though the format of package storage has changed in SSIS (files are now stored in XML format on the disk drive), the wizard allows you to migrate these packages to disk in the .dtsx XML format or to SQL Server databases. Not all DTS 2000 packages can be migrated to SSIS because some of the tasks are not available in SSIS or have changed significantly. The DTS object model has also changed, so packages created using the DTS object model cannot be migrated. Before looking at Migration Wizard details, let’s discuss what can and cannot be migrated to SSIS.

Migrating DTS 2000 Packages to SSIS

SSIS provides the option to migrate your existing DTS 2000 packages to SSIS either manually (by re-creating the packages from scratch) or by using the SSIS Migration Wizard. Due to the changes in the SSIS architecture, not all tasks can be migrated: some tasks have been removed from SSIS, and some have changed, making migration possible in only some cases. You can also choose to leave a package as it is in DTS 2000 and use the Execute DTS 2000 Package task to execute it from SSIS.


Note

The DTS 2000 runtime engine must be installed on the computer to run DTS packages by using the Execute DTS 2000 Package task.


The following tasks from DTS 2000 cannot be mapped to SQL Server 2005 tasks:

• Custom tasks—because the DTS object model and architecture have changed in SSIS

• Data Pump tasks such as the Data Driven Query task, the Transform Data task, and the Parallel Data Pump task

• SQL Server tasks such as Copy SQL Server Objects, Transfer Databases, Transfer Jobs, Transfer Master Stored Procedures, and Transfer Logins

• Analysis Services tasks such as Analysis Services Processing and Data Mining Prediction

• Other tasks, such as Dynamic Properties and Transfer Error Messages

• Copy Database Wizard tasks

The tasks that can migrate without any issues include the following:

• The Execute SQL task

• The Bulk Insert task

• The File Transfer Protocol task

• The Execute Process task

• The Send Mail task

• The Message Queue task

Although there is an equivalent ActiveX Script task in SSIS, an ActiveX script that contains code based on the old DTS 2000 object model cannot be migrated because the SSIS object model has changed. In addition, if the script contains or refers to global variables, the task cannot be migrated successfully. Tasks that cannot be mapped to SSIS are encapsulated by the SSIS Migration Wizard into intermediate SQL Server 2000 packages, which are then executed by the Execute DTS 2000 Package task.

Although package passwords are not migrated with the packages to the SSIS format, the intermediate SQL Server 2000 package retains its password. The Execute DTS 2000 Package task provides a property for specifying that password.

In DTS 2000, there was no direct method to implement looping; the user could hack into the DTS object model to work around this. If you have implemented DTS 2000 packages with looping, it is better to redesign those packages to use the new looping containers in SSIS.

Elements such as precedence constraints are migrated easily to their equivalent SSIS elements. After migration, a precedence constraint can be extended to evaluate an expression in addition to (or instead of) the task’s execution result, or to combine multiple precedence constraints into a single condition, as in the example below.
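
For example, a migrated constraint could be edited to use the Expression and Constraint evaluation operation with a condition in the SSIS expression language such as the following. The User::RowCount variable here is a hypothetical example, not something the Migration Wizard creates:

    @[User::RowCount] > 0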

Connections associated with tasks that can be migrated are migrated to the equivalent OLE DB or ODBC connection managers, and properties that are not set in the connection manager are set to their defaults. For tasks that cannot be migrated, such as the Transform Data task, the connections remain part of the intermediate SQL Server 2000 package.

DTS 2000 does not define scopes for variables; all variables are global, with the package as their implicit scope. All the global variables can be migrated to SSIS and are added to the variables collection of the package in the User namespace. You can extend the functionality of these variables by creating additional namespaces, using expressions to set variable values, or raising events based on a variable’s value.

Migration of transactions to SSIS is supported, but because transactions are defined differently in DTS 2000 and SSIS, the successful migration of transactions depends on certain conditions. In DTS 2000, a transaction was tied to steps, which are the combination of a task and its associated precedence constraint; there was no concept of containers. In SQL Server 2005, transactions in the package scope follow the container hierarchy. These are the rules that govern transaction migration:

• If the step in the DTS 2000 package is configured to join a transaction, the transaction attribute on the migrated task is configured to support a transaction.

• If the DTS 2000 package is configured to use transactions, the transaction attribute on the migrated package is configured to require a transaction.

• If the DTS 2000 package does not use transactions, then the transaction attribute on the migrated task is configured to support a transaction.

Considering the architectural and performance enhancements in SSIS, the introduction of new tasks and containers, and the fact that not all DTS 2000 tasks can be successfully migrated to SSIS, it is recommended that you redesign old SQL Server 2000 packages for the SSIS model instead of migrating them.

If you decide to migrate old packages to SSIS instead of re-designing them, the easiest way to do so is to use the SSIS Migration Wizard, as discussed in the next section.

Using the SSIS Migration Wizard

To start the SSIS Migration Wizard, you right-click the SSIS Packages folder in the Solution Explorer in BIDS and select Migrate DTS 2000 Package. Alternatively, you can run dtsmigrationwizard.exe.

The wizard guides you through the following steps in order to migrate the package:

  1. Select package source—The wizard allows migration of packages stored in structured storage files, SQL Server databases, and the SQL Server metadata services. Packages saved as Visual Basic files cannot be migrated. For packages saved in structured storage files, you can migrate some or all of the packages and versions saved in the file.
  2. Select package destination—The packages can be migrated either to XML format as .dtsx files or to SQL Server 2005. You need to provide proper authentication and should have appropriate privileges in order for the wizard to save the package to the chosen storage destination.
  3. Select packages to migrate—The wizard prompts you to select the packages to migrate to SSIS. You can either choose to change the package names in the wizard or to do that after the migration, in BIDS or SQL Server Management Studio. During the package migration, you are prompted to supply the password if the packages are password protected.
  4. Specify Log file—The wizard prompts you to specify a log file location. After running the SSIS Migration Wizard, you should always review this log file so that you can understand any warnings or errors that are encountered.

Based on the parameters you provide in the wizard, the SSIS Migration Wizard tries to migrate the packages to SSIS. The wizard displays the migration progress for each of the packages selected for migration. If the wizard is unable to migrate a package, it prompts you to either continue migrating the other packages or end the migration process.

As discussed earlier, not all the tasks in DTS 2000 packages can be migrated. For tasks that cannot be migrated to SSIS, such as Data Pump tasks, custom tasks, Transfer Database Object tasks, and so on, a SQL Server 2000 package is created to maintain the SQL Server 2000 format, and that package is executed by the Execute DTS 2000 Package task. A package that contains such tasks therefore cannot be migrated to SSIS as a single package: a SQL Server 2000 package is created for each such task, and an Execute DTS 2000 Package task encapsulated within the main package executes each one. For example, if there are two Data Pump tasks in a package, each task is converted into an equivalent SQL Server 2000 package, resulting in three packages in total; inside the main package, two Execute DTS 2000 Package tasks call the two intermediate packages.

The SSIS Migration Wizard issues a warning if the package chosen for migration contains the Dynamic Properties task. If you choose to migrate the package, the task is replaced by a Script task that lists the property–value pairs defined in the Dynamic Properties task, and this script is commented out.

SSIS and Data Warehousing

Extraction, transformation, and loading (ETL) is a process that is common to data warehousing. It includes the process of extracting data from various data sources throughout an organization and manipulating it to conform to the standards of the data warehouse prior to populating the relational data warehouse. The manipulation falls into two basic categories:

Cleansing or scrubbing—This involves removing or correcting incorrect, incomplete, or duplicate data.

Transforming—This involves conforming data, translating data types, and/or looking up surrogate keys. In most cases, the data conformation and data type translation are done implicitly in the process. For instance, a dimension table may contain all possible business keys for a dimension member. The state of Alabama may be represented as AL in the Human Resources system and as 01 in the Financial system. Both should be represented in the dimension member, but the foreign key would be an unrelated surrogate key.

This section focuses on the T (transformation) step in ETL because it is traditionally the most challenging and poorest-performing part of the ETL process. In fact, in SQL Server 2000’s DTS, ETL was often referred to as ELT (or, more appropriately, ELTL). It was far more efficient to bulk-load the data extracted from the source into a staging area, perform the transformation, and then load it into the destination star schema format. Fact table transformations, in their purest form, consisted of dimension surrogate key lookups based on business keys. The most efficient means of doing this was to use T-SQL to perform an INSERT that joined the staged fact data to the various dimension tables on the business keys, as sketched below. It was possible to use DTS’s Data Transformation task with its Lookup feature for this purpose instead, but that was the equivalent of issuing a separate SELECT statement to look up each dimension of every fact table row!
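
The set-based approach looked roughly like the following: a single INSERT that resolves each business key to its surrogate key through joins. The staging, fact, and dimension table and column names here are hypothetical:

    -- Resolve business keys to surrogate keys in one set-based INSERT
    INSERT INTO dbo.FactSales (CustomerKey, ProductKey, SalesAmount)
    SELECT c.CustomerKey,  -- surrogate key from the customer dimension
           p.ProductKey,   -- surrogate key from the product dimension
           s.SalesAmount
    FROM dbo.StagingSales AS s
    JOIN dbo.DimCustomer AS c ON c.CustomerBusinessKey = s.CustomerID
    JOIN dbo.DimProduct  AS p ON p.ProductBusinessKey  = s.ProductID;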

SSIS ETL Concepts

SSIS includes data adapters and pipelines for dealing with ETL. A source data adapter provides access to an external data source and converts the native data types to SSIS data types. The pipeline is an in-memory series of tasks that operate on the data from the source to transform it to the desired values and format. The destination data adapter then converts the data from SSIS data types to the destination data types and sends it to the destination identified by its connection. The key to transforming data in SSIS is the fact that it occurs in memory. This results in increased performance; lookup table references, which really slowed down the ETL process in the past, now take only a few seconds. This is because each lookup table is referenced once, pulling all its data into memory, where subsequent lookups are performed. The resulting performance improvement leads back to the pure ETL process and potentially eliminates the need for an intermediate staging area.

What impact does this have on memory on the SSIS server? This is an important question to consider: memory-intensive tasks have advanced settings that can moderate these effects, but memory availability remains a real consideration with this type of processing.

Data Source Views

Data source views (DSVs) are another new component of SQL Server 2005. They provide a means of defining particular tables of interest in a potentially complex underlying data model. In the ETL process, this can be used, for example, to focus on one subject area. Table drop-down lists in the SSIS designers can be simplified in this way, displaying only the tables defined in the DSV. This also has a positive performance impact on the time it takes to populate these dialog boxes. DSVs provide a layer of abstraction between the SSIS packages and the actual underlying data sources. Therefore, they can be used to provide disconnected package development capabilities. DSVs are defined within a BIDS solution and can be shared between SSIS and Analysis Services projects in the same solution.

In addition, you can use the Import and Export Wizard as explained in the “World of Wizards” section to transfer data from one data source to another.

Summary

SQL Server 2005 replaces DTS with a revolutionary, completely redesigned ETL and integration platform called SSIS. This platform facilitates the first step in the business intelligence life cycle: integrating, cleansing, transforming, validating, and aggregating data from various sources.

This chapter introduces you to SSIS concepts by discussing the enhancements introduced in SSIS, the SSIS architecture, and the SSIS toolset. BIDS and the command-prompt utilities are discussed, as is the step-by-step process to design, execute, and log SSIS packages. The final sections of the chapter cover the SSIS wizards, migration issues, and SSIS’s role in data warehousing.

Chapter 13, “What’s New in SQL Server Analysis Services 2005,” focuses on the second step in business intelligence solution development—analysis—and discusses Analysis Services 2005 in great detail.
