Chapter 10. Importing, Exporting, and Transforming Data

Whether you need to move data from a legacy system to a new system permanently or you want to continually move data back and forth for data warehousing, Microsoft SQL Server 2005 Integration Services (SSIS) should be your first choice. With Integration Services, you have access to an extraction, transformation, and loading (ETL) platform that can be fully customized for your specific application and is optimized for high-performance data movement and transformation. You can use Integration Services to copy and transform data to or from almost any data source, including flat file, OLE DB, and ODBC data sources. In addition, BULK COPY (BCP) remains available in SQL Server 2005 as a basic means of importing and exporting data.

Working with Integration Services

As an administrator, the Integration Services tasks you will most often perform include:

  • Installing the Integration Services components using SQL Server 2005 Setup.

  • Using the SQL Server Import And Export Wizard to move data.

  • Upgrading DTS 2000 to SQL Server 2005 Integration Services.

  • Managing or migrating existing DTS 2000 Packages to Integration Services as necessary.

  • Creating and managing Integration Services packages using Business Intelligence Development Studio.

  • Running Integration Services packages using Business Intelligence Development Studio, SQL Server Management Studio, or the dtexec command-line utility.

Before you try to perform any of these tasks, you should know how the Integration Services feature works and how it is used. Once you are familiar with Integration Services, its tools, and its structures, you will be better prepared to manage its components.

Getting Started with Integration Services

Integration Services are designed to move data accurately and efficiently as well as to convert or transform data between heterogeneous data sources. You can use Integration Services when you want to perform any of the following tasks:

  • Move data between heterogeneous systems, such as from Oracle to SQL Server or from SQL Server to Oracle

  • Move data between SQL Servers, including primary and foreign keys

  • Move data from Microsoft Access or Microsoft Excel to SQL Server or from SQL Server to Access or Excel

  • Extract data; transform the data by performing column mappings, filling in missing values, and so on; and then import the data on the destination system

  • Copy views from one database to another

Although SQL Server 2005 supports existing Data Transformation Services (DTS) packages and provides an upgrade/migration path for those packages, Integration Services are the functional replacement for DTS. The architecture of Integration Services is very different from that of DTS. With DTS, the workflow controls and the data movement were all managed through a single component: the DTS engine. The Integration Services process separates workflow controls and data movement using two separate components:

  • Integration Services runtime engine. Stores package layout, executes packages, controls workflow between tasks, and provides other essential runtime services.

  • Integration Services data flow engine. Manages data movement and transformation and supports multiple sources, multiple transformations, and multiple destinations.

Integration Services has an extensible object model that includes a runtime API and data flow API that supports the Microsoft .NET Framework. These APIs allow developers to extend and customize the Integration Services object model. Custom extensions can be developed for tasks, log providers, connection managers, data flow components, and more.

Integration Services Tools

The primary tools for working with Integration Services are the Business Intelligence Development Studio and SQL Server Management Studio. Business Intelligence Development Studio is used to build data transformation solutions, and SQL Server Management Studio is used to manage Integration Services packages. Within Business Intelligence Development Studio, you can access Integration Services Designer, the graphical tool for creating Integration Services packages.

The SQL Server Import And Export Wizard is the new face of the old DTS Import/Export Wizard. The wizard has been updated to support Integration Services and has been extended to provide better support for data in flat files and for real-time preview of data. Integration Services packages created by using the Import And Export Wizard can be opened in Business Intelligence Development Studio and then can be extended using Integration Services Designer.

Tip

Tip

The SQL Server Import And Export Wizard can run the import/export process between any of the available data sources; you do not have to set SQL Server as either the source or the destination. For example, you can use the DTS Import/Export Wizard to copy data from a text file to an Excel spreadsheet.

As with DTS, Integration Services packages are stored in either the msdb database or in the file system. The Integration Services Service is responsible for managing package storage. You can manage packages from the command line using the dtutil command-line utility. You can use dtutil to copy, move, sign, and delete packages. To run packages, you can use Business Intelligence Development Studio or SQL Server Management Studio to access the Execute Package Utility (dtexecui). The command-line counterpart is the dtexec command-line utility.

Integration Services includes the Package Configuration Wizard to assist with configuration management. By running the Integration Services package deployment utility from within Business Intelligence Development Studio, you can install packages to the msdb database in an instance of SQL Server 2005 or to the file system. The deployment utility automatically detects and includes all package dependencies, making it easier to deploy packages.

SQL Server 2005 includes tools for managing and migrating DTS packages from previous versions of SQL Server. When you connect to a server in SQL Server Management Studio’s Object Explorer, you can expand the DTS 2000 Packages node to list available DTS 2000 packages. You can edit or execute these packages and migrate the packages to the Integration Services format.

Integration Services and Data Providers

Data providers are a key part of Integration Services. Without these data providers, you would not be able to communicate with other systems. SQL Server includes .NET Framework, OLE DB, and ODBC data providers for the following:

  • SQL Server

  • Oracle

  • Microsoft Access and Excel

  • Microsoft Analysis Services

  • Microsoft Data Mining Services

  • Microsoft Internet Publishing

  • SQLXML

  • Text files

The text file driver is the all-purpose driver for import and export procedures. If you do not have a native provider for your legacy database and you cannot use the generic ODBC providers, you can usually export your data to a text file and then import it into SQL Server. You can go from SQL Server to a legacy system using the same technique.

Integration Services Packages

The SQL Server Import And Export Wizard is the fastest, easiest way to move data between systems. You use the SQL Server Import And Export Wizard to create basic Integration Services packages, which you can later view or modify using Integration Services Designer. Packages are simply sets of tasks for importing, transforming, and exporting data that you can reuse or schedule to run as often as needed. Packages can be:

  • Stored in the msdb database on a local or remote server.

  • Shared through SQL Server Meta Data Services.

  • Saved to the filesystem in DTSX files, which is useful when you want to copy, move, and e-mail packages to another location.

Note

Note

Integration Services does not support storage in Visual Basic files. DTS 2000 packages that are stored in Visual Basic files cannot be migrated to SQL Server 2005 Integration Services.

You execute packages directly from SQL Server Management Studio or Business Intelligence Development Studio. You can also execute packages from the command prompt using the dtexec command-line utility. Within Integration Services packages, you will find the following features:

  • Connections. Store information about the source or destination of data. In a connection, you specify the data provider to use (such as the Microsoft OLE DB Data Provider for SQL Server), the server to which you want to connect, the login to use for the connection, and the database to work with for the import/export operation. In Integration Services Designer, you select connections using the Data menu.

  • Tasks. Set the operations that need to be performed within the package. Tasks can consist of ActiveX scripts, SQL scripts, SQL queries, commands to transfer SQL Server objects, data-driven queries, bulk insert commands, and external processes to execute. You can even have Integration Services send e-mail when a package completes.

  • Workflow containers. Set when and how a particular task should be executed, such as on completion, on failure, or on success. For example, you could schedule a task that sends e-mail on failure or on success.

  • Control flow procedures. A control flow consists of one or more tasks and containers that execute sequentially or in parallel when the package runs. Precedence constraints connect the package’s tasks and containers and define the conditions for running the next task or container in the package control flow. Tasks and containers can also be grouped in a loop and run repeatedly as a unit within the package control flow.

  • Data flow procedures. Set the step-by-step transformation process for the data. Before you can add a data flow to a package, the package control flow must include a Data Flow task that is responsible for running the data flow. A data flow consists of the source and destination adapters that extract and load data, the transformations that modify and extend data, and the paths that link adapters and transformations.

You can store an Integration Services package on any SQL Server, and you do not need to create or store it on the source or destination server associated with the package. If you are editing, modifying, scheduling, or just viewing an Integration Services package, you need to use the user account of the package owner or an account that operates under the sysadmin role on the SQL Server on which the package is actually stored.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset