CHAPTER 12

Data Lineage

Need to know the source?

Data Lineage is how,

Let’s connect the dots

The Data Lineage feature of ER/Studio enables you to document the movement of data from point A to point B, including any intermediate steps. Points A and B can be almost any kind of data store, such as flat files, data models, databases (for example, Access, Teradata, Oracle, and DB2), XML, and Excel worksheets. This movement is sometimes referred to as Extraction, Transformation, and Loading (ETL), or as source-to-target mapping. In addition to the functionality within ER/Studio for documenting and viewing mappings, you can use ER/Studio Data Lineage, a separate tool in the ER/Studio XE family, to explore existing or proposed ETL mappings and quickly and accurately perform impact analysis.
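To make the idea of impact analysis concrete, here is a minimal Python sketch of what such an analysis does conceptually. It is not ER/Studio's API or file format: the mapping list, the "System.Table.Column" naming, and the function name downstream_impact are hypothetical, chosen only for illustration. Given a column that is about to change, the sketch follows source-to-target mappings forward and collects every column that receives data from it, directly or indirectly.

```python
from collections import defaultdict, deque

# Hypothetical source-to-target mappings: (source column, target column).
# The "System.Table.Column" naming convention is an assumption of this sketch.
MAPPINGS = [
    ("OrderEntry.Order.order_total", "EDW.FactSales.sales_amount"),
    ("OrderEntry.Order.order_date", "EDW.FactSales.sale_date"),
    ("EDW.FactSales.sales_amount", "SalesReport.MonthlySales.total_sales"),
    ("EDW.FactSales.sale_date", "SalesReport.MonthlySales.sales_month"),
]


def downstream_impact(mappings, changed_column):
    """Return every column that directly or indirectly receives data from changed_column."""
    # Index the mappings by source so each step forward is a simple lookup.
    targets_of = defaultdict(list)
    for source, target in mappings:
        targets_of[source].append(target)

    impacted = set()
    queue = deque([changed_column])
    while queue:
        column = queue.popleft()
        for target in targets_of[column]:
            if target not in impacted:  # skip columns already reached by another path
                impacted.add(target)
                queue.append(target)
    return impacted


if __name__ == "__main__":
    for column in sorted(downstream_impact(MAPPINGS, "OrderEntry.Order.order_total")):
        print(column)
```

A breadth-first walk with a visited set is used so that a column reached through two different paths is reported only once. For the sample data, changing order_total affects the warehouse column sales_amount and, through it, the report column total_sales.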

Figure 12.1 shows a common data movement process for data warehousing.

Figure 12.1 Data warehouse data movement process

In this illustration, data is sourced from various systems on the left and fed into a data warehouse, which stores it in a format that is more conducive to reporting. This reduces the load on the source systems, because their resources are not consumed by reporting directly against them. The data must also be cleansed to ensure the quality of the data used for reporting, multidimensional cubes, data mining applications, and similar downstream uses, each of which is targeted at a specific audience and purpose. Ideally, an organization using ER/Studio would have a model of each source application (such as Order Entry), of the enterprise data warehouse, and of each reporting application.

A data mapping (also known as “data lineage”) is needed to “connect the dots” from each source system to the enterprise data warehouse, and then from the enterprise data warehouse to each reporting application. This mapping document is given to the ETL developer as a requirements document, so that they can develop the code necessary to implement the mapping.
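One simple way to picture a mapping document is as a table with one row per source-to-target pair plus the transformation rule the ETL developer must implement. The following Python sketch models such rows and walks upstream from a report column to show where its data originates. The system, table, and column names, the transformation rules, and the names MappingRow and lineage_steps are all hypothetical; this illustrates the concept only and is not how ER/Studio stores mappings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRow:
    """One row of a hypothetical source-to-target mapping document."""
    source: str          # "System.Table.Column" where the data comes from
    target: str          # "System.Table.Column" the data is loaded into
    transformation: str  # business rule the ETL developer must implement


MAPPING_DOCUMENT = [
    MappingRow("OrderEntry.Order.order_total", "EDW.FactSales.sales_amount",
               "Convert to USD using the daily exchange rate"),
    MappingRow("Billing.Invoice.discount", "EDW.FactSales.sales_amount",
               "Subtract the invoice discount from order_total"),
    MappingRow("EDW.FactSales.sales_amount", "SalesReport.MonthlySales.total_sales",
               "SUM grouped by calendar month"),
]


def lineage_steps(rows, column, depth=0):
    """Return indented lines describing every mapping that feeds column, walking upstream.

    Assumes the mappings contain no cycles, as in a source -> warehouse -> report flow.
    """
    steps = []
    for row in rows:
        if row.target == column:
            steps.append("  " * depth + f"{row.target} <- {row.source}  [{row.transformation}]")
            # Recurse to find what feeds this row's source, one level further upstream.
            steps.extend(lineage_steps(rows, row.source, depth + 1))
    return steps


if __name__ == "__main__":
    for line in lineage_steps(MAPPING_DOCUMENT, "SalesReport.MonthlySales.total_sales"):
        print(line)
```

Run on the sample data, it reports that total_sales is derived from the warehouse column sales_amount, which in turn is fed by order_total from Order Entry and discount from Billing, each step annotated with its transformation rule.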

In ER/Studio, you can document and view data lineage using the Data Lineage tab in the Model Explorer. You can create a visualization of data movement and transformation that shows the relationships between source and target, how data flows from one table to another, and how it is transformed. You can further expand the data lineage documentation in the Table Editor and the Table Column Editor.

We’ll first talk about using the Data Lineage tab, and then cover the Table Editor and the Table Column Editor.
