Parent-child relationship I

Attributes can be related to each other (like in the case of city, province, and country) to form relationships between them. Furthermore, the tables holding attribute data can be normalized, as in the DimProduct, DimProductSubcategory, and DimProductCategory or denormalized as in the DimGeography table where we have province and country names repeated in every row. It is important that foreign keys are in place to preserve data integrity and help the RDBMS use the less expensive query path. So, for example, we have in the DimProductSubcategory table a foreign key pointing to the primary key in the DimProductCategory table.

If you look at the content of the DimProduct table, you'll see some records with NULL values in the DimProductSubcategory column; there are no sales transactions for those products, but this is a very common data quality problem. Missing values in foreign keys is something we usually have to deal with. If those products had sales movements, it would be difficult to aggregate them at the subcategory or category level.

Every developer has his/her silver bullet to solve this, I personally don't like NULL values in FK and always try to set them to -1 (whenever possible) creating a correspondent -1 value in the parent table with description n/a. Why -1? Because it is very very very unlikely that another code exists with the same negative value.

Note

I've seen people setting NULL values to 9999, to later discover that the company had more than 10,000 products and the 9999 code was actually allocated; then I saw the same people UPDATE millions of rows to 99999 WHERE key = 9999. To cut the story short, they ended up using an impressive Z99999 modifying an unspecified quantity of table columns from number to varchar… just in case, you never know.

Getting ready

You need to have completed the previous recipe to continue.

We are creating the Country, StateProvince, and City attributes and relating them with parent-child relationships.

How to do it...

Follow these steps:

  1. In the Attributes folder, create a new attribute with these forms:
    • ID: GeographyKey from DimGeography
    • DESC: ConcatBlank(City, PostalCode) from DimGeography
  2. Save it with the name City.
  3. In the same folder, create a new attribute:
    • ID: StateProvinceCode from DimGeography
    • DESC: StateProvinceName from DimGeography
  4. In the Attribute Editor, before saving this, click on the Children tab on the top-left then click on Add and from Child candidates, move City to the right of the shopping cart and hit OK.
  5. Now click on Save and Close and name it StateProvince.
  6. Create a new attribute:
    • ID: CountryRegionCode from DimGeography
    • DESC: EnglishCountryRegionName from DimGeography
  7. Before saving, select the Children tab and add both StateProvince and City, hit OK, then Save and Close and name it Country.
  8. Update the schema.

How it works...

Since the three attributes come from the same denormalized table we do not need FKs: the data for parent and children will be selected from the same DimGeography. Nevertheless, we need to specify which attribute is the parent and which is the child. Sometimes it is not so obvious and we need to do some research to detect the one part in a one-to-many relationship. As a rule of thumb:

  • In case of normalized tables: The table with the primary key is parent, the table with the foreign key is child (no keys? Too bad, see the next point)
  • In case of denormalized tables: An attribute with low cardinality is more likely to be parent, an attribute with high cardinality is probably child.

    When it's not clear at a first look, do some SELECT COUNT(DISTINCT <column_name>) on the columns likely to be part of the relationship.

There's more...

If you go to Data Explorer | System Hierarchy, you can browse the elements of the three attributes and see if the structure makes sense. Expand Country, select one country and expand StateProvince, click on one and browse through the elements inside the City attribute. Is Paris in France and London in the United Kingdom? Good.

Some cities are repeated, because the granularity is postal code.

Note

You can watch a screencast of this operation at:

Exercise 5

Repeat the recipe using the DimDate table and create the following attributes:

  • Date: (ID = DateKey and DESC = FullDateAlternateKey)
  • Month: (ID = CalendarYearMonth and no DESC)
  • Year: (ID = CalendarYear and no DESC)

The Month attribute has Date as child, and Year has both Month and Date as children. If you receive a warning message saying that ID for Month is Text, don't worry, it will work as expected. The curious thing is that the year 2005 begins in July, and the year 2010 only has November (I suppose it is intentional).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset