At the core of a data warehouse, there are facts and dimensions. They can be organized in star schemas, snowflake schemas; they can be more or less complex, and to some extent undocumented. This is not a book about dimensional modeling and discussing how a database layout works better than other is out of scope. Some common sense rules always apply: consistent column naming can be useful (user_id
, iduser
, user_code
, key_user
, yetanotheruserid
, you name it…), constraints on tables help to pull the strings and find a way into unexplored databases. NOT NULL
fields also come in handy.
I personally had my share of good and bad data. I always remember a development team manager who once asked me, "Why do you need primary keys, anyway?" I looked around and thought, "This must be a candid camera…"
If you are drowning in a very complex DB and want some relief, you can go to http://at5.us/Ch2U2 and read about the Directive 595, it may be a real story after all.
In the course of this book, I will use several editors that MicroStrategy Desktop offers. It is also worth mentioning that there is another way to add tables and create objects, with a tool named Architect; it has the same capabilities and a more visual interface. You can find instructions about Architect in the product documentation.
First, we create a connection to the data warehouse and then select the source tables:
datawarehouse
.When you select the tables from the left pane list, MicroStrategy reads the definition of all the columns and stores the information inside the metadata, creating the first schema objects (tables). The objects that we see in the Tables folder are the logical representations of the underlying database and hold no data, just information about the columns and the datatype; if the physical table in the data warehouse changes, this information must be updated (more on this later). It is important to select all the tables that are useful for the project, not less, not more. It doesn't make sense to have 500 tables in the project if we only use 10. It just complicates the design and slows down the Desktop application.
While you have the Warehouse Catalog window open:
Notice that, as expected, the fact tables have far more rows than the dimension ones. This is important when you don't have any database documentation and the table names do not make sense. Reading primary and foreign keys from the database helps identifying the parent-child relationship between tables in case you want to use the automatic discovery features of Architect.
Now that you know how to add tables to the metadata, go on and add:
DimDate
DimGeography
DimProduct
, DimProductSubcategory
, and DimProductCategory
DimSalesTerritory
Remember to update the schema.