Here comes the attribute. In a star schema, we typically find a dimension table with a code and a description: these two columns are the basic components of every attribute.
Thinking about this table, for example, DimProduct
: what is the most important thing that we look for in a table? Yes, the primary key. Without primary keys, we're in for trouble. It may sound too old style, but we desperately need a unique way to identify a single instance of an attribute.
I know, nowadays we have a lot of different technologies to represent information, such as NoSQL, unstructured, and XML data among others. But, if we need to analyze how many women's tights were sold last year, we need to know which women's tights we are talking about. Hence, the need for a primary key, name it a code, an ID, or anything else.
Said that, first of all, we identify the unique ID for a single instance of an attribute: in the case of DimProduct
it is ProductKey
. If you look at the table structure, you'll see another key (ProductAlternateKey
), usually referred to as natural key.
This may lead to dangerous mistakes if there are more records with the same natural key; see DimEmployee
as an example.
Then we search for a column representing a suitable description or a name. This is what the user will see, for example, EnglishProductName
.
We are creating our first attribute:
Product
and click on Save. Great: one more thing, press Ctrl + U to update the schema and we're done. Congratulations, you've just created your first attribute.It may be confusing at the beginning: the attributes have forms and the forms are made of expressions.
An attribute form is just a representation, a means for displaying that element. We can have several different forms: ID and DESC are just two very frequently used. If you think about the Customer attribute we may have other different representations like the e-mail, or the Social Security Number (SSN) or the complete name with salutation, and so on. Remember that everything that we design here will eventually come up into a SQL statement. More specifically, here we have defined what appears in the SELECT clause and the FROM clause:
Looking at the screen capture, there is a left part (FROM) with a list of available tables and columns, and there's a right part (SELECT) where you craft the form expression. You can use column names, constants, as well as functions here. Everything will be converted into: SELECT <Form expression> FROM <Source table>
.
ID forms are so important not only for their uniqueness, but also because they will be the joining mechanism between tables. That's why we use primary keys, because somewhere (hopefully) there might be a foreign key pointing at it; for example, in the FactInternetSales
table there is a ProductKey
column referring to our Product attribute.
Check the successful outcome of this recipe by going into the Data Explorer - COOKBOOK folder and expanding System Hierarchy. You will find your brand new Product attribute. Double-click on it in the right pane to show a list of products sorted by description. Notice that the ID does not appear by default in the data explorer, and in every report only the description is displayed. More on this later.