CHAPTER 4

What are data elements?

Spreadsheets have columns

Just like data elements

Models all around

This chapter defines the concept of a data element and the three different levels at which a data element can exist: subject area, logical, and physical. We also introduce domains and discuss the different types of domain.

A data element is an elementary piece of information of importance to the business, which represents a fact or a definition in an information system, and which may or may not have any eventual existence as a modeled object. In data models, they generally contribute to identifying, describing, or measuring instances of an entity. The data element Claim Number identifies each claim. The data element Student Last Name describes the last name of each student. The data element Gross Sales Value Amount measures the monetary value of a transaction.

Returning to our spreadsheet analogy, the column headings on a spreadsheet are data elements. The cells beneath each column heading are the values for that column heading. Data elements can be thought of as the column headings in a spreadsheet, the fields on a form, or the labels on a report. Ice Cream Flavor Name and Ice Cream Size are examples of data elements from our gelato store. Company Name and Phone Number are examples from my business cards.

As with entities, data elements can exist at subject area, logical, and physical levels. A data element at the subject area level must be a concept both basic and critical to the business. We do not usually think of data elements as subject areas, but depending on the business need, they can be. When I worked for a telecommunications company, Telephone Number was a data element that was so important to the business that it was represented on a number of subject area models.

Within a PowerDesigner Conceptual Data Model, you would create Telephone Number as a Data Item. See Chapter 12 to find out more about PowerDesigner Data Items.

A data element on a logical data model represents a business property. Each data element shown contributes to the business solution and is independent of any technology, including software and hardware. For example, Ice Cream Flavor Name is a logical data element because it has business significance regardless of whether records are kept in a paper file or within the fastest database out there. A data element on a physical data model represents a database column. The logical data element Ice Cream Flavor Name might be represented as the physical data element ICE_CRM_FLVR_NAM or as FLVR_NAM within the table ICE_CRM.

I use the term data element throughout the text for consistency. However, I would recommend using the term that is most comfortable for your audience. For example, a business analyst might prefer the term ‘entity attribute’, ‘attribute, or ‘label’, while a database administrator might prefer the term ‘column’ or ‘field’.

In data modeling, PowerDesigner uses the term ‘Attribute’ as well as ‘Entity Attribute’. For example, an entity property sheet has an ‘Attributes’ tab, and the property sheet for an attribute refers to ‘Entity Attribute’ (we will see this in Figure 12.12). For consistency, we use the term ‘Attribute’ in this book when referring to attributes in models.

PowerDesigner also uses the term ‘Entity Attribute’ on menus and in various dialogs, such as defining list reports or dependency matrices, and extending the underlying model.

The complete set of all possible values that a data element may have is called a domain. A domain is a set of validation criteria that can be applied to more than one data element; it provides a means of standardizing the characteristics of the data elements. For example, the domain ‘Date’, which contains all possible valid dates, can be assigned to any of these data elements:

·         Employee Hire Date

·         Order Entry Date

·         Claim Submit Date

·         Course Start Date

A data element must never contain values outside of its assigned domain. The domain values are defined by specifying the actual list of values or a set of rules. Employee Gender Code, for example, may be limited to the domain of (female, male). Employee Hire Date may initially be assigned the rule that its domain contain only valid dates, for example. Therefore, this may include values such as:

·         February 15th, 2005

·         25 January 1910

·         20030410

·         March 10th, 2050

Because Employee Hire Date is limited to valid dates, it does not include February 30th, for example. We can restrict a domain with additional rules. For example, by restricting the Employee Hire Date domain to dates earlier than today’s date, we would eliminate March 10th, 2050. By restricting Employee Hire Date to YYYYMMDD (that is, year, month, and day concatenated), we would eliminate all the examples given except for 20030410. Another way of refining this set of values is to restrict the domain of Employee Hire Date to dates that fall on a Monday, Tuesday, Wednesday, Thursday, or Friday (that is, the typical workweek).

In our example of the business card, Contact Name may have thousands or millions of values. The values from our four sample cards in Figure 1.2 are:

·         Steve Hoberman

·         Steve

·         Jenn

·         Bill Smith

·         Jon Smith

This name domain may need a bit of refining. It may be necessary to clarify whether a valid domain value is composed of both a first and last name, such as ‘Steve Hoberman’, or just a first name, such as ‘Steve’. Could this domain contain company names such as ‘IBM’, as well? Could this domain contain numbers instead of just letters, such as the name R2D2 from the movie Star Wars? Could this domain contain special characters, such as the name ®, representing The Artist Formerly Known as Prince (the musician Prince changed his name to this unpronounceable Love Symbol in 1993). If we decide that the ‘First Name’ and ‘Last Name’ of a person are separate data elements, we may decide to create two separate domains for them, called ‘Person First Name’, and ‘Person Last Name’. We would probably then require more domains, such as ‘Person Full Name’. Note that the domain names are qualified by ‘Person’, indicating that they do not apply to anything that is not a Person; more ‘name’ domains are required. There are three different types of domains:

·         Format. Format domains specify the standard types of data one can have in a database. For example, Integer, Character(30), and Date are all format domains. The format domain for Ice Cream Size might be Character(15), meaning a particular Ice Cream Size can contain any sequence of characters and be at most 15 characters in length.

·         List. List domains are similar to a drop-down list. They contain a finite set of values from which to choose. List domains are refinements of format domains. The format domain for Order Status Code might be Character(10). This domain can be further defined through a list domain of possible values {Open, Shipped, Closed, Returned}. The list domain for Ice Cream Size would be {one scoop, two scoops, three scoops}.

·         Range. Range domains allow all values that are between a minimum and maximum value. For example, Order Delivery Date must be between Today’s Date and three months in the future. As with list domains, range domains are a refined version of a format domain.

 

Data modeling tools generally regard a Domain as a single type of object, and a given Domain, such as ‘Amount’, could carry out the role of all three Domain types listed above. In addition, Domains in tools may provide the default optionality and description for data elements based on the domain. It is also common for the tool to allow you to prevent or allow data elements from diverging from the properties of the domain. For example, you could force data elements to have the same data type as their domain, and allow them to override the domain optionality and alter the default description. Domains are very useful for a number of reasons:

·         Improves data quality by checking against a domain before inserting data. This is the primary reason for having a domain. By limiting the possible values of a data element, the chances of bad data getting into the database are reduced. For example, if every data element that represents money is assigned the Amount domain, consisting of all decimal numbers up to 15 digits in length including two digits after the decimal point, then there is a good chance that each of these data elements actually do contain currency. Gross Sales Value Amount, which is assigned the amount domain, would not allow the value ‘R2D2’ to be added.

·         The data model communicates even more. When we display domains on a data model, the data model communicates that a particular data element has the properties of a particular domain, so the data model becomes a more comprehensive communication tool. We learn, for example, that Gross Sales Value Amount, Net Sales Value Amount, and List Price Amount all share the Amount domain and, therefore, share properties such that their valid values are limited to currency.

·         Greater efficiency in building new models and maintaining existing models. When a data modeler embarks on a project, she can use a standard set of domains, thereby saving time by not reinventing the wheel. Any new data element that ends in Amount, for example, would be associated with the standard Amount domain, saving analysis and design time.

For an excellent introduction to and a detailed discussion of attribute domains, I suggest that you read Chapter 4 in Data Quality Assessment (Maydanchik, 2007).

What is the most appropriate domain for each of the three data elements below?

·         Email Address

·         Gross Sales Value Amount

·         Country Code

 

Key Points

·         A data element is a property of importance to the business whose values contribute to identifying, describing or measuring instances of an entity.

·         A domain is a set of validation criteria that can be applied to more than one data element.

·         There are different types of domains, including format, list, and range domains.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset