CHAPTER 6

Why are names and definitions important?

Disambiguated

Clear and complete and correct

Standards needed here

I was sitting at a table in a meeting room, listening to managers describing their projects. Although this meeting was part of a celebration of the success of a very large data warehouse project, this part of the meeting was dedicated to lessons learned. Each manager was talking about what he would like to have done if he had the opportunity to do it all over again. I jotted down the keywords from each of the presentations. Looking over my notes afterward, I noticed a common theme: each manager wished that he had put more effort into documenting the meanings behind the terms in his applications. They needed better definitions. They may also have needed better names.

Many data modeling objects have names that take the form of a noun or a phrase that involves a noun. Use the right noun or nouns in the name of an entity, and we’ll all know what the entity is, surely? The same is true for attributes as well, isn’t it? Perhaps not: Ron Ross quotes a practitioner who says, ‘The more self-evident the meaning of a term is, the more trouble you can expect’ (Ross, 2009, p. 12). How many different definitions of ‘Product’ or ‘Customer’ does your organization have?

Not all of your entity or attribute names will be quite so contentious, of course. That does not mean you have to pay less attention to them, though. Consider the simple entity shown in Figure 6.1.

Figure 6.1 Ambiguous or unclear attribute names

Assume that you and your business experts have discussed what is meant by Order, and you agree on what the entity represents. Now you need to discuss the meaning of the data elements - their names currently lack meaning. For example, which of the many dates that could be associated with an order is the Date attribute meant to represent? Does the Dispatch attribute hold a dispatch date? If so, is that a requested, estimated, or actual dispatch date? Perhaps it tells us how many deliveries are needed, or whether or not the order has been fulfilled. The names in Figure 6.2 are more descriptive and answer more questions, but we still need to agree on their definitions.

Figure 6.2 Clearer attribute names

As Ron Ross says, ‘a noun or noun phrase…represents merely the tip of an iceberg with respect to meaning’ (Ross, 2009, p. 12). In Figure 6.2, what exactly is the Total Value Amount of an Order? Who estimated the Completion Date? When was the Estimate made? What does ‘completion’ mean?

It’s vital that the definitions we provide for our data answer these questions. Data can only be used effectively if it can be understood, and it can only be understood if the definitions are adequate.

Definitions are important for three main reasons:

·         Assist business and IT with decision making. If a business user has an interpretation of a concept that differs from the one actually implemented, it is easy for poor decisions to be made, compromising the entire application. For instance, if a business user would like to know how many products were ordered each month, imagine the poor judgments that could result if the user expected raw materials to be included with products, but they were not. Or, what if he or she assumed that raw materials were not included, but they were?

·         Help reveal, document, and resolve different perspectives on the same concept. In Chapter 15, we discuss Subject Area Models, which are a great medium for resolving differences in opinion on the meaning of high-level terms. Folks in accounting and sales can both agree that Customer is an important concept. But can both groups agree on a common definition? If they can agree, you are one step closer to creating a holistic view of the business.

·         Support data model precision. A data model needs to be precise, which requires that the subject area definitions are also precise. An Order Line cannot exist without a Product, for example. However, if the definition of Product is missing or vague, we have less confidence in the concept and its relationships. Is a Product, for example, raw materials and intermediate goods, or only a finished item ready for resale? Can we have an order for a service, or must a product be a tangible deliverable? The definition is needed to support the Product concept and its relationship to Order Line. When order data is spread across multiple systems, it is vital to understand the similarities and differences between the purpose and meaning of the data in each system, or we run the risk of counting apples and pears, and calling them all gooseberries.

There are several techniques for writing a good definition. One that Steve likes in particular is to write the definition as if you are explaining the term to a child. You wouldn’t use many big words or restate the obvious. You also wouldn’t be too verbose, as a child may not have the same attention span as we do. Avoid defining a concept using only the terms in the name of the concept (e.g. ‘The Customer is our customer.’); it’s good practice to write the definition before naming a concept. Examine ways in which the definition could break down; these tend to be the result of exceptional business events, such as a customer filing for bankruptcy, or a container ship sinking with the loss of all cargo. Ensure that your proposed definitions are verified by as wide an audience as possible. Remember that your definitions will be read by people you don’t know, and you cannot predict who those people will be, nor their background, nor what preconceptions they may have when they read your definitions.

Despite their importance, definitions are often omitted or written with minimal attention to their audience. Therefore, when writing definitions, we need to be aware of four characteristics that lead to a high-quality definition that the audience can understand. Those characteristics are clarity, completeness, accuracy, and lack of ambiguity; they’re summarized in this section. Please refer also to another book of Steve’s, The Data Modeler’s Workbench, for an entire chapter dedicated to definitions.

Clarity means that a reader can understand the meaning of a term by reading the definition only once. A clear definition does not require the reader to decipher how each sentence should be interpreted. A good way to make sure your definition is clear is to think about what makes a definition unclear. We need to avoid restating the obvious and using obscure technical terminology or abbreviations in our definitions. Just to restate the obvious, restating the obvious means that we are not providing any new information; we are merely describing something that already has been mentioned or that is easy to find elsewhere. Let’s say, for example, that the definition of associate identifier is ‘Associate identifier’ or ‘The identifier for the associate’. Neither tells the reader anything new. Equally unclear is the use of synonyms, as in the pseudo definition ‘The identifier for an employee’. As far as clarity is concerned, we also need to make sure our audience understands the terms in our definition. Using acronyms, abbreviations, and industry jargon in definitions without explaining them can cause you to lose some of your audience.
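A simple automated check can catch the most blatant cases of restating the obvious. The following Python sketch is one possible approach, not a feature of any modeling tool; the function names and the list of filler words are assumptions made for illustration. It flags a definition that uses no significant word beyond those already in the name.

```python
import re

# Small words that carry no meaning of their own.
# This list is a hypothetical starting point, not a standard.
FILLER_WORDS = {"a", "an", "the", "for", "of", "is", "our", "to"}


def significant_words(text: str) -> set:
    """Lower-case the text and return its words, minus the filler words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in FILLER_WORDS}


def restates_name(name: str, definition: str) -> bool:
    """Return True when the definition adds no significant word to the name."""
    return significant_words(definition) <= significant_words(name)


if __name__ == "__main__":
    print(restates_name("Associate Identifier", "The identifier for the associate."))
    print(restates_name("Associate Identifier", "The identifier for an employee."))
```

Under these assumptions, both ‘Associate identifier’ and ‘The identifier for the associate’ are flagged. The synonym-based ‘The identifier for an employee’ is not, so spotting synonyms still needs a human reviewer.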

Completeness focuses on making sure the definition is at the appropriate level of detail and that it includes all the necessary components, such as derivations and examples. Having a definition at the appropriate level of detail means that it is not so general as to provide very little additional value, yet not so specific that it provides value to only one application or department, or that it adds value only at a certain point in time.

Sometimes, in order to meet the needs of the entire company (or even the entire industry), we create a very general definition so that all parties can agree on the meaning. It is usually a very short definition, one that does not offend any of the parties. It is a definition that leaves little to debate, because it meets everyone’s needs at a high level. General definitions may include dictionary quotations and ambiguous terminology, and may omit detail such as units of measure or details of calculations.

An example of a dictionary quotation as a definition for product might be ‘something produced by human or mechanical effort or by a natural process’. What value does this dictionary quotation provide to an organization? If this dictionary quotation is only part of the definition instead of being the definition, the definition may be considered complete.

An example of an ambiguous terminology definition for Social Security number might be ‘associated with an employee’. We know the Social Security number is associated with an employee, but what does it mean?

An example of a definition containing an omission would be this hypothetical definition of order weight: ‘the total shipping weight of an order delivered to a destination, including packaging, used to ensure that the maximum carry weight on a truck is not exceeded’. It is not clear from this definition whether the order weight is in pounds, hundredweights, tons, or tonnes (metric tons). Is order weight the same as shipping weight?
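One way to make such an omission harder to commit is to record the unit of measure as a separate part of the definition. The Python sketch below assumes a hypothetical AttributeDefinition structure, whose field names are invented for illustration and do not come from any modeling tool, and reports measured attributes whose unit has not been stated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeDefinition:
    """A hypothetical record of an attribute's definition."""
    name: str
    definition: str
    is_measure: bool = False                 # True for weights, amounts, durations
    unit_of_measure: Optional[str] = None    # for example "tonne" or "pound"


def missing_unit(attribute: AttributeDefinition) -> bool:
    """Return True when a measured attribute does not state its unit."""
    return attribute.is_measure and not attribute.unit_of_measure


if __name__ == "__main__":
    order_weight = AttributeDefinition(
        name="Order Weight",
        definition="The total shipping weight of an order delivered to a "
                   "destination, including packaging.",
        is_measure=True,
    )
    print(missing_unit(order_weight))   # the unit has been left out
    order_weight.unit_of_measure = "tonne"
    print(missing_unit(order_weight))   # the unit is now stated
```

A check like this only shows that a unit is present. Whether order weight is the same as shipping weight remains a question for the business experts.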

The opposite of making a definition too general is making it too specific. Too specific means that the definition is correct within a certain scope, but does not address the complete scope of the term being defined. Definitions that are too specific usually include references to certain departments, applications, or states. Sometimes they simply consist entirely of examples or derivations. For example, imagine if the definition of the term party included only examples. This is more common than you might think in draft data models, for instance this definition of Party:

customer

supplier

competitor

employee

Examples alone make for an incomplete definition. For a definition to be complete, the broadness of the definition must match the broadness of the term. The examples listed don’t tell us what a Party means to our organization, or why we have included it in our data model.

Accuracy focuses on having a definition that completely matches what the term means and is consistent with the rest of the business. Accuracy means that an expert in the field would agree that the term matches the definition. One of the difficulties with accuracy is that as we define broader terms that cross departments, such as product, customer, and employee, we tend to get more than one accurate definition, depending on who we ask. A recruiting department, for example, may have a definition for employee that is accurate but nonetheless different from the definition offered by a benefits department. The problem is the state issue, discussed earlier. A good solution to this problem is to use subtypes on your model that contain each of the distinct states of an employee. Through the accurate definition of each subtype, every state is captured.

If your definition is ambiguous, then its accuracy is in doubt. One of the easiest ways to make a definition ambiguous is to insert the words ‘may’, ‘should’, or ‘might’. When you use these words in conversation, the person you are talking to can interpret your meaning by your body language, tone of voice, and the context of the conversation. If the person is uncertain, they can ask for clarification. When they read your words on a page or a screen, those options are not available; the words used in the definition must prevent the question being asked, by being unambiguous.

George was flying home from a conference recently. The plane was rolling towards the gate, and a crew member was telling the passengers about the airport they were arriving at. In effect, she was describing business rules, including the rule about smoking in the airport terminal. She said, ‘Smoking may not be permitted in the terminal building.’ If you were a smoker, wanting to know when you’d be able to light your next cigarette, how would you interpret that rule? The words ‘may not’ make the rule ambiguous, implying that there are possibly some circumstances where you would be able to smoke. Perhaps there will be signs saying ‘You may now smoke’, or directing you to a special smoking area.

In fact, the airport is in a country where it is illegal to smoke in airport terminals; no ambiguity there at all. The airline wisely chose to avoid using the words ‘illegal’ or ‘crime’ in its announcements, but why did it choose to say ‘may not be’ instead of simply ‘Smoking is not permitted in the terminal building’? Perhaps it was the author’s cultural background, or the author wanted to avoid sounding like a parent laying down rules to their children. Whatever the reason, the airline failed to make an accurate statement of the business rule, which might lead a passenger to make a decision they later regret.
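The three words singled out above are easy to search for before a definition or business rule is published. The following Python sketch is a minimal illustration; the function name is an assumption, and the word list covers only the three words named in this section.

```python
import re

# The three words this section identifies as sources of ambiguity.
AMBIGUOUS_WORDS = ("may", "should", "might")


def find_ambiguous_words(definition: str) -> list:
    """Return the ambiguous words found in a definition, in order of appearance."""
    words = re.findall(r"[a-z']+", definition.lower())
    return [w for w in words if w in AMBIGUOUS_WORDS]


if __name__ == "__main__":
    print(find_ambiguous_words("Smoking may not be permitted in the terminal building."))
    print(find_ambiguous_words("Smoking is not permitted in the terminal building."))
```

The check ignores case, so it would also flag the month of May. Treat its findings as prompts for a reviewer, not as verdicts.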

There are many objects in data models, most of which need to be provided with a name. The quality of the name you give a model or diagram is not as critical to the organization as the names of entities and data elements, but you must apply the same thought processes to choosing that name. If you’re faced with four data models with similar names and no version numbers, how do you know which one you can rely on? The same is true for diagrams; it is essential to know what a diagram is meant to represent. If it helps (and it often does), consider providing definitions for models and diagrams that you expect others to re-use. PowerDesigner and other data modeling tools allow you to provide definitions for data models and diagrams, so take advantage of that.

Data modelers often overlook the need for quality, accurate relationship names – how many times have you looked at a relationship between two entities and had to ask the modeler what they mean by ‘has associated’? What if the modeler is no longer available?

In Figure 6.3, none of the relationships have been named. At first sight, it appears that the relationship from Contract to Contract Payment would probably be called ‘Results In’. However, on closer examination, we can see that Contract Payment includes two Contract ID attributes. They’re both foreign keys, so there is at least one relationship missing from the diagram; Figure 6.3 is obviously not the complete model. The relationship that we can see is fully optional, so it cannot be the dependent relationship via which Contract Payment inherits the primary key attribute Contract ID. So what is this mysterious optional relationship? If the modeler is not available, do we have time to re-do the analysis that led the modeler to create this relationship? Should it even be there – perhaps it was created in error, and overlooked by whoever reviewed the model?

Figure 6.3 Unnamed relationships

There is also uncertainty over the role of the four relationships involving Currency. Three of them are optional; what conditions would cause these relationships to exist? Why would a Contract or Contract Payment require a second Currency Code? Has the modeler made an assumption, valid or not, about standard industry practice? Will the potential audience for the model all make the same assumption, or a different one? To eliminate these uncertainties, provide meaningful relationship role names, and ensure that the descriptions of the foreign key attributes describe when and why they would exist. Don’t be surprised if asking for more details about the relationships uncovers further changes you need to make to the model.

Unless you’re working in a very simple modeling and application environment, it’s important to have naming standards for the key objects you’re going to create. It’s common to have naming standards for, say, Oracle databases, but you also need naming standards for conceptual and logical models.

For example, how do you structure the names of attributes in Logical Data Models, and how do they transform into database column names? Common considerations include the sequence of words in the name, and whether or not the entity name is included. I suggest you refer to Data Modeling Essentials (Simsion and Witt, 2005, pp. 166-171) for a discussion of the topic.
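To make the question concrete, here is a small Python sketch of one possible transformation from a logical attribute name to a column name. Everything in it is an assumption chosen for illustration: the upper-case, underscore-separated convention, the abbreviation list, and the optional entity-name prefix. It does not describe how PowerDesigner or any other tool performs the conversion.

```python
# Hypothetical abbreviation list; a real standard would be agreed by the organization.
ABBREVIATIONS = {"amount": "AMT", "date": "DT", "identifier": "ID", "number": "NBR"}


def to_column_name(attribute_name: str, entity_name: str = "") -> str:
    """Derive a column name from a logical attribute name.

    Assumed convention: upper case, words joined by underscores, listed words
    abbreviated, and the entity name used as a prefix when one is supplied.
    """
    words = (entity_name + " " + attribute_name).split()
    parts = [ABBREVIATIONS.get(word.lower(), word.upper()) for word in words]
    return "_".join(parts)


if __name__ == "__main__":
    print(to_column_name("Total Value Amount"))
    print(to_column_name("Total Value Amount", entity_name="Order"))
```

Writing the rule down as code forces the decisions mentioned above: the sequence of the words is preserved, and including the entity name becomes an explicit choice.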

PowerDesigner provides support for naming standards, and helps you to achieve consistency of names across models, especially when you generate one model from another. See Managing Names and Codes in Chapter 20 for more on this topic.

The PowerDesigner Glossary allows you to manage the terminology you use in your object names. See Deploying an Enterprise Glossary in Chapter 20.

Key Points

·         It’s vital that the definitions we provide for our data answer the reader’s questions.

·         Three main functions that make definitions so important

o                      Assist with decision making

o                      Reveal, document and resolve different perspectives

o                      Support data model precision.

·         When writing definitions, remember your audience.

·         Four characteristics of a high-quality definition

o        Clarity

o        Completeness

o        Accuracy

o        Lack of ambiguity.

·         It’s important to have naming standards for the key objects you’re going to create

o        Not just entities and data elements

o        Relationships are especially important.

·         PowerDesigner provides support for naming standards, and helps you to achieve consistency of names across models, especially when you generate one model from another.

·         The PowerDesigner glossary allows you to manage the terminology you use in your names.

 
