Section II
Levels of Granularity


This section covers Conceptual Data Modeling (Chapter 5), Logical Data Modeling (Chapter 6), and Physical Data Modeling (Chapter 7). Notice the “ing” at the end of each of these chapter titles. We focus on the process of building each of these models, because that process is where we gain essential business knowledge. This pyramid summarizes the three levels of design:

At the highest level, we have the Conceptual Data Model (CDM), which captures the satellite view of the business solution. The CDM provides the context for and scope of the Logical Data Model (LDM), which provides the detailed business solution. The LDM becomes the starting point for applying the impact of technology in the Physical Data Model (PDM). The PDM represents the actual MongoDB database structures.

In addition to the conceptual, logical, and physical levels of detail, there are also two different modeling mindsets: relational and dimensional.

Relational data modeling is the process of capturing how the business works by precisely representing business rules, while dimensional data modeling is the process of capturing how the business is monitored by precisely representing business questions.

The major difference between relational and dimensional data models is in the meaning of the relationships. On a relational data model, a relationship communicates a business rule, while on a dimensional data model, a relationship communicates a navigation path. On a relational data model, for example, we can represent the business rule “A Customer must have at least one Account.” On a dimensional data model, we can display the measure Gross Sales Amount and all of the navigation paths along which a user needs to see it, such as by day, month, year, region, account, and customer. The dimensional data model is all about answering business questions by viewing measures at different levels of granularity.
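The contrast can be sketched in plain JavaScript, the language of the MongoDB shell. This is only an illustration: the field names (`accounts`, `region`, `month`, `grossSalesAmount`), both helper functions, and the sample data are assumptions made up for this sketch. The first function checks a relational business rule; the second follows a dimensional navigation path by summing a measure at a chosen level.

```javascript
// Relational mindset: a relationship expresses a business rule.
// Rule from the text: "A Customer must have at least one Account."
// Storing accounts as an array field is an assumed representation.
function satisfiesAccountRule(customer) {
  return Array.isArray(customer.accounts) && customer.accounts.length >= 1;
}

// Dimensional mindset: a relationship expresses a navigation path.
// Sum the measure Gross Sales Amount along one path, e.g. "region" or "month".
function grossSalesBy(sales, path) {
  const totals = {};
  for (const sale of sales) {
    const key = sale[path];
    totals[key] = (totals[key] || 0) + sale.grossSalesAmount;
  }
  return totals;
}

// Made-up sample data for illustration only.
const sales = [
  { region: "East", month: "2014-01", grossSalesAmount: 100 },
  { region: "East", month: "2014-02", grossSalesAmount: 50 },
  { region: "West", month: "2014-01", grossSalesAmount: 25 },
];

console.log(satisfiesAccountRule({ accounts: [] }));      // false: rule violated
console.log(satisfiesAccountRule({ accounts: ["A-1"] })); // true
console.log(grossSalesBy(sales, "region")); // East: 150, West: 25
console.log(grossSalesBy(sales, "month"));  // 2014-01: 125, 2014-02: 50
```

The same measure is viewed at two different levels simply by changing the path, which is the sense in which a dimensional relationship is a navigation path rather than a rule.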

The following table summarizes these three levels of design and two modeling mindsets, leading to six different types of models:

Levels of Design | Relational Mindset | Dimensional Mindset
---|---|---
CDM | Key concepts and their business rules such as, “Each Customer may place one or many Orders.” | Key concepts focused on answering a set of business questions such as, “Can I see Gross Sales Amount by Customer?”
LDM | All attributes required for a given application, neatly organized into entities according to strict business rules, and independent of technology such as, “Each Customer ID value must return, at most, one Customer Last Name.” | All attributes required for a given analytical application, focused on answering a set of business questions and independent of technology, such as, “Can I see Gross Sales Amount by Customer and view the customer’s first and last name?”
PDM | The LDM modified to perform well in MongoDB. For example, “To improve retrieval speed, we need a non-unique index on Customer Last Name.” | The LDM modified to perform well in MongoDB. For example, “Because there is a need to view Gross Sales Amount at a Day level, and then by Month and Year, we should consider embedding all calendar fields into a single collection.”
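The two PDM examples can be sketched in JavaScript as follows. This is one possible reading, not a definitive design: the collection name `customer`, the field names, the `rollUp` helper, and the sample data are all assumptions made up for illustration.

```javascript
// Relational PDM example: a non-unique index on Customer Last Name.
// In the MongoDB shell, a key specification like this is passed to createIndex,
// for example: db.customer.createIndex(lastNameIndexKeys)
// Indexes are non-unique unless the { unique: true } option is given.
const lastNameIndexKeys = { customerLastName: 1 }; // 1 means ascending order

// Dimensional PDM example: the calendar fields are embedded in each sales
// document, so Day, Month, and Year are available without a second lookup.
const salesDocs = [
  { grossSalesAmount: 100, calendar: { day: "2014-03-15", month: "2014-03", year: 2014 } },
  { grossSalesAmount: 40,  calendar: { day: "2014-03-15", month: "2014-03", year: 2014 } },
  { grossSalesAmount: 60,  calendar: { day: "2014-04-01", month: "2014-04", year: 2014 } },
];

// Sum Gross Sales Amount at the requested calendar level.
function rollUp(docs, level) {
  const totals = {};
  for (const doc of docs) {
    const key = doc.calendar[level];
    totals[key] = (totals[key] || 0) + doc.grossSalesAmount;
  }
  return totals;
}

console.log(rollUp(salesDocs, "day"));   // 2014-03-15: 140, 2014-04-01: 60
console.log(rollUp(salesDocs, "month")); // 2014-03: 140, 2014-04: 60
console.log(rollUp(salesDocs, "year"));  // 2014: 200
```

Because every calendar level travels with the measure, moving from Day to Month to Year only changes which embedded field is used as the grouping key.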

This may seem like a lot of work, because we need to go through all three phases: conceptual, logical, and physical. Wouldn’t it be easier to just jump straight to building a MongoDB database and be done with it?

Going through the proper levels of design will take more time than just jumping into building a MongoDB database. However, the thought process we go through in building the application should ideally cover the steps we go through during these three levels of design anyway. For example, if we jump straight into building a MongoDB database, we would still need to ask at least some of the questions about definitions and business rules; it’s just that we would do it all at once instead of in separate phases. Also, if we don’t follow these modeling steps proactively, we will be asking the questions during support, where fixing things can be much more expensive in terms of time, money, and reputation. Believe me, I know: many of my consulting assignments involve fixing problems caused by skipping levels of design (e.g., jumping right to the physical). I can’t tell you how many times during my assignments I have heard a manager use the phrase “technical debt” to summarize the high maintenance cost and poor performance of applications built without conceptual and logical data models. For example, take the MongoDB document we created in the previous chapter:

{
  titleName : "Extreme Scoping",
  subtitleName : "An Agile Approach to Enterprise Data Warehousing and Business Intelligence",
  pageCount : 300
}

This is a very simple document with just three fields: the book’s title name, subtitle name, and page count. However, even with just these three fields, there are conceptual, logical, and physical questions that need to be answered.

During conceptual data modeling, we would address questions such as these:

  • What is the right name for the concept of “book”? Should we call it a “book” or a “title” or a “copy” or an “intellectual unit”?
  • What is a clear, correct, and complete definition for the concept of “book”? Once we get this definition agreed upon and documented, nobody else will need to go through this painful definition process again.
  • Is the scope only book, or can it include other important concepts such as author, publisher, and customer? That is, what is the scope of the application?
  • Can a book exist without an author?
  • Can a book be written by more than one author?

During logical data modeling, we would address questions such as these:

  • Is the book’s title name required? Is subtitle name required? Is page count required?
  • Can a book have more than one subtitle?
  • How do you identify a book?
  • Is an eBook considered a book?
  • Does an eBook have a page count?
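Once questions like these are answered, the answers can be written down as checkable rules. The sketch below assumes, purely for illustration, that title name is required while subtitle name and page count are optional; the book does not state these answers, and the `bookRules` object and `validateBook` function are hypothetical names.

```javascript
// Assumed answers to the logical questions (not taken from the book):
// title name is required; subtitle name and page count are optional.
const bookRules = {
  titleName:    { required: true,  type: "string" },
  subtitleName: { required: false, type: "string" },
  pageCount:    { required: false, type: "number" },
};

// Return a list of rule violations for one book document.
function validateBook(book) {
  const problems = [];
  for (const [field, rule] of Object.entries(bookRules)) {
    const value = book[field];
    if (value === undefined || value === null) {
      if (rule.required) problems.push(field + " is required");
    } else if (typeof value !== rule.type) {
      problems.push(field + " must be a " + rule.type);
    }
  }
  return problems;
}

const printedBook = {
  titleName: "Extreme Scoping",
  subtitleName: "An Agile Approach to Enterprise Data Warehousing and Business Intelligence",
  pageCount: 300,
};

console.log(validateBook(printedBook));                  // no violations
console.log(validateBook({ titleName: "Some eBook" }));  // no violations: page count is optional here
console.log(validateBook({ pageCount: "300" }));         // titleName is required, pageCount must be a number
```

Under these assumed rules an eBook without a page count is still a valid book. A different answer to “Does an eBook have a page count?” would change only the `required` flag, not the code.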

During physical data modeling, we would address questions such as these:

  • How many books do we have, and therefore, how much space will we need?
  • What are the performance impacts of loading and retrieving book data?
  • Where do we need additional indexes to further improve retrieval performance?
  • Should we embed or reference?
  • What are the history requirements?
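The embed-or-reference question can be made concrete with a small sketch. Everything about authors here is an assumption for illustration: the `authors` and `authorIds` fields, the placeholder author name, and the `authorNamesOf` helper are made up, and plain arrays stand in for MongoDB collections.

```javascript
// Option 1: embed. Author details live inside the book document,
// so one read returns everything.
const embeddedBook = {
  titleName: "Extreme Scoping",
  pageCount: 300,
  authors: [{ authorName: "Example Author" }], // placeholder name
};

// Option 2: reference. The book stores only author identifiers,
// and author details live in their own collection.
const authorCollection = [{ _id: 1, authorName: "Example Author" }];
const referencedBook = {
  titleName: "Extreme Scoping",
  pageCount: 300,
  authorIds: [1],
};

// With references, the application performs an extra lookup to get names.
// Identifiers with no matching author are skipped.
function authorNamesOf(book, authors) {
  return book.authorIds
    .map((id) => authors.find((author) => author._id === id))
    .filter((author) => author !== undefined)
    .map((author) => author.authorName);
}

console.log(embeddedBook.authors.map((a) => a.authorName));   // names read directly
console.log(authorNamesOf(referencedBook, authorCollection)); // names read via lookup
```

Embedding favors retrieval speed for a single book, while referencing avoids repeating author details in every book that author wrote. Which trade-off wins depends on the answers to the other physical questions, such as volume and retrieval patterns.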

By the end of this section, you will be able to appreciate, understand, and complete the three different phases of modeling for MongoDB applications.
