APPENDIX C

Glossary

abstraction

Abstraction is the removal of details in such a way as to broaden applicability to a wider class of situations, while preserving the important properties and essential nature of concepts or subjects. By removing these details, we remove differences and, therefore, change the way we view these concepts or subjects, including seeing similarities that were not apparent or even existent before. For example, we may abstract Employee and Consumer into the more generic concept of Person. A Person can play many Roles, two of which are Employee and Consumer.

alternate key

An alternate key is a candidate key that, although unique, was not chosen as the primary key but can still be used to find specific entity instances.

architect

An architect is an experienced and skilled designer responsible for system and/or data architecture supporting a broad scope of requirements over time, beyond the scope of a single project. The term implies a higher level of professional experience and expertise than an analyst, designer, or developer.

associative entity

An associative entity is an entity or table that resolves a many-to-many relationship between two other related entities or tables.
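
As a minimal sketch of this idea, the following uses Python's built-in sqlite3 module. The Student, Course, and Registration tables and their columns are hypothetical names chosen for illustration; Registration plays the role of the associative entity.

```python
import sqlite3

# Hypothetical entities: Student and Course have a many-to-many relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, last_name TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The associative entity: one row per student-course pairing.
    CREATE TABLE registration (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO course  VALUES (10, 'Data Modeling'), (20, 'Statistics');
    INSERT INTO registration VALUES (1, 10), (1, 20), (2, 10);
""")

# One student can take many courses...
courses_for_smith = [row[0] for row in conn.execute(
    "SELECT c.title FROM registration r "
    "JOIN course c ON c.course_id = r.course_id "
    "WHERE r.student_id = 1 ORDER BY c.title")]

# ...and one course can have many students.
students_in_data_modeling = [row[0] for row in conn.execute(
    "SELECT s.last_name FROM registration r "
    "JOIN student s ON s.student_id = r.student_id "
    "WHERE r.course_id = 10 ORDER BY s.last_name")]
```

Each many-to-many pairing becomes a single row in the associative table, so both directions of the relationship can be answered with an ordinary join.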

attribute

An attribute is a property of importance to the business. Its values contribute to identifying, describing, or measuring instances of an entity. The attribute Claim Number identifies each claim. The attribute Student Last Name describes the last name of each student. The attribute Gross Sales Amount measures the monetary value of a transaction.

business analyst

A business analyst is an IT or business professional responsible for understanding the business processes and the information needs of an organization, for serving as a liaison between IT and business units, and for acting as a facilitator of organizational and cultural change.

candidate key

A candidate key is one or more attributes that uniquely identify an entity instance. Sometimes a single attribute identifies an entity instance, such as ISBN for a book, or Account Code for an account. Sometimes it takes more than one attribute to uniquely identify an entity instance. For example, both a Promotion Code and Promotion Start Date are necessary to identify a promotion.
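
The promotion example can be sketched as a uniqueness check in Python. The function name and the sample promotions are assumptions made for illustration. Note that uniqueness within sample data can only suggest a candidate key; the business rule itself has to confirm it.

```python
def is_candidate_key(instances, attributes):
    """Return True if the given attributes uniquely identify every instance."""
    seen = set()
    for instance in instances:
        key_value = tuple(instance[a] for a in attributes)
        if key_value in seen:
            return False  # two instances share the same key value
        seen.add(key_value)
    return True


# Hypothetical sample data: the same promotion code is reused in different years.
promotions = [
    {"Promotion Code": "SUMMER", "Promotion Start Date": "2023-06-01"},
    {"Promotion Code": "SUMMER", "Promotion Start Date": "2024-06-01"},
    {"Promotion Code": "WINTER", "Promotion Start Date": "2023-12-01"},
]

code_alone = is_candidate_key(promotions, ["Promotion Code"])
code_and_date = is_candidate_key(
    promotions, ["Promotion Code", "Promotion Start Date"])
```

Here Promotion Code alone fails the check because SUMMER appears twice, while the combination of Promotion Code and Promotion Start Date passes.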

cardinality

Cardinality defines the number of instances of each entity that can participate in a relationship. It is represented by the symbols that appear on both ends of a relationship line. It is through cardinality that the data rules are specified and enforced. Without cardinality, the most we can say about a relationship is that two entities are connected in some way through a rule. For example, we may know that Person and Company have some kind of relationship, but we don't know much more than this.

class word

A class word is the last term in an attribute name, such as Amount, Code, and Name. Class words allow for the assignment of common domains.

conceptual data model (CDM)

A conceptual data model is a set of symbols and text representing the key concepts and rules binding these key concepts for a specific business or application scope, for a particular audience, that fits neatly on one page. It could be an 8 ½ x 11, 8 ½ x 14, or similar sized paper, but it cannot be a plotter-sized piece of paper. Limiting the conceptual data model to one page is important because it forces the modeler and participants to select only key concepts.

conformed dimension

A conformed dimension is one that is shared across the business intelligence environment. Customer, Account, Employee, Product, Time, and Geography are examples of conformed dimensions. Ralph Kimball made the term popular. It requires the modeler to design the conformed dimension with a much broader perspective than just the requirements for a single data mart.

data model

A data model is a wayfinding tool for both business and data professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization, thereby leading to a more flexible and stable application environment.

data modeler

A data modeler is one who identifies data requirements, defines data, and develops and maintains data models.

data modeling

Data modeling is the process of building a data model. More specifically, data modeling is the set of techniques and activities that enable us to capture the data to support the structure and operations of an organization, as well as a proposed information solution that will enable the organization to achieve its goals. The process requires many skills, such as listening ability, courage to ask lots of questions, and even patience.

database administrator (DBA)

The DBA is the data professional role responsible for database administration, the function of managing the physical aspects of data resources, including database design and integrity, backup and recovery, performance and tuning.

denormalization

Denormalization is the process of selectively violating normalization rules and reintroducing redundancy into the model (and therefore, the database). This extra redundancy can reduce data retrieval time, which is the primary reason for denormalizing. We can also denormalize to create a more user-friendly model. For example, we might decide to denormalize company information into an entity containing employee information, because usually when employee information is retrieved, company information is also retrieved.

developer

A developer is a person who designs, codes and/or tests software. Synonymous with software developer, systems developer, application developer, software engineer, and application engineer.

dimension

A dimension is reference information whose purpose is to add meaning to the measures. All of the different ways of filtering, sorting, and summing measures make use of dimensions. Dimensions are often, but not exclusively, hierarchies.

dimensional model

A dimensional model focuses on capturing and aggregating the metrics from daily operations that enable the business to evaluate how well it is doing by manipulating the numbers. For example, examining the measure Gross Sales Amount at a day level and then, after getting the answer, looking at Gross Sales Amount at a month or year level, or at a product or brand level, or a city or country level. The dimensional model is all about playing with numbers.
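
A small Python sketch of viewing one measure at different levels of detail follows. The sales rows, field names, and the total_by helper are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical detail rows: one row per product per day.
sales = [
    {"date": "2024-01-15", "product": "Vanilla", "gross_sales_amount": 100},
    {"date": "2024-01-15", "product": "Chocolate", "gross_sales_amount": 50},
    {"date": "2024-01-20", "product": "Vanilla", "gross_sales_amount": 70},
    {"date": "2024-02-03", "product": "Chocolate", "gross_sales_amount": 30},
]


def total_by(rows, level):
    """Sum Gross Sales Amount for each value returned by the level function."""
    totals = defaultdict(int)
    for row in rows:
        totals[level(row)] += row["gross_sales_amount"]
    return dict(totals)


by_day = total_by(sales, lambda r: r["date"])
by_month = total_by(sales, lambda r: r["date"][:7])  # 'YYYY-MM' prefix
by_product = total_by(sales, lambda r: r["product"])
```

The same measure is summed three ways; only the level of detail chosen for the grouping changes.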

domain

A domain is the complete set of all possible values that an attribute may be assigned. A domain is a set of validation criteria that can be applied to more than one attribute. For example, the domain Date, which contains all possible valid dates, can be assigned to any of these attributes:

Employee Hire Date

Order Entry Date

Claim Submit Date

Course Start Date
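
One way to picture a shared domain is as a single validation rule reused by several attributes. The sketch below assumes dates arrive as ISO 8601 strings; the function names and the attribute_domains mapping are hypothetical.

```python
from datetime import date


def in_date_domain(value):
    """Validation rule for the Date domain: an ISO 8601 string naming a real date."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# The same domain is assigned to every date attribute.
attribute_domains = {
    "Employee Hire Date": in_date_domain,
    "Order Entry Date": in_date_domain,
    "Claim Submit Date": in_date_domain,
    "Course Start Date": in_date_domain,
}


def validate(attribute, value):
    """Check a value against the domain assigned to the attribute."""
    return attribute_domains[attribute](value)


valid = validate("Employee Hire Date", "2024-02-29")  # leap day, a real date
invalid = validate("Order Entry Date", "2023-02-30")  # no such calendar date
```

Because the rule is defined once, a change to the Date domain applies to all four attributes at the same time.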

enterprise data model

An Enterprise Data Model (EDM) is a subject-oriented and integrated data model containing all of the data produced and consumed across an entire organization. Subject-oriented means that the concepts on a data model fit together as the CEO sees the company, rather than how any individual functional or department heads see their view of the company. There is one Customer entity, one Order entity, etc. Integration goes hand in hand with subject-orientation. Integration means that all of the data and rules in an organization are depicted once and fit together seamlessly. Every attribute has a single definition and name. Integration implies that with this single version of the truth comes a mapping back to the chaotic real world.

entity

An entity represents a collection of information about something that the business deems important and worthy of capture. A noun or noun phrase identifies a specific entity. It fits into one of several categories: who, what, when, where, why, or how.

entity instance

Entity instances are the occurrences or values of a particular entity. Think of a spreadsheet as being an entity, with the column headings representing the pieces of information about the entity. Each spreadsheet row containing the actual values represents an entity instance. The entity Customer may have multiple customer instances with names Bob, Joe, Jane, and so forth. The entity Account can have instances of Bob's checking account, Bob's savings account, Joe's brokerage account, and so on.

Extensible Markup Language (XML)

Extensible Markup Language (XML) is a specification for storing information, and for describing the structure of that information. XML is both useful and powerful for the same reasons any data model is useful and powerful: it is easy to understand, can be technology-independent, and enables representing complex problems with simple syntax. Similar to distinguishing conceptual data models from logical data models from physical data models, XML distinguishes the data content from formatting (e.g. blue, Arial, 15 point font) from rules.

factless fact

A fact table that does not contain any facts (i.e. measures) is called a factless fact. Factless facts count events by summing relationship occurrences between the dimensions. For example, a fact table called Attendance contains no measures. It links to the Student, Course, and Semester dimensions with the goal of counting relationship occurrences of how many students take a particular course in a particular semester.
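
The Attendance example might look like the following in Python. The rows are invented sample data, and each tuple stands for one occurrence of the relationship among Student, Course, and Semester.

```python
from collections import Counter

# Hypothetical Attendance rows: (student, course, semester), with no measures.
attendance = [
    ("Bob", "Data Modeling", "Fall 2024"),
    ("Jane", "Data Modeling", "Fall 2024"),
    ("Joe", "Statistics", "Fall 2024"),
    ("Bob", "Data Modeling", "Spring 2025"),
]

# Counting rows per course and semester answers "how many students took it?"
students_per_course_semester = Counter(
    (course, semester) for _student, course, semester in attendance)
```

The count comes from the presence of rows rather than from any stored numeric fact.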

foreign key

A foreign key is an attribute that provides a link to another entity. A foreign key allows a database management system to navigate from one entity to another. For example, we need to know who owns an Account, so we would want to include the identifier of the customer to whom it belongs in the entity. The Customer ID in Account is the primary key of that Customer in the Customer entity. Using this foreign key back to Customer enables the database management system to navigate from a particular account or accounts, to the customer or customers that own each account. Likewise, the database can navigate from a particular customer or customers, to find all of their accounts.
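
The Customer and Account navigation can be sketched with Python's built-in sqlite3 module. The table layouts and sample rows are assumptions for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE account (
        account_code TEXT PRIMARY KEY,
        -- Foreign key: the primary key of the owning customer.
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, 'Bob'), (2, 'Joe');
    INSERT INTO account VALUES ('CHK-1', 1), ('SAV-1', 1), ('BRK-2', 2);
""")

# Navigate from an account to the customer who owns it.
owner = conn.execute(
    "SELECT c.name FROM account a "
    "JOIN customer c ON c.customer_id = a.customer_id "
    "WHERE a.account_code = 'BRK-2'").fetchone()[0]

# Navigate from a customer to all of that customer's accounts.
bobs_accounts = [row[0] for row in conn.execute(
    "SELECT account_code FROM account WHERE customer_id = 1 "
    "ORDER BY account_code")]

# An account that points at a nonexistent customer is rejected.
try:
    conn.execute("INSERT INTO account VALUES ('CHK-9', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The last step shows a side benefit of declaring the foreign key: the database refuses an account whose customer does not exist.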

grain

The grain is the meter's lowest level of detail. It should be low enough that the answers to all of the business questions within the scope of the dimensional model are derivable. It is generally a good practice to define the measures and grain as early as possible in the requirements process. In the ice cream model in this book, the grain is Ice Cream Container and Day.

index

An index is a pointer to something that needs to be retrieved. An analogy often used is the card catalog in a library, which points you to the book you need. The card catalog will point you to the place where the actual book is on the shelf, a process that is much quicker than looking through each book in the library until you find the one you need. Indexing works the same way with data. The index points directly to the place on the disk where the data is stored, thus reducing retrieval time. Indexes work best on attributes whose values are requested frequently but rarely updated.

key

A key is one or more attributes that help us find specific entity instances. An ISBN (International Standard Book Number) can help us find a particular book. A particular tax identifier can help us find an organization. The key Account Code can help us find a particular account.

logical data model (LDM)

A logical data model (LDM) is a business solution to a business problem. It is how the modeler captures the business requirements without complicating the model with implementation concerns such as software and hardware.

measure

A measure is an attribute that may be manipulated in or is the result of a calculation (e.g., sum, count, average, minimum, and maximum).

metadata

Metadata is text, voice, or image that describes what the audience wants or needs to see or experience. The audience could be a person, group, or software program. Metadata is important because it aids in clarifying and finding the actual data.

A particular context or usage can turn what we traditionally consider data into metadata. For example, search engines allow users to enter keywords to retrieve web pages. These keywords are traditionally data, but in the context of search engines they play the role of metadata. In much the same way that a particular person can be an Employee in one role and a Customer in another role, text, voice, or image can play different roles, sometimes playing data and sometimes playing metadata, depending on what is important to a particular subject or activity.

meter

A meter is an entity containing a related set of measures. It is not a person, place, event, or thing, as we find on the relational model. Instead, it is a bucket of common measures. As a group, common measures address a business concern, such as Profitability, Employee Satisfaction, or Sales. The meter is so important to the dimensional model that the name of the meter is often the name of the application.

natural key

A natural key is what the business sees as the unique identifier for an entity.

normalization

Normalization is the process of applying a set of rules with the goal of organizing something. With respect to attributes, normalization ensures that every attribute is single-valued and provides a fact completely and only about its primary key. "Single-valued" means an attribute must contain only one piece of information. If Consumer Name contains Consumer First Name and Consumer Last Name, for example, we must split Consumer Name into two attributes: Consumer First Name and Consumer Last Name. "Provides a fact" means that a given primary key value will always return no more than one of every attribute that is identified by this key. If a Customer Identifier value of 123, for example, returns three customer last names (Smith, Jones, and Roberts), this violates the dependency definition. "Completely" means that the minimal set of attributes that uniquely identify an instance of the entity is present in the primary key. If, for example, there are two attributes in an entity's primary key, but only one is needed for uniqueness, the attribute that is not needed for uniqueness should be removed from the primary key. "Only" means that each attribute must provide a fact about the primary key and nothing else. That is, there can be no hidden dependencies.

object

An object in an object-oriented design is synonymous with a class; an entity that combines descriptions of the common behavior of like instances along with their common data attributes. Objects may be business objects, interface objects, or control objects.

ontology

An ontology is a formal way of organizing information. It includes putting things into categories and relating these categories with each other. The most quoted definition of an ontology is Tom Gruber's definition: "Explicit specification of a conceptualization." In other words, an ontology is a model, a model being a simplification of something complex in our environment using a standard set of symbols.

partition

In general, a partition is a structure that divides or separates. Specific to the physical design, partitioning is used to break a table into rows, columns, or both. There are two types of partitioning: vertical and horizontal. To understand the difference between these two types, visualize a physical entity in a spreadsheet format where the attributes are the columns in the spreadsheet and the entity instances are the rows. Vertical means up and down. So vertical partitioning means separating the columns (the attributes) into separate tables. Horizontal means side to side. So horizontal partitioning means separating rows (the entity instances) into separate tables.
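
The two kinds of partitioning can be sketched on a small in-memory table in Python. The customer rows, column names, and helper functions are hypothetical; a real database would do this with separate physical tables or built-in partitioning features.

```python
# Hypothetical table: each dict is a row, each dict key is a column.
customers = [
    {"customer_id": 1, "name": "Bob", "region": "East", "notes": "prefers email"},
    {"customer_id": 2, "name": "Joe", "region": "West", "notes": "call after 5pm"},
    {"customer_id": 3, "name": "Jane", "region": "East", "notes": ""},
]


def vertical_partition(rows, key, columns):
    """Keep the key plus the chosen columns for every row (splits by attribute)."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def horizontal_partition(rows, predicate):
    """Keep only the rows that satisfy the predicate (splits by instance)."""
    return [row for row in rows if predicate(row)]


# Vertical: the columns are divided between two tables that share the key.
customer_core = vertical_partition(customers, "customer_id", ["name", "region"])
customer_notes = vertical_partition(customers, "customer_id", ["notes"])

# Horizontal: the rows are divided between two tables with identical columns.
customers_east = horizontal_partition(customers, lambda r: r["region"] == "East")
customers_west = horizontal_partition(customers, lambda r: r["region"] == "West")
```

In the vertical case the key is repeated in each resulting table so the pieces can be joined back together; in the horizontal case every resulting table keeps all of the columns.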

physical data model (PDM)

The physical data model (PDM) is the logical data model modified for a specific set of software or hardware. The PDM often gives up perfection for practicality, factoring in real concerns such as speed, space, and security.

primary key

A primary key is a candidate key that has been chosen to be the unique identifier for an entity.

program

A program is a large, centrally organized initiative that contains multiple projects. It has a start date and, if successful, no end date. Programs can be very complex and require long-term modeling assignments. Examples include a data warehouse, operational data store, and a customer relationship management system.

project

A project is a plan to complete a software development effort, often defined by a set of deliverables with due dates. Examples include a sales data mart, broker trading application, reservations system, and an enhancement to an existing application.

project manager

A project manager is a person who manages project resources and activities in order to deliver the agreed-upon project outputs.

recursive relationship

A recursive relationship is a relationship between instances of the same entity. For instance, one organization can report to another organization.

relational model

A relational model captures how the business works and contains business rules, such as "A Customer must have at least one Account" or "A Product must have a Product Short Name."

relationship

Rules are captured on our data model through relationships. A relationship is displayed as a line connecting two entities. If the two entities are Employee and Department, the relationship may capture the rules "Each Employee must work for one Department" and "Each Department may contain many Employees."

slowly changing dimension (SCD)

Reference entity instances will experience changes over time, such as a person moving to a new address, a product name changing, or an account description being updated. On dimensional models, there is a special term that describes how to handle changing values: Slowly Changing Dimension (SCD). An SCD of Type 1 means only the most current information will be stored. An SCD of Type 2 means the most current along with all history will be stored. And an SCD of Type 3 means the most current and some history will be stored.
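
The three types can be contrasted with a small Python sketch that applies the same address change three ways. The function names, the start_date and end_date columns for Type 2, and the single previous_address column for Type 3 are assumptions chosen for illustration; they are one common way to realize each type, not the only one.

```python
def apply_type1(row, new_address):
    """Type 1: overwrite, so only the most current value is kept."""
    return {**row, "address": new_address}


def apply_type2(history, new_address, change_date):
    """Type 2: close the open row and append a new one, keeping all history."""
    closed = [
        {**r, "end_date": change_date} if r["end_date"] is None else r
        for r in history
    ]
    new_row = {**history[-1], "address": new_address,
               "start_date": change_date, "end_date": None}
    return closed + [new_row]


def apply_type3(row, new_address):
    """Type 3: keep the current value plus one prior value (some history)."""
    return {**row, "previous_address": row["address"], "address": new_address}


# Hypothetical customer who moves from 12 Oak St to 9 Elm Ave.
type1 = apply_type1({"customer_id": 1, "address": "12 Oak St"}, "9 Elm Ave")
type2 = apply_type2(
    [{"customer_id": 1, "address": "12 Oak St",
      "start_date": "2020-01-01", "end_date": None}],
    "9 Elm Ave", "2024-03-01")
type3 = apply_type3(
    {"customer_id": 1, "address": "12 Oak St", "previous_address": None},
    "9 Elm Ave")
```

Type 1 ends with one row and no trace of the old address, Type 2 ends with two rows covering both periods, and Type 3 ends with one row that remembers only the immediately prior address.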

snowflake

A snowflake occurs when there are one or more tables for each dimension. Sometimes the snowflake structure is equivalent to the dimensional logical data model, where each level in a dimension hierarchy exists as its own table. Sometimes, in a snowflake there can be even more tables than exist on the dimensional logical data model. This is because vertical partitioning is applied to the dimensional model.

spreadsheet

A spreadsheet is a representation of a paper worksheet, containing a grid defined by rows and columns, where each cell in the grid can contain text or numbers.

stakeholder

A stakeholder is a person who has an interest in the successful completion of a project. Examples of stakeholders are project sponsors, business users, and team leads.

star schema

A star schema is the most common dimensional physical data model structure. A star schema results when each set of tables that make up a dimension is flattened into a single table. The fact table is in the center of the model and each of the dimensions relates to the fact table at the lowest level of detail. A star schema is relatively easy to create and implement, and visually appears elegant and simple to both IT and the business.

subject matter expert (SME)

A subject matter expert is a person with significant experience and knowledge of a given topic or function.

surrogate key

A surrogate key is a primary key that substitutes for a natural key, which is what the business sees as the unique identifier for an entity. It has no embedded intelligence and is used by IT (and not the business) for integration or performance reasons. Surrogate keys are useful for integration, which is an effort to create a single, consistent version of the data. Applications such as data warehouses often house data from more than one application or system. Surrogate keys enable us to bring together information about the same entity instance that is identified differently in each source system. Surrogate keys are also efficient. You've seen that a primary key may be composed of one or more attributes of the entity. A single surrogate key is more efficient to use than having to specify three or four (or five or six) attributes to locate the single record you're looking for.

taxonomy

A taxonomy is an ontology in the form of a tree. In a tree, a child has only a single parent, and a parent can contain one or more children. If a child can have more than one parent, then the child is typically repeated for each parent. Examples of kinds of taxonomies are product categorizations, supertype/subtype relationships on a relational data model, and dimensional hierarchies on a dimensional data model.

use case

In object-oriented analysis, a use case is a work flow scenario defined in order to identify objects, their data, and their methods (process steps).

user

A user is a person who enters information into an application or queries the application to answer business questions and produce reports.

view

A view is a virtual table. It is a dynamic view or window into one or more tables (or other views) where the actual data is stored. A view is defined by a query that specifies how to collate data from its underlying table(s) to form an object that looks and acts like a table but doesn't physically contain data. A query is a request that a user (or reporting tool) makes of the database, such as "Bring me back all Customer IDs where the Customer is 90 days or more behind in their bill payments." The difference between a query and a view, however, is that the instructions in a view are already prepared for the user (or reporting tool) and stored in the database as the definition of the view, whereas a query is not stored in the database and may need to be written each time a question is asked.
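
The "90 days or more behind" request can be sketched as a stored view using Python's built-in sqlite3 module. The customer_bill table, its days_past_due column, and the view name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_bill (
        customer_id   INTEGER PRIMARY KEY,
        days_past_due INTEGER NOT NULL
    );
    INSERT INTO customer_bill VALUES (1, 10), (2, 95), (3, 120);
    -- The view stores only the query definition, not a copy of the data.
    CREATE VIEW delinquent_customer AS
        SELECT customer_id FROM customer_bill WHERE days_past_due >= 90;
""")

view_query = "SELECT customer_id FROM delinquent_customer ORDER BY customer_id"

before = [row[0] for row in conn.execute(view_query)]

# Customer 2 pays the bill; the view reflects the change with no extra work.
conn.execute("UPDATE customer_bill SET days_past_due = 0 WHERE customer_id = 2")

after = [row[0] for row in conn.execute(view_query)]
```

Because the view is evaluated against the underlying table each time it is queried, the second result changes as soon as the data does.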

wayfinding

Wayfinding encompasses all of the techniques and tools used by people and animals to find their way from one site to another. If travelers navigate by the stars, for example, the stars are their wayfinding tools. Maps and compasses are also wayfinding tools. All models are wayfinding tools. A model is a set of symbols and text used to make a complex concept easier to grasp.
