Glossary

6 sigma Six standard deviations used to describe a level of quality in which six standard deviations of the population fall within the upper and lower limits of quality

Access The operation of seeking, reading, or writing data on a storage unit

Accuracy to reality A characteristic of information quality measuring the degree to which a data value correctly represents the attributes of the real-world object or event

Accuracy to surrogate source A measure of the degree to which data agree with the original, acknowledged authoritative source of data about a real-world object or event

Acronym resolution The process of expanding acronyms into their literal meaning

Accuracy A qualitative assessment of freedom from error or a quantitative measurement of the magnitude of error

Active data dictionary An automated metadata management facility that is tightly and interactively woven into the development and analysis process

Actuary A professional mathematician trained in the art of studying life expectancy and accident probabilities

Ad hoc processing One time only casual access and manipulation of data never used before, usually done in a heuristic, iterative manner

Address The location of a unit of data

After image A snapshot of data placed on a log after the conclusion of a transaction

Agent of change A motivating force large enough not to be denied

Algorithm The instructions that govern the flow of activity in a procedure

Alternate spelling A different way of forming a word pattern

Amazon.com A successful dot.com retailer company

Analog A type of computing driven by sensory perceptions and signals, as opposed to a digital computer

Anchor data in a DIS The key attribute(s) of a DIS

API Application programming interface

Application A computerized system dedicated to solving or empowering a specific business function

Application blocking of data The grouping of different occurrences of data into a single unit of storage controlled by the application programmer

Application database A collection of data organized in support of a specific function

Archival database A collection of data containing information of a historical nature

Archival processing The activities surrounding older and/or inactive data

Array of data A data structure that holds multiple occurrences of data

Artifact A design technique used to support referential integrity in a decision support system (DSS) environment

ATM Automated teller machine—a “money machine”

Attribute A value of data that is distinguishable from other values

Audit trail Data that are useful in tracing the activity of one or more transactions

Availability The measurement of time for the online system to be up and running

Backup A file serving the purpose of allowing an online file to be restored as of some moment in time

Batch Computer environment in which long running sequential programs can run where there is no conflict with the online transaction environment

Batch processing The collection of transactions into “batches” that are processed collectively

BCD Binary-coded decimal
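
As an illustration of binary-coded decimal, the sketch below stores each decimal digit in its own 4-bit group. It assumes the "packed" convention (two digits per byte) and non-negative integers only; the function names `to_bcd` and `from_bcd` are hypothetical and chosen for this example.

```python
def to_bcd(number: int) -> bytes:
    """Pack a non-negative integer into BCD, two decimal digits per byte."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # pad so the digits pair up evenly
    # High nibble holds the first digit of each pair, low nibble the second.
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def from_bcd(packed: bytes) -> int:
    """Unpack BCD bytes back into an integer."""
    return int("".join(f"{b >> 4}{b & 0x0F}" for b in packed))
```

For example, `to_bcd(1234)` yields the two bytes `0x12` and `0x34`, so the hexadecimal view of the stored value reads like the decimal number itself.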

BI Business intelligence

Bias The condition in sampling where the sample contains data that are not representative of the whole

Bill of materials A listing of the components of an assembly

Blather E-mail messages generated internally that have no business relevance

Block of data A large physical unit of data that can contain records of data

Blog A personal diary that is open for the public to scrutinize

Boiler plate Text that is copied verbatim for the purpose of serving as a general template

Browser A program executing on a client to interpret a Web page (usually in HTML) and render a proper image of that page

Buffer A work space, usually in memory

Business process A synonym for value chain, the term used to differentiate a value chain of activities from a functional process or functional set of activities

Business rule A statement expressing a policy, guideline, or condition that governs business activities and/or business decisions

Byte A basic unit of storage, usually 8 bits in length

C Name of a programming language first developed as part of the UNIX project at AT&T but now widely used by personal computer software developers

Cache A buffer inside the computer built and maintained at the device level. Retrieval of data stored in cache is accomplished in terms of electronic speeds

Call center A facility of the organization where an agent of the organization can engage in conversation with other people

CASE Computer-aided software engineering—generally refers to a class of software products that are used to partially automate the design and development of other software

CDC Change data capture—the incremental changes to a database are captured and stored and then retransacted or logged onto another database

Cell of a spreadsheet A basic unit of data found in a spreadsheet

Change data capture (CDC) The data that are gathered incrementally as a result of transaction processing in order to form the basis of update to a data warehouse

Class I ODS An ODS whose latency is measured in 1 second or less

Class II ODS An ODS whose latency is measured in 4 hours or less

Class III ODS An ODS whose latency is measured in 24 hours or less

Client The node in a client-server architecture that initiates a request to a server and processes the results

Click stream data Automated measurements of the activity occurring on a web site

Cluster A means of storing data from multiple tables based on a common key value

COBOL Common business-oriented language—an early popular computer language, designed for the business user (see Grace Hopper)

Code (1) A symbolic value or (2) instructions written in a language directing the computer how to proceed

Collision The mapping of two or more records to the same location by the hasher

Column A vertical table in which values are selected from the same domain

Comments A field of data containing free-form text

Compliance Business rules enforced by legislation or some other governing body

Connector A symbol used to indicate that one logical grouping of data has a relationship with another logical grouping of data

Constraint The business rule that places a restriction on business actions and/or decisions

Content enriched Big data whose content has been contextualized

Context The surrounding environment that gives definition to a word

Contextualization The process of identifying the context of a word

Core An early form of storage for storing data available to the CPU. Core operated under the principles governed by the hysteresis curve

Corporate data The entire body of data of the corporation

CRM Customer relationship management—a popular DSS application used to streamline customer relationships

CRT Cathode-ray tube—a display device; a screen

Cullinet An early DBMS vendor selling a networked database management system

Current valued data Data whose accuracy is as of the moment of access; online data

Curve of usefulness The curve that indicates that the fresher data are, the more likely they are to be useful

Customer The user or consumer of a product or a service

Cycle The complete steps required to execute a process

Cycle time The measurement of the time required to complete a cycle

Data analyst An individual who gathers and analyzes the results of the execution of a process

Database A structured collection of units of data organized around some topic or theme

Data definition The process of defining the semantics of data

Data element An attribute belonging to an entity

Data flow diagram (DFD) A schematic indicating the direction of the movement of data

Data governance The activities necessary to the management of integrity of data

Data integrity The assurance of the timeliness and the accuracy of data in a database system

Data item set (DIS) The midlevel data model

Data mart A subset of a data warehouse that’s usually oriented to a business group or team

Data mining Analysis of large quantities of data to find patterns such as groups of records, unusual records, and dependencies

Data model An abstraction of data

Data quality The properties of data embodied by the “five Cs”: clean, consistent, conformed, current, and comprehensive

Data scientist An individual dedicated to the study of patterns found in data

Data store (1) A component of a DFD in which data are shown to be collected outside of a process or (2) a place where data are kept

Data structure A logical relationship among data elements designed to support specific data manipulation functions

Data virtualization The process of retrieving and manipulating data without requiring details of how the data are formatted or where the data are located

Data visualization Presenting data in a visual way, such as with graphs and charts, helps business people glean insights they might not otherwise see. Dashboards use the concept of data visualization to present data for analysis. It is often a part of self-service BI but is only as effective as the quality of the data it draws upon

Data warehouse A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management’s decisions

DB2/UDB Database management system by IBM

DBMS Database management system—system software that manages the storage and access of data on disk storage

DC Data communications—technology that manages messages generated as part of transaction processing

Decryption The process of returning text to its original state after that text has been encrypted

Defect An item that does not conform to expected quality standards

Denormalization The design technique of placing normalized data in a structure so that access to the data is optimized

Dependent data mart A data mart whose sole source of data is the data warehouse; a dependent data mart is a component of the corporate information factory

Derived data Data whose value is achieved as the result of a calculation

Dimension A category for summarizing or viewing data (e.g., a time period, product, product line, and geographic region)

Dimensional modeling A generally accepted practice in the data warehouse industry to structure data intended for user access, analysis, and reporting in dimensional data models

Dimension table The place where extraneous data that relate to a fact table inside a star join are placed

Direct access of data The ability of a database management system to directly find data, as opposed to having to sequentially search for data

Directory A table, block, folder, or database containing indexes and their interpretation

DIS Data item set—the midlevel of a data model

Disk storage Physical media used for storing values of data

Distillation The process of analyzing a large number of records (usually big data records) and producing a single result

Document A basic unit of textual data

Documentation Verbiage describing a system, application, database, procedure, etc.

Document fracturing In textual disambiguation, the process of sequentially processing text looking for text that satisfies such criteria as stop word processing, stemming, homographic resolution, and so forth

Download The movement of a bulk amount of data from one environment to another

Drill down processing The analytic activity of examining an element of data at a lower level of detail after examining the value of data at a higher level

Ed Yourdon An information technology pioneer who started the “structured” movement

Electronic text Text in a form where the words of the text are recognized by the computer

ELT Extract/load/transform—the process of extracting, loading, and transforming data. The problem with ELT is that many organizations only extract and load the data but fail to transform the data

E-mail Messages from one party to another carried on an electronic medium

Encoding The process of encryption of text into a form unrecognizable by an outsider

Encryption The process of scrambling data into a form that is not recognizable

Entity A broad classification of data; a subject area

ERD Entity relationship diagram—a logical description of how the major subject areas of the corporation fit together

ERP Enterprise resource planning—the name given to technology where applications are written by a vendor where there are multiple users of the software

ETL Extract, transform, and load—the process in which data are taken from the source system, configured, and stored in a data warehouse or database. ETL tools automate data integration tasks

Event The demarcation or recording made of the passage of some activity

External data Data whose source is outside of the system of the organization

Export The process of moving data from one environment to another

Fact table The data structure where basic facts in a star join are stored

Farmer A person in the organization who does analytic work that is repetitive and predictable

Feedback loop A procedure where the results of one iteration of processing are made available for the next iteration of processing

Field An element of data; an attribute

File A collection of records

File structure The organization of the collection of records

Filter The process of removing data from a set of data based on the value of one or more fields of data

Flat file A collection of records where the structure of each record is identical

Foreign key An attribute used for distinguishing a record that participates in a relationship with another table

Format The arrangement of data onto a data structure

Functional decomposition The process of reducing a large function or process into smaller finer functions

Generic data model A data model of an industry, rather than of a specific company. A generic data model can be used as a template that can be customized for a given company within the industry that has been modeled

Granularity The level of detail found in a record of data

Great divide The division of big data between repetitive data and nonrepetitive data

GUI Graphical user interface

Hadoop Technology designed to house big data; a framework for managing data

Hashing algorithm An algorithm that converts data values into an address
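
A minimal sketch of the idea, and of the collision described earlier in this glossary, follows. The hasher here is a deliberately weak toy (sum of character codes modulo the number of slots), assumed for illustration only; `toy_hash` and `find_collisions` are hypothetical names, not part of any DBMS.

```python
def toy_hash(key: str, slots: int) -> int:
    """Convert a key into an address in the range 0..slots-1."""
    return sum(ord(ch) for ch in key) % slots


def find_collisions(keys, slots):
    """Group keys by address and keep only addresses holding 2+ keys."""
    table = {}
    for key in keys:
        table.setdefault(toy_hash(key, slots), []).append(key)
    return {addr: ks for addr, ks in table.items() if len(ks) > 1}


# "ab" and "ba" contain the same characters, so this toy hasher
# maps both of them to the same address: a collision.
collisions = find_collisions(["ab", "ba", "ad"], slots=8)
```

Because the toy hasher ignores character order, any two keys that are anagrams of each other collide; practical hashing algorithms are designed to spread keys far more evenly.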

Heuristic process An iterative process, where the next step of analysis depends on the results attained in the current level of analysis

HIPAA The law protecting medical privacy

Hit An occurrence of data that satisfies one or more search criteria

Hollerith punched cards An early means of storing data, typically containing 80 columns

Homograph A word or phrase whose interpretation depends on the person who originally wrote the word or phrase

Homographic resolution The process of contextualizing data based on the identity of the person who uttered the text

Host The processor receiving and processing a transaction

HTML Hypertext markup language

IBM A large computer manufacturer

IBM 360 A machine that standardized operating systems. With the IBM 360 line, there was compatibility of processing across different machine types. A revolutionary technology that changed the face of computing

Identifier An attribute used to pick out a row of data from a collection of rows of data

IDMS A network DBMS by Cullinet

Image A picture, such as a real estate photo of a house for sale or an x-ray

IMS Information management system—a hierarchical DBMS by IBM

Index A structure that shows the address of a database record based on a value found in the record

In-line contextualization The technique of inferring context by establishing a beginning delimiter and an ending delimiter
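
One way to picture in-line contextualization is a pattern match between two delimiters. The sketch below is an assumption-laden illustration using Python's `re` module, not the mechanism of any textual ETL product; the function name `inline_context`, the delimiters, and the label are all hypothetical.

```python
import re


def inline_context(text, begin, end, label):
    """Return (label, value) pairs for text found between two delimiters."""
    # Escape the delimiters so they are matched literally, and capture
    # the shortest run of text that sits between them.
    pattern = re.escape(begin) + r"\s*(.*?)\s*" + re.escape(end)
    return [(label, m.group(1)) for m in re.finditer(pattern, text)]


note = "Patient name: Jane Doe; admitted for observation"
found = inline_context(note, begin="name:", end=";", label="patient_name")
```

Here the beginning delimiter `name:` and the ending delimiter `;` bracket the value, so the words in between are inferred to be a patient name.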

Inmon, Bill The father of data warehouse and textual disambiguation

Instance A member of a shared partition database system, such as an Oracle cluster

Integrity of data The assurance that data are correct and accurate as stored

Internet The system by which data are stored and are made available to a large audience

Interactive A mode of processing in which the end user directly moves data into and out of a system

Intranet A TCP/IP network that is physically separated from the Internet

Inverted list A data structure in which a flat file is indexed

I/O Input/output operation—the activity of reading or writing a record to disk storage. I/O operations happen in terms of mechanical speeds

ISO International Organization for Standardization

IT The information technology organization—the organizational entity charged with building and managing applications and technology systems

Iterative process A process that is done in short finite steps, where there are many steps, but where each step is taken quickly

Join The process of merging two or more tables on the basis of a common key
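
The merge on a common key can be sketched in a few lines. This example assumes an inner join (rows without a match are dropped) over tables represented as lists of dictionaries; the function name `join` and the sample tables are hypothetical.

```python
def join(left, right, key):
    """Merge two tables (lists of dict rows) on a common key column."""
    # Index the right-hand table by key so each left row is matched quickly.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Emit one merged row per matching pair; unmatched rows are dropped.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bo"}]
orders = [
    {"cust_id": 1, "amount": 50},
    {"cust_id": 1, "amount": 20},
    {"cust_id": 3, "amount": 5},
]
merged = join(customers, orders, "cust_id")
```

Customer 2 has no orders and order key 3 has no customer, so neither appears in `merged`; only the two rows for customer 1 survive the join.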

Key An identifying attribute of data

KPI Key performance indicator—a measurement made periodically by the organization that examines important variables

Language The text that is used to communicate with the computer. Some languages are optimized for ease of use. Other languages are optimized for speed of processing

Legacy systems The older systems used to run the business of the corporation as it was defined 10 or 20 years ago

Line The hardware by which data flow into or out of a device

Lineage of data The “family tree” of data. Data are transformed in many ways as they pass through a system. The lineage is a record of the transformations of data from the moment they enter a system until they are used in analysis.

Link The mechanism by which two systems or two environments form a common relationship

Linux An operating system

Load utility A utility provided by a DBMS vendor in which data are efficiently loaded into the DBMS

Lock The means by which data are protected from other update processes while the transaction that is updating the data is in execution

Log A journal of activities

Log tape A sequential record of the activities that have occurred inside a system. Sometimes called a “journal” tape. The primary purpose of a log tape is for backup and recovery of a system

Machine cycle A full cycle of processing inside a computer

Magnetic tape An early sequential storage mechanism

Mainframe The monolithic processors produced by IBM and Amdahl

Mapping The instructions to textual ETL as to how to interpret a document or type of document

MapReduce A programming model for processing big data

MDM Master data management—the set of processes used to create and maintain a consistent view, also referred to as a master list, of key enterprise reference data. These data include such entities as customers, prospects, suppliers, employees, products, services, assets, and accounts. They also include the groupings and hierarchies associated with these entities

Mean The average value of a set of values

Median value The middle value of a set of values when the values are ranked according to value
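
The difference between the mean and the median is easiest to see on a small set of values. The sketch below assumes one common convention that the definition above does not state: when the count of values is even, the median is taken as the average of the two middle values. The function names are hypothetical.

```python
def mean_value(values):
    """Average of a non-empty set of values."""
    return sum(values) / len(values)


def median_value(values):
    """Middle value once the values are ranked."""
    if not values:
        raise ValueError("median of an empty set is undefined")
    ranked = sorted(values)
    mid = len(ranked) // 2
    if len(ranked) % 2:
        return ranked[mid]  # odd count: a single middle value
    # Even count (assumed convention): average the two middle values.
    return (ranked[mid - 1] + ranked[mid]) / 2


salaries = [30, 32, 35, 38, 400]
```

With the sample above, the single large value pulls the mean up to 107, while the median stays at 35.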

Memory The high-speed storage that is available to the computer. Memory is accessed and processed in terms of electronic speeds

Message The data input by the end user in order to initiate a transaction

Metadata The classic definition of metadata is “data about the data”

ODS Operational data store—a type of database often used as an interim area for a data warehouse. Unlike a data warehouse, which contains static data, the contents of the ODS are updated through the course of business operations

Meteorologic data Data downloaded from a satellite regarding weather patterns on earth

Methodology A prescribed way of executing a process

Microsoft A software vendor primarily of desktop technology

MPP Massively parallel processing—a type of operating system capable of handling large volumes of data

Multiplex The ability of a system to share memory

Named value processing One of the two primary processing paths for textual ETL. Named value processing includes standard index processing, in-line contextualization, and custom variable processing

Narrative Prosaic text

Network The means by which electronic communications occur between two or more nodes

Networked DBMS A DBMS whose primary relationship between records is a networked relationship

NLP Natural language processing—the notion that the context of text can be inferred from the text itself

Node A processing location in a network

Nonlinear format A format of text or reported values where the text or variables are arranged in a nonlinear format

Nonrepetitive data Data whose records have no predictable pattern of structure or content. Typical nonrepetitive records include e-mail, call center data, warranty claim data, and insurance claim data

Nonvolatile data Data that once written cannot be changed. Sometimes called “snapshot” data

Normalization The process of organizing data at their detailed level according to their existence criteria

Occurrence A specific instance of an entity type

OCR Optical character recognition

ODS Operational data store—a data structure that contains some of the properties of the data warehouse and some of the properties of the operational system. As a rule, the ODS is an optional structure that is found at some companies and not at others

OLAP Online analytical processing—this technique for analyzing business data uses cubes, which are like multidimensional pivot tables in spreadsheets. OLAP tools can perform trend analysis and enable drilling down into data. They enable multidimensional analysis such as analyzing by time, product, and geography. The major types of OLAP processing are MOLAP (multidimensional) and ROLAP (relational). HOLAP (hybrid) processing combines them.

OLTP Online transaction processing—the environment where online transaction processing is executed

Online response time The length of time from the moment an operator initiates a transaction until that transaction returns output to the user

Online storage Storage devices that can be accessed directly and interactively

Ontology A logical relationship of elements participating in a taxonomy

Operating system The technology that controls the computer and all its operations

Operational BI Analytic processing based on data generated by operational processing

Operational environment The processing center where day-to-day transactional processing is supported

Operational system A system that manages and executes the transactions used in the day-to-day operations of the organization

Operations The department charged with running the computer environment

Optical disk A storage medium using lasers rather than magnetic devices

Oracle A large database vendor

Oxide The surface of the storage medium where bits are stored

Page A basic unit of storage in DASD

Paper tape A very early form of storage

Parallel I/O In a nonmainframe environment, when more than one processor does I/O at the same time, it is called parallel I/O

Parallel management of data The processing approach where multiple machines are run in tandem with each other so that the elapsed processing time is reduced

Parameter An elementary data value used as a criterion for qualification of data

Parent/child relationship A hierarchical relationship of data; for every parent node, there can be from 0 to n child nodes

Pareto chart A method of displaying data values over time and classification

Parity check A means of ensuring the quality of data at the lowest level of storage
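
A minimal sketch of a parity check on a single byte follows. It assumes even parity (the parity bit is chosen so that the total number of 1 bits is even); hardware may use odd parity instead, and the function names are hypothetical.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total count of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def passes_parity(byte: int, parity_bit: int) -> bool:
    """True when the stored parity bit still matches the data bits."""
    return even_parity_bit(byte) == parity_bit


stored = 0b01011001                  # four 1 bits
parity = even_parity_bit(stored)     # parity bit written alongside the data
corrupted = stored ^ 0b00000100      # a single bit flipped in storage
```

A single flipped bit changes the count of 1 bits from even to odd, so `passes_parity(corrupted, parity)` fails. Two flipped bits would cancel out and go undetected, which is why parity is only a low-level safeguard.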

Parsing The process of reading text and finding contextualized value that resides in the text

Partition A segmentation technique in which data are divided into physically different units

Passive data dictionary A repository of data where the storage of metadata may or may not be used in the development and analytic process

Pattern analysis The analysis that seeks to find recognizable patterns in the occurrence of points of data

PC Personal computer—a laptop/desktop device for personal computing

PDF Portable document format by Adobe

Peak period processing The time of day when the most activities are passing through the system

Performance The measurement of system response time

Physical characteristics of data The physical dimension and configuration of a unit of data or data structure

Physical model The physical definition of the shape and structure of data (as defined to the DBMS)

Poisson distribution A discrete probability distribution describing the number of independent events that occur in a fixed interval at a constant average rate

Populate To load data into a previously unpopulated database

Population The totality of the sets of data constituting a database or a group of entities being analyzed

Post processing The processing that optionally can occur after text has passed through textual ETL

Prefix space The overhead space that every occurrence of data has that allows the system to form a structure of data

Preprocessing The editing that can precede textual processing

Primary key Unique identifying information for a unit of data

Primitive data Data whose existence depends on only a single occurrence of a major subject area of the enterprise

Probability of access The mathematical statement of the likelihood that a unit of data will be accessed

Processor The hardware at the center of the execution of a computer program

Program A procedure embodied in code

Proper text Formal text as taught by a teacher of language (as opposed to slang, shorthand, notes, comments, etc.)

Proximity analysis An analysis based on the closeness of words or taxonomies to each other

Public accounting firm An organization charged with commenting on the compliance of a publicly traded corporation to accounting standards and rules

Punched cards An early form of storage that had many disadvantages

Queue time The length of time a transaction waits in the processing queue before the transaction is processed

Query A procedure executed by a computer program in search of qualified data

Query language A computer language designed to support end user queries

Random access The ability of the system to directly access data

Random-access storage A storage technique where the time required to obtain information is independent of the location of the information most recently obtained

Random number generator An algorithm that is capable of generating numbers in a seemingly random sequence

Random sampling The process of selecting a subset of a large population for analysis

Record A unit of data that typically contains keys and attributes

Record locking A means of ensuring transaction integrity during update processing

Recovery The restoration of a system (usually an online system) to an earlier moment in time

Redundancy Multiple occurrences of the same unit of data

Referential integrity The process of relating data together in a disciplined manner

Relational model A form of data where data are normalized

Repeating groups A collection of data that occurs multiple times within a given record of data

Repetitive data Data whose units repeat in terms of structure and even content

Report decompilation The process of reading a report and reducing the report to a normalized database. In general, report decompilation is a nonlinear process because of the complexity of the format of the report

Reporting The process of collecting data from various sources and presenting it to business people in an understandable way

Repository A place where important corporate metadata are stored

Requirements A statement of what is needed in the functionality of a system

Reservations systems A system where a corporation makes general reservations for services and products, such as an airline, hotel chain, or car rental organization

Response time The measurement of time from when a transaction is initiated until the first of the transaction output is returned to the user

ROI Return on investment

Rolling summary data A technique of archiving data where the most recent data are stored in the most detail and where, over time, the detailed data are rolled up into summary-level data

Roman census approach The method of moving processing to the data rather than moving data to the processor

Root segment The base occurrence of data for an entity; the data to which all other data relate

Row A basic unit of storage; a record of data

SAP An ERP application software company

Sarbanes-Oxley Act A law requiring information compliance for publicly traded corporations. Sarbanes-Oxley was passed because of the misdeeds of Enron corporation

SAS A company specializing in statistical analysis software

Schema The means by which a pattern of data is identified

SDLC System development life cycle—the waterfall approach to the development of systems (see Ed Yourdon)

Security The protection of data and transactions

Select The identification of a set of data that meet specified criteria

Sequential analysis of data A process in which data are accessed sequentially

Sequential file A file of data that has been organized where one unit of data is accessed in a linear fashion

Scope of integration A statement of the limits of integration

SDLC System development life cycle—the development life cycle based on the contributions of Ed Yourdon and Tom DeMarco

Security The means by which data are protected

Self-service BI An infrastructure that allows BI consumers to get the information they need without the help of the IT group

Session The work or activities accomplished in one sitting by the end user

Shared memory An arrangement of processors in which up to four processors share the same memory (see multiplexing)

Shorthand The practice in transcription of not writing down actual words but writing down shortened symbols for those words

Silicon A raw material much like sand that can be shaped into many different end products, such as semiconductors, beer bottles, and body parts

Silicon Valley The location where original technological innovation starts, in the Northern California, San Jose, Santa Clara, Mountain View vicinity

Siloed systems The practice of building application systems that have no interface to or exchange with other application systems, even where there are common data between those systems

Skip sequential The mode of accessing data where data are accessed directly, followed by long periods of sequential access

SKU Stock keeping unit—in retailing, the practice of tracking a record of each unit of inventory

SLA Service-level agreement—the agreement within the corporation governing response time of transaction systems and “up time,” the amount of time the system is up and available

Slang Improper language—language that is used improperly, such as the word “ain't”

Sort To arrange data in a sequence based on values found in the data

Snapshot record A record of data taken at a moment in time that cannot be updated

Snowflake structure The dimensional modeling approach where more than one star schema are joined together

Source code The uncompiled version of code

Spam Unwanted, unsolicited e-mail generated outside the corporation

Sparse index An index that contains only selected entries of data

Spider web systems The early architecture where applications grew in a siloed manner

Spreadsheet The primary tool found in the personal computing environment

SQL The language interface for relational systems

SQL Server The DBMS built and managed by Microsoft

Staging area A location where data that are to be transformed are held in abeyance waiting for other events to occur

Standard work unit (SWU) The process of creating small modules that can flow efficiently and without bottlenecks

Star schema (or “star join”) A fact table and its related dimension tables
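
To make the shape of a star join concrete, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database and joins them. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds the measurements plus a foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 11, 7.5), (2, 11, 2.0);
""")

# Join the fact table to each dimension and summarize by product and year.
rows = conn.execute("""
SELECT p.product_name, d.year, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
GROUP BY p.product_name, d.year
ORDER BY p.product_name, d.year
""").fetchall()
conn.close()
```

The fact table sits at the center and every dimension table joins to it directly, which is what gives the structure its star shape.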

Statistical analysis The process of looking at a large number of values and evaluating the values mathematically

Stemming The reduction of words to their root. For example, the stem of moving, moved, mover, and move is the stem “mov”
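
The example in the definition can be reproduced with a deliberately naive suffix stripper. This is a sketch under stated assumptions, not the method of any particular stemming library: the suffix list and the minimum stem length of three letters are hypothetical choices made so that the four sample words reduce to the same stem.

```python
# Checked in order, so the longest suffix is tried first.
SUFFIXES = ("ing", "ed", "er", "e")


def naive_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = {w: naive_stem(w) for w in ("moving", "moved", "mover", "move")}
```

All four words map to the same stem, so a search on any one of them can find the others. Real stemmers apply many more rules to avoid over-stripping short or irregular words.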

Stop word A word in a language that is needed for communication but not needed to convey information. In English, there are stop words such as “a,” “and,” “the,” “to,” and “from”
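
Stop word processing amounts to filtering a word list against a known set. The sketch below uses only the five stop words named in the definition; a working list would be much longer, and the function name `remove_stop_words` is hypothetical.

```python
STOP_WORDS = {"a", "and", "the", "to", "from"}


def remove_stop_words(text: str) -> list:
    """Split text on whitespace and drop the stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


kept = remove_stop_words("The package moved from the dock to a truck")
```

The words that remain (`package`, `moved`, `dock`, `truck`) carry the information in the sentence, which is why textual processing typically discards the rest before indexing.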

Storage hierarchy Storage units linked to form a storage subsystem in which some units are small and fast to access and other units are larger and slower to access

State A stage in a life cycle

Structured data Data that are managed by a database management system

Subdoc processing The recognition by textual ETL of the logical grouping of sections of text

Subject-oriented database A database organized around the major entities of the corporation

Synonym In grammar, a word that is a substitute for another word

System of record (or “single version of the truth”) The building of systems where there is integrity of data; there is one and only one location where any given unit of data is created, updated, and deleted from

Table A relation that consists of a set of columns with a heading and a set of rows (tuples)

Taxonomy A classification of text

TCP/IP Transmission control protocol/Internet protocol—a networking protocol developed initially for DARPA and widely used on UNIX networks

Teradata A database software company

Text Words; language

Textual disambiguation The process of reading text and formatting text into a standard database format

Textual ETL See textual disambiguation

Time stamping The practice of adding an element of time to a given row of data

Time variant Data that cannot be updated and whose value is accurate as of some one moment in time

Tom DeMarco An early pioneer along with Ed Yourdon specializing in structured systems development

Transaction A computerized process that conducts business, usually updating or creating values

Transaction processing environment The location and equipment where transaction processing for a corporation takes place

Transparency The property of a structure of data to be able to be examined synthetically

Trend analysis The analysis of data over a period of time

Trigger The tripping of a condition that causes another event to occur

Uniprocessor A computer that has only one processor

Unstructured data Data whose logical organization is not apparent to the computer

Unstructured data warehouse A data warehouse whose source of data is unstructured data

Update To change or alter the value of data in a database

User The individual engaging in computation

Variable fields Fields that may or may not occur in a data structure

Variable length fields Fields of data that are not fixed in length

VDU Video display unit—a terminal

Video Media where there is moving action and accompanying audio

Voice recognition The technology that allows voice to be converted to an electronic format

Waterfall development The SDLC, so called because any one development activity must be done before the next activity can begin and because the output from any one level of activity becomes the input into the next level

Zachman, John A thought leader and pioneer in computer science

Zachman framework The development framework built by John Zachman where engineering principles are applied to the information systems development process
