6 sigma A level of quality in which six standard deviations of the population fall within the upper and lower limits of quality
Access The operation of seeking, reading, or writing data on a storage unit
Accuracy to reality A characteristic of information quality measuring the degree to which a data value correctly represents the attributes of the real-world object or event
Accuracy to surrogate source A measure of the degree to which data agree with the original, acknowledged authoritative source of data about a real-world object or event
Acronym resolution The process of expanding acronyms into their literal meaning
Accuracy A qualitative assessment of freedom from error or a quantitative measurement of the magnitude of error
Active data dictionary An automated metadata management facility that is tightly and interactively woven into the development and analysis process
Actuary A professional mathematician trained in the art of studying life expectancy and accident probabilities
Ad hoc processing One-time-only, casual access and manipulation of data never used before, usually done in a heuristic, iterative manner
Address The location of a unit of data
After image A snapshot of data placed on a log after the conclusion of a transaction
Agent of change A motivating force large enough not to be denied
Algorithm The instructions that govern the flow of activity in a procedure
Alternate spelling A different way of forming a word pattern
Amazon.com A successful dot.com retailer company
Analog A type of computing driven by sensory perceptions and signals, as opposed to digital computing
Anchor data in a DIS The key attribute(s) of a DIS
API Application programming interface
Application A computerized system dedicated to solving or empowering a specific business function
Application blocking of data The grouping of different occurrences of data into a single unit of storage controlled by the application programmer
Application database A collection of data organized in support of a specific function
Archival database A collection of data containing information of a historical nature
Archival processing The activities surrounding older and/or inactive data
Array of data A data structure that holds multiple occurrences of data
Artifact A design technique used to support referential integrity in a decision support system (DSS) environment
ATM Automated teller machine—a “money machine”
Attribute A value of data that is distinguishable from other values
Audit trail Data that are useful in tracing the activity of one or more transactions
Availability A measurement of the time during which the online system is up and running
Backup A file serving the purpose of allowing an online file to be restored as of some moment in time
Batch A computer environment in which long-running sequential programs can run without conflicting with the online transaction environment
Batch processing The collection of transactions into “batches” that are processed collectively
BCD Binary-coded decimal
BI Business intelligence
Bias The condition in sampling where the sample contains data that are not representative of the whole
Bill of materials A listing of the components of an assembly
Blather E-mail messages generated internally that have no business relevance
Block of data A large physical unit of data that can contain records of data
Blog A personal diary that is open for the public to scrutinize
Boiler plate Text that is copied verbatim for the purpose of serving as a general template
Browser A program executing on a client to interpret a Web page (usually in HTML) and render a proper image of that page
Buffer A work space, usually in memory
Business process A synonym for value chain, the term used to differentiate a value chain of activities from a functional process or functional set of activities
Business rule A statement expressing a policy, guideline, or condition that governs business activities and/or business decisions
Byte A basic unit of storage, usually 8 bits in length
C Name of a programming language first developed as part of the UNIX project at AT&T but now widely used by personal computer software developers
Cache A buffer inside the computer built and maintained at the device level. Retrieval of data stored in cache is accomplished at electronic speeds
Call center A facility of the organization where an agent of the organization can engage in conversation with other people
CASE Computer-aided software engineering—generally refers to a class of software products that are used to partially automate the design and development of other software
CDC Change data capture—the incremental changes to a database are captured and stored and then retransacted or logged onto another database
Cell of a spreadsheet A basic unit of data found in a spreadsheet
Change data capture (CDC) The data that are gathered incrementally as a result of transaction processing in order to form the basis of update to a data warehouse
Class I ODS An ODS whose latency is measured in 1 second or less
Class II ODS An ODS whose latency is measured in 4 hours or less
Class III ODS An ODS whose latency is measured in 24 hours or less
Client The node in a client-server architecture that initiates a request to a server and processes the results
Click stream data Automated measurements of the activity occurring on a web site
Cluster A means of storing data from multiple tables based on a common key value
COBOL Common business-oriented language—an early popular computer language, designed for the business user (see Grace Hopper)
Code (1) A symbolic value or (2) instructions written in a language directing the computer how to proceed
Collision The mapping of two or more records to the same location by the hasher
Column A vertical component of a table in which values are selected from the same domain
Comments A field of data containing free-form text
Compliance Business rules enforced by legislation or some other governing body
Connector A symbol used to indicate that one logical grouping of data has a relationship with another logical grouping of data
Constraint The business rule that places a restriction on business actions and/or decisions
Content enriched Big data whose content has been contextualized
Context The surrounding environment that gives definition to a word
Contextualization The process of identifying the context of a word
Core An early form of storage for storing data available to the CPU. Core operated under the principles governed by the hysteresis curve
Corporate data The entire body of data of the corporation
CRM Customer relationship management—a popular DSS application used to streamline customer relationships
CRT Cathode-ray tube—a display device; a screen
Cullinet An early DBMS vendor selling a networked database management system
Current valued data Data whose accuracy is as of the moment of access; online data
Curve of usefulness The curve that indicates that the fresher data are, the more likely they are to be useful
Customer The user or consumer of a product or a service
Cycle The complete steps required to execute a process
Cycle time The measurement of the time required to complete one cycle
Data analyst An individual who gathers and analyzes the results of the execution of a process
Database A structured collection of units of data organized around some topic or theme
Data definition The process of defining the semantics of data
Data element An attribute belonging to an entity
Data flow diagram (DFD) A schematic indicating the direction of the movement of data
Data governance The activities necessary to the management of integrity of data
Data integrity The assurance of the timeliness and the accuracy of data in a database system
Data item set (DIS) The midlevel data model
Data mart A subset of a data warehouse that’s usually oriented to a business group or team
Data mining Analysis of large quantities of data to find patterns such as groups of records, unusual records, and dependencies
Data model An abstraction of data
Data quality The properties of data embodied by the “five Cs”: clean, consistent, conformed, current, and comprehensive
Data scientist An individual dedicated to the study of patterns found in data
Data store (1) A component of a DFD in which data are shown to be collected outside of a process or (2) a place where data are kept
Data structure A logical relationship among data elements designed to support specific data manipulation functions
Data virtualization The process of retrieving and manipulating data without requiring details of how the data are formatted or where the data are located
Data visualization Presenting data in a visual way, such as with graphs and charts, helps business people glean insights they might not otherwise see. Dashboards use the concept of data visualization to present data for analysis. It is often a part of self-service BI but is only as effective as the quality of the data it draws upon
Data warehouse A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management’s decisions
DB2/UDB Database management system by IBM
DBMS Database management system—system software that manages the storage and access of data on disk storage
DC Data communications—technology that manages messages generated as part of transaction processing
Decryption The process of returning text to its original state after that text has been encrypted
Defect An item that does not conform to expected quality standards
Denormalization The design technique of placing normalized data in a structure so that access to the data is optimized
Dependent data mart A data mart whose sole source of data is the data warehouse; a dependent data mart is a component of the corporate information factory
Derived data Data whose value is achieved as the result of a calculation
Dimension A category for summarizing or viewing data (e.g., a time period, product, product line, and geographic region)
Dimensional modeling A generally accepted practice in the data warehouse industry to structure data intended for user access, analysis, and reporting in dimensional data models
Dimension table The place where extraneous data that relate to a fact table inside a star join are placed
Direct access of data The ability of a database management system to directly find data, as opposed to having to sequentially search for data
Directory A table, block, folder, or database containing indexes and their interpretation
DIS Data item set—the midlevel of a data model
Disk storage Physical media used for storing values of data
Distillation The process of analyzing a large number of records (usually big data records) and producing a single result
Document A basic unit of textual data
Documentation Verbiage describing a system, application, database, procedure, etc.
Document fracturing In textual disambiguation, the process of sequentially processing text looking for text that satisfies such criteria as stop word processing, stemming, homographic resolution, and so forth
Download The movement of a bulk amount of data from one environment to another
Drill down processing The analytic activity of examining an element of data at a lower level of detail after examining the value of data at a higher level
Ed Yourdon An information technology pioneer who started the “structured” movement
Electronic text Text in a form where the words of the text are recognized by the computer
ELT Extract/load/transform—the process of extracting, loading, and transforming data. The problem with ELT is that many organizations only extract and load the data but fail to transform the data
E-mail Messages from one party to another carried on an electronic medium
Encoding The process of encryption of text into a form unrecognizable by an outsider
Encryption The process of scrambling data into a form that is not recognizable
Entity A broad classification of data; a subject area
ERD Entity relationship diagram—a logical description of how the major subject areas of the corporation fit together
ERP Enterprise resource planning—the name given to technology where applications are written by a vendor where there are multiple users of the software
ETL Extract, transform, and load—the process in which data are taken from the source system, configured, and stored in a data warehouse or database. ETL tools automate data integration tasks
Event The demarcation or recording made of the passage of some activity
External data Data whose source is outside of the system of the organization
Export The process of moving data from one environment to another
Fact table The data structure where basic facts in a star join are stored
Farmer A person in the organization who does analytic work that is repetitive and predictable
Feedback loop A procedure where the results of one iteration of processing are made available for the next iteration of processing
Field An element of data; an attribute
File A collection of records
File structure The organization of the collection of records
Filter The process of removing data from a set of data based on the value of one or more fields of data
Flat file A collection of records where the structure of each record is identical
Foreign key An attribute in one table that identifies the related record in another table with which the record participates in a relationship
Format The arrangement of data onto a data structure
Functional decomposition The process of reducing a large function or process into smaller finer functions
Generic data model A data model of an industry, rather than of a specific company. A generic data model can be used as a template that can be customized for a given company within the industry that has been modeled
Granularity The level of detail found in a record of data
Great divide The division of big data between repetitive data and nonrepetitive data
GUI Graphical user interface
Hadoop A framework for managing data; technology designed to house big data
Hashing algorithm An algorithm that converts data values into an address
Heuristic process An iterative process, where the next step of analysis depends on the results attained in the current level of analysis
HIPAA The law protecting medical privacy
Hit An occurrence of data that satisfies one or more search criteria
Hollerith punched cards An early means of storing data, typically containing 80 columns
Homograph A word or phrase whose interpretation depends on the person who originally wrote the word or phrase
Homographic resolution The process of contextualizing data based on the identity of the person who uttered the text
Host The processor receiving and processing a transaction
HTML Hypertext markup language
IBM A large computer manufacturer
IBM 360 A machine that standardized operating systems. With the IBM 360 line, there was compatibility of processing across different machine types. A revolutionary technology that changed the face of computing
Identifier An attribute used to pick out a row of data from a collection of rows of data
IDMS A network DBMS by Cullinet
Image A picture, such as a real estate photo of a house for sale or an x-ray
IMS Information management system—a hierarchical DBMS by IBM
Index A data structure that shows the address of a database record based on a value found in the record
In-line contextualization The technique of inferring context by establishing a beginning delimiter and an ending delimiter
Inmon, Bill The father of data warehousing and textual disambiguation
Instance A member of a shared partition database system, such as an Oracle cluster
Integrity of data The assurance that data are correct and accurate as stored
Internet The system by which data are stored and are made available to a large audience
Interactive A mode of processing in which the end user directly moves data into and out of a system
Intranet A TCP/IP network that is physically separated from the Internet
Inverted list A data structure in which a flat file is indexed
I/O Input/output operation—the activity of reading or writing a record to disk storage. I/O operations happen at mechanical speeds
ISO International Organization for Standardization
IT The information technology organization—the organizational entity charged with building and managing applications and technology systems
Iterative process A process that is done in short finite steps, where there are many steps, but where each step is taken quickly
Join The process of merging two or more tables on the basis of a common key
Key An identifying attribute of data
KPI Key performance indicator—a measurement made periodically by the organization that examines important variables
Language The text that is used to communicate with the computer. Some languages are optimized for ease of use. Other languages are optimized for speed of processing
Legacy systems The older systems used to run the business of the corporation as it was defined 10 or 20 years ago
Line The hardware by which data flow into or out of a device
Lineage of data The “family tree” of data. Data are transformed in many ways as they pass through a system. The lineage is a record of the transformations of data from the moment they enter a system until they are used in analysis.
Link The mechanism by which two systems or two environments form a common relationship
Linux An operating system
Load utility A utility provided by a DBMS vendor in which data are efficiently loaded into the DBMS
Lock The means by which data are protected from other update processes while the transaction that is updating the data is in execution
Log A journal of activities
Log tape A sequential record of the activities that have occurred inside a system. Sometimes called a “journal” tape. The primary purpose of a log tape is for backup and recovery of a system
Machine cycle A full cycle of processing inside a computer
Magnetic tape An early sequential storage mechanism
Mainframe The monolithic processors produced by IBM and Amdahl
Mapping The instructions to textual ETL as to how to interpret a document or type of document
MapReduce A programming model for processing big data in parallel across many machines
MDM Master data management—the set of processes used to create and maintain a consistent view, also referred to as a master list, of key enterprise reference data. These data include such entities as customers, prospects, suppliers, employees, products, services, assets, and accounts. They also include the groupings and hierarchies associated with these entities
Mean The average value of a set of values
Median value The middle value of a set of values when the values are ranked according to value
Memory The high-speed storage that is available to the computer. Memory is accessed and processed at electronic speeds
Message The data input by the end user in order to initiate a transaction
Metadata Classically defined as “data about the data”
ODS Operational data store—a type of database often used as an interim area for a data warehouse. Unlike a data warehouse, which contains static data, the contents of the ODS are updated through the course of business operations
Meteorologic data Data downloaded from a satellite regarding weather patterns on earth
Methodology A prescribed way of executing a process
Microsoft A software vendor primarily of desktop technology
MPP Massively parallel processing—a type of architecture in which many processors work in parallel, capable of handling large volumes of data
Multiplex The ability of a system to share memory
Named value processing One of the two primary processing paths for textual ETL. Named value processing includes standard index processing, in-line contextualization, and custom variable processing
Narrative Prosaic text
Network The means by which electronic communication occurs between two or more nodes
Networked DBMS A DBMS whose primary relationship between records is a networked relationship
NLP Natural language processing—the notion that the context of text can be inferred from the text itself
Node A processing location in a network
Nonlinear format A format of text or reported values where the text or variables are arranged in a nonlinear format
Nonrepetitive data Data whose records have no predictable pattern of structure or content. Typical nonrepetitive records include e-mail, call center data, warranty claim data, and insurance claim data
Nonvolatile data Data that once written cannot be changed. Sometimes called “snapshot” data
Normalization The process of organizing data at its detailed level according to its existence criteria
Occurrence A specific instance of an entity type
OCR Optical character recognition
ODS Operational data store—a data structure that contains some of the properties of the data warehouse and some of the properties of the operational system. As a rule, the ODS is an optional structure that is found at some companies and not at others
OLAP Online analytical processing—this technique for analyzing business data uses cubes, which are like multidimensional pivot tables in spreadsheets. OLAP tools can perform trend analysis and enable drilling down into data. They enable multidimensional analysis such as analyzing by time, product, and geography. The major types of OLAP processing are MOLAP (multidimensional) and ROLAP (relational). HOLAP (hybrid) processing combines them.
OLTP Online transaction processing—the environment where online transaction processing is executed
Online response time The length of time from the moment an operator initiates a transaction until that transaction returns output to the user
Online storage Storage devices that can be accessed directly and interactively
Ontology A logical relationship of elements participating in a taxonomy
Operating system The technology that controls the computer and all its operations
Operational BI Analytic processing based on data generated by operational processing
Operational environment The processing center where day-to-day transactional processing is supported
Operational system A system that manages and executes the transactions used in the day-to-day operations of the organization
Operations The department charged with running the computer environment
Optical disk A storage medium using lasers rather than magnetic devices
Oracle A large database vendor
Oxide The surface of the storage medium where bits are stored
Page A basic unit of storage in DASD
Paper tape A very early form of storage
Parallel I/O In a nonmainframe environment, when more than one processor does I/O at the same time, it is called parallel I/O
Parallel management of data The processing approach where multiple machines are run in tandem with each other so that the elapsed processing time is reduced
Parameter An elementary data value used as a criterion for qualification of data
Parent/child relationship A hierarchical relationship of data; for every parent node, there can be from 0 to n child nodes
Pareto chart A method of displaying data values by classification in descending order of frequency, usually together with the cumulative total
Parity check A means of ensuring the quality of data at the lowest level of storage
Parsing The process of reading text and finding contextualized value that resides in the text
Partition A segmentation technique in which data are divided into physically different units
Passive data dictionary A repository of data where the storage of metadata may or may not be used in the development and analytic process
Pattern analysis The analysis that seeks to find recognizable patterns in the occurrence of points of data
PC Personal computer—a laptop/desktop device for personal computing
PDF Portable document format by Adobe
Peak period processing The time of day when the most activities are passing through the system
Performance The measurement of system response time
Physical characteristics of data The physical dimension and configuration of a unit of data or data structure
Physical model The physical definition of the shape and structure of data (as defined to the DBMS)
Poisson distribution A discrete probability distribution describing the number of independent events that occur in a fixed interval when the events happen at a constant average rate
Populate To load data into a previously unpopulated database
Population The totality of the sets of data constituting a database or a group of entities being analyzed
Post processing The processing that optionally can occur after text has passed through textual ETL
Prefix space The overhead space that every occurrence of data has that allows the system to form a structure of data
Preprocessing The editing that can precede textual processing
Primary key Unique identifying information for a unit of data
Primitive data Data whose existence depends on only a single occurrence of a major subject area of the enterprise
Probability of access The mathematical statement of the likelihood that a unit of data will be accessed
Processor The hardware at the center of the execution of a computer program
Program A procedure embodied in code
Proper text Formal text as taught by a teacher of language (as opposed to slang, shorthand, notes, comments, etc.)
Proximity analysis An analysis based on the closeness of words or taxonomies to each other
Public accounting firm An organization charged with commenting on the compliance of a publicly traded corporation to accounting standards and rules
Punched cards An early form of storage that had many disadvantages
Queue time The length of time a transaction waits in the processing queue before the transaction is processed
Query A procedure executed by a computer program in search of qualified data
Query language A computer language designed to support end user queries
Random access The ability of the system to directly access data
Random-access storage A storage technique where the time required to obtain information is independent of the location of the information most recently obtained
Random number generator An algorithm that is capable of generating numbers in a seemingly random sequence
Random sampling The process of selecting a subset of a large population for analysis
Record A unit of data that typically contains keys and attributes
Record locking A means of ensuring transaction integrity during update processing
Recovery The restoration of a system (usually an online system) to an earlier moment in time
Redundancy Multiple occurrences of the same unit of data
Referential integrity The process of relating data together in a disciplined manner
Relational model A form of data where data are normalized
Repeating groups A collection of data that occurs multiple times within a given record of data
Repetitive data Data whose units repeat in terms of structure and even content
Report decompilation The process of reading a report and reducing the report to a normalized database. In general, report decompilation is a nonlinear process because of the complexity of the format of the report
Reporting The process of collecting data from various sources and presenting it to business people in an understandable way
Repository A place where important corporate metadata are stored
Requirements A statement of what is needed in the functionality of a system
Reservations systems A system in which a corporation, such as an airline, hotel chain, or car rental organization, takes general reservations for services and products
Response time The measurement of time from when a transaction is initiated until the first of the transaction output is returned to the user
ROI Return on investment
Rolling summary data A technique of archiving data where the most recent data are stored in the most detail and where, over time, the detailed data are rolled up into summary-level data
Roman census approach The method of moving processing to the data rather than moving data to the processor
Root segment The base occurrence of data for an entity; the data to which all other data relate
Row A basic unit of storage; a record of data
SAP An ERP application software company
Sarbanes-Oxley Act A law requiring information compliance for publicly traded corporations. Sarbanes-Oxley was passed because of the misdeeds of Enron corporation
SAS A company specializing in statistical analysis software
Schema The means by which a pattern of data is identified
SDLC System development life cycle—the waterfall approach to the development of systems (see Ed Yourdon)
Security The protection of data and transactions
Select The identification of a set of data that meet specified criteria
Sequential analysis of data A process in which data are accessed sequentially
Sequential file A file of data organized so that units of data are accessed one after another in a linear fashion
Scope of integration A statement of the limits of integration
SDLC System development life cycle—the development life cycle based on the contributions of Ed Yourdon and Tom DeMarco
Security The means by which data are protected
Self-service BI An infrastructure that allows BI consumers to get the information they need without the help of the IT group
Session The work or activities accomplished in one sitting by the end user
Shared memory An arrangement of processors in which up to four processors share the same memory (see multiplexing)
Shorthand The practice in transcription of not writing down actual words but writing down shortened symbols for those words
Silicon A raw material much like sand that can be shaped into many different end products, such as semiconductors, beer bottles, and body parts
Silicon Valley The location where original technological innovation starts, in the Northern California, San Jose, Santa Clara, Mountain View vicinity
Siloed systems The practice of building application systems that have no interface or exchange with other application systems, even where there are common data between those systems
Skip sequential The mode of accessing data where data are accessed directly, followed by long periods of sequential access
SKU Stock keeping unit—in retailing, the practice of tracking a record of each unit of inventory
SLA Service-level agreement—the agreement within the corporation governing response time of transaction systems and “up time,” the amount of time the system is up and available
Slang Improper language—language that is used improperly, such as the word “ain't”
Sort To arrange data in a sequence based on values found in the data
Snapshot record A record of data taken at a moment in time that cannot be updated
Snowflake structure The dimensional modeling approach where more than one star schema is joined together
Source code The uncompiled version of code
Spam Unwanted, unsolicited e-mail generated outside the corporation
Sparse index An index that contains only selected entries of data
Spider web systems The early architecture where applications grew in a siloed manner
Spreadsheet The primary tool found in the personal computing environment
SQL The language interface for relational systems
SQL Server The DBMS built and managed by Microsoft
Staging area A location where data that are to be transformed are held in abeyance waiting for other events to occur
Standard work unit (SWU) The process of creating small modules that can flow efficiently and without bottlenecks
Star schema (or “star join”) A fact table and its related dimension tables
Statistical analysis The process of looking at a large number of values and evaluating the values mathematically
Stemming The reduction of words to their root. For example, the stem of moving, moved, mover, and move is the stem “mov”
Stop word A word in a language that is needed for communication but not needed to convey information. In English, there are stop words such as “a,” “and,” “the,” “to,” and “from”
Storage hierarchy Storage units linked to form a storage subsystem in which some units are small and fast to access and other units are larger and slower to access
State A stage in a life cycle
Structured data Data that are managed by a database management system
Subdoc processing The recognition by textual ETL of the logical grouping of sections of text
Subject-oriented database A database organized around the major entities of the corporation
Synonym In grammar, a word that is a substitute for another word
System of record (or “single version of the truth”) The building of systems where there is integrity of data; there is one and only one location where any given unit of data is created, updated, and deleted
Table A relation that consists of a set of columns with a heading and a set of rows (tuples)
Taxonomy A classification of text
TCP/IP Transmission control protocol/Internet protocol—a networking protocol developed initially for DARPA and widely used on UNIX networks
Teradata A database software company
Text Words; language
Textual disambiguation The process of reading text and formatting text into a standard database format
Textual ETL See textual disambiguation
Time stamping The practice of adding an element of time to a given row of data
Time variant Data that cannot be updated and whose value is accurate as of some one moment in time
Tom DeMarco An early pioneer along with Ed Yourdon specializing in structured systems development
Transaction A computerized process that conducts business, usually updating or creating values
Transaction processing environment The location and equipment where transaction processing for a corporation takes place
Transparency The property of a structure of data to be able to be examined synthetically
Trend analysis The analysis of data over a period of time
Trigger The tripping of a condition that causes another event to occur
Uniprocessor A computer that has only one processor
Unstructured data Data whose logical organization is not apparent to the computer
Unstructured data warehouse A data warehouse whose source of data is unstructured data
Update To change or alter the value of data in a database
User The individual engaging in computation
Variable fields Fields that may or may not occur in a data structure
Variable length fields Fields of data that are not fixed in length
VDU Video display unit—a terminal
Video Media where there is moving action and accompanying audio
Voice recognition The technology that allows voice to be converted to an electronic format
Waterfall development The SDLC, so called because any one development activity must be done before the next activity can begin and because the output from any one level of activity becomes the input into the next level
Zachman, John A thought leader and pioneer in computer science
Zachman framework The development framework built by John Zachman where engineering principles are applied to the information systems development process
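As an illustration of two of the entries above, "Hashing algorithm" and "Collision," the following is a minimal sketch in Python. The function names (hash_address, find_collisions), the character-sum hashing rule, and the sample keys are all assumptions made for this example; real database management systems use far more sophisticated hashers.

```python
from collections import defaultdict


def hash_address(key: str, num_slots: int) -> int:
    """Toy hashing algorithm (an assumption for illustration):
    convert a data value into an address by summing its character
    codes and taking the remainder modulo the number of slots."""
    return sum(ord(ch) for ch in key) % num_slots


def find_collisions(keys, num_slots):
    """Group keys by hashed address and return only the addresses
    to which two or more keys were mapped (the collisions)."""
    slots = defaultdict(list)
    for key in keys:
        slots[hash_address(key, num_slots)].append(key)
    return {addr: ks for addr, ks in slots.items() if len(ks) > 1}


# Hypothetical record keys: "abc" and "cab" contain the same characters,
# so this simple hasher maps both of them to the same address.
keys = ["abc", "cab", "xyz"]
collisions = find_collisions(keys, num_slots=10)
print(collisions)
```

Because the toy hasher ignores character order, any two keys that are anagrams of each other collide. This is one reason practical hashing algorithms mix position into the computation.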