4GL – fourth generation language – a computer language optimized for ease of use
Acronym resolution – the process of expanding acronyms into their literal meaning
Alternate spelling – a different way of forming a word pattern
Alternate storage – storage other than disk-based storage used to hold bulk amounts of data
Analog – a type of computing driven by sensory perceptions and signals, as opposed to digital computing
Analog data pond – the data pond where analog data is placed and processed
Application – a computerized system dedicated to solving or empowering a specific business function
Application data pond – the subset of the architected integrated data lake where application data is stored and processed
Archival database – a collection of data containing information of a historical nature
Archival processing – the activities surrounding older and/or inactive data
Archival data pool – a component of the architected integrated data lake environment to which data is passed when its probability of access is close to zero
Big Data – the storage of massive amounts of data in inexpensive storage
Business process – a synonym for value chain; the term is used to differentiate a value chain of activities from a functional process or functional set of activities
Business rule – a statement expressing a policy, guideline or condition that governs business activities and/or business decisions
CIF – corporate information factory – the data warehouse-centric architecture that contains operational sources of data, ETL, an ODS and data marts
Conditioning – the transformation process that data in the data ponds passes through
Constraint – the business rule that places a restriction on business actions and/or decisions
Contextualization – the process of identifying the context of a word
Database – a structured collection of units of data organized around some topic or theme
Data lake – the place where Big Data is stored
Data pond – a subdivision of the architected integrated data lake
Data scientist – an individual dedicated to the study of patterns found in data
DBMS – database management system – system software that manages the storage and access of data on disk storage
Document – a basic unit of textual data
Great divide – the division of Big Data between repetitive data and non-repetitive data
Hadoop – technology designed to house Big Data; a framework for managing data
Homograph – a word or phrase whose interpretation depends on the person who originally wrote the word or phrase
Homographic resolution – the process of contextualizing data based on the identity of the person who uttered the text
Inline contextualization – the technique of inferring context by establishing a beginning delimiter and an ending delimiter
Log tape – a sequential record of the activities that have occurred inside a system, sometimes called a “journal” tape. The primary purpose of a log tape is backup and recovery of a system.
Logical data model – a data model based on inferred relationships
Metadata – classically defined as “data about the data”
Non-repetitive data – data whose records have no predictable pattern of structure or content. Typical non-repetitive records include email, call center data, warranty claim data, insurance claim data, and so forth
Parsing – the process of reading text and finding contextualized value that resides in the text
Pattern analysis – the analysis that seeks to find recognizable patterns in the occurrence of points of data
Proximity analysis – an analysis based on the closeness of words or taxonomies to each other
Statistical analysis – the process of looking at a large number of values and evaluating the values mathematically
Stop word – a word in a language that is needed for communication but not needed to convey information. In English there are stop words such as “a,” “and,” “the,” “to,” “from” and so forth
Structured data – data that is managed by a database management system
Taxonomy – a classification of text
Textual data pond – the subset of the architected integrated data lake where textual data is stored and processed
Textual disambiguation – the process of reading text and formatting text into a standard database format
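
Three of the textual terms defined above (stop word, acronym resolution, and inline contextualization) can be illustrated with a minimal Python sketch. The function names and lookup tables here are hypothetical, invented for illustration only; the stop words and acronym expansions are taken from the entries in this glossary. This is a toy under those assumptions, not a description of how any particular product performs textual disambiguation.

```python
import re

# Hypothetical lookup tables, built from entries in this glossary.
STOP_WORDS = {"a", "and", "the", "to", "from"}
ACRONYMS = {
    "DBMS": "database management system",
    "4GL": "fourth generation language",
}


def remove_stop_words(text):
    """Drop stop words, keeping the remaining words in their original order."""
    return [w for w in text.split() if w.lower() not in STOP_WORDS]


def resolve_acronyms(text):
    """Expand each known acronym into its literal meaning."""
    return " ".join(ACRONYMS.get(w, w) for w in text.split())


def inline_context(text, begin, end):
    """Return the text between a beginning and an ending delimiter, or None."""
    match = re.search(re.escape(begin) + r"(.*?)" + re.escape(end), text)
    return match.group(1).strip() if match else None


print(remove_stop_words("the data moved from a pond to the lake"))
print(resolve_acronyms("the DBMS stores rows"))
print(inline_context("Claim type: water damage; Status: open", "Claim type:", ";"))
```

In the last call, the beginning delimiter "Claim type:" supplies the context for whatever value appears before the ending delimiter ";", which is the idea behind inline contextualization.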