Glossary

4GL – fourth-generation language – a computer language optimized for ease of use

Acronym resolution – the process of expanding acronyms into their literal meaning
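
As a minimal sketch of acronym resolution, the lookup below expands known acronyms in place. The table `ACRONYMS` and the function `resolve_acronyms` are hypothetical names introduced for illustration; the expansions are taken from entries in this glossary, and a real system would likely need a larger, context-sensitive table.

```python
import re

# Hypothetical lookup table; expansions come from this glossary's own entries.
ACRONYMS = {
    "4GL": "fourth generation language",
    "CIF": "corporate information factory",
    "DBMS": "database management system",
}

def resolve_acronyms(text, table=ACRONYMS):
    """Replace each known acronym with its literal meaning; leave others as-is."""
    def expand(match):
        token = match.group(0)
        return table.get(token, token)
    # Candidate acronyms: runs of two or more capitals or digits.
    return re.sub(r"\b[A-Z0-9]{2,}\b", expand, text)

print(resolve_acronyms("The DBMS sits inside the CIF."))
```

Unknown tokens such as "ETL" pass through unchanged because they are not in the table.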

Alternate spelling – a different way of forming a word pattern

Alternate storage – storage other than disk-based storage used to hold bulk amounts of data

Analog – a type of computing driven by sensory perceptions and signals, as opposed to digital computing

Analog data pond – the data pond where analog data is placed and processed

Application – a computerized system dedicated to solving or empowering a specific business function

Application data pond – the subset of the architected integrated data lake where application data is stored and processed

Archival database – a collection of data containing information of a historical nature

Archival processing – the activities surrounding older and/or inactive data

Archival data pool – a component of the architected integrated data lake environment to which data is passed when the probability of access is close to zero

Big Data – the storage of massive amounts of data in inexpensive storage

Business process – a synonym for value chain; the term is used to differentiate a value chain of activities from a functional process or functional set of activities

Business rule – a statement expressing a policy, guideline or condition that governs business activities and/or business decisions

CIF – corporate information factory – the data warehouse centric architecture that contains operational sources of data, ETL, an ODS and data marts

Conditioning – the transformation process that data in the data ponds pass through

Constraint – the business rule that places a restriction on business actions and/or decisions

Contextualization – the process of identifying the context of a word

Database – a structured collection of units of data organized around some topic or theme

Data lake – the place where Big Data is stored

Data pond – a subdivision of the architected integrated data lake

Data scientist – an individual dedicated to the study of patterns found in data

DBMS – database management system – system software that manages the storage and access of data on disk storage

Document – a basic unit of textual data

Great divide – the division of Big Data between repetitive data and non-repetitive data

Hadoop – technology designed to house Big Data – a framework for managing data

Homograph – a word or phrase whose interpretation depends on the person who originally wrote the word or phrase

Homographic resolution – the process of contextualizing data based on the identity of the person who uttered the text

Inline contextualization – the technique of inferring context by establishing a beginning delimiter and an ending delimiter
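
One way to picture inline contextualization is a pattern that captures whatever sits between the two delimiters and labels it with a context. The function `inline_contextualize`, its parameters, and the sample text are assumptions made for illustration, not a description of any particular product.

```python
import re

def inline_contextualize(text, begin, end, context):
    """Return (context, value) pairs for each span between begin and end."""
    # Escape both delimiters so they are matched literally, then capture
    # the shortest run of text between them.
    pattern = re.escape(begin) + r"(.*?)" + re.escape(end)
    return [(context, m.group(1).strip()) for m in re.finditer(pattern, text)]

# Hypothetical sample: the value between "Contract No:" and the next period
# is inferred to be a contract number.
sample = "Contract No: 4471-B. Signed by both parties."
print(inline_contextualize(sample, "Contract No:", ".", "contract_number"))
```

The technique depends on the delimiters being predictable, which is why it tends to suit boilerplate text more than free-form prose.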

Log tape – a sequential record of the activities that have occurred inside a system, sometimes called a “journal” tape. The primary purpose of a log tape is backup and recovery of a system.

Logical data model – a data model based on inferred relationships

Metadata – classically defined as “data about the data”

Non-repetitive data – data whose records have no predictable pattern of structure or content. Typical non-repetitive records include email, call center data, warranty claim data, insurance claim data, and so forth

Parsing – the process of reading text and finding contextualized value that resides in the text

Pattern analysis – the analysis that seeks to find recognizable patterns in the occurrence of points of data

Proximity analysis – an analysis based on the closeness of words or taxonomies to each other
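
A rough sketch of proximity analysis at the word level: two terms are treated as related when they occur within a chosen number of words of each other. The function `within_proximity` and the `max_distance` threshold are hypothetical; the definition above does not prescribe how closeness is measured.

```python
def within_proximity(text, term_a, term_b, max_distance):
    """True if term_a and term_b occur within max_distance words of each other."""
    # Normalize: split on whitespace, drop edge punctuation, lowercase.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    # Compare every occurrence of one term against every occurrence of the other.
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

note = "The engine showed a leak near the gasket."
print(within_proximity(note, "engine", "leak", 3))
print(within_proximity(note, "engine", "leak", 2))
```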

Statistical analysis – the process of looking at a large number of values and evaluating the values mathematically

Stop word – a word in a language that is needed for communication but not needed to convey information. In English there are stop words such as “a,” “and,” “the,” “to,” “from” and so forth
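
Stop word removal can be sketched as a simple filter. The set `STOP_WORDS` below holds only the five examples named in the definition, and `remove_stop_words` is a hypothetical helper; a working list for English would be considerably longer.

```python
# Only the stop words named in the definition above; real lists are longer.
STOP_WORDS = {"a", "and", "the", "to", "from"}

def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("The claim moved from intake to the adjuster"))
# → ['claim', 'moved', 'intake', 'adjuster']
```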

Structured data – data that is managed by a database management system

Taxonomy – a classification of text

Textual data pond – the subset of the architected integrated data lake where textual data is stored and processed

Textual disambiguation – the process of reading text and formatting text into a standard database format
