Key Concepts

Before launching into the main text of this book, we have found it pertinent to recall the definitions of some key concepts. Needless to say, the following list is not exhaustive:

  • Big Data: The term Big Data is used when the amount of data that an organization has to manage reaches a critical volume that requires new technological approaches in terms of storage, processing, and usage. Volume, velocity, and variety are usually the three criteria used to qualify a database as “Big Data”.
  • Cloud computing: This term designates a set of processes that use computational and/or storage capacities from remote servers connected through a network, usually the Internet. This model allows on-demand access to shared resources over the network, with computational power configured according to requirements.
  • Competitive intelligence: This is the set of coordinated information gathering, processing and dissemination activities useful to economic actors. According to the Martre Report, competitive intelligence can be defined as the set of coordinated information research, processing and dissemination actions aimed at exploiting information for the benefit of economic actors. This diverse set of actions is carried out legally, with all the data protection guarantees necessary to preserve the company’s assets, and with the highest regard for quality, deadlines and cost. Useful information is needed at a company’s or partnership’s different decision-making levels in order to design and implement, in a coherent manner, the strategies and techniques required to achieve company-defined objectives and to improve the company’s position in the competitive environment in which it operates. These kinds of actions take place in an uninterrupted cycle that generates a shared vision of company objectives.
  • Data: This term comprises facts, observations and raw information. Data itself has little meaning if it is not processed.
  • Data analysis: This is a class of statistical methods that makes it possible to process a very large volume of data and identify the most interesting aspects of its structure. Some methods help to extract relations between different sets of data and thus draw statistical information that makes it possible to describe the most important information contained in the data in the most succinct manner possible. Other techniques make it possible to group data in order to identify its common denominators clearly, and thereby to understand it better (a minimal sketch of both families of techniques appears after this list).
  • Data governance: This constitutes a quality-control framework for managing and protecting a company’s key information resources. Its mission is to ensure that data is managed in accordance with the company’s values and convictions, to oversee its quality and to put mechanisms into place that monitor and maintain that quality. Data governance includes data management, oversight, quality evaluation, coherence, integrity and IT resource security within a company.
  • Data journalism: The term designates a new form of journalism based on data analysis and (often) on its visual representation. The journalist uses databases as sources and deduces from them knowledge, meaningful relationships or insights that would not be accessible through traditional research methods. Even when the article itself remains the main component of the work, illustrating ideas through graphs, diagrams, maps, etc., is becoming more important by the day.
  • Data mining: Also referred to as knowledge discovery from data, data mining is intended for the extraction of knowledge from large amounts of data using automatic or semi-automatic methods. It uses algorithms drawn from disciplines as diverse as statistics, artificial intelligence and computer science in order to develop models from data; that is, in order to find interesting structures or recurrent themes according to criteria determined beforehand, and to extract the largest possible amount of knowledge useful to companies. It groups together all technologies capable of analyzing database information in order to find useful information and possible significant relationships within the data (a minimal model-building sketch appears after this list).
  • Data reuse: This practice consists of taking a dataset in order to visualize it, merge it with other datasets, use it in an application, modify it, correct it, comment on it, etc.
  • Data science: It is a new discipline that combines elements of mathematics, statistics, computer science and data visualization. The objective is to extract information from data sources. In this sense, data science is devoted to database exploration and analysis. This discipline has recently received much attention due to the growing interest in Big Data.
  • Data visualization: Also known as “data viz”, this deals with data visualization technology, methods and tools. It can take the form of graphs, pie charts, diagrams, maps, timelines or even original graphic representations (a minimal plotting sketch appears after this list). Presenting data through illustrations makes it easier to read and understand.
  • data.gouv.fr: The French government’s official website for public data, launched on December 5th, 2011 by Mission Etalab. In December 2013, data.gouv.fr was profoundly transformed through a change in both the site’s structure and its philosophy. It has, without doubt, become a collaborative platform oriented towards the community, which has resulted in better reuse of public data.
  • Dataset: A structured and documented collection of data on which reusers rely.
  • Etalab: This is a project proposed in the November 2010 Riester Report and put into place in 2011, which is responsible for implementing the French government’s open data policy, as well as for establishing the portal for French public data: data.gouv.fr.
  • Hadoop: Big Data software infrastructure that includes a distributed storage system (HDFS) and a distributed processing framework (MapReduce); a minimal processing sketch appears after this list.
  • Information: It consists of interpreted data and has discernible meaning. It describes and answers questions like “who?”, “what?”, “when?” and “how many?”.
  • Innovation: It is recognized as a source of growth and competitiveness. The Oslo Manual distinguishes between four types of innovation:
    - Product innovation: Introduction of a new product. This definition includes significant improvements to technical conditions, components or materials, embedded software, user friendliness or other functional characteristics.
    - Process innovation: Establishing a new production or distribution method, or significantly improving an existing one. This notion involves significant changes in techniques, material and/or software.
    - Marketing innovation: Establishing a new marketing method involving significant changes in a product’s design, packaging, placement, promotion or pricing.
    - Organizational innovation: Establishing a new organizational method in practices, workplace organization or company public relations.
  • Interoperability: This term designates the capacity of a product or system with well-known interfaces to function in sync with other existing or future products or systems, without access or execution restrictions.
  • Knowledge: It is a type of know-how that makes it possible to transform information into instructions. Knowledge can either be obtained through transmission from those who possess it, or by extraction from experience.
  • Linked Open Data (LOD): This term designates a Web approach proposed by supporters of the “Semantic Web”: all data is described in a way that computers can scan, and datasets are linked to one another by describing their relationships, or in a way that makes it easier to relate them. Open public data is arranged in a “Semantic Web” format, such that its items have unique identifiers and datasets are linked together by those identifiers (see the RDF sketch after this list).
  • Open innovation: It is defined as increased use of information and knowledge sources external to the company, as well as the multiplication of marketing channels for intangible assets with the purpose of accelerating innovation.
  • Open Knowledge Foundation Network: A British non-profit organization that advocates for open data. It is best known for developing CKAN (open-source data portal software), a powerful data management system that makes data accessible (a minimal API sketch appears after this list).
  • Open data: This term refers to the principle according to which public data (gathered, maintained and used by government bodies) should be made available to be accessed and reused by citizens and companies.
  • Semantic Web: This term designates a set of technologies seeking to make all web resources available, understandable and usable by software programs and agents through a metadata system. Machines would thus be able to process, link and combine a certain amount of data automatically. The Semantic Web is a set of standards developed and promoted by the W3C in order to allow the representation and manipulation of knowledge by web tools (browsers, search engines, or dedicated agents). Among the most important of these standards, we can cite the following (a minimal RDF/SPARQL sketch appears after this list):
    - RDF: a conceptual model that makes it possible to describe any dataset in the form of a graph in order to create knowledge bases;
    - RDF Schema: a language that makes it possible to create vocabularies, i.e. sets of terms used to describe things;
    - OWL: a language that makes it possible to create ontologies and more complex vocabularies that serve as support for logical processing (inference, automatic classification, etc.);
    - SPARQL: a query language for obtaining information from RDF graphs.
  • Semi-structured information: The boundary between structured and unstructured information is rather fuzzy, and it is not always easy to classify a given document into one category or the other. In such cases, one is most likely dealing with semi-structured information.
  • Smart Data: The flood of data encountered by ordinary users and economic actors will bring about changes in behavior, as well as the development of new services and value creation. This data must be processed and developed in order to become “Smart Data”. Smart Data is the result of analysis and interpretation of raw data, which makes it possible to effectively draw value from it. It is, therefore, important to know how to work with the existing data in order to create value.
  • Structured information: It can be found, for example, in databases or in programming languages. It can thus be recognized by the fact that it is arranged in a way such that it can be processed automatically and efficiently by a computer, but not necessarily by a human. According to Alain Garnier, the author of the book Unstructured Information in Companies, “information is structured when it is presentable, systematic, and calculable”. Some examples include forms, bills, pay slips, text documents, etc.
  • Text mining: This is a technique that makes it possible to automate the processing of large volumes of text content in order to extract the main tendencies and statistically assess the different subjects dealt with (a minimal keyword-extraction sketch appears after this list).
  • Tim Berners-Lee: He is the co-inventor of the Semantic Web. He is very active and engaged in data.gov.uk. In particular, he has defined a five-star ranking system for measuring how open a dataset put online is with respect to the Semantic Web.
  • Unstructured information: Unlike structured information, unstructured information constitutes the set of information for which it is impossible to find a predefined structure. It is always intended for humans, and is therefore composed mainly of text and multimedia documents, like letters, books, reports, video and image collections, patents, satellite images, service offers, resumes, calls for tenders, etc. The list is long.
  • Web 1.0: This term refers to the part of the Internet that makes it possible to access sites composed of web pages connected by hyperlinks. This Web was created at the beginning of the 1990s. It creates a relationship between an edited site that publishes content or services and Internet users who visit it and who surf from site to site.
  • Web 2.0: This term designates the set of techniques, functions and uses of the World Wide Web that have followed the original format of the Web. It concerns, in particular, interfaces that allow users with little technical training to appropriate new Web functions. Internet users can contribute to information exchanges and interact (share, exchange, etc.) in a simple manner.
  • Web 3.0: (also known as the Semantic Web). This is a network that allows machines to understand semantics, which is to say the meaning of information published online. It expands the network of Web pages understandable by humans by adding metadata that is understandable by a machine and that creates links between content and different pages, which in turn allows automatic agents to access the Web in a more intelligent manner and to carry out some tasks in the place of users.
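
To make some of the more technical notions above concrete, the sketches below illustrate them in Python under explicitly stated assumptions; they are illustrations, not reference implementations. First, data analysis: this sketch (assuming the numpy and scikit-learn libraries are available) applies the two families of techniques mentioned in the definition, a summarizing method (principal component analysis) and a grouping method (k-means clustering), to purely synthetic data.

```python
# A minimal data-analysis sketch: summarizing and grouping synthetic data.
# Assumes numpy and scikit-learn are installed (pip install numpy scikit-learn).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)
# Synthetic dataset: 200 observations described by 5 numeric variables.
data = rng.normal(size=(200, 5))

# Dimension reduction: keep the two axes that retain the most variance,
# i.e. describe the data "in the most succinct manner possible".
pca = PCA(n_components=2)
summary = pca.fit_transform(data)
print("variance explained:", pca.explained_variance_ratio_)

# Grouping: partition the observations into 3 clusters of similar records.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
print("cluster sizes:", np.bincount(clusters))
```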
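
Next, data mining as model development: a decision tree is trained on a small labeled dataset (scikit-learn's bundled iris data, an illustrative choice) and evaluated on cases it has not seen, which is one simple way of "finding interesting structures" in data.

```python
# A minimal data-mining sketch: learning a predictive model from data.
# Assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a decision tree: the learned rules are the "structures" extracted
# from the data according to criteria determined beforehand (max_depth).
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)
print("accuracy on unseen data:", model.score(X_test, y_test))
```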
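
For data visualization, a minimal sketch using the matplotlib library (one tool among many) turns a small, invented table of figures into a bar chart.

```python
# A minimal data-visualization sketch: from raw figures to a readable chart.
# Assumes matplotlib is installed (pip install matplotlib).
import matplotlib.pyplot as plt

# Illustrative figures: dataset reuses per month (invented values).
months = ["Jan", "Feb", "Mar", "Apr"]
reuses = [12, 31, 25, 44]

fig, ax = plt.subplots()
ax.bar(months, reuses)
ax.set_xlabel("Month")
ax.set_ylabel("Number of reuses")
ax.set_title("Dataset reuses per month (illustrative data)")
plt.savefig("reuses.png")  # write the chart to an image file
```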
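
The Hadoop entry mentions a distributed processing tool; the classic illustration is a MapReduce word count. The sketch below writes the mapper and reducer as plain Python scripts reading standard input, in the style used with Hadoop Streaming; the single-file layout and the command-line switch are conveniences of this example.

```python
# A minimal MapReduce word count in the Hadoop Streaming style.
# Hadoop sorts the mapper output by key before it reaches the reducer.
import sys

def mapper():
    # Emit one "word<TAB>1" pair per word read from standard input.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by word, so counts can be summed in one pass.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    # Choose the role from the command line: "map" or "reduce".
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

Locally, the same pipeline can be simulated with `cat input.txt | python wc.py map | sort | python wc.py reduce`; on a cluster, the two roles would be passed to Hadoop Streaming's -mapper and -reducer options.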
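
As an illustration of how CKAN "makes data accessible", this sketch calls a CKAN portal's Action API over HTTP; the package_list action is part of CKAN's documented API, while the portal URL (CKAN's public demo instance) is an assumption of the example.

```python
# A minimal sketch of reading from a CKAN portal's Action API.
# Assumes the requests library is installed; the portal URL is illustrative.
import requests

PORTAL = "https://demo.ckan.org"  # assumption: CKAN's public demo instance

# package_list returns the identifiers of all datasets hosted on the portal.
response = requests.get(f"{PORTAL}/api/3/action/package_list", timeout=10)
response.raise_for_status()
payload = response.json()

if payload["success"]:
    datasets = payload["result"]
    print(f"{len(datasets)} datasets, e.g.:", datasets[:5])
```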
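
To make the RDF and SPARQL standards concrete: the sketch below (assuming the rdflib library) describes a few facts as a graph whose resources carry unique identifiers (URIs), in the Linked Open Data spirit, and then queries the graph with SPARQL. All URIs are invented for the example.

```python
# A minimal Semantic Web sketch: an RDF graph queried with SPARQL.
# Assumes rdflib is installed (pip install rdflib); all URIs are invented.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# Describe data as subject-predicate-object triples: unique identifiers
# (URIs) let independent datasets refer to the same resources.
budget = EX["budget-2013"]
g.add((budget, EX.title, Literal("Budget 2013")))
g.add((budget, EX.publishedBy, EX["etalab"]))

# SPARQL query: find the title of every resource published by ex:etalab.
query = """
    PREFIX ex: <http://example.org/>
    SELECT ?title WHERE {
        ?d ex:publishedBy ex:etalab .
        ?d ex:title ?title .
    }
"""
for row in g.query(query):
    print(row.title)
```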
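
Finally, text mining: the sketch below builds a document-term matrix over a small invented corpus with scikit-learn and ranks the most frequent terms, a crude version of "extracting the main tendencies" from text.

```python
# A minimal text-mining sketch: extracting dominant terms from a corpus.
# Assumes scikit-learn is installed.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "Open data makes public data easy to reuse.",
    "Data journalism turns open data into stories.",
    "Semantic web standards link data across the web.",
]

# Build a document-term matrix, ignoring common English stop words.
vectorizer = CountVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(corpus)

# Sum each term's occurrences over all documents and rank them.
totals = matrix.sum(axis=0).A1  # .A1 flattens the matrix to a 1-D array
terms = vectorizer.get_feature_names_out()
ranked = sorted(zip(terms, totals), key=lambda t: -t[1])
print(ranked[:5])  # the most frequent terms, e.g. ('data', 5)
```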