4
Creating Value from Data Processing

Today, as long-term economic growth depends on the accumulation of knowledge and the ability to introduce new products, processes, services, and business and organizational models, business competitiveness is determined by a company’s ability to organize a space that is conducive to science, technology, and innovation. With the phenomenon of Big Data, data is now central to a company’s decision-making process and is a precious new tool for directing its strategic vision.

In a changing world, data – increasingly large, rich, and varied – represents a major asset for businesses, provided they can capture, combine, and analyze it. Data can be seen as an asset if it has the potential to create value for companies. Companies can predict possible outcomes by studying both structured data and the continually expanding mass of unstructured data from different sources, and then by sharing those results to discover any relevant correlations.

By publishing reusable data, public and private organizations are thus supporting the development of new services as well as the content within them. The availability of more and more “open data” is important. Open Data is a broad movement of opening up data, initiated in the USA with the Freedom of Information Act, which, like Google Maps, Amazon, or eBay, has revolutionized how we consume data. This openness will lead to new applications and promote innovation with new tools for creating value from these data.

Faced with the numerous challenges of opening up data, French and European public cultural institutions initiated some early publications using the latest “Semantic Web” or “Web of Data” models. Data is “the new global currency”, and the Web is the currency exchange office. Companies have understood that the right decisions depend on the proper management of their data. The integration of Semantic Web technologies into data management systems is one approach that brings new perspectives.

The processing and understanding of increasing amounts of data, and their transformation into relevant information, is one of the challenges posed by the Internet and the Web. The design of information and its display or presentation is a growing concern for web experts. A smart web is one that acts as an intermediary between digital tools and a digital display interface, allowing for graphical representations that facilitate decision-making. Hence the need for “data visualization” that is suited to the processing of Big Data. Data visualization revolutionizes the use of data and offers a simple and efficient way to access information within the mass of data produced.

4.1. Transforming the mass of data into innovation opportunities

With the emergence of the information society in the 1980s, we are witnessing the transition from the material economy to the virtual economy. This change reduces the importance of raw materials and physical presence and leads instead to increasingly immaterial products. This new “postindustrial” economy is based on intelligence, knowledge, and innovation, and is materialized through the development of services.

This gradual transformation of society, organized around the production, circulation, and exchange of knowledge, simultaneously affects modes of production and consumption, sources of growth and competitiveness, modes of organization and management of companies, and the process of building skills and acquiring new qualifications for human capital. This confirms that knowledge is the most crucial resource for any company; it is a strategic resource. Observing the knowledge pyramid, we find that “data” is the foundational pillar of knowledge.

Data represents an infrastructure resource, an “input”, which – in theory – can be used by an unlimited number of users to generate numerous applications. In this context, whilst the borders of economic life are evolving, the most powerful companies are likely those that successfully take advantage of the available amounts of data. The ability to use this data can bring value to economic activity.

The proliferation of data created by individuals, companies, and public authorities will support new uses and gains in productivity. The scale of the Big Data phenomenon can, therefore, be seen as an economic value in itself. Highly innovative companies are the most likely to rely on large-scale data analysis and data mining; this was confirmed by the latest analysis of the world’s most innovative companies, published in 2014 by the Boston Consulting Group (BCG).

Table 4.1. The 50 most innovative companies in 2014

Source: 2014, BCG Global Innovator Survey

From 1–10            From 11–20             From 21–30           From 31–40               From 41–50
1 Apple              11 Hewlett-Packard     21 Volkswagen        31 Procter & Gamble      41 Fast Retailing
2 Google             12 General Electric    22 3M                32 Fiat                  42 Walmart
3 Samsung            13 Intel               23 Lenovo Group      33 Airbus                43 Tata Consultancy Services
4 Microsoft          14 Cisco Systems       24 Nike              34 Boeing                44 Nestlé
5 IBM                15 Siemens             25 Daimler           35 Xiaomi Technology     45 Bayer
6 Amazon             16 Coca-Cola           26 General Motors    36 Yahoo                 46 Starbucks
7 Tesla Motors       17 LG                  27 Shell             37 Hitachi               47 Tencent Holdings
8 Toyota Motor       18 BMW                 28 Audi              38 McDonald’s            48 BASF
9 Facebook           19 Ford Motor          29 Philips           39 Oracle                49 Unilever
10 Sony              20 Dell                30 SoftBank          40 Salesforce.com        50 Huawei Technologies

The success of large companies such as Amazon, Google, Facebook, etc. proves the emergence of a fourth factor of production in today’s hyper-connected world. Aside from raw materials, labor, and capital, data has undoubtedly become an essential element in gaining a competitive advantage. The mass production of data online and in different digital forms marks a new era of Big Data, as do the new technologies that enable its analysis.

For network companies in particular, the use of data allows for the unprecedented optimization of business (the ability to prevent network failures, service interruptions, subscription cancellations, etc.), the development of “smart services”, and the creation of value from third-party data. On this last point, as an example, the three largest telecommunications operators in the United Kingdom formed a joint company called “Weve” to sell anonymized client data (purchase data, geolocation data, internet data, etc.) to advertisers.

image

Figure 4.1. Companies actively targeting Big Data in their innovation programs (over the next 3–5 years).

Source: BCG (2014)

In the same light, and in the medical field, a study by the McKinsey Global Institute suggests that if the US health care sector were able to process the vast amounts of electronic data it has access to (from clinical trials and experiments), better management of this data would improve the system’s efficiency and save more than $300 billion annually.

According to BCG, 53% of companies confirm that Big Data will impact their ability to innovate over the next 3 to 5 years. Figure 4.1 shows companies actively targeting Big Data in their innovation programs.

BCG has also found that business leaders who process large amounts of data generate revenues over 12% higher than companies that neither experiment with nor extract value from Big Data. Companies analyzing and processing mass data (internal/external, structured/unstructured) are much more likely to be innovative. In order for businesses to take advantage of the value created by Big Data, they must:

  • – examine the data: since the company possesses a large amount of data from various sources, it must understand the relationships among them. At this stage, the key attributes in the analysis must be sufficiently standardized to generate valid results;
  • – process the data: once data has been well documented and is of sufficient quantity, what remains is to draw out the key information. This requires comparing the data in order to obtain results that will guide the company in its strategy. At this stage, the company must analyze the data using a system with enough processing power to extract value;
  • – transform the data: after having processed the data, the company must use a tool that is capable of handling different data sources, including unstructured data, and of transforming them into useful and readable information (a minimal sketch of these three steps follows Figure 4.2 below).
image

Figure 4.2. Massive data processing and results visualization
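
To make these three steps concrete, here is a minimal sketch in Python using the pandas library; the data sources, column names, and figures are all invented for illustration.

```python
import pandas as pd

# Hypothetical sources: in-store and web sales, arriving in inconsistent formats.
store = pd.DataFrame({"product": ["A", "B"], "units": [120, 80], "channel": "store"})
web = pd.DataFrame({"Product": ["a", "c"], "Units": [45, 60]})

# 1. Examine: standardize the key attributes so the sources can be compared.
web = web.rename(columns={"Product": "product", "Units": "units"})
web["product"] = web["product"].str.upper()
web["channel"] = "web"

# 2. Process: combine the documented sources and draw out the key information.
sales = pd.concat([store, web], ignore_index=True)
by_product = sales.groupby("product")["units"].sum()

# 3. Transform: present the result as readable information for decision-makers.
print(by_product.sort_values(ascending=False))
```

The same pattern scales up: only the tools change (distributed storage, a processing engine, a visualization layer), not the examine–process–transform logic.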

Big Data allows for uncovering hitherto unknown but useful data, inspiring new ideas that can lead to new discoveries and thus fuel the cycle of research and innovation. The applications of Big Data are numerous and are a main factor in strengthening the capacity for innovation within companies, playing on the two dynamics of exploration and processing. Innovation will certainly stem from combinations and processes that were not originally thought of.

For example, the correlation that Google identified between a handful of search terms and the flu is the result of 450 million mathematical models. In the same spirit, the United Nations has developed a program anticipating epidemics and economic downturns through keywords used on Twitter.
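
To give the flavor of the underlying computation: each candidate search term can be scored by its correlation with official case counts. A minimal sketch in Python, with invented weekly figures:

```python
import numpy as np

# Hypothetical weekly series: volume of one search term and reported flu cases.
searches = np.array([120, 150, 210, 340, 500, 480, 300, 180], dtype=float)
flu_cases = np.array([80, 100, 160, 290, 430, 410, 250, 140], dtype=float)

# Pearson correlation coefficient: scoring millions of candidate terms this way,
# then combining the best ones into a model, is the essence of the approach.
r = np.corrcoef(searches, flu_cases)[0, 1]
print(f"correlation: {r:.3f}")  # close to 1.0 -> the term tracks the epidemic
```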

The Danish company “Vestas Wind Systems”, one of the most influential wind turbine manufacturers in the world, uses “IBM Big Data Analytics” and “IBM Systems” solutions to decide the location of wind turbines within a few hours by cross-referencing varied data such as weather data, satellite images, etc.

The exploration of large amounts of data enables the launch of new products and services, new processes, and even new business models. By making data speak, each company will gain access to a better understanding of its context and environment.

In the automotive sector, cars are increasingly equipped with sensors and software that allow them to analyze their environment and act accordingly. The car can be customized by integrating and using data, becoming more connected, and even driverless.

image

Example 4.1. The Google car

The collection and use of data on connected products or on consumer behavior can improve the operational functioning of a company and forecast market developments. “Lokad”, a French software-publishing start-up, has developed an algorithm that helps retailers optimize their daily inventories by analyzing receipts and sales history.
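
Lokad’s actual algorithm is proprietary, but the textbook version of the problem it addresses is the reorder-point calculation. A minimal sketch in Python, with invented sales figures and service-level parameters:

```python
import statistics

# Hypothetical daily sales history for one product, extracted from receipts.
daily_sales = [14, 18, 11, 22, 16, 19, 25, 13, 17, 20]
lead_time_days = 3    # days between placing an order and receiving stock
safety_factor = 1.65  # ~95% service level, assuming roughly normal demand

mean_demand = statistics.mean(daily_sales)
demand_std = statistics.stdev(daily_sales)

# Classic reorder-point formula: expected demand over the lead time,
# plus safety stock proportional to demand variability.
reorder_point = (mean_demand * lead_time_days
                 + safety_factor * demand_std * lead_time_days ** 0.5)
print(f"reorder when stock falls below {reorder_point:.0f} units")
```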

In terms of “Urban Data”, there are as many models as there are cities. Many large-scale urban innovation projects have made their main purpose building “smart” infrastructures, integrating sensors, and increasing network capacity. Big Data technologies can help cities meet their most pressing challenges.

image

Example 4.2. Smart City - Montpellier

The “smart city” is also deployed thanks to Open Data and the implementation of an ecosystem of programmers and mobile application developers.

image

Example 4.3. An application for a transport company

Open Data can promote open innovation processes between, on the one hand, businesses and public enterprises which hold usable data and, on the other, start-ups or SMEs supplying innovative technologies. This represents an opportunity for radically new solutions for all stakeholders.

Big Data is therefore part of an “open innovation” approach that encourages organizations to collaborate with other actors beyond traditional company boundaries, on either strategic or unexpected topics: what is deemed “outside-in” open innovation. It also requires reflecting on the creation of value from projects and produced data which the company has no plans to capitalize on directly in its core activity: “inside-out” open innovation.

Beyond the huge volume, it is the diversity of data sources that gives Big Data its full scope. The development of new products and services and their adaptation to new uses are facilitated by the mixing of large data sets. Big Data marks a major turning point in the use of data and is a powerful vehicle for growth and profitability. A comprehensive understanding of a company’s data, its potential, and the processing methods can be a new vector for performance.

All companies are overwhelmed with data but often do not know how to use it appropriately. For data to lead to innovative opportunities, significant computing and storage capabilities are necessary. Storing data online makes it available without temporal or spatial constraints, the first technological process essential to Big Data processing. Another essential factor, which also allows for explaining data, is the power of formulas and calculations. Data stream processing and analysis tools (such as the Cloud) grow more powerful every day.

The Big Data revolution, which extends the potential of “Machine Learning”, is an opportunity for change. Big Data allows for an understanding of the heterogeneity of data sets and for adjusting the analysis in real time to enable a more appropriate response. The disruptive nature of the innovation introduced by the dissemination of Big Data into an increasingly broad range of industrial fields opens up important opportunities for creating value and differentiation. Sectors such as insurance, retail, automotive manufacturing, and even energy have experienced significant changes in their value chains through the influence of Big Data. Data has several kinds of value (monetary value, usage value, reuse value, etc.), depending on who uses it.

4.2. Creation of value and analysis of open databases

In a 2013 report [OEC 13], the OECD stated that the convergence of technological and socioeconomic trends related to the development of internet usage and the decreasing management costs of dematerialized data leads to the creation of a large amount of data that can be used to create new products, new wealth, and new value for business. The OECD identifies five levers of value creation:

  • – improved research and development;
  • – the creation of new products based on data;
  • – optimization of the manufacturing process;
  • – optimization of targeted marketing;
  • – improved managerial approaches.

It therefore seems evident that data must be considered an intangible but essential asset of the company. The increasing production of data, generated by the development of information and communications technology (ICT), requires increased openness as well as useful sharing which can be harnessed to change the economic and social world. The birth of Open Data did not come about by chance; it is intimately linked to the Internet and its mutations. The development of digital technology and the potential uses for data call for the acceleration of this movement; more data must be shared freely, in open formats, with freedom for reuse.

The opening up of data, or the phenomenon of “Open Data”, has spread throughout the world, despite being new, thanks to its ability to generate both economic and social value. Various state actors are actively involved in this development by allowing access to and reuse of their data. Open Data signifies that the public sector relinquishes its role as guardian of data and replaces it with a new role, that of provider of information and data, leading to a realignment of power dynamics between the public and private sectors.

This phenomenon has attracted considerable attention in recent years; one of the central principles of the open data initiative is to make public data available in order to promote innovation from these data. The question that arises is: how can we create value or profit from simply reusing or transforming these open data?

Data deposits are a raw material that circulates in a cycle: suppliers emit data to consumers, who – through a transformation process – produce information and therefore value. But to generate value, these data must be available to everyone. In 2007, the “Open Government Data” working group defined the eight principles of data accessibility; data is considered open if it is:

  • – complete;
  • – primary;
  • – timely;
  • – accessible;
  • – machine-processable;
  • – non-discriminatory;
  • – non-proprietary;
  • – license-free.

To advance Open Data, we must recognize this variety of conditions for opening data. We therefore potentially have very large volumes of data concerned, with very diverse characteristics, that can be mixed with other types of data – from social networks, for example – to enhance applications or create new ones. This is made possible through data analysis and processing. Relevant data mixing is the basis of all applications built on Open Data.

image

Example 4.4. The “Open Knowledge Foundation”

In a similar vein, and in order to make the best use of Open Data, the French Open Data portal data.gouv.fr provides more than 350,000 data sets under the Open License of the Etalab mission. The released data are very varied and can generate interesting applications.

image

Example 4.5. A route calculation service for people with reduced mobility

In March 2010, the city of Rennes made some of its data publicly available on the platform data.rennes-metropole.fr. From this data stemmed some 60 fixed and mobile Internet services on themes as varied as transport, accessibility, environment, and culture.

image

Example 4.6. The SNCF and the reuse of data

image

Example 4.7. Orange and the site “Where do you really live?”

The term Open Data refers to public or private data that can be reused. The concept of opening up data refers to the “general public” dimension, that is to say that anyone can consult a dataset in a clear and simple manner. This is free data that generates profit through reuse.

However, we must first define a theoretical framework that explains the impact of data availability. This is what G. Sawyer and M. de Vries presented to the European Space Agency, based on empirical modeling [SAW 12]. This modeling identifies three phases through which the opening of data can pass:

  • – a “sowing phase”: opening data is a cost to the administration;
  • – a “growing phase”: the number of re-users grows, public services gain efficiency and the reuses begin to generate profits for the company;
  • – a “harvesting phase”: the effects of data opening are felt on employment and public finance activities.
image

Figure 4.3. The 3 phases of opening up data.

Source: [SAW 12] for the European Space Agency. For a color version of the figure, see www.iste.co.uk/monino/data.zip

The reuse of data will help to enrich these data and mix them with other existing data, and thus offer new products and services. Open Data allows for creating a partner ecosystem. Opening this data is a big step towards the co-production of new activities for companies. Data becomes information, and then knowledge, which can be a source of innovation when mixed with other data.

As such, the challenge for companies is to manage data value chains. Open Data has thus quickly become a growing sector, attracting many companies and start-ups whose mission is to process, sort, and analyze data to give it meaning and render it usable. The new connected objects of Web 3.0 and the “quantified self” (watches, bracelets, smart cars, etc.) will produce data to be shared with those who wish to enrich or aggregate it. Combined with different technologies, these data are also an important vector for innovation, enabling the creation of new services.

Initiating Open Data means selecting raw data and making it easily available and reusable: public data, data from scientists, citizens, start-ups, etc. Improving analysis, products, and services is the main aim of companies that use and reuse open data. The decision to initiate an Open Data approach must be taken with regard to the potential benefits it can bring to a company, in close alignment with the objectives of its digital and innovation strategies.

Current initiatives for opening up data, such as data.gouv.fr, represent an important step in the implementation of the web of data. But there is still much to be done before reaching the famous “five stars” defined by Tim Berners-Lee: structured, identified data in a non-proprietary format, linked together semantically.

4.3. Value creation of business assets in web data

Data, structured or unstructured, which has grown exponentially in volume and variety within companies, holds its most promising value when it interacts with the gigantic mass of data from the Web. Additionally, if we add to the phenomenon of “Big Data”, which we have already spoken about, the phenomenon of “Open Data” – which makes data accessible – and even “Linked Data” – which connects data from different sources – then the possibilities are even larger. The Web promises to be a goldmine for companies.

The Web has become the preferred location for searching for data, and it has put online many types of data that are potentially useful for a company’s monitoring system. The Web has revolutionized the way of capturing, accessing, and integrating data in order to use it. Indeed, a large amount of the data available online contains potentially useful information for decision-making. The growing volume and variety of information, however, pose real obstacles to effectively accessing this goldmine.

When we use a search engine, it does not understand our request. It finds the pages which feature our keywords but does not guarantee a real answer. Various techniques for searching for and collecting data on the Web have been proposed to build tools that refine the search for relevant results. There is a need to build a space for the exchange of data between machines, allowing access to large volumes of data and providing the means to manage it.

In order to understand the concept and value of data, it is important to consider the existing mechanisms for data exchange and reuse on the Web. We therefore need to reach a “smart” web, where data is no longer merely stored but “understood” by computers so as to provide users with what they are really seeking. The result is a global data space that we call the “Web of Data” or “Semantic Web”, a term Tim Berners-Lee began using between 1991 and 2001. The concept was then taken up and developed by the World Wide Web Consortium (W3C).

The Semantic Web provides a new platform for a more intelligent management of content through its ability to manipulate resources based on their semantics. A machine can understand the mass of data available on the Web, and thus provide a more consistent service, so long as we endow it with a certain “intelligence”. In fact, the idea behind the Semantic Web is not new; it was born with the Web itself [FAL 01, MOT 01]. The Semantic Web is the starting point for the development of smart web services.

The vision of the Semantic Web proposed by Tim Berners-Lee [BER 01] can be summarized as follows: the development of the Web was first made possible by standards such as TCP/IP (transmission and routing of bits across a network) and HTTP and HTML (transport and rendering of Web pages and hypertext links). The first generation of the Internet consisted essentially of handwritten pages. The second generation offered automatically generated content.

On the basis that these two architectures are dedicated solely to human–human or human–machine interactions, the Semantic Web initiators defend the idea that the next generation of the Web should promote access to resources so that they can be automatically processed by software agents, in other words, machines. In order to offer this automatic processing capacity, the Semantic Web adds a layer of metadata to the data, making it usable by computers. These metadata provide unambiguous semantics to automate the processing.

The architecture of the Semantic Web is based on a pyramid of languages proposed by Tim Berners-Lee to represent knowledge on the web which meets the criteria of standardization, interoperability, and flexibility. The following points introduce the main function of each layer in the architecture of the Semantic Web:

  • – XML: now considered a standard for the transmission of data on the web;
  • – the Resource Description Framework, also known as the RDF layer: this represents the metadata for Web resources;
  • – the ontology layer: based on a common formalization, it specifies the semantics of the metadata provided in the Semantic Web;
  • – the logic layer: based on inference rules, it allows for intelligent reasoning performed by software agents.
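
As an illustration of the RDF and ontology layers, here is a minimal sketch using the Python library rdflib (version 6 or later); the ex: namespace, the resources, and the one-statement “ontology” are all hypothetical.

```python
from rdflib import RDF, RDFS, Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# RDF layer: metadata about a resource, expressed as triples.
g.add((EX.BigDataReport, RDF.type, EX.Report))
g.add((EX.BigDataReport, RDFS.label, Literal("Big Data Report")))
g.add((EX.BigDataReport, EX.author, EX.Alice))

# Ontology layer (reduced here to a single RDFS statement):
# every Report is a kind of Document.
g.add((EX.Report, RDFS.subClassOf, EX.Document))

print(g.serialize(format="turtle"))  # the triples in Turtle syntax

# Querying the metadata with SPARQL.
results = g.query(
    "SELECT ?s ?label WHERE { ?s a ex:Report ; rdfs:label ?label . }",
    initNs={"ex": EX, "rdfs": RDFS},
)
for row in results:
    print(row.s, row.label)
```

The logic layer would sit on top of this: an RDFS or OWL reasoner could infer, from the subClassOf statement, that ex:BigDataReport is also an ex:Document.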

The origins of this Web lie in the research efforts of the Semantic Web community and especially in the W3C Linking Open Data (LOD) project, started in January 2007. The original objective of this project, which generated an active and still-expanding community, was to bootstrap the Web of data by identifying sets of existing data accessible under open licenses, converting them into RDF in line with the principles of linked data, and publishing them on the Web [HEA 07, Chapter 3].

In the universe of the Semantic Web, it is not enough simply to put data on the web; it is especially important to create links between data so that this web of data can be explored. Linked Data is a subproject of the Semantic Web which helps find related data. For Tim Berners-Lee, linked data represents a very important change, for which he sets out four basic principles:

  • – use of URIs (Uniform Resource Identifiers) rather than URLs to identify what we want to make available on the Web as data resources;
  • – use of HTTP addresses to locate them and access their content;
  • – provide useful information about the resource (URI) when consulted;
  • – include links to other URIs related to the resources used, in order to improve the discovery of useful information.
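
A minimal sketch of these four principles in Python, using the requests and rdflib libraries; the URI is a real one from the LOD cloud (DBpedia), but the exact behavior of that server is an assumption of the example.

```python
import requests
from rdflib import Graph, URIRef

# Principles 1 and 2: a URI identifies the resource, and HTTP locates it.
uri = "http://dbpedia.org/resource/Berlin"

# Principle 3: content negotiation asks for machine-readable RDF (Turtle)
# rather than the HTML page intended for human readers.
resp = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)
resp.raise_for_status()

g = Graph()
g.parse(data=resp.text, format="turtle")

# Principle 4: the returned triples link to other URIs, which can be
# dereferenced in turn to explore the web of data.
linked = {o for _, _, o in g if isinstance(o, URIRef)}
for target in sorted(linked)[:10]:
    print(target)
```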

Data linking allows for the browsing and sharing of information between different sites. Connecting a database to the Web means establishing a link between a web server and a database management system. This link relies on processing techniques associated with the computing world and bound to the “client/server” framework, bringing into play web-oriented programming languages, communication protocols, data manipulation languages, etc. Large amounts of structured data have been published, notably in the framework of the “Linking Open Data” project.

The interest of linked data lies in identifying and typifying the relationships between entities, using web standards amenable to machine reading, and in revealing interconnections that would be impossible to produce through human intervention alone. Linked data refers to a set of best practices for publishing and linking structured data on the Web. Linked data is a set of design principles for the sharing of machine-readable data for use by public administrations, businesses, and citizens [ECI 12].

The concept of linked data corresponds to a more technological vision of data related to the Semantic Web. It consists of a collection of data sets published using Semantic Web languages, and of links between those data sets, allowing multiple data sets to be queried from one individual search. Many tools are beginning to appear for storing data and for visualizing it in varied ways, including map-based visualization interfaces.

image

Figure 4.4. Evolution of Linked Open Data. For a color version of the figure, see www.iste.co.uk/monino/data.zip

Developments in semantic technologies and the evolution towards the Web of data have led to changes in the representation of terminologies and metadata formats, and to new opportunities for machines to reason on the basis of terminologies. These metadata databases are increasingly large and can increasingly be used remotely and directly via web services. The technologies of the web of data demonstrate the immense prospects for pooling, sharing, and using metadata on the web.

Some metadata are already in use. Google’s “PageRank” and “TrustRank” algorithms use metadata concerning web structure. In addition, Microsoft’s search engine “Bing” is beginning to interpret and contextualize natural-language requests through the use of “Powerset” technology. Facebook has also introduced the “Open Graph Protocol”, a technology based on the Semantic Web which allows third-party sites to interact with the social networking site by receiving information from and sending it to Facebook. This protocol is based on the RDF syntax and aims to “sustain” the social interactions between visited sites and the Facebook profile of an Internet user.
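
In practice, the Open Graph Protocol annotates pages with meta tags that any third party can read. The page below is invented, but the property names (og:title, og:type, og:url) belong to the published protocol; a minimal extraction sketch in Python:

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect Open Graph <meta property="og:..."> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:") and "content" in attrs:
            self.og[prop] = attrs["content"]

# A hypothetical page annotated with the Open Graph protocol.
html = """
<html><head>
  <meta property="og:title" content="An example article" />
  <meta property="og:type" content="article" />
  <meta property="og:url" content="http://example.org/article" />
</head><body></body></html>
"""

parser = OpenGraphParser()
parser.feed(html)
print(parser.og)  # {'og:title': 'An example article', 'og:type': 'article', ...}
```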

The e-retailer Best Buy was one of the first websites to enhance its search capabilities with the opportunities presented by the Semantic Web, via the integration of RDFa syntax into the official blogs representing each of its stores. The aim is to make the essential data of its stores (contacts, opening hours, product pricing) more visible by enabling search engines to better display them.

In companies, the applications that stem from the Semantic Web are currently mainly search and data-structuring software tools. These include “Exalead” and its “CloudView” platform, which collects, analyzes, and organizes large volumes of data (e-mails, social networks, blogs, etc.), whether from fixed terminals, mobile systems, or the web. Its objective is to enable companies to match their information needs to their specific issues and areas of activity.

It is worth noting that the Semantic Web is an approach to the management of corporate knowledge in which data is given a clear definition to allow computers and users to work in cooperation. The Semantic Web is not only used in intelligent searches of information, but is also studied in research on knowledge engineering, knowledge representation systems, automatic processing of natural language, learning, intelligent agents, automated reasoning, etc.

image

Figure 4.5. Web of the future

Source: N. Spivack, “The Future of the Net”, 2004, available at http://novaspivack.typepad.com/nova_spivacks_weblog/2004/04/new_version_of_.html

The expressions “Semantic Web” and “Web of Data” refer to a vision of the future Web as a vast space for the exchange of resources, supporting collaboration between humans and machines, in order to use the large volumes of data and the various services available on the web more efficiently. This new web generation reflects the two-pronged dream of its creator, Tim Berners-Lee. On one side, the web becomes a very powerful means of cooperation between human beings; on the other, it is also a means of cooperation for “machines” (computers).

4.4. Transformation of data into information or “DataViz”

Nowadays, the different uses of the Internet, social networks, smartphones, and other connected objects result in the production of billions of pieces of data that can be stored, analyzed, and used. The advent of high-performance processing technologies enables systematic, real-time analysis of this enormous potential. These data are, however, increasingly complex to interpret, and it becomes ever more pertinent for companies to invest in approaches which generate value and income.

First, a team of “Data Scientists” must identify the right data sets, gain access to them, develop a mathematical model, and finally start generating ideas. Once this work is completed, the team delivers the results to “top management” in a report or spreadsheet, which can be hard to understand. In fact, the results are not just the numbers in the report; they are also the decisions and hypotheses that were made throughout the preceding work.

These results require a debate in which, currently, many top managers would not be able to participate because they do not have access to the necessary information in an understandable format. Leaders can therefore take strategic decisions for the company without ever realizing what they may have missed. This is why Big Data must be accompanied by a visualization of the processed results if it is to succeed.

Beyond the collection, processing, and analysis phases, which form the information cycle that must accompany Big Data if value is to be created from data deposits as effectively as possible, which phase is the most important? The answer lies in the “visualization of data”, the most transparent, intuitive, and contextual way to use data beyond the simple numbers. Data visualization is, therefore, undoubtedly essential to the success of Big Data analytics.

Companies have an incentive to move towards analytical solutions and must complete the data value chain from the capture of data to its presentation and dissemination. Data visualization provides a common language for executive directors, data scientists, and top managers, and thus allows them to have a conversation about data. These three groups usually speak “different business languages”, and data visualization replaces these differences with a visual dialogue that everyone can understand.

William Playfair confirms this idea by noting that: “instead of calculations and proportions, the surest way to touch the spirit is to speak to the eyes”. Visual data analysis is thus based on three essential elements:

  • – data visualization;
  • – data analysis;
  • – data management.

“Data visualization” can be used both as an exploratory model, to find patterns and extract knowledge from processed data, and as an explanatory model, to clarify and illuminate relationships between data. Through the visualization of data, companies can take advantage of the real value of Big Data, accelerating the understanding of data and allowing leaders to take quick and decisive action on business opportunities.

As the volume and variety of data increase, the visualization of data becomes increasingly important to stimulate a collaborative dialogue. When faced with an enormous amount of data, visual grouping can bring together points of measurement, helping decision makers understand the relationships between data and thus make effective decisions.

Information presented in a visual format is also more likely to grab people’s attention. As reported by Media Bistro, people retweet infographics much more than articles or images. If a company must reply to a question, or has an article to deliver, a well-designed data visual can make the message more compelling and carry its ideas to a much wider audience.

Data visualization must be regarded as the result of a carefully considered process, one which encompasses the capture of quality data to allow for good analysis results, and the effective communication of those results throughout the process. In order to process Big Data more efficiently, we must have visualization functions at every stage of the analytical process, namely:

  • – the collection and preparation of data stage: the combination of various data sources is a key step in the data analysis process. A company must be able to visualize this process with a tool that allows it to verify that the way data is assembled accurately reflects the significance of the data sources. The more varied a company’s data sources are, the more it needs visualization to know precisely what data it has available and to understand how they can help solve the initial problem;
  • – the modeling stage: visualization is extremely important in modeling, notably because in most cases the model must be adjusted according to the different issues;
  • – the deployment stage: many tool sets only allow for visualization during the deployment stage. Visualization here plays a crucial role: the analysis is embedded in an application where it generates value for a variety of users and provides them with the information necessary for their work.
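
As an illustration of visualization in the modeling stage, here is a minimal sketch in Python using numpy and matplotlib; the two “customer segments” and all figures are invented. The scatter plot makes the grouping of measurement points visible at a glance, exactly the kind of visual grouping described above.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurements from two customer segments:
# average basket size versus visits per month.
segment_a = rng.normal(loc=(20, 5), scale=2.0, size=(200, 2))
segment_b = rng.normal(loc=(35, 12), scale=3.0, size=(200, 2))

plt.scatter(segment_a[:, 0], segment_a[:, 1], s=10, label="segment A")
plt.scatter(segment_b[:, 0], segment_b[:, 1], s=10, label="segment B")
plt.xlabel("average basket size")
plt.ylabel("visits per month")
plt.title("Visual grouping of measurement points")
plt.legend()
plt.show()
```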

Data visualization is a technique for exploring and analyzing data. It is an excellent solution to address the growing complexity of available data. Data visualization is a tool for representing data visually, in an illustrated form, making it more readable and understandable. As such, there are three broad categories of data visualization:

  • – visualization of “fixed” data: static infographics representing the targeted data (illustrations, typography effects, or photographs, “photoviz”). This category has the advantage of being well adapted to social networks with high viral potential, such as Facebook, Twitter, Google Plus, Flickr, etc.;
  • – visualization of “animated” data: screenwriting and storytelling can rely on videos, thus diverting the codes of animation to the service of information;
  • – visualization of “interactive” data: the possibility for the user to “play” with the data, helping them to make decisions.

If the aim is to make a decision, we are more likely to use an interactive “data visualization” which enables the display of large amounts of information with specific details on the numbers or nuances. The current explosion of computing power allows for taking in ever more information. To extract important information from such volumes of data, companies must have visualization tools with which to interactively explore and analyze data flows. Data visualization cannot, therefore, be addressed without considering interaction.

Visualization aims to produce actionable visual representations for analyzing and understanding complex data. Interactivity is a key point of visualization, enabling a better understanding of what is observed. This interactivity is all the more important when the volume of data to explore is significant. An increase in the volume of data to process and display, however, constitutes a natural brake on this interactivity and demands far greater computational effort. The amount of data that must be visualized is a direct result of the exponential growth of available computing power.

Once the analysis is finished, the results must be clearly presented, to enable decision-making by removing the complexity and being as clear as possible about the issues and challenges. The many calculated fields represent considerable volumes of data that must first be stored and then processed through interactive visual analysis. Developments in IT have enabled a large capacity for data collection and generation, but also enormous power for visualization techniques. Knowledge about visual perception in humans can help provide practical solutions to some of the identified problems. Humans have a highly developed ability to visualize information, which can play a major role in their cognitive processes.

image

Example 4.8. Clouds of texts or words
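
A word cloud is one of the simplest data visualizations: font size encodes word frequency. The counting step is the whole computation, as this minimal Python sketch shows; the corpus and stopword list are invented.

```python
import re
from collections import Counter

text = """Big Data allows companies to transform data into information,
information into knowledge, and knowledge into value."""

# Count word occurrences, ignoring common function words.
words = re.findall(r"[a-z]+", text.lower())
stopwords = {"and", "into", "to", "the", "a", "of"}
freq = Counter(w for w in words if w not in stopwords)

# A word-cloud renderer would map these counts to font sizes;
# here we print a crude text rendering instead.
for word, count in freq.most_common(5):
    print(f"{word}: {'#' * count}")
```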

The increase in computing power and in the quality of GUIs (graphical user interfaces) has allowed for exploring and developing interactive techniques that make the best use of the available data. Interaction in real time enables easy navigation within the data space and thus a better understanding of its structure and characteristics; this is what characterizes the exploratory phase and helps improve our knowledge. Thereafter, the user can formulate more efficient queries in the search system. It is at the level of interaction that visualization techniques and data search and processing techniques can best be made to work together.

image

Example 4.9. Three-dimensional visualization. For a color version of the figure, see www.iste.co.uk/monino/data.zip

image

Example 4.10. Bipartite graph

Interactivity is a key point of visualization. To better understand what is being observed, the company must be able to quickly change its view – in other words, direct its course of analysis – and access clear representations of the fields to be analyzed as quickly as possible. Data visualization, in light of the amount of data to be displayed, is thus an important analytical tool. It is becoming increasingly clear that it is an essential aspect of the effective communication of results.

4.5. Conclusion

The considerable increase in the volume and diversity of the digital data generated, coupled with Big Data technologies, offers significant opportunities for value creation. This value cannot be reduced simply to what we can solve or improve; it also lies in the new potential discoveries that may arise from cross-exchanges and correlations. To create value from these huge sources, we need algorithms, and for that we need access to outstanding talent. Thus, “You have to create intelligence from the bytes that are circulating” (G. Recourcé, Scientific Director of Evercontact).

Every day, huge amounts of data circulate around the Internet. We constantly produce them ourselves each time we make a connection or a transaction. This data has a value, and this value increases as processing and visualization tools are developed that allow knowledge to be extracted from the correct reading of these flows. Companies that manage to read the available data correctly obtain pertinent information and all the elements necessary for innovation.
