2
Open Data: A New Challenge

The world of data is becoming more competitive every day, as reflected in terms of volume, variety and value. This is why we now speak about Big Data. Data is a key asset for value creation, as well as an element that favors and promotes innovation, growth and development. With the digital revolution, data has taken on a central role in the economy. However, attaining the full potential of data depends on the way in which it is presented. It must be used and reused in different ways without diminishing its value.

This means making data available in the right form and at the right time to any party seeking to exploit it and add value to it. Open Data adds a new dimension to data warehouse analysis and gives way to new forms of innovation. Sharing and opening up data means making essential data available online so that it can enhance many decision-makers’ analysis; it also means making it possible for people to save time, or to make more informed decisions in all sorts of sectors. It therefore means creating large sets of reference data that are shared by all actors and that encourage the development of many high added value services.

2.1. Why Open Data?

Open Data is private or public digital data. It is produced by collective bodies or (possibly outsourced) public services. It is disseminated in a structured manner, according to a given method, and under an open license that guarantees free access to it, as well as the possibility for anyone to reuse it without technical, legal or financial restrictions. Open Data comprises a number of sources and types of data:

  • – public data or information coming from the public sector. This includes all data collected by public bodies at all levels;
  • – data from scientific research, in particular, from publicly funded research;
  • – private sector data, which can be made public with the right incentives and privacy protections.

Open Data therefore means making this data available for access, exploitation and reuse by any interested actor (companies, scientists, etc.).

The philosophy behind this movement is, at its foundations, truly centered on the citizen. In this sense, free access to data contributes to an enhancement of democratic institutions from a citizen’s point of view. This movement must help to enrich democratic debate, stimulate public life, and contribute to renewing public services. Open Data is a political process whose message is built around transparency in innovation and in the development of public action.

It is worth noting that Open Data has been used since the 1970s, mostly through statistical data processing and modeling that made data freely available and that made it possible to communicate and transfer it [BUC 98]. It is possible to summarize the history of Open Data development in the following timeline.

After several years, France’s Open Data and data sharing strategy materialized in 2011 in the online platform https://www.data.gouv.fr. On 27 January 2011, Paris became one of the first cities to take up the question of Open Data by launching “Paris Data”, an online platform for disseminating public data and maps. Rennes has also played an important role by being the first to genuinely treat Open Data as a vector for public service improvement: developing public data, uses and services in order to improve civil society through transparency in public data.

Figure 2.1. Open Data: history

Figure 2.2. Open Data platform growth in France.

Source: [COU 14]

Open Data also means the public sector gives up its role as the gatekeeper of data and replaces it instead with the role of a data and information provider, which leads to a realignment of power dynamics among the private and public sectors. This opening will make it possible to establish relationships between cities, companies and citizens around data by encouraging transparency, the development of new applications, and individual participation in data enrichment.

The definition of Open Data involves the following, more specific, elements:

  • availability and access: data must be available as a whole, and it must be accessible in a convenient and modifiable form;
  • reuse and redistribution: data must be made available in ways that make its reuse and redistribution possible, including the possibility of combining it with other sets of data;
  • universal participation: everyone must be able to use, reuse, and disseminate data.

Data compared, analyzed and understood also goes through an incremental cycle that stretches from creation to use [MER 07]. This means that the value of data is not inherent to it, but rather that it is a product of its aggregation, cross-referencing, analysis, and reuse. The Internet and smart objects have therefore constituted a data “universe” that plays an important role in the value creation process. The importance of this role has developed around the fluidity of online exchanges. Data often has more value as an exchange and influence mechanism than it does isolated in company data storage1.

The emergence of (public or private) open data reuse, which has been encouraged by new applications and uses, has therefore revealed a value chain based on this data. As the raw data is made available for re-exploitation, it becomes possible to create new services. Companies like Amazon and Netflix have, as previously mentioned, taken advantage of the great number of users on their sites by developing consumer preference models, which in turn allow them to make highly personalized purchase suggestions based on their clients’ tastes.

Open Data coming from both public and private sources implies greater and better quality access to data. Quality criteria can be thought of in terms of comprehensiveness, coherence, and precision. Additionally, we may consider temporal and territorial variables, as well as that of interoperability through open formats. These principles are based on important values: accessibility, autonomy, sharing and freedom [MER 14].

Tim Berners-Lee observes on the subject of Open Data that “If we share data online – public data, scientific data, citizens’ data, whatever – then other people will be able to develop marvelous creations from that data that we could never even have imagined”. In 2010, the same author provided a set of criteria to grade data quality on a scale from zero to five stars.

We can say from this table that Open Data must have three characteristics:

  • technical: raw data must be exploitable in an automatic manner, and made available in open formats as much as possible;
  • legal: licenses must clarify the rights and obligations of the owners and of people who reuse data. They must be as open as possible;
  • economic: few or no royalties (since they may constitute obstacles for reuse), marginal cost pricing.

Table 2.1. Open data in five stages.

Source: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example

Stage | Description | Benefits
★ | Making data available online in any format without licensing restrictions. | Users can see, print, and store the data, as well as manually enter it into a system.
★★ | Publishing the data in a structured format (e.g. an Excel file instead of a scanned image). | Data can be processed automatically, visualized, and transformed into other formats.
★★★ | Using an open, non-proprietary format (e.g. CSV instead of Excel). | Data can be manipulated independently of its format and of any given software.
★★★★ | Using URIs to identify elements so that they can be linked to. | Data can be linked, marked, and reused.
★★★★★ | Linking data to others’ data in order to contextualize it. | Data patterns can be automatically identified and it is possible to dynamically uncover complementary data related to the original data.
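The cumulative logic of Table 2.1 can be sketched in a few lines of Python. This is only an illustration, not part of the original scheme: the `Dataset` class, its field names and the `star_rating` function are hypothetical, and the sketch assumes that a stage only counts when every earlier stage has also been reached.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative names)."""
    online_open_license: bool  # stage 1: online, without licensing restrictions
    structured: bool           # stage 2: structured, machine-readable data
    open_format: bool          # stage 3: non-proprietary format such as CSV
    uses_uris: bool            # stage 4: elements identified by URIs
    linked_to_others: bool     # stage 5: linked to other people's data


def star_rating(ds: Dataset) -> int:
    """Return 0 to 5 stars; counting stops at the first stage not reached."""
    stages = [
        ds.online_open_license,
        ds.structured,
        ds.open_format,
        ds.uses_uris,
        ds.linked_to_others,
    ]
    stars = 0
    for reached in stages:
        if not reached:
            break
        stars += 1
    return stars


# Two invented examples: a scanned document and a CSV file, both openly licensed.
scanned_document = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
print(star_rating(scanned_document), star_rating(csv_file))
```

Under these assumptions, the scanned document stays at one star, while the CSV file reaches three.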

Data is gradually becoming one of the structuring elements of today’s digital world. In order for data to generate knowledge, it must be available for all to use online. It must be available to all those users who can combine and mix it in order to create value. In order to benefit fully from Open Data, it is necessary to place data within a context that creates new knowledge and offers powerful services and applications.

2.2. A universe of open and reusable data

Today, within the context of public sector transparency and modernization efforts, Open Data, alongside Big Data, is quickly evolving. The more open data becomes, the more it can be used, reused, reoriented, and combined with other data in the interest of all. Insofar as our economy and our society are structured around knowledge, data is a key asset for value creation, as well as an element that favors and promotes innovation, growth and development.

Transparency, participation and collaboration are the main challenges for the integration of different economic actors within the paradigm of Open Data. The Open Data reuse process involves two types of actors, producers and reusers, drawn from the following groups:

  • Public actors: regional and local governments, as well as administrative public bodies, produce and receive a considerable amount of data.
  • Companies: because of their nature, companies use data in a variety of internal processes aimed at achieving their goals and strategies. Companies thereby produce data (in reports, for example) that they can relate to other data coming from their external environment (competitors, clients, providers, web, etc.).
  • Scientific research: scientists are by nature data collectors and producers. The scientific world is increasingly governed by the imperative to publish: scientists’ recognition is tied to the results they publish and make available to be evaluated by the scientific community.
  • Individuals: each individual can collect data and enrich it with smart tools (smartphones, tablets, GPS, etc.).

Data reuse is, of course, geared toward the most established economic actors, especially large companies in the IT sector, which can use public data to improve their production processes. Since it holds considerable innovative potential, public data is also an especially important ingredient in the development of young innovative startups in the digital economy.

Example 2.1. Data journalism

If companies whose goal is to exploit data made available by the government begin to emerge, companies, scientists and even individuals might follow suit. For the time being, few companies have themselves embraced the Open Data movement by making their own data freely available. Nevertheless, they produce data in their daily activities, and they often consume external data which allows them to analyze the economy or interact with their partners. For them, the question of Open Data depends more on commercial or innovation opportunities than on transparency concerns, even though transparency would also allow them to improve their image.

However, Open Data goes beyond the scope of public and scientific data. A recent study by the McKinsey Global Institute suggested that opening up corporate and government data would have the potential to unlock approximately $3.2 trillion in added value per year in seven different sectors of the world economy. With the high-speed development of mobile Internet, technology now makes it possible to process large amounts of data.

Open Data can generate several advantages for companies, including:

  • – accelerating product development and stimulating innovation by creating a dynamic ecosystem of partnerships while improving client satisfaction;
  • – better understanding market trends.

For example, in the health sector, France’s National Health Insurance Fund (CNAM) possesses a very large stock of data consisting of millions of its patients’ health forms, the information of which is stored by CNAM. This data is not yet available. CNAM has resisted requests for it to open up its data by appealing to medical privacy. But, in fact, medical privacy can easily be ensured by encoding patient identities, which is a technology that already exists. Doctors’ anonymity can also be guaranteed, even though it could be useful to expose doctors with abusive practices, not so much to punish them, but rather to drive them to modify their behavior, which could very well prove crucial in balancing France’s social security budget.
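One way to read “encoding patient identities” is pseudonymization: each identifier is replaced by a keyed hash before the data is released. The sketch below assumes this reading; it is not a description of how CNAM stores or protects its data. The record fields, the sample values and the key are invented for illustration, and only Python’s standard `hmac` and `hashlib` modules are used.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with its HMAC-SHA256 keyed hash (hex string)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identities(records, secret_key):
    """Return copies of the records with 'patient_id' replaced by a pseudonym."""
    return [
        {**record, "patient_id": pseudonymize(record["patient_id"], secret_key)}
        for record in records
    ]


# Hypothetical records: field names and values are invented for illustration.
records = [
    {"patient_id": "A-001", "act": "consultation"},
    {"patient_id": "A-001", "act": "radiology"},
    {"patient_id": "B-002", "act": "consultation"},
]
key = b"kept-secret-by-the-data-owner"  # assumption: never published with the data
released = strip_identities(records, key)
```

The same patient always receives the same pseudonym, so reusers can still follow a care pathway across records, while the original identifier cannot be recomputed without the key. The remaining fields may still need to be generalized before release, since a rare combination of attributes can sometimes identify a person.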

If data is made available to honest users [MAT 14], which is to say excluding ill-intentioned “insurance companies”, and more so “unscrupulous journalists”, health policies could be greatly improved and social security budgets much better managed.

However, the data “mine” is also fed increasingly by another very specific category of data, which is often collected without our knowledge by a variety of organizations: personal data. Our marital status, age, income bracket, tastes, frequent or sporadic purchases, the reason for those purchases – these constitute a growing part of Big Data, but a largely hidden one, which allows great profit for a select few users.

Big Data comes mainly from:

  • – the Web: newspapers, social networks, e-commerce, indexing, document, photo and video storage, linked data, etc.;
  • – the Internet and connected objects: sensor networks, call logs;
  • – science: genomics, astronomy, sub-atomic physics;
  • – commercial data (e.g. transaction histories in a supermarket chain);
  • – personal data (e.g. medical files);
  • – public (open) data2.

Information management has become a crucial factor for companies’ success. Efficiently processing (public and private) data in order to transform it into information is the key. Using a combination of methods and technologies, companies can attain and benefit from a competitive advantage.

It is true that information (which is originally just data) becomes a strategic asset and a form of power for those who possess it. However, it must be understood as a performance element within companies, since its value is generated through sharing and exchange. Open Data tends to move away from this ideal of power based on possession and to draw closer to an ideal based on sharing.

The more open data becomes, the more it can be used, reused, reoriented and combined with other data in the interest of all. Different business models can be imagined for best exploiting Open Data. Since the exploitable data generates millions of dollars according to several studies, Open Data has become a planetary movement. It is a movement that seeks open access and exploitation of data for all and by all.

Sharing and opening up data means making essential data available online so that it can enhance many decision-makers’ analysis, save time or enable more informed decision-making in all sorts of sectors. Open Data should be understood as an opportunity and as a long term process for the different actors involved (administrators, companies, scientists, citizens).

Open Data represents a major source of information for all actors, which is why it should be considered a tool for rationalizing both company and public action. Open Data is also a way of renewing public action implementation methods, and of developing links between (and with) governments, companies and citizens, since transparency is not the only goal of Open Data.

2.3. Open Data and the Big Data universe

The Web is the largest and most dynamic reference point for data in the world. It is the ideal resource to best exploit data and transform it into information. Data on the Web is very diverse and has a huge volume (content and format). The most important asset of large volumes of data has to do with the fact that they make it possible to apply knowledge and create considerable value. Combined with advanced analysis methods, they can provide new explanations for several phenomena.

There are two ways to transform data into a valuable contribution to a company:

  • – transforming data into information is one of the stages of data value production, which is exploited in order to obtain useful information and to successfully carry out company strategies. This automatically involves database information in company decision-making processes;
  • – transforming data into products or processes adds value to companies. This is produced when data analysis must be implemented in the physical world.

The data revolution is interesting because it enables the development of competitive advantage. Big Data stands out today as a genuine ecosystem since it spans all sectors. Large companies and startups must collaborate. Interactions between experts, scientists, startups and large companies will promote the development of new energy leading to new knowledge and new projects.

The government, therefore, has a crucial role to play by making public data available and modifying regulations so as to facilitate data access and use in order to create value. Nevertheless, when governments make raw data openly available, they must ensure that everyone can understand it.

Easy access to data for all parties offers several advantages, both for those who own data and those who use it, including:

  • – rediscovering other uses and applications (new data development) thanks to the use, reuse, linking, cross-referencing and combination of data with other forms (of data) coming from different sources. These new uses can be derived well after the data has been collected and used;
  • – promoting innovation by ensuring ease of access to data, since if it is difficult to access the data, it is impossible to evaluate its full potential or to exploit and re-exploit it;
  • – gaining feedback through Open Data as a means to complement internal analyses with ideas coming from external sources;
  • – increasing transparency, which reinforces public control over government actions, increases the reliability of scientific results, and results in newfound trust in companies’ credibility;
  • – benefiting from the effects of web-networks by combining several elements, since the more data becomes open, the more the whole data ecosystem gains value.

The main principle of the Open Data philosophy is free access to data, which will allow its use and reuse. Reusers will cross-reference it in order to provide new information that better responds to companies’ expectations. Open Data will make it possible to:

  • – create new products or services;
  • – gain access to aggregated and updated information;
  • – enhance companies’ images;
  • – analyze and make sense out of data;
  • – commercially exploit and develop the new information drawn from this data;
  • – obtain financial returns.

The biggest question for a company is no longer deciding whether it should launch new products, but rather taking advantage of available (structured or unstructured) data and knowing how to adapt different key success factors to its environment. These principles are not limited to internal data (client data, etc.). Instead, they also include all external data surrounding the company (cities, universities, etc.) that can allow it to increase the value created by a given piece of data.

The possibilities of the data revolution are endless: optimizing production processes, fine-tuning client knowledge, improving reputation, rationalizing supply costs, promoting research, etc. It is therefore necessary to first identify innovation pools and new economic models in order to seek improved economic performance and attract value.

The objective of an Open Data policy is to encourage creativity, stimulate innovation, and promote data reuse of both private and public data by relying on collective intelligence, as well as on scientists’ and companies’ will to create new knowledge capable of generating information.

The nature of data allows it to be used, reused, linked and recombined in several different ways without diminishing its value. Open Data supports the emergence and success of great data potential and its main effect resides in the variety of its sources. Although the volume of data increases as do processing and exploitation speeds, different sets of Open Data compete to find new ways of developing data, which come to complement existing ones.

Example 2.2. Open Data and governance

In 2011, France emulated the American platform data.gov, which was launched in 2009, to create its own public data dissemination system, managed by Etalab, with the aim of accelerating the movement. Etalab’s mission is to support public data opening and sharing, in order to facilitate reuse by companies and citizens. Available data already possesses great economic value for innovative entrepreneurs. However, governments are not the only organizations to go into Open Data.

In 2012 and 2013, Etalab organized a series of innovative project and service creation contests, aimed at encouraging public data reuse. The initiative was known as “Dataconnexions”, and it sought to recognize the best data applications, services, and interactive visualizations reusing public data: six startups received awards. Etalab also helps to shine a light on the best data reuses, especially by promoting them within government.

In France, public Open Data and data sharing are free services, free to be reused and available in open formats. This is already a resource used by hundreds of startups that develop new added value services. Even if public support for innovation in Big Data is present in France thanks to the opening of public data and financial incentives for startups and SMEs, few data development strategies are implemented.

The volume of data is very large: it is necessary to gather it, store it, index it to make it accessible, and publish it for others to access. Companies manage data deposits, and free access to that data gives them the capacity to fine-tune it, interpret it, find trends in it, and identify its characteristics. Data exploration is, therefore, a key stake for companies: knowing how to process data to obtain better performance.

The data revolution will necessarily go through Big and Open Data. It is, therefore, essential to take an interest in these subjects and in their dissemination. The two concepts can transform the business world, the government sphere and civil society. Big Data gives us the power to understand, analyze and ultimately change the world in which we live. Open Data hopes for that power to be shared and for the world in which we live to become more democratic.

Big Data has to do with methods of gathering and processing a very large volume of data in real-time; it holds enormous wealth. Open Data represents a mode of access to information. Open Data is also a basic trend in data access: unlike most digital information, it is made available to the public or professionals free of cost. Open public data makes it possible to respond to an economic, scientific, environmental or social demand for innovation.

In this regard, the foregoing reflections surrounding Open Data account for two needs on the part of data reusers:

  • – more raw data updated in real-time, and, of course, processed;
  • – contextualizing documents that make it possible to understand how and why a given set of data has been constructed.

Open Data represents a major source of information for all actors, which is why it should be considered a tool for rationalizing companies and public actions. It is also a way of renewing public action implementation methods, and of developing links between (and with) governments, companies and citizens, since transparency is not the only goal of Open Data.

Big Data and Open Data are closely related, but they are not identical to one another. Open Data provides a perspective that can make Big Data more and more democratic. Big Data is defined by its size. It is a term used to describe very large sets of data. But these are subjective judgments which depend on technology: the volumes of data available today might not seem quite as large in a few years, when data and IT analysis evolve.

Creating value at the different stages of the data value chain will be at the core of the coming together of Big Data and Open Data. Big Data and Open Data represent information deposits that remain underexploited.

2.4. Data development and reuse

The Web is entering a new phase of its existence, and one of the properties that distinguishes it is the quantitative and qualitative jump of the data available in it. Sources of data become more diversified: government agencies, companies and individuals. In the near future, objects will publish, share and circulate more and more data online. For the last few years, several voices have spoken out in favor of a freer flow of data.

Speed and ease of access to data are, therefore, crucial in a world where the quality of available information increases constantly. We can distinguish three main categories of online data according to the source of the data. We propose the following definitions:

  • – raw data produced by public entities (demographic statistics, etc.): it is data coming from government agencies, local governments, and national statistical institutes;
  • – raw data produced by private or public companies (catalogues, directories, reports, etc.): it can be sold or made available free of charge (e.g. Amazon’s book database);
  • – raw data produced by individuals (age, comments, etc.): it is personal data as such and is protected by privacy laws and regulations. However, some data produced by users belongs to the service where it is hosted, as is stated in the terms and conditions.

The relationship between different types of (statistical, scientific, administrative, geographic, and Web) data makes it possible to create ecosystems by integrating large volumes of data coming from a variety of different datasets. Open data reuse plays an important role in terms of its capacity to add value.

With the digital revolution and the advent of the Internet, which improve massive data production and processing, public Open Data and data sharing become a powerful tool for:

  • – reinforcing trust among individuals due to greater transparency of public sector activities;
  • – allowing for new forms of coproduction with civil society and supporting social innovation (like the Handimap project, which used data from the cities of Rennes and Montpellier to develop an application for calculating routes for physically disabled persons);
  • – improving administrative operations (as is evidenced by the very strong use of public data by the government itself);
  • – improving the efficiency of public action by developing new modes of organization and new work processes (like automobile accident follow-up by highway safety services, which makes it possible to improve roadworks and construction);
  • – supporting economic dynamism by creating new resources for innovation and growth.

The process of information understanding is often considered to be an ascending progression going from data, to information, to knowledge, and finally to wisdom. In the same way, Open Data is gaining importance and becoming more functional as it is gathered, structured and disseminated. Data assembled in a relevant manner thus reconstructs a world of information and structures a world of digital data.

Opening up data does not only involve improvement in responsibility and trust, but also, in the case of entrepreneurs, in innovation through application development, most often in the form of “mash-ups”, or digital products created from already existing elements. After identifying, gathering and classifying data, a processing phase ensues in which cleaned data ready to be used for analytic purposes is produced. This processing phase is important because it makes it possible to classify the data into datasets, which provides the maximum amount of information.
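The processing phase and the idea of a “mash-up” can be illustrated with a minimal sketch. It assumes two small open datasets published as CSV and keyed by city name; the column names and the figures are invented placeholders, not real statistics, and the `clean_rows` and `mash_up` helpers are hypothetical.

```python
import csv
import io

# Invented sample data standing in for two open datasets (not real figures).
POPULATION_CSV = """city,population
 Rennes ,1000
paris,5000
Lyon,
"""
STATIONS_CSV = """city,stations
Rennes,10
Paris,100
"""


def clean_rows(text, value_field):
    """Parse CSV text, normalize city names, and drop rows with a missing value."""
    cleaned = {}
    for row in csv.DictReader(io.StringIO(text)):
        city = row["city"].strip().title()
        value = (row[value_field] or "").strip()
        if city and value:
            cleaned[city] = int(value)
    return cleaned


def mash_up(population, stations):
    """Join the two cleaned datasets on city and derive a new indicator."""
    return {
        city: population[city] / stations[city]
        for city in population.keys() & stations.keys()
    }


population = clean_rows(POPULATION_CSV, "population")
stations = clean_rows(STATIONS_CSV, "stations")
inhabitants_per_station = mash_up(population, stations)
```

The cleaning step harmonizes the keys and discards the incomplete row, and the join then produces an indicator (inhabitants per station) that neither source dataset contains on its own.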

Available data growth in terms of quality and value is the modern response to the always growing need for information, both by individuals and organizations. More and more, governments throughout the world define and implement Open Data strategies, with the aim of capitalizing on the phenomenon’s three pillars: transparency, participation and collaboration. Open Data advocates hope that this type of change will reinforce democracy and improve the impact of government activities through increased transparency, participation and collaboration.

They thus consider that this will allow for greater efficiency through information infrastructure allowing better reuse of data. They are also motivated by Open Data’s potential to produce new innovations through its use. The Open Data culture is founded on the availability of data and is oriented towards communication: this makes it possible to generate knowledge through transformation effects in which data is provided or made available for innovative applications.

France is one of the few countries in the world to have an organization like the French National Institute for Statistics and Economic Research (Institut national de la statistique et des études économiques, INSEE), which produces a vast amount of data. We therefore have great potential in terms of activities that create added value through data processing. The advantage is, in fact, both private and public. States can benefit from allowing the fruits of these new activities to flourish and expand: not only do they create value directly, they also provide services for society at large.

The current debate on Open Data prompts public entities and companies to reflect on whether to make their data available for the purpose of generating business or image benefits (transparency, information, communication, etc.). In the context of Open Data, data can circulate freely to favor innovation and research, or simply to achieve productivity gains.

2.5. Conclusion

Indeed, with the progress of the digital revolution, Open Data presents a common reference point, favoring the economy and society, but also research and knowledge production. The Open Data movement seeks to promote knowledge and innovation through information sharing and cross-sector collaboration. This movement is made possible by information and communication technologies that enable exchanges of an almost unlimited amount of data (Big Data). By opening its data to the world, an organization allows for it to be reused for different purposes. The results of this secondary reuse can, in turn, be shared with the community, which creates a multiplying effect.

The phenomenon of Open Data, which opens up digital government data to the public, developed very quickly in the US before taking hold in Germany, France and several other countries. For Tim Berners-Lee, the phenomenon of Open Data should determine the future of Web developments. Moreover, with the development of all kinds of data, the evolution of data quantities should generate high added value, both in terms of usage and in terms of decision-making and development enhancement, thanks to improved synergy in actors’ actions.

Furthermore, one of the main advantages associated with Open Data is that it promotes the development of a culture centered on information sharing and on cross-sector collaboration. As a cross-sector principle, Open Data can generate benefits for the economic, cultural and social spheres. The possible benefits that stem from such opening of data can therefore be exponential.
