Chapter 7

Strategies for Analyzing Chinese Information Sources from a Competitive Intelligence Perspective1

7.1. Introduction

Mastery of information has become an essential issue for organizations, whether state departments or private structures, in the field of politics, of business, or of scientific research. In recent years, the scientific research sector has experienced increasing pressure with mounting stakes in political, ideological, and commercial terms, and presents major strategic issues. Scientific research has accelerated and flourished with the development of awareness of a real planetary emergency in certain sectors, particularly those of renewable energy and environmental protection, but also in other fields important for human survival, such as medicine, agriculture, and water treatment. Technical and high-tech sectors are also concerned by new patents that create strategic monopolies. Given that intellectual property rights lock down all scientific discoveries and depend on the fact of being first to register a patent, we are now dealing with a “research race” on an international scale. This is as true in industrialized countries as for major emerging powers, such as India and China. We can no longer afford to ignore these new players, who are gaining rapidly in importance in all domains.

In this context of heightened competition, with alliances between businesses and government and genuine engagement in scientific research, there is a clear need for high-performance tools to harness useful information in record time. It is therefore essential to find tools that combine performance, speed, and multilingual capabilities in order to carry out very fine-grained watch activities. Faced with a growing mass of information of varying nature and in the context of an unstable economy, decision makers need increasingly high-performance tools to assist them in comprehending their environment.

These tools must have an extremely fine capacity for exploration, on the one hand to obtain the most relevant information for the decision maker and, on the other hand, to assist decision makers in refining their own analysis of the environment, corresponding as closely as possible to the user’s own logic and to their domain of study.

This chapter focuses on Chinese scientific information, as this is more or less unknown territory for our organizations. Chinese researchers, on the other hand, who generally have a strong grasp of English, have access through their laboratories to worldwide scientific databases. They are, therefore, able to access the totality of scientific information available from around the world, whereas Western researchers have no means of accessing Chinese scientific information (unless Chinese researchers publish in English). Scientific and technical information is essential for any researcher wishing to carry out a relevant scientific watch; it is also a crucial component of the strategic information needed to carry out a competitive intelligence process, whether this be done by decision makers in the private sector or at government level, mostly in the context of industrial applications of research. Competitive intelligence and the practices involved may be defined as an analysis of the status quo with the aim of anticipating possible developments, allowing correct orientation of future actions [MAR 94]. Mastery of this type of information is, therefore, an absolute necessity in all economic domains linked to the rapidly developing technical sector: pharmacology, genetic engineering, chemistry, energy, physics, aeronautics, nuclear technology, and so on.

In the context of this work, we aim to demonstrate what may be achieved in terms of automatic processing of Chinese scientific information, using largely validated information processing and analysis tools generated by the French research sector. Our main task was to adapt these tools to the Chinese information environment. After a discussion of the indispensable nature of Chinese scientific information, our analytical strategy will show how Chinese strategy is developing at international level (section 7.2). We will then consider means of harnessing Chinese information (section 7.3) and create a first level of value-added strategic information (section 7.4).

7.2. Chinese scientific information as an essential source of information

Although Deng Xiaoping launched a new era of openness between China and the rest of the world at the beginning of the 1980s, internationalization and the economic boom truly occurred only in the mid-1990s, institutionalized by the establishment of the “socialist market economy”. The 2000s marked a new turning point; in 2001, China joined the World Trade Organization and, by this act, entered into worldwide competition. Increasing numbers of foreign companies have created bases in China and the entire country was connected to the Internet in the space of a few years. Although the first decade of the new millennium was marked by new issues for the Chinese government, its strategy remains essentially the same: reconquer the position of leader in Asia and gain recognition as a major power on the world stage.

Transfers of knowledge and technology are therefore crucial. Nonetheless, China, following its strategy of economic domination, cannot focus exclusively on this method of acquiring scientific knowledge, and must develop its own power in terms of scientific innovation. On January 9, 2009, a national prize giving ceremony was held in Beijing to honor achievements in science and technology in the presence of the President, Hu Jintao, the Prime Minister, Wen Jiabao, and a number of other political and scientific heavyweights. The Prime Minister declared that it is the strength of research that will decide the destiny of the country, exhorting scientists to place themselves at the service of businesses and to themselves invest in the development and commercialization of new products.

According to the latest Biennial Report of the Observatory of Science and Technology (OST)1 [OST 08], internal expenditure on R&D in China more than doubled between 2000 and 2005. In 2006, China reached the number two position worldwide in terms of R&D expenditure, with a cash injection of $136 billion [OCD 06], an increase of more than 20% in one year. R&D expenditure in 2008 was around $101 billion2 [OST 08]. In 2005, China became the preferred destination for Western industrialists hoping to globalize their R&D [UNE 05]. The Chinese government, aiming to increase the attractiveness of the country in terms of R&D, created special zones by theme. Furthermore, the number of researchers in China increased by 77% between 1995 and 2004; today, China is second only to the USA in terms of numbers of researchers, with 926,000 individuals involved in this activity [OCD 06]. The journal Nature considers [BUT 08] that scientific development in China is progressing even more rapidly than economic development (Figure 7.1).

Figure 7.1. Part played by China in worldwide research and development (source: Les Echos, July 6, 2007)

image

The emergence of Chinese authors in the international scientific press has been gradual, beginning in the early 2000s, but there has since been a major explosion in English language scientific publications by Chinese authors in international journals with a high impact level, particularly in the first half of 2007. In 2004, Chinese researchers contributed to the production of 6.5% of all worldwide scientific publications, and attained second place in terms of publications linked to nanotechnology research [VIL 08]. In 2005, China reached number five in world rankings in terms of the number of international scientific publications [ZHO 09]. The 1.1 million Chinese researchers (out of a total of six million worldwide) increased the number of their scientific publications by 96% between 2001 and 2006, putting their country in third place worldwide with 7% of all publications recorded by the Science Citation Index (SCI) (in all disciplines) [OST 08]. The number of publications of which the first author is based in China has also increased dramatically. In 2007 alone, the journal Science published around 30 articles by Chinese authors or co-authors. China is also second on the world stage in terms of presentations at major conferences, with 10.1% of the total according to ISPT rankings3 [IST 08]. Chinese authors, therefore, have a major presence, and the quality of their work is on a par with its quantity. The number of patents and journal articles produced by Chinese scientists and engineers is increasing rapidly.

The medium- and long-term national program for the development of science and technology (2006–2020), launched on February 9, 2006, sets out a 15-year strategy for scientific and technological development in China. The declared aim of this plan is that science and technology and their direct applications should contribute at least 60% to the development of the Chinese economy. Over the same time period, dependence on foreign technologies should be reduced by at least 30%. The number of patents and academic scientific publications should place China in one of the top five positions worldwide. Through this plan, the Chinese government has encouraged big businesses to set up R&D centers, either alone or in collaboration with public bodies. Analysis of the dichotomy between private and public research in China should not follow the same lines as in Western countries. In reality, everything in China is considered to be public; in other words, everything belongs to the Party. A Chinese company is never really private. Even if it is managed in the same way as a private company, links with the state remain very strong. The circulation of knowledge and decision-making networks have evolved very little, as the Communist Party is ever-present and all-powerful not only in the political sphere but also in economic, scientific, and other domains. Moreover, a brief look at the CVs of the heads of Chinese companies reveals a great deal. The larger a business is, or the more sensitive the sector in which it operates, the more likely its directors are to hold important strategic positions within the Party. An individual may easily be both a member of the National People’s Congress and the director of a company. Knowledge, patents, and technologies, therefore, belong “to the Chinese people”.

Furthermore, plans have been made to overhaul the current system of scientific and technical management by bringing together the military and the civilian research organizations, with the aim of creating common research programs and producing commercial applications to cover research expenditure. All major axes of technical research are included in the 15-year plan, but certain domains are given priority: energy research, use of biological resources, and the development of space and laser technologies. Thus, four major research projects have been defined for the next 15 years with the aim of increasing China’s international competitiveness through major breakthroughs in science and technology: protein studies, quantum control, nanotechnology research, and genetics. In terms of budget, the program aims to double contributions to R&D as a proportion of China’s gross domestic product (GDP), which is itself increasing at a rate of 9–10% per annum; investment may, therefore, potentially triple over 15 years. New technologies thus constitute a major strategic axis of Chinese policy.

As major strategic domains have been clearly defined by the government, the direction to be taken by the country is already fixed and, from here, attention is focused on the concrete application of these aims. New banking and fiscal policies aim to support innovative start-ups alongside R&D departments in existing companies.

Furthermore, one of the main concerns of this plan is the protection of intellectual property and the registration of patents, which form part of a strategic plan at national level. The Chinese government considers scientific advances as part of a national integrated system of innovation. It is therefore interesting to look at the position of Chinese actors in terms of numbers of patents registered worldwide. China has also accelerated the internationalization of its research system, thanks to an increased visibility in international journals and conferences, and to a vast network of international cooperation, although — as we see in Figure 7.2 — the number of triadic patents remains low.

Figure 7.2. Profile of China in science and innovation (source: OECD 2008)

image

To complete this presentation of the growing scientific power of China and demonstrate how our method for the analysis of Chinese scientific information is connected to a wider approach to competitive intelligence, we will now look at the sector of agriculture and biotechnologies in China. This sector includes the question of hybrid wheat, which will be the focus of our demonstration.

A plan has already been set out according to which research domains in biotechnology will count for 5% of Chinese GDP in 2020 (a total of around $250 billion) [QUO 07]. The main applications of this domain will be in the fields of pharmaceuticals, genetics, protein engineering, human tissue engineering, and new-generation industrial biotechnologies. The Chinese government wishes to encourage research in this domain and hopes, in particular, to accelerate the implementation of transgenic technologies that it considers to be crucial for sustainable agricultural development and for increased competitiveness in agriculture. “The stated objective is to obtain a stock of genes with high added value, while avoiding, as far as possible, paying intellectual property rights. The Prime Minister, Wen Jiabao, highlighted the urgency and strategic importance of this technological program and asked those responsible to move as quickly as possible. An additional four billion Yuan (just under $570 million) has been assigned to research on genetically modified rice and wheat (part of which will go to fulfilling security conditions linked to these crops)” [VIL 08]. At the present moment, around 150 agricultural biotechnology laboratories at national and local level are spread among over 50 research institutes and universities across the country. Over 100 laboratories across the country are working on the genetic sequencing of plants, animals, and humans and more than 50% of third-world investment in plant biotechnologies is directed toward China. In 1999, Chinese expenditure on plant biotechnologies was around $112 million. In 2000, this reached $120 million and, in 2001, $360 million. Of the tests carried out in agricultural biotechnology, 90% concern resistance to insects and to disease.

All these figures serve to show that China has entered into a phase of scientific research clearly oriented toward aims of economic domination, part of a strategy to gain political power on the international stage.

7.3. A global vision of the sector through patent analysis

To look at the position of China in a given domain at international level and, therefore, to measure the country’s capacity to gain a scientific and economic monopoly of a technology, we must begin by studying patents registered in the domain worldwide. On an international level, only three patent offices are important: those of the USA, Europe, and Japan. The OST report shows that the increase in Chinese patent registrations was +124% for the European Patent Office, and +261% for the American office.

As patents are an important indicator of the technological dynamism of a country, we must begin any analysis of the position of China in a given sector by studying patents concerning that sector. To do this, we use Matheo Patent, an automatic patent analysis tool developed by the Université Aix-Marseille. Matheo Patent is directly connected to Espacenet, the European Office’s online database of global patents.

To obtain all patents relating to the hybridization of wheat, we chose to search for keywords in titles and abstracts rather than using the IPC code4, having noticed that the patents that interested us fell into various different IPC categories. Our search equation, based on recommendations from experts in the domain, was as follows: [(HYBRID AND WHEAT) AND MALE AND STERILE].
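The logic of this equation can be illustrated with a minimal sketch in Python. It assumes hypothetical patent records with `title` and `abstract` fields and whole-word matching; it is not the query syntax or interface of Matheo Patent or Espacenet, and the two sample records are invented for illustration.

```python
import re

# Keywords of the equation [(HYBRID AND WHEAT) AND MALE AND STERILE].
REQUIRED_TERMS = ("hybrid", "wheat", "male", "sterile")


def matches_query(record, terms=REQUIRED_TERMS):
    """Return True if every term occurs as a whole word in the title or abstract."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    words = set(re.findall(r"[a-z]+", text))
    return all(term in words for term in terms)


# Hypothetical records, for illustration only.
patents = [
    {"title": "Method for producing hybrid wheat",
     "abstract": "A male sterile line is crossed with a restorer line."},
    {"title": "Wheat harvesting machine", "abstract": "A cutting assembly."},
]
selected = [p for p in patents if matches_query(p)]
print(len(selected), "record(s) retained")
```

Because the sketch matches whole words only, a record mentioning "sterility" but not "sterile" would be rejected; a real search tool may handle truncation differently.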

Matheo Patent, after detecting the corpus of patents corresponding to this request, downloads all linked bibliographical entries, abstracts, claims, and even pages of drawings if these are included in the patent. 79 patents were found for the period January 2000–October 2008. We were then able to establish a preliminary map of author locations.

It is interesting to note that the two countries with most entries were the USA and China (with several organizations registering patents: Hebei Normal University, Crops Institute of Sichuan Provincial Academy, and the Hunan Agricultural College). The different countries registering patents were placed into groups to obtain more information on national orientations in terms of research strategies and actors (Figure 7.3).

Figure 7.3. Countries classed by number of patents registered

image

China thus takes a very clear lead in terms of the number of patents held in this domain. Although Chinese patents do not always follow the same information structure in terms of format (Chinese patents are often shorter than global, European, or American patents), their large number shows that Chinese researchers are extremely active in this domain.

We now move on to a deeper analysis of these different groups to study levels of cooperation between different actors in the domain. This information is important in the sphere of scientific research. Cooperating leading groups share research domains, and thus information, technologies, and capabilities which increase scientific efficiency. To obtain this information, we will look at the network links between different groups (Figure 7.4).

Figure 7.4. Cooperation between countries

image
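As an indication of how such cooperation links can be derived, the following sketch counts, for each pair of countries, the patents registered jointly. The input format (one list of applicant country codes per patent) and the sample data are assumptions made for illustration; they are not the corpus analyzed here, nor the internal method of Matheo Patent.

```python
from collections import Counter
from itertools import combinations


def cooperation_links(patents):
    """Count, for each pair of countries, the patents they registered jointly."""
    links = Counter()
    for applicant_countries in patents:
        # A patent contributes one link per unordered pair of distinct countries.
        for pair in combinations(sorted(set(applicant_countries)), 2):
            links[pair] += 1
    return links


# Hypothetical input: one list of applicant country codes per patent.
patents = [["US", "CA"], ["US", "IL"], ["CN"], ["CN"], ["US", "TH"], ["CN"]]
links = cooperation_links(patents)

# Countries that appear in the corpus but in no joint registration.
all_countries = {c for p in patents for c in p}
linked_countries = {c for pair in links for c in pair}
isolated = all_countries - linked_countries
print(dict(links), isolated)
```

A country with many patents but no entry in `links` appears as an isolated node in the resulting network.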

We note that in spite of a high number of patents, China has not developed cooperative links in terms of patent registration (Figure 7.4), confirming the direction of Chinese government strategy as described above. In comparison, the USA has cooperative links with three other countries — Canada, Israel, and Thailand. This may also indicate that reflection has already taken place concerning a research strategy and that a choice in terms of research directions has already been made. To deepen our analysis, we may establish a diagram of IPCs by country, allowing us to visualize different research techniques used by different countries (Figure 7.5).

Figure 7.5. Technologies used by different countries

image

Specific technologies (ones not shared with other countries) are particularly present in China and Japan. For the other countries included (with the exception of South Korea and Taiwan), technologies used are shared by two or more countries. This analysis is interesting as it highlights the orientation of research in this domain by Asian countries. This analysis may be taken even further by using the IPC to its full extent (with more precise eight-digit codes).
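The distinction between specific and shared technologies amounts to counting, for each IPC code, the number of countries in which it appears. A minimal sketch follows, assuming hypothetical (country, IPC codes) pairs with codes truncated to four characters; the sample values are illustrative and do not reproduce Figure 7.5.

```python
from collections import defaultdict


def split_specific_and_shared(records):
    """Map each IPC code to the countries using it, then split by country count."""
    countries_by_ipc = defaultdict(set)
    for country, ipc_codes in records:
        for code in ipc_codes:
            countries_by_ipc[code].add(country)
    # A code is "specific" when exactly one country uses it.
    specific = {c: next(iter(s)) for c, s in countries_by_ipc.items() if len(s) == 1}
    shared = {c: sorted(s) for c, s in countries_by_ipc.items() if len(s) > 1}
    return specific, shared


# Hypothetical (country, IPC codes) pairs, one per patent.
records = [
    ("CN", ["A01H", "C12N"]),
    ("CN", ["C12Q"]),
    ("US", ["A01H", "C07K"]),
    ("JP", ["A01G"]),
]
specific, shared = split_specific_and_shared(records)
print("specific:", specific)
print("shared:", shared)
```

Running the same function on longer codes gives a finer partition, at the cost of fewer shared codes.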

Research orientations within the “China” group can be shown easily using a matrix within the group to cross-reference “inventor” and “IPC” fields (Figure 7.6).

Figure 7.6. Research focus of Chinese inventors in relation to IPC

image
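A matrix of this kind is a simple cross-count of two fields. The sketch below assumes hypothetical patent records holding lists of values under `inventor` and `ipc` keys; the inventor names and codes are placeholders, and the function is an illustration rather than the procedure implemented in Matheo Patent.

```python
from collections import Counter


def cross_reference(patents, row_field, column_field):
    """Count how often each (row value, column value) pair occurs in the same patent."""
    counts = Counter()
    for patent in patents:
        for row in set(patent[row_field]):
            for column in set(patent[column_field]):
                counts[(row, column)] += 1
    return counts


# Hypothetical patents from the "China" group; names and codes are placeholders.
patents = [
    {"inventor": ["Inventor A", "Inventor B"], "ipc": ["A01H", "C12N"]},
    {"inventor": ["Inventor A"], "ipc": ["A01H"]},
    {"inventor": ["Inventor C"], "ipc": ["C12Q"]},
]
matrix = cross_reference(patents, "inventor", "ipc")

# Print the matrix with inventors as rows and IPC codes as columns.
inventors = sorted({i for p in patents for i in p["inventor"]})
codes = sorted({c for p in patents for c in p["ipc"]})
print("".ljust(12) + "".join(code.ljust(6) for code in codes))
for inventor in inventors:
    cells = "".join(str(matrix[(inventor, code)]).ljust(6) for code in codes)
    print(inventor.ljust(12) + cells)
```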

Global information on key technologies is obtained by creating a network between the different IPCs found in each patent. This produces a network map that indicates the key domains (network nodes), main research orientations, and areas that are detached from the bulk of the networks, and which may indicate innovative technologies (Figure 7.7).

Figure 7.7 shows three main research orientations, one of which constitutes a separate group from the other two orientations. It also clearly indicates a research domain that seems to be an innovative approach (as it does not participate in all networks). This is also the case for techniques that are linked to a network by their IPC code without being connected to other technologies.
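The reading of such a map can be approximated programmatically: IPC codes listed in the same patent are linked, and groups of codes with no link to the rest of the network are reported separately. The following sketch works under these assumptions, with hypothetical lists of IPC codes; it is not the layout algorithm used to draw Figure 7.7.

```python
from collections import defaultdict
from itertools import combinations


def ipc_network(patents):
    """Build an undirected graph linking IPC codes that occur in the same patent."""
    graph = defaultdict(set)
    for codes in patents:
        for code in codes:
            graph[code]  # register the code even if it has no neighbor
        for a, b in combinations(set(codes), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the groups of mutually reachable nodes, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        components.append(group)
    return sorted(components, key=len, reverse=True)


# Hypothetical IPC code lists, one per patent.
patents = [["A01H", "C12N"], ["C12N", "C12Q"], ["A01H", "C12Q"],
           ["A01G", "A01N"], ["C07K"]]
graph = ipc_network(patents)
components = connected_components(graph)
print([sorted(c) for c in components])
```

The smallest components are the candidates for detached, possibly innovative areas; whether they really are innovative remains a matter for expert judgment.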

The use of Matheo Patent thus provides us with a first visualization of the domain of hybrid wheat at international level, clearly showing that scientific research orientations in Western countries (the USA, Canada, etc.) are very different from those observed in the same domain in Asia, and in China in particular. The use of this first tool in our watch process confirms the previously unproven suspicions of experts in the domain: first, that active scientific research is underway in the domain of wheat hybridization in China, and second, that the country has not yet developed cooperative links between its institutions or businesses and those in other countries. This analysis, carried out using data from the European Patent Office, does not reflect the totality of Chinese activity linked to hybrid wheat. However, it clearly demonstrates China’s position as leader in terms of the number of patents registered. We must, therefore, dig deeper to obtain a more precise vision of the realities of Chinese scientific research in this domain, along with the technological applications of the domain and the associated economic issues, not only in China but also, in the near future, at international level.

Figure 7.7. Key technologies in research on hybrid wheat

image

7.4. Chinese sources of scientific information

A vast amount of information on China can be found on the Internet. This is evident in scientific domains where Chinese researchers, seeking global recognition, are publishing increasing numbers of articles in international journals with a high impact factor. Thus, as we saw above, an analysis of databases such as SCI, PubMed, and others can be very instructive, and these databases constitute a veritable goldmine of information when beginning a scientific watch process concerning China.

At this point, a quick experiment to compare the numbers of Chinese publications in English and those in Chinese may be interesting. We will use the Google Scholar search engine, which indexes the contents of certain scientific article databases, and is the only search engine of its kind to also process articles in Chinese characters.

We launched a search using the expression “male sterility gene”, then a second search with the same expression in Chinese: image. We chose to restrict our search to the period 2000–2009. We were then able to make the following rapid observations:

– There were more responses for the Chinese search term, with 1,320 against 938 in English.

– Of the five key authors in the English language version of the search, three had Chinese surnames (WANG, TAO, and DONG). These authors do not appear as key authors in the Chinese version of the search, where the “top five” are FANG, WANG, ZHU, SUN, and YUAN.

Although we would not wish to use this type of approach to carry out a relevant watch, this brief experiment shows that the potential research capacity of China is far beyond that which we might deduce from analysis of English language scientific documentation alone. In our opinion, English language publications are just the tip of the iceberg where Chinese research is concerned; these publications relate to the most successfully completed fields of research, but reveal little about work in progress. Scientific and technical information is an important source of strategic information for decision makers in a competitive watch process, used to plan decisions and competitive intelligence actions. This information is also important for researchers, who can subsequently carry out a complete global watch on the state of the art and scientific developments in their field of study. As far as the database industry is concerned, China has been part of the International Committee on scientific and technical databases since 1984. In 1987, China gained an international information center, initially responsible for the creation of 134 major databases. Less than 10 years later, the center had over 1,300 databases. Currently, almost half of all databases are developed by public bodies and may be consulted over the Internet [MA 05].

Any competitive intelligence and information seeking process aiming to facilitate business decisions must, in China as in any other part of the world, include a source verification aspect with qualification of the degree of reliability of information depending on the source. We must pay particular attention to various forms of manipulation of information that may be encountered: “information poisoning”, unfounded rumors, erroneous predictions, imprecise or false factual data, and so on. We provide a qualification of the various Chinese information sources available online, which is given in Table 7.1. In our opinion, scientific articles present the same level of credibility in China as in other countries. We will therefore use the Chinese scientific and academic article database, China National Knowledge Infrastructure (CNKI), to access the totality of Chinese research and define an approach for the reading and analysis of the dynamics of the hybrid wheat market in China.

Table 7.1. Qualification of information depending on source

Type of site | Credibility of contents
Government websites | Propaganda
Online newspapers | Propaganda
Company websites | Incomplete
Financial information websites | Incomplete
Discussion boards | Require checking
University websites | Imprecise information
Professional associations | Good product information
The SIPO Chinese patent database | Credible
Scientific databases | Credible

The CNKI portal is a project developed by universities, bringing together various Chinese databases. Its development was, and still is, strongly supported by the Chinese government, which aims to use this portal to stimulate growth of a “culture of information” and develop information-based intelligence in China. In addition to Chinese databases, CNKI is now open to foreign databases; for example, an agreement was signed with Springer in 2008, giving the latter a foothold in the Chinese market. Consultation of bibliographical notices is free, but full text downloads must be paid for and require a subscription. At the end of 2007, CNKI contained references to over 25 million articles.

The Internet in China is in a state of constant improvement. In a few months, the entire configuration of CNKI changed [GUE 08]. In addition to the collection of a corpus, which is always labor-intensive (this process cannot be done automatically), two new problems have emerged:

– One line of Chinese text contains up to three fonts, which alternate from one character to the next. This is not noticeable when reading, but creates noise in coding.

– Keywords disappear when downloading the corpus; they are no longer present in the description fields of an article, restricting bibliometric analysis.

However, online bibliometric tools have been integrated into CNKI, allowing us, for a request, to see lists of authors by frequency of publication, organizations, keywords, and so on. Essentially, CNKI provides bibliometric data but does not allow users to create their own data (at best, possibilities for this are limited). We are, therefore, obliged to make new modifications to process this information automatically using our tools.

To keep our demonstration simple, we have deliberately restricted the collected corpus. A more relevant analysis would involve the collection of all articles in the database containing the word “wheat”. This request would provide us with a visualization of research on hybridization as a proportion of all work concerning wheat. However, this request produces almost 30,000 responses. The analysis of a corpus of this size would be beyond the scope of this article, hence our restriction of the corpus; our aim here is to demonstrate the feasibility of analysis of Chinese information. Our extracted corpus, therefore, focuses on male sterility, a major condition for hybridization, and covers all articles from the period 2000 to 2008. We thus obtain 302 responses (Figure 7.8) and, by analyzing this corpus, we are able to observe the development of research on wheat hybridization over this 8-year period.

Figure 7.8. Request process in the CNKI database

image

The next step after sending the request is to download the corpus. Notices can be downloaded in pages of 50. Note that the database is extremely well structured (Figure 7.9). We can then carry out a certain number of analyses on the metadata of the database to extract a first level of information.

Figure 7.9. Page of 50 notices extracted from the corpus

image

7.5. Automatic processing of information by bibliometric analysis of metadata

7.5.1. Specificities of a Chinese-language corpus

Our work takes place in the context of analyzing an economic environment to provide strategic knowledge to support a decision process. The first set of information to obtain, therefore, concerns the actors in the domain and their reciprocal actions. This type of information may be extracted from our corpus, but it is not immediately apparent from simple reading of the corpus; we must use a specific tool to cross-reference and count fields. We do not aim to simply model information based on its contents, but in relation to the use to be made of the information [DAV 05].

From this extracted corpus, we can carry out a first level of analysis on the metadata contained within the bibliographical notices of articles. The structure of the database allows us to carry out the necessary cross-referencing.

The Tétralogie program, developed by the Institut de Recherche en Informatique in Toulouse, France, is used for data mining [DOU 03] and allows us to show networks of actors and their dynamics, the development of concepts and subjects of study and to detect weak signals [LOU 07] present in a corpus of material. Its use has been widely proven to be valuable; our aim is to use Tétralogie in a new linguistic environment.

It was therefore necessary to add a software development phase to our work to adapt Tétralogie to the Chinese linguistic environment, on the one hand, and to the structure of the CNKI database on the other hand. Each database has its own structure, and we needed to make modifications for the new format of CNKI. To do this, we first created a basic “structure describer”.

This describing tool defines different basic fields, identifying their banners, separators, use, and the various types of information they contain (Figure 7.10). It also allows us to identify the beginning of each notice and the physical structure of recording (format and number of occurrences of banners).
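The principle of such a describer can be sketched as follows. The banners, separators, and sample notices below are hypothetical (the actual CNKI export format and the descriptor syntax used by Tétralogie are not reproduced here); the sketch only shows how a table of banners and separators is enough to split a raw text file into notices and named fields.

```python
# Hypothetical field description: banner -> (field name, separator for multiple values).
# These banners and separators are illustrative, not the actual CNKI export format.
DESCRIPTOR = {
    "【题名】": ("title", None),
    "【作者】": ("authors", ";"),
    "【机构】": ("organizations", ";"),
    "【年】": ("year", None),
}
RECORD_START = "【题名】"  # assumption: each notice begins with its title banner


def parse_notices(text, descriptor=DESCRIPTOR, record_start=RECORD_START):
    """Split raw text into notices and each notice into named fields."""
    notices, current = [], None
    for line in text.splitlines():
        line = line.strip()
        for banner, (name, separator) in descriptor.items():
            if not line.startswith(banner):
                continue
            if banner == record_start:
                current = {}
                notices.append(current)
            if current is None:
                break  # ignore fields that precede the first notice
            value = line[len(banner):].strip()
            if separator:
                current[name] = [v.strip() for v in value.split(separator) if v.strip()]
            else:
                current[name] = value
            break
    return notices


# Two invented notices with placeholder author names.
raw = "【题名】小麦雄性不育研究\n【作者】王一;李二\n【年】2005\n【题名】杂交小麦\n【作者】张三\n【年】2007\n"
notices = parse_notices(raw)
print(len(notices), notices[0]["authors"])
```

Adapting the parser to a new database format then means editing the descriptor table rather than the parsing code.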

To enable automatic processing of Chinese characters, we used Unicode to transform and tag metadata fields. All identifiers for Unicode characters may be found in Unicode tables in hexadecimal or decimal form (Table 7.2).

This permits automatic processing of all bibliographical notices in the corpus and, in the long run, will produce strategic analyses of Chinese-language information which are perfectly accessible to readers with no knowledge of the language.
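The correspondence between a character and its Unicode identifier can be obtained directly in most programming languages. The short Python sketch below lists the hexadecimal and decimal forms for each character of a string and rebuilds the string from its identifiers; the sample label 作者 ("author") is chosen for illustration and is not taken from Table 7.2.

```python
def code_points(text):
    """Return (character, hexadecimal identifier, decimal value) for each character."""
    return [(ch, f"U+{ord(ch):04X}", ord(ch)) for ch in text]


def from_code_points(identifiers):
    """Rebuild a string from identifiers written as 'U+' followed by hex digits."""
    return "".join(chr(int(identifier[2:], 16)) for identifier in identifiers)


label = "作者"  # illustrative field label meaning "author"
table = code_points(label)
for character, hexadecimal, decimal in table:
    print(character, hexadecimal, decimal)

# The conversion is lossless: the identifiers give back the original label.
restored = from_code_points([hexadecimal for _, hexadecimal, _ in table])
```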

Figure 7.10. Description of fields in a bibliographic notice

image

Table 7.2. Correspondence between Chinese characters and Unicode for metadata fields

image

The Unicode converters available online (Figure 7.11) are insufficient for the conversion of an entire corpus with tens of thousands of characters. We must therefore carry out minor modifications to our corpus to expose the character codes. The entire corpus will be saved in text mode to process characters directly. However, a problem emerges in analyzing the structure of the document; in text mode, the character font changes constantly. A single phrase may contain up to three different fonts, increasing noise in the coding and making processing more complex.

Figure 7.11. Visualization of character coding using different standards: Unicode code convertor (source: http://hapax.qc.ca/conversion.fr.html)

image

Take, for example, the following notice. The title field image contains two characters. The first uses the SimSun font and the second is in MS UI Gothic (Figure 7.12).

Figure 7.12. Visualization of coding for characters image (“title”)

image

If we want to process data directly from the code for Chinese, we need to reformat data to “clean” notices and remove the noise generated by formatting (Figure 7.13).

The results of this first process should contain only the Unicode coding of Chinese characters and of bibliographical fields (Figure 7.14).

The reformatter contained within Tétralogie “cleans” notices to produce a result more suited to bibliometric treatments.
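The cleaning step can be approximated as follows. This is a simplified sketch and not the reformatter built into Tétralogie: it assumes that the formatting noise consists only of non-Chinese tokens, and the `{\f1 ...}` font markers in the sample are hypothetical.

```python
import re

# Basic CJK Unified Ideographs block only; extension blocks are ignored here.
CJK = re.compile(r"[\u4e00-\u9fff]")


def clean_notice_line(raw):
    """Drop everything except Chinese characters; return decimal code points."""
    return [ord(ch) for ch in CJK.findall(raw)]


# Hypothetical font-switch markers wrapped around each character.
raw = r"{\f1 中}{\f2 文}"
print(clean_notice_line(raw))
```

A production reformatter would also need to preserve Latin text and digits (publication years, for instance), which this sketch deliberately discards.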

Figure 7.13. Visualization of coding of a notice title. In Chinese: image

image

Figure 7.14. Visualization of coding of Figure 7.13, after reformatting

image

7.5.2. Analysis and results

After “cleaning” the corpus in text mode, we can finally process the data. We will supply a few results here, but the reader is invited to consult previous publications on Tétralogie to gain a better idea of the wide range of cross-referencing and analytical activities the program provides. In what follows, we will present some results produced using Tétralogie to process the corpus of material on “male sterility”.

From a representation of actor networks in the “male sterility” corpus, we rapidly observe that a number of very distinct research teams exist, with very little connection between these teams; some of these teams only contain three authors. We will not devote much time to these groups and will instead focus on the largest group (Figure 7.15). In this group, author collaborations are widespread, demonstrating the existence of a real research team with fruitful exchanges; it is also easy to identify the most prolific authors, as their associated nodes are larger.

Figure 7.15. Visualization of author collaborations

image
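The idea behind this view can be sketched as a co-authorship graph whose connected components are the separate teams. The author labels below are placeholders, and the code is an illustration using the standard library, not the algorithm implemented in Tétralogie.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(notices):
    """Build an undirected graph linking authors who sign the same notice."""
    graph = defaultdict(set)
    for authors in notices:
        unique = set(authors)
        for author in unique:
            graph[author]  # make sure single authors appear as nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def teams(graph):
    """Return connected components, largest first."""
    seen, result = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return sorted(result, key=len, reverse=True)


# Placeholder author labels, one list per notice.
notices = [["A", "B"], ["B", "C"], ["C", "D"], ["E", "F"], ["G"]]
groups = teams(coauthor_graph(notices))
print([sorted(group) for group in groups])
```

The first component returned corresponds to the largest group; the size of each node in a drawing would then follow the number of notices signed by each author.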

Having targeted the main network, we can now use a matrix to identify authors and to follow their development over the period concerned by our corpus.

It appears that at the height of its activity, in 2006, the network included 15 major authors (Figure 7.16).
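An author-by-year matrix of this kind can be sketched as a simple count of notices per author and per year. The years and author labels in the sample are invented for illustration and do not come from the “male sterility” corpus.

```python
from collections import Counter


def author_year_matrix(notices):
    """Count notices per (author, year); return row labels, column labels, matrix."""
    counts = Counter()
    for year, authors in notices:
        for author in set(authors):
            counts[(author, year)] += 1
    authors = sorted({author for author, _ in counts})
    years = sorted({year for _, year in counts})
    matrix = [[counts[(author, year)] for year in years] for author in authors]
    return authors, years, matrix


# Invented sample: (publication year, authors of the notice).
sample = [(2005, ["A", "B"]), (2006, ["A"]), (2006, ["A", "C"])]
authors, years, matrix = author_year_matrix(sample)
for author, row in zip(authors, matrix):
    print(author, dict(zip(years, row)))
```

Reading a row from left to right shows how an author's output evolves; reading a column shows which authors were active in a given year.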

Figure 7.16. Visualization, in Unicode, of author names in the target group

image

Table 7.3. Conversion of decimal numeric codes into Chinese characters

image

It is then easy to carry out conversions in the opposite direction to obtain the corresponding Chinese character (Table 7.3).
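The reverse conversion, from decimal codes back to characters, can be sketched as follows. The codes used are those of the illustrative string 中文, not the author names shown in Figure 7.16.

```python
def decode_decimal(codes):
    """Turn a sequence of decimal Unicode codes into the corresponding string."""
    return "".join(chr(int(code)) for code in codes)


# Codes may arrive as integers or as digit strings read from a text file.
print(decode_decimal([20013, 25991]))
print(decode_decimal(["20013", "25991"]))
```

Both calls print the same two characters, since `int()` accepts integers and digit strings alike.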

The name of the first person in the group thus appears in a relatively simple manner. Repeating the same operation for other names rapidly allows us to identify the most active Chinese researchers in the domain of hybrid wheat. It is then easier for decision makers to carry out searches on these names to contact the author, follow their work, and/or enter into negotiations with the person or group in question.

Particular attention is required when dealing with the names of Chinese authors, as a large number of names are similar, particularly when transliterated into pinyin. By analyzing Chinese information directly without prior translation, Tétralogie is able to limit confusion in this domain.
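The loss of information caused by transliteration can be shown with a small example. The romanization table below is hand-written for illustration (tone marks omitted) and covers only a few common surnames; it is not a general pinyin converter.

```python
from collections import defaultdict

# Hand-written illustrative table: distinct characters, toneless pinyin.
SURNAME_PINYIN = {"张": "Zhang", "章": "Zhang", "李": "Li", "黎": "Li", "王": "Wang"}


def collisions(table):
    """Group characters by romanized form and keep only the ambiguous forms."""
    groups = defaultdict(list)
    for character, romanized in table.items():
        groups[romanized].append(character)
    return {r: chars for r, chars in groups.items() if len(chars) > 1}


print(collisions(SURNAME_PINYIN))
```

Each ambiguous romanized form maps back to several distinct characters, which is why working on the original characters avoids merging different authors.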

7.5.3. Validation and comparison

The author list produced by Tétralogie has been validated by experts in China and ties in with our own on-the-ground experience. We note that these authors are not only major actors in Chinese research into hybrid wheat but also key players at national level from a political standpoint. Tétralogie allows us to identify these actors in a simple manner, based on a mass of articles where not all authors have the same importance and level of influence.

If we continue our validation process by looking at the author ZHANG Gaisheng alone, CNKI provides us with 236 articles by this author, including 19 from 2008. This author is therefore particularly prolific. However, if we search for “ZHANG Gaisheng” as an author in PubMed, we obtain no results. The INIST French database proposes four articles, but only two actually correspond to the target author. A Google Scholar search for the author for the year 2008 produces two articles, the first referenced by Elsevier and the second by Wanfang Data.5

We should add that the results provided by this analysis effectively allow us to identify key elements of Chinese research on hybrid wheat. The scope of the present chapter does not allow us to cover the entire study in detail, but we have shown that decision makers are able to target key actors rapidly and relevantly. The possibilities offered by Tétralogie do not stop here; many other analyses are possible which we were not able to discuss here. We have demonstrated the feasibility of analysis of a corpus of Chinese scientific information, and invite the reader to consult the numerous publications available concerning Tétralogie to gain an insight into the full potential of the program [DOU 03].

7.6. Conclusion

Any analysis of an environment in the context of a competitive intelligence process begins with in-depth analysis of the actors in the target domain. In China, the sectors of research, economics, and politics are more closely linked than in the West. A research partnership with a Chinese organization, even if the organization is institutional, itself opens doors into the Chinese market. It is therefore absolutely essential to accurately identify actors and their ways of working, including those who dominate given sectors, before deciding on any action involving approaches and negotiation.

Our work is anchored in the context of competitive intelligence and decision assistance. Its final result should be the acquisition of useful knowledge, leading to subsequent action. Faced with the phenomenal growth of China on the international stage in all sectors (political, economic, and scientific), we believe that no watcher can carry out full global watching without access to Chinese information. Our work therefore focused first on the modification of classic information processing and analysis tools to cope with the Chinese information environment. We have proposed a complete approach to information analysis, including the choice of sources, extraction of a corpus, preparation for processing, and facilities for the visualization of results.

High-performance data mining tools have been developed by French researchers. These enable us to identify major and emerging actors, to observe collaborations between authors and groups, and to study the development of these interactions over time, based on a very large corpus of source material. These techniques are extremely useful and merit regular use by decision makers. Until now, these tools were applicable to any corpus of material in Indo-European languages, mostly in English but also in French, Spanish, and Portuguese. The novelty of our approach is that all processing is carried out on the Chinese text and that only the results are translated. This process is applied to selected and reliable elements. The method therefore enables any decision maker, even one with no knowledge of Chinese, to analyze the Chinese environment for a particular domain.

Our approach then allows real-time watching of knowledge, based on the capitalization of knowledge in the domain. Once again, our contribution allows the use of Chinese sources and provides ideas for further development of tools. Although the final documents are proposed in their original language (in this case, Chinese), watchers may easily identify documents to translate. Translation is an expensive business6, but waste is avoided as the watcher chooses articles based on concepts expressed in the article and not simply using keywords. This translation phase occurs in the very final stages of analysis and applies to specific, targeted information that will clearly be useful to the watcher.

We have thus shown how large quantities of Chinese-language documents may be used efficiently and demonstrated the way in which we can extract information with a real value in terms of knowledge. This may be integrated into a strategic scouting process to support decisions concerning planned actions in new territories, which are often difficult to understand and which present a commercial risk. At the end of this process, the real work of negotiation begins.

7.7. Bibliography

[BUT 08] BUTLER D., “China: the great contender”, Nature, vol. 454, 23 July 2008, available online at: www.nature.com/news/2008/080723/full/454382a.html.

[DAV 05] DAVID A., Organisation des connaissances dans les systèmes d’information orientés utilisation — contexte de veille et d’intelligence économique, PU Nancy, Nancy, April 2005.

[DOU 03] DOUSSET B., Intégration de méthodes interactives de découvertes de connaissances pour la veille stratégique, HDR, University of Toulouse III, Toulouse, 2003.

[GUE 08] GUENEC N., DOU H., “Intérêt et méthode d’extraction de l’information scientifique Chinoise”, Cahiers de la Documentation, Association Belge de Documentation, no. 2008/4, 2008, available online at: www.abd-bvd.be/index.php?page=cah/rc-2008-4&lang=fr.

[IST 08] ISTIC image (2008 China and the world’s top ten scientific and technological progress in selected news), 27 November 2008, available online at: www.chaxin.org/EducationDetail.aspx?ArticleID=86505.

[LOU 07] LOUBIER E., BAHSOUN W., DOUSSET B., “La prise en compte de la dimension temporelle dans la visualisation de données par morphing de graphe”, Colloque Veille Stratégique Scientifique et Technologique (VSST 2007), IRIT, Marrakech, Morocco, 21–25 October 2007.

[MA 05] MA J., Bibliothèque et document numérique en Chine, summary for the RTP-DOC, CNRS, June 2005.

[MAR 94] MARTRE H., CLERC P., HARBULOT C., “Intelligence économique et stratégie des entreprises”, Commissariat général du plan, La Documentation Française, Paris, 1994.

[OCD 06] OCDE, Perspectives de l’OCDE de la science, de la technologie et de l’industrie, OCDE, Paris, 2006, available online at: www.oecd.org/document/61/0,3343,fr_2649_33703_37743997_1_1_1_1,00.html.

[OST 08] OST, Edition of the report of the observatoire des sciences et techniques, OST, Paris, 2008, available online at: www.obs-ost.fr/dossiers/article/publication-de-ledition-2008-du-rapport-de-lost.html?tx_ttnews[backPid]=5&cHash=7008cb6c80.

[QUO 07] QUOTIDIEN DU PEUPLE, “La Chine dresse les grandes lignes d’un projet de développement de la bio-économie”, Quotidien du Peuple, 28 June 2007, available online at: http://french.peopledaily.com.cn/Economie/6200702.html.

[UNE 05] UNESCO, Science Report 2005, UNESCO Reference Works series, UNESCO Publishing, 2005.

[VIL 08] VILLALONGA A., La Chine se hisse dans le peloton de tête du classement des publications scientifiques, ADIT, BE Chine 56, 8 January 2009, available online at: www.bulletins-electroniques.com/actualites/57207.htm.

[ZHO 09] ZHOU P., LEYDESDORFF L., “The emergence of China as a leading nation in science”, Research Policy, vol. 35, no. 1, pp. 83–104, February 2006, available online at: http://users.fmg.uva.nl/lleydesdorff/ChinaScience/ChinaScience.


1 Chapter written by Nadège GUENEC.

1 The OST (Observatoire des sciences et techniques) is an inter-institutional platform founded and administered by major actors in the French system of research and innovation.

2 USA: $280 billion. EU: $199 billion. Japan: $113 billion.

3 ISTP: Index to scientific and technical proceedings. This index relates to communications at major international conferences.

4 International Patent Classification, hierarchical system for the classification of patents and models by the domain of technology concerned.

5 Wanfang Data is a Chinese database of scientific articles and a competitor of CNKI. It covers fewer articles but, for the Chinese public, presents the advantage of giving access to major international databases and portals, allowing it to maintain control of a certain part of the Chinese information provider market.

6 Between €200 and €300 for a scientific article for a Chinese-to-English translation carried out by an agency in Beijing.
