4

The digital revolution

Michael Clarke

Abstract:

The digital revolution is without a doubt the most significant event in information dissemination since Gutenberg’s printing press and arguably marks a much bigger shift in human communication. This chapter discusses the impact of the digital revolution on scholarly publishing and professional communication thus far and describes the key trends and technologies shaping the future of the industry. These include evolving online publishing platforms, Web 2.0 technologies that use audience participation and network intelligence, mobile technologies that enable information professionals to access information anywhere, semantic technologies that transform how we discover information, and workflow integrations that channel the right information to the right individual at the right time.

Key words

Digital publishing

electronic publishing

online publishing

mobile

mobile web

smartphones

e-readers

tablets

Web 2.0

semantic technology

semantic web

workflow integration

personalization

Introduction

Printed matter produced following the invention of Gutenberg’s printing press is termed ‘incunabula’, which is Latin for swaddling clothes, and refers to the infancy of the form. Books from this period (c. 1439 to 1501) often featured embellishments, marginalia and other elements to give the appearance of being handwritten. It wasn’t until the 16th century that books evolved into a form very similar to those found on the shelves of bookstores and libraries today.

The digital revolution is without a doubt the most significant event in information dissemination since the Gutenberg revolution, but, just like that revolution, it began with an incunabula period. However, instead of mimicking manuscripts, the first phase of the digital revolution produced mere simulacra of printed publications: online journals and electronic books that looked and functioned as much like their print counterparts as possible. The typeset PDF was and remains the dominant mode of distribution for journals. For e-books, the PDF or EPUB edition similarly seeks to replicate the print edition as faithfully as possible. Indeed, even the dominant metaphor of the web, the web page, is a reference to our print heritage.

There are signs, however, that we are emerging from our incunabula period to new, digital native forms of publishing. Web 2.0 technologies are changing how professionals locate and share information. Semantic technologies are transforming how information products are developed and discovered. Mobile devices are changing the way scholarly information is distributed. And researchers and scholars have changing expectations for the way in which information and services from publishers will fit in to their workflows. This chapter discusses these developments and their impact on scientific, technical, medical (STM) and scholarly publishing.

Online publishing platforms

While electronic publications, in their various forms, date back to the 1960s, the era of electronic publication as the primary mode of dissemination began in the mid-1990s and coincided with the rise of the World Wide Web (Tenopir and King, 2000). As most academics and researchers were already connected to the Internet and accustomed to information retrieval systems and email, the move to online publications was relatively straightforward. Commercial publishers made investments in building large-scale publishing platforms such as ScienceDirect (Elsevier), InterScience (Wiley), Synergy (Blackwell), SpringerLink (Springer) and Nature.com (Nature Publishing Group). Many not-for-profit organizations – especially those in the biological sciences – moved their journals relatively quickly to platforms developed by HighWire Press, Ingenta, MetaPress, Ovid or Atypon Systems. A number of not-for-profit organizations focused on the physical sciences took more of a do-it-yourself approach. Organizations including the University of Chicago Press (which had an emphasis, during the 1990s, on astrophysics), the American Institute of Physics, the Institute of Physics, the Optical Society, the American Geophysical Union, the Institute of Electrical and Electronics Engineers (IEEE) and the American Physical Society all developed their own in-house platforms.

The development of online publishing platforms coincided with an industry shift from subscription sales to site licensing and, ultimately, the Big Deal (Bergstrom and Bergstrom, 2004). Elsevier led the way, borrowing the site licensing concept from enterprise software and selling the entirety of the content on its ScienceDirect platform for one deeply discounted (when considered on a per-title basis) price. This move to site licensing allowed scholarly publishers to fund continued investment in digitization. As a consequence, the industry shifted from a print-centric subscription model to an online-centric licensing model far more quickly than adjacent markets such as news, magazines or trade publishing.

The great limitation of these platforms, however, was books – or rather the lack thereof. Most of these platforms did not feature books, or if they did, they were hosted in a limited capacity (and often were treated, from a display perspective, as if they were journals). Generally speaking, the shift to online distribution of books has lagged journals by over a decade. This delay is due to a combination of technological factors, limitations in existing business models and market receptivity.

Institutional budgets for periodicals and books were often managed separately, and periodicals budgets during this period had greater flexibility. Moreover, moving from a journal subscription model to a site license model was a relatively simple transition: both are based on an annual payment for a content set that is regularly updated. Books have historically been sold on a one-time purchase basis with no updating (with a book, an update is considered a ‘new edition’ and normally requires a subsequent purchase). Finally, the production of books is much harder to streamline than that of journals. Elements valued in book publishing, such as bespoke design and typography, do not lend themselves to scale. Moreover, books are much more structurally complex than journals, with sections, chapters, subsections, callouts and other elements that are not standardized from book to book, even when published by the same publisher.

The principal challenge to wide-scale adoption of digital books, however, was (and remains) reader technology. Journals are consumed on an article-by-article basis. An eight-page article can be downloaded from the web and easily printed from an office printer. In fact, printing out a single article is more convenient than carrying around a printed journal issue. Moreover, a journal article is short enough that screen reading is comfortable for many users. Books, however, are often consumed in long-form and can run to hundreds of pages. That is too long for most people to consume on a laptop or desktop computer screen and not practical to print from an office printer. The wide-scale adoption of digital books has therefore been waiting patiently for the last decade and a half for reading technology to catch up with production and distribution technology. The introduction of electronic readers (e-readers) and tablet computers, as discussed below, marks the beginning of this transition and, not coincidentally, corresponds with steep increases in the sales of electronic books.

While there were (and remain) many challenges to the digital dissemination of books, there were notable early successes. McGraw-Hill’s development of its Access Medicine portfolio, the American Psychiatric Association’s development of the Psychiatry Online Library and the Oxford English Dictionary are just a few of the better known electronic book offerings. Moreover, book aggregation platforms by ebrary, Knovel, Rittenhouse, Safari Books Online and others helped realize both a viable business model and market receptivity. Today, nearly all the major online platforms include increasing amounts of book content. Libraries are purchasing digital books and readers are accessing them online, albeit with the limitations in reader technology described above.

A perhaps greater challenge than the integration of books and other content assets into STM and scholarly publishing platforms is the flexibility of those platforms: they must meet the evolving needs of the marketplace and adapt quickly enough to a rapidly changing technological environment, in which expectations are often set in the consumer marketplace or adjacent information spaces. The macroscopic trends impacting STM and scholarly publishing – including Web 2.0 tools and technologies, semantic technologies, workflow integration and mobile devices – all present challenges for online platforms and are discussed in more detail below.

Web 2.0

While scholarly publishing was an early adopter of the web and enabling technologies such as SGML (Standard Generalized Markup Language – the precursor of XML), the industry did not remain in the vanguard for the next technology wave: Web 2.0. The term ‘Web 2.0’ was popularized by Tim O’Reilly (O’Reilly, 2005) and refers to then-emerging online technologies – social media, crowdsourcing and related approaches – that aggregate participatory audiences or otherwise harness the intelligence of networks.

The participatory themes described by the Web 2.0 moniker are in some ways native to scholarly publishing and communications while in other ways they are exogenous. Scholarly publishing is inherently participatory. Scholarly articles and monographs are ‘user-generated content’. Authors, reviewers and readers are often the same people. Journals publish a great many letters. And journals and scholarly monographs support a real, physical community of professionals who often congregate regularly at one or more specialist meetings to present their work. In short, many of the Web 2.0 tools and technologies were not embraced by scholars and scholarly publishers because their communities had long since developed institutions and practices for participatory exchange.

Equally problematic, the often unmediated participation found in many Web 2.0 applications was, and remains, contrary to many of the principles and purposes of scholarly publishing. While most professional scholars participate in the scholarly publishing process, they do so in carefully prescribed ways. Reviewers are invited by editors. Editors are appointed by committees. Authors are published only after review by selected peers. This system supports strata of scholarly brands – including an author’s institution, funding agency and the journal or press that publishes them – which are perceived as signifiers of quality.

Beyond signifying quality to readers, publication is often a requirement for career advancement. In the sciences, publication in peer-reviewed journals remains standard practice and a researcher’s work, including the rank of the journals in which he or she has published, is often a key consideration in grant reviews, tenure decisions and other career milestones. In the social sciences and humanities, publication of a monograph by a university press plays a similar role. The various forms of participation encompassed by Web 2.0 tools and applications – such as blogging, commenting, micro-blogging and shared bookmarking – are not typically considered in career advancement decisions and are therefore not seen as worthwhile. Surveys of commenting in online journals, for example, have shown little uptake and no significant increase over the years (Schriger et al., 2011).

Despite these obstacles, there have been a number of notable attempts to bring Web 2.0 tools and applications to scholarly publishing – particularly in the sciences.

Professional networks

Professional networks are sites that provide a forum for individuals to interact, via discussion forums, peer-to-peer communications, comments, shared links or other functionality. Nature Network was the first network of this kind. The site was originally launched to provide a community forum for researchers in the Boston area. Nature Network London followed soon after. Eventually, the network became broader and the Boston and London sites (along with New York) were transitioned to subsidiary ‘hubs’.

A number of networks for scientists followed in Nature’s path. 2Collab was developed by Elsevier but was discontinued in April 2011. UniPHY, from the American Institute of Physics, has taken a different approach by focusing on just the physical sciences and mapping author associations from published research papers in order to pre-populate an individual’s networks with colleagues. Perhaps the most widely used networks today are ResearchGate and Mendeley, both start-ups funded by venture capital and both broad networks aimed at connecting nearly all fields of academia.

Sermo was the first online professional network for physicians. Participation is limited to those with a medical degree (as opposed to Nature Network, which anyone can join). Perhaps the most notable thing about Sermo is its business model. Sermo sells access to its forums and data to commercial organizations such as pharmaceutical companies, and physicians (while identified and validated upon enrolment in Sermo) participate anonymously. Sermo therefore functions as a kind of large-scale anonymous focus group, complete with paying clients behind the mirror.

While Sermo was the first to launch, several other networks for physicians have followed. Doc2Doc is a network developed by the BMJ Publishing Group but open to global participation. Asklepios, on the other hand, is sponsored by the Canadian Medical Association and limited to physicians from that country. Ozmosis is a physician network that supports itself by providing workflow integration tools to institutions.

Shared bookmarks

Shared bookmarking sites were popularized by Delicious, a start-up service that was acquired, and subsequently divested, by Yahoo! Such sites enable users to post and share the websites they find interesting or useful. The value of bookmarking sites is fourfold:

1. one’s bookmarks are available from any computer;

2. one can share one’s bookmarks with others;

3. one can discover related websites by seeing what else was bookmarked by others who bookmarked the same website; and

4. one can follow specific users, such as colleagues, to keep abreast of their readings.

Following Delicious, several social bookmarking resources were developed for the STM and scholarly market, including CiteULike and Connotea. Additionally, many professional networks, such as Mendeley, include a component of social bookmarking.

Virtual reality/gaming

The role of gaming and so-called virtual reality remains experimental within STM and scholarly publishing but is worth noting due to the traction it has gained in the consumer space. Several STM and scholarly publishers, including Nature Publishing Group and the IEEE, have experimented with interfaces in Second Life, a massively multiplayer online game (Nature Publishing Group discontinued its Second Life interface, cleverly called ‘Second Nature’, in 2010). Second Life players can interact with publisher artifacts (including applications such as molecular models) and participate in job fairs, lectures, symposia and other events. While initiatives such as these are nascent, they are worth following as new communication, presentation and business models may emerge.

Blogging networks

With numerous independent blogs by scientists on platforms such as WordPress or Typepad, there have been a number of efforts at aggregating the audience by developing science-focused blogging networks. The most notable such networks are ScienceBlogs, Nature Blogs and PLoS Blogs. Additionally, many publishers have begun their own ‘official’ blogs associated with specific publications. These include the New England Journal of Medicine, Journal of the American Medical Association, Pediatrics and Health Affairs among many others.

While blogs by and for scientists proliferate, the genre has yet to find a comfortable home in the scientific communication ecosystem. Research is not published in blogs, but rather in journals. Less formal communications continue via email and at conferences. What then is the function of science blogs? Are they for scientists or for the general public? Are they a place for scientists to have more speculative discussions that are not appropriate for peer-reviewed venues? Such questions become even more difficult to answer when a journal starts a blog under the journal brand. How is a blog post different from an editorial or commentary? These questions remain as blogging continues its emergence and evolution in scientific communication.

Workflow integration

‘Workflow integration’ is a term currently in vogue in scholarly publishing. While users of the term may have field-specific meanings in mind, I will use it here in its widest possible sense: any point at which one seeks information in the context of one’s professional life.

- A student’s workflow could include using a textbook in a course, studying for an exam or accessing reference materials.

- A clinician’s workflow could include providing information to a patient, look-up of diagnostic criteria or drug interactions, consultation of treatment guidelines or review of material for maintenance of certification.

- A researcher’s workflow might include review of laboratory protocols, assessment of primary literature, analysis of datasets or the composition of journal papers.

These are all professional workflows and all provide numerous points at which information or services are needed. The goal of ‘workflow integration’ is to make the relevant information or service available at the right point in the workflow and with a minimum of friction, thus saving the professional time and thereby providing a more valuable product or service. A small sample of workflow integration product and service categories is given below.

Document distribution

Document distribution services have long prospered as a means of connecting busy professionals with articles and other content. Such services are proliferating in a wide range of offerings and business models. Some, like Reprints Desk and Infotrieve, continue to provide document delivery in much the same way it was practiced a few decades ago, albeit with online ordering systems and digital delivery. arXiv, the venerable physics pre-print server, is arguably a kind of document delivery system: physicists have long passed pre-prints to colleagues for feedback prior to publication, a practice made more efficient by arXiv, which centralizes the activity. PubGet focuses on connecting researchers to article PDFs as efficiently as possible by streamlining the search and retrieval process. And DeepDyve provides article rental via a subscription plan targeted to individuals and professionals in small- to mid-sized companies.

Mendeley provides perhaps the most innovative example of document distribution, having taken the approach of developing a professional network based on document sharing. Users are encouraged to upload their own papers to their profiles to share with colleagues. They can then bookmark the work others have uploaded and share it individually or in groups. Mendeley is therefore part professional network, part document distributor, part social bookmarking site and part document management system.

Point of care decision support

Clinicians working at the point of care seek very different types of information, for very different purposes, than medical researchers working in the lab or field. A clinician needs succinct information that helps him or her diagnose a patient, prescribe the appropriate treatment or review how to perform a procedure. In the past, locating the relevant medical reference text off the bookshelf was the most efficient means of finding this information. In today’s fast-paced clinical settings, that is often not feasible. Publishers have responded by developing decision support resources that often combine digitized reference material, integration with drug databases, and purpose-developed content and tools. The most prominent examples include MD Consult and ClinicalKey from Elsevier, the Access Medicine portfolio of speciality products from McGraw-Hill, Medscape from WebMD and UpToDate from Wolters Kluwer.

Continuing education/maintenance of certification

Physicians, lawyers, nurses and other professionals are required in many countries to pass licensing and recertification exams and receive ongoing training to maintain good standing in their profession. While study guides for such exams have long been common, they have largely moved online and have become increasingly interactive, adapting to differences in individual knowledge and progress. Examples include McGraw-Hill’s USMLEasy, which prepares students for the United States Medical Licensing Examination; the American Academy of Pediatrics’ PREP the Curriculum, which prepares physicians for the pediatric speciality board exam; and ACCP Seek by the American College of Chest Physicians, a mobile application that helps physicians prepare for certification or recertification in pulmonary, critical care and sleep specialities.

Laboratory workflow support

While scientists working in laboratories need to keep up with research in their field and so are avid consumers of journal articles, they also need information that helps them conduct the experiments they are paid to perform. A number of publishers, including Nature Publishing Group and Cold Spring Harbor Laboratory Press, have developed online protocols to provide precisely this type of support. Protocols are like recipes for performing aspects of experiments and include information on equipment, sequence and technique. The most novel entrant in this category is the Journal of Visualized Experiments, which provides peer-reviewed and professionally recorded videos of laboratory experiments. Macmillan has developed a new division, called Digital Science, to develop an array of products that assist researchers with day-to-day work, including tools for equipment sourcing, chemical compound searching and research tracking management.

Manuscript submission and review

Perhaps the earliest example of a successful, widespread, online workflow tool in scholarly publishing is that of manuscript submission and review systems. Such systems allow authors to submit manuscripts online and enable publishers to process manuscripts through an editorial process that often includes peer review by outside experts. Prior to the adoption of such systems, manuscripts were submitted and subsequently circulated via post, adding days if not weeks to each step of the editorial process.

STM and scholarly publishers moved to online manuscript systems en masse in the mid-1990s in concert with the industry move to online publishing platforms. While a number of publishers developed their own systems in-house, most used third-party systems provided under a software-as-a-service model. Pioneers include Aries Systems, ScholarOne (now a division of Thomson Reuters), HighWire Press and E-Journal Press.

Content integration

While it might seem obvious that there is no such thing as ‘journal readers’ or ‘book readers’ as professionals distinct from each other, publishers have long operated their businesses as if this were true. Book and journal divisions are more often than not operating in silos with separate management, separate marketing and even separate online platforms. Researchers and other readers, however, do not limit their reading to particular formats but rather read multiple formats within a topical area. An engineer who specializes in microfluidic devices in nanotechnology will be interested in relevant technical reports, conference proceedings, journal articles and monographs on that topic.

A few publishers are delivering this kind of integrated content portfolio. Most of the large commercial publishers – including Wiley, Elsevier and Springer – include both books and journals on their online platforms. In physics and engineering, where technical reports and conference proceedings have long held more importance than in other fields, integrated portfolios are far more common. Publishers such as SPIE (the international society for optics and photonics) and IEEE have long combined their digital assets in ‘digital libraries’ that better reflect user information seeking behaviour. In the medical sciences, the American Psychiatric Association (APA) was a pioneer with the creation of PsychiatryOnline, which brings together the APA’s guidelines, continuing medical education, journals, textbooks, news, and reference works in one online portfolio.

Mobile devices

One of the primary reasons for the continued use of paper copies is their convenience and portability. A printed book, journal or a printed copy of an article PDF is far more convenient for long-form reading than reading on a computer screen for many individuals. In addition to being available off-line, the printed object can also be read more easily in a variety of situations and locations from planes and trains to sofas and beds. Any move to complete digital consumption of scholarly material requires a digital reading device (or devices) that provides all of the advantages of paper – portability, ease of annotation, lack of eyestrain – along with the advantages of digital delivery (the ability to access vast amounts of reading material in a slim device, backlighting, etc.).

Such devices are now being developed. Smartphones, e-readers and tablet computers (to say nothing of laptop computers) have made great strides in the last 5 years, with Apple and Amazon as the key innovators in device design and manufacture. These devices have been adopted quickly by academics and other information professionals. A recent Outsell report indicates that 70 per cent of faculty and 83 per cent of college students in the US use a mobile device of some kind (Worlock, 2011). While there is no one device that ‘does it all’, these three categories of devices target different use cases and market segments.

Smartphones

Recent advances in mobile computing and mobile device manufacture are transforming the way people interact with information the world over. Smartphones have existed for nearly a decade in the form of Palm Treos, BlackBerrys and various phones running Windows Mobile. While a number of early innovators emerged in the STM publishing space with workflow decision support tools for clinicians (most notably Epocrates), these devices appealed to a small niche of professionals and were used primarily for email and administrative functions such as electronic calendars and notation.

The launch of the iPhone, followed by the development of Google’s Android operating system and, more recently, a refreshed Windows system, has moved smartphones from a niche business communication tool to a powerful delivery platform for STM and scholarly information. In the US, for example, 81 per cent of physicians are using smartphones, and 75 per cent own an Apple mobile device (iPhone, iPod or iPad) (Manhattan Research, 2011). Statistics were not available as of this writing for overall scholarly adoption of smartphones, but their adoption among all consumers in the US is approximately 40 per cent and climbing (Kellogg, 2011). Using this figure as a floor and physician adoption as a ceiling, we can postulate that overall adoption among scholarly professionals is between 40 and 81 per cent. The actual number as of this writing is almost irrelevant due to the steepness of the adoption curve. With prices falling dramatically, it is a safe operating assumption that smartphone adoption will reach near ubiquity in STM and scholarly markets in developed countries in the next 5 years. In less developed economic regions, especially Asia (Qing, 2011) and South America (Gomez, 2011), smartphone adoption, while lagging Europe and North America, is growing rapidly.

While smartphones are excellent delivery systems for a wide variety of content, they are not ideal for journal articles, textbooks, scholarly monographs and other long-form reading, or for multi-media formats requiring larger displays (procedural videos, for example). This has not prevented STM and scholarly publishers from developing an array of device-native applications and websites optimized for the smartphone. These include applications for single journals, such as Nature, as well as applications for content portfolios, such as the journals of the American Chemical Society. They also include purpose-built applications such as ACCP Seek, developed by the American College of Chest Physicians for maintenance of certification.

E-readers and tablets

There have been numerous attempts to popularize e-readers and tablet computers over the last decade and a half, with little result until quite recently. The market was not ready and the technology was not sufficiently advanced. Moreover, no one had seamlessly linked the devices to content sources until Amazon introduced the Kindle. The Kindle was revolutionary in that it enabled one-click wireless download of any digital book available via Amazon: tens of thousands of books were just a click away. Part of the reason for the Kindle’s success is that it is a purpose-built device: it is for book reading. It does not do much else, and its display and size make it less than ideal for other content formats such as magazines, textbooks or journals.

The iPad, by contrast, with its larger dimensions, color screen and touchscreen interface, is a multi-purpose device, designed for reading, watching videos, surfing the web, reading email and running any of thousands of applications (including the Kindle application). Unlike the Kindle, the iPad is well suited to reading journal articles, textbooks, magazines and other content formats. The downside of the iPad is that it is much more expensive than the purpose-built Kindle.

The introduction of viable reader technology may very well signal the beginning of the end of printed books as a dominant information dissemination vehicle in STM and scholarly publishing. Print distribution ceased to be relevant over a decade ago in journal publishing but has remained the primary dissemination mechanism for long-form reading, including monographs, textbooks and reference works. This is because long-form reading was not practical on a computer screen – even that of a laptop. Modern e-readers and tablets designed with long-form reading in mind, and able to accommodate high-resolution color imagery and user annotation, are now available. And they will get dramatically better and dramatically cheaper over the next 3–5 years. Amazon released its next-generation color Kindle (Fire) in late 2011 at a price point that is scarcely more than the price of some textbooks. While sales of e-books represent only 6.4 per cent of total book sales (US) at this point, publishers must, given the steepness of the adoption curve, assume that rapidly advancing e-reader technology will be more than adequate for the needs of most professional users and that dramatically falling prices will render the devices ubiquitous (Association of American Publishers, 2010). Indeed, some universities and medical schools have already started requiring them (Conaboy, 2011).

In addition to the improvements in e-reader and tablet hardware, great strides have been made in software standards. EPUB 3 and Kindle Format 8, both recently released, have significantly improved support for document formatting, allowing many of the complex structures found in STM and scholarly books, such as tables and mathematics, to be rendered as precisely in electronic format as they are in print. Additionally, these new formats provide much greater control for rendering document fonts and layout, which is essential for publishing text that contains non-Latin alphabets.

There are significant implications for STM and scholarly publishers, many of which have built content development workflows and business models around the print format. Such workflows will need to be re-engineered, as will the business models and distribution partnerships that support their products. This is already beginning to happen as textbook publishers are now partnering with start-up ventures such as Inkling, which has developed a platform for digital textbooks on tablets (Reid, 2011). Some publishers, including PLoS, Elsevier and the ACCP, are developing tablet applications directly. Others are disseminating books via Kindle and other digital distribution streams. And others still are optimizing online book content for tablet reading. Indeed, to this last point, the line between application and website is blurring with the emergence of HTML5, which is capable of supporting increasingly sophisticated online applications that can store content offline on users’ devices as needed. While still an emerging technology, HTML5 has the potential to further transform how users interact with information, standardizing the viewing experience across many devices.

Semantic technology

Content is currently written so that humans, not computers, can understand its meaning. Semantic technologies add a layer of metadata that allows computers to understand content and make connections within it. XML, for example, provides information about document structure, and HTML provides information about a web page’s structure, but neither, by itself, can tell a computer what the content in a document or web page is about. Semantic technologies provide this machine-oriented content information via an overlay of metadata. These metadata can take the form of a taxonomy, an ontology, entity extraction or a combination of all three.

Taxonomy

A taxonomy is a hierarchical framework, or schema, for the organization of organisms, inanimate objects, events and/or concepts. From a computing standpoint, however, there are some important differences between a classic Linnaean taxonomy and a taxonomy that is useful in application logic. Taxonomies used for computing may assign multiple parents to a given taxon, or term: a taxon need not live in only one branch of a taxonomic tree but may be related to several. Taxonomies used in application logic also place a great deal of emphasis on lexical equivalents, or synonyms, because computers do not know about synonyms unless they are explicitly told about them. Any first-year medical student can tell you that ‘heart attack’, ‘ST-Elevated Myocardial Infarction’ and ‘STEMI’ all refer to the same disease event; even the most advanced computers, however, require a thesaurus to make such connections.
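The point can be sketched in a few lines of Python. The terms and synonyms come from the example above; the data structure itself is a simplified illustration, not any particular taxonomy standard:

```python
# A minimal computing taxonomy: each term may have multiple parents
# (poly-hierarchy) and carries its lexical equivalents (synonyms).
taxonomy = {
    "myocardial infarction": {
        "parents": ["ischemic heart disease", "medical emergency"],
        "synonyms": ["heart attack", "ST-Elevated Myocardial Infarction", "STEMI"],
    },
}

def resolve(term):
    """Map any synonym back to its preferred term, or None if unknown."""
    needle = term.lower()
    for preferred, entry in taxonomy.items():
        if needle == preferred or needle in (s.lower() for s in entry["synonyms"]):
            return preferred
    return None

# Without the explicit synonym list, a computer cannot connect these:
assert resolve("STEMI") == resolve("heart attack") == "myocardial infarction"
```

Note that the synonym table does the work a human reader does automatically; remove it and the three terms become unrelated strings.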

Ontology

An ontology is a structural framework for organizing information about relationships between concepts, objects or events. An ontology explains how elements of a taxonomy are related to each other. Drug X, for example, may treat condition Y, but with adverse effect Z. While the relationships between drug X, condition Y and adverse effect Z are easy for humans to comprehend (because our languages evolved with our brain functions in mind – literally), computers need these relationships mapped explicitly.
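The drug example maps naturally onto subject–predicate–object statements, the form in which ontologies are commonly stored. The following sketch uses placeholder names (DrugX, ConditionY, EffectZ) rather than any real ontology vocabulary:

```python
# An ontology fragment expressed as subject-predicate-object triples.
# The names are placeholders standing in for real taxonomy terms.
triples = [
    ("DrugX", "treats", "ConditionY"),
    ("DrugX", "has_adverse_effect", "EffectZ"),
]

def related(subject, predicate):
    """Return every object linked to a subject by the given relationship."""
    return [o for s, p, o in triples if s == subject and p == predicate]

assert related("DrugX", "treats") == ["ConditionY"]
assert related("DrugX", "has_adverse_effect") == ["EffectZ"]
```

Once the relationships are stated this explicitly, an application can answer questions (‘what does DrugX treat?’) that the prose version leaves locked inside natural language.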

Entity extraction

Entity extraction is the process of locating specific types of digital objects. These might include people, places, academic institutions, chemical compounds, genetic sequences, laboratory equipment, diagnostic devices, surgical equipment, clinical trials or any number of other nouns. The purpose of locating such objects is often to provide the reader with more context about the object. For example, a paper containing a particular chemical compound might link to a database with more information about that compound. Likewise for gene sequences or clinical trials. The identification of laboratory or surgical equipment might be used to provide links to purchase the equipment. Individuals might be identified to provide a list of all work authored by that individual. Entity extraction can be used in concert with taxonomies and ontologies to provide a more precise layer of semantic metadata.
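A crude dictionary-based extractor illustrates the idea. Production systems typically combine curated dictionaries with statistical NLP models; the compound names and identifiers below are invented for the example:

```python
import re

# A toy gazetteer of known entities; the identifiers are hypothetical
# keys into a compound database the publisher might link out to.
compounds = {"acetylsalicylic acid": "CHEM-0001", "ibuprofen": "CHEM-0002"}

def extract_compounds(text):
    """Return (name, identifier) pairs for each known compound in the text."""
    found = []
    for name, ident in compounds.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            found.append((name, ident))
    return found

paper = "Patients received ibuprofen rather than acetylsalicylic acid."
hits = extract_compounds(paper)
# Each hit can now be rendered as a link to the compound database.
```

The same pattern, with a different gazetteer, locates genes, trial registry numbers or equipment names; the extracted identifier is what makes the downstream linking possible.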

Publisher interests in semantic technology

It is important here to disambiguate between the ‘Semantic Web’ and ‘semantic technologies’. The term ‘Semantic Web’ is used, often synonymously with ‘Web 3.0’, to refer to the notion of interoperable datasets – or linked data. Many prominent individuals in computer science, government and beyond – including Tim Berners-Lee, the architect of the Web – are actively promoting the need for more interoperability of scientific and government data, not least because of the sheer volume of data being produced (Berners-Lee, 2009). The Fourth Paradigm (Hey et al., 2009) has generated a great deal of discussion around the increasing role of computing in making sense of the vast datasets produced by today’s scientific research.

While interoperable data are of doubtless importance to scientists and technologists, this is not the area where semantic technology is of most value to publishers. Publishers are primarily concerned with semantic technology that enriches content for specific use cases: product development, search engine optimization and user personalization. These use cases are being served now by semantic technologies that do not depend on the larger development of the Semantic Web.

Product development

Wide-scale data interoperability requires many organizations both to agree upon standards and to invest in the necessary infrastructure [e.g. RDF (Resource Description Framework) triple stores, domain taxonomies, universal identifiers]. While such efforts are worth pursuing, it is not necessary to wait until they are in place for publishers to use semantic metadata for their own product development.

Product development at the publisher level need only depend on the efforts of that organization. Examples of STM and scholarly information products that have used semantic metadata include Wolfram Alpha, the Royal Society of Chemistry’s ChemSpider, the Journal of Bone and Joint Surgery’s JBJS Case Connector, and Elsevier’s Clinical Key. In the consumer space, Netflix, Amazon, Pandora and Zappos, among others, use semantic technology to significantly enhance their services. STM and scholarly publishers in many ways have less friction to overcome than their consumer counterparts, as a great deal of work has already been done in creating scientific and medical taxonomies, much of it publicly available. Such public domain schemas help ensure future interoperability, as new resources that appear in the marketplace will want to map to any domain standards.

Publishers and other information professionals are also beginning to use advances in natural language processing (NLP) to drive product development. NLP is used in speech recognition workflow tools such as the Dragon Dictation application. NLP can, however, also be used as an aid to the development of controlled vocabularies, especially enterprise schemas that represent the classification of a proprietary content set. It can additionally be used in concert with text mining to provide researchers with new filtering systems and more comprehensive analysis tools (Williams, 2011).

Search engine optimization

Semantic metadata can be used to increase traffic from search engines as such metadata can provide search engines with more information about the content being searched. While the leading search engines maintain their own sets of synonyms, they may not be as complete as those that can be provided by a specialized publisher.

If a unit of content is published, for example, that refers to the acronym ‘ALS’, how is a search engine to know whether the content is about ‘advanced life support’, ‘antilymphocyte globulin’ or ‘amyotrophic lateral sclerosis’? If the last, will that unit of content appear in search queries for ‘Lou Gehrig’s Disease’? Will it know that amyotrophic lateral sclerosis is a kind of bulbar motor neuron disease and would be relevant to searches on that topic? What about searches for spinal cord diseases? The kind of connections and topical hierarchy that might be obvious to anyone familiar with a topic often elude search engines on all but the more popular searches.

Recently, several of the leading search engines have begun working together, via Schema.org, on the development of standards that will facilitate a greater level of metadata exposure (Guha, 2011).

User personalization

While usually thought of in terms of content enrichment, semantic technologies can additionally be brought to bear to enable computer applications to better understand the people who are using them. People can inherit the semantic traits applied to content they interact with. Based on reading, searching or browsing interests – or the alerts a person has set up – an application can make personalized recommendations for additional reading. In other words, applications can increasingly utilize semantic technologies to anticipate the needs and interests of users to save them time.
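The inheritance of semantic traits can be sketched as simple content-based filtering: a user profile is the union of the tags on articles the user has read, and unread articles are ranked by tag overlap. The article titles and tags here are invented for illustration:

```python
# Content-based recommendation: the user profile inherits the semantic
# tags of articles already read; unread articles are ranked by overlap.
articles = {
    "A": {"cardiology", "clinical trial"},
    "B": {"cardiology", "imaging"},
    "C": {"oncology", "genomics"},
}

def recommend(read_ids):
    """Rank unread articles by how many tags they share with the profile."""
    profile = set().union(*(articles[a] for a in read_ids))
    unread = [a for a in articles if a not in read_ids]
    return sorted(unread, key=lambda a: len(articles[a] & profile), reverse=True)

# A reader of article "A" is steered toward "B" (shared tag: cardiology)
# ahead of "C" (no shared tags).
assert recommend({"A"}) == ["B", "C"]
```

Real services layer collaborative signals and ontology-aware tag matching on top of this, but the core mechanism – users inheriting the semantics of their content – is the same.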

In the consumer space, this type of personalization is increasingly common. Dynamic personalization and adaptive information environments driven by semantic technologies are used by Amazon to recommend products, by Pandora to recommend music, by Netflix to recommend movies and by Facebook to filter news feeds. In the STM and scholarly space, this type of personalization has not yet been deployed, although less dynamic personalized services have existed for some time. Journal Watch, by the Massachusetts Medical Society, for example, provides a topical alerting service across a range of specialties.

The challenge with regard to STM and scholarly applications is in making the filters transparent to users. While it may not be terribly important to Pandora users why one song is recommended over another, for STM and scholarly users the stakes are higher. A researcher needs to be able to see all the available research on a topic before applying filters – and he or she needs to be cognizant of what is being filtered out. Recommendation services will be less problematic in this regard than search filters.

Conclusion and outlook

One of the most heated discussion topics of the last decade and a half in scholarly publishing has been whether, and when, print publication will cease. Over the same period, however, technology has rendered the question irrelevant. The Web has already become the primary mode of dissemination of scholarly information and print is now an ancillary distribution channel. With digital printers now capable of producing high-quality print copies of books and journals in very low print runs (including single copies), the question of how long print copies of books and journals will continue to be distributed is simply a function of how long there are individuals and institutions willing to pay for such copies. Much as there remains a niche market for vinyl records, there is likely to remain a niche market for printed artifacts.

On the one hand, the digital revolution is over. The wheel has spun and we are in a profoundly different era than we were a few decades ago. On the other hand, we are just now emerging from the incunabula period of digital publishing. Up until this point, digital products have largely been simulacra of print artifacts. We now see the first stirrings of entirely new types of information products, delivered across a dizzying array of devices, through an ever-expanding universe of business models, and via formats that are evolving before our eyes. In this sense, the digital revolution is just beginning as we are in the process of shifting to digital native products and services that are conceived, developed and delivered via digital means. These products and services will less and less resemble the print artifacts they are replacing, just as printed books bear only superficial resemblance to the codices that came before.

References

Association of American Publishers. Book Stats 2010. Washington, DC: Association of American Publishers; 2010.

Bergstrom, C. T., Bergstrom, T. C. The costs and benefits of library site licenses to academic journals. Proceedings of the National Academy of Sciences USA. 2004; 101:897–902.

Berners-Lee, T. ‘Linked Data’, talk at TED, February 2009. Retrieved from: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

Conaboy, C. Medical student essentials: Stethoscope. iPad. The Boston Globe, 29 August 2011.

Gomez, J. Latin America Telecom Insider. Pyramid Research, 3(6), November 2011.

Guha, R. Introducing schema.org: Search engines come together for a richer web. The Official Google Blog, 2 June 2011. Retrieved from: http://googleblog.blogspot.com/2011/06/introducing-schemaorg-search-engines.html

Hey T., Tansley S., Tolle K., eds. The Fourth Paradigm. Redmond, WA: Microsoft Research, 2009.

Kellogg, D. 40 Percent of U.S. Mobile Users Own Smartphones; 40 Percent are Android. Nielsenwire, 1 September 2011. Retrieved from: http://blog.nielsen.com/nielsenwire/?p=28790

Manhattan Research. Taking the Pulse U.S. v11.0, 2011.

Reid, C. McGraw-Hill, Pearson invest big in Inkling digital textbook platform. Publishers Weekly, 25 March 2011.

O’Reilly, T. What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. O’Reilly Media, 2005. Retrieved from: http://oreilly.com/web2/archive/what-is-web-20.html

Qing, L. Y. Android sees 1,000 percent growth in SEA. ZDNet Asia, 10 November 2011. Retrieved from: http://www.zdnetasia.com/android-sees-1000-percent-growth-in-sea-62302815.htm

Schriger, D. L., Chehrazi, A. C., Merchant, R. M., Altman, D. G. Use of the internet by print medical journals in 2003 to 2009: a longitudinal observational study. Annals of Emergency Medicine. 2011; 57:153–160.

Tenopir, C., King, D. Towards Electronic Journals. Alexandria, VA: Special Libraries Association; 2000.

Worlock, K. The Use of Mobile Devices and Content in US K-12 and Higher Education Markets. Outsell, 31 March 2011.

Williams, C. Computer that can read promises cancer breakthroughs. The Telegraph, 21 November 2011.
