7

Social tagging and its applications for academic and leisure reading

Abstract:

Social tagging is a system of content representation which is collaboratively developed by information users and made available to other users via Web 2.0 applications. In this chapter, social labelling is defined, the elements which comprise it are outlined and the criteria necessary for it to be used are discussed. The characteristics and types of social indexing of information are described in depth, together with critical perspectives on its usefulness, strengths and weaknesses. The contribution of social indexing and social content administrators to reading for academic purposes is analysed from an academic perspective, while their impact on reading for pleasure is analysed by an examination of social reading applications.

Key words

social tagging

academic reading

social reading application

social reference manager

social labelling

Introduction

Of all the technical processes involved in information description, the description of contents is perhaps the best positioned process when it comes to linking a document’s contents, on the one hand, and a user’s desire to know about a document via an information search, on the other. Historically, procedures and tools for content description have evolved in response both to developments in information formats and to the emerging needs of information users. For centuries information has been conceived in largely static and linear terms, designed to be consulted individually within the confines of purpose-built facilities. However, current evolution in the way we conceive of information makes us think of it in dynamic and, in many cases, multidimensional terms. Currently available technologies enable us to create and transmit information in very different ways, and, logically, the way we describe information content should also be different.

Not only can users create information, as we have mentioned in the preceding paragraph, but they can also participate in the information classification and description process, availing themselves of a wide range of information and communication-based tools which are increasingly user-friendly and which provide unprecedented access to the profusion of documents on the web. These tools help users with the organisation of information and resources and with the flow of information among on-line resources and other users, which in turn improves access to, and multi-user interaction with, document contents, in an all-enveloping spiral that ‘traps’ more and more users and more and more information. The trend towards user involvement in information description, known as social tagging, is triggering changes in the way professionals conduct information description and the metainformation about knowledge.

Until recently, the task of describing information content has almost exclusively been the terrain of documentation professionals. Yet different procedures and tools are now being used, which means that in some cases information description is becoming de-professionalised, though the end purpose of content description, i.e. to facilitate location, remains the same.

In one way or another, information is on the Internet, whether it is available in a visible, invisible or hidden form,1 and the way digital information is handled is vastly different from the way paper-based, or analogical, information is dealt with. This is not because the information itself is any different from what it was, but rather because the society we live in has undergone a transformation: our learning processes are different, the way we search for information has changed, and even the way we relate to other people has evolved. The Internet has become a social and cultural space that encourages communication of a participative, multifaceted and decentralized kind which no other medium before it achieved. There is no way to separate information content from the means of describing it, particularly in the era of Internet communication.

There are a number of procedures for describing information content. One of the most common is indexing, i.e. choosing a series of terms extracted from document description tools that serve the purpose of limiting the range of vocabulary used. ISO standard 5963:1985 defines indexing as ‘the act of describing or identifying a document in terms of its subject content’. The end result of this process is a closed listing of terms that will then be used to facilitate the searching process and therefore facilitate access to document contents.

Thesauri are among the most important indexing tools; from them, the terms that may be used in specific contexts, i.e. descriptors, can be extracted. These resources also help to establish the contextualisation of each descriptor by helping to determine what other terms the descriptors refer to, how they are to be ranked hierarchically and what term associations should be established. Thesauri are also useful for identifying those terms which would not be adequate (non-descriptors) for identifying information contents in specific contexts.

Document indexing has long been the exclusive domain and a sign of identity for professionals working in the field of documentation, and indeed professional content descriptions created with some of the purpose-designed tools mentioned above can be very effective, though they are not without drawbacks. One of these major drawbacks is that professionals work within limited institutional fields such as libraries or archives, so the areas where they can act are limited in scope. Depending on what area the centre they work for specialises in, the descriptors they use can range from the general to the specific, or they may be tightly fitted to some topic areas and not to others. An article on anorexia, for instance, would be described differently by information professionals working within the fields of medicine, or psychology, or sociology. Each individual centre will also use different sets of tools, with thesauri common among documentation centres but subject heading lists prevalent in library environments, and each individual tool developer and user will collect different terms depending on when these were developed and implemented. Information content description tends to be marked by the specific moment when it was conducted, on the one hand neglecting to reflect terminological variations over long periods of time and, on the other, reacting rather slowly to the incorporation of new terms.

Another drawback of content description in professional environments is its inefficiency when dealing with information on a large scale. Manual indexing implies knowing an individual document’s contents, though not necessarily in depth and comprehensively. A bare minimum of knowledge of certain parts of each document is recommended, as ISO 5963:1985 clearly sets out, and this can translate into lengthy document processing times.

In order to tackle, or at least alleviate, the problem that very high volumes of documents can present, automatic document processing systems have been developed. Automatic systems do allow professionals to process vast amounts of information, though, as with manual information content description, automatic processing has difficulties processing the language of the documents themselves, which may contain synonyms, homonyms, anaphoric referencing2 and ellipsis (Gil Leiva, 2008; Ros Martin, 2008). Despite continuous improvements, automatic systems of information content description have yet to develop the perfect solution.

Both indexing procedures described above, namely manual indexing by professionals and automatic indexing, exclude users from the information description process. In other words, information users are reduced to the role of simple subjects who have to accommodate their search term vocabulary to the terminological choices of the indexing tools (thesauri or subject headings). The end user simply adopts as his or her own the terms accepted or rejected by these tools, thus also assuming that the frequency with which these terms occur in the texts is accurately represented in the data base3 (Kowalsky, 1997; Baeza-Yates and Ribeiro-Neto, 1999; Anderson et al., 2005).

Undoubtedly, the quality of information content indexing by both methods is good, but the characteristics of the social context in which information is generated in today’s society must be taken into account. Information indexing needs to consider how pre-eminent digital formats are in an environment where the exponential increase in the volume of document generation makes traditional methods of content description impossibly inefficient. Not only is there a need for efficient content description but also a need for users to employ IT tools which are much more efficient and which allow document location and retrieval in the minimum amount of time. This feeds into a whole recent set of expectations that information users have created for digital information: it must be immediately available. Many of the indexing and searching resources on the Internet are not associated in any way with specific institutions or their physical locations, though in some cases they are associated with a number of cooperating centres. As we have mentioned above, manual indexing should be done in conjunction with the social context in which documents are embedded. All of the characteristics outlined above have driven information users to seek out alternative ways of describing content with greater immediacy, which has led to a greater involvement of users themselves in the process of describing content. This has led to what is being called social labelling, an increasingly common and important process for describing information content in a collaborative and distributed way (Figure 7.1).

Figure 7.1 Factors which have led to the development of social labelling Source: Compiled by authors

Social labelling improves virtually every aspect of the search and document retrieval process, but on the following pages we will focus on how social labelling can improve the reading experience, whether it be reading for pleasure or reading for academic or scientific purposes.

Social indexing from an agent perspective

We have already seen how social labelling emerged as a response to the need to create content description systems that were more malleable, quicker, and more flexible than traditional indexing systems. In the new social labelling systems that have emerged, users describe information resources in natural language and these metadata are shared with other users via Web 2.0 tools. This is how content descriptions are made available to users in the most immediate of ways and the same system favours the retrieval of the information sources described.

Who does the labelling and how it is done are what make social labelling different from traditional content labelling. The resulting labelling itself is different, yet the purpose this labelling serves remains the same: to facilitate information retrieval.

This new paradigm for handling information description is informed by complex cognitive processes that take place within a newly conceptualised social framework. Social information labelling draws from a multitude of different sources, and these new labels contribute to the generation of new information, which in turn drives further information production about the information itself (metainformation).

Social labelling grew popular as more and more blogs were created. Bloggers added descriptive labels to their published contents, though with no terminological constraints to this process. Quite simply, such a system is an explicit way for users to generate metadata, though this sort of content labelling tends to lack structure and is very often inconsistent.4 Nonetheless, social labelling on blogs aids in the retrieval of contents.

In order to better contextualise the current state of social labelling, a brief overview of the evolution of content labelling is in order.5 In the section that follows, we will examine the various agents involved in the process. We will also describe the four different ways of content labelling that have prevailed up to the present time, though with the proviso that none of these has disappeared with the emergence of social labelling: depending on the contexts in which content labelling is used, all of the various modes can be used at the same time.

Indexing as performed by information and documentation professionals

The first of the four conventional types of information content labelling is conducted solely by professionals who use mainly purpose-designed tools, thesauri6 and subject heading lists to describe the contents of documents.

The ultimate goal of such tools, which are enormously complicated to design and update, is to establish an unambiguous language that can be used both to describe document contents and to search for documents containing the information users need. As might be expected, they tend to be excessively rigid and difficult to keep updated with the latest developments in language usage. The information generated through the use and maintenance of thesauri is generally of high quality, but it comes as the result of significant investments of time and effort. This often makes thesauri impossible to apply to the vast amounts of information on the web.7 It should also be noted that even within a single field or specialised domain there is not necessarily a single thesaurus with a common set of universally adopted and precluded terms. Furthermore, a topic may be approached from an entirely different viewpoint depending upon the general approach taken by the thesaurus itself, which will therefore lead to a document being indexed in an entirely different way. One final point that needs to be addressed when discussing how professionals index a document’s contents is that document description is often done either alone or in small groups, following the guidelines set out by the organisational environment in which the professionals work, which is often rather disconnected from the direct social connection between authors and document users.

Author-based indexing of information content

As an alternative to the system of professional document description discussed above, the authors themselves can participate in the process. This is what happens when the authors of scientific articles or blog posts include key words that enable visitors to locate content quickly. In cases like these, key words may be extracted from a tightly controlled language tool or simply chosen by the authors themselves with no specific language tools for guidance. Author-based content description can help solve the problem caused by the gap between the vertiginous amount of information generated and the speed at which it can be processed. But when authors do not use tightly controlled language tools such as the thesauri used in professional indexing, document searching capacity can be lost as a consequence. Author-based content description, therefore, is more agile than the sole use of professional document description, but it makes it less likely that users will retrieve what they need.

This type of indexing is also done alone and is subject to no organisational directives of any kind. As in the case above, the potential users of the information described are disconnected from the process of information description.

Automated processing of information content

In response to the vast volume of information and documents being generated, a number of different ways of automated processing have been developed. This third mode of document content description has attempted to use a number of different mechanisms. In the 1950s, Luhn and Zipf’s automatic indexing research broke new ground as they sought to base the description of texts upon statistical values. Over the course of time, their system has evolved and given way to a much more complex content description model based on language levels (morphological, syntactic, semantic and pragmatic) (Gómez-Díaz, 2005).
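The idea of describing a text through statistical values can be illustrated with a very small sketch. It is not a reconstruction of Luhn's or Zipf's actual procedures: it simply counts word frequencies, discards a few function words and proposes the most frequent remaining words as candidate index terms. The function name `candidate_index_terms`, the stop-word list and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "by", "with"}


def candidate_index_terms(text, top_n=3):
    """Return the top_n most frequent non-stop-words as candidate index terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_n)]


# Illustrative sample text (not taken from any real document).
sample = ("Social tagging describes documents. Tagging by users complements "
          "indexing by professionals, and tagging is quick.")
print(candidate_index_terms(sample, top_n=1))
```

A purely frequency-based approach of this kind ignores synonyms, homonyms and word order, which is one reason why later models moved towards the language levels mentioned above.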

User-based indexing of documents

The last of the four conventional types of information content labelling is the kind of indexing done by the users of the documents themselves, who can also be retroactively involved in document description tasks. This is what is known as social labelling, and what most differentiates it from the other modes of content indexing is its user-to-user nature. In order for users to describe to other users the content of a document, they avail themselves of tags, or metadata, i.e. information about the information in the document, which helps other users access the content.

User-based tagging uses no specific term extraction tools, as the descriptions each user writes or chooses are matched with those provided by other users. Ideally, this collaborative working environment which brings together a number of different agents involved in the document creation/dissemination/use process minimises the problem of how to describe the proliferation of documents in a multitude of formats that are published on the web every day.

Social labelling of contents must be understood both as a complement to professional and author indexing of contents and also as an evolution of the indexing process itself, which is propelled and nurtured by the new social and technological environment in which we are currently immersed. Figure 7.2 contains an outline which illustrates how the four modes of content description operate.

Figure 7.2 The evolution of indexing from the agent point of view Source: Compiled by authors

The process of social labelling parallels the development of the web itself. In the early stages of the World Wide Web, interaction was unidirectional, travelling from the information producers, typically information and documentation professionals/authors/computer scientists, to the final user; however, present-day interaction tends to be bidirectional, as the number of information users/producers has increased. In fact, thanks to the information content on the web the information producer/information user relationship stands to grow considerably. Social labelling, it should be noted, will have quite a lot to contribute to the next stage in web development, as the step towards the semantic web will have much to do with how information is processed and what descriptive metadata are associated with that information. User-based information, in upcoming semantic web developments, will be of the utmost relevance.

It is important to stress that social labelling or tagging uses non-specialised vocabulary, and the analysis of this vocabulary can be done using the words which web users type into their search boxes. Besides, the combination of different labels can serve as the basis for developing the semantic web, as these words can be visualised according to their degree of representativeness within a semantic field, and thus within a social community.

The social indexing triangle

Social indexing consists of a triangle of three intervening elements. Its base is the large set of information resources, which includes a range of documents from texts and non-moving images to blog entries and books. One of the other two sides is formed by the people involved in the content description process, and the other by the labels or tags that those people use to describe the resources.

Figure 7.3 The social indexing triangle Source: Compiled by authors
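The three elements of the triangle can be sketched as a simple data structure in which each act of tagging is recorded as a (user, tag, resource) triple. This is only an illustrative model under assumed names (`TagAssignment`, `resources_by_tag`, `tags_by_resource`) and invented sample data; it does not describe how any particular tagging service stores its data.

```python
from collections import defaultdict
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One act of tagging: a person applies a label to a resource."""
    user: str
    tag: str
    resource: str


# Hypothetical sample data.
assignments = [
    TagAssignment("ana", "london", "photo_big_ben.jpg"),
    TagAssignment("ben", "london", "photo_big_ben.jpg"),
    TagAssignment("ben", "clock", "photo_big_ben.jpg"),
    TagAssignment("ana", "social networks", "article_web20.pdf"),
]


def resources_by_tag(items):
    """Map each tag to the set of resources it has been applied to."""
    index = defaultdict(set)
    for item in items:
        index[item.tag].add(item.resource)
    return index


def tags_by_resource(items):
    """Map each resource to the set of tags that users have given it."""
    index = defaultdict(set)
    for item in items:
        index[item.resource].add(item.tag)
    return index


print(sorted(resources_by_tag(assignments)["london"]))
print(sorted(tags_by_resource(assignments)["photo_big_ben.jpg"]))
```

Keeping the user in each triple is what makes the description social: the same tag applied by several people to the same resource can be counted, compared and shared.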

The process of freely assigning labels to information and objects in order for millions of other users to be able to locate them via Web 2.0 search engines is called folksonomy. Coined by Thomas Vander Wal in 2004, this term derives from ‘folk’ and ‘taxonomy’ and refers to large term sets which have been created, developed and spread through user-generated content description, collaborative information development services and the dissemination of the folksonomies themselves. The purpose of these large term sets is to enable information objects (documents, photographs, videos and so on) to be easily and quickly located and retrieved.

In order for the process of information content labelling to be considered social it must be done in a shared and open environment. Whereas classification systems and taxonomies have clearly established explicit hierarchical relationships among contents, socially labelled contents tend to be on a level field where there are no such associations. Quintarelli states that ‘the power of folksonomy is connected to the act of aggregating, the power is people here. The term–significance relationship emerges by means of an implicit contract between the users’ (2005). This is how folksonomies have now become one of a new generation of tools for information production, development, description and retrieval that have emerged from the Web 2.0 environment, where an ever-increasing number of web users are creating and sharing information on-line (Peters and Becker, 2009).

Thanks to social labels a common ground is created where the contributions of many users of different services converge, without any centralised intervening body or any more authority than that of the users themselves as a body. Since the language used is common to all users, it is easier to find information in this common ground, and since new labels are generated continually it is easy to find out what the emerging tendencies are and detect neologisms that might crop up. By doing the information content labelling themselves, users are able to emphasise their own personal experience and interaction with the web, and contribute to the web by aggregating metadata, by using labels to organise and categorise their own digital collection of documents, links, photos, and other files, by linking their content to those of other users and by socially constructing a classification system (Kroski, 2007). We cannot stress enough that ‘folksonomies have turned the organization of knowledge systems into decentralized, distributed processes which have done away with all hierarchical ordering of terms and which have enormously facilitated web indexing and web resource retrieval’ (Rodríguez Bravo, 2011). Because of this, social tagging is an example of how social software has helped redirect traditional approaches to content categorisation towards greater flexibility and better use of resources.

Within a folksonomy, tags can be displayed in different font sizes so that the most important ones appear in larger print, which makes it easy to locate the most important resources within alphabetically ordered lists. These displays are known as wordclouds, a term coined by Jim Flanagan, who used them for the first time on the image sharing website Flickr.

Wordclouds are used today on thousands of websites, and a number of programs such as Tag Cloud Generator,8 Tagxedo,9 Tocloud,10 and Wordle,11 to name but a few, have been developed for creating them.
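How a wordcloud assigns larger print to more frequently used tags can be sketched as follows. This is a minimal illustration assuming a simple linear scale between a minimum and a maximum font size; the function name `tag_cloud_sizes`, the size limits and the sample tags are assumptions, and the programs named above may well use different scaling rules.

```python
from collections import Counter


def tag_cloud_sizes(tags, min_size=10, max_size=32):
    """Return a font size per tag, in alphabetical order, scaled linearly by frequency."""
    counts = Counter(tags)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for tag in sorted(counts):  # wordclouds are usually listed alphabetically
        if span == 0:
            sizes[tag] = min_size  # every tag is equally frequent
        else:
            scaled = (counts[tag] - low) * (max_size - min_size) / span
            sizes[tag] = round(min_size + scaled)
    return sizes


# Hypothetical tags collected from several users.
sample_tags = ["reading", "ebooks", "reading", "libraries", "reading", "ebooks"]
print(tag_cloud_sizes(sample_tags))
```

With this sample, the most frequent tag receives the maximum size and the least frequent the minimum, while the alphabetical order of the list is preserved.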

Social tagging criteria

Social tagging has both a personal and a social dimension, the latter of which is the more important for information retrieval. Personal information tagging refers to labelling information for our own organizational purposes, while social tagging refers to labelling information for the purposes of sharing it in socially networked environments using Web 2.0 technologies. In order to establish common criteria for information retrieval, users can focus on such things as:

Contents: A content description would contain information about what the document contains. For instance, if we are describing an article about social networks we would use terms such as ‘social networks’, ‘Facebook’, ‘Twitter’, and so on.

Context: A context description would contain information about where the object is located. For instance, if we are describing an image of Big Ben, we could use London as a descriptor.

The social dimension of these two criteria will allow any user who needs to locate information about social networks or images of London, for instance, to retrieve content associated with these tags, or labels. The problem with this should be quite obvious, though. In the first case, there is no terminological standardisation for content labelling, and, in the second case, there is nothing preventing users from mixing varying levels of specificity in context description.

Emotions and feelings: Labels of this kind in no way reflect what appears explicitly in a document but rather what the user perceives subjectively. An image of a rainy city street could be labelled as ‘sad’ or ‘nostalgic’, or a complex, argumentative article could be described as ‘dense’ or ‘hard-to-read’. The very subjective nature of such descriptions makes them less useful to other users than the preceding categories, since what one user describes as ‘dense’ may signal ‘very interesting’, a positively connoted expression, for another. Traditional indexing would never record the sensation that the document caused the professional while conducting the indexing process, but in a socially networked environment, where several documents on the same topic may be available, this type of information may help users make their final choices.

Two solely personal categories could be added to the three already discussed above, though users do not tend to share them and, if they do, other users find them so personal as to be of little help.

Organisational tags: such tags describe personal materials or activities, and often include the names of folders in which documents are located (‘to be corrected’, ‘to read’, ‘to file away’, and so on).

Origin tags: such tags indicate where the contents came from, whether from a blog or a personal friend, for instance, whose names would be the names of the files or folders which contain the information.

Since different social labels are assigned by different users at different times, a folksonomy can contain labels from all categories or only from some of them, aggregating a number of different viewpoints.

However, not all resources allow all users to index information contents collaboratively. Two general types of folksonomy can be generated, depending on whether the resource allows for broad or narrow folksonomy generation (Figure 7.4). In broad folksonomies, any user can tag any resource with any label. In narrow folksonomies, only the author, or a limited number of users, can tag a resource. This means that in narrow models the semantics of the resources depends exclusively on the labels assigned by the narrow group of users; because there is a lower number of labels leading to the resources, semantic searching for these resources is much more difficult.

Figure 7.4 Types of folksonomies Source: http://www.vanderwal.net/random/entrysel.php?blog=1635
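The difference between broad and narrow folksonomies comes down to who is allowed to tag a resource, which can be sketched as a simple permission check. The class `Resource`, its attributes and the user names below are hypothetical and serve only to illustrate the distinction drawn above.

```python
class Resource:
    """A taggable resource that follows either a broad or a narrow folksonomy model."""

    def __init__(self, name, author, broad=True, allowed_taggers=()):
        self.name = name
        self.author = author
        self.broad = broad
        # In the narrow model only the author and an explicit group may tag.
        self.allowed_taggers = set(allowed_taggers) | {author}
        self.tags = {}  # tag -> set of users who applied it

    def can_tag(self, user):
        """Broad: anyone may tag. Narrow: only the author or an allowed user."""
        return self.broad or user in self.allowed_taggers

    def add_tag(self, user, tag):
        """Record the tag if the user is allowed to tag; report whether it was accepted."""
        if not self.can_tag(user):
            return False
        self.tags.setdefault(tag, set()).add(user)
        return True


# Hypothetical usage: the same visitor is accepted in one model and refused in the other.
shared_bookmark = Resource("bookmark", author="ana", broad=True)
private_photo = Resource("photo", author="ana", broad=False)
print(shared_bookmark.add_tag("ben", "reading"))
print(private_photo.add_tag("ben", "reading"))
```

In the narrow case fewer labels accumulate on each resource, which is consistent with the observation above that searching for such resources is more difficult.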

Characteristics of social tagging

Although we have already examined many of the main features of social tagging, we would like to examine the thorough description made by Hernández Quintana (2008):

Social indexing, or social tagging, can be found in contexts where the understanding, assimilation and treatment of digital objects takes place (contextuality). Besides, the vocabulary used for tags or labels is made up of the words used by the wide range of users (authors, creators, systems users, and so on), i.e. the vocabulary for tags stems from and is geared towards the user. Social labelling supports cooperation because it encourages individuals to share their viewpoints, and since no category, class or term is prioritized over any other, collaborative tagging is non-discriminatory. This helps systems that adopt it to handle a wide range of registers without the need for adjustments or restrictions in the categories of information used to collect metadata for all types of documents. In addition, the ultimate purpose of folk labelling is not to create hegemonic and speculative systems but to establish useful metadata categories for the broadest and most diverse possible comprehension and navigation for Internet searchability (not-for-profit).

Not only are social tags generated by those who create or administer contents, but they can also be generated by any user. This is how collaborative indexing of information contents becomes de-professionalized, as we have seen above. Social tagging can use a broader terminological base, as it can use terms from several fields of knowledge, and it can better incorporate new terms as they crop up and adapt to the different contexts where these terms occur, with no need to restrict itself to closed term sets or specific classifications of terms (adaptability). In this case, the lack of term standardization translates into greater flexibility in content representation. On the other hand, content labelling is instantaneous and dynamic, and the ability to change the designations is immediate. This means that, rather than having to depend on a system with pre-established standards or criteria for approval of a tag or label, each user can create his or her own at will (personalization). Coupled with the immediately preceding characteristic, it is important to remember that the incorporation of new terms does not entail costs (cost-effectiveness), which means that socially tagged terms can always be up to date (regeneration). Tools that incorporate social indexing make terms available to other users in real time (communicability). Systems that use social indexing are ideal media for organizing individual collections housed on public domains, which would allow for the study of personal and group relations within a social network (negotiation).

In addition to the characteristics of social indexing mentioned by Hernández Quintana above, Kroski (2007) contributes further characteristics. She holds that social tags allow users to locate previously unknown resources because they do not rely on binary models, which only allow resources to be described as belonging to a single category, but rather enable polyclassification, which allows the same resource to be placed in several different locations with a slightly different focus in each. Kroski also adds that social tags are self-regulatory in that users will tend to use the tags which were most successful among those used by previous users. They also allow users to follow ‘desire lines’, namely paths that express their direct information needs. Considering that a folksonomy is the result of a representation process, what the user is really doing is reflecting on how he or she understands the category of information itself.
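The self-regulation described above, in which users gravitate towards the tags that earlier users applied most often, can be sketched as a simple suggestion function. This is an assumption about how such a mechanism might work in principle, not a description of Kroski's analysis or of any real service; the function name `suggest_tags` and the sample tags are hypothetical.

```python
from collections import Counter


def suggest_tags(previous_tags, limit=3):
    """Suggest the tags that earlier users applied most often to a resource."""
    return [tag for tag, _ in Counter(previous_tags).most_common(limit)]


# Hypothetical tags already given to one photograph by earlier users.
earlier = ["london", "clock", "london", "travel", "clock", "london"]
print(suggest_tags(earlier, limit=2))
```

Because each new user who accepts a suggestion adds to its count, popular tags tend to consolidate over time, while the resource remains reachable under every tag it has received.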

Social tagging: types and uses

In the following section, we are going to describe the types of social tagging that exist and the usefulness of each type for the final user. Quite logically, not all of the social tags have the same value for the user community, as the motivation behind tag creation and writing will determine how successfully the tag performs for other users.

Javier Cañada (2006) has established four types of social tags, each defined by the motivation of the person doing the social tagging and the social value that the tagging generates for the community (Figure 7.5).

Figure 7.5 Types of social tagging and social value Source: Compiled by authors based on Cañada (2006)

The first type of social tag described by Cañada is the ‘egotistical’ tag, so called because of its entirely personal nature. ‘Egotistical’ social tags make sense to the person who assigned the label but they generally lack meaning for other Internet users. Examples of social labels of this kind are ‘to be revised’, ‘re-read this’, ‘summer_2012’, and so on. Despite their very personal nature, some ‘egotistical’ labels (‘funny’, ‘holidays’, and so on) may occasionally be useful to others, though most are likely to be of little use. Because the motivation behind this kind of labelling is completely personal, the social value of ‘egotistical’ tags is variable.

The second type of tag is the ‘friendly’ tag, so called because it has been created thinking not about oneself but about a group of people with shared interests. This type of tag is valued by the group, which may be a local association of cyclists or a reading club, and the motivation behind the labelling is high. These tags are often re-shared and they reinforce the group’s common identity, generating a high social value, though often limited to the specific group.

The third type of tag is the ‘altruistic’ tag, so called because it has been created with the idea of sharing something. For ‘altruistic’ tags, users look for the most descriptive and most widely recognised terms. This type of social tagging is the most difficult to do, yet its social value is the highest because it contributes the most to information retrieval. Examples of ‘altruistic’ tags are ‘educational resources’, ‘programmed obsolescence’, ‘glass recycling’, and so on.

The fourth and final type of social tag is the so-called ‘popular’ tag, so called because its main aim is to draw attention to the tagged content with such enticements as ‘top ten’ and ‘epic’ and to maximise the chances of users clicking on it. The motivation of the people who use ‘popular’ social tags is extremely high, but the social value of these tags, no matter how ‘very interesting’ they may be, is very low and their descriptions are completely subjective.

The social value of tags stems from the ease with which users can find the information they need. However, not all types of resources that can be socially indexed nor all types of social tags have the same value for user communities. What is more, these tags evolve as the language itself evolves and therefore are proof of the process. Social tags allow us to track back to when certain neologisms were first introduced, and when certain topics started to become popular.

Another related question is how some citation managers that include social indexing features can indicate which social tags are the most popular; CiteUlike’s Citegeist includes a feature that can order resources by the number of times they have been tagged. This feature allows researchers to infer that the articles which have most often been shared will be highest on the lists of most popular articles; in other words, while researchers are selecting and categorising the articles they value most, the citation manager is collecting and supplying information about the articles that are most popularly valued. In this respect, Taraborelli (2008) states that in the foreseeable future the popularity indexes gathered by social reference managers will become as important a factor as the references themselves in the evaluation of the scientific content of an article. This is fundamentally because the popularity measurements are extracted from the natural behaviour of users: when a researcher adds an article to his or her personal reference manager, it is assumed that this is because he or she is interested in reading it, not because doing so will improve the popularity of his or her own article.

Strengths and weaknesses of social tagging

As we have already indicated, social indexing is based on the naturally occurring language of Internet users, and this is one of its greatest strengths. Over time, the evolution of language use can be traced, and at any given moment its stage of development can be seen at a glance. Besides, with social tagging, there is no need to invest in the construction of documentary languages. The simplicity of social indexing allows any non-expert user to find information in real time and to describe information content for images, video, articles and other objects effectively. Since information description is done collaboratively, new concepts and meanings can easily come into play, which in turn broadens search options and the potential for content retrieval.

Some social indexing applications require multidimensional tagging. This means that each user is required to tag every record he or she adds to a personal library, as is the case with CiteUlike and Mendeley. As a result, every document is tagged by each user who has added it to a library, which prevents records from being left without tags and gives all records in these applications a cross-tag coherence among users. This is how a tag cloud is created: at first glance, the tags which have most often been assigned to a resource appear largest. By forcing all users to tag resources, such systems guarantee the social indexing of resources. Such systems also contribute to the development of a collective knowledge base where all participants can contribute something and broaden the horizons beyond where professional documentation personnel can go.
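
The way a tag cloud derives display size from tag frequency can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by CiteUlike or Mendeley: the function name `tag_cloud_sizes`, the sample users and tags, and the font-size range are all invented for the example. It counts how many users assigned each tag to one resource and scales the display size linearly between a minimum and a maximum.

```python
from collections import Counter

def tag_cloud_sizes(user_tags, min_size=10, max_size=32):
    """Map each tag to a font size proportional to how often it was assigned.

    user_tags: dict mapping a (hypothetical) user name to the list of tags
    that user assigned to one resource.
    """
    # A user counts once per tag, even if the tag was entered twice.
    counts = Counter(tag for tags in user_tags.values() for tag in set(tags))
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for tag, n in counts.items():
        # Linear scaling; if every tag is equally frequent, use the minimum size.
        scale = (n - lowest) / span if span else 0.0
        sizes[tag] = round(min_size + scale * (max_size - min_size))
    return sizes

# Hypothetical data: three users tagging the same article.
example = {
    "ana": ["folksonomy", "tagging"],
    "ben": ["tagging", "web2.0"],
    "eva": ["tagging", "folksonomy", "indexing"],
}
cloud = tag_cloud_sizes(example)
```

In this invented sample, ‘tagging’ is assigned by all three users and so receives the largest size, while tags used by a single user receive the smallest. Real applications may use logarithmic rather than linear scaling; the linear rule here is simply the easiest to follow.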

Social indexing is mainly done using uniterms, which greatly facilitates the break-down of concepts into smaller units as well as the automatic processing of terms (compound terms can be problematic). It leads to open and dynamic information systems and it helps to create a foundation for inter-system relationships based on inter-related tags. Social indexing also results in a high information retrieval rate when standard terms from a specific scientific-technical environment are used for indexing. It is interesting to note how social-based indexing fosters a community spirit and collaborative networking environment. Lastly, this type of indexing contributes to the development of the semantic web, as it generates lists of multiple synonyms derived from social tags which are used to construct the vocabulary of the common user.

Although uniterms are an advantage insofar as they are straightforward and they enable automatic processing, they lead to inaccuracies as there is no single way to process entries comprised of more than one word. In addition, each system has its own rules for using multi-word labels.12 Another major weakness of social tagging comes from the traditional problems inherited from non-controlled vocabularies (Ros, 2008), which particularly affect consistency and are caused by the uncontrolled use of polysemous, synonymous and ambiguous terms. These inconsistencies, inherent to natural language use, have negative consequences for information location and retrieval. Uncontrolled as social tagging is, there is no oversight, no criteria and no guidance. It is common to find different languages used in the tags for the same resource (McCulloch and MacGregor, 2006). Concepts themselves are often mixed up in puzzling ways or isolated from their context in such a way that they make no sense, leading to little more than noise and low effectiveness as key words. To make matters worse, the wide variety of tools means that the tags on the same resource are scattered across different locations. Social indexing, in a way, is an information location system based on serendipity, a far cry from an attempt to construct a system founded on a balance between comprehensive and pertinent information description (Seoane Garcia, 2007).
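
Part of the inconsistency in multi-word labels (underscores in one system, hyphens in another, inverted commas in a third) can be reduced by normalising tags before they are compared. The following Python sketch is an assumption-laden illustration: the function name `normalise_tag` and its rules are invented here and do not reproduce the rules of any particular application. It also does nothing about synonymy, polysemy or tags in different languages, which remain open problems of uncontrolled vocabularies.

```python
import re

def normalise_tag(raw):
    """Reduce common multi-word tag spellings to one canonical form."""
    tag = raw.strip().strip("\"'‘’“”")      # drop surrounding inverted commas
    tag = re.sub(r"[_\-\s]+", " ", tag)      # underscores, hyphens, spaces -> one space
    return tag.strip().lower()

# Hypothetical variants of the same two-word label entered in different systems.
variants = ["glass_recycling", "glass-recycling", '"Glass Recycling"', "glass  recycling"]
canonical = {normalise_tag(v) for v in variants}
```

All four variants collapse to the single form `glass recycling`. The rule is deliberately crude: it would also turn a legitimately hyphenated word such as ‘e-book’ into ‘e book’, which shows why each system ends up defining its own conventions.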

The main criticism of social tagging is directly related to the Web 2.0 environment itself, namely the fact that any user can upload any information he or she wishes. The reliability of that information is suspect, since there are no control mechanisms in place to guarantee the quality of the information on the web.

Social indexing applications

There are a number of applications that use social indexing to describe document information contents in such a way that other users can benefit from their content descriptions. It is often the case that users of these applications are unaware that the way they describe document contents can be more or less helpful to other users when locating useful information.

These applications can be classified by the types of contents they are used for, although in some cases the same application can be used for a number of different content types. This is the case of Panoramio, for instance, an application which can be used to search for images and also contains geographical data about the places where the photographs were taken, or the case of a social reference manager which also enables users to manage a social network within a specific area.

Table 7.1 contains a listing of most of the recently available applications that contain social indexing features. Although we have tried to offer a complete picture of where social tagging has a notable presence, new applications are being developed on a regular basis.

Table 7.1

Social indexing applications

image

image

image

a This application can also be considered a geolocation application.

Source: Compiled by authors

Social indexing applications of special interest for reading

All the applications included in the previous section are very useful. Given the subject of this book, those designed to locate or discover new readings, whether in the academic or leisure field, are particularly relevant. The former features social reference managers and the latter reading networks which also incorporate characteristics of social catalogues. Both application types have their specific characteristics and should be explained.

Social reference managers

Social reference managers allow the on-line storage, management, and sharing of bibliographical citations and also facilitate the discovery of other bibliographical references related to themes of interest predefined by the user. They facilitate the work of administering the bibliographical information compiled from various databases (WoK, Medline, MLA, LISA, library catalogues, etc.) and also offer the possibility of creating, maintaining, organising, and formatting the references according to the various citation styles (ANSI, Harvard, MLA, ISO, Vancouver, etc.). This automates the repetitive task of administering bibliographical information. These tools constitute an evolution of the traditional reference managers, and also have the potential of social networks in the discovery and sharing of bibliographical information (Cordón-García et al., 2012b). In this way they compensate for the shortcomings of social bookmarking tools, which allow information to be compiled from web addresses but do not facilitate the importing of metadata from bibliographical references.

As is the case with all social tools, the sharing with others of the information and the description of this information enables the creation of a social network around certain themes, which encourages the dissemination and therefore the discovery of new information.

In general terms these tools operate in a simple way: when a user discovers a document on the Internet he or she selects it and incorporates it into his or her system. From this point the operation of the various applications is simple; it is a case of allocating the labels or tags considered necessary in order to describe the resource and facilitate its later use. In this manner the collection of references is created and administered with added value, and as it is an Internet application it can be accessed from any computer and the collection can be shared with other users. New applications are also beginning to be developed for devices running both Android and iOS. It is important to bear in mind that labelling or tagging originates as a way of organising personal work, which means that subjectivity will always exist; however, as it is made available to others, personal benefit becomes collective.

Some applications allow the selection of certain people and the follow-up of what they incorporate and label in their profile; this means that what they have discovered can rapidly be known. It is also possible to group the various tags in scientific categories and within these in turn according to their frequency of use, to obtain the so-called popularity indexes, which can be useful to mark trends in certain scientific circles.

Although many programmes can be included in this category, in the following pages only CiteUlike and Mendeley will be described.

CiteUlike

http://www.citeulike.org

This program was created at Manchester University and was specially designed for scientists and academics working in shared academic environments who need to know what their colleagues are reading and also to recommend readings. CiteUlike has gradually become one of the main websites of reference allowing the optimisation of the processes of the storage and administration of bibliographical references.

References can be incorporated by using three different procedures:

1. Search from the application itself.

2. Direct search through external sources.

3. Indirect search through internal sources and the importing of files in RIS (Research Information Systems) format from other databases.

From the application itself: the references are captured by means of a bookmarklet that can easily be installed in the browser (the Post to CiteULike button) and that compiles the bibliographical data appearing on a website. When the references are incorporated into the system the user can allocate tags describing their content, which are used to organise and retrieve the information and also to establish systems of recommendation that appear in the user profile.

From external sources: in this case the references are also captured by the bookmarklet, which obtains the bibliographical information that appears on the website. When they are incorporated the user has to allocate the corresponding tags to indicate the thematic areas. To install the bookmarklet, the user right-clicks on ‘Post to CiteUlike’ and adds it to the ‘Favourites bar’ of the browser.

Indirect search: this refers to a search made on databases. The results can be exported in various formats including RIS (Research Information Systems); it is precisely this format that allows importing to CiteUlike.
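
To make the RIS route more concrete, the sketch below reads a small RIS export into Python dictionaries. It relies only on the general shape of the format (a two-character field code, two spaces, a hyphen and the value, with `ER` closing each reference) and is a simplified illustration, not CiteUlike’s importer: the function name `parse_ris` and the sample record are invented, and treating the `KW` (keyword) field as a source of candidate tags is an assumption made for the example.

```python
import re
from collections import defaultdict

# A RIS field line: two-character code, two spaces, hyphen, optional space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

def parse_ris(text):
    """Parse RIS text into a list of records (dicts of field code -> list of values)."""
    records, current = [], defaultdict(list)
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # ignore blank lines and anything not shaped like a RIS field
        code, value = match.groups()
        if code == "ER":          # ER marks the end of one reference
            records.append(dict(current))
            current = defaultdict(list)
        else:
            current[code].append(value.strip())
    return records

# Invented sample export containing a single reference.
sample = """TY  - JOUR
AU  - Example, A.
TI  - An invented article title
KW  - social tagging
KW  - folksonomy
ER  - 
"""
records = parse_ris(sample)
candidate_tags = records[0].get("KW", [])
```

Here the two keywords of the imported reference become a starting list of tags, which the user would still be expected to review and complete by hand.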

Regardless of the mechanism that has been used to incorporate the records, the user will have to label them. The program shows what tags have been allocated by other users, but each user must label them again. This ensures that all the articles are labelled and that the folksonomies of the resources are gradually completed.

The program has two possible levels of information, i.e. public and private (MyCiteUlike). One of the advantages of this system is that it makes it possible to capture references from external sources and to export references from blogs or news. By means of ‘add to any’ it is also possible to disseminate the information again and to send it to the manager of references.

One of its special features is that it is possible to know through ‘Citegeist’ which articles have been shared most times and are therefore the most popular, and also with which other users, known as neighbours, most documents are shared. The popularity index measures the number of users who have read or compiled each bibliographical reference in their personal manager; this information can be useful in order to know which articles have been compiled by many researchers and are therefore likely to be important within a specific field.
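
Both measures described above reduce to simple counting over users’ libraries, as the following Python sketch suggests. It is a minimal illustration under assumptions, not a description of how Citegeist is implemented: the user names, article identifiers and the function names `popularity` and `neighbours` are all hypothetical.

```python
from collections import Counter

# Hypothetical libraries: user -> set of article identifiers saved by that user.
libraries = {
    "ana": {"a1", "a2", "a3"},
    "ben": {"a2", "a3"},
    "eva": {"a3", "a4"},
}

def popularity(libraries):
    """Count, for each article, how many users hold it in their library."""
    return Counter(article for articles in libraries.values() for article in articles)

def neighbours(libraries, user):
    """Rank the other users by the number of articles they share with `user`."""
    own = libraries[user]
    shared = {other: len(own & articles)
              for other, articles in libraries.items() if other != user}
    # Most shared articles first; ties are broken alphabetically.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))

ranking = popularity(libraries).most_common()
closest = neighbours(libraries, "ana")
```

In this invented data set, article `a3` is held by all three users and therefore heads the popularity ranking, while `ben` is the closest neighbour of `ana` because they share two articles.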

It also makes it possible to know which tags are similar to those we have allocated (Recommendations). Finally, the tags are shown in the form of a list or cloud, which is another way of locating information of interest to us. It is important to emphasise that all these systems of scientific discovery, together with that of recommendations, are effected in a personalised manner depending on the tags that are introduced by each user. In this way a user including medicine tags will receive recommendations from this field. Likewise, each user can further adjust his or her profile according to affinity and co-occurrence.
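
A tag-driven recommendation of this kind can be approximated with tag co-occurrence and tag overlap. The sketch below is one plausible reading, offered under assumptions: CiteUlike’s actual recommendation logic is not described in this chapter, and the data and the function names `related_tags` and `recommend` are invented for illustration.

```python
from collections import Counter

# Hypothetical records: each article with the tags users have assigned to it.
article_tags = {
    "a1": {"medicine", "cardiology"},
    "a2": {"medicine", "oncology"},
    "a3": {"medicine", "cardiology", "statistics"},
    "a4": {"linguistics", "statistics"},
}

def related_tags(article_tags, tag):
    """Count how often other tags appear on the same article as `tag`."""
    counts = Counter()
    for tags in article_tags.values():
        if tag in tags:
            counts.update(tags - {tag})
    return counts

def recommend(article_tags, user_tags, already_saved=()):
    """Rank unsaved articles by the number of tags shared with the user's tags."""
    wanted = set(user_tags)
    scores = {article: len(tags & wanted)
              for article, tags in article_tags.items()
              if article not in already_saved}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, score in ranked if score > 0]

similar = related_tags(article_tags, "medicine")
suggested = recommend(article_tags, {"medicine", "cardiology"}, already_saved={"a1"})
```

A user whose tags are ‘medicine’ and ‘cardiology’ is offered `a3` first and then `a2`, and never the linguistics article, which mirrors the behaviour described in the text: recommendations stay within the fields a user’s own tags point to.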

CiteUlike allows the creation of user groups around a theme and the sharing of references among its members; information is compiled in a single place. It is also possible to create a blog of the group, to classify users by research areas, and to locate other researchers with the same interests by means of the Research Field option. Likewise, thanks to the Watchlist (or follow-up list) option users can find out what references are included by other users with similar profiles to their own and therefore of interest to them, which will help them to be permanently up-to-date.

These recommendation and alert mechanisms appear in each user’s profile in modules that the user can add, eliminate, or move according to his or her preferences or needs. Likewise, this application provides a complex network of internal relations between users, articles, and tags with the aim of encouraging the discovery of scientific information as can be seen in Figure 7.6.

image

Figure 7.6 Relations between the elements of CiteUlike Source: Compiled by authors

It also includes a Really Simple Syndication (RSS) channel that allows the dissemination of information, together with an Application Programming Interface (API) for Netvibes with the same purpose.

Finally, it is worth mentioning the possibility of synchronising CiteUlike accounts with the social bookmarking service Del.icio.us and therefore of transferring references between both systems automatically.

Mendeley

http://www.mendeley.com

Mendeley is a proprietary program but its use is free. It was created in 2007, although its first version was not published until a year later. It allows users to manage and share a digital library built from the research documents stored on a computer, as it combines its desktop version (Mendeley Desktop) with the web version (Mendeley web) and incorporates a PDF administration application, management of references, and an on-line social network for researchers. It is available for Windows, Mac, and Linux, and an app also exists that can be installed on both iOS and Android devices to facilitate the portability of the information; although Mendeley can be accessed through the web browser, the app makes work easier as it brings together all the information on the tablet itself.13 It is compatible with various browsers and platforms and combines the functions of a traditional reference manager with those of social reference managers. One of its most important features is a tool that provides statistical information on the documents, authors, and subject matter that are most frequently used in a specific field, together with the shared references.

Mendeley is compatible with over 50 academic databases, including Google Scholar. It also offers the possibility of importing and exporting files in various formats such as BibTeX, RIS, XML, or EndNote, and imports documents and references from other applications such as Zotero and CiteUlike; it can be linked with the Microsoft Word and OpenOffice word processors, so that a bibliography can be generated in these programmes in a simple manner.

Like other programmes it allows both private and public profiles and also offers the possibility of creating group profiles rather than profiles linked to a single person. Each group constitutes a social network of people interested in a subject, where they can not only share bibliographical information but also add information on the group itself.

Finally, it provides statistical information on the most frequently downloaded authors or the most frequently consulted tags in the various scientific fields.

Social reading

Social reading is reading carried out in virtual environments where the book and the act of reading favour the creation of a community and the exchange of information, establishing horizontal relationships between the various members, and where works are assessed and labelled. The social label is a double vector that points towards the annotations and comments that the users make and read and also towards the labels they attach to the books, which is precisely what we intend to highlight in this chapter.

Social indexing associated with reading is carried out by means of networks specialising in reading and applications allowing social cataloguing. Through them all bibliographical information on books can be obtained (author, title, publisher, year, etc.) together with descriptive tags, ratings, and comments added to the bibliographical records by their users. According to Margaix Arnal (2007), this type of application should allow users to select documents as favourites, to organise them into files, and to share them with other users. Other additional options are also contemplated, such as the inclusion of the information on social network tools, offering subscription to Really Simple Syndication (RSS) channels, personalising and arranging search results according to social information, browsing by labels, etc.

These applications benefit two sectors, one of which is institutional (libraries) and the other personal (any user). In the case of libraries, this new way of administering the bibliographical collection is of great value because it allows them to enrich their catalogues with contributions by users, who label the information differently from a professional and so provide a complementary vision that is often more immediate and useful to other users. These platforms generate a sense of community as well as affording great flexibility for the creation of various knowledge taxonomy types based on social indexing models. In the personal sector, users can create their own virtual library from the records contained in the different databases, manage their physical book collections, or build a social network based on reading. Such a network allows not only the incorporation of bibliographical information on books but also the description of their contents following the principles of social indexing, thus making it easier to locate information, to obtain recommendations from other users, or simply to find books more in keeping with one's tastes.

Finally, the applications of these platforms for the world of education should be stressed, such as the following:

Creating the institution’s virtual library.

Creating a group to include all members of a reading club.

Creating a network to keep books and share literary tastes, reviews, etc.

Establishing popularity indexes of a work.

Creating lists of recommended books.

Classifying works by genres, periods, characters.

There are a number of ways in which social indexing applications affect reading; some of them include specific developments for mobile devices and are linked to different social networks. As it would be impossible to mention them all, just a few have been chosen.14

LibraryThing

http://www.librarything.com

This platform shares many of its characteristics with social networks as it allows the creation of a library that includes not only a description of each book but also additional information such as tags, assessments, reviews, etc., provided by any user to give added value to any catalogue that only includes ‘objective’ information on the book. LibraryThing content has been used in several tag analysis experiments and social tagging exploitation systems (Kakali and Papatheodorou, 2010).

Once the user has created his or her profile, it suffices to key in a book’s ISBN to capture its data. All users can create their collections by adding the books they wish, arranging them as they wish, and adding the information they consider to be appropriate. The information that can be added includes tags, which give additional value compared with a traditional catalogue, in which the user simply receives information but cannot add his or her own impressions to it.

This application offers the possibility of creating open or closed groups on any subject. It also allows the carrying out of searches by means of the clouds formed by the tags included in the descriptions of the groups, which gives a vision of the importance of each group as a part of the platform.

LibraryThing tags allow the categorising of books in accordance with those opinions solicited from the users of this network and not as they would be labelled by a professional, where an opinion would be out of place.

The tags included may be individual words such as ‘fantasy’, ‘friendship’, ‘history’, or phrases such as ‘historical novel’ and ‘graphic novel’; as these examples show they may refer to a genre or to contents. Tags therefore constitute a simple and practical way of finding new books in keeping with user tastes and preferences.

Shelfari

http://www.shelfari.com/

In the same sense as the previous applications, Shelfari is also a social network that allows the creation of virtual libraries with the titles possessed or that have been read and the assessing and labelling of and commenting on books and the discovering of people who share our literary tastes. Thanks to the application of social tagging new books can be described.

Once registration has been carried out (‘join now’) an account can be created (‘create account’). Users must be identified in order to access the information. From this point on it can be determined which books have been most often read, commented on, or reviewed by users.

By means of the tags that identify the contents, the various books can be located and added to our bookshelf or the information can be sent to a friend. The difference between this platform and the previous ones lies in the fact that it is associated with the Amazon store, so there is a direct link through which the desired number of copies can be acquired.

As for visualisation, it allows the arrangement of your books on a virtual 3D shelf; there is even a choice of various models or colours. On it can be classified the books that have been read, those that are being read, and those the user wants to read.

As in other platforms of this type, the library created by each user may be public or private. The advantage of making your library public is that in this way other users can discover like-minded readers and obtain recommendations of interest to them from these readers.

The social indexing of this application allows Shelfari to provide suggestions from other users with similar tastes based on the books you have on your shelf. Likewise it has filters such as ‘Top Books’, ‘Most Opinions’, ‘Top Shelves’, ‘Most Tags’, and ‘Top Tags’.

Users can review and comment on the works found and, as in any other social network, can join groups depending on their interest in certain genres or authors.

Conclusion

Social tagging is a natural evolution of the traditional content description process. With the help of online, collaborative tools, text users participate in the process of labelling the information a text contains while also having access to the labels collaboratively assigned by other users at the same time. As we have shown in this chapter, this way of labelling content makes texts and the information they contain much easier to locate within bibliographical reference management applications and social reading platforms.


1.The invisible Internet hosts information only accessible through pages generated dynamically upon conducting a search within a database; ordinary search engine and directory searches cannot access the information hosted on the invisible Internet. The dark Internet contains information hosted on servers that for security reasons are inaccessible by searches from outside computers; improperly configured routers can also make information contents housed on computers inaccessible.

2.This term refers to the linguistic notion of substituting a term with a textual referent in part of the text after which it initially appeared.

3.Algorithms which can calculate the representativeness of terms in a document have been developed. As might be expected, they go beyond simple word counts.

4.The term consistency used here refers to the fact that the same concept should invariably be referred to using the same terms.

5.For those seeking a more detailed account of the historical development of indexing, we recommend the following reference: Gil Leiva, I. (2008) Manual de indización: teoría y práctica, Gijón: Trea, pp. 110–114.

6.Thesauri are useful tools for standardising terminology which tend to be topic-based and highly specialised rather than general.

7.According to ‘Extracting Value from Chaos’ on IDC Digital Universe, the volume of digital information in the world doubles every two years. In the year 2011, this volume was estimated to be 1.8 zettabytes of created and replicated information.

8.http://www.tagcloud-generator.com

9.http://www.tagxedo.com

10.http://www.tocloud.com/

11.http://www.wordle.net

12.In order to use multi-word descriptors in some systems, these must be inserted using underscores between the words, while in others a hyphen must be used, and in others each term must be inserted between inverted commas.

13.http://blog.mendeley.com/tipstricks/android-on-mendeley-an-app-guide/

14.Some of these platforms, such as Anobii and Whichbook, have already been discussed in Chapter 5, which tackles the topic of social reading platforms. Chapter 2 discusses in part the potential of Amazon.com for personalising system contents. For further details on these two specific topics, see the chapters indicated.
