John A. Bateman

16 The integration of multimodal resources in documents: Issues, approaches and methods

Abstract: Nowadays, documents that do not combine text with rich varieties of images, ranging from pictures to graphs to infographics, all arranged within visual layouts that themselves contribute considerable flexibility in meaning, have become rare. Answering the question of how such diverse carriers of information manage to combine so as to give rise to coherent messages represents a considerable scientific challenge with substantial practical consequences. Methods and theories addressing the issue of integration constitute the main focus of the field of multimodality. Within multimodality, approaches for explaining how integration occurs have been proposed from very different starting points, including linguistics, information design, psychology, discourse theories, rhetoric as well as social action and interaction. In this article, important representatives of these approaches are summarised and contrasted, and potential applications outlined.

Keywords: multimodality, text-image relations, semiotics, discourse coherence, document design, empirical methods

1 Introduction

The use of written documents across all social niches and genres continues to grow rapidly within the vast majority of modern-day cultures. Moreover, following a period of marginalisation which, according to Kress and van Leeuwen (2001: 1), reached its most extreme in the late 19th and early 20th centuries, the combination of verbal language and other modes of expression, including all kinds of graphical, pictorial, typographical and layout material, is now again sharply on the rise. The previously established restrictions of such combinations to particular genres, themselves sometimes marginalised as in comics or newspapers exhibiting more extravagant layout, are loosening considerably. There is also an accompanying rise in the cultural acceptance of mixed-modality documents as forms of artistic expression, as evidenced in the increasing consideration and awareness of graphic novels, sequential art and comics in both academic and public discourse. Even document types which have traditionally permitted the use of ‘illustration’, such as school textbooks, are making far greater use than previously of the technological advances available for presenting material of different kinds.

This expansion is driven both by technological possibilities and consumer demand. In many areas, presentational styles reminiscent of artefacts produced even as little as 20–30 years ago now appear old-fashioned and are unlikely to survive. But the issues here are significant far beyond questions of changing fashions and styles. It is generally assumed that the more varied forms of presentation pursued nowadays not only look different, or more attractive, but can (or should) also support a broader range of communicative purposes more effectively than their less sophisticated forerunners. The extended technological capabilities now available for combining information from varied modes undoubtedly provide radically more freedom in design when constructing communicative artefacts. This freedom does not of itself, however, guarantee effective communication. Indeed, as is generally the case, increased freedom also opens the door to more possibilities for mistakes and for ineffective deployment of the resources available.

This makes it rather urgent that approaches to communication provide useful statements and analyses of multimodal presentation styles: such analyses need to be able to reveal deficiencies and provide for a deeper, more predictive theoretical understanding of just how multimodal documents operate. For the study of communication involving such artefacts, therefore, questions are increasingly raised concerning the conditions that must be fulfilled for combinations of presentational forms, or ‘modes’, to work together, i.e. to ‘integrate’ in a manner that supports intended communicative goals rather than distracting or confusing the ‘multimodal reader’. The purpose of this chapter is accordingly to present an overview of this growing area, foregrounding accounts which draw on a linguistic foundation in order to explain how the integration of multimodal resources within documents may function.

2 Social issues

The main factors driving the pursuance of investigations of multimodal resource integration at this time lie in the widespread and very diverse communicative purposes for which multimodal artefacts are being mobilised in society. As suggested above, there are now few domains where some kind of multimodal integration is not an issue. Questions of the use and consequences of varied presentational forms appearing in distinct media are thus on several research agendas. One point to be emphasised as we proceed is that appropriate responses to this task will often have to go beyond some of the disciplinary boundaries that have formed over the past thirty or forty years in the ‘area’. This applies both to the kind of data considered, which in the case of linguistics has tended to marginalise non-verbal aspects, and to the range of methods that are applied, challenging traditional distinctions between approaches that study more the ‘context’ of communication (mass media, communication studies, etc.) and approaches that study the internal organisation of those artefacts being exchanged in communication (linguistics, psycholinguistics, semiotics, etc.).

Approaches that have arisen out of practical situations of communication have long been more flexible in this respect and have been less hampered by pre-decided disciplinary restrictions on subject matter or methodologies. Thus, for example, the investigations of Orlikowski, Yates and colleagues concerning business communication have, from the outset, accepted a broad range of modal contributions as germane to their work, identifying observable features both of the communicative situation including “text formatting devices, such as lists and headings, and devices for structuring interactions at meetings, such as agenda and chairpersons” and of the “communication medium (e.g., pen and paper, telephone, or face to face)” itself (Orlikowski and Yates 1994: 544). Both business and scientific communications now accept, and often even expect, multimodal artefacts as the medium of exchange. Many traditional business genres, such as annual reports, are nowadays considered to be essentially multimodal and achieve appeal to, and persuade, their readers via photographs, graphs, tables and so on. This trend probably reaches its most prominent exponents (although still contentious: cf. Tufte 2006) in the widespread use of presentational forms such as Microsoft’s PowerPoint or Apple’s Keynote.

Other areas affected by progressively multimodal development include bureaucratic communication such as forms, both traditional print-based and online newspapers, and many kinds of citizen-related information offerings, such as health advice (e.g., van Weert et al. 2011). In the first area, awareness concerning the need to present information in an intelligible manner has risen considerably in recent years, and this has in turn led to campaigns and legislation to ensure improved design of previously often impenetrable (non-)communicative artefacts. In this regard, improved deployment of typography and layout has already had a dramatic effect in several countries (cf. Jansen and Steehouder 1992; Delin et al. 2006), as has research based primarily on social science methodologies of usability and testing of recall of information (cf. Houts et al. 2006). An increasingly strong overlap with the areas of human-computer interface design and interface usability must also be recognised as interactions between users and documents now commonly occur online (cf. Shneiderman and Plaisant 2009). As a consequence, research on online document design and usability must be open to sources of input beyond those of ‘traditional’ document design, which have largely lacked well-articulated notions of ‘interaction’ between user and information offering.

A further institutional context in which the combination of modes of information presentation has been studied is ‘education’ – here seen most broadly, from infants to adult education. For example, there is considerable interest in multimodal combinations of texts and images intended even for very young children, such as that exhibited in ‘picturebooks’, where there is already substantial theoretical discussion to draw upon (cf. Nikolajeva and Scott 2001). Here, as with most approaches to multimodal documents, there is a strong focus on just what kinds of meanings are made possible by the visual contribution – especially in cases where the verbal component may even be minimal or non-existent. There are also many studies concerning the design and use of textbooks – i.e. documents with the explicit function of socialising individuals into the social practices of their disciplines. Here, too, there is strong evidence that multimodal combinations of materials can contribute positively to learning when that combination is done appropriately (cf., e.g., Unsworth 2001; Mayer 2009). This becomes particularly important when the information expressed in the different modalities goes beyond ‘simply’ re-stating or illustrating meaning patterns already established verbally. For example, certain kinds of information, such as mathematical formulations or presentations of regularities in data, are not well captured in textual form, and so the development of multimodal combinations over time has served an essential enabling function for scientific discourse (O’Halloran 2005). This also cross-cuts the considerable body of work from the perspective of visualisation in general (Bertin 1983; Tufte 1997), which itself now overlaps with information design (cf. Waller 1996). In all these areas, there is an awareness that multimodal combinations allow other meanings to be made than is the case within individual modes of presentation, such as verbal language. Explanatory accounts of just how this ‘meaning multiplication’, in the phrase of Lemke (1998), operates are still, however, fragmented.

The widespread deployment of multimodal presentational forms is closely followed as a social phenomenon in its own right in the media and communication sciences. Here the concern is to track how various styles of information presentation are received by their consumers. Styles studied span traditional newspapers, news reporting on television and newer forms of audiovisual media such as YouTube. Differences in audience take-up related to the media adopted and the forms of information presentation found in those media have already been identified, although few proposals for explanations for the differences are available. Some studies indicate that whereas print newspapers tend to leave recipients with an awareness of public events and issues, online news readers are more likely to remember business and other news topics (cf. Schönbach et al. 2005). Recipients thus interact differently with different media offerings concerning just what information they extract and retain. Such a presumably unintentional skewing of awareness of information should naturally be of broad social concern. Just what it is about information offerings in distinct media that directs attention, consumption and recall in this way remains an unanswered question.

Although there may well be external factors, such as the circumstances in which the news is consumed, the very different styles of multimodal presentations involved, including how users interact with these presentations, cannot be ruled out as playing a significant role. This concern has been addressed from several perspectives. The suggestion that distinct kinds of presentational forms have different capabilities and tendencies in their shaping of meaning can, for example, be found in both media studies, as in the consideration of the ‘molding force’ of media specificities (Hepp 2012), and in social semiotic approaches to multimodality, in which distinct modes each bring their own affordances into play (Kress 2010). Psychological studies have also revealed significant aspects of document understanding related to multimodal design and integration. One potentially negative consequence of manipulating design elements, for example, is that an appropriately sophisticated style of presentation can serve to direct attention away from problematic content issues. In early work, Glenberg et al. (1982) presented results demonstrating that students’ confidence in their own self-assessment of whether they had understood a text containing contradictions could be manipulated by the location of the contradictions in the text (early vs. late) and by the syntactic realisation of their information status (given vs. new). Schriver (1997: 227–231) builds on this in the context of document design and has started documenting further conditions under which an “illusion of knowing”, i.e. thinking that one has understood when in fact one has not, can be created. Appropriate attention to layout, phrasing and the relations between diagrams, text and so on can all give rise to misjudgements not only about what is being communicated but also about whether one has understood or not. Typography, layout and ‘visual style’ are therefore all important for supporting information integration involving differing modalities – yet simply because an artefact appears to conform to current good practice, it may be evaluated positively regardless of whether it actually delivers on that promise. Readers are consequently under the impression that such visual style is helping them integrate information when it may in fact be doing quite the opposite.

Related to these concerns, there are also studies showing that, when presented with poorly designed instructions for consumer electronics devices, almost all subjects, regardless of age and gender, tended to blame themselves for not being sufficiently intelligent to deal with modern technology rather than, for example, blaming the companies which should in practice have provided more usable documents (Schriver 1997: 211–223). The ability of particular styles of presentation to shape information presentation and its communication in detrimental ways has consequently been taken up with respect to several media. A particularly prominent example is offered by the condemnation of the style of information delivery encouraged by presentation software such as PowerPoint and Keynote in Tufte (2006), mentioned above, although Bucher and Niemann (2012) now offer a more balanced and empirically well-founded study.

The explosive growth of possibilities for presenting multimodal artefacts across the board has brought with it significant developments for both producers and consumers. For the former, there has been a rapidly increasing demand for training programmes that would place designers in a better position to deal with the multitude of possibilities technically available. This re-orientation has had a lasting effect and all such programmes nowadays consider, at least from a practical perspective, how information presentation can be designed in an integrated manner for effective communication (cf. Schriver 1997, the formation of the Information Design Association (Waller 1996), and journals such as Document Design and Information Design Journal, which have since merged). For the potential consumers of the results of such multimodal design, the issues and opportunities mentioned above combine to raise the challenge of multimodal literacy – that is, just because information has been presented in various modes for some potential group of readers, it cannot be assumed that those readers will automatically know how to interpret the artefacts they are presented with. As the multimodal resources and their co-deployment become ever more complex, it is unlikely that readers will always make correct interpretative decisions. Indeed, it is straightforward to produce situations of cognitive overload which work in precisely the opposite direction to that intended. Attention can be placed under considerable stress by multimodal information presentation (e.g., Grimes 1991) and so appropriate organisation is particularly important. There are also empirical results suggesting that certain kinds of text-image relationships are naturally more challenging than others and this is reflected in the scores obtained by readers of differing reading abilities (Chan 2011). As a response to this problem, multimodal literacy is now well established as a research and practical development area in education (cf. Jewitt and Kress 2003; Anstey and Bull 2006). But there remains a formidable range of fundamental problems to address – due in no small part to a lack of well-articulated theoretical accounts for describing the phenomenon of multimodality in the first place.

3 Cognitive issues

From a cognitive perspective, there are also many aspects concerning multimodal integration to consider. First, and most fundamentally, there is the general question of how information in different modalities is processed by the human brain in order to provide integrated meanings cohering to form a single ‘message’. This applies across the board to perception and so is by no means specific to multimodal integration within documents, where the task is probably somewhat simpler since the sources of the information to be combined are tightly constrained (by their presence within a document). The general discussion of whether internal representations may be more ‘propositional’ or ‘imagistic’ is an old one (cf. Pylyshyn 1973; Block 1981) and current models commonly adopt the ‘dual coding approach’ of Paivio (1986), in which different modalities with differing representational and processing properties co-exist.

Approaches concerning document multimodality naturally focus more on information that is carried visually, including both verbal (written) language and pictorial/diagrammatic components. However, this division on the basis of sensory channel is often not the most revealing as it tends to render certain semiotic modes more similar than they are and others less similar than they are. The distinctions and similarities are best captured at a more abstract level of description where the notion of ‘semiotic mode’ can itself be more rigorously defined (cf. Bateman 2011). There is a considerable degree of cross-discipline and cross-method integration to pursue here and exciting research challenges are easy to find.

One tradition relevant for exploring mode integration has grown out of legibility studies. Here there has been a natural extension beyond issues of the readability of text to those of readability of combinations of text, graphics and typography/layout. Approaches in this area commonly employ psychological methods of investigation and address how visual properties of the artefacts investigated can enhance or compromise reading performance. Research on legibility was given a considerable new lease of life by the need to address screen-based media since the properties of such media are quite different from those of traditional print-and-paper-based artefacts. In addition, whereas formerly legibility studies were broadly limited to considerations of ‘micro’-typography, i.e. the spacing between characters and lines, margins, etc., extensions to consider larger issues of typography and layout are now common (cf. Waller 1990). Earlier views of layout as a kind of ‘macro’-punctuation are thus giving way to considerations of relations between layout and document and argument structure. This then overlaps with an originally quite distinct tradition of investigation into processing details: the psychological investigation of discourse coherence. In this field readers are asked to read texts with certain implicit and explicit information in order to explore their mental construction of logical connections between information elements in the texts (cf. Sanders et al. 1992; Sanders and Spooren 2009). This is now similarly being extended to consider investigations of readers’ behaviour when confronted with multimodal texts, a direction of study itself going back to investigations of the relations between visual and verbal information for the purpose of revealing more about human information processing. Both Bransford et al. (1972) and Glenberg and Langston (1992), for example, present early demonstrations that information provided in text and information provided visually combine during text comprehension and a broad range of research is now exploring various aspects of this phenomenon further.

Another direction of research relevant here explores the role of the visual perception system in the cognitive processes of information processing and information integration in visually-based representations. Here, models derived for visual perception in general are naturally applied to questions of multimodal integration. Increasingly popular as a tool of visual perception research is the use of eye-tracking data and eye-tracking experiments – a trend now considerably strengthened by the radical reduction in cost of the necessary equipment. Eye movements have long been known to consist of points of fixation joined by very rapid, ballistic movements, called ‘saccades’, during which there is no visual perception. Since the portion of the retina that is able to deliver detailed information concerning the visual field is actually very small, the eyes have to make many rapid movements in order to bring different areas of the visual field into focus (Duchowski 2003). The places where the eye fixates are not arbitrary and are determined by a range of features. Many of these features are low-level visual properties of the visual field, such as corners, edges, movement, etc.; however, others are more directly described in terms of higher-level considerations and reveal a clear task-and-goal-based influence (Yarbus 1967). Current models assume that the points of fixation are indicative of the deployment of attention during visual processing. Eye movements thus select the particular informational elements in the visual field most likely to satisfy or refute hypotheses relevant for the informational needs of the viewer at that point; a particularly detailed model of this kind and further references are given in Schill et al. (2001).
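
To make the fixation/saccade distinction concrete, fixations are typically recovered from raw gaze samples with a dispersion-based filter: as long as successive samples stay within a small spatial window they are grouped into one fixation, and once the window is exceeded a saccade is assumed. The following is a minimal, simplified sketch of such a filter; the function name, sample format and threshold values are illustrative assumptions rather than parameters from the studies discussed here.

```python
def detect_fixations(samples, max_dispersion=25.0, min_duration=0.1):
    """Group raw gaze samples into fixations with a dispersion threshold.

    samples: list of (timestamp_s, x_px, y_px) tuples, assumed already
    cleaned of blinks; max_dispersion is in pixels, min_duration in
    seconds. Returns (start_s, end_s, centroid_x, centroid_y) tuples;
    the gaps between them correspond to saccades.
    """
    fixations, window = [], []

    def close(win):
        # Record the window as a fixation if it lasted long enough.
        if win and win[-1][0] - win[0][0] >= min_duration:
            fixations.append((win[0][0], win[-1][0],
                              sum(s[1] for s in win) / len(win),
                              sum(s[2] for s in win) / len(win)))

    for sample in samples:
        window.append(sample)
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
            # Dispersion exceeded: the eye has moved on; close the window.
            close(window[:-1])
            window = [sample]
    close(window)
    return fixations
```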

This intimate relationship assumed between points of fixation and attributions of attention made it natural to employ eye-tracking for investigating comprehension processes while reading text. Now this technique is also finding application in studies of multimodal document perception for both traditional print and screen-based media (Holsanova 2014). Points of fixation are taken to provide direct evidence of the elements which are being selected for integration by a reader/viewer during processing. Tracking these fixations therefore provides valuable material for the investigation of both which elements are combined and the influence of design decisions such as layout, typography and visual style on that selection (cf. Holsanova and Nord 2010; Liu et al. 2011; Bucher and Schumacher 2011). Previous suggestions of where readers might be looking and intuitive proposals for ‘reading paths’ thus become directly accessible to empirical investigation in concrete individual reading situations.
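
In such studies, fixations are commonly attributed to document elements by defining ‘areas of interest’ over the layout and testing which region each fixation falls into. A minimal sketch of this step, reusing the fixation format of the sketch above together with an entirely hypothetical page layout:

```python
def dwell_by_region(fixations, regions):
    """Attribute fixations to named layout regions ('areas of interest').

    fixations: (start_s, end_s, x, y) tuples as produced above;
    regions: name -> bounding box (x0, y0, x1, y1) in page coordinates,
    assumed non-overlapping. Returns total dwell time per region as a
    crude first measure of which elements a reader selected.
    """
    dwell = {name: 0.0 for name in regions}
    for start, end, x, y in fixations:
        for name, (x0, y0, x1, y1) in regions.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                dwell[name] += end - start
                break
    return dwell

# Purely hypothetical layout of a simple news item:
news_item_regions = {
    "headline": (50, 40, 750, 90),
    "photo": (50, 100, 400, 380),
    "caption": (50, 385, 400, 420),
    "body_text": (420, 100, 750, 600),
}
```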

The fact that links are in general readily drawn between information presented verbally and visually raises again the issue of just what must be ‘shared’ for such linking to be possible. Since the basic properties of the materialities involved can be very different, one direction here has also been to consider the information provided by different modalities at higher levels of abstraction that may be able to span the gap. Stenning and Oberlander (1995), for example, attempt to characterise graphical representations in terms of the interaction they enter into for semantic interpretation and suggest that the spatial information inherent in graphics provides a concrete linking structure for possibly abstract and underspecified categories supportive of more effective reasoning. This relates to several discussions of the value of ‘externalised’ representations for ‘outsourcing’ cognitive effort. The essential idea here is that rather than maintaining a complex mental model or map of entities and their relationships, representation in an external spatially-extended form can take over much of the cognitive load. This is also the idea behind the proposal of Larkin and Simon (1987) that diagrams can be highly effective by virtue of their ability to index information spatially. ‘Diagrammatic reasoning’ of this kind now forms a very active field of research in its own right with regular conferences. A further new avenue of research might then be to draw more explicit connections between research into layout and its effects on comprehension as a particular case of diagrammatic external representation. What certainly appears to be the case, however, is that the different properties of different media will also need to be considered, regardless of what kind of models or representations are explored.

4 Methodology

As the discussion up to this point has made clear, multimodal integration raises a substantial body of issues. Depending on starting point and goal, a variety of distinct methodologies may need to be combined in order to achieve results. Given the orientation of this volume as a whole to language sciences and verbal communication, the focus here will be on multimodal integration studies that have their origin in, or strong connections with, verbal communication. Such approaches typically attempt detailed analyses of particular communicative artefacts exhibiting the phenomena at issue, in the present case ‘documents’. The corresponding methodologies are therefore oriented strongly to the ‘objects-of-analysis’ rather than to discussions of social or media contexts of use. These orientations need to be combined but, at the present time, there are still disciplinary boundaries in place that tend to work against this. These boundaries need to be vigorously attacked in order to open up space for progress. While focusing on properties of the objects of analysis, it is essential nevertheless not to lose connection with research that explores the effects of those objects of analysis on specific, individual recipients. This has been, and continues to be, one important role of the cognitive approaches described above. Useful results have also emerged from research on design where empirical work orients to usability studies and user-oriented design. Significant results worth mentioning here include the following three design principles proposed in work on multimodal document design: spatial contiguity, which places related information in spatial proximity; signalling, by which explicit cues for interpretation are designed into the artefact; and ‘dual scripting’, which suggests that complex messages will be processed more effectively when both the visual layout and the semantic content being communicated are constructed so as to work in unison to guide the attention of the reader – that is, the text should say what is being argued (metadiscourse) and the visual layout should be employed to segment the argument appropriately as well. Holsanova and colleagues introduce and evaluate (generally quite positively) these principles on the basis of eye-tracking studies (Holsanova et al. 2008; Holsanova and Nord 2010).
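
Of these principles, spatial contiguity lends itself most directly to operationalisation: given an analysis that says which elements belong together, one can check whether the layout actually places them near one another. The sketch below illustrates the idea; the bounding-box format and the distance threshold are illustrative assumptions, not measures proposed by Holsanova and colleagues.

```python
import math

def contiguity_violations(related_pairs, boxes, max_distance=200.0):
    """Flag related element pairs that are not spatially contiguous.

    related_pairs: pairs of element ids that an analysis says belong
    together (e.g. a graphic and the paragraph explaining it);
    boxes: element id -> bounding box (x0, y0, x1, y1);
    max_distance: centre-to-centre distance in page units beyond which
    the pair is treated as non-contiguous (illustrative threshold).
    """
    def centre(box):
        x0, y0, x1, y1 = box
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

    violations = []
    for a, b in related_pairs:
        (ax, ay), (bx, by) = centre(boxes[a]), centre(boxes[b])
        if math.hypot(ax - bx, ay - by) > max_distance:
            violations.append((a, b))
    return violations
```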

More generally, however, work within communication studies often suffers from relatively weak notions of the communicative artefacts being studied. This continues to limit the insights achieved concerning the mechanisms and principles of multimodal integration. In such approaches, the detailed internal organisation of the artefacts studied often remains only loosely described and the main emphasis falls on recipient response. This defocusing of the fine-grained details is related to the somewhat dated notions of communication typically employed (cf., e.g., Severin and Tankard 2009). These are based on traditional meaning-exchange models reminiscent of information theory (i.e. a speaker sends a message to the hearer across a medium, and the task involved is one of encoding and decoding that message according to a shared code). The substantial developments in our understanding of communication within linguistics and discourse studies concerning mechanisms of dynamic meaning construction are still often unacknowledged. Particularly important here are results that reveal the necessary interaction that takes place between receiver and message in constructing meanings (cf., e.g., Sperber and Wilson 1995 [1986]; Martin 1992; Asher and Lascarides 2003). Understanding this interaction demands fine-grained accounts of just what the recipient is interacting with, i.e. the objects of analysis themselves, in order to derive interpretations.

To approach the complex task of unravelling multimodal integration in documents at a more fundamental level thus requires rather more theoretical apparatus and stronger empirical methodologies to be effective. Here it is important to consider two inseparable facets of a broader methodological issue for exploring the nature of multimodal integration: (i) more finely articulated theoretical underpinnings are needed concerning just what multimodal artefacts are, and (ii) empirical investigations must be conducted with respect to those underpinnings. An empirical investigation is, after all, only as good as the precision and discrimination offered by its underlying research questions. Much here is offered by work drawing on verbal communication for inspiration, where models concerning the integration of multimodal meanings within documents extend the notions of communicative coherence developed in the studies of language and discourse listed above. Multimodal coherence is then seen as a logical extension over and above the phenomenon of coherence within texts. As we shall see below, there are now multimodal versions of most of the approaches that have been taken to purely textual notions of coherence.

That linguistic research should move in this direction is itself a natural development. In studies of verbal communication, it has long been recognised that there is almost always ‘other’ material accompanying language when and where it occurs. Crystal (1974), for example, describes how there are several visual ‘levels’ of organised information presentation (such as typography and layout) around language whose precise function is unclear. These non-linguistic sources of information were for a considerable time characterised imprecisely as instances of ‘paralanguage’ – i.e. information that modifies and augments the meanings made in language. However, this can also be seen to make an unwarranted ‘logocentric’ assumption about the meaning contributions of many visually-carried aspects of documents. As argued in Bateman (2008), not all visual features can be assumed to be modifying some ‘main’ linguistic message and, indeed, the relationship might in some cases even be reversed.

Early examples of what subsequently became ‘multimodal linguistics’ were pursued in the moves made towards text linguistics of the 1960s and 1970s since it was already clear that some genres of texts, perhaps most prominently advertisements, would demand accounts going beyond the boundaries of the linguistic system if critical aspects of how such texts function were not to be missed. This realisation was, however, generally restricted to the peripheries of linguistic concern. Combinations of visual and verbal material received attention in some branches of applied linguistics but could rarely be addressed analytically with any precision and analyses consequently remained exploratory and suggestive. After a brief flurry of treatments attempting to open up the area (e.g., Spillner 1982; Muckenhaupt 1986; Harms 1990), there was a lull in theoretical progress, due primarily to the still relatively undeveloped nature of text linguistics at that time.

Issues of mode combinations returned to prominence in the 1990s as the technological availability of multimodality as a design resource continued to explode. It was increasingly accepted that certain genres could only be sensibly treated as combined visual-verbal communicative artefacts – advertisements again offering the archetypal case (Cook 1992). There were then at this time several attempts to widen linguistic accounts so as to include aspects of the visual representations present. The linguistic models most commonly considered in order to account for the multimodal coherence of such genres were accounts that had been developed originally to describe relations between textual elements. These relations, such as discourse relations or cohesion, were intended to capture how information in texts could be combined to form coherent textual wholes. As a consequence, their multimodal variants were similarly seen as relations, in this case, however, as text-image or verbal-visual relations. Descriptions of this kind commonly built on the early discussions of Barthes ([1966] 1977), in which the possible inter-relationships between verbal and visual material were characterised in terms of where the main communicative import was taken to be and the degree of dependence that was exhibited between elements for their combined comprehension. Most approaches since show similar dimensions of organisation at work, although expressed in a variety of terms. Basic distinctions are presumed such as: (i) whether the verbal material is dominant and the visual material simply illustrates without additional input, (ii) whether the visual material is dominant and the verbal material simply describes, or (iii) whether the verbal and visual material are equally dominant, each providing a necessary component of the intended meaning. The verbal and visual materials may in addition be considered with respect to their degree of interdependence: for example, (iv) whether the verbal and visual material build upon each other or (v) whether they pursue largely independent paths.

Although useful as a starting point, however, such descriptions show a range of problems that together make it difficult to reliably recognise the proposed relationships. This is often compounded by inappropriate conflations of content-issues – i.e. what is being expressed – and form-related issues – i.e. how the material is being presented. For example, some approaches appear to assume that a photograph is always more specific than any text simply because of the fact that it is a photograph, and hence a re-presentation (indexically created) of a concrete state of affairs (i.e. whatever was in front of the camera when the photograph was taken). Such a position neglects the critical issue of the intended discourse function of the element shown – it is highly unlikely, after all, that a photograph of some man and some woman standing by a pair of doors leading to toilet facilities would be interpreted as asserting that just the concretely depicted individuals are allowed access. It is thus the discourse placement of the visual material that determines its intended interpretation not the physical medium alone – a position foreshadowed in discussions by, for example, Goodman (1969) concerning the necessary preconditions for assigning interpretations to visual materials in general.

There are as a consequence many proposals for characterising the relations between modal contributions that are intended both to provide revealing and predictive analyses and to support more reliable application. While the extent to which the individual approaches succeed or not in these aims is still a matter of debate and experimentation, the approaches themselves fall into several relatively well demarcated categories. These can be identified quite usefully according to where they draw their principal organisational motivation from as follows:

Multimodal relations can be modelled on grammar: approaches of this kind typically draw on categories developed within systemic-functional linguistics, because grammar is there seen as a rich organisational framework whose fine-grained classification systems are assumed to be of far broader applicability than grammar alone. In particular, grammatical classifications are construed as socially-motivated functional resources for organising meaning. This has made it natural to consider these as heuristic models for investigating other modalities and mode combinations as well. It is nonetheless important to realise that the resulting organisations are not then assumed to be ‘grammatical’, even though grammar is their source of inspiration. Particular areas of grammar that have been adapted in this way to text-image relations are clause combining, process-participant structures (e.g., Martinec and Salway 2005), and identifying relational configurations (Unsworth and Cléirigh 2009).

Multimodal relations can also be modelled on accounts of cohesion, the non-structural relationships assumed in systemic-functional linguistic approaches to be responsible for texture. Linguistic cohesion covers those textual relations where the interpretation of one element depends on another, as in all kinds of phoric relations (pronouns, etc.), ellipsis, conjunctions, reference and lexical collocations. Multimodal cohesion then draws on the non-structural nature of such relations in order to posit interpretative dependencies across information in textual and visual form (e.g., Royce 2007). Variations of this approach form one of the most common techniques for dealing with multimodal artefacts despite (or perhaps because of) some inherent weaknesses. It is, after all, generally straightforward to posit many relations between diverse elements in any coherent document – which of these relations are actually significant for the meanings being made is, however, a separate, rather complex issue which tends to be under-addressed in cohesion-based descriptions.

Multimodal relations can also be modelled on any of the various proposals made to account for larger-scale relations and text structuring mechanisms giving rise to dependencies between elements in discourse. Examples here include: (a) frameworks based on discourse semantics that adopt logico-semantic and discourse conjunctive relations, again typically as proposed in systemic-functional approaches to text and discourse and motivated by the notion that such organisations are indicative of more general semiotic principles than those of verbal discourse alone (e.g., van Leeuwen 2005; Liu and O’Halloran 2009); (b) frameworks that extend text structuring accounts from text linguistics, most prominently rhetorical structure theory (RST: Mann and Thompson 1988), to the multimodal case (cf. Bateman 2008: 143–176); and (c) frameworks drawing on models inherited from classical rhetoric and persuasion. The latter group is itself rather diverse, ranging from work that adopts rather particular components of rhetoric, such as ‘metaphor’ (e.g., Forceville 2009) or just the notion of effective communication as such (e.g., Marsh and White 2003), through to more inclusive attempts to apply a fuller set of traditional categories in a multimodal context (cf. Bonsiepe 1999; Kjeldsen 2012; Hoven 2012).
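
To give a flavour of what extending a text-structuring account such as RST to the multimodal case involves, the following sketch shows one possible representation in which rhetorical relations hold across modes; the data structures and labels are hypothetical illustrations, not the notation of Mann and Thompson (1988) or Bateman (2008).

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Segment:
    """A minimal analytic unit: a text span or a visual element."""
    ident: str
    mode: str     # e.g. 'text', 'image', 'diagram'
    content: str  # the text itself, or a gloss of the visual material

@dataclass
class Relation:
    """An RST-style relation; nucleus and satellites may mix modes."""
    name: str                                  # e.g. 'elaboration'
    nucleus: Union[Segment, "Relation"]
    satellites: List[Union[Segment, "Relation"]] = field(default_factory=list)

# A caption elaborating a photograph, forming a single analysed unit:
photo = Segment("img-1", "image", "photograph of the flooded High Street")
caption = Segment("cap-1", "text", "The High Street on Tuesday morning.")
figure = Relation("elaboration", nucleus=photo, satellites=[caption])
```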

Multimodal relations can be based on work on coherence relations that is more psycholinguistically or cognitively oriented. Such approaches assume that the cognitive mechanisms for attributing coherence across texts and images are similar to those operating within texts. Forceville (2014), for example, proposes applying Sperber and Wilson’s (1995 [1986]) linguistic ‘relevance theory’ to combinations of material in visual and verbal modes. In general, however, any of the above approaches could also be pursued from the perspective of cognition.

Multimodal relations can be based on speech acts, interaction and action: these approaches (see also this volume, chapters 6, 9 and 24) draw on the fact that philosophical proposals for communicative actions, such as those of Grice (1969), actually make few assumptions that would restrict their accounts to verbal acts and so may also be considered applicable to ‘multimodal’ actions (e.g., Sachs-Hombach 2001; Bucher 2011). These approaches have begun examining concrete cases of multimodal communication, exploring the situated use of different modalities in the service of specific communicative goals. There are also approaches that draw on social accounts of communication in general (e.g., Kress 2010).

It should therefore be evident that a considerable diversity of methodological approaches is currently being explored for addressing text-image integration – a more extensive introduction with examples of most of these is offered in Bateman (2014b). There are, however, also some recurrent themes.

For example, one common challenge arising for all approaches to multimodal documents concerns the complex nature of the analytic units to be considered – that is, in a complex multimodal artefact, just what are the ‘image’ and ‘text’ that are being related? For empirical studies, this is crucial since without clearly demarcated units of analysis, it is less than clear precisely what is being investigated. For further progress, approaches with a firm grasp of the units being recognised, their attributes, their assignment to distinct levels of descriptive analysis, as well as their placement within diverse semiotic systems will be essential. This includes incorporating considerations of layout with fine-grained internal structure (cf. Bateman 2008). Further examples of how this can be combined with analyses in the style of rhetorical structure theory as well as more background information can be found in Hiippala (2012a) and Bateman (2014a).
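
One way of keeping units, their attributes and levels of description apart, in the spirit of the multi-layer analyses just cited, is to record base units once and have a separate hierarchical layout structure cross-reference them by identifier. The fragment below sketches this idea; the layer names, identifiers and contents are invented for illustration.

```python
# Base units described once, in a single flat layer ...
base_units = {
    "u1": {"mode": "text", "content": "Storms batter the coast"},
    "u2": {"mode": "image", "content": "photograph of the flooded street"},
    "u3": {"mode": "text", "content": "The High Street on Tuesday."},
}

# ... and a separate layout layer that only refers to them by id, so
# that layout hierarchy and rhetorical analysis can diverge freely.
layout_root = {
    "id": "page-1",
    "units": [],
    "children": [
        {"id": "header-area", "units": ["u1"], "children": []},
        {"id": "figure-area", "units": [], "children": [
            {"id": "figure", "units": ["u2"], "children": []},
            {"id": "figure-caption", "units": ["u3"], "children": []},
        ]},
    ],
}
```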

In addition, tighter descriptions of multimodal artefacts are necessary to formulate more discriminating hypotheses for subsequent empirical exploration. Only then will accounts make sufficiently strong contact with the actual artefacts being analysed to support empirical research, such as the eye-tracking and other psychological approaches mentioned above. Moreover, providing and exploring complex descriptions of larger bodies of data itself raises significant challenges. These now constitute the growing area of multimodal corpus research (cf. Bateman 2014c), which employs and extends techniques familiar from linguistic corpus work. Hiippala (2013) and Thomas (2009b) offer some particularly well developed accounts of how this can be done for static two-dimensional multimodal artefacts.
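
Once such descriptions exist for a body of documents, corpus-level questions can be asked directly, for example how often images elaborate text rather than the reverse. A minimal sketch, assuming each analysed document has been reduced to triples of a relation name and the modes of the related elements (a simplified export format assumed here purely for illustration):

```python
from collections import Counter

def relation_profile(corpus):
    """Count relation/mode configurations across an annotated corpus.

    corpus: iterable of documents, each a list of (relation,
    nucleus_mode, satellite_mode) triples extracted from analyses such
    as the sketch above. Lets one ask, e.g., how often images elaborate
    text vs. the reverse across a whole genre.
    """
    counts = Counter()
    for document in corpus:
        for relation, nucleus_mode, satellite_mode in document:
            counts[(relation, nucleus_mode, satellite_mode)] += 1
    return counts

# e.g.: relation_profile([[("elaboration", "image", "text")], ...])
```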

5 Applications

We have already mentioned the primary applications of research into multimodal integration since they naturally arise out of the main social and cognitive motivations for this direction of research. Applications accordingly revolve around the core concerns of education, particularly in the form of understanding and teaching multiliteracies (Unsworth 2001), of understanding user responses to increasingly complex multimodal documents, and of improving multimodal design and critique of multimodal artefacts. More tightly connecting multimodal theory with multimodal practice via empirically-based research should itself also become an increasingly important ‘application’ of theory (Thomas 2009a; Hiippala 2012b). The ability to show how design influences user uptake by using eye-tracking has already been applied, for example, in studies that have demonstrated how readers familiar with newspaper design can completely ‘filter out’ advertisements despite their visual prominence and positioning (Holmqvist et al. 2003). Robust results concerning how media and their deployment of multimodal resources influence their consumers will increasingly demand appropriate theoretical descriptions of just what the multimodal resources involved are.

There is then hardly an area of information presentation today where issues of modal integration do not arise and where detailed research is required to understand and improve their workings. There are also further cases that lie on the border between the focus of the current chapter, i.e. ‘documents’ as traditionally conceived as more or less self-contained information offerings, and other forms of multimodality. For example, there are now many kinds of environmentally-embedded sources of information that extend multimodality and the task of integration still further – a common example here being ‘signage’, itself increasingly multimodal in design. The effectiveness of signage for orienting users within an environment is now coming into focus not only as a research task, drawing both on traditional information design issues and newer models of cognitive capabilities in spatial organisation and orientation (Hölscher et al. 2007), but also as a very practical concern in providing effective guidance support for diverse user groups, often with varying needs and capabilities. This area itself blends further into that of maps and cartography, particularly when maps are seen as communicative artefacts for orientation and way-finding rather than simple ‘representations’ of physical geography (Meilinger et al. 2007); indeed, maps constitute a classic case of explicit combinations of different kinds of information that require integration on the part of their users, a function that is now extended still further by their common occurrence in information graphics employed within other types of documents.

Considerations of this kind serve well to emphasise that, when we talk of ‘multimodal integration’, it will be essential to consider the particular meanings being combined with far more precision than has often been the case if we wish to provide explanatory accounts and practical results. Simple talk of ‘image’ and ‘text’, or of ‘verbal’ and ‘visual’ information, is often far too undiscriminating to produce concrete and applicable explanatory accounts. Semiotically more refined notions of distinct modal contributions of the kind proposed in Bateman (2011) will be required. Indeed, more refined theoretical accounts of the notions of genre, multimodality and their occurrence in diverse media will all be needed to reveal the very diverse range of influences that contribute to each and every instance of multimodal integration.

References

Anstey, Michèle & Geoff Bull. 2006. Teaching and learning multiliteracies: changing times, changing literacies. International Reading Association.

Asher, Nicholas & Alex Lascarides. 2003. Logics of conversation. Cambridge: Cambridge University Press.

Barthes, Roland. 1977 [1966]. The rhetoric of the image. In Stephen Heath (ed.), Image–Music–Text, 32–51. London: Fontana.

Bateman, John A. 2008. Multimodality and Genre: a foundation for the systematic analysis of multimodal documents. London: Palgrave Macmillan.

Bateman, John A. 2011. The decomposability of semiotic modes. In Kay L. O’Halloran & Bradley A. Smith (eds.), Multimodal studies: Multiple approaches and domains (Routledge Studies in Multimodality), 17–38. London: Routledge.

Bateman, John A. 2014a. Multimodal coherence research and its applications. In Helmut Gruber & Gisela Redeker (eds.), The pragmatics of discourse coherence, 145–177. Amsterdam: John Benjamins.

Bateman, John A. 2014b. Text and image: a critical introduction to the visual/verbal divide. London & New York: Routledge.

Bateman, John A. 2014c. Using multimodal corpora for empirical research. In Carey Jewitt (ed.), The Routledge Handbook of multimodal analysis, 2nd ed., 238–252. London: Routledge.

Bertin, Jacques. 1983. Semiology of graphics. Madison, Wisconsin: University of Wisconsin Press.

Block, Ned. 1981. Imagery. Cambridge, MA: MIT Press.

Bonsiepe, Gui. 1999. Visual/verbal rhetoric. In Michael Beirut, Jessica Helfand, Stephen Heller & Rick Poynor (eds.), Looking closer Vol. 3: Classic writings on graphic design, 167–173. New York: Allworth.

Bransford, John D., J. Richard Barclay & Jeffery J. Franks. 1972. Sentence memory: A constructive versus interpretive approach. Cognitive Psychology 3. 193–209.

Bucher, Hans-Jürgen. 2011. Multimodales Verstehen oder Rezeption als Interaktion. Theoretische und empirische Grundlagen einer systematischen Analyse der Multimodalität. In Hans-Joachim Diekmannshenke, Michael Klemm & Hartmut Stöckl (eds.), Bildlinguistik. Theorien – Methoden – Fallbeispiele, 123–156. Berlin: Erich Schmidt.

Bucher, Hans-Jürgen & Philipp Niemann. 2012. Visualizing science: the reception of PowerPoint presentations. Visual Communication 11(3). 283–306.

Bucher, Hans-Jürgen & Peter Schumacher. 2011. The relevance of attention for selecting news content. An eye-tracking study on attention patterns in the reception of print and online media. Communications: The European Journal of Communication Research 31(3). 347–368.

Chan, Eveline. 2011. Integrating visual and verbal meaning in multimodal text comprehension: towards a model of intermodal relations. In Shooshi Dreyfus, Sue Hood & Maree Stenglin (eds.), Semiotic margins: reclaiming meaning, 144–167. London: Continuum.

Cook, Guy. 1992. The Discourse of Advertising. London: Routledge.

Crystal, David. 1974. Paralanguage. In Thomas A. Sebeok (ed.), Linguistics and adjacent arts and sciences, Vol. 12. The Hague: Mouton.

Delin, Judy L., Abi Searle-Jones & Rob Waller. 2006. Branding and relationship communications: the evolution of utility bills in the UK. In Saul Carliner, Jan Piet Verckens & Cathy de Waele (eds.), Information and Document Design, 27–59. Amsterdam: John Benjamins.

Duchowski, Andrew T. 2003. Eye tracking methodology: theory and practice. London: Springer.

Forceville, Charles J. 2009. Non-verbal and multimodal metaphor as a cognitivist framework: agendas for research. In Charles J. Forceville & Eduardo Urios-Aparisi (eds.), Multimodal metaphor, 19–42. Berlin & New York: Mouton de Gruyter.

Forceville, Charles J. 2014. Relevance theory as model for analyzing visual and multimodal communication. In David Machin (ed.), Multimodal communication, 51–70. Berlin: Mouton de Gruyter.

Glenberg, Arthur M. & William E. Langston. 1992. Comprehension of illustrated text: Pictures help to build mental models. Journal of Memory and Language 31. 129–151.

Glenberg, Arthur M., Alex Cherry Wilkinson & William Epstein. 1982. The illusion of knowing: failure in the self-assessment of comprehension. Memory and Cognition 10(6). 597–602.

Goodman, Nelson. 1969. Languages of art: an approach to a theory of symbols. London: Oxford University Press.

Grice, H. Paul. 1969. Utterer’s Meaning and Intentions. Philosophical Review 78(2). 147–177.

Grimes, T. 1991. Mild auditory-visual dissonance in television news may exceed viewer attentional capacity. Human Communication Research 18. 268–298.

Harms, Wolfgang (ed.). 1990. Text und Bild, Bild und Text: DFG-Symposion 1988. Stuttgart: J. B. Metzlersche Verlagsbuchhandlung.

Hepp, Andreas. 2012. Mediatization and the ‘molding force’ of the media. Communications 37. 1–28.

Hiippala, Tuomo. 2012a. The interface between rhetoric and layout in multimodal artefacts. Literary and Linguistic Computing 28(3). 461–471.

Hiippala, Tuomo. 2012b. Reading paths and visual perception in multimodal research, psychology and brain sciences. Journal of Pragmatics 44(3). 315–327.

Hiippala, Tuomo. 2013. Modelling the structure of a multimodal artefact. Helsinki: University of Helsinki dissertation. https://helda.helsinki.fi/handle/10138/41736 (accessed 18 March 2015).

Holmqvist, Kenneth, Jana Holsanova, Maria Barthelson & Daniel Lundqvist. 2003. Reading or scanning? A study of newspaper and net paper reading. In Ralph Radach, Jukka Hyönä & Heiner Deubel (eds.), The mind’s eye: cognitive and applied aspects of eye movement research, 657–670. Amsterdam: Elsevier.

Holsanova, Jana. 2014. Reception of multimodality: applying eye-tracking methodology in multimodal research. In Carey Jewitt (ed.), The Routledge Handbook of multimodal analysis, 2nd ed., 287–298. London: Routledge.

Holsanova, Jana, Nils Holmberg & Kenneth Holmqvist. 2008. Reading information graphics: the role of spatial contiguity and dual attentional guidance. Applied Cognitive Psychology 23(9). 1215–1226.

Holsanova, Jana & Andreas Nord. 2010. Multimedia design: media structures, media principles and users’ meaning-making in newspapers and net papers. In Hans-Jürgen Bucher, Thomas Gloning & Katrin Lehnen (eds.), Neue Medien – neue Formate. Ausdifferenzierung und Konvergenz in der Medienkommunikation (Interaktiva. Schriftenreihe des Zentrums für Medien und Interaktivität (ZMI), Gießen 10), 81–103. Frankfurt & New York: Campus Verlag.

Hölscher, Christoph, Simon J. Büchner, Martin Brösamle, Tobias Meilinger & Gerhard Strube. 2007. Signs and maps and cognitive economy in the use of external aids for indoor navigation. In Danielle S. McNamara & Gregory J. Trafton (eds.), Proceedings of the 29th Annual Conference of the Cognitive Science Society, 377–382. Austin, TX: Cognitive Science Society.

Houts, Peter S., Cecilia C. Doak, Leonard G. Doak & Matthew J. Loscalzo. 2006. The role of pictures in improving health communication: A review of research on attention, comprehension, recall and adherence. Patient Education and Counseling 61. 173–190.

Hoven, Paul J. van den. 2012. Getting your ad banned to bring the message home? A rhetorical analysis of an ad on the US national debt. Informal Logic 32(4). 381–402.

Jansen, Carel J. M. & Michael F. Steehouder. 1992. Forms as a source of communication problems. Journal of Technical Writing and Communication 22. 179–194.

Jewitt, Carey & Gunther Kress. 2003. Multimodal literacy (New literacies and digital epistemologies 4). Frankfurt a.M. & New York: Peter Lang.

Kjeldsen, Jens E. 2012. Pictorial argumentation in advertising: Visual tropes and figures as a way of creating visual argumentation. In Frans H. van Eemeren & Bart Garssen (eds.), Topical themes in argumentation theory: twenty exploratory studies, 239–255. Berlin: Springer.

Kress, Gunther. 2010. Multimodality: a social semiotic approach to contemporary communication. London: Routledge.

Kress, Gunther & Theo van Leeuwen. 2001. Multimodal discourse: the modes and media of contemporary communication. London: Arnold.

Larkin, Jill H. & Herbert A. Simon. 1987. Why a diagram is (sometimes) worth ten thousand words. Cognitive Science 11. 65–99.

Lemke, Jay L. 1998. Multiplying meaning: visual and verbal semiotics in scientific text. In J. R. Martin and Robert Veel (eds.), Reading science: critical and functional perspectives on discourses of science, 87–113. London: Routledge.

Liu, Han-Chin, Meng-Lung Lai & Hsueh-Hua Chuang. 2011. Using eye-tracking technology to investigate the redundant effect of multimedia web-pages on viewers’ cognitive processes. Computers in Human Behavior 27. 2410–2417.

Liu, Yu & Kay L. O’Halloran. 2009. Intersemiotic texture: analyzing cohesive devices between language and images. Social Semiotics 19(4). 367–388.

Mann, William C. & Sandra A. Thompson. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text 8(3). 243–281.

Marsh, Emily E. & Marilyn Domas White. 2003. A taxonomy of relationships between images and text. Journal of Documentation 59(6). 647–672.

Martin, James R. 1992. English text: systems and structure. Amsterdam: Benjamins.

Martinec, Radan & Anthony Salway. 2005. A system for image-text relations in new (and old) media. Visual Communication 4(3). 337–371.

Mayer, Richard E. 2009. Multimedia learning. 2nd ed. Cambridge: Cambridge University Press.

Meilinger, Tobias, Christoph Hölscher, Simon J. Büchner & Martin Brösamle. 2007. How much information do you need? Schematic maps in wayfinding and self-localisation. In Thomas Barkowsky, Markus Knauff, Gérard Ligozat & Daniel R. Montello (eds.), Spatial Cognition V (Lecture Notes in Artificial Intelligence 4387), 381–400. Berlin: Springer.

Muckenhaupt, Manfred. 1986. Text und Bild. Grundfragen der Beschreibung von Text-Bild-Kommunikation aus sprachwissenschaftlicher Sicht (Tübinger Beiträge zur Linguistik). Tübingen: Narr.

Nikolajeva, Maria & Carole Scott. 2001. How picturebooks work. London: Routledge.

O’Halloran, Kay L. 2005. Mathematical Discourse: Language, Symbolism and Visual Images. London and New York: Continuum.

Orlikowski, Wanda J. & JoAnne Yates. 1994. Genre repertoire: the structuring of communicative practices in organizations. Administrative Science Quarterly 39(4). 541–574.

Paivio, Allan. 1986. Mental Representations: A Dual Coding Approach. London and New York: Oxford University Press.

Pylyshyn, Zenon W. 1973. What the mind’s eye tells the mind’s brain: A critique of mental imagery. Psychological Bulletin 80(1). 1–24.

Royce, Terry D. 2007. Intersemiotic Complementarity: A Framework for Multimodal Discourse Analysis. In Terry D. Royce & Wendy L. Bowcher (eds.), New Directions in the Analysis of Multimodal Discourse, 63–110. Mahwah, NJ: Lawrence Erlbaum Associates.

Sachs-Hombach, Klaus (ed.). 2001. Bildhandeln: interdisziplinäre Forschungen zur Pragmatik bildhafter Darstellungsformen. Magdeburg: Scriptum-Verlag.

Sanders, Ted J. M., Wilbert P. M. Spooren & Leo G. M. Noordman. 1992. Towards a Taxonomy of Coherence Relations. Discourse Processes 15(1). 1–36.

Sanders, Ted J. M. & Wilbert P. M. Spooren. 2009. Causal categories in discourse: converging evidence from language use. In Ted J. M. Sanders & Eve E. Sweetser (eds.), Causal categories in discourse and cognition, 205–246. Berlin, New York: Mouton de Gruyter.

Schill, Kerstin, Elisabeth Umkehrer, Stephan Beinlich, Gerhard Krieger & Christoph Zetzsche. 2001. Scene analysis with saccadic eye movements: top-down and bottom-up modeling. Journal of Electronic Imaging 10(1). 152–160.

Shneiderman, Ben & Catherine Plaisant. 2009. Designing the user interface: strategies for effective human-computer interaction. Reading, MA: Addison-Wesley.

Schönbach, Klaus, Ester de Waal & Edmund Lauf. 2005. Research note: Online and print newspapers. Their impact on the extent of the perceived public agenda. European Journal of Communication 20(2). 245–258.

Schriver, Karen A. 1997. Dynamics in document design: creating texts for readers. New York: John Wiley and Sons.

Severin, Werner J. & James W. Tankard. 2009. Communication theories: Origins, methods and uses in the mass media, 5th ed. Addison Wesley Longman.

Sperber, Dan & Deirdre Wilson. 1995 [1986]. Relevance: communication and cognition, 2nd ed. Oxford: Blackwell.

Spillner, Bernd. 1982. Stilanalyse semiotisch komplexer Texte. Zum Verhältnis von sprachlicher und bildlicher Information in Werbeanzeigen. Kodikas/Code. Ars Semeiotica 4/5(1). 91–106.

Stenning, Keith & Jon Oberlander. 1995. A cognitive theory of graphical and linguistic reasoning: logic and implementation. Cognitive Science 19. 97–140.

Thomas, Martin. 2009a. Developing multimodal texture. In Eija Ventola & Arsenio Jesús Moya Guijarro (eds.), The world told and the world shown: multisemiotic issues. Basingstoke: Palgrave Macmillan.

Thomas, Martin. 2009b. Localizing pack messages: A framework for corpus-based cross-cultural multimodal analysis. Leeds: Centre for Translation Studies, University of Leeds dissertation. http://corpus.leeds.ac.uk/~martin/thesis/martin_thomas_thesis_2009_semi-_skimmed.pdf (accessed 18 March 2015).

Tufte, Edward R. 1997. Visual explanations: images and quantities, evidence and narrative. Cheshire, Connecticut: Graphics Press.

Tufte, Edward R. 2006. The cognitive style of PowerPoint: pitching out corrupts within, 2nd ed. Cheshire, Connecticut: Graphics Press LLC.

Unsworth, Len. 2001. Teaching multiliteracies across the curriculum: changing contexts of text and image in classroom practice. Open University Press.

Unsworth, Len & Chris Cléirigh. 2009. Multimodality and reading: the construction of meaning through image-text interaction. In Carey Jewitt (ed.), The Routledge Handbook of multimodal analysis, 151–164. London: Routledge.

van Leeuwen, Theo. 2005. Introducing social semiotics. London: Routledge.

van Weert, Julia, Guda van Noort, Nadine Bol, Liset van Dijk, Kiek Tates & Jesse Jansen. 2011. Tailored information for cancer patients on the internet: effects of visual cues and language complexity on information recall and satisfaction. Patient Education and Counseling 84(3). 368–378.

Waller, Robert. 1990. Typography and discourse. In Rebecca Barr (ed.), Handbook of reading research, Vol. 2, 341–380. London: Longman.

Waller, Robert. 1996. The origins of the Information Design Association. In The 1996 Annual Report of the IDA, Information Design Association.

Yarbus, Alfred Lukyanovich. 1967. Eye Movements and Vision. New York, NY: Plenum Press.
