Chapter 6. Modalities and Multimodalities

The way we use our senses, the kinds of information we can parse, and what we are trying to accomplish are tied together. The first exercise requires identifying colors and the unique items in the room. The second exercise combines matching our own position relative to light sources with our ability to discern brightness. The last exercise focuses inward, to what our bodies are telling us about our posture. All that sensory information and much, much more was already there. We just didn’t notice we needed it. Before Google Maps, our senses were already giving us just-in-time information. We adapt how we gather and use sensory information to apply it to the task at hand. These shifts also occur during activities we aren’t consciously aware of. This link between purpose and sensory focus helps us filter out irrelevant information and turbo-boosts the acquisition of the information we need.

The cognitive processes tied to each sense can also be activated simply by focusing on the sense itself. Using one sense rather than another can change the way you think about and respond to an experience. This happens all the time. We shift between modalities, using the set of sensory, cognitive, and physical abilities best suited to what we are experiencing and need to accomplish. We easily switch attention from one sense to another, or from observing one type of phenomenon to another. Like changing channels on a TV, modal focus lets us tune in to a different stream of information.

Modalities: How We Use Our Senses

Modalities are patterns developed over the course of our lives—some are common, and some are unique to more specialized or personal abilities. Their existence can partially explain “having a good feel” for something, as if you already knew how to do it or just picked it up quickly. Modalities are not just the ability to see or hear, but include how we use that information. For example, while we are walking, we look around at different focal lengths, using our vision to help maintain balance, assess our speed, avoid obstacles, and find our way. This is different from how we use vision when writing a letter by hand, following the movement of the pen tip at close focus. We need to guide the pen into the shapes of letters, space and align them evenly, and check spelling and punctuation. Both of these activities rely on vision in completely different ways (see Figure 6-1). Modalities extend beyond sensing to how we derive information from a stimulus and how we apply it to thoughts and actions.

Figure 6-1. How we see—our focus, intention, and what we notice—are quite different depending on whether we are walking or writing

People who experience vision loss may use their senses of proprioception and touch for obstacle detection with a walking stick, as well as for reading and writing with braille and keyboards. We can apply modalities flexibly. We have many different ways of sensing the objects, environments, and events around us. For example, we can hear wind, feel it on our skin, or observe it rustling the leaves of trees. But when we fly a kite, we experience wind by feeling the kite string pull in our grip and how the kite moves above our heads. And while wind is normally invisible, we can watch a weathervane from indoors, where we have no direct contact with the wind at all. The flexibility of our modalities becomes especially powerful when we create and use tools, devices, and other kinds of equipment. It allows us to interpret the behaviors of technologies and physical objects beyond our own sensory abilities. Many people can tell from the sound of a hammer strike, hollow or solid, whether a nail will really anchor. Chefs use their sense of touch to determine whether a steak’s interior is a pliable rare or a springier medium-well.

Think back to when you were learning how to drive a car. It took time to develop a visual sense for speed, when to accelerate and decelerate, and to anticipate obstacles. It took practice to learn when to look around and over your shoulder and to use the rear and side-view mirrors when turning or switching lanes (see Figure 6-2). Getting onto a highway for the first time was probably one of the most frightening parts of learning to drive. It required developing a sense of timing to estimate the speed and distance of oncoming traffic, as well as your own car’s speed. Once this modality was developed, it could be adapted to driving other kinds of vehicles. As new modalities form, they become a more permanent part of our experiential toolkit that we can then apply across similar activities.

Figure 6-2. Using the side-view mirror to merge into traffic requires learning and practice to gauge car speed and direction in a reversed image

Everyone has their own distinct sets and preferences across modalities, relying on some more than others. Within a single activity, different people may prefer different modalities. Some are better at language comprehension via reading and writing. Some prefer listening. Still others prefer conversation—speaking reinforces understanding. Like our sizes, shapes, and physical abilities, there is variation across modalities. In all cases, we depend on them to coherently experience and interact with the world.

Types of Modalities

The dominant sense of a modality is most commonly used to describe it (see Table 6-1). While smell and taste (the olfactory and gustatory modalities, respectively) are a meaningful part of human experience, they are not yet significant in creating interfaces.

Table 6-1. Key interface modalities
VISUAL Based on our sense of sight
AUDITORY Based on our sense of hearing
HAPTIC Based on our sense of touch and movement
PROPRIOCEPTIVE (KINESTHETIC) Based on the sense of our own movement and orientation. Proxemics, a subset of this, focuses on presence and relative spatial distances.

We Shape Our Modalities, and They Shape Us

Vision is the most processing-intensive sense, so the visual modality is complex. It consists of separate, innate systems for shape, color, and movement. There are also systems that combine learned and innate abilities: recognizing faces and reading emotional cues are patterns that run deep. We have a large part of our brain devoted just to faces—we use the shapes of the eyes, nose, and lips, as well as their spacing. While we are born with these abilities, and start developing them as soon as we open our eyes, we can only attain accuracy through repeated use. Even less innate is the ability to read, but with practice our eyes sweep quickly across and down pages, effortlessly melding spatial order and tiny little glyphs into new ideas, new worlds, and beloved characters. We are born with our senses and the instinct to build modalities with them. Which modalities we build depends on our circumstances and choices.

As we develop modalities, they become a more permanent part of how we experience—literally. Scientists discovered visible differences in the size of London cab drivers’ brains:

In the drivers, the posterior part of the hippocampus had grown physically larger than those in the control group—presumably causing their increased spatial memory. The researchers also found that the longer a cabbie has been doing his job, the bigger the change in that brain region, suggesting that the result was not simply reflecting a pre-existing condition of people who go into the profession, but instead resulted from practice.1

These highly developed modalities can start to become a larger part of our lives. People who have developed an ear for music start hearing melodies in the way raindrops fall or rhythms in the click-clack of a train on its track. Inspiration and delight aren’t just found out there in the world through luck. We build up our ability to discover them.

Attributes and Abilities of Modalities

Modal focus is the ability to select and prioritize the most important information about a task or activity. This establishes a feedback loop: sensing relevant information sharpens focus, which in turn sharpens the senses. Some neuroscience researchers describe focus as a “process that gives rise to a temporary change (often enhancement) in signal processing.”2

Filtering enables us to tune out stimuli that aren’t important, and it usually accompanies modal focus. Our brains are very good at this: it’s also called selective filtering or sensory gating. Being able to filter prevents us from being overwhelmed by too much irrelevant information. It helps us keep the story straight and reduces response time and effort. The downside is that it can let expectations become biases. We start to miss things because we’re not expecting them.

Calibration allows us to stabilize our sensory abilities and limits even when contexts or abilities change. Sensory processing can be somewhat fluid, and unique capabilities emerge as we need them. Put on prismatic glasses that show the world upside down, and after about four days, the brain will flip the image upright. Our cognitive processes are constantly tweaking our senses to make our experience of reality understandable and actionable. Since an upright image is easier to understand, that’s what our brains go with. Neural adaptation describes how our senses can do things like block out stimuli after a period of exposure. The flipped image takes a while, but other responses are much quicker. Our eyes adjust to the amount of light in a room, as well as the color balance. We stop feeling the smooth tabletop where our arm is resting, or hearing the chugging sounds of an air conditioner. Motion sickness can occur when we are unable to calibrate our sense of movement to what we see.

Applying Modalities to Design

We develop modalities to help us understand our world, to interact with it, and to fulfill our needs. The use of smartphones has expanded the way we communicate with writing to include emojis, GIFs, videos, and photos. We now converse pictorially (see Figure 6-3). Some might say that this is a devolution in communication, but images are a higher resolution form of visual communication than text. We can actually convey more meaning this way. Devices can change which modalities we develop, and create new ones for us to learn.

Choosing one modality over another may improve an experience. Certain modalities are linked to specific analytical skills or physical abilities, allowing us to respond more effectively. For example, we have some of the fastest response times to spoken conversation but can more easily absorb dense information visually. We filter out a great amount of haptic information, but this also allows us to multitask across complex and automatic physical activities. The kinds of modalities that are used within an experience can enable or hinder successful interactions.

Figure 6-3. Emoji let us add nuance to text, or even express whole thoughts pictorially

Different people have different proficiencies and preferences across modalities. The majority of Americans require corrective lenses of some kind. Vision and hearing universally degrade with age. Some people have more strongly or easily developed modalities than others. Some people experience temporary injuries or illnesses. Some bars are really noisy, and you need closed captions to be able to follow the basketball game. Our relationship to our sensory and physical abilities varies between people, over the course of our lives, and sometimes from one moment or place to another. Disability is not an issue for a small minority of people, but something everyone experiences in some form or degree.

Multimodalities

Woke up, fell out of bed,

Dragged a comb across my head

Found my way downstairs and drank a cup,

And looking up I noticed I was late.

Found my coat and grabbed my hat

Made the bus in seconds flat

Found my way upstairs and had a smoke,

Somebody spoke and I went into a dream.

— “A DAY IN THE LIFE,” JOHN LENNON/PAUL MCCARTNEY

As the Beatles observed, daily life requires shifting between activities quickly and fluidly.

The reason is simple: our days are fluid sequences of actions and experiences, requiring countless thoughts, decisions, and actions. You wake up with some serious bedhead and have to use much more product than usual. The coffee is too hot, and you have to wait before you can drink it. And this morning, you can’t just walk to the bus stop. You need to run like there are medals involved. Life is filled with details and minor adjustments that we make on the fly, and having multimodal abilities allows us to manage them. Besides, all animals are multimodal. Apparently, staying alive means you have to keep track of a bunch of different stuff.

The use of technology relies deeply on our multimodal abilities and how we can develop them in specific ways. We wouldn’t have been able to develop technologies without them in the first place. Using technology, however, is not natural or innate. One of the earliest films was of an approaching train. Some viewers jumped out of the way, believing they were about to be struck. Their eyes were deceived. Riding a bike is a continual flow of pedaling, balancing, and steering combined with scanning the road and listening for other bikers and cars. We look at our phones, listen to notification alerts, and mostly manage to keep our grip on them when our hands are wet. Some of these multimodal abilities have a steeper learning curve. Once we develop them, however, the level of focus and effort required can drop dramatically. We don’t just come with autopilot; we create new autopilot programs all the time across myriad activities. These shortcuts provide rich playgrounds for things like optical illusions, surprise, and playfulness, but can also result in certain types of perceptual and cognitive bias. For designers to create magical experiences, ease and utility must be balanced against creating false or misguided expectations.

Trusted Version and Performance Optimization

A glass of water is a mass of complexity and possibility through the lens of phenomenology (see Figure 6-4). It is cold, it is wet, it is definitely half full. It was not on the table yesterday, and it needs to go in the dishwasher before bedtime. It can be moved, drunk, spilled, shattered, and if you are a well-informed Boy or Girl Scout, it can be used to start a campfire. It can then be used to put that fire out.

Figure 6-4. A glass of water, but you probably knew that already

How do we know it’s there? How do we know all this stuff about it? How do we know what we can do with it? This is where multimodality becomes useful. While we have multiple senses, they have their limitations. Being able to sense different things allows us to understand a wider range of experiences. It also allows us to use different senses together in the same experience. Our senses validate each other, ensuring that we have a reliable perception of reality. If we smell or taste vodka, then we know that it wasn’t water like we thought.

Validation

As observed by neuroscientists Barry E. Stein and M. Alex Meredith, “cross-modal matching is using information obtained through one sensory modality to make a judgement about an equivalent stimulus from another modality.”3 Validation allows us to confirm what we are experiencing. When simultaneous sensory stimuli support each other, our minds judge them to be more reliable and give them higher priority in attention and memory: we pay more attention to them and remember them better, and for longer. When simultaneous stimuli conflict, we may pay more attention to understand what is wrong, or experience confusion. We might try to focus more sensory attention to understand and resolve the conflict, or dismiss the stimuli as unpleasant sensory noise. Cross-modal techniques are believed to be particularly effective in learning experiences.

In interface design, cross-modal stimuli are constructed: integrated code triggers the simultaneous display of pixels, emission of sounds, and haptic feedback. They have to be deliberately matched together to simulate natural cross-modal experiences, like sound with animation or a visual effect with haptic vibration. Our sense of rhythm spans vision, hearing, touch, and proprioception and plays a strong role in how we align cross-modal stimuli. But the mind is a little forgiving. Misalignments of up to a few hundred milliseconds between stimuli are passable. Beyond that, the mind tends to perceive separate events or to dismiss one or more of the misaligned stimuli as noise.
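To make that concrete, here is a minimal sketch of deliberately matching visual, auditory, and haptic feedback in a web interface. The interaction, CSS class, and sound file are hypothetical, and the tolerance value simply reflects the few-hundred-millisecond window described above; this is an illustration of the pattern, not an implementation from any particular product.

```typescript
// A sketch, not a prescription: fire three matched stimuli for one event and
// warn if setup drifts past a rough alignment budget.

const SYNC_TOLERANCE_MS = 100; // stay well inside the few-hundred-ms window

function playConfirmFeedback(button: HTMLElement): void {
  const start = performance.now();

  // Visual: a brief pulse animation on the control (hypothetical CSS class).
  button.classList.add("confirm-pulse");
  setTimeout(() => button.classList.remove("confirm-pulse"), 300);

  // Auditory: a short confirmation tone (hypothetical asset path).
  void new Audio("/sounds/confirm.wav").play();

  // Haptic: a short vibration where the Vibration API is available.
  if ("vibrate" in navigator) {
    navigator.vibrate(30);
  }

  // If setup took too long, the stimuli may read as separate events or noise.
  const drift = performance.now() - start;
  if (drift > SYNC_TOLERANCE_MS) {
    console.warn(`Feedback setup took ${drift.toFixed(1)} ms; stimuli may feel decoupled.`);
  }
}
```

Preloading the audio and reusing a single player are the kinds of small adjustments that keep the three channels feeling like one event rather than three.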

Integration

Multisensory integration describes how different senses are synthesized into a coherent, multidimensional reality as it unfolds. The combination of taste and smell into the experience of flavor is one example. Audiovisual integration is one of our best-understood experiences across multiple design media. Vision in combination with the vestibular system allows us to keep our balance as we walk. Integrating modalities is fundamental to our most basic activities. One sense can prime the other to be prepared for what happens next, a type of response loop known as feedforward. This allows people to prepare their attention and physical response before something happens. Watching someone’s lips as they speak enhances the cocktail party effect.

Neuroscientific research conducted by Stein and Meredith shows that in humans and other species, an experience that involves two or more senses is judged by our brains as being more important and is therefore perceived more intensely:4

Multisensory integration...is ubiquitous, and even animals with such seemingly exotic sensory apparatus as an infrared system (pit vipers) or an electroreceptor system (some fish) combine these inputs with those from more commonly represented modalities such as vision to provide an integrated world view.5

The sum is often greater than the sensory parts, as Stein and Meredith elaborate: “The integration of inputs from different sensory modalities not only transforms some of their individual characteristics, but does so in ways that can enhance the quality of life. Integrated sensory inputs produce far richer experiences than would be predicted from their simple coexistence or the linear sum of their individual products.” They note that it’s a pattern that seems to exist across all species, from unicellular organisms to higher primates: “We know of no animal with a nervous system in which the different sensory representations are organized so that they maintain exclusivity from one another.”

In design and the arts, this richness makes delightful experiences possible. At a concert, the pounding bass gives the music kinesthetic impact—we feel it as much as we hear it. It might just get you to start moving your feet. Drivers of sports cars dislike too much shock absorption because haptic sensations enhance their perception of driving. In fact, many performance luxury cars are designed to accentuate this sensorial experience. People are not very good at gauging higher speeds visually, but we feel the rumble of the engine, feel the nervy swing into a tightly hugged turn, hear the engine’s throaty growl as we shift gears. All of these sensory cues create an impression of power and speed. None of them have any bearing on acceleration.

A Single Prioritized Sense or Many Together?

Some multimodalities are simple, like a chain of sensation, action, and reaction. Others are more complex, sometimes deceptively so:

Across one main sense

  • The primary sense may be used across multiple modalities. You see movement, shift your gaze, focus in the area of movement, and watch to see what unfolds as you decide to walk closer, keeping vision as the main priority in each part of the sequence. Three different activities (noticing movement, identifying its source, and maintaining observation) require you to use vision in different ways and rely on different cognitive processes.

Across multiple senses

  • You start to prepare dinner. You see and feel that the oil and pan are hot, so you drop minced garlic into it and affirm the decision by watching the oil bubble and hearing it sizzle. Your sense of timing is confirmed by watching and smelling, and you see that it’s time to add garbanzos. And so on, as multiple modalities inform and guide successive sets of senses working together.

How Multimodality Shapes Our Activities and Experiences

“You don’t perceive objects as they are,”6 says neuroscientist David Eagleman. “You perceive them as you are.” Two people can be in the same place and experience very different things. The differences arise from what we bring with us. Along with our memories and habits, much of what shapes an experience depends on the emphasis placed on various senses during each moment, as well as what we are trying to accomplish.

Imagine two people standing a few feet from each other on a crowded city street one morning. Woman A, who lives in the city, is waiting for a taxi to go to an appointment. Woman B, a visitor, is waiting to meet a childhood friend who lives there. Both women are visually scanning the busy streets. Both are looking for something, but everything from their postures to their faces shows them to be in very different modalities. Woman A is looking out for yellow cars and lighted signs and trying to confidently establish the curb territory as hers in case of competing commuters. Woman B is scanning faces, hoping not to look too conspicuous, trying to spot a familiar face, one that may also be exhibiting the same kind of expectant scan. Her senses are active, but not sharply focused, because she’s not sure what will signify the arrival of her friend. She listens for her name. She keeps an eye on her phone. She glances at taxis, in case her friend arrives in one. Woman A, meanwhile, is much more focused and expert at her task. Her eyes are intent. Her spatial sense functions peripherally to perceive competition, and she has vaguely noticed Woman B as a possible, minor threat. Her arm is ready to shoot up when she sees a potential cab.

Very different experiences of the same moment and context.

In order to design for each of those women, we need to understand their purposes and expectations. The multimodalities they use play a large part in both.

Attributes and Abilities of Multimodalities

Multimodalities are complex and varied. To support them, there is no particular checklist that covers each case. But there are common attributes and abilities that provide starting points for design.

Focus

Multimodalities structure focus by prioritizing important information and filtering out what’s irrelevant. Despite this, suppressed senses remain on standby. When new information becomes urgent, we can quickly reprioritize (shift). For example, if a fire alarm went off while you were rummaging through a drawer, not only would you hear it, but additional cognitive processes, motor skills, and reflexes would activate.

Modal focus is crucial to building skills and performing tasks, for several reasons. It optimizes attention, effort, and integration of the most important sensory, cognitive, and motor abilities. This allows us to complete tasks more easily and effectively. It gates the other senses, to reduce interference, but allows these gates to be bypassed when necessary.

Flow

Flow is an experience state in which perception, cognition, and action coalesce into effortless performance. Despite being highly active, a person in flow feels calm, relaxed, and energized.

Sequence

Some activities are broken up into steps that follow a particular order. For pitching a baseball, it might be observing, then windup, early cocking, late cocking, acceleration, and follow-through.7 If the pitcher needs to switch focus from striking out the batter to tagging a base-stealing runner, it is called a shift. When a sequence is fixed, its steps must occur in a specific order. When it is open, its steps can occur in any order.

Simultaneity

Instead of following a sequence, some activities may be performed at the same time. Driving a car blends several tasks, including watching the road, pressing the pedals, turning the steering wheel, and listening to the engine and other cars. Some of these may be performed at less conscious levels to support greater focus on whatever is top of mind.

Shift

We need to delegate attention across quickly changing circumstances as they unfold. A modal shift might be from visual to auditory search when someone calls for help in the woods. A multimodal shift might happen in a conversation as you shift from listening to speaking. Or even more pronounced, from having a conversation to paying the check at a restaurant. A shift might be the result of natural progression and ending of activity (conversation over), or it might be the result of an interruption (tired waiter needs to go home). A shift may also be caused by internal disruptions. Needs, like thirst, hunger, and sleepiness, remind us to take care of ourselves. A sudden thought, like an idea, insight, or emotion, can also cause a shift in our attention.

Transition

The way we experience a shift is called a transition. Transitions can be harsh and jarring or smooth and supportive, depending on many factors. Often transitions are unplanned, as with many interruptions. Other times they can be planned and orchestrated, as when a “good night” song comes on in some stores, telling shoppers it’s time to shift from browsing to buying and then leaving. Sometimes the best way to ease a transition out of one modality is by creating a reliable way back in, as when apps or devices remember what their user was doing when they stopped, and provide a “bookmark” or restore the same state when the user returns.
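One way to build that reliable way back in is a simple bookmark: save what the user was doing at the moment of the shift, and restore it on return. The state shape, storage key, and triggering event below are hypothetical; this is only a sketch of the pattern.

```typescript
// Save the user's place when an interruption takes them away, and restore it
// when they come back, so re-entry costs as little attention as possible.

interface AppState {
  screen: string;          // which view the user was on
  scrollPosition: number;  // how far they had read or browsed
  draftText?: string;      // any unsent input worth preserving
}

const BOOKMARK_KEY = "app-bookmark"; // hypothetical storage key

function saveBookmark(state: AppState): void {
  localStorage.setItem(BOOKMARK_KEY, JSON.stringify(state));
}

function restoreBookmark(): AppState | null {
  const raw = localStorage.getItem(BOOKMARK_KEY);
  return raw ? (JSON.parse(raw) as AppState) : null;
}

// Typical wiring: persist when the app loses visibility, restore on load.
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") {
    saveBookmark({ screen: "article", scrollPosition: window.scrollY });
  }
});
```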

Substitution

Substitutions use alternative senses when there is interference within a modality, like finding your wallet by feel because your eyes are on the road. We’ve also built tools to help substitute modalities. Can’t find your phone? There are several ways to track it, but the simplest is just calling it to play a ringtone. This shifts from the visual to the auditory modality and then right back again once you have a sense of where to look, or perhaps integrates the two modalities as you scan the room for where the sound is coming from.

Translation

Translations map information into a relevant modality when an experience lies outside human perception, or when there is no practical way to obtain the information physically. A weather map, for instance, can show the intensity of a tropical storm, determined by the temperature of the clouds. Translation is a common technique for information visualization as well as for alert sounds such as smoke detectors, seatbelt reminders, and doorbells. (You could consider it the basis of written media.) Unlike visual detection, doorbells are not limited by line of sight—we can hear a doorbell through walls and doors. Translation may also make certain experiences easier or more flexible when needed.
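In code, a translation is often just a mapping from an imperceptible quantity onto a perceptible scale. The sketch below maps a particulate reading onto colors and labels; the breakpoints are illustrative, loosely modeled on common air-quality index bands rather than taken from any standard.

```typescript
// Translate a quantity we cannot sense directly (PM2.5 concentration) into a
// visual scale a person can read at a glance.

interface AqiBand {
  max: number;    // upper bound of the band in µg/m³ (illustrative values)
  label: string;  // what to tell the user
  color: string;  // how to show it on a map or badge
}

const BANDS: AqiBand[] = [
  { max: 12, label: "Good", color: "green" },
  { max: 35, label: "Moderate", color: "yellow" },
  { max: 55, label: "Unhealthy for sensitive groups", color: "orange" },
  { max: 150, label: "Unhealthy", color: "red" },
  { max: Infinity, label: "Hazardous", color: "purple" },
];

function translateReading(pm25: number): AqiBand {
  return BANDS.find((band) => pm25 <= band.max)!;
}

console.log(translateReading(42).label); // "Unhealthy for sensitive groups"
```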

Proficiency

When we repeat certain activities, the brain’s response is to hardwire them in. We no longer need to think about the activity to do it. We do it unconsciously, with little or no awareness, resulting in the cognitive state known as flow. In many cases, the less we concentrate while we are executing one of our expert abilities, the better we perform at it. Some activities simply require repetition to develop proficiency. Others require more specialized training or practice. Someone who is proficient at an activity may have an increased ability to perceive things that are relevant, called perceptual expertise.8

Common Categories of Multimodalities

While modalities are described by their focal sense, multimodalities can be more easily classified by their associated purpose or activity. These are not precise, exclusive, or comprehensive categories, and there is a lot of overlap and integration between them. They are helpful for understanding the shared intents, contexts, and sensory considerations across the behaviors:

Basic abilities

  • These are core human activities that are often incorporated into other types of activities. They are essential to daily life. They include activities like sitting, standing, walking, and speaking. These multimodal behaviors are usually developed very early in our lives. They become a part of the activities of which we have the least awareness and are easily integrated into multitasking activities.

Orientation and scanning

  • Because vision is such a dominant sense, there are many multimodalities that prioritize visual stimuli. This category encompasses activities in which vision and, to a lesser extent, proprioceptive capabilities are used together. We use vision to navigate our surroundings and to recognize, locate, or measure specific objects in our environment. It includes some of our browsing behaviors, as we seek out specific choices, and make comparative decisions around them. It also includes pathfinding activities, where we develop spatial models of our environments. While there may be supporting modalities, they focus on enabling the absorption or analysis of visual information. Our visual ability to analyze large sets of items and attributes is unparalleled by the other senses in both speed and detail.

Hand–eye coordination (visuo-haptic integration)

  • This is an important category for interaction designers, because haptic tools and interfaces are far and away the most common. Blending proprioceptive, tactile, and visual modalities, and often supported by auditory modalities, it allows us a wide range of physical interactions with objects and environments. Curiously, the term also applies to foot–eye coordination, like using pedals or kickstands. Activities can be differentiated by whether they require more precision or strength, like performing brain surgery versus lifting a bag of groceries. They can further be differentiated by whether they are manual, tool-based, or interface-based. Manual activities require direct manipulation by the hands, such as picking up a French fry. Tool-based activities employ a tool or other form of equipment that can be manipulated, such as scissors, pencils, or a guitar. Interfaces employ abstracted control systems for mechanical or computational equipment and devices.

Social interaction

  • Social interaction is a major component of human behavior, and we have developed sophisticated tools, norms, and expectations around it. Human linguistic ability is so strong that we have multiple modalities for communication. This includes speech, writing, and body language. We can use them in tandem or alone. The relationship between individuals plays a strong role in shaping communication and is also expressed across several multimodal behaviors such as physical proximity, contact, prosody, and even linguistic formality.

Performance and athletics

  • Our proprioceptive and haptic abilities are supported by other sensory abilities when activities require both strength and precision. Very often these behaviors require repetition or practice to commit skills to implicit memory, reduce response times, or increase speed, strength, or accuracy. Enabling a flow state is often a crucial component of these experiences. These activities include playing sports, acting, dancing, and specialized manual labor.

Cognition and analysis

  • Some activities require a high level of sensory or cognitive focus. They emphasize the resolution and quality of the sensory stimuli of the experience, like listening to music or complex analysis like solving a physics problem. In these cases, maintaining focus is the priority of the experience. Some people will find external stimuli distracting to more internalized sensory or cognitive processes. Some require external tools or models to aid them. These are often subjective personal preferences. A high level of flexibility and individual control or agency can be required.

Applying Multimodality to Design

As we create products for an ever-broader set of circumstances, there are a few guidelines that are useful. Understanding the ways that our minds and senses work together, and what they need to do so effectively, offers a good set of fundamental considerations.

Maintaining Focus

Focus is to experience as understanding is to information. It’s a result of the mind being successfully engaged and able to comprehend a coherent thread through the experience. Focus and flow are some of the more important considerations for designers working with multimodalities. That doesn’t mean every interaction has to be achieved in a state of extreme, brow-furrowed concentration. Or that modalities must play out in seamless preordained patterns. Far from it. Focus is simply the state of mind that results from things going smoothly. It’s a critical state of mind on the path to successfully achieving an aim.

When we talk about focus, we mean a few things. One is the level of awareness that we have for a particular sensation. Are we able to tune in to something using our senses and extract salient information from it? Is doing so a common practice, familiar to most people (and likely users)? What do we need to maintain this concentration, and how important is it that we do so?

Respecting Cognitive Load

We may be multimodal all the time, but the combination is limited by our brain’s ability to process multiple inputs. Exceeding these limits carries the potential for danger, so we must protect ourselves from the results of losing focus, as psychologist Daniel Kahneman writes:

Everyone has some awareness of the limited capacity of attention, and our social behavior makes allowances for these limitations. When the driver of a car is overtaking a truck on a narrow road, for example, adult passengers quite sensibly stop talking. They know that distracting the driver is not a good idea, and they also suspect that he is temporarily deaf and will not hear what they say.9

Like those careful passengers, device designers must also be aware of their ability to interfere with, as well as augment, an experience. We believe that transitioning people between modalities is better than simply creating more and more activities that interrupt each other.

Cognitive load varies across modalities. For instance, when reading, we use vision to absorb dense information that requires significant cognitive processing. Libraries, designed to support reading, have rules about maintaining quiet in order to reduce interruptions and distractions that can break focus. Devices can follow suit.
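One way a device can follow suit is a simple focus mode that holds low-priority notifications until the user is done. Everything here, from the priority levels to the isFocusModeOn() check, is a hypothetical sketch of the idea rather than any particular platform’s API.

```typescript
// Defer non-critical notifications while the user is in a focus-heavy context,
// then release them as a single, calm digest.

type Priority = "critical" | "normal" | "low";

interface AppNotification {
  message: string;
  priority: Priority;
}

const deferred: AppNotification[] = [];

function isFocusModeOn(): boolean {
  // Hypothetical signal: a user toggle, a calendar state, or an app mode.
  return true;
}

function deliver(n: AppNotification): void {
  if (isFocusModeOn() && n.priority !== "critical") {
    deferred.push(n); // hold it rather than breaking the reader's concentration
    return;
  }
  console.log(`ALERT: ${n.message}`);
}

function endFocusMode(): void {
  console.log(`${deferred.length} notifications were held while you were focused.`);
  deferred.length = 0; // in a real app, show them as a digest instead of dropping them
}
```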

Overcoming Barriers with Substitutions and Translations

Sometimes the normal way of doing things turns out not to work. If you’re driving, maybe your usual road is closed. So you take another. This also happens to our senses almost every day. What happens when one of your sense modalities doesn’t work? Because we are complex, adaptive beings, our sensory modalities also have options, and some work better than others.

When a barrier to a preferred modality exists, or an experience lies outside of human perception, substitutions and translations can offer alternative modalities to fill in the gaps. If you lose sight of your friend in a crowd, you might call out their name, which is an example of substitution. We effortlessly substitute modalities all the time. If there’s not a practical way to physically receive information, a translation may make use of a different sensory channel. A light on a dark ceramic stove top, for instance, is designed to tell you that it’s hot. Finding out too late by touch is dangerous. Sometimes information simply lies outside of human abilities to perceive, such as high levels of air pollution. Normally we rely on seeing and listening to move around without danger. But airborne particulates are difficult to detect until they have already done damage. Maps and pollution alerts help people understand which areas call for protective measures or avoidance.
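In an interface, substitution can be as simple as a fallback chain: if the preferred channel is blocked, try the next one. The device signals below are hypothetical, but the ordering captures the idea of offering an alternative modality rather than failing silently.

```typescript
// Pick an output channel based on what the current context allows.

interface DeviceContext {
  screenVisible: boolean;   // false if the phone is pocketed or the screen is off
  silentMode: boolean;      // true if audio is suppressed
  hapticsAvailable: boolean;
}

type Channel = "visual" | "auditory" | "haptic";

function chooseChannel(ctx: DeviceContext): Channel | null {
  if (ctx.screenVisible) return "visual";     // preferred modality
  if (!ctx.silentMode) return "auditory";     // substitute sound for sight
  if (ctx.hapticsAvailable) return "haptic";  // substitute touch for both
  return null;                                // nothing suitable: queue it for later
}

console.log(chooseChannel({ screenVisible: false, silentMode: false, hapticsAvailable: true }));
// -> "auditory"
```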

Shifts, Interruptions, and Flow

When we experience cognitive fatigue and can no longer absorb or dismiss new information, or another activity takes priority, an interruption occurs. How serious this interruption is depends on the activity. Someone else talking might make you forget what you were going to say, or it might cause you to miss the fact that the traffic light has changed. We have a host of social norms around interruptions, especially ones we generally want, like phone calls, text alerts, alarms of all sorts, and the bells or music used by ice cream trucks.

The point is, interruptions happen, and in some cases are desirable. No matter how focused we are, how intent on remaining single-minded, we naturally shift. Whether that’s experienced as a momentary digression or a more disruptive break depends on the circumstances. Being aware of likely shifts and possible interruptions, or even orchestrating them as needed, is important. Making a safe path out and a clear path in should be high in designers’ minds.

Here are some techniques for managing attention effectively and respectfully:

Maintaining focus

Reinforce

  • Deliver information using more than one sense for high-priority information or learning experiences.

Pace

  • Don’t overtax a user’s cognition, but deliver just-in-time information, especially in experiences that require more focused effort and attention or that span multiple modalities.

Block

  • Shifting between different modalities or different types of sensory information within a single modality can be challenging. Try to organize similar forms of sensory information together or maintain a modality or multimodal set consistently through an experience.

Dealing with interruptions

Safety exits

  • This is the experiential equivalent of “degrading gracefully”: a realization that interruptions will happen, so make sure that the way out of a modality is safe whenever possible.

Ease of re-entry

  • Provide threads that are easy to pick back up again to avoid that “what was I doing?” moment of hesitation.

Off-switch

  • Allow users the ability to decline interruptions, either for a period, or permanently.

Allowing shifts

Social or ecosystem norms

  • It’s important to make high-priority interruptions quickly and easily identifiable. Meeting user expectations around these kinds of cues can be helpful, but cues that are overused become ignored. It’s also important to consider the impact an interruptive cue can have within a social setting.

Priming

  • Well-designed interruptions can help users transition their attention more effectively. Certain kinds of interruptions tap into the startle response, like alarms and sirens, prompting people to act immediately. Lower-priority interruptions, however, can be calmer and more gradual, allowing people the choice to shift their attention or ignore the cue. Some audio alerts slowly fade in. Some alerts have a “pre-alert” to prepare users for the more noticeable stimuli that follow. This can also give people more response time, allowing them to more fully comprehend these interruptions and respond more effectively. A brief sketch of a fading-in alert follows this list.
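Here is a minimal sketch of that kind of primed, lower-priority alert using the Web Audio API: the tone fades in over a second and a half instead of arriving at full volume. The frequency and timing values are illustrative, and browsers typically require a user gesture before audio can start.

```typescript
// Fade an alert tone in gradually so it primes attention instead of startling.

function playPrimedAlert(fadeInSeconds = 1.5): void {
  const ctx = new AudioContext();
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();

  osc.frequency.value = 660; // a gentle tone rather than a siren
  osc.connect(gain);
  gain.connect(ctx.destination);

  // Start near-silent, then ramp up; exponential ramps cannot start at zero.
  gain.gain.setValueAtTime(0.0001, ctx.currentTime);
  gain.gain.exponentialRampToValueAtTime(0.5, ctx.currentTime + fadeInSeconds);

  osc.start();
  osc.stop(ctx.currentTime + fadeInSeconds + 1); // hold briefly, then end
}
```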

Feedback and Validation

Our senses are pretty invisible to us, disappearing behind all the information they provide. That’s why it helps to have multiple sources that confirm each other—for example, when a sound comes from a moving object, that sound helps us to locate it and predict where it might be next, allowing us to both see and hear it better. We cross-reference between current sensory stimuli and previous stimuli to validate knowledge and train our senses to better filter information.

Body language and physical engagement

There is a large body of work around kinesics, a term coined by Ray Birdwhistell that describes the use of gesture, posture, and movement to communicate—basically, what’s known as “body language.” Gestural outputs, whether conscious or unconscious, got a big boost when Apple started using accelerometers to integrate gestural interfaces on the iPhone. The Xbox Kinect added a camera for a similar purpose.

We have become accustomed to using a keyboard and mouse or an on-screen keyboard and touchscreen together. Skills around other, less common interfaces can be less developed. Familiarity is a double-edged sword. Tapping into existing mental models can make new interfaces easy to learn. They can also get in the way of adopting interfaces that are different but might work better. There is no such thing as an intuitive interface. There are only interfaces that tap into pre-existing knowledge and skills—implicit memory in particular—developed through other activities or means. If it feels intuitive to you, challenge yourself to think about how you developed this pre-existing knowledge or skill and consider whether your users have had the same opportunity.

Metaphors have been put to good use in screen-based design, because we are good at reusing our existing behaviors and applying them to new ones. Understanding what skills people already have is key to developing new types of physical interfaces. Using a steering wheel, or using drawing tools like a pencil—these are good sources of inspiration. Physical interfaces might not require metaphor, but may reuse some of that same physical knowledge. Be cautious assuming that people know how to apply their existing physical skills to new kinds of interactions. Test, measure, learn, and iterate is a good approach for developing new interface modes. As with all new things, you may find a few key hacks and variations that no one saw coming.

Summary

Modalities are patterns of perception, cognition, and action that enable our behaviors. They allow us to focus on important sensations, filter out those that are less important, and adjust our senses to understand what is happening. Multimodalities combine two or more modalities to enable more complex behaviors. When barriers exist to one modality, we make use of substitutions and translations. There are important rules of thumb for designers working with multimodalities. Those include principles such as respecting cognitive load, supporting focus, maintaining flow, and dealing with interruptions.

1 David Eagleman, The Brain: The Story of You, Vintage, 2015, Figure 2-5.

2 Alberto Gallace and Charles Spence, In Touch with the Future: The Sense of Touch From Cognitive Neuroscience to Virtual Reality, Oxford University Press, 2014.

3 Barry E. Stein and M. Alex Meredith, The Merging of the Senses, MIT Press, 1993.

4 Stein and Meredith, p.15.

5 Stein and Meredith, p. xi.

6 David Eagleman, The Brain.

7 Peggy A. Houglum, Therapeutic Exercise for Musculoskeletal Injuries, Third Edition, Human Kinetics, 2010.

8 Michael Harré, Terry Bossomaier, and Allan Snyder, “The Perceptual Cues that Reshape Expert Reasoning,” Scientific Reports, 2012, https://www.nature.com/articles/srep00502.

9 Kahneman, p.23.
