Chapter 7

Behavioral and Physiological Metrics


During a usability study, most participants do much more than complete tasks, respond to questions, and fill out questionnaires. They may laugh, groan, smirk, grimace, smile, fidget in their chair, look aimlessly around the room, or drum their fingers on the table. They feel a wide range of emotions such as stress, excitement, frustration, and surprise. Certain elements of the product grab their attention, while others are completely ignored. Many of these behaviors and emotions are measurable and offer valuable insights into the user experience of the product being tested. This chapter discusses metrics related to unprompted verbal expressions, eye tracking, emotional engagement, and stress.

7.1 Observing and Coding Unprompted Verbal Expressions

Unprompted verbal expressions provide valuable insight into a participant's emotional and mental state while they are using a product. The participant will probably make many comments without being asked, some negative ("This is hard" or "I don't like this design") and some positive ("Wow, this is much easier than I expected" or "I really like the way this looks"). Some comments are neutral or just hard to interpret, such as "This is interesting" or "This is not what I expected."

The most meaningful metric related to verbal expressions is the ratio of positive to negative comments. To do this type of analysis, you first need to catalog all verbal expressions or comments and then categorize each one as positive, negative, or neutral. Once this is complete, simply look at the ratio of positive to negative comments, as illustrated in Figure 7.1. Only knowing that positive comments outnumbered negative comments by a 2:1 ratio does not say a lot by itself. However, it’s much more meaningful if the ratios are compared across different design iterations or between different products. For example, if the ratio of positive to negative comments has increased significantly with each new design iteration, this would be one indication of an improved design. Also, if a participant is interacting with more than one design, the same ratio can be calculated for each individual participant, assuming of course that the time spent with each product is the same.
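As a minimal illustration of this tabulation (the function and sample data here are hypothetical, not from any particular tool), the ratio can be computed from a simple list of coded comments:

```python
from collections import Counter

def comment_ratio(coded_comments):
    """Return counts and the positive-to-negative ratio for coded comments.

    coded_comments is a list of labels such as "positive", "negative",
    or "neutral" assigned by the researcher.
    """
    counts = Counter(coded_comments)
    positives = counts.get("positive", 0)
    negatives = counts.get("negative", 0)
    ratio = positives / negatives if negatives else float("inf")
    return counts, ratio

# Example: comments coded for two design iterations (hypothetical data)
design_a = ["positive", "negative", "negative", "neutral", "positive", "negative"]
design_b = ["positive", "positive", "negative", "positive", "neutral"]

for name, data in [("Design A", design_a), ("Design B", design_b)]:
    counts, ratio = comment_ratio(data)
    print(name, dict(counts), "positive:negative =", round(ratio, 2))
```

The same tabulation can be run per participant or per task, which makes it easy to compare ratios across design iterations.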

image

Figure 7.1 Example of coding the percentage of positive, neutral, and negative comments for two different designs

It’s also possible to get more granular by differentiating among different types of unprompted verbal comments, such as the following:

• Strongly positive comments (e.g., “This is terrific!”)

• Other positive comments (e.g., "That was pretty good.")

• Strongly negative comments (e.g., “This website is terrible!”)

• Other negative comments (e.g., "I don't much like the way that worked.")

• Suggestions for improvement (e.g., "It would have been better if…")

• Questions (e.g., "How does this work?")

• Variation from expectation (e.g., “This isn’t what I was expecting to get.”)

• Stated confusion or lack of understanding (e.g., “This page doesn’t make any sense.”)

• Stated frustration (e.g., “At this point I’d just leave the website!”)

These types of data are analyzed by examining the frequency of comments within each category. As in the previous example, comparing across design iterations or products is most useful. Categorizing verbal comments beyond just positive, negative, or neutral can be challenging. It's helpful to work with another UX researcher to reach some level of agreement about how to categorize each comment. Make good use of video recording; even the best note takers can miss something important. Also, we recommend viewing these comments within a larger context. For example, if a participant says that they would never use the product under any circumstances, yet says something positive about the colors, this needs to be accounted for in other metrics, as well as in how the findings are presented. While these metrics are seldom collected because doing so is fairly time-consuming, they can offer valuable insight into the underlying feelings about a particular design.
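If two researchers code the same set of comments, their level of agreement can be quantified. The chapter does not prescribe a particular statistic; as a hypothetical sketch, Cohen's kappa is one standard chance-corrected measure of agreement between two coders:

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders' category labels."""
    assert len(codes_a) == len(codes_b)
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(codes_a) | set(codes_b))
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned independently by two researchers
coder1 = ["negative", "positive", "question", "negative", "suggestion"]
coder2 = ["negative", "positive", "confusion", "negative", "suggestion"]
print(round(cohens_kappa(coder1, coder2), 2))  # agreement on 4 of 5 comments
```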

7.2 Eye Tracking

Eye tracking in user research has become more common over the past few years. This is due in part to improvements in the systems themselves, particularly in ease of analysis, accuracy, and mobile technology (in the form of goggles), as well as to new webcam-based technology.

7.2.1 How Eye Tracking Works

Although a few different technologies are used, many eye-tracking systems, such as the one shown in Figure 7.2, use some combination of an infrared video camera and infrared light sources to track where the participant is looking. The infrared light sources create reflections on the surface of the participant’s eye (called the corneal reflection), and the system compares the location of that reflection to the location of the participant’s pupil. The location of the corneal reflection relative to the pupil changes as the participant moves his eyes.

image

Figure 7.2 An eye-tracking system from SMI (www.smivision.com). Infrared light sources and an infrared video camera are directly below the monitor. The system tracks the participant’s eyes automatically in real time.

The first activity in any eye-tracking study is to calibrate the system by asking the participant to look at a series of known points; the system can then interpolate where he is looking based on the location of the corneal reflection (see Figure 7.3). Typically the researcher can check the quality of the calibration, usually expressed as degrees of deviation in the X and Y visual planes. Deviations of less than one degree are generally considered acceptable, and less than half a degree is very good. It is critical that the calibration is satisfactory; otherwise the eye-movement data should not be recorded or analyzed. Without a good calibration there will be a disconnect between what the participant is actually looking at and what you assume he is looking at. Following calibration, the moderator makes sure the eye-movement data are being recorded. The biggest issue tends to be participants who move around in their seat. Occasionally the moderator needs to ask the participant to move back/forward or left/right, or to raise/lower their seat, to recapture the participant's eyes.

image

Figure 7.3 An example of SMI software used to run an eye-tracking study and monitor eye movements in real time. The three windows contain study details (left), stimuli being tracked (top right), and eye being tracked (bottom right).

Information provided by an eye-tracking system can be remarkably useful in a usability test. Simply enabling observers to see where the participant is looking in real time is extremely valuable. Even if you do no further analyses of eye-tracking data, just this real-time display provides insight that would not be possible otherwise. For example, assume a participant is performing a task on a website and there’s a link on the homepage that would take him directly to the page required to complete the task. The participant keeps exploring the website, going down dead ends, returning to the homepage, but never reaching the required page. In a situation like this, you would like to know whether the participant ever saw the appropriate link on the homepage or whether he saw the link but dismissed it as not what he wanted (e.g., because of its wording). Although you could subsequently ask participants that question, their memory may not be completely accurate. With an eye-tracking system you can tell whether the participant at least fixated on the link long enough to read it.

7.2.2 Visualizing Eye-Tracking Data

There are many ways to visualize eye-tracking data. These visualizations tell the story about where people were looking and when. They might be the only thing that your stakeholders really care about. All eye-tracking visualizations are either at an individual level, showing eye movements for one participant, or at an aggregate level, showing eye movements for more than one participant.

Webcam-Based Eye Tracking

New technology has been developed that allows UX researchers to run eye-tracking studies remotely by taking advantage of the participant's webcam. Webcam-based eye tracking operates under the same premise as more traditional systems. However, instead of using an infrared signal, a webcam recognizes the participant's eyes, specifically the movement of the pupil, to determine where on the stimulus the participant is fixating. Vendors such as EyeTrackShop (www.eyetrackshop.com) provide web-based eye-tracking services, which include setting up the study, storing the data, and providing the analysis and a report. Participants first agree to allow their webcam to be used for the study and then go through a calibration step prior to running the study. Figure 7.4 is an example screen that the participant would see during the setup process. As in any eye-tracking study, different images or visual stimuli are shown to the participants, along with the option to add different survey questions. This technology has the potential to be very useful for UX researchers in that eye-movement data can now be collected from a large number of participants, over a short amount of time, regardless of geography. For example, advertisers are now able to test ad effectiveness with a statistically reliable sample size, across many different markets. Data from an EyeTrackShop study clearly show that the "Devil Ad" is more effective than the other two ads with respect to drawing visual attention (see Figure 7.5).

image

Figure 7.4 Example of the setup procedure using EyeTrackShop. Participants are required to have their face within the profile to ensure proper calibration.

image

Figure 7.5 Ad effectiveness study using EyeTrackShop.com. The top of the screen shows stimuli (with areas of interest), and the bottom of the screen shows basic statistics, such as percentage who noticed each ad, amount of time spent looking at each ad, and how long it took to first notice each ad. The “devil ad” on the left was most effective.

Figure 7.6 shows the series or sequence of fixations that an individual participant made on the Amazon Video website, also known as a scan path. This is perhaps the most common way to visually represent the eye movements of a single participant. A fixation is defined by a pause in the eye's movement within a well-defined area, normally lasting at least 100 msec (1/10th of a second). Fixations are usually numbered to indicate their sequence. The size of each circle is proportional to the duration of the fixation. The saccades, or movements between fixations, are shown by the lines. In Figure 7.6 it is easy to see that the participant focused primarily on the faces, as well as the first "learn more" box (on the far left). Scan paths are an excellent way to show how a participant looked at the page and what elements they saw in what order.
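As a rough, hypothetical sketch of how raw gaze samples become fixations (commercial systems use more sophisticated algorithms), a simple dispersion-and-duration rule consistent with the ~100 msec threshold above might look like this:

```python
def detect_fixations(samples, max_dispersion=30, min_duration=100):
    """Group raw gaze samples into fixations.

    samples: list of (timestamp_ms, x, y) tuples, in time order.
    max_dispersion: maximum spread (in pixels) of x and y within a fixation.
    min_duration: minimum duration in msec (the ~100 msec rule of thumb).
    """
    def close_window(window):
        # Return a fixation dict if the window lasted long enough, else None
        if window and window[-1][0] - window[0][0] >= min_duration:
            return {
                "start_ms": window[0][0],
                "duration_ms": window[-1][0] - window[0][0],
                "x": sum(s[1] for s in window) / len(window),
                "y": sum(s[2] for s in window) / len(window),
            }
        return None

    fixations, window = [], []
    for sample in samples:
        window.append(sample)
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        if (max(xs) - min(xs)) > max_dispersion or (max(ys) - min(ys)) > max_dispersion:
            # Dispersion exceeded: the previous samples may form a fixation
            fix = close_window(window[:-1])
            if fix:
                fixations.append(fix)
            window = [sample]
    fix = close_window(window)
    if fix:
        fixations.append(fix)
    return fixations
```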

Did You Know?

During the saccades, when we’re moving our eyes from one point to another, we’re essentially blind. This is true whether we’re scanning a webpage or reading a book like this one. Of course, we don’t perceive it that way. Our brains are constantly integrating the information from the various fixations to give us the perception of a continuous visual stream of information.

image

Figure 7.6 Example of one individual’s scan path of eye movements on the Amazon Video website.

By far the most common way to visually represent eye movements for multiple participants is through a heat map (see Figure 7.7). In this visualization, the brightest areas (red) represent a greater density of fixations. It is an excellent way to get a sense of which areas of the page attract more (and less) visual attention. Keep in mind that the analysis software allows the researcher to define the scale of what is considered "red" versus "orange," and so on, so a heat map can easily be made to show more or less color. We generally recommend starting with the default settings in most software, but it is worth experimenting with different scales. The opposite visualization is called a focus map, which leaves transparent those areas that received more visual attention and darkens those areas that received little or no visual attention. In some sense, a focus map is more intuitive because it makes it easy to see which areas users ignored, but it is a little less common.
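Conceptually, a heat map is a smoothed two-dimensional density of fixations. A minimal sketch using NumPy, SciPy, and Matplotlib (assuming fixation coordinates and durations are already extracted; the data here are hypothetical) might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter

def heat_map(fixations, width, height, sigma=40):
    """Build a duration-weighted fixation density map for one page.

    fixations: list of (x, y, duration_ms) tuples aggregated across participants.
    width, height: page dimensions in pixels.
    sigma: Gaussian smoothing radius; like the color scale, this choice
           affects how "hot" the map looks, so report it.
    """
    density = np.zeros((height, width))
    for x, y, duration in fixations:
        if 0 <= int(y) < height and 0 <= int(x) < width:
            density[int(y), int(x)] += duration
    return gaussian_filter(density, sigma=sigma)

# Example usage (hypothetical fixation data)
fixations = [(400, 300, 250), (410, 310, 180), (800, 150, 400)]
plt.imshow(heat_map(fixations, width=1280, height=1024), cmap="hot")
plt.colorbar(label="Relative visual attention")
plt.show()
```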

image

Figure 7.7 Example of a heat map of the Amazon Video website showing the distribution of eye movements across all participants in the study. The brighter areas as shown in red, orange, and yellow received relatively more visual attention.

7.2.3 Areas of Interest

The most common way to analyze eye-tracking data is by measuring visual attention on specific elements or regions. Most researchers are interested not just in how visual attention is distributed across a web page or scene, but in whether participants noticed certain things and how much time they spent looking at them. This is particularly the case in marketing, where the success of an ad campaign is tied directly to getting customers to notice something. It also matters when certain elements are critical to task success or to having a positive experience; when users don't see them, you can be sure that is a problem.

Figure 7.8 is an example of how to define specific regions on the page. These regions are typically referred to as "look zones" or "areas of interest" (AOIs). AOIs are essentially the things you want to measure, defined by a set of x,y coordinates. When analyzing time spent looking at different regions, keep the following in mind (a minimal sketch of computing dwell time per AOI appears after this list):

• Define each region carefully. Ideally, there will be a small amount of white space in between regions to make sure the eye movements don’t get caught in between two AOIs right next to each other.

• Each region should be fairly homogeneous, such as navigation, content, ads, legal information, and so forth. If you prefer to subdivide your AOIs into individual elements, you can always aggregate the data later on.

• When presenting data by AOIs, the question of where participants actually looked within each region typically comes up. Therefore, we recommend also including a heat map, as in Figure 7.7, that shows the continuous distribution of fixations.
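As the minimal sketch referenced above (AOI names and coordinates are hypothetical), dwell time per AOI can be computed by testing each fixation against rectangular AOI bounds:

```python
def dwell_time_by_aoi(fixations, aois):
    """Sum fixation durations falling inside each rectangular AOI.

    fixations: list of (x, y, duration_ms) tuples for one participant.
    aois: dict mapping AOI name to (left, top, right, bottom) pixel bounds.
    """
    totals = {name: 0 for name in aois}
    for x, y, duration in fixations:
        for name, (left, top, right, bottom) in aois.items():
            if left <= x <= right and top <= y <= bottom:
                totals[name] += duration
                break  # assumes non-overlapping AOIs
    return totals

# Hypothetical AOIs and fixations for one participant
aois = {
    "navigation": (0, 0, 1280, 120),
    "hero_image": (100, 140, 900, 600),
    "ads": (920, 140, 1280, 600),
}
fixations = [(300, 60, 220), (500, 400, 450), (1000, 300, 180)]
print(dwell_time_by_aoi(fixations, aois))  # dwell time in msec per AOI
```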

image

Figure 7.8 Example of the Amazon Movies website with AOIs showing summary statistics for each AOI.

Another useful way to analyze eye movement data by AOIs is through a binning chart (see Figure 7.9). A binning chart shows the percentage of time spent looking at each AOI by some time interval. Keep in mind that the percentages might not add up to 100% unless all the available space is represented within an AOI. Figure 7.9 shows that AOI 1 (green) received more visual attention in the first few seconds relative to the last few seconds. Conversely, AOI 2 (gray) received more visual attention in the last few seconds compared to the first few seconds. This is a useful way to see the relative prominence of each AOI, not just expressed as a total amount of time. Figure 7.10 is a gridded AOI that shows the amount of visual attention given to equal-sized cells. This is a helpful visualization to see the visual attention across a page, particularly when the elements are not consistent across all pages. For example, the researcher may choose to aggregate data from more than one web page into a single gridded AOI to see generally where users are looking.
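A binning chart can be derived from time-stamped fixations by splitting the viewing period into fixed intervals. Here is a minimal sketch assuming 1-second bins and the same hypothetical rectangular AOIs used above:

```python
def binned_attention(fixations, aois, total_ms, bin_ms=1000):
    """Percentage of viewing time spent in each AOI within each time bin.

    fixations: list of (start_ms, duration_ms, x, y) tuples.
    aois: dict of AOI name -> (left, top, right, bottom) pixel bounds.
    Returns a list of dicts, one per bin, mapping AOI name -> percent of bin.
    """
    n_bins = (total_ms + bin_ms - 1) // bin_ms
    bins = [dict.fromkeys(aois, 0.0) for _ in range(n_bins)]
    for start, duration, x, y in fixations:
        for name, (left, top, right, bottom) in aois.items():
            if left <= x <= right and top <= y <= bottom:
                # Spread the fixation's duration across the bins it overlaps
                end, t = start + duration, start
                while t < end:
                    b = int(t // bin_ms)
                    if b >= n_bins:
                        break
                    bin_end = (b + 1) * bin_ms
                    bins[b][name] += min(end, bin_end) - t
                    t = bin_end
                break
    return [{name: 100.0 * ms / bin_ms for name, ms in b.items()} for b in bins]
```

Note that, as the text points out, the percentages within a bin need not sum to 100% unless the AOIs cover the entire page.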

image

Figure 7.9 Example of a binning chart of the same Amazon Movies website. The binning chart shows the percentage of time spent looking at each AOI during each 1-second interval.

image

Figure 7.10 Example of a gridded AOI for the Amazon Movies website. The gridded AOI shows the amount of visual attention given to equal-sized cells on the page.

7.2.4 Common Eye-Tracking Metrics

There are many metrics associated with eye-tracking data. The following are some of the most common eye-tracking metrics used by UX researchers. Note that all of these metrics are calculated with respect to specific AOIs. Figure 7.11 is an example of the type of metrics derived from a single AOI.

image

Figure 7.11 Example of common metrics calculated for a single AOI using the SMI software.

Dwell Time

Dwell time is the total amount of time spent looking within an AOI. This includes all fixations and saccades within the AOI, including revisits. Dwell time is an excellent metric for conveying the level of interest in a certain AOI: the greater the dwell time, the greater the level of interest. As a general rule of thumb, a dwell time of less than 100 msec means the participant processed only a limited amount of information, while a dwell time greater than 500 msec generally means the participant had an opportunity to process the information.

Number of Fixations

The number of fixations is simply the total count of fixations within an AOI. The number of fixations, as expected, is strongly correlated with dwell time. Because of this, we typically just report dwell time.

Fixation Duration

Fixation duration is the average time for fixations. Fixation duration typically ranges from 150 to 300 msec. Fixation duration, similar to number of fixations and dwell time, represents the relative engagement with the object. The greater the average fixation duration, the greater the level of engagement.

Sequence

The sequence represents the order in which each AOI is first fixated. The sequence tells the researcher the relative prominence of each AOI within the context of a given task. Sometimes it is very helpful to know which AOIs jump out at users initially and which AOIs receive attention later on. Typically, the sequence is calculated as the average order in which each AOI was visited. Keep in mind that many participants may not have followed that exact order; sequence is just a best estimate. We also recommend looking at a binning chart (see Figure 7.9) as another view of the sequence of AOIs.

Time to First Fixation

In some situations it's helpful to know how long it takes users to first notice a particular element. For example, you may know that users spend only 7 seconds on average on a page, but you want to make sure that a specific element, such as a "continue" or "sign up" button, is noticed within the first 5 seconds. Helpfully, most eye-tracking systems time stamp each fixation (i.e., record the exact time that each fixation occurred).

One way to analyze these data is to take an average of all the times at which the particular element was first fixated. The data should be treated as elapsed time, starting from the initial exposure. The average represents the amount of time taken to first notice the element, for those who did notice it. Of course, it's possible that some participants never noticed it at all, let alone within the first 5 seconds. Therefore, you may end up with misleading data showing an artificially quick time if you do not take all the participants into account.

Revisits

Revisits are the number of times that the eye fixates within an AOI, leaves the AOI, and returns back to fixate within the AOI. Revisits indicate the “stickiness” of the AOI. Do the users fixate and leave the AOI, never to return, or do they keep coming back with their eyes?

Hit Ratio

The hit ratio is very simply the percentage of participants who had at least one fixation within the AOI. In other words, this is the proportion of participants who saw the AOI. In Figure 7.11, 10 out of 13 participants (or 77%) fixated within this particular AOI.
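Pulling the preceding definitions together, a minimal sketch (with a hypothetical data layout) of per-AOI metrics across participants might look like the following. Note that time to first fixation is averaged only over participants who actually fixated the AOI, and the hit ratio reports how many did, which avoids the misleading averages described above:

```python
def aoi_metrics(per_participant_fixations, aoi):
    """Compute common AOI metrics across participants for one rectangular AOI.

    per_participant_fixations: list (one entry per participant) of lists of
        (start_ms, duration_ms, x, y) fixations in chronological order,
        with each participant's clock starting at t = 0.
    aoi: (left, top, right, bottom) pixel bounds.
    """
    left, top, right, bottom = aoi
    inside = lambda f: left <= f[2] <= right and top <= f[3] <= bottom

    dwell_times, first_fixations, revisit_counts, hits = [], [], [], 0
    for fixations in per_participant_fixations:
        flags = [inside(f) for f in fixations]
        if not any(flags):
            continue  # this participant never saw the AOI
        hits += 1
        dwell_times.append(sum(f[1] for f, hit in zip(fixations, flags) if hit))
        first_fixations.append(next(f[0] for f, hit in zip(fixations, flags) if hit))
        # A revisit = re-entering the AOI after having left it
        entries = sum(1 for prev, cur in zip([False] + flags, flags) if cur and not prev)
        revisit_counts.append(entries - 1)

    n = len(per_participant_fixations)
    return {
        "hit_ratio": hits / n,
        "mean_dwell_ms": sum(dwell_times) / hits if hits else 0,
        "mean_time_to_first_fixation_ms": sum(first_fixations) / hits if hits else None,
        "mean_revisits": sum(revisit_counts) / hits if hits else 0,
    }
```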

Can You Trust What People Say They Saw in a Usability Test?

Albert and Tedesco (2010) ran an experiment in which they used eye tracking to test whether usability test participants report what they see accurately. In this study, participants looked at a series of website homepages. After being shown each homepage, the moderators pointed out a specific element. Half of the participants indicated whether they had looked at the element using three possible answers (did not look at the element, not sure, or did look at the element). The other half of the participants used a five-point scale for how much time they spent looking at the element (from "no time at all" up to "a lot of time"). Results showed that, in general, the eye movements were consistent with what the participants reported seeing. However, in about 10% of the cases, a participant claimed to have "definitely seen" an element on which the eye-movement data showed no fixations. In the second group, in about 5% of the cases participants said they "spent a long time looking at" an element yet did not have any eye fixations on it. Together, these results suggest that participants' self-reports of what they looked at during a usability test are reasonably reliable but certainly not perfect.

7.2.5 Eye-Tracking Analysis Tips

Over the years we have learned a few things about how to analyze eye-tracking data. Above all else, we strongly recommend that you plan your study carefully and take time to explore the data. It's very easy to draw the wrong conclusion based on a few heat maps. Here are a few other important tips to keep in mind as you dive into the data.

• Control the amount of exposure time for each participant. If participants did not see the same image or stimuli for the same amount of time, predefine the analysis window to include only the first 10 or 15 seconds, or whatever duration makes the most sense given the context.

• If you are not able to control for exposure time, analyze dwell time as a percentage, not as an absolute. If one person spent 10 seconds on a page and another spent 1 minute, their eye movements will differ, as will the actual amount of time spent looking at each element.

• Only look at time data when the participant is engaged with the task. Do not include any time data when the participant is debriefing about her experience and still being tracked.

• During the study, make sure that the participants are being tracked. Monitor their eye movements in real time. As soon as they start to slouch or turn their head, remind them gently to maintain their original position.

• Be careful when analyzing eye movements on dynamic websites. Websites that change considerably due to ads, Flash, frames, and so on confuse most eye-tracking systems; every new image is essentially treated as a separate stimulus. We strongly recommend that you consolidate as many web pages together as possible, accepting that not every page is exactly identical. Otherwise, you will end up with far too many web pages that were viewed by only a single participant. An alternative is simply to use static images; they are much easier to analyze, but lack an interactive experience.

• Consider using a trigger AOI to control where participants are initially looking at the start of the experiment. A trigger might say “look here to start the experiment.” The text might be in the middle part of the page. After the participant has fixated on the text for a certain number of seconds, the experiment begins. This means that all participants start looking from the same location. This might be overkill for the typical usability test, but should be considered for more tightly controlled eye-tracking studies.

7.2.6 Pupillary Response

Closely related to the use of eye tracking in usability studies is the use of information about the response of the pupil. Most eye-tracking systems must detect the location of the participant’s pupil and calculate its diameter to determine where he or she is looking. Consequently, information about pupil diameter is included in most eye-tracking systems. The study of pupillary response, or the contractions and dilations of the pupil, is called pupillometry. Most people know that the pupil contracts and dilates in response to the level of ambient light, but many people don’t know that it also responds to cognitive processing, arousal, and increased interest. Typically the greater the level of arousal or interest, the larger the pupil size.

Because pupil dilation is correlated with so many different mental and emotional states, it’s difficult to say whether pupillary changes indicate successes or failures in everyday usability testing. However, measuring pupil diameter may be useful in certain situations where the focus is on the amount of mental concentration or emotional arousal. For example, if you are interested mainly in eliciting an emotional response to a new graphic on a website, then measuring changes in pupil diameter (from baseline) may be very useful. To do this, simply measure the percentage deviation away from a baseline for each participant and then average those deviations across the participants. Alternatively, you can measure the percentage of participants who experienced dilated pupils (of a certain amount) while attending to a particular graphic or performing a specific function.
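A minimal sketch of the baseline-deviation calculation described above (with hypothetical pupil-diameter samples) might look like this:

```python
def pupil_deviation(baseline_samples, stimulus_samples):
    """Percentage change in mean pupil diameter relative to a participant's baseline."""
    baseline = sum(baseline_samples) / len(baseline_samples)
    stimulus = sum(stimulus_samples) / len(stimulus_samples)
    return 100.0 * (stimulus - baseline) / baseline

# One (baseline, stimulus) pair per participant; diameters in millimeters
participants = [
    ([3.1, 3.0, 3.2], [3.5, 3.6, 3.4]),
    ([2.9, 3.0, 2.9], [3.0, 3.1, 3.2]),
]
deviations = [pupil_deviation(b, s) for b, s in participants]
print(sum(deviations) / len(deviations))  # mean % dilation from baseline
```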

7.3 Measuring Emotion

Measuring emotion is difficult. Emotions are often fleeting, hidden, and conflicted. Asking a participant about what she is feeling through an interview or survey may not always be effective. Many participants tell us what they think we want to hear or simply have difficulty articulating what they are really feeling. Some are even hesitant or afraid to admit their true feelings to a perfect stranger.

Despite the difficulty in measuring emotions, it is still very important for the UX researcher to understand the emotional state of the participant. The participant's emotional state while experiencing something is almost always a concern. Most UX researchers use a combination of probing questions, interpretation of facial expressions, and even body language to infer the participant's emotional state. This may be acceptable for some products; however, it does not always suffice. Some products or experiences are inherently more emotional and have a greater bearing on the overall user experience. Simply think about the range of emotions a participant might experience when calculating how much money he will have when he retires, reading about a health condition he has, or just playing an action game with friends.

There are essentially three different ways to measure emotions: they can be inferred from facial expressions, from skin conductance, or through the use of EEG. This section highlights three different companies that use these three approaches. All of these products and services are currently available commercially.

7.3.1 Affectiva and the Q-Sensor

Based on an interview with Daniel Bender, product manager, Affectiva (www.affectiva.com).

The Affective Computing Research group at MIT’s Media Lab was founded in 1998 by Professor Rosalind Picard Sc.D. in an effort to develop technologies that advance understanding of emotions. The aim of the research group is to restore a proper balance between emotion and cognition in the design of technologies for addressing human needs (http://affect.media.mit.edu/). Picard and coinvestigator Rana el Kaliouby, Ph.D., cofounded Affectiva in April 2009 to commercialize technologies developed at the MIT research group. The first product to come from Affectiva is called the Q Sensor (see Figure 7.12).

image

Figure 7.12 Affectiva’s Q Sensor, a wearable, wireless biosensor.

The Q Sensor is a device worn on the wrist that measures the electrical conductance of the skin known as electrodermal activity (EDA). EDA increases when you sweat—small increases in moisture are associated with increased sympathetic nervous system activity indicating emotional activation or arousal. Three types of activation can lead to increases in arousal: increases in cognitive load, affective state, and/or physical activity. Emotional states associated with EDA increases include fear, anger, and joy. Arousal increases are also associated with cognitive demands and may be seen when you are engaged in problem-solving activity. Our state of arousal—and hence the conductivity of our skin—is lower when we are in a relaxed state or bored.

Researchers in a number of fields are using the Q Sensor to measure sympathetic nervous system activity objectively. One of the initial use cases for the Q Sensor has been in understanding the emotional state of students on the autism spectrum. Individuals with autism spectrum disorders often present neutral facial expressions, despite feeling threatened, confused, or otherwise emotionally distressed. Researchers working with autistic students are reviewing EDA data captured with the Q Sensor to better understand the triggers for emotional outbursts. Eventually, the technology will make its way into the classroom, where it will serve teachers by providing early warning signals that students are becoming stressed without outward displays of distress. This will enable teachers to respond to their students in a timely and appropriate way.

In the area of user experience research, the Q Sensor can be used to help pinpoint moments of excitement, frustration, or increased cognitive load experienced by the participant. The UX researcher establishes a baseline for each participant. Experiences are then compared to that baseline, with particular attention given to peaks in arousal.
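As a hypothetical sketch of this kind of baseline comparison (this is not Affectiva's actual analysis pipeline), flagging moments where EDA rises well above a participant's resting level could be as simple as:

```python
def eda_peaks(timestamps, eda_values, baseline_window=60, threshold=1.5):
    """Flag time points where EDA exceeds the baseline mean by a multiplier.

    timestamps: list of sample times in seconds.
    eda_values: skin conductance samples (e.g., in microsiemens).
    baseline_window: number of initial samples used as the resting baseline.
    threshold: flag samples greater than threshold * baseline mean.
    """
    baseline = sum(eda_values[:baseline_window]) / baseline_window
    return [t for t, v in zip(timestamps, eda_values) if v > threshold * baseline]
```

The flagged times can then be lined up with the session recording to see what the participant was doing when arousal peaked.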

While it is helpful to know what may have triggered an increased level of arousal, arousal alone does not tell the researcher whether the experience was positive or negative; that dimension is known as valence. Picard recognized the need to measure valence objectively and brought Affectiva cofounder el Kaliouby to MIT in January 2007. El Kaliouby's research had focused on measuring facial expressions using computer-vision and machine-learning techniques. This technology matured and was incorporated into Affectiva's second product, the Affdex facial expression recognition system. Affdex is a passive web-based platform that can take streaming video as an input and predict the presence of facial expressions in close to real time. Affdex is being used to measure emotional response to media in online panels and in usability labs. Affdex facial-expression recognition provides an indication of the type of experience associated with the state of arousal.

Facial expressions are captured through a standard web camera on the participant’s computer and time synchronized with data from the Q Sensor. This provides a rich data set, as peaks in arousal can be associated with a positive or negative valence. With Affdex, Affectiva is building the largest database of spontaneously generated facial expressions in the world. This will allow Affectiva to develop more advanced classifiers of different emotions, which will be used to predict increases in sales or brand loyalty. This powerful technology will arm the UX researcher with an additional set of tools to better understand emotional engagement across a wide variety of experiences. Case study 10.5 highlights use of the Q Sensor in the context of using an onscreen and tablet-based textbook.

Relationship among Task Performance, Subjective Ratings, and Skin Conductance

In a study of participants playing a 3D video game (Super Mario 64), Lin, Hu, Omata, and Imamiya (2005) looked at the relationships among task performance, subjective ratings of stress, and skin conductance. Tasks involved playing three different parts of the game as quickly and accurately as possible. Participants played each part (task) for 10 minutes, during which period they could potentially complete the goal (succeed) multiple times. There was a strong correlation between participants’ ratings of how stressful each of the tasks was and their normalized skin conductance (change relative to the participant’s baseline) during the performance of each task. In addition, participants who had more successes during the performance of each task tended to have lower skin conductance levels, indicating that failure was associated with higher levels of stress (see Figure 7.13).

image

Figure 7.13 Data showing subjective ratings of stress (a) and normalized skin conductance (b) for three different tasks in a video game. Both show that Task 3 was the most stressful, followed by Task 2 and then Task 1. Adapted from Lin et al. (2005).

7.3.2 Blue Bubble Lab and Emovision

Based on an interview with Ben van Dongen, CEO and founder, BlueBubbleLab (www.bluebubblelab.com)

Blue Bubble Lab is a media and technology company based in Palo Alto and Amsterdam that focuses on bringing more relevant messages to consumers based on their emotions and behavior. ThirdSight (www.thirdsight.com), a subsidiary of Blue Bubble Lab, has developed a suite of technology products that bring together computer vision, facial expression analysis, and eye tracking. One product, Emovision, is an application that allows the researcher to understand the participants’ emotional state while pinpointing what they are looking at. It is a powerful combination of technologies because the researcher can now draw a direct connection between visual stimuli and an emotional state at any moment in time. This will be invaluable in testing how different visual stimuli produce a range of emotional responses.

Emovision determines the emotional state based on the participants' facial expressions. In the 1970s, Paul Ekman and Wallace Friesen (1975) developed a taxonomy for characterizing every conceivable facial expression. They called it the Facial Action Coding System, and it included 46 specific actions involving the facial muscles. From his research, Ekman identified six basic emotions: happiness, surprise, sadness, fear, disgust, and anger. Each of these emotions exhibits a distinct set of facial expressions that can be reliably identified automatically through computer-vision algorithms. Emovision uses a webcam to identify the facial expressions at any moment in time and then classifies them into one of seven unique emotions: neutral, happy, surprise, sad, scared, disgusted, and puzzled. At the same time, the webcam is used to detect eye movements.

Figure 7.14 shows how the Emovision application works. On the left side of Figure 7.14 the participant’s facial expressions are analyzed. Distinct facial muscles are identified and, depending on their shape and movement, an expression is identified. The right side of the application (Figure 7.14) shows the stimulus that is being viewed and the eye movements. In this case the participant is watching a TV commercial, and fixating (as represented by the red dot) in between the two women. The bottom of the screen shows the emotion (in this case it is happy) and assigns a percentage. The line graph depicts the change in emotion over time. When analyzing these data, the researcher can look at any moment in time and identify the associated emotion(s). Also, the researcher can view the overall mood of the experience by seeing the frequency distribution of all the emotions across the experiment. This might be valuable data when comparing different products.
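A minimal sketch of the kind of summary described here (hypothetical per-frame labels, not the Emovision data format) is simply a frequency distribution of the classified emotions across the session:

```python
from collections import Counter

def mood_distribution(frame_emotions):
    """Percentage of frames classified into each emotion across a session."""
    counts = Counter(frame_emotions)
    total = sum(counts.values())
    return {emotion: 100.0 * n / total for emotion, n in counts.items()}

# Hypothetical per-frame emotion labels from a webcam classifier
frames = ["neutral", "happy", "happy", "surprise", "neutral", "happy"]
print(mood_distribution(frames))  # e.g., {'neutral': 33.3, 'happy': 50.0, ...}
```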

image

Figure 7.14 Example of EmoVision application that incorporates webcam-based eye tracking and facial expression analysis in real time.

One of the more fascinating applications of this technology is being able to target messages to consumers based on their mood. Figure 7.15 is an example of how this technology can be used to capture facial expressions in the real world, determine the overall mood (positive or negative), as well as demographics such as age and gender, and then deliver a targeted message on a digital billboard or other platform.

image

Figure 7.15 Example of how ThirdSight products can be used to deliver tailored messages to consumers based on their mood and other demographics.

7.3.3 Seren and Emotiv

Based on an interview with Sven Krause, key account director, Seren (www.seren.com/)

Seren is a customer experience consultancy based in London. Sven Krause developed a way of measuring a user’s emotional engagement and behavior by combining electroencephalography and eye-tracking data. Seren is applying this technology to a wide variety of contexts, including branding, gaming, service, and website design. Researchers at Seren feel this new technology allows them to gain a more complete picture of the user experience as it measures participants’ unconscious responses to a stimulus.

Seren uses an EEG device developed by Emotiv (www.emotiv.com). EEG measures brain waves, specifically the amount of electrical activity on different parts of the participant’s scalp. Electrical activity is associated with cognitive and emotional states. There is a certain pattern of electrical activity when the participant is in a more excited state relative to a calm state. Also, specific patterns of electrical activity have been associated with other emotional states, such as frustration, boredom, and engagement. EEG technology has been used for many years, for example, helping diagnose patients with epilepsy, sleep disorders, strokes, and other neurological conditions. Only recently has it been applied within the field of marketing and customer experience.

Seren has worked with SMI (www.smivision.com) to integrate the Emotiv headset with their SMI eye tracker. This allows Seren’s researchers to determine what participants are looking at and what triggers their emotional and cognitive state. The integration of both EEG and eye-tracking data is critical, as all data will have a consistent time stamp, allowing the researcher to explore both eye movement and EEG data for a specific event.

Setting up and using their system is fairly straightforward. Participants wear the EEG device on their head, with a series of small conductive pads that contact the scalp and forehead. The EEG device is connected wirelessly to the eye tracker. Baseline measures are taken for a few minutes to allow the participant to get comfortable with the setting. After the researcher feels she has achieved an acceptable baseline, the study begins. Figure 7.16 shows a typical setup. The researcher is monitoring both eye movements and EEG feedback in real time (as shown in Figure 7.17).

image

Figure 7.16 Typical setup at Seren using EEG technology.

image

Figure 7.17 An SMI application that allows the researcher to observe EEG feedback and eye movements in real time.

Electroencephalography data are extremely useful in monitoring the emotional engagement of a participant throughout a session. Results can be used to prompt additional questions or to create “emotional heatmaps” that identify areas that led to a change of the emotional state.

7.4 Stress and Other Physiological Measures

Stress is unquestionably an important aspect of the user experience. Participants might feel stressed when they have difficulty finding important information or when they are unsure about a transaction they are going through. Measuring stress as part of a typical usability study is rarely done because it is hard to pinpoint the causes of stress. Perhaps the participants are nervous being in a lab environment, are worried about not doing well, or just don't like having their stress levels measured! Because it is hard to associate stress levels with the user experience, these metrics must be approached cautiously. However, they can still be valuable in certain situations.

7.4.1 Heart Rate Variability

One of the most common ways to measure stress is heart rate, specifically heart rate variability (HRV). HRV measures the variation in the time intervals between heart beats. Somewhat counterintuitively, having a certain level of variability in heart rate is healthier than having no variability at all. Measuring HRV has become much easier in the last few years, thanks primarily to the obsession with fitness and health and, of course, mobile technology. Many runners and other athletes are interested in measuring their heart rates when running. These athletes are likely to be wearing a device on their chest that measures their heart rate directly, and this information can be sent to almost any device. People who aim to reduce stress in their lives can now use a handful of smartphone apps to help them measure and monitor their stress levels. One popular app, Azumio's Stress Checker (www.azumio.com), allows users to measure their stress levels using their own smartphone. The user gently places a finger over the camera, and the software detects the heart rate and calculates HRV (see Figure 7.18). After about 2 minutes, HRV is computed and a stress score is generated.
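The chapter does not specify which HRV statistic the app computes. As a hypothetical sketch, two common HRV statistics (SDNN and RMSSD) can be derived from the intervals between successive beats:

```python
import math

def hrv_stats(rr_intervals_ms):
    """Compute two common HRV statistics from beat-to-beat (RR) intervals in msec.

    SDNN: standard deviation of the intervals.
    RMSSD: root mean square of successive differences.
    """
    n = len(rr_intervals_ms)
    mean_rr = sum(rr_intervals_ms) / n
    sdnn = math.sqrt(sum((rr - mean_rr) ** 2 for rr in rr_intervals_ms) / (n - 1))
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    rmssd = math.sqrt(sum(d ** 2 for d in diffs) / len(diffs))
    return {"mean_rr_ms": mean_rr, "sdnn_ms": sdnn, "rmssd_ms": rmssd}

# Example: hypothetical intervals recorded over ~2 minutes (milliseconds)
print(hrv_stats([812, 790, 835, 805, 820, 798, 842, 815]))
```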

image

Figure 7.18 Example of Azumio Stress Checker app for the iPhone that measures stress through HRV by detecting heart rate through the camera.

These new apps might be useful for UX research, particularly when evaluating more emotionally charged products, such as those dealing with a person's health or finances. It would be very easy to measure HRV before and after using different designs. It is quite possible that one design would result in greater overall HRV across participants than other designs. We certainly don't recommend this as a sole measure of the experience, but it might offer an additional set of data points and potentially insight into causes of stress in the experience.

7.4.2 Heart Rate Variability and Skin Conductance Research

Several studies have sought to determine whether skin conductivity and heart rate could be used as indicators of stress or other adverse reactions in a usability setting. For example, Ward and Marsden (2003) used skin conductance and heart rate to measure user reactions to two versions of a website: a well-designed version and a poorly designed version. The poorly designed version included extensive use of drop-down lists on the homepage to “hide” much of the functionality, provided impoverished navigational cues, used gratuitous animation, and had occasional pop-up windows containing ads. Heart rate and skin conductance data were plotted as changes from the participant’s baseline data established during the first minute of the session.

Both heart rate and skin conductance decreased for the well-designed website. For the poorly designed site, skin conductance showed an increase over the first 5 minutes of the session, followed by a return to baseline over the final 5 minutes. Heart rate for the poorly designed version showed some variability, but the overall trend was to stay at the baseline level, unlike the well-designed version, which showed a decrease relative to baseline. Both measures appear to reflect greater stress in interacting with the poorly designed site.

Trimmel, Meixner-Pendleton, and Haring (2003) measured skin conductance and heart rate to assess the level of stress induced by the response times for web pages to load. They artificially manipulated page load times to be 2, 10, or 22 seconds. They found significant increases in heart rate as response time (page load time) increased, as shown in Figure 7.19. A similar pattern was found for skin conductance. This is evidence of physiological stress associated with longer response times.

image

Figure 7.19 Data showing the heart rate of participants as they experienced different levels of response time waiting for web pages to load. Wait times of 10 and 22 seconds yielded progressively greater increases in heart rate relative to baseline, indicating physiological stress. Adapted from Trimmel et al. (2003); used with permission.

7.4.3 Other Measures

A few creative researchers have come up with some other techniques that might be appropriate for assessing the user’s level of frustration or engagement while interacting with a computer. Most notably, Rosalind Picard and her team in the Affective Computing Research Group at the MIT Media Lab have investigated a variety of techniques for assessing the user’s emotional state during human–computer interaction. Two of these techniques that might have application to usability testing are the PressureMouse and the Posture Analysis Seat.

The PressureMouse (Reynolds, 2005), shown in Figure 7.20, is a computer mouse with six pressure sensors that detect how tightly the user is gripping the mouse. Researchers had users of the PressureMouse fill out a five-page web-based survey (Dennerlein et al., 2003). After submitting one of the pages, participants were given an error message indicating that something was wrong with their entries on that page. After acknowledging the error message, participants were then taken back to that page, but all the data they had entered had been deleted and they had to reenter it. As illustrated in Figure 7.21, participants who had been categorized as members of a “high-response” group (based on their negative ratings in a usability questionnaire about the online survey) gripped the mouse significantly tighter for the 15 seconds after their loss of data than they did for the 15 seconds before.

image

Figure 7.20 The PressureMouse is an experimental mouse that can detect how tightly the user is gripping it. The plastic overlay (a) transmits pressure to six sensors on the top and sides of the mouse (b). As users become frustrated with an interface, many of them subconsciously grip the mouse tighter. The pressure-sensitive mouse was developed by Carson Reynolds and Rosalind Picard of the MIT Media Lab.

image

Figure 7.21 In this visualization of data from the PressureMouse, the mouse leaves a “trail” on the screen. The thickness of the trail indicates how tightly the participant is gripping the mouse. In this example, the participant is initially gripping with normal pressure while completing the online survey. When he clicked on the “Continue” button (#1), the pressure was still normal, until he started reading the error message, which caused him to grip the mouse tighter (#2). Finally, after dismissing the dialog box and seeing that the data he had entered was now gone, his grip on the mouse got even tighter (#3). Adapted from Reynolds (2005); used with permission.

The Posture Analysis Seat measures the pressure that the user is exerting on the seat and back of the chair. Kapoor, Mota, and Picard (2001) found that they could reliably detect changes in posture on the part of the participant, such as sitting upright, leaning forward, slumping backward, or leaning sideways. These may be used to infer different levels of engagement or interest on the part of the participant. Of course, anyone who has taught can easily see a student’s engagement based on how much they slouch in their seat!

These new technologies have yet to be used in everyday usability testing, but they look promising. As these or other technologies for measuring engagement or frustration become both affordable and unobtrusive, they can be used in many situations in which they could provide valuable metrics, such as designing products for children who have limited attention spans, evaluating users’ patience for download times or error messages, or measuring teenagers’ level of engagement with new social networking applications.

7.5 Summary

This chapter covered a variety of ways to measure user behaviors and emotions. This provides potentially valuable insights into the deeper user experience that is often very easy to miss in the course of a usability test. These tools are becoming much easier to use, more accurate, more versatile and powerful, and even quite affordable. Despite these many advances, we strongly recommend taking advantage of other UX metrics and not relying solely on this technology to tell you everything about the user experience. Here’s a summary of some of the key points to remember.

1. A structured approach to collecting unprompted verbal expressions during a usability test can be very helpful by tabulating the number of positive and negative comments made by participants during each of the tasks.

2. Eye tracking can be a significant benefit in many kinds of usability tests. The technology continues to improve, becoming more accurate, easier to use, and less intrusive. The value is being able to compare the effectiveness of different designs, as well as calculate metrics based on areas of interest. Key metrics include dwell time, time to first fixation, and hit ratio. There are many ways to visualize results from eye tracking, such as heat maps and gridded AOIs.

3. There are three ways to measure emotions: skin conductance, facial expressions, and EEG. Skin conductance measures level of arousal, facial expressions are classified and associated with six basic emotions, and EEG measures brain wave activity with unique signatures tied to specific emotional responses. There are new technologies based on each approach that even integrate eye movement data into their applications. These are powerful new tools used to gain insight into the emotional response of the user.

4. Stress is an important component of the user experience. It is measured most often as heart rate variability. New apps allow the researcher to calculate HRV very easily. However, there are many factors beyond the user experience that affect stress.

5. Other techniques for capturing information about the participant’s behavior, such as a mouse that registers how tightly it is being gripped, are on the horizon and may become useful additions to the battery of tools available for use in usability testing.
