Chapter 3

Testing for Emotional Response

An increasing body of research suggests that establishing product desirability is largely a matter of first impression. A positive emotional response to design aesthetics makes users more likely to overlook or forgive poor usability or limited functionality and can help trigger positive behaviors. With close ties to both visual perception and attitudinal inquiry based on short exposure times, emotional response appears to be ideally suited for adaptation to the five-second method. A template for using the method to test emotional response requires that stakeholders identify critical design values and their opposites, that participants describe the design in their own words, and that rating scales then be used to measure participants’ perception of the identified values.

Keywords

Emotion; emotional design; emotional response; desirability; first impression

Oscar Wilde. Will Rogers. A popular brand of dandruff shampoo.

Each has been referenced as the origin of one of our most closely held modern anxieties: “You never get a second chance to make a first impression.” While its true origin may not ultimately matter, the concept itself has taken a firm hold in the world of product design—and, consequently, in the central concerns of marketing, sales, executive leadership, and any other stakeholder interested in a product’s bottom line. An increased emphasis on creating a favorable first impression means increased pressure on UX and visual designers to define and measure emotional response.

In his book Designing for Emotion, Aarron Walter (2011) describes emotional design as a logical extension of Maslow’s famous hierarchy of human needs, which hypothesizes that satisfying human needs is a sequential process, starting with the lower levels (physiological and safety requirements) and moving on to higher-level needs, such as appreciation of aesthetics. Extended to the use of products (online or otherwise), the central idea is that a product must establish a sense of functionality, reliability, and usability (i.e., addressing the lower-level needs) before a user can recognize its perceptual aspects as contributing to its overall desirability.

However, an increasing number of studies suggest that establishing desirability is largely a matter of first impression and will not necessarily wait for the satisfaction of the lower-level needs. The title of a frequently cited Canadian study (Lindgaard et al., 2006) sets an extremely narrow margin for error: “You have 50 milliseconds to make a first impression.” A 2011 eye-tracking study (Sheng et al., 2011) suggests a slightly more generous—but no less onerous—180 ms. Such evidence points to first impressions—favorable or otherwise—being formed instantaneously, almost entirely via the perceptual senses, and suggests that the risks of ignoring the criticality of first impressions are huge.

Numbers of milliseconds aside, there are two reasons why researchers should care deeply about creating favorable first impressions through design. First, they affect a product or application’s perceived (if not actual) ability to meet the lower-level needs of the user and thus help create immediate connections with users on a personal level. Second, an inability to elicit critical positive emotions (joy, serenity, interest, hope, amusement, inspiration), which forge positive behaviors, means a greater likelihood of eliciting negative emotions (dislike, agitation, indifference, boredom, discouragement, animosity), which forge negative behaviors. If users have a positive first impression of the design aesthetics, they are more likely to overlook or forgive poor usability or limited functionality (Kurosu and Kashimura, 1995), and that impression can go so far as to motivate increased registrations, completed transactions, and the like. With a negative impression, users are more likely to find fault with the interactions of a site or product, even if its overall usability is good and the product offers real value.

There is arguably a fine line between emotional response and first impression, depending on how one chooses to define them. They can be seen as interchangeable concepts, each referring to what one feels upon immediate exposure to a product. Alternatively, they can be differentiated by requiring distinct degrees of meaningful interaction with a product. Whatever the definitions, visual perceptions of a design, and the initial reactions they elicit, undoubtedly influence the connection a user will make with a product—which means that five-second tests (when used appropriately) can play a role in determining whether that connection is positive or negative.

3.1 Common Approaches in Five-Second Tests

Most of the tests analyzed for this book included some degree of emotional response inquiry, usually in the form of one or more questions included within a mixed test format. Typically, these questions made use of the word “feel” in an attempt to tap into a participant’s sense of awareness regarding a design:

• “In one word, how does the design make you feel?”

• “What is the general feeling you get out of the page?”

• “What is your overall feeling when you first see the site?”

• “What does the site’s design make you feel about the company?”

Another common strategy seen in the sample asked respondents to describe first reactions in their own words:

• “What are one or two words you would use to describe the look and feel of the site?”

• “How would you describe the personality of this site?”

• “What was your first impression(s) of this application?”

• “What is the first thing that comes to mind when you viewed this page?”

Less frequently, questions were worded as direct solicitations of whether a design triggers an innately positive or negative reaction:

• “Does the site design appeal to you?”

• “Is there anything particularly gratifying about the design?”

• “What was most off-putting about the design of this web site?”

• “Did something about the design bother you?”

The problem with trying to gauge emotional response this way stems directly from the problems of the mixed, or unfocused, format—i.e., when trying to measure several things in a single test, there is a greater risk of getting suboptimal data, and greater attention must be paid to how a test is constructed. A more effective approach would be to adhere strictly to an attitudinal test format, as outlined in Section 2.2. Very few tests analyzed for this book took this approach, but here’s an example of one that did:

Instructions: “Please view the site to rate its visual style and appearance (Figure 3.1).”

Q1. “Give a rating from 0 to 5 (0=‘ugly,’ 5=‘beautiful’)”

Q2. “Give a rating from 0 to 5 (0=‘shady,’ 5=‘trustworthy’)”

Q3. “Give a rating from 0 to 5 (0=‘dated,’ 5=‘modern’)”

Q4. “Rate your comfort with sending money on this site, 0–5 (0=‘no way,’ 5=‘comfortable’)”

Q5. “Imagine you were a customer of this site and it worked perfectly. Assuming your friends needed this service, based on style alone, how likely are you to recommend the service, 0–5 (0=‘no chance,’ 5=‘very likely’)?”

Figure 3.1 Design image for test consisting solely of emotional response questions.

On the whole, this represents a fairly well constructed test:

• It correctly aligns with, and adheres to, the attitudinal format.

• The instructions are effective, in that they directly address what the participant is expected to do—render an opinion on items of perception.

• The test image has not been optimized to eliminate scrolling completely; however, a quick look at the scrollbars indicates that only a very small percentage of the total image is not visible, so there is enough to judge the design in its totality.

• Q1 and Q3 each ask for the participant’s opinion concerning the page’s aesthetic appeal. They are consistent with the expectation set in the instructions and make effective use of a rating scale featuring opposing design values.

• Q2 and Q4 are constructed similarly to the others, but rather than focusing on different aspects of aesthetic appeal, each speaks to the perception of trustworthiness (Q2 does so directly, while Q4 does so by inference). Trustworthiness is certainly related to emotional response but requires a more nuanced approach to test with any effectiveness. (Chapter 4 will deal with testing for trust and credibility in more detail.)

• Only Q5 is truly problematic, for multiple reasons (use of an unrealistic context, asking to predict future behavior, long and complex wording of the question), and should not be included.

3.2 Iterating a Viable Five-Second Test Approach

With such close ties to both visual perception and attitudinal inquiry based on short exposure times, emotional response appears to be ideally suited for adaptation to the five-second test method. The issue, of course, is how best to execute it. As mentioned previously, a mixed test format risks suboptimal results. A memory dump test might provide some sense of emotional response, but simply asking “what did you remember?” is more likely to result in target identification data.

The attitudinal format is the obvious choice for this type of questioning, but the standard line of questioning can be further enhanced by borrowing from an established method for testing for emotional response. In a series of presentations on measuring emotional response, Hawley (2010) describes a process of first determining brand attributes and their opposites, then using product reaction cards (Benedek and Miner, 2002) to see which positive and negative words test participants use to describe a site or design. Using a survey tool, participants were first shown a page design option, then were asked to describe the design by selecting five adjectives from a list of 60. (They were also given an opportunity to explain why they made the choices they did.) By tabulating the submitted positive and negative attributes per design, researchers can measure how closely a design stimulus achieves the desired reaction.

Obviously, the use of product reaction cards does not fit within the confines of an online five-second test. However, the idea of using descriptor words to confirm or contradict attributes that the stakeholders seek to have aligned with their design can be accommodated in a single test. After some iteration, a modification of Hawley’s approach was developed to provide a template for obtaining an effective mix of qualitative and quantitative emotional response data:

• During test preparation, stakeholders identify two to four design values that they assume or desire the tested design to represent, as well as their opposites. For example, a law firm wanting to present an image of being a reliable and tenacious advocate might use the following value pairs:

• stable/unreliable

• competent/inept

• determined/hesitant

• In the first phase of a test, participants are asked to provide two descriptor words that they believe best describe the design. Doing so allows them to relate their own personal connection to the design in their own words. From an analysis standpoint, it allows the researcher to both (a) divide the descriptors into categories of positive/complimentary, negative/disparaging, or neutral and (b) get direct confirmation or contradiction of any of the established design values, or their opposites.

• In the second phase, participants are prompted to provide a rating for each preidentified design value on a scale, with the desired value on one end and its opposite on the other. This approach puts the participants’ focus on each value for consideration individually and allows them to “plot” their choice easily using a number value.
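To illustrate the analysis in the first phase, the matching of descriptors against value pairs can be sketched in a few lines of Python. The value pairs follow the law-firm example above; the responses are entirely hypothetical:

```python
from collections import Counter

# Value pairs from the hypothetical law-firm example: desired -> opposite
value_pairs = {
    "stable": "unreliable",
    "competent": "inept",
    "determined": "hesitant",
}

# Hypothetical phase-one descriptors (two per participant), lowercased
responses = ["stable", "professional", "competent", "hesitant",
             "boring", "competent", "solid", "inept"]

counts = Counter(responses)

# Direct confirmations of the desired values...
confirmations = {v: counts[v] for v in value_pairs if counts[v]}
# ...and direct contradictions via their opposites
contradictions = {o: counts[o] for o in value_pairs.values() if counts[o]}
```

Descriptors that match neither list (e.g., “professional,” “boring”) would still be hand-categorized as positive, negative, or neutral, as the template describes.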

3.3 Testing the Template

A pilot test of this approach was set up using the home page of a financial services firm. In a purely fictional research scenario, the stakeholders hope that the following design values are confirmed:

• Professional: the design should represent a company that values and insists on competence and skill in everything they do.

• Clear: the design should present information free from clutter and with relative transparency.

• Stimulating: the design should motivate the viewer enough to initiate further investigation of the site and its information.

• Reassuring: the design should elicit a sense of approachability and confidence in the firm as a provider of quality services.

The pilot test was constructed as follows:

Instructions: “You are in the market for a personal financial planner when you come across this site. After viewing a page for 5 s, you’ll be asked for your reaction to the design (Figure 3.2).”

Q1. “What two words would you use to describe the appearance of this site?”

Q2. “Provide a rating for this design, from 1 to 5: 1=Professional, 5=Amateurish”

Q3. “Provide a rating for this design, from 1 to 5: 1=Confusing, 5=Clear”

Q4. “Provide a rating for this design, from 1 to 5: 1=Stimulating, 5=Dull”

Q5. “Provide a rating for this design, from 1 to 5: 1=Intimidating, 5=Reassuring”

Figure 3.2 Emotional response test image, for financial services web site.

Note that the numerical assignments of the design values and their opposites alternate within the scales. In Q2 and Q4, the desired values are assigned “1,” with the opposites assigned “5”; in Q3 and Q5, the assignments are reversed. This approach is not necessary but can help lessen the likelihood of habituation or repetition of answers (see Section 2.7).
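When scale polarity alternates like this, ratings need to be reverse-coded before analysis so that a higher score always favors the desired value. A minimal sketch, assuming the question-to-polarity mapping of the pilot test above:

```python
# True where the desired value sits at 1 (Q2, Q4); False where it sits at 5
desired_at_one = {"Q2": True, "Q3": False, "Q4": True, "Q5": False}

def reverse_code(question, rating, scale_max=5):
    """Return a score in which scale_max always favors the desired value."""
    if desired_at_one[question]:
        return scale_max + 1 - rating
    return rating
```

For example, a raw rating of 1 on Q2 (“Professional”) and a raw rating of 5 on Q3 (“Clear”) both normalize to 5, the best possible score.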

Upon completion of the test, the nonprompted descriptors were visualized in a word cloud (Figure 3.3), using a free tool available at http://www.wordle.net/. In terms of alignment with the stated design values, the data show instances of both corroborative and contradictory descriptors:

• Professional: 7 of 21 respondents used the word “professional” to describe the design.

• Clear: Responses offered little measurement either way on the clarity value, with indications that the design is “clean” and “cluttered” canceling each other out.

• Stimulating: Some responses contradicted the stated design value of “stimulating,” indicating instead that the design is “bland” or “vanilla.”

• Reassuring: This value was noted in only two responses using the words “trustworthy” and “legitimate”; no responses indicated any sentiment to the contrary.

Figure 3.3 Word cloud representation of emotional response descriptors, n=21.

Overall, 11 of the 32 nonprompted descriptors were categorized as positive, another 11 as neutral, and the remaining 10 as negative. With fewer than a third of the unprompted responses being negative, this set of data may be interpreted as indicating an overall positive-to-neutral emotional response to the design.

However, the intent of Q1 (“What two words would you use to describe the appearance of this site?”) was to compile a list of adjectives or descriptors that confirm the positive design values (or indicate design problems by confirming their opposites). As discussed in Section 2.8 regarding the writing of test questions, the wording of Q1 is not specific enough to keep some respondents from deviating from the desired data format or providing answers that are not particularly useful:

• 9 of 21 responses consisted of two descriptors, as expected

• 5 of 21 responses consisted of a single descriptor

• 7 of 21 responses consisted of a phrase of two or more words (“stock photos,” “old school,” “too cluttered,” “nothing comes to mind”)
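When responses are exported from the survey tool, a rough first pass at sorting them by format could be automated as below; note that word count alone cannot catch two-word phrases such as “stock photos,” which would still need manual review:

```python
def classify_response(text):
    """Bucket a free-text answer by the number of words it contains."""
    words = text.strip().split()
    if len(words) == 2:
        return "two descriptors"
    if len(words) == 1:
        return "single descriptor"
    return "phrase or other"
```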

The results of the scale ratings are visualized in a series of graphs (Figure 3.4), in which the shaded areas indicate the user response rates. These results indicate that respondents generally affirmed the design values of “professional” and “clear,” while raising some question as to whether the design is “stimulating” or “reassuring.”

Figure 3.4 Scale results for design value ratings, n=21.

To further test the template, this test was replicated using a content page from the web site of a family health club. In an effort to improve the data for the nonprompted responses, Q1 was rewritten to ask: “What two adjectives would you use to describe the appearance of this site?” The goal in making this change was to cut down on single descriptor responses and multiword phrases, and produce more responses containing two descriptors. The design values were also changed to align better with the atmosphere a family health club might wish to promote:

• professional

• easygoing

• modest

• reassuring

Instructions: “You are researching health clubs when you come across this site. Be prepared to state how the design makes you feel about the club (Figure 3.5).”

Q1. “What two adjectives would you use to describe the general appearance of this site?”

Q2. “Provide a rating for this design, from 1 to 5: 1=Professional, 5=Amateurish”

Q3. “Provide a rating for this design, from 1 to 5: 1=Easygoing, 5=Assertive”

Q4. “Provide a rating for this design, from 1 to 5: 1=Pretentious, 5=Modest”

Q5. “Provide a rating for this design, from 1 to 5: 1=Intimidating, 5=Reassuring”

Figure 3.5 Emotional response test image, for health club web site.

As predicted, the change of wording in Q1 resulted in an improvement in the response data. While there was not nearly as much confirmation of the exact design values and opposites as in the first experiment, instances of responses containing two descriptor words jumped from 9 of 21 responses to 19 of 20 responses. (This result does not imply that a change in wording strategy will guarantee better results, but it does support the idea that wording specificity can increase the chances of getting the data you want.) Of the 38 descriptor words submitted in Q1, 18 were negative, indicating an overall neutral-to-negative response to the design, given the desired design values. The rating scale results for Q2–Q5 (Figure 3.6) showed that respondents generally affirmed the design value of “professional” but also gave support to the opposing values of “assertive,” “pretentious,” and “intimidating.”

Figure 3.6 Scale results for design value ratings, n=21.

With these sets of data in hand, the researcher can not only quantify the success of a design against company and/or stakeholder values but also get a richer idea of how the design is perceived (and point the way to possible improvements) by having respondents describe what they are seeing in their own words. At minimum, this template for using the five-second test for emotional response provides a much more robust means of evaluating first impressions than simply including one or more attitudinal questions in a mixed-format test. However, to reiterate a point made at the outset of the book, the five-second test method should not be considered the be-all and end-all of emotional response testing. While it can help settle design disputes within a product team and get you pointed in the right direction, its results should serve as the starting point for determining whether longer exposures to the design confirm that positive first impressions hold and/or alleviate negative first impressions.

Recommended test template for emotional response testing

1. During test planning, identify up to four values that the company or designer wishes to convey in the page or site design.

2. Test instructions should follow the attitudinal approach for providing opinion-based responses. (Context statements are acceptable, as long as they are not unrealistic.)

3. In the first test question, ask the participant to provide adjectives (no more than three) that best describe the design.

4. All remaining questions ask the participant to assign a rating to a targeted value and its opposite: “Provide a rating for this design, from 1 to 5: 1=corporate value, 5=corporate value opposite.” To guard against habituation, consider alternating the positions of the values and opposites on the rating scales. To compensate for potential ordering effects, consider randomizing the order in which the word pairs are presented.

5. Once the test has concluded, categorize the descriptors submitted for the first question as positive, negative, or neutral; make note of how many provide a direct or indirect match against the design values or opposites.

6. Create graphs that visualize the scale ratings given to the values and opposites in the remaining questions and compare the two data sets to determine the overall emotional response to the design.
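As an illustration only, steps 5 and 6 can be sketched end to end; the category word lists, descriptors, and ratings below are all invented for the example:

```python
from statistics import mean

# Step 5: categorize submitted descriptors against hypothetical word lists
positive = {"professional", "clean", "trustworthy"}
negative = {"bland", "cluttered", "intimidating"}

descriptors = ["professional", "bland", "clean", "plain", "cluttered"]
tally = {"positive": 0, "negative": 0, "neutral": 0}
for word in descriptors:
    if word in positive:
        tally["positive"] += 1
    elif word in negative:
        tally["negative"] += 1
    else:
        tally["neutral"] += 1

# Step 6: mean rating per value pair (ratings already reverse-coded
# so that 5 always favors the desired value)
ratings = {
    "professional/amateurish": [5, 4, 4],
    "clear/confusing": [3, 2, 4],
}
means = {pair: mean(values) for pair, values in ratings.items()}
```

Comparing the descriptor tally against the mean ratings yields the two data sets the template calls for.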

References

1. Benedek, J., Miner, T., 2002. Measuring Desirability: New Methods for Evaluating Desirability in a Usability Lab Setting, paper presented at UPA 2002 Conference, 8–12 July, Orlando, FL. Available from: <http://www.pagepipe.com/pdf/microsoft-desirability.pdf>.

2. Hawley, M., 2010. Rapid Desirability Testing: A Case Study. UXmatters [online]. Available from: <http://www.uxmatters.com/mt/archives/2010/02/rapid-desirability-testing-a-case-study.php>.

3. Kurosu, M., Kashimura, K., 1995. Apparent usability vs. inherent usability: experimental analysis on the determinants of the apparent usability. In: CHI ’95 Conference Companion on Human Factors in Computing Systems, Denver, CO, 7–11 May. ACM, New York, pp. 292–293.

4. Lindgaard, G., Fernandes, G., Dudek, C., Brown, J., 2006. Attention web designers: you have 50 milliseconds to make a good first impression! Behav. Inf. Technol. 25 (2), 115–126.

5. Sheng, H., Lockwood, N. S., Dahal, S., 2011. Eyes Don’t Lie: Understanding Users’ First Impressions on Websites Using Eye Tracking, paper presented at HCI International’13, 21–26 July 2013, Las Vegas, NV, pp. 635–641.

6. Walter, A., 2011. Designing for Emotion. A Book Apart/Jeffrey Zeldman, New York, NY.

Recommended Reading

1. Anderson, S.P., 2011. Seductive Interaction Design. New Riders, Berkeley, CA.

2. Boehm, N., 2010. Organized Approach to Emotional Response Testing. UX Magazine [online]. Available from: <https://uxmag.com/articles/organized-approach-to-emotional-response-testing> (accessed 20.12.13).

3. Inchauste, F., 2011. UX is 90% Desirability. GetFinch [blog]. 10 March 2011. Available from: <http://www.getfinch.com/2011/03/ux-is-mostly-desirability/> (accessed 22.12.13).

4. Norman, D.A., 2004. Emotional Design. Basic Books, New York, NY.

5. Van Gorp, T., Adams, E., 2012. Design for Emotion. Elsevier/Morgan Kaufmann, Boston, MA.

6. Van Schaik, P., Ling, J., 2009. The role of context in perceptions of the aesthetics of web pages over time. Int. J. Hum. Comput. Stud. 67 (1), 79–89.
