Chapter 8. Conclusion

WE HAVE COVERED A lot of ground in this book. We hope you have enjoyed reading it as much as we have enjoyed writing it.

Throughout the book we have been arguing that designers will gain value from, and will also add immense value to, the design of A/B tests. Hopefully, you now understand the basics of A/B testing, how A/B testing enables you to compare one or more versions of a designed service experience, and how A/B testing can help you understand more about your users’ needs. The benefits of conducting A/B experiments include access to a large number of users at one time in their real-world context and the ability to compare many different versions of your design(s) simultaneously. A/B tests can improve product performance for different cohorts of users, and can help you figure out how to attract and retain new users. As the purpose of A/B testing is to evaluate which experiences perform better for users and for your business according to previously defined, relevant metrics (such as which links or buttons are clicked or how long someone stays on a site), we hope that you are now also intrigued by the question of which metrics work best for your business context.

Beyond specific features, products, or business models, though, A/B tests can help you develop an understanding of human behavior more generally; they can help you to hone your skills regarding what works best for your users in your interfaces and in your information and service design. This is the power of A/B testing, and as we pointed out at the beginning of this book with our analog and digital photography example, it is only since the advent of the internet that we have been able to conduct this kind of research.

But, as the old phrase goes, with great power comes great responsibility. In this chapter, we will review some critical concepts from the book, and introduce some reflections on the ethics of experimentation.

A key point we have emphasized is that experimentation and data analysis should always be in service of helping you understand your users more deeply and helping you to become more aware of your users’ needs and the motivations for their actions; it should never dehumanize people, and should never “reduce them to numbers.”

To help you keep focused on a holistic view of your users, we stressed that one of the most important aspects of experimental work is triangulating with other sources and types of data—that is, making sure you draw on results from surveys, interviews, lab experiments, and other methodologies. Experimentation cannot answer all your questions. You should always contextualize the data you collect from individual experiments within a program of design evaluation and user understanding, based on collating results from other experiments. You should also take inspiration for interpretation from, and cross-check your results against, other data you have in the company. This kind of back-and-forth will also provide inspiration for more experiments.

Picking up from the idea of triangulation, as we pointed out in the Preface and Chapter 1, a company benefits greatly when user experience researchers, designers, data scientists, market analysts, frontend developers, systems engineers, and business strategists are in dialogue, when they come together to work on the design of data collection and the sharing of insights. For designers who are interested in this bigger picture, their work remit can extend beyond the design of specific features of a technology or service. We emphasized the value of these conversations and collaborations by pointing out that alliances at the grassroots level of a company can truly facilitate a more holistic and user-centric focus and make your company more successful. We talked about making “data friends” to further develop not only your data literacy but also your business literacy; this is a means of sharing your insights and educating others about the user-centric aspects of the product(s) you are designing. Through data sharing, you and your work can have a broader impact on company values as well as on a specific feature’s success or failure in moving a single business metric. Making such data friends can transform a company from the bottom up, giving voice to your users. Experimental data can be a boundary-crossing asset within a business.

To ground these grand beliefs and claims and get you on your way to engaging more fully with experimentation and experimental data, in the early chapters of the book we introduced you to central concepts that underpin the basics of all experimental design, not just A/B testing.

In Chapter 2 and Chapter 3 we reviewed some basic concepts from experimental science, including statistical ways of establishing reliable and replicable causality, hypothesis generation, sample and test group determination (ways of dividing up all your users into different groups), significance testing, and so on. We laid the foundations for a framework we ourselves use to turn questions into hypotheses, to prioritize hypotheses, and to design experiments to test those hypotheses. We introduced some methods and shared some examples to show you how to think about the results you get from your experiments.
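To make the mechanics of significance testing concrete, here is a minimal sketch of a two-proportion z-test in Python. The scenario, the counts, and the function name are our own illustration (not from this book): we ask whether a variant’s click-through rate differs reliably from the control’s.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_*: number of conversions; n_*: number of users in each group.
    Returns (z_statistic, p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)       # pooled rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided tail of the normal distribution
    return z, p_value

# Hypothetical experiment: 20.0% control CTR vs. 26.0% variant CTR.
z, p = two_proportion_z_test(conv_a=200, n_a=1000, conv_b=260, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p suggests the difference is unlikely to be chance
```

In practice a statistics library (or your experimentation platform) would run this test for you; the sketch is only meant to show what “significant at p < 0.05” is actually computing.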

In Chapter 4, Chapter 5, and Chapter 6 we got even more concrete, covering the basics of designing and running A/B tests. We illustrated all these concepts using examples: first our summer camp metaphor, and then specific examples from our work and from the work of our associates and colleagues.

Finally, in Chapter 7 we discussed the importance of summarizing, sharing, and collating results and sharing those across your company. If you are in a small company, go and talk to people in person. You can also set up regular meetings, create a regular newsletter, and create physical posters to adorn your workspace with results if you share physical office space. Of course, the larger your company, the more planned-out and formal the processes, procedures, and platforms for such sharing may need to be. Many large companies invest a great deal of resources into insights-sharing platforms and portals, and host regularly calendared meetings to facilitate cross-functional as well as cross-team and within-team communication. Communication and collaboration are key to a successful experimentation strategy that foregrounds the value of good, user-centered design.

As we close out this book, we want to remind you that our philosophy about data isn’t that quantitative data experimentally collected has all the answers all the time; we believe the process of data gathering, analysis, and interpretation needs to be conducted judiciously. In the next section we will remind you of some of the pitfalls people have fallen into with A/B testing in the past and also introduce you to some high-level considerations in the area of research ethics. In raising research and data ethics as an important topic, we don’t want to scare you; we just want to point to some of the conversations around data ethics, to note that the areas of research and data ethics are constantly shifting as we learn more about the power of what can be done with data at scale, and to invite you to the conversation if you are interested.

Ethical Considerations

As we have discussed examples in this book, we have alerted you to be cautious about your experimental design(s), about how many and what kinds of tests to run, and about how you interpret your results. We pointed out that experimental testing and data analysis cannot answer all research and business questions. We also pointed to occasions where running too many experiments can cause problems—“over testing.” We noted that users can become confused if interfaces and interactions shift constantly, and if features that they value are moved or removed. And, if you run too many experiments concurrently, you risk having users exposed to multiple variables at the same time, creating experimental confounds such that you will not be able to tell which one is having an impact on behavior, ultimately muddying your results.
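One common way to avoid exposing users to multiple treatments at once is to assign each user a deterministic hash bucket and carve concurrent experiments out of disjoint bucket ranges. The sketch below shows the idea; the function names, salt, experiment names, and traffic splits are all hypothetical illustrations of ours, not something prescribed by this book or any particular platform.

```python
import hashlib

def bucket(user_id: str, salt: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to a bucket in [0, num_buckets)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

# Hypothetical mutually exclusive traffic split: each experiment owns a
# disjoint slice of the same hashing "layer", so no user ever sees two
# treatments at the same time.
LAYER_SALT = "layer-example"
EXPERIMENTS = {
    "new_checkout_flow": range(0, 30),    # buckets 0-29
    "search_ranking_v2": range(30, 60),   # buckets 30-59; 60-99 held back
}

def assign(user_id: str):
    """Return the single experiment this user belongs to, or None."""
    b = bucket(user_id, LAYER_SALT)
    for name, buckets in EXPERIMENTS.items():
        if b in buckets:
            return name
    return None

print(assign("user-12345"))
```

Because the assignment is a pure function of the user ID and the salt, a user lands in the same group on every visit, and by construction lands in at most one of the two experiments—sidestepping the confound described above, at the cost of slower traffic throughput per experiment.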

Beyond these cautions though, there is one potential risk we haven’t yet discussed about experimentation at scale. Poorly constructed research with human participants can have deleterious effects on your users’ wellbeing.

The area of research and data ethics focuses on understanding and remediating these concerns. While we are strong advocates for designing through data collection and experimentation, we also believe that ethical issues around data collection and experimentation should always be foregrounded when designing experiments. We believe that as advocates for a human-centered approach to crafting great experiences, designers are well equipped to play an important role in driving the ethics of experimentation. To whet your appetite for these considerations, we will briefly introduce some high-level thoughts on the topic of ethics in the next section.

Ethics in Online Experimentation

Ethics in philosophy addresses the moral principles that govern a person’s behavior. Research ethics refers to the rules of conduct necessary for carrying out research responsibly, including experimentation at scale, as with A/B tests. Of course, moral issues rarely have unambiguous right or wrong answers; they involve judgment.

The area of research ethics is always in flux, responding to unexpected ramifications and unforeseen consequences of research engagements. Indeed, there is a long history of concern about experimentation with human participants in modern science, dating back at least as far as the 1930s. Examples include: the Tuskegee Syphilis Study, which ran from 1932 until 1972, where researchers sponsored by the US Public Health Service withheld treatment from 400 African American men, even when penicillin became widely available; the CIA’s studies throughout the 1950s and 1960s on mind control research that included administering LSD to research participants who were not informed as to what they were being given; and Stanley Milgram’s now very famous 1960s “electric shock” experiments that showed many people are willing to do morally questionable things when ordered to do so by an authority figure. In 1974 Congress passed the National Research Act, and in 1979 the National Commission released The Belmont Report, which lays out guidelines for ethical research on human subjects.

As authors of this book, we personally advocate that anyone whose work touches people’s lives should take time to be aware of legal and ethical concerns and engage in company discussions around the ethics of data collection and use. As much as we have focused in this book on designing rigorous experiments to understand the impact of your work on human behavior, it is equally important to consider whether your users are inconvenienced or exposed to potential harm as a result of these endeavors. As you think through potential issues, you will need to make a judgment call as to whether the research is justified or not.

We aren’t suggesting you become an expert in research ethics, unless you wish to investigate the area further for your own interest and satisfaction. Yet the internet industry is gaining better insight into the potential hazards posed to users by poor experimentation, poor data management practices, and flawed data analysis. As an industry, we are becoming more aware of the importance of exploring and addressing ethical questions up-front. Our aim here is neither to scare you away from this type of research, nor to undermine its many benefits. Rather, we hope to empower you with a nuanced view of both the benefits and potential detriments of A/B testing so that you can be an active participant in the discourse around experimenting at scale with human participants, driving discussions about how to design research that is fair and ethical for participants.

Design Experimentation Versus Social Experimentation

When you are designing experiments, think carefully about how your experimental tests may affect what information people have available to them and how your changes may affect people’s behavior.

For example, there are important differences between what have been called “deception experiments” and the ones we have been advocating for. In our discussion of A/B tests, we have been advocating for the testing of interface- and interaction-specific design characteristics such as flows, layouts, informational features, call-to-action buttons, or explanatory text. We have advocated for and have shared illustrative examples where variants of a design are tested alongside one another in different test conditions in order to compare their effects. We have also introduced a few algorithmic examples with the intention of surfacing the best, most relevant search results in order to help users most successfully find what they are looking for.

In contrast to “surface-level testing,” “power of suggestion” or “deception” examples involve what Raquel Benbunan-Fich calls Code/Deception or C/D experimentation.1 In this kind of experimentation, “the programming code of an algorithm is altered to induce deception”; these are examples of “behavioral experimentation” and the experimental engineering of social relationships and relationship structures. These experiments carry a significantly higher risk of having an impact on people’s lives beyond the interface, beyond the service encounter. To be clear on the distinction, C/D experimentation differs from the kind of feature- and flow-focused A/B testing we have been discussing in this book in that in the former case, socially relevant information is being deliberately manipulated, and what is presented is misleading or inaccurate. Further, in these instances, the social functioning of individuals within their existing or emerging social relationships is affected. By contrast, the manipulation of features and flows in the interface focuses on the performance of the site itself, through evaluation of call-to-action button design, icon and image salience, the effectiveness of different color schemes, the comprehensibility of descriptive text, and so on.

In our experience though, the distinction between surface-level feature testing and social manipulation may not always be crystal clear. So, if a study involves any potential effects that reach far beyond the service interaction you are designing for, caution must be taken.

Two “Power of Suggestion” Experiments

In 2012, the social networking site Facebook conducted a mood manipulation study2 in collaboration with some university academics. For one week in January 2012, data scientists altered what 700,000 Facebook users saw in their newsfeed when they logged in. The emotional valence of the newsfeed was skewed such that some people were shown content with a lot of happy, upbeat words, while others were shown content that contained more sad, downbeat words. Analysis of user behavior after a week of being exposed to these differentially loaded newsfeeds revealed that users who were in the emotional manipulation conditions were affected: there was a kind of emotional “contagion,” such that they were more likely to post content in accord with the emotional valence they’d experienced: happy to happy and sad to sad.

Shortly after the Facebook study became a news sensation, OKCupid, an online dating site, revealed it had conducted experiments on users that had similarly manipulated the user experience.3 For background, OKCupid has very compelling user engagement and onboarding methods in which users fill out extensive surveys about their hopes, fears, and desires, their likes and dislikes, and their values and concerns around possible partners, around dates, and around dating. The results of these surveys are used to estimate degrees of compatibility between users of the site, fueling the recommendations that are made. In the experimental test(s), OKCupid adjusted the compatibility scores on a small test group without notifying participants. Users received altered compatibility scores that suggested they were more or less compatible with others on the site. Some changes were significant—for example, driving compatibility ratings up from 30% to 90% compatible. After the experiment was over, the affected users, who were the unwitting participants in the study, were emailed with details of the accurate compatibility scores. While people may have experienced awful dates or had their hopes raised based on false information, OKCupid was matter-of-fact in their response, publicly stating that they did not know of the relationship outcomes, but that there was a statistically significant increase in communications between (false) positive matches, presumably because people were trying to figure out why the matches were positive.

The Facebook and OKCupid “power of suggestion” experiments both involve instances of deception, whether about compatibility matches, or the emotional valence of content. This type of deception is a major topic of focus for ethics researchers.

Toward Ethical A/B Testing

These experiments and the surrounding controversy underscore that as designers, researchers, and product teams who engage in scientific inquiry through experimentation, we should always be careful to ask questions, to act in the best interests of our users, and to be ethically, scientifically, and legally critical in our practice.

One important consequence of designers becoming involved in the design of experiments is that it reduces the distance between the disciplines of design and research; this means that the area of research ethics will become part of your design practice. Many well-trained user experience researchers and data scientists are required to take ethics classes so they can develop a sensibility for user safety and protection. In keeping with our promise to equip you to participate in the conversations around user protection in experimentation, we wanted to provide you with a brief introduction to the best practices established by more mature fields (including psychology and medicine) for the protection of research participants. As online experimentation grows as a discipline, we would encourage you to take some of these best practices and look for opportunities to apply them to your own A/B testing culture and practice.

Key Concepts

One essential concept in research with human participants is informed consent. This is the process in which a study participant consents to participate in a research project after being informed of its procedures, as well as the potential risks and benefits. Of course, people cannot be truly informed if they don’t understand or can’t decipher the terms and conditions of the services they use. Most online services roll consent to be studied into their “Terms of Service.” However, it is well known that most people don’t read the terms of service, and even if they do, they don’t understand what the terms really mean. The creation of terms of service that are easier to understand is itself an interesting design opportunity. In practice, however, it is not always possible to gain informed consent. This concern may be addressed by asking a smaller but representative group of people how they would feel about taking part in a particular study. If they find the study permissible, it can usually be assumed that the real participants will also find it acceptable. This is known as presumptive consent.

In fields such as medicine and psychology, most researchers offer debriefs after studies. The goal of a debrief is to share with participants the overall purpose of the experiment and what will happen with the results. Often participants are also given the opportunity to give feedback on their experience of the experiment. Finding opportunities to scale the practice of debriefs to online, large-scale experiments is another exciting way to mature the practice of A/B testing. Consider, for instance, connecting with your users through blogs and “trusted tester” programs; you’d be surprised how many people are eager to be part of co-designing the services they use by being active participants in studies.

Central to all research is confidentiality. This means all participant data that is personally identifying should remain confidential. It should not be shared with anyone not directly involved in the experimental logistics, and all experimental data should be cleaned—or “scrubbed”—of anything that would allow a participant to be personally identified. All user data, especially personally identifiable information (PII), needs to be carefully managed following legal guidelines. For A/B testing, this means finding ways to anonymize user data and to avoid designing experiments or logging that requires storing, measuring, or tracking personally identifiable information. Also note that if your company operates in multiple countries, legal restrictions and cultural norms around data collection vary significantly; check the restrictions and regulations about the collection, management, and storage of user data, which can include data from large-scale experiments.
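As one concrete illustration of scrubbing experiment logs, here is a minimal Python sketch of our own (the field names, key, and function are hypothetical, not from this book or any particular logging system). It replaces the raw user ID with a keyed hash and drops known PII fields before an event is stored. Note that this is pseudonymization rather than full anonymization—real compliance work needs legal review, and the key must be kept secret and rotated.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-and-keep-out-of-source-control"  # placeholder key
PII_FIELDS = {"email", "name", "ip_address"}              # fields we never log

def pseudonymize(event: dict) -> dict:
    """Return a copy of an experiment log event that is safer to store:
    the raw user ID is replaced with a keyed hash, and known PII fields
    are dropped entirely."""
    clean = {k: v for k, v in event.items() if k not in PII_FIELDS}
    clean["user_id"] = hmac.new(SECRET_KEY,
                                event["user_id"].encode(),
                                hashlib.sha256).hexdigest()
    return clean

event = {"user_id": "u-42", "email": "jo@example.com",
         "variant": "B", "clicked": True}
safe = pseudonymize(event)
print(safe["variant"], "email" in safe)  # the variant survives; the email does not
```

Because the keyed hash is deterministic, the same user still maps to the same pseudonym across events—so experiment analysis still works—while the stored logs no longer contain the raw identifier or PII.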

The ability to withdraw from an experiment is also usually offered. All participants are given the opportunity to withdraw from an experiment at any time, even in the middle of the experiment. How this best practice would scale to an online A/B test remains an open question—how to let users know they are being observed, without substantially changing their behavior, is, again, a design challenge.

Finally, there is the issue of deception, which we mentioned earlier. Here, participants are intentionally misled in order to test their behavior under certain conditions. Deception can include withholding information, deliberately distorting information, misrepresentation of the value of information, and so on. Deception is a tricky ethical issue, especially where it may have an impact on the user’s life beyond the immediate interactions with your service. Understanding and avoiding deception will have increasingly far-reaching effects, as many online businesses sit increasingly at the intersection of different groups of people (such as hosts and guests, buyers and sellers, or creators and consumers of art, music, culture, etc.), whose relationships and livelihoods depend on their online experiences.

Not all of these practices have been effectively translated into the context of A/B testing—debriefing and the ability to withdraw, for example—but we thought we would give you some ideas as to how the ethical protection of participants has unfolded in the social sciences, in medicine, and in some cases in user research. As A/B testing matures as a discipline, scaling many of these practices will become increasingly pressing. We hope you are excited and empowered to apply your expertise in human-centered design to participate in these discussions going forward.

Asking Questions, Thinking Ethically

Advocating for users—providing them with the best possible experience as they interact with your designed device, application, or service—can feel a bit lofty. We have posed and introduced many broad and challenging questions that will take the collaboration of many folks in the industry to address. However, these ethical considerations can also have an immediate and close-to-home impact on how your business runs online experiments.

Taking the time to ask yourself and your colleagues questions about the ethics of your studies can make ethics a piece of your A/B testing routine. When designing an experiment, we encourage you to think through whether the research has any potential for causing your users any kind of harm. In general, the bottom line is that test participants should not be exposed to any risk greater than those encountered in their normal daily life.

We want you to start thinking this way, thinking ethically, right from the get-go. Here are a few questions to get you started:

  • Are we acting in our users’ best interest?

  • Could this research cause any distress to the research participants?

  • Could they be exposed to any physical harm by taking part in the research? Will the experimental tests we are planning to run cause psychological distress or harm to those who are exposed to our treatments? Does the experimental test risk impeding anyone’s ability to form meaningful relationships or make a living?

  • Could they be shamed, socially embarrassed, offended, or frightened by the test condition?

  • Could the experiment possibly be implemented as, experienced as, or construed as an algorithmic, deceptive, behavioral experiment?

  • Are you releasing the experiment to vulnerable populations such as the elderly, disabled, or children? If so, you may need to make some special accommodations.

  • Consider carefully what spillovers, if any, your experiment has into other online actions and interactions, and into offline behaviors. Ask yourself: could your study have a negative effect on a person’s everyday life circumstances?

We also note that, as a designer–researcher, you often cannot accurately predict the risks of taking part in an A/B study. Over time, however, you will develop a keener sensibility around potential hazards, so you will be able to predict and preempt them in your experimental design phase, in the same way that your intuition for human behavior becomes honed through repeated exposure to A/B testing and experimental outcomes.

We discussed triangulation in the context of making sense of different sources of data together to further your intuition about your users and your product. More broadly, considering different sources of feedback is another way to look for potential ethical concerns. One good source of insight as you hone your sensibilities around experimental design ethics is the feedback that may come into the company through other channels in tandem with the launch of your study. Did your customer service helpline or community managers get an unexpected influx of complaints? As you are learning how to design successful experiments, it may be worth making friends with some customer service experts in your company. Talking with them and reviewing incoming comments from users who may be in an experimental treatment group is another form of data triangulation. In this instance, though, the data triangulation is for the protection of your users as much as for the interpretation of your results.

As mentioned, another area in which you can get involved with your design skills is in the area of the terms of service for the product experiences you are designing. This dovetails with the area of informed consent. Terms of service for many online services legally cover companies for carrying out user testing. Check your company’s terms of service, and review how your users experience their “permissioning” of different kinds of user data collection.

Another area you can get involved in is participation on review boards and committees. Most governmental and educational institutions that engage with human participant research have Institutional Review Boards, commonly shortened to “IRBs.” These committees review proposals to assess whether the potential benefits of the research are justifiable in light of the possible risk of harm to participants. They may request that researchers make changes to the study’s design or procedure, or in extreme cases deny approval of the study altogether. Many large organizations have review boards consisting of experts from research and legal departments who can review, raise questions about potential risks, and suggest alternatives. In smaller companies, we would encourage you to make reviewing your work with your peers and coworkers—for both research methodology and ethical considerations—a necessary component of your workflow.

We hope that you have found this discussion about ethics engaging and empowering. As stated, our goal is certainly not to make you nervous about the risks of experimentation. When executed ethically, it is a powerful tool to deliver great experiences to your users, increase the health of your business, and hone your intuitions as a designer. We believe that adding ethical considerations to your A/B testing practice will make you a better and more effective designer of research and data collection, and let you continue to push toward the goal of making all product development fundamentally human-centered.

Last Words

Throughout this book we have stressed that in the online world of digital services, applications, and devices, design is centrally focused on users, the consumers who use our products. We have shared our belief that successful businesses need to engage with their users, with the people who use their services—a business with no users is not a successful business. We wrote this book because we believe that designers should have a stronger role in user data gathering and user-focused product strategy, and that designers can have an impact on business metric evaluation. Industry-based design practice is focused on designing the best possible experience for users.

Why do we feel these beliefs needed to be explicitly stated?

In the Preface to the book we laid out several concerns we have encountered in our own practice as user-centered design advocates, concerns that led us to write this book. These concerns include worries that experimentation takes the creativity out of design practice. Or that experimentation denies the skill set and authority of the designer. Or that experimentation is too limited to truly offer insights into what is meaningful for users.

Our experience suggests that our perspective is somewhat different from that of many designers, many product managers, and many business strategists. Therefore, we wanted to share it more clearly.

We believe:

  • Design always advocates for users and is accountable to users; good design brings with it a responsibility toward reflecting and addressing user needs through well-designed products and experiences.

  • Design practice needs to be invested in representing users accurately and appropriately, and new methods are always needed to develop an understanding of users and user behaviors. Data is the new currency for business, and a new creative “mulch,” a fertile ground for design insights to be tested, inspired, developed, and extended. Experimentation and experimental data analysis can be an integral part of the design process. We believe that designers are, or should be, fundamentally interested in being disciplined about data and its collection, analysis, and use.

  • A design perspective is needed to ensure that optimal user experiences are appropriately represented in business goals, measures, and metrics. Designers, user experience researchers, and data scientists are the perfect people to call out when metrics that concern users and usage do not make sense, and when there is a gap between the company’s vision and mission and how they land with everyday users. Understanding these gaps, and verifying hunches and informal observations with data, can help you influence business metrics and business strategy.

Throughout the book we have shared examples to demonstrate how gathering data about different design options can furnish evidence not only for deciding between different features but also for making more strategic decisions about the technology or service, and thus about business directions. We have illustrated how experimental investigation through data collection has helped us make local decisions, such as selecting between different features, as well as global decisions, such as business strategy. We consider experimental investigations to be conversational engagements with users.

We also offered three ways of thinking about data: data driven, data informed, and data aware:

  • Data-driven design implies that the data drives all design decisions. As we have shown through the chapters in this book, sometimes collecting experimental data through A/B tests delivers critical design insight. Sometimes collecting data can help you decide between different versions of a similar feature; at other times it can help you decide between whole user flows or service options. This is not always the case, however. Being data driven in this way only works if you’ve done the prior work to establish the questions that need to be addressed as well as the overall goals.

  • Being data informed means that you may not be as targeted and directed in what you are trying to understand. Instead, what you’re trying to do is actually inform the way you think about a problem.

  • Finally, we introduced a term we use every day to be inclusive of all kinds of data gathering, from experimental A/B testing to interview studies—“data aware.” In a data-aware mindset, you are aware of the fact that there are many types of data to answer many questions. Our use of “aware” rather than “driven” or “informed” is intended to underscore the fact that not only is design a creative process but that designing ways to collect data to understand the user experience is also fundamentally a creative process.

As you continue to develop your skills, think about these different levels of engagement of experimentation and data: how data can answer clearly stated questions, how data can inform the way you think about a problem, and how a rich philosophy around and fascination with different kinds of data can enrich your design practice.

As we conclude this book, we’d like to acknowledge that this is not the last word on the topic of A/B testing, and certainly not the last word on the ways in which data and design are intimately related and intertwined, or how they are excellent partners in the creation of services, of products, and of user experiences. We hope that this book has given you foundational knowledge for exploring further, both in practice and in your reading and learning.
