13

An evaluation framework

13.1 Introduction

13.2 DECIDE: A framework to guide evaluation

13.1 Introduction

Designing useful and attractive products requires skill and creativity. As products evolve from initial ideas through conceptual design and prototypes, iterative cycles of design and evaluation help to ensure that they meet users' needs. Deciding when and how to evaluate a product requires careful consideration and may be different for different kinds of products.

The case studies in the previous chapter illustrate some of the approaches used.

The design process starts with the designers working to develop a product that meets users' requirements, but, as you have seen, understanding requirements tends to happen by a process of negotiation between designers and users. As designers understand users' needs better, their designs reflect this understanding. Similarly, as users see and experience design ideas, they are able to give better feedback that enables the designers to improve their designs further. The process is cyclical, with evaluation facilitating understanding between designers and users.

Evaluation is driven by questions about how well the design or particular aspects of it satisfy users' needs and offer appropriate user experiences. Some of these questions provide high-level goals to guide the evaluation. For example, does this product excite users so that they will buy and use it? Others are much more specific. Can users find a particular menu item? Do they interpret a particular graphic as the designers intended, and do they find it attractive? Practical constraints play a big role in shaping how evaluation is done: tight schedules, low budgets, or little access to users constrain what evaluators can do. There are ethical considerations too: medical records must be kept private, for example, as must certain areas of people's homes.

Experienced designers get to know what works and what doesn't. As you have seen in Chapter 12, there is a broad repertoire of evaluation methods that can be tailored for specific circumstances. Knowing and having the confidence to adapt methods is essential. The wide variety of mobile and ubiquitous systems coming onto the market challenges conventional evaluation practices, which must be adapted to provide useful feedback. Therefore, when planning evaluations, evaluators must consider the nature of each product, the kinds of users that will use it, and the contexts of use, as well as logistical issues, such as the budget, the schedule, the skills and equipment required for the evaluation. Planning evaluation studies involves asking questions about the process and anticipating potential problems. Within interaction design there are many books and websites that list different techniques and guidelines for conducting an evaluation, but there is very little overarching guidance for how to plan an evaluation. To help you, we propose the DECIDE framework, which provides a structure for planning evaluation studies.

The main aims of this chapter are to:

  • Discuss the conceptual, practical, and ethical issues involved in evaluation.
  • Introduce and explain the DECIDE framework.

13.2 DECIDE: A Framework to Guide Evaluation

Well-planned evaluations are driven by goals that seek answers to clear questions. These questions may be stated explicitly, upfront, as in usability testing, or may emerge as the evaluation progresses, as in ethnographic evaluation. The way questions are stated also varies depending on the stage of design at which the evaluation occurs. Questions help to determine the kind of evaluation approach that is adopted and the methods used. Practical issues, such as the amount of time available to carry out the evaluation, the availability of participants, and suitable equipment, also impact these decisions. Ethical issues must also be considered, particularly when working with users, and evaluators must have enough time and expertise to evaluate, analyze, interpret, and present the data that they collect. The DECIDE framework provides a checklist to help you plan your evaluation studies and to remind you of the issues you need to think about. It has six items:

  1. Determine the goals.
  2. Explore the questions.
  3. Choose the evaluation approach and methods.
  4. Identify the practical issues.
  5. Decide how to deal with the ethical issues.
  6. Evaluate, analyze, interpret, and present the data.

A list tends to suggest an order in which things should be done. When working with the DECIDE framework, however, it is common to deal with the items iteratively, moving backwards and forwards between them. Each item in the framework is related to the others, so it would be unusual to work through them strictly in sequence.

13.2.1 Determine the Goals

What are the high-level goals of the evaluation? Who wants it and why? An evaluation to help clarify that user needs have been met in an early design sketch has different goals from an evaluation (i) to select the best representation of a metaphor for a conceptual design, (ii) to fine-tune an interface, (iii) to examine how mobile technology changes working practices, (iv) to inform how the next version of a product should be changed, (v) to explore the impact of ambient technology in a social space, or (vi) to investigate what makes collaborative computer games engaging.

Goals guide the evaluation by helping to determine its scope, so identifying what these goals are is the first step in planning an evaluation. For example, we can restate the first general goal statement mentioned above more clearly as:

Check that the sketch indicates that designers have understood the users' needs.

Activity 13.1

Rewrite each of the general goal descriptions above (i–vi) as a goal statement and suggest which case study or part of a case study from Chapter 12 fits the goal statement.

Comment

  1. Identify the best representation of the metaphor on which the design will be based, e.g. HutchWorld, record-keeping system for Indian auxiliary nurse midwives.
  2. Ensure that the interface is consistent, e.g. Olympic Messaging System, HutchWorld.
  3. Investigate the degree to which mobile technology influences working practices, e.g. record-keeping system for Indian auxiliary nurse midwives, perhaps also the Nokia cell phone study, though we are not told this explicitly in the case study.
  4. Identify how the interface of an existing product could be engineered to improve its usability for use by people from a different culture, e.g. the Nokia cell phone study.
  5. Investigate the impact of the technology on social interaction, e.g. the HelloWall.
  6. Determine the nature of collaborative computer games, e.g. the digital hockey game.

In turn, these goals influence the approach chosen to guide the study and the selection of evaluation methods.

13.2.2 Explore the Questions

In order to make goals operational, we must clearly articulate the questions to be answered by the evaluation study. For example, the goal of finding out why some customers prefer to purchase paper airline tickets over the counter rather than e-tickets can be broken down into a number of relevant questions for investigation. What are customers' attitudes to e-tickets? Perhaps they don't trust the system and are not sure that they will actually get on the flight without a ticket in their hand. Do customers have adequate access to computers to make bookings? Are they concerned about security? Does the electronic system have a bad reputation? Is the user interface to the ticketing system so poor that they can't use it? Maybe some people cannot manage to complete the transaction. Maybe some people simply prefer the social interaction with a ticketing agent.

Questions can be broken down into very specific subquestions to make the evaluation even more fine-grained. For example, what does it mean to ask, “Is the user interface poor?” Is the system difficult to navigate? Is the terminology confusing because it is inconsistent? Is the response time too slow? Is the feedback confusing or maybe insufficient? Subquestions can, in turn, be further decomposed if even more specific issues need to be addressed.

Activity 13.2

Imagine you have been asked to evaluate the impact of the HelloWall on users' behavior. Based on what you know about the HelloWall from Chapter 12, write two or three questions that you could investigate.

Comment

You could ask a variety of questions. Here are some that we thought of:

  1. Do users notice the HelloWall?
  2. For those who do notice it, how do they react to it?
  3. Do they explore how it behaves when they are different distances from the wall?
  4. Do they seem to enjoy interacting with it?
  5. Do they tell others about it? If so, what do they say?

13.2.3 Choose the Approach and Methods

Having identified the goals and some questions that you want to investigate, the next step is to choose the evaluation approach and methods that you will use. As mentioned in Chapter 12, the evaluation approach influences the kinds of methods that are used. For example, an analytical evaluation will not use methods that directly involve users, and usability testing will not use ethnography. Which approaches are adopted depends on the questions to be answered and the resources available for performing the study. Practical and ethical issues (discussed next) also have to be considered and trade-offs made. For example, the methods that seem most appropriate may be too expensive, or may take too long, or may require equipment or expertise that is not available, so compromises are needed.

As you saw in several of the case studies discussed in Chapter 12, combinations of approaches and methods are often used to obtain different perspectives. For example, the methods used in field studies tend to involve observation, interviews, or informal discussions with participants. Questionnaires may also be used and so might diary studies in which participants record aspects of their technology usage. Usability testing also involves multiple methods and, as we have already said, is often supplemented with field studies. Each type of data tells the story from a different point of view. Together these perspectives give a broad picture of how well the design meets the usability and user experience goals that were identified during requirements gathering.

Activity 13.3

Which approaches and methods could be used in an evaluation to answer the questions that we provided for Activity 13.2? These were:

  1. Do users notice the HelloWall?
  2. For those who do notice it, how do they react to it?
  3. Do they explore how it behaves when they are different distances from the wall?
  4. Do they seem to enjoy interacting with it?
  5. Do they tell others about it? If so, what do they say?

Comment

A field study is most appropriate because we want to investigate how people react to this new technology being placed in their natural environment. Several methods could be used. Observation, recorded on video or noted by a person, could be used to answer all of these questions and any subquestions. Questionnaires and interviews could also be designed to collect data to answer them.

13.2.4 Identify the Practical Issues

There are many practical issues to consider when doing any kind of evaluation, and it is important to identify as many of them as possible before starting the study. However, even experienced evaluators encounter surprises, which is why it is useful to do a pilot study (discussed in Chapter 7). Issues to consider include access to appropriate users, facilities, and equipment, whether schedules and budgets are realistic, and whether the evaluators have the expertise needed to conduct the study. Depending on the availability of resources, there may have to be compromises that involve adapting or substituting methods. For example, evaluators may wish to perform usability tests with 20 users and then run a three-week field study, but the budget available may only cover the cost of five testers and a shorter field study.

Another example is provided by the Nokia cell phone study, which involved evaluating cell phones in a country where the evaluators did not speak the language fluently and were only slightly aware of cultural norms. In this situation the evaluators had to work out how to collect the data that they needed to answer their evaluation questions. Furthermore, cell phone users are highly mobile, so the evaluators knew that there would be places where the cell phones would be used that they could not go, e.g. bathrooms and bedrooms. During the study the evaluators may also have experienced surprises that required on-the-spot decisions; for example, it may not have been possible to ride in the taxi or car with the user because there was not enough room. Of course, no evaluation is going to be perfect, and a good field study can be done without the evaluator seeing how the product is used 100% of the time, but it is helpful to be aware of the kinds of compromises that may be necessary. Thinking at the planning stage about the users who will be involved, logistical issues such as the availability of equipment, the schedule and the budget, and the expertise needed to perform the study will help to ensure its success.

Users

It goes without saying that a key aspect of an evaluation is involving appropriate users or, in the case of analytical evaluation, focusing on the characteristics of the anticipated user population. When doing usability testing, for example, users must be found who represent the user population for which the product is targeted. This generally involves identifying users with a particular level of experience, e.g. novices or experts, or users with a range of expertise. The number of males and females within a particular age range, cultural diversity, educational experience, and personality differences may also need to be taken into account, depending on the kind of product being evaluated. Questionnaire surveys require large numbers of participants, so ways of identifying and reaching a representative sample of participants are needed. For field studies to be successful, the evaluator needs access to users who will interact with the technology in their natural setting.
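
As a rough illustration of what 'large numbers' can mean for a questionnaire survey, here is a standard rule-of-thumb sample size calculation. This calculation is our own illustrative assumption, not something prescribed in the chapter, and real sampling decisions also depend on the population and how respondents are reached.

```python
import math

# Rule-of-thumb sample size for estimating a proportion from a survey,
# at 95% confidence with a +/-5% margin of error (illustrative assumption).
z = 1.96   # z-score for 95% confidence
p = 0.5    # assumed proportion; 0.5 gives the most conservative estimate
e = 0.05   # acceptable margin of error (+/-5%)

n = math.ceil((z ** 2) * p * (1 - p) / (e ** 2))
print(f"Respondents needed: {n}")  # 385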

Another issue to consider is how the users will be involved. The tasks used in a usability laboratory study should be representative of those for which the product is designed. However, there are no written rules about the length of time that a user should be expected to spend on an evaluation task. Ten minutes is too short for most tasks and two hours is a long time, so what is reasonable? Task times will vary according to the type of evaluation, but when tasks go on for more than 20 minutes, consider offering breaks. It is accepted that people using desktop computers should stop, move around, and change their position regularly after every 20 minutes spent at the keyboard to avoid repetitive strain injury. Evaluators also need to put users at ease so they are not anxious and will perform normally; it is important to treat them courteously. Participants should not be made to feel uncomfortable when they make mistakes. Greeting users, explaining that it is the product that is being tested and not them, and planning an activity to familiarize them with it before starting the task all help to put users at ease in test situations.

In field studies the onus is on the evaluators to fit in with the users and to cause as little disturbance to participants and their activities as possible. This requires practice, and even anthropologists who are trained in ethnographic methods may cause unforeseen changes (see Dilemma box below).

Dilemma: Is it Possible to Study People's Behavior without Influencing It?

A newspaper article describes how an anthropology student traveling through northern Kenya happens by chance to come upon an unknown tribe. He studies their rituals and reports the study in his PhD dissertation and in several articles published in acclaimed journals. The study draws considerable attention because finding an unknown tribe is unusual in this day and age. It is the dream of many anthropologists because it allows them to study a tribe's customs before they are changed by outside influences. Of course, having published his work, the inevitable happens; more anthropologists make their way to the village, and soon members of the tribe are drinking Coke and wearing tee-shirts from prestigious universities and well-known tourist destinations. The Western habits of these outsiders gradually change the tribe's behavior.

Ethnographers face a dilemma: is it possible to study people's behavior without changing it in the process?

Facilities and Equipment

There are many practical issues concerned with using equipment in an evaluation. For example, when using video you need to think about how you will do the recording: how many cameras and where do you put them? Some people are disturbed by having a camera pointed at them and will not perform normally, so how can you avoid making them feel uncomfortable? How will you record data about use of a mobile device when the users move rapidly from one environment to another? Several of the case studies in Chapter 12 addressed these issues; think back to, or reread, the Nokia cell phone study, the Indian auxiliary nurse midwife data collection study, and HutchWorld.

Activity 13.4

The evaluators of the Nokia cell phones described some of the logistics that they needed to consider; what were they?

Comment

The evaluators did not speak Japanese, the language of the users, and they knew that people using cell phones can be fast-moving as they go about their busy lives. Some of the things that the evaluators suggest may be necessary when conducting such a study include: spare batteries for recording devices; change and extra money for taxis or unforeseen expenses; additional clothes in case the weather suddenly changes, e.g. a rain jacket; medications; and snacks in case they don't have an opportunity to buy meals.

Schedule and Budget Constraints

Time and budget constraints are important considerations to keep in mind. It might seem ideal to have 20 users test your interface, but if you need to pay them, then it could get costly. Planning evaluations that can be completed on schedule is also important, particularly in commercial settings. However, as you have seen in the interview with Sara Bly, there is rarely enough time to do the kind of studies that you would ideally like, so you have to compromise and plan to do the best job possible with the resources and time available.

Expertise

Different evaluation methods require different expertise. For example, running user tests requires knowledge of experimental design and video recording. Does the evaluation team have the expertise needed to do the evaluation? If you need to analyze your results using statistical measures and you are unsure how to use them, then consult a statistician before starting the evaluation, and again during data collection and analysis if needed.

Activity 13.5

  1. Direct observation in the field, user testing, and questionnaires were used in the HutchWorld case study. What practical issues are mentioned in the case study? What other issues do you think the developers had to take into account?
  2. Direct observation in the field and interviews were the main methods used for evaluating early design ideas for the Indian auxiliary nurse midwives' record-keeping system. What practical issues had to be taken into account?
  3. In the study to investigate the conditions that make a collaborative digital ice hockey game engaging, the evaluators had to consider several practical issues. What were they?

Comment

  1. No particular practical issues are mentioned for the direct observation in the hospital, but there probably were restrictions on where and what the team could observe. For example, it is likely that access would be denied to very sick patients and during treatment times. Not surprisingly, user testing posed more problems, such as finding participants, putting equipment in place, managing the tests, and underestimating the time needed to work in a hospital setting compared with the fast production times at Microsoft.
  2. The team did not speak the language of the users, and both Indian culture generally and the local culture of the nurses were foreign to them. There were other practical issues too, but these were the main ones. The team needed to establish acceptable ways of behaving, observing, and asking questions that were respectful, yet timely, and that provided the data they needed.
  3. The evaluators collected physiological data so they had to ensure that they did not cause physical or emotional harm to the participants. Expertise was needed to use the recording equipment which was strapped to the participants, so the study had to be done in a controlled environment. They also had to find participants whose ability to play the game was similar.

13.2.5 Decide How to Deal with the Ethical Issues

The Association for Computing Machinery (ACM) and many other professional organizations provide ethical codes (Box 13.1) that they expect their members to uphold, particularly if their activities involve other human beings (ACM, 1992). All data gathering requires you to consider ethical issues (see Chapter 7), but this is particularly important for evaluation because the participants are often put into unfamiliar situations. People's privacy should be protected, which means that their name should not be associated with data collected about them or disclosed in written reports (unless they give explicit permission). Personal records containing details about health, employment, education, financial status, and where participants live should be confidential. Similarly, it should not be possible to identify individuals from comments written in reports. For example, if a focus group involves nine men and one woman, the pronoun ‘she’ should not be used in the report because it will be obvious to whom it refers.

Box 13.1: ACM Code of Ethics

The ACM code outlines many ethical issues that professionals are likely to face (ACM, 1992). Section 1 outlines fundamental ethical considerations, while section 2 addresses additional, more specific considerations of professional conduct. Statements in section 3 pertain more specifically to individuals who have a leadership role. Principles involving compliance with the code are given in section 4. Three principles of particular relevance to this discussion are:

  • Ensure that users and those who will be affected by a system have their needs clearly articulated during the assessment of requirements; later the system must be validated to meet requirements.
  • Articulate and support policies that protect the dignity of users and others affected by a computing system.
  • Honor confidentiality.

Most professional societies, universities, government agencies, and other research offices require researchers to provide information about activities in which human participants will be involved. This documentation is reviewed by a panel, and the researchers are notified whether their plan of work, particularly the details about how human participants and the data collected about them will be treated, is acceptable. Drawing up such an agreement is mandatory in most universities and major organizations. Indeed, special review boards generally prescribe the format required, and many provide a detailed form that must be completed. Once the details are accepted, the review board checks periodically to oversee compliance. In American universities these panels are known as Institutional Review Boards (IRBs); other countries use different names for similar processes.

Over the years IRB forms have become increasingly detailed, particularly now that so much research involves the Internet and people's interactions across it. Several lawsuits at prominent universities have heightened attention to IRB compliance, to the extent that it sometimes takes several months and multiple amendments to get IRB acceptance. IRB reviewers are not only interested in the more obvious issues of how participants will be treated and what they will be asked to do; they also want to know how the data will be analyzed and stored. For example, data about participants must be coded and stored so that the data cannot be linked to participants' names. This means that names must be replaced by a code, and that the code and the data must be stored separately, usually under lock and key. Figure 13.1 contains part of a completed IRB form to evaluate a Virtual Business Information Center (VBIC) at the University of Maryland.
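
As a concrete illustration of this coding practice, the following minimal Python sketch shows one way names might be replaced by codes, with the code-to-name key saved separately from the study data. It is only a sketch: the file names, field names, and participant details are invented for the example.

```python
import csv
import secrets

def pseudonymize(participants):
    """Map each participant name to an opaque random code such as 'P-3f9c2a'."""
    return {name: f"P-{secrets.token_hex(3)}" for name in participants}

def save_key(key, path="participant_key.csv"):
    # The key file linking codes to names is stored separately from the data,
    # e.g. under lock and key or encrypted, as review boards typically require.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "code"])
        writer.writerows(key.items())

def save_data(records, key, path="study_data.csv"):
    # Study records are written with codes only; names never appear here.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["code", "task_time_secs", "errors"])
        for name, task_time, errors in records:
            writer.writerow([key[name], task_time, errors])

key = pseudonymize(["Ann Lee", "Raj Patel"])  # invented names
save_data([("Ann Lee", 312, 2), ("Raj Patel", 287, 0)], key)
save_key(key)
```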

Activity 13.6

Imagine you plan to conduct online interviews with 20 participants in a new chat environment (perhaps using AIM, AOL, or Yahoo! chat). What privacy issues would you need to consider when completing your IRB form?

Comment

You would need to discuss how you will perform the interview so that it is private; how you will collect the data; and how the data will be stored, analyzed, and reported. For each, you will need to specify privacy and security considerations. For example, each participant will have a code. The codes and the names to which they relate will be stored separately from the data. At no time will real names be used, nor will there be reference to any markers that could reveal a participant's identity, e.g. where the participant lives or works, or their gender or ethnicity if these are distinguishing features among the pool of participants.

People give their time and their trust when they agree to participate in an evaluation study and both should be respected. But what does it mean to be respectful to participants? What should participants be told about the evaluation? What are participants' rights? Many institutions and project managers require participants to read and sign an informed consent form similar to the one in Box 13.2. This form explains the aim of the study and promises participants that their personal details and performance will not be made public and will be used only for the purpose stated. It is an agreement between the evaluator and the participants that helps to confirm the professional relationship that exists between them.

Figure 13.1 Excerpt from an IRB form to evaluate a Virtual Business Information Center (VBIC) at the University of Maryland

Box 13.2: Informed Consent Form

I state that I am over 18 years of age and wish to participate in the evaluation study being conducted by Dr. Hoo and his colleagues at the College of Extraordinary Research, University of Highland, College Estate.

The purpose of the study is to assess the usability of HighFly, a website developed at the National Library to provide information to the general public.

The procedures involve the monitored use of HighFly. I will be asked to perform specific tasks using HighFly. I will also be asked open-ended questions about HighFly and my experience using it. In addition, the evaluators will observe my use of HighFly in my workplace and home using a handheld device and laptop or desktop computer.

All information collected in the study is confidential, and my name will not be identified at any time.

I understand that I am free to ask questions or to withdraw from participation at any time without penalty.

_________________________            ____________

Signature of Participant             Date

(Adapted from Cogdill, 1999.)

The following summary provides guidelines that will help ensure evaluations are done ethically and that adequate steps to protect users' rights have been taken.

  • Tell participants the goals of the study and exactly what they should expect if they participate. The information given to them should include outlining the process, the approximate amount of time the study will take, the kind of data that will be collected, and how that data will be analyzed. The form of the final report should be described and, if possible, a copy offered to them. Any payment offered should also be clearly stated.
  • Be sure to explain that demographic, financial, health, or other sensitive information that users disclose or is discovered from the tests is confidential. A coding system should be used to record each user and, if a user must be identified for a follow-up interview, the code and the person's demographic details should be stored separately from the data. Anonymity should also be promised if audio and video are used.
  • Make sure participants know that they are free to stop the evaluation at any time if they feel uncomfortable with the procedure.
  • Consider your relationship with the participants and decide whether it is appropriate to provide incentives such as food, book tokens, or financial payment. For example, if it is your child taking part in a colleague's study, a gift token or a toy would be more appropriate than offering payment as an incentive.
  • Avoid including quotes or descriptions that inadvertently reveal a person's identity, by using numbers or fictitious names to record and identify individuals. Where quotes are reported to illustrate findings, it is conventional to replace words that would reveal the source with representative words in square brackets. For example, if the study was evaluating a university's information system and one of the participants commented: “When I tried to send a message to Harry Jones about my meeting with Mary Ann Green the whole system suddenly froze,” then the comment would be quoted as: “When I tried to send a message to […] about my meeting with […] the whole system suddenly froze.” (A small sketch of this redaction appears after this list.)
  • Ask users' permission in advance to quote them, promise them anonymity, and offer to show them a copy of the report before it is distributed.
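
To make the redaction convention above concrete, here is a minimal Python sketch. The quote and names are the invented ones from the bullet above; a real study would also redact indirect identifiers such as job titles or locations.

```python
import re

def redact(quote, identifying_terms):
    """Replace each identifying term in a quote with a bracketed ellipsis."""
    for term in identifying_terms:
        quote = re.sub(re.escape(term), "[…]", quote)
    return quote

quote = ("When I tried to send a message to Harry Jones about my meeting "
         "with Mary Ann Green the whole system suddenly froze")
print(redact(quote, ["Harry Jones", "Mary Ann Green"]))
# Prints: When I tried to send a message to […] about my meeting
# with […] the whole system suddenly froze
```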

Activity 13.7

Think back to the HutchWorld and Indian auxiliary nurse midwives case studies. What ethical issues did the developers have to consider?

Comment

The developers of HutchWorld considered all the issues just listed above. In addition, because the study involved patients, they had to be particularly careful that medical and other personal information was kept confidential. They were also sensitive to the fact that cancer patients may become too tired or sick to participate, so they reassured them that they could stop at any time if the task became onerous.

The team working with the Indian auxiliary nurse midwives were particularly careful to make sure that the nurses knew their rights and felt treated with respect. This was essential in order to build trust. Furthermore, the participants may not have known about the usual evaluation ethics, so the team was particularly careful to ensure that they were informed. Since this study also involved a medical system, the team needed to ensure that personal medical information was treated confidentially. Privacy and security were major considerations.

The explosion in Internet and web usage has resulted in more research on how people use these technologies and their effects on everyday life (Jones, 1999). Consequently, there are many projects in which developers and researchers are logging users' interactions, analyzing blogs, recording web traffic, or examining conversations in chatrooms, on bulletin boards, or in email. These studies can be done without users knowing that they are being studied. This raises ethical concerns, chief among which are issues of privacy, confidentiality, informed consent, and appropriation of others' personal stories (Sharf, 1999). People often say things online that they would not say face to face. Furthermore, many people are unaware that the personal information they share online can be read by someone with technical know-how years later, even after they have deleted it from their personal mailbox (Erickson et al., 1999).

Activity 13.8

Studies of user behavior on the Internet may involve logging users' interactions and keeping a copy of their conversations with others. Should users be told that this is happening?

Comment

Yes, it is better to tell users in advance that they are being logged. Knowledge of being logged often ceases to be an issue as users become involved in what they are doing.

Dilemma: What Would You Do?

There is a famous and controversial story about a 1961–62 experiment by Yale social psychologist Stanley Milgram to investigate how people respond to orders given by people in authority. Much has been written about this experiment and details have been changed and embellished over the years, but the basic ethical issues it raises are still worth considering, even if the details of the actual study have been distorted.

The participants were ordinary residents of New Haven who were asked to administer increasingly high levels of electric shocks to victims when they made errors in the tasks they were given. As the electric shocks got more and more severe, so did the apparent pain of the victims receiving them, to the extent that some appeared to be on the verge of dying. Not surprisingly, those administering the shocks became increasingly disturbed by what they were being asked to do, but several continued, believing that they should do as their superiors told them. What they did not realize was that the so-called victims were, in fact, very convincing actors who were not being injured at all. Instead, the shock administrators were themselves the real subjects of the experiment. It was their responses to authority that were being studied in this deceptive experiment.

This story raises several important ethical issues. First, the experiment reveals how power relationships can be used to control others. Second, and equally important, the experiment relied on deception: the apparent victims were actors colluding with the scientists, while the people administering the shocks were the real subjects being deceived. Without this deception the experiment would not have worked.

Is it acceptable to deceive subjects to this extent for the sake of scientific discovery? What do you think?

13.2.6 Evaluate, Analyze, Interpret, and Present the Data

Decisions must be made about what data is needed to answer the study questions, how the data will be analyzed, and how the findings will be presented (see Chapter 8). To a great extent the method used determines the type of data collected, but there are still some choices. For example, should the data be treated statistically? Some general questions also need to be asked. Is the method reliable? Does the method measure what is intended, i.e. what is its validity? Are biases creeping in that will distort the results? Will the results be generalizable, i.e. what is their scope? Will the evaluation study be ecologically valid, or is the fundamental nature of the process being changed by studying it?

Reliability

The reliability or consistency of a method is how well it produces the same results on separate occasions under the same circumstances. Another evaluator or researcher who follows exactly the same procedure should get similar results. Different evaluation methods have different degrees of reliability. For example, a carefully controlled experiment will have high reliability, whereas observing users in their natural setting will be variable. An unstructured interview will have low reliability: it would be difficult if not impossible to repeat exactly the same discussion.

Validity

Validity is concerned with whether the evaluation method measures what it is intended to measure. This encompasses both the method itself and the way it is performed. If, for example, the goal of an evaluation study is to find out how users use a new product in their homes, then it is not appropriate to plan a laboratory experiment. An ethnographic study in users' homes would be more appropriate. If the goal is to find average performance times for completing a task, then a method that only recorded the number of user errors would be invalid.

Biases

Bias occurs when the results are distorted. For example, expert evaluators performing a heuristic evaluation may be more sensitive to certain kinds of design flaws than others, and this will be reflected in the results. Evaluators collecting observational data may consistently fail to notice certain types of behavior because they do not deem them important. Put another way, they may selectively gather data that they think is important. Interviewers may unconsciously influence responses from interviewees by their tone of voice, their facial expressions, or the way questions are phrased, so it is important to be sensitive to the possibility of biases.

Scope

The scope of an evaluation study refers to how much its findings can be generalized. For example, some modeling methods, like the keystroke-level model, have a narrow, precise scope. The model predicts expert, error-free behavior, so its results cannot be used to describe novices learning to use the system. The problems of overstating results were discussed in more detail in Chapter 8.
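
To illustrate why this scope is so narrow, here is a minimal sketch of the kind of prediction the keystroke-level model makes: expert, error-free task time as a simple sum of operator times. The operator values are commonly published estimates, and the task sequence is invented for the example.

```python
# Commonly published keystroke-level model operator times, in seconds.
OPERATOR_SECS = {
    "K": 0.20,  # press a key or button (skilled typist)
    "P": 1.10,  # point with a mouse at a target on screen
    "H": 0.40,  # move hands between keyboard and mouse ("homing")
    "M": 1.35,  # mentally prepare for the next action
}

def klm_time(operators):
    """Predicted expert, error-free completion time for an operator sequence."""
    return sum(OPERATOR_SECS[op] for op in operators)

# Example: mentally prepare, point at a field, click, home to the keyboard,
# then type a four-letter word.
sequence = ["M", "P", "K", "H"] + ["K"] * 4
print(f"Predicted time: {klm_time(sequence):.2f} seconds")  # 3.85 seconds
```

Nothing in the sketch models learning, errors, or fatigue, which is exactly why such predictions cannot be generalized to novices.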

Ecological Validity

Ecological validity concerns how the environment in which an evaluation is conducted influences or even distorts the results. For example, laboratory experiments are controlled and are quite different from workplace, home, or leisure environments. Laboratory experiments therefore have low ecological validity because the results are unlikely to represent what happens in the real world. In contrast, ethnographic studies do not impact the environment as much, so they have high ecological validity.

Ecological validity is also affected when participants are aware of being studied. This is sometimes called the Hawthorne effect after a series of experiments at the Western Electric Company's Hawthorne factory in the USA in the 1920s and 1930s. The studies investigated changes in length of working day, heating, lighting, etc., but eventually it was discovered that the workers were reacting positively to being given special treatment rather than just to the experimental conditions. Similar findings sometimes occur in medical trials. Patients given the placebo dose (a false dose in which no drug is administered) show improvement that is due to receiving extra attention that makes them feel good.

Assignment

Find a journal or conference publication that describes an interesting evaluation study or select one from www.hcibib.org or from a digital library such as the ACM Digital Library. Then use the DECIDE framework and your knowledge from Chapters 7 and 8 to analyze it. Some questions that you should seek to answer include:

  (a) What are the goals and the questions that provide the focus for the evaluation?
  (b) Which evaluation approaches and methods are used?
  (c) What data is collected and how is it analyzed?
  (d) What practical and ethical issues have been considered?
  (e) Comment on the reliability, validity, ecological validity, biases, and scope of the study.
  (f) Is there evidence of one or more pilot studies?
  (g) Is triangulation used?
  (h) What are the strengths and weaknesses of the study report? Write a 50–100 word critique that would help the author(s) improve their paper.

Summary

In this chapter we introduced the DECIDE framework, which will help you to plan an evaluation. There are six steps in the framework:

  1. Determine the goals.
  2. Explore the questions.
  3. Choose the approach and methods.
  4. Identify the practical issues.
  5. Decide how to deal with the ethical issues.
  6. Evaluate, analyze, interpret, and present the data.

Key Points

  • There are many issues to consider before conducting an evaluation study. These include the goals of the study, the approaches and methods to use, practical issues, ethical issues, and how the data will be gathered and analyzed.
  • The DECIDE framework provides a useful checklist for planning an evaluation study.

Further Reading

DENZIN, N.K. and LINCOLN, Y.S. (2005) The Sage Handbook of Qualitative Research, 3rd edn. Sage Publications. This book is a collection of chapters by experts in qualitative research. It is an excellent resource.

HOLTZBLATT, K. (ed.) (2005) Designing for the mobile device: experiences, challenges and methods. Communications of the ACM 48(7): 32–66. This collection of papers points out the challenges that evaluators face when studying mobile devices, particularly when the most appropriate study is a field study that may involve working in a different culture and changing physical environments regularly.

JONES, S. (ed.) (1999) Doing Internet Research: Critical Issues and Methods for Examining the Net. Sage Publications. As the title states, this book is concerned with research. However, several of the chapters provide information that will be useful for those evaluating software used on the Internet.

SHNEIDERMAN, B. and PLAISANT, C. (2005) Designing the User Interface: Strategies for Effective Human-Computer Interaction, 4th edn. Chapter 4: Evaluating interface designs, pp. 139–171. Addison-Wesley. This chapter provides a useful overview of evaluation and provides valuable references.

MALONEY-KRICHMAR, D. and PREECE, J. (2005) A Multilevel Analysis of Sociability, Usability and Community Dynamics in an Online Health Community, ACM Transactions on Computer-Human Interaction, 12(2):1–32. This paper describes how activity in an online community was evaluated using a combination of theoretical frameworks and evaluation methods.
