Chapter 13
In This Chapter
Recognizing the purpose of evaluations
Getting an overview of Kirkpatrick’s Four Levels
Understanding the basics of creating a practical evaluation plan
Listing skills facilitators and trainers can use for their own professional evaluation
Evaluate performance — that’s the fifth and last stage of the ADDIE model discussed in Chapter 3. When you reach this stage, you have made it through the entire Training Cycle, and it is now that you can see the beauty of the complete cycle. You will find yourself returning to the earlier stages during the evaluation stage. For example, you will return to the assessment stage to confirm that you’re evaluating what you designed the training for in the first place. You may use the objectives you wrote in the second stage to create specific evaluation criteria. You want the training objectives you write to be specific, measurable, and easily converted to items on an evaluation instrument or performance rating.
The evaluation stage of The Training Cycle, highlighted in Figure 13-1, is important to you as a trainer. It is here that you can prove your value as a business partner to your organization. You will be able to answer questions such as: How has training changed employee performance? How has training increased sales or reduced expenses? How has training reduced rework and defects? How has training affected turnover and employee satisfaction? And ultimately, how is training affecting the bottom line?
In this chapter, I expand on the reason for conducting evaluations, describe Kirkpatrick’s Four Levels of evaluation, and provide guidance for how to design your own evaluation plan.
An exciting aspect of this chapter is a sidebar written by Dr. Don Kirkpatrick, in which he describes how the Four Levels came about. Dr. Kirkpatrick died a few years after writing this sidebar, and it is a treasure to have his story in his own words. His practical and logical thinking process for how he developed the Four Levels shows that not everything needs to be complicated. Sometimes, less is better. This chapter also presents a sidebar by Dr. Jim Kirkpatrick and Dr. Jack Phillips, another leader in the training evaluation arena. We are fortunate to have these gentlemen help us understand evaluation.
The purpose of training is to change employees: their behavior, opinions, knowledge, or level of skill. The purpose of evaluation is to determine whether the objectives were met and whether these changes have taken place. One way to consider the importance of evaluations is to recognize the feedback they provide. Feedback can be obtained through self-reporting, by observing the trainee, or from business data.
Various kinds of evaluations provide feedback to different people.
The concept of different levels of evaluation helps you understand the measurement and evaluation process. Don Kirkpatrick originally developed his Four Levels of training evaluation (reaction, learning, behavior, and results) nearly half a century ago. Jim Kirkpatrick has taken up his father’s charge to impart the importance of evaluation, giving it a new slant with Return on Expectations. Jack Phillips approaches evaluation from a return on investment (ROI) perspective. Even with these new twists, Kirkpatrick’s original Four Levels are as applicable today as they were in the 1950s.
As I discuss each of the Four Levels, note the elegant but simple sequence from Level 1 through Level 4. At each level the data become more objective and more meaningful to management and the bottom line. In addition, progressing through each level requires more work and more sophisticated data-gathering techniques.
It is interesting to note that, although there is an increased interest in training evaluation, many organizations still evaluate training at only the first level. It is easiest but doesn’t really give organizations what they need to measure the value of training.
Level 1, or participant reaction data, measures the learners’ satisfaction with the training. The reaction to a training session may be the deciding point as to whether it will be repeated — especially if it was conducted by someone outside the organization. If the training was presented internally, the Level 1 evaluation provides guidance about what to change. This is true for externally provided training as well, but sometimes the provider doesn’t have a chance to make the improvement; the company simply won’t send employees to another session. Most training efforts are evaluated at this level.
Sometimes called smile sheets, Level 1 evaluation usually consists of a questionnaire that participants use to rate their level of satisfaction with the training program and the trainer, among other things. In a virtual classroom, Level 1 data is often collected through electronic surveys.
If you’re conducting a multiple-day training session, it is beneficial to evaluate at the end of each day. If you’re conducting a one-day session, you may decide to evaluate it halfway through. It benefits the trainer because it provides feedback to you to adjust the design to better meet the participants’ needs. It also benefits participants because it allows them to think about what they learned and how they will apply it to their jobs.
No attempt is made to measure behavioral change or performance improvement. Nevertheless, Level 1 data does provide valuable information.
Level 2 measures the extent to which learning has occurred. The measurement of knowledge, skills, or attitude (KSAs) change indicates what participants have absorbed and whether they know how to implement what they learned. Probably all training designs have at least one objective to increase participant knowledge. Many training sessions also include objectives that improve specific skills. And some training sessions such as diversity or team building attempt to change attitudes.
The training design’s objectives provide an initial basis for what to evaluate in Level 2. This is the point in which trainers can find out not only how satisfied the participants are, but what they can do differently as a result of the training. Tests, skill practices, simulations, group evaluations, role-plays and other assessment tools focus on what participants learned during the program. Although testing is a natural part of learning, the word test often conjures up stress and fears left over from bad experiences in school. Therefore, when possible, substitute other words for tests or exams — even if testing is what you’re doing. Measuring learning provides excellent data about what participants mastered as a result of the training experience. This data can be used in several ways:
Level 3 evaluation measures whether the skills and knowledge are being implemented. Are participants applying what they learned and transferring what they learned to the job?
Because this measurement focuses on changes in behavior on the job, it becomes more difficult to measure for several reasons. First, participants can’t implement the new behavior until they have an opportunity. In addition, it is difficult to predict when the new behavior will be implemented. Even if there is an opportunity, the learner may not implement the behavior at the first opportunity. Therefore, timing becomes an issue for when to measure.
To complicate things even more, the participant may have learned the behavior and applied it on the job, but the participant’s supervisor may not allow the behavior to continue. As a trainer, you hope that’s not happening, but unfortunately it occurs more often than you want. This is when the training department must ask itself whether the problem requires a training solution. Due to the nature of Level 3 evaluation, the measures may include the frequency and use of skills, with input on barriers and enablers.
Measuring Levels 1 and 2 should occur immediately following the training, but you can see why this would not be true for Level 3. To conduct a Level 3 evaluation correctly, you must find time to observe the participants on the job, create questionnaires, speak to supervisors, and correlate data. You can also see that even though measuring at Level 3 may be difficult, the benefits of measuring behaviors are very clear.
Level 4 measures the business impact. Sometimes called return on investment or (incorrectly) cost-benefit analysis, it determines whether the benefits from the training were worth the cost of the training. At this level, the evaluation is not accomplished through methods like those suggested in the previous three levels.
Results could be determined by factors such as reduced turnover, improved quality, increased quantity or output, reduction of costs, increase in profits, increased sales, improved customer service, reduction in waste or errors, less absenteeism, or fewer grievances. You also need to determine the other side of the equation, that is, how much the training costs to design and conduct as compared to the results. Identifying and capturing this data are relatively easy. You would account for the cost of the trainer, materials and equipment, travel for participants and the trainer, training space, and the cost of having participants in a training session instead of producing the organization’s services and products.
Measurements focus on the actual results on the business as participants successfully apply the program material. Typical measures may include output, quality, time, costs, and customer satisfaction.
Before we go further, is it logical to start with Level 1 and move through 2, 3, and 4? Well, not necessarily. Remember that in several places I’ve reminded you to start with the end in mind. This is certainly true with evaluation. Return to the purpose of the training. What does the organization want to accomplish? What does it expect as a result of the training? Reduced turnover rates? Larger sales? Fewer accidents? This is Level 4. Before you design your training program, find out from management what it expects. Jim Kirkpatrick, Don’s son, calls this Return on Expectations, or ROE.
You will, of course, still use all Four Levels, and the chronological order will still be the same. But thinking about Level 4 first (standing Don’s Four Levels on their head, if you will) gives evaluation a different perspective. Let’s examine how you can measure each of the levels. Methods for each level are presented along with guidelines to consider as you develop your evaluation plan.
Level 1, or reaction, can be easily measured during the training event or immediately following it, preferably before participants leave the classroom. A questionnaire, composed of both questions with a rating scale and open-ended questions, is usually used. Some trainers allow participants to take the evaluation with them and/or go online to complete it after the session. The drawback is a lower return rate.
How do you begin?
Determine what you want to learn about the training.
You will most likely want to know something about the content, including its relevance to the participants’ jobs and the degree of difficulty. You will also want to gather data about the trainer including effectiveness, communication skills, the ability to answer questions, and approachability.
Design a format that will be both easy for participants to complete and presents a way for you to quantify participant responses.
Many formats exist, and the design will be a factor of the first question — What do you want to learn? You may choose to have statements rated on a one- to seven-point scale representing strongly disagree to strongly agree (or poor to excellent). You may also wish to ask open-ended questions, or you may choose to do a combination of both types of questions. Even if you develop a format that has questions that are rated on a scale, it is a good idea to add space at the end for comments.
Plan to obtain a 100-percent response rate with complete answers.
How do you accomplish all that? Most of it is in the timing. Ensure that you plan time into the training design to complete the questionnaire. Twenty minutes prior to the end of the training session, pass out the evaluations and ask participants to complete them. It should take only ten minutes, allowing you time to facilitate your closing activity and ensure that the learners’ last interactions are memorable and positive.
Your training department will most likely have determined an acceptable standard against which you will measure results. For example, 5.5 on a 7.0 scale may be considered acceptable.
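As a minimal sketch of how such a standard might be applied (the threshold and sample ratings here are hypothetical, not from any particular training department), you could average the ratings for each question and compare the mean against the acceptable standard:

```python
# Hypothetical Level 1 ratings on a 1-to-7 scale; the 5.5 threshold
# mirrors the acceptable-standard example in the text.
def meets_standard(ratings, threshold=5.5):
    """Return the mean rating and whether it meets the standard."""
    mean = sum(ratings) / len(ratings)
    return mean, mean >= threshold

content_scores = [6, 7, 5, 6, 6, 7, 4, 6]  # one questionnaire item
mean, ok = meets_standard(content_scores)
print(f"Mean rating: {mean:.2f}  Meets 5.5 standard: {ok}")
```

Running the same check per question (content, trainer effectiveness, and so on) quickly shows which aspects of the session fall below the department’s standard.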
If you follow these guidelines, you will be well on your way to finalizing an effective Level 1 evaluation. A comprehensive training-competency checklist can be found at the end of this chapter. You may want to use some aspects of it to create your Level 1 evaluation. You and your colleagues may want to use the evaluation to provide feedback to each other on your training skills.
Level 2, or learning, can be measured using self-assessments, facilitator assessments, tests, simulations, case studies, and other exercises. This type of evaluation should be conducted at the end of the training before participants leave to measure the degree to which the content was learned. Use pre- and post-test results to compare improvement.
Measuring before and after the training session gives you the best data because you can compare knowledge levels prior to the session and after the training has been completed. This is the best way to determine whether participants gained knowledge or skills during the session. How do you evaluate? Remember KSAs? That’s what you will measure: the knowledge, skills, and attitudes that the training session was designed to improve. Use tests to measure knowledge and attitude surveys to measure changing attitudes. To measure skill acquisition, use performance tests, in which the participant actually models the skill. As with Level 1, attempt to obtain a 100-percent response rate.
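A minimal sketch of the pre/post comparison might look like the following (the participant names and scores are hypothetical, and real evaluations would also track per-objective results):

```python
# Hypothetical pre- and post-test scores (percent correct), keyed by
# participant. The average gain indicates learning during the session.
pre_scores = {"Avery": 55, "Blake": 60, "Casey": 70}
post_scores = {"Avery": 85, "Blake": 80, "Casey": 90}

def average_gain(pre, post):
    """Average point gain across participants who took both tests."""
    gains = [post[name] - pre[name] for name in pre if name in post]
    return sum(gains) / len(gains)

print(f"Average gain: {average_gain(pre_scores, post_scores):.1f} points")
```

Comparing gains rather than raw post-test scores matters because it separates what the session taught from what participants already knew walking in.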
Some trainers use a control group for comparison, and Dr. Kirkpatrick recommends that you do so if it is practical. While this may be the most scientific way to gather data, it may be wasted time. For example, if participants can gain the knowledge another way between the pre- and post-tests, then you don’t need to send them to training. On the other hand, if training is the only way participants can gain the knowledge, skills, and attitudes, why bother with a control group? You’ll need to be your own judge about whether a control group is beneficial.
Many types of testing formats exist from which to choose, and each brings with it advantages and disadvantages. True/False tests are easy to develop and can cover many questions in a short amount of time. The drawback is that a participant with only superficial knowledge can guess well enough to earn an inflated score. Be sure to use a subject matter expert (SME) to assist with the design.
Other options include oral tests, essay tests, multiple-choice tests, and measured simulations such as in-basket exercises, business games, case studies, or role-plays. Assessments may also include self-assessments, team assessments, or instructor assessments. A Web-based evaluation may also be created. Although Web-based assessments are cost and time efficient, they bring a couple of problems: verifying that participants are who they say they are when signing in and protecting the questions in the exam banks. Most organizations using Web-based assessments have addressed both of these issues. Whatever testing option you use, be sure that the results are quantifiable.
Finally, ensure that testing conditions are optimized by limiting noise, having adequate lighting and comfortable temperature, and eliminating interruptions. Be certain that the test is only as long as it needs to be. Be sure that participants know the rules, such as whether it is acceptable to ask questions during the test.
Level 3, or behavior, is used to determine the successful transfer of learning to the workplace. Unlike Levels 1 and 2, you need to allow time for the changed behavior to occur. How much time? The experts differ, and with good reason. The amount of time required for the change to manifest itself depends on the type of training, how soon the participant has an opportunity to practice the skill, how long it takes participants to develop a new behavioral pattern, and other aspects of the job. So, how long? Anywhere from two to eight months. You will probably need to work with a subject matter expert (SME) to determine the length of the delay required to allow participants an opportunity to transfer the learning (behavior) to the job.
By the way, like Level 2, a pre- and post-testing method is recommended. And again, the question of whether you want to incorporate a control group for comparison needs to be decided.
What do you measure? Each item in an evaluation instrument should be related to an objective taught as part of the training program. Based on the training objectives, you will create a list of skills that describe the desired on-the-job performance. Use an SME to assist you with the design of the evaluation; an expert will understand the nuances of being able to complete a task. For example, a skill may be “uses the four-step coaching model with employees.” An SME will know that even though the four steps are essential, what truly makes the model successful is the supervisor’s “willingness to be available to respond to questions at any time.” Knowing this, you can expand your checklist of skills to include availability.
After you have identified the skills to measure, you select an evaluation method. Evaluation tools may include interviews, focus groups, on-site observations, follow-up questionnaires, customer surveys, and colleague or supervisory input.
Level 3 evaluations should not be taken lightly. They require a major resource investment. Know what you will measure; know how you will measure it; and most important, know how you will use the data.
What do you do if the results show that performance has not changed or skills haven’t been mastered? You need to go back to the training design. Certainly the first step is to determine whether the skill is required. If yes, examine the training material to ensure that appropriate learning techniques have been used and that enough emphasis has been placed on the skill. Perhaps a job aid is required to improve performance. Sometimes, you may discover something that did not show up in the needs assessment. For example, you may discover that participants are not using the skill because it is no longer important or is not frequently used on the job. In that case, you may want to remove it from the training session. This is a perfect example of how the fifth stage of The Training Cycle feeds back into the first stage.
Getting results that demonstrate performance has not improved is not what a trainer wants to hear. However, it is good to have the knowledge to make an intelligent decision about whether to maintain the training program, overhaul it, or scrap it entirely. Without a Level 3 evaluation, you would not likely be able to make a wise decision.
Level 4, or results, may be the most difficult to measure. Dr. Kirkpatrick frequently stated that the question he was asked most often was, “How do you evaluate Level 4?” Even though Level 4 is the most challenging, training professionals need to be able to demonstrate that training is valuable and can positively affect the bottom line.
One of the issues of evaluating at this level is that you can never be sure whether external factors affected what happened. There is always a possibility that something other than the on-the-job application contaminated the results. Can you really isolate the effects training has on business results? This is one time that using a control group can be helpful to the evaluation results. Yet even with a control group, there may be other factors that impact the business, such as the loss of a good customer, the introduction of a new competitor, a new hiring practice, or the economy. Several statistical methods are available for weighing this other evidence; I don’t cover them in this book.
So what do you do when management asks you to provide tangible evidence that training is having positive results? A before-and-after measurement is relatively easy because records for the kinds of things you measure (turnover, sales, expenses, errors, grievances) are generally available. The trick is to determine which figures are meaningful.
A second difficulty is predicting the amount of time that should be allowed for the change to take effect. It may be anywhere between 9 and 18 months. Gather the data that you believe provides evidence of the impact of the training. This measurement usually extends beyond the training department and utilizes tools that measure business performance, such as sales, expenses, or rework. Remember, a key issue is to try to isolate training’s impact on results. You may not be able to prove beyond a doubt that training has had a positive business impact, but you will be able to produce enough evidence so that management can make the final decision.
You may choose to evaluate training at one or all levels. How do you decide? Base your decision on answers to the following questions:
Just because there are four evaluation levels doesn’t mean that you should use all four. After answering the preceding questions, decide which levels will be the most beneficial for each training program. Evaluation experts agree that Level 3 and especially Level 4 should be used sparingly due to the time and cost involved. A rule of thumb is to use Level 4 in situations where the results are a top organizational priority or the training is expensive.
I have presented a number of evaluation methods in this chapter. In this section, I examine some of the specific tools and discuss the advantages and disadvantages to help you choose the one that will work best.
This method measures how well trainees learn program content. A facilitator administers paper-and-pencil or computer tests in class to measure participants’ progress. The test should measure the learning specified in the objective. Tests should be valid and reliable. Valid means that an item measures what it is supposed to measure; reliable means that the test gives consistent results from one application to another.
Multiple-choice questions take time and consideration to design. However, they maximize test-item discrimination while reducing the likelihood of successful guessing. They provide an easy format for the participants and an easy method for scoring.
True/False tests are more difficult to write than you may imagine. They are easy to score.
Matching tests are easy to write and to score. They require a minimum amount of writing but still offer a challenge.
Fill-in-the-blank or short-answer questions require knowledge without any memory aids. A disadvantage is that scoring may not be as objective as you may think. If the questions do not have one specific answer, the scorer may need to be more flexible than originally planned. Guessing by learners is reduced because there are no choices available.
Essays are the most difficult to score, although they do measure achievement at a higher level than any of the other paper-and-pencil tests. Scoring is the most subjective.
These question-and-answer surveys determine what changes in attitude have occurred as a result of training. Practitioners use these surveys to gather information about employees’ perceptions, work habits, motivation, values, beliefs, and working relations. Attitude surveys are more difficult to construct because they measure less tangible items. There is also the potential for participants to respond with what they perceive is the “right” answer.
Instructors’ or managers’ observations of performance on the job or in a job simulation indicate whether a learner is demonstrating the desired skills as a result of the training. Facilitate this process by developing a checklist of the desired behaviors. This is sometimes the only way to determine whether skills have transferred to the workplace. Some people panic or behave differently if they think they are being observed. Observations of actual performance or simulated performance can be time-consuming. It also requires a skilled observer to decrease subjectivity.
Also called performance checklists or performance evaluation instruments, these are surveys using a list of performance objectives required to evaluate observable performance. The checklists may be used in conjunction with observations.
Hard production data such as sales reports and manufacturing totals can help managers and instructors determine actual performance improvement on the job. An advantage of using productivity reports is that no new evaluation tool must be developed. The data is quantifiable. Disadvantages include a lack of contact with the participant and that records may be incomplete.
Progress and proficiency assessments by both managers and participants indicate perceived performance improvement on the job. Surveys may not be as objective as necessary.
Training managers, participants, and supervisors compare needs analysis results with course objectives and content to determine whether the program was relevant to participants’ needs. Relevancy ratings at the end of the program also contribute to the comparison.
Sometimes called a response sheet or smiley sheet, participants respond on end-of-program evaluation forms to indicate what they liked and disliked about the training delivery, content, logistics, location, and other aspects of the training experience. The form lets participants know that their input is desired. Both quantitative and qualitative data can be gathered.
Interviews can be used to determine the extent to which skills and knowledge are being used on the job. They may also uncover constraints to implementation. Like no other method, interviews convey interest, concern, and empathy in addition to collecting data. They are useful when it isn’t possible to observe behaviors directly. The interviewer becomes the evaluation tool, and this can be both an advantage and a disadvantage. Interviews are more costly than other methods, but they give instant feedback, and the interviewer has the ability to probe for more information.
Professional trainers administer assessment sheets and evaluation forms to measure the instructor’s competence, effectiveness, and instructional skills. See an example at the end of this chapter.
All of these evaluation methods work. All give you the information you need. Some work better than others for each of the Four Levels. The final decision about which method to use will be yours.
Trainers face a persistent trend to be more accountable and to prove their worth — their return on the dollars invested in training. Dr. Jack Phillips has been credited with the development of Return on Investment (ROI) to evaluate training.
ROI measurement compares the monetary benefits of the training program with the cost of the program. Few organizations conduct evaluations at this level. The current estimates appear to be somewhere between 10 and 20 percent. Even if organizations do evaluate training using ROI, many limit its use only to those training programs that
Although many organizations claim to want to know more about training’s ROI, few seem to be willing to make the investment required to gather and analyze the data.
ROI presents a process that produces the value-added contribution of training in a corporate-friendly format. The ROI process consists of five steps. Note that Kirkpatrick’s Levels 1 through 4 are essential for gathering the initial data.
Collect post-program data.
A variety of methods, similar to those identified in the last section, are used to collect Level 1 through Level 4 data.
Isolate the effects of training.
Many factors may influence performance data; therefore, steps must be taken to pinpoint the amount of improvement that can be attributed directly to the training program. A comprehensive set of tools is used that may include a control group, trend line analysis, forecasting models, and impact estimates from various groups.
Convert data to a monetary value.
Assess the Level 4 data and assign a monetary value. Techniques may include using historical costs, using salaries and benefits as value for time, converting output to profit contributions, or using external databases.
Tabulate program costs.
Identifying program costs includes at least the cost to design and develop the program, materials, facilitator salary, facilities, travel, administrative and overhead, and the salary or wages for the participants who attend the program.
Calculate the ROI.
ROI is calculated by dividing the net program benefits (program benefits minus program costs) by the program costs and multiplying by 100. In this step, you also identify intangible benefits, such as increased job satisfaction, improved customer service, and reduced complaints.
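The calculation itself is simple arithmetic. As a sketch with hypothetical dollar figures (not drawn from any real program):

```python
def roi_percent(benefits, costs):
    """ROI (%) = (program benefits - program costs) / program costs * 100."""
    return (benefits - costs) / costs * 100

# Hypothetical figures: $150,000 in converted monetary benefits against
# $50,000 in total program costs (design, delivery, salaries, travel).
print(f"ROI: {roi_percent(150_000, 50_000):.0f}%")  # net benefits are twice the costs
```

An ROI of 100 percent means the program returned its cost on top of paying for itself; anything above zero means the measured benefits exceeded the costs.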
Although it feels like a great deal of work, it will be worth it if you need to provide evidence to management regarding the value of training.
Given the extra effort, it’s worth examining the benefits of using ROI in the process.
Probably the most important one is to respond to management’s question of whether training adds value to the organization. The ROI calculations convince management that training is an investment, not an expense. The ROI analysis also provides information about specific training programs: Which ones contributed the most to the bottom line; which ones need to be improved; and which ones are an expense to the organization. When the training practitioner acts on this data, the process has the added benefit of improving the effectiveness of all training programs.
ROI provides an essential aspect of the entire evaluation process.
Bringing a chapter that could be as big as a book to a close can be difficult. I am delighted that, if you chose to read the entire chapter, you were able to read comments directly from the evaluation experts, Drs. Kirkpatrick and Phillips. Both recognize the value that evaluation holds for making improvements.
You may now be ready to put your evaluation plan together. If so, don’t be shy about asking for outside assistance. Statisticians and researchers and other professionals will be able to expedite a process that will meet your stakeholders’ needs.
As a trainer, the Level 1 evaluations are important to you. Don’t ignore the feedback. Another practice you should consider is to ask a training colleague to conduct a peer review. Do you have a colleague whose opinion you value? Ask the individual to observe one of your programs — even a portion is helpful. Another trainer will observe things and give you feedback on techniques that participants may overlook.
I am delighted to share with you a comprehensive trainer-competency checklist in Table 13-1. Copy it and ask a colleague for input on your next training session. You can also find this evaluation form on the Learning and Development website.
Table 13-1 Training/Facilitating Competency Checklist
Did the facilitator . . . (rate each skill and add comments)

Prepare:
Facilitate Learning:
Create a Positive Learning Environment:
Encourage Participation:
Communicate Content and Process:
Deal with the Unexpected:
Ensure Learning Outcomes:
Establish Credibility:
Evaluate the Training Solution:
Additional Comments?
Evaluation is the final stage in The Training Cycle but is certainly only the beginning of improving training. It will be up to you to take your training efforts to the next level, relying on evaluation to help you decide what to improve. In Dr. Kirkpatrick’s words, “Evaluation is a science and an art. It is a blend of concepts, theory, principles, and techniques. It is up to you to do the application.”