Measuring individual productivity

T. Fritz    University of Zurich, Zurich, CHE

Abstract

Measuring the productivity of individual developers is challenging. In some domains, such as car manufacturing, specific outcome measures over time, such as the number of cars produced in a day, can work well to measure and incentivize productivity. However, the less clearly defined and more flexible process of software development makes it difficult, if not impossible, to define such measures. In particular, there is no single and simple best metric that works for all software developers; instead, more individual combinations of measures are wanted and needed that also take into account the process and not just the final outcome. In this chapter, we will discuss some of the challenges and previous insights on measuring individual developer productivity.

Keywords

measuring; productivity; software development; flow; context switches; Goodhart's law

In the last century, one company was said to have determined the price of a software product by the estimated number of lines of code written. In turn, the company also paid their software developers based on the number of lines of code produced per day. What happened next is that the company’s developers started “gaming the system”—they suddenly wrote a lot more code, while the functionality captured in the code decreased. As one might imagine, adding more lines of code to a program without changing the behavior is easy. So incentivizing employees on a single outcome metric might just foster this behavior.
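To see how easy such padding is, consider a hypothetical sketch: the two Python functions below behave identically, yet the second one has several times the line count. (Both functions are invented for illustration, not taken from the company in the anecdote.)

```python
# A compact implementation: a single line of logic.
def total_compact(prices):
    return sum(prices)

# The same behavior, padded to many more lines without adding any
# functionality -- the kind of "gaming" a per-line pay metric invites.
def total_padded(prices):
    result = 0
    index = 0
    count = len(prices)
    while index < count:
        value = prices[index]
        result = result + value
        index = index + 1
    # A redundant no-op step inflates the line count further.
    final_result = result
    return final_result

# Identical behavior, very different "productivity" by lines of code.
assert total_compact([1, 2, 3]) == total_padded([1, 2, 3]) == 6
```

A lines-of-code metric rewards the second version roughly fivefold, even though it delivers nothing extra and is arguably harder to maintain.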

Overall, this example shows that it is not easy to find a good measure for a developer’s success or productivity. In some domains, such as car manufacturing, specific measures on the quantity of the outcome, such as the number of cars produced in a day, have worked well to incentivize employees and to measure their success. In the software development domain, however, where the outcome and the overall process of developing software is less clearly defined and less tangible, such outcome measures are difficult to define. In particular, trying to reduce the complex activity of developing software into a single measure of one particular outcome of the development process is probably impossible.

No Single and Simple Best Metric for Success/Productivity

In a study we conducted, we asked professional developers how they measure and assess their personal productivity [1]. The results show that software developers generally want a combination of measures to assess productivity, and these combinations varied significantly across developers. Even for the one metric that developers rated highest overall for assessing productivity—the number of work items (tasks, bugs) closed—developers stated that it heavily depends on the complexity and size of the task, so that further information is needed to interpret this metric.

These findings further indicate that there is no single and simple best measure for assessing a developer’s productivity. Given the variety of artifacts produced during software development, such as code artifacts, test cases, bug reports, and documentation, it is also not surprising that just focusing on the code does not adequately reflect the progress a developer makes in his/her work. As one example of the variety of artifacts generated by developers every day, consider the roughly 10,000 Java source files, 26,000 Bugzilla bug reports, and 45,000 newsgroup entries that were created over one development cycle of Eclipse—an open source IDE [2]. While only the code is compiled and executed in the end, the other artifacts are just as important to the process, ensuring that the software product is developed the right way and works.

Measure the Process, Not Just the Outcome

The variety in outcomes or artifacts generated in the process is just one important aspect for measuring a developer’s work and productivity. A majority of the participants in our study also mentioned that they are particularly productive when they get into the “flow” without many context switches. So rather than just focusing on a measure of the outcomes of the development activity, such as the code artifacts, the process of developing the software is important as well. Software development is a complex process that comprises various activities, frequent interruptions, and interactions with the multiple stakeholders involved in the development process, such as fellow developers, requirements engineers, or even customers [3–5].

Measuring aspects of the process of developing software, such as flow or context switches, is difficult since their cost and impact on productivity vary and are difficult to determine. For example, take a developer who is testing a system and has to wait for a build or for the application to start up. Switching context in this situation to perform a shorter task and filling the time in between might actually increase his/her productivity. On the other hand, take a developer who is programming and in the “flow.” When the developer is interrupted by another developer and asked about last weekend’s football scores, the forced context switch is expensive, decreases productivity, and can even result in more errors being made overall, as studies have shown [6]. So overall, while aspects of the process are important for measuring productivity, they are difficult to quantify and measure. Recently emerging biometric sensing technologies might provide new means to measure such aspects of individual productivity better, especially due to their pervasiveness and their decreasing invasiveness.

Allow for Measures to Evolve

When it comes to a person’s fitness and health, wearable fitness tracking devices, such as the Fitbit [7] or the Withings Pulse Band [8], have recently gained widespread adoption. Most of these devices employ a simple step count measure that has been shown to be very successful in providing users valuable insights on their activity level and in promoting and improving health and fitness (eg, [9,10]). In an interview-based study with 30 participants who had used and adopted wearable activity tracking devices “in the wild” for between 3 and 54 months, we found that the step count provided long-term motivation for people to become and stay active. Yet, while for several participants the devices helped foster engagement in fitness, the devices, and in particular the step count measure, did not support their increasingly sophisticated fitness goals. As one example, the step count helped some participants to walk more and start running, but when they added new activities to further increase fitness, such as weight lifting or yoga, the devices failed to capture these [11].

Similarly, the activities and responsibilities of an individual developer evolve over time and thus the measures capturing his/her productivity have to evolve. For example, someone might start out as a developer in a team, and later on be promoted to manage teams. While in the first role, the number of bug fixes completed might be a reasonable proxy for their productivity, but as a manager, they will focus mostly on their team being productive and have little to no time to work on bug fixes themselves.

Goodhart’s Law and the Effect of Measuring

Goodhart’s law states that “When a measure becomes a target, it ceases to be a good measure.” This effect happened in the first example presented in this chapter. As soon as the lines of code metric was used to measure a developer’s productivity and affected the developer’s salary, it ceased to be a good indicator, with developers gaming the system to benefit. Coming back to the fitness-tracking domain, we found from our interviews that the numerical feedback provided by the devices affected users’ behavior in ways other than the intended fitness improvement. In many cases, the accountability and getting credit for activities became important, even more important than the original goal: users adjusted their sports activities for better accountability, became unhappy and annoyed when they forgot their devices despite being very active, or were merely “fishing for numbers” and gaming the system. One user, for example, stopped going backward on an elliptical machine since the device did not pick up on it.

The effect depends on how and what a measure is being used for, eg, is it just used for personal retrospection or is it used for adjusting someone’s salary? In any case, one needs to assess the influence a certain measure might have on a developer’s behavior and the risks it bears.

How to Measure Individual Productivity?

Measuring the productivity of individual developers is challenging given the complex and multifaceted nature of developing software. There is no single and simple best measure that works for everyone. Rather, you will have to tailor the measurement to the specific situation and context you are looking at and the specific goal you have in mind with your measurement. For example, if you are conducting a research study on the productivity gain of a code navigation support tool, a measure such as the edit ratio [12], ie, the ratio between edit and select events, might be a good indicator. However, this neglects other aspects, such as the interruptions that occur or the resumption lag, which might be important measures when looking at a more general definition of developer productivity. Similarly, due to the collaborative nature of software development, it is not necessarily possible to isolate an individual’s productivity from the team’s productivity. Take, for example, the idea of reducing interruptions for an individual developer. While reducing interruptions for the single developer might very well increase her/his productivity, it could block other team members from working and decrease their productivity, as well as the team’s productivity overall.
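As a minimal sketch of how such a metric could be computed, the function below derives an edit ratio from a log of IDE interaction events, following the chapter’s description of the metric as the ratio between edit and select events. The event-kind strings `"edit"` and `"select"` are illustrative names for this sketch, not the actual instrumentation API of any tool.

```python
def edit_ratio(events):
    """Compute the edit ratio from a list of event-kind strings.

    Following the description in the text, the ratio is the number
    of edit events divided by the number of selection events.
    """
    edits = sum(1 for e in events if e == "edit")
    selects = sum(1 for e in events if e == "select")
    if selects == 0:
        # No navigation events recorded; avoid division by zero.
        return float("inf") if edits else 0.0
    return edits / selects

# A hypothetical session: the developer selects (navigates) three
# times and edits three times, giving a ratio of 1.0.
log = ["select", "select", "edit", "select", "edit", "edit"]
assert edit_ratio(log) == 1.0
```

A higher ratio suggests the developer spent relatively more time changing code than searching for the right place to change it, which is why a navigation support tool would be expected to raise it; as the text notes, though, the metric says nothing about interruptions or resumption lag.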

One also has to assess who will or should have access to the measurement data and how this might affect the developer’s behavior and thus, in the end, the productivity measure itself. If the developer knows that her/his boss will have access to certain parts of the data, it is likely that the developer will make sure that this data fits the purpose rather than the reality; whereas, if the data is just for the developer or, for example, for independent research that anonymizes the data completely, it is more likely that the data will reflect reality. Finally, a very important point is the privacy of the collected data. Are you actually able to collect the data needed from the developers, are you allowed to analyze it, and did you make sure that only the intended people have access to it? Since any productivity measure will most likely require fairly sensitive information that could be used for or against someone, you need to make sure to pay attention to privacy concerns, treat the data carefully, and be transparent about who will have access to it.

References

[1] Meyer A.N., Fritz T., Murphy G.C., Zimmermann T. Software developers' perceptions of productivity. In: Proc. of the ACM SIGSOFT 22nd international symposium on the foundations of software engineering 2014 (FSE'14); 2014.

[2] http://www.eclipse.org/eclipse/development/eclipse_3_0_stats.html.

[3] Müller S., Fritz T. Stakeholders' information needs for artifacts and their dependencies in a real world context. In: Proc. of the IEEE international conference on software maintenance 2013 (ICSM'13); 2013.

[4] Perry D., Staudenmayer N., Votta L. People, organizations, and process improvement. IEEE Softw. 1994;11(4):36–45.

[5] Singer J., Lethbridge T., Vinson N., Anquetil N. An examination of software engineering work practices. In: CASCON first decade high impact papers (CASCON '10). 2010.

[6] Bailey B.P., Konstan J.A. On the need for attention-aware systems: measuring effects of interruption on task performance, error rate, and affective state. Comput Hum Behav. 2006;22(4):685–708.

[7] http://www.fitbit.com/.

[8] http://www2.withings.com/us/en/products/pulse.

[9] Bravata D., Smith-Spangler C., Sundaram V., Gienger A., Lin N., Lewis R., et al. Using pedometers to increase physical activity and improve health. JAMA. 2007;298:2296–2304.

[10] Consolvo S., McDonald D., Toscos T., Chen M., Froehlich J., Harrison B., et al. Activity sensing in the wild: a field trial of UbiFit garden. In: Proc. CHI'08. 2008.

[11] Fritz T., Huang E.M., Murphy G.C., Zimmermann T. Persuasive technology in the real world: a study of long-term use of activity sensing devices for fitness. In: Proc. of the ACM conference on human factors in computing systems 2014 (CHI'14); 2014.

[12] Kersten M., Murphy G.C. Using task context to improve programmer productivity. In: Proc. of the ACM SIGSOFT 14th international symposium on the foundations of software engineering 2006 (FSE'06); 2006.
