Table of Contents

Cover image

Title page

Copyright

Contributors

Acknowledgments

Introduction

Perspectives on data science for software engineering

Abstract

Why This Book?

About This Book

The Future

Software analytics and its application in practice

Abstract

Six Perspectives of Software Analytics

Experiences in Putting Software Analytics into Practice

Seven principles of inductive software engineering: What we do is different

Abstract

Different and Important

Principle #1: Humans Before Algorithms

Principle #2: Plan for Scale

Principle #3: Get Early Feedback

Principle #4: Be Open Minded

Principle #5: Be Smart With Your Learning

Principle #6: Live With the Data You Have

Principle #7: Develop a Broad Skill Set That Uses a Big Toolkit

The need for data analysis patterns (in software engineering)

Abstract

The Remedy Metaphor

Software Engineering Data

Needs of Data Analysis Patterns

Building Remedies for Data Analysis in Software Engineering Research

From software data to software theory: The path less traveled

Abstract

Pathways of Software Repository Research

From Observation, to Theory, to Practice

Why theory matters

Abstract

Introduction

How to Use Theory

How to Build Theory

In Summary: Find a Theory or Build One Yourself

Success Stories/Applications

Mining apps for anomalies

Abstract

The Million-Dollar Question

App Mining

Detecting Abnormal Behavior

A Treasure Trove of Data …

… but Also Obstacles

Executive Summary

Embrace dynamic artifacts

Abstract

Acknowledgments

Can We Minimize the USB Driver Test Suite?

Still Not Convinced? Here’s More

Dynamic Artifacts Are Here to Stay

Mobile app store analytics

Abstract

Introduction

Understanding End Users

Conclusion

The naturalness of software

Abstract

Introduction

Transforming Software Practice

Conclusion

Advances in release readiness

Abstract

Predictive Test Metrics

Universal Release Criteria Model

Best Estimation Technique

Resource/Schedule/Content Model

Using Models in Release Management

Research to Implementation: A Difficult (but Rewarding) Journey

How to tame your online services

Abstract

Background

Service Analysis Studio

Success Story

Measuring individual productivity

Abstract

No Single and Simple Best Metric for Success/Productivity

Measure the Process, Not Just the Outcome

Allow for Measures to Evolve

Goodhart’s Law and the Effect of Measuring

How to Measure Individual Productivity?

Stack traces reveal attack surfaces

Abstract

Another Use of Stack Traces?

Attack Surface Approximation

Visual analytics for software engineering data

Abstract

Gameplay data plays nicer when divided into cohorts

Abstract

Cohort Analysis as a Tool for Gameplay Data

Play to Lose

Forming Cohorts

Case Studies of Gameplay Data

Challenges of Using Cohorts

Summary

A success story in applying data science in practice

Abstract

Overview

Analytics Process

Communication Process—Best Practices

Summary

There's never enough time to do all the testing you want

Abstract

The Impact of Short Release Cycles (There's Not Enough Time)

Learn From Your Test Execution History

The Art of Testing Less

Tests Evolve Over Time

In Summary

The perils of energy mining: measure a bunch, compare just once

Abstract

A Tale of Two HTTPs

Let's ENERGISE Your Software Energy Experiments

Summary

Identifying fault-prone files in large industrial software systems

Abstract

Acknowledgment

A tailored suit: The big opportunity in personalizing issue tracking

Abstract

Many Choices, Nothing Great

The Need for Personalization

Developer Dashboards or “A Tailored Suit”

Room for Improvement

What counts is decisions, not numbers—Toward an analytics design sheet

Abstract

Decisions Everywhere

The Decision-Making Process

The Analytics Design Sheet

Example: App Store Release Analysis

A large ecosystem study to understand the effect of programming languages on code quality

Abstract

Comparing Languages

Study Design and Analysis

Results

Summary

Code reviews are not for finding defects—Even established tools need occasional evaluation

Abstract

Results

Effects

Conclusions

Techniques

Interviews

Abstract

Why Interview?

The Interview Guide

Selecting Interviewees

Recruitment

Collecting Background Data

Conducting the Interview

Post-Interview Discussion and Notes

Transcription

Analysis

Reporting

Now Go Interview!

Look for state transitions in temporal data

Abstract

Bikeshedding in Software Engineering

Summarizing Temporal Data

Recommendations

Card-sorting: From text to themes

Abstract

Preparation Phase

Execution Phase

Analysis Phase

Tools! Tools! We need tools!

Abstract

Tools in Science

The Tools We Need

Recommendations for Tool Building

Evidence-based software engineering

Abstract

Introduction

The Aim and Methodology of EBSE

Contextualizing Evidence

Strength of Evidence

Evidence and Theory

Which machine learning method do you need?

Abstract

Learning Styles

Do Additional Data Arrive Over Time?

Are Changes Likely to Happen Over Time?

If You Have a Prediction Problem, What Do You Really Need to Predict?

Do You Have a Prediction Problem Where Unlabeled Data are Abundant and Labeled Data are Expensive?

Are Your Data Imbalanced?

Do You Need to Use Data From Different Sources?

Do You Have Big Data?

Do You Have Little Data?

In Summary…

Structure your unstructured data first!: The case of summarizing unstructured data with tag clouds

Abstract

Unstructured Data in Software Engineering

Summarizing Unstructured Software Data

Conclusion

Parse that data! Practical tips for preparing your raw data for analysis

Abstract

Use Assertions Everywhere

Print Information About Broken Records

Use Sets or Counters to Store Occurrences of Categorical Variables

Restart Parsing in the Middle of the Data Set

Test on a Small Subset of Your Data

Redirect Stdout and Stderr to Log Files

Store Raw Data Alongside Cleaned Data

Finally, Write a Verifier Program to Check the Integrity of Your Cleaned Data

Natural language processing is no free lunch

Abstract

Natural Language Data in Software Projects

Natural Language Processing

How to Apply NLP to Software Projects

Summary

Aggregating empirical evidence for more trustworthy decisions

Abstract

What's Evidence?

What Does Data From Empirical Studies Look Like?

The Evidence-Based Paradigm and Systematic Reviews

How Far Can We Use the Outcomes From Systematic Review to Make Decisions?

If it is software engineering, it is (probably) a Bayesian factor

Abstract

Causing the Future With Bayesian Networks

The Need for a Hybrid Approach in Software Analytics

Use the Methodology, Not the Model

Becoming Goldilocks: Privacy and data sharing in “just right” conditions

Abstract

Acknowledgments

The “Data Drought”

Change is Good

Don’t Share Everything

Share Your Leaders

Summary

The wisdom of the crowds in predictive modeling for software engineering

Abstract

The Wisdom of the Crowds

So… How is That Related to Predictive Modeling for Software Engineering?

Examples of Ensembles and Factors Affecting Their Accuracy

Crowds for Transferring Knowledge and Dealing With Changes

Crowds for Multiple Goals

A Crowd of Insights

Ensembles as Versatile Tools

Combining quantitative and qualitative methods (when mining software data)

Abstract

Prologue: We Have Solid Empirical Evidence!

Correlation is Not Causation and, Even If We Can Claim Causation…

Collect Your Data: People and Artifacts

Build a Theory Upon Your Data

Conclusion: The Truth is Out There!

Suggested Readings

A process for surviving survey design and sailing through survey deployment

Abstract

Acknowledgments

The Lure of the Sirens: The Attraction of Surveys

Navigating the Open Seas: A Successful Survey Process in Software Engineering

In Summary

Wisdom

Log it all?

Abstract

A Parable: The Blind Woman and an Elephant

Misinterpreting Phenomenon in Software Engineering

Using Data to Expand Perspectives

Recommendations

Why provenance matters

Abstract

What’s Provenance?

What are the Key Entities?

What are the Key Tasks?

Another Example

Looking Ahead

Open from the beginning

Abstract

Alitheia Core

GHTorrent

Why the Difference?

Be Open or Be Irrelevant

Reducing time to insight

Abstract

What is Insight Anyway?

Time to Insight

The Insight Value Chain

What To Do

A Warning on Waste

Five steps for success: How to deploy data science in your organizations

Abstract

Step 1. Choose the Right Questions for the Right Team

Step 2. Work Closely With Your Consumers

Step 3. Validate and Calibrate Your Data

Step 4. Speak Plainly to Give Results Business Value

Step 5. Go the Last Mile—Operationalizing Predictive Models

How the release process impacts your software analytics

Abstract

Linking Defect Reports and Code Changes to a Release

How the Version Control System Can Help

Security cannot be measured

Abstract

Gotcha #1: Security is Negatively Defined

Gotcha #2: Having Vulnerabilities is Actually Normal

Gotcha #3: “More Vulnerabilities” Does Not Always Mean “Less Secure”

Gotcha #4: Design Flaws are not Usually Tracked

Gotcha #5: Hackers are Innovative Too

An Unfair Question

Gotchas from mining bug reports

Abstract

Do Bug Reports Describe Code Defects?

It's the User That Defines the Work Item Type

Do Developers Apply Atomic Changes?

In Summary

Make visualization part of your analysis process

Abstract

Leveraging Visualizations: An Example With Software Repository Histories

How to Jump the Pitfalls

Don't forget the developers! (and be careful with your assumptions)

Abstract

Acknowledgments

Disclaimer

Background

Are We Actually Helping Developers?

Some Observations and Recommendations

Limitations and context of research

Abstract

Small Research Projects

Data Quality of Open Source Repositories

Lack of Industrial Representatives at Conferences

Research From Industry

Summary

Actionable metrics are better metrics

Abstract

What Would You Say… I Should DO?

The Offenders

Actionable Heroes

Cyclomatic Complexity: An Interesting Case

Are Unactionable Metrics Useless?

Replicated results are more trustworthy

Abstract

The Replication Crisis

Reproducible Studies

Reliability and Validity in Studies

So What Should Researchers Do?

So What Should Practitioners Do?

Diversity in software engineering research

Abstract

Introduction

What Is Diversity and Representativeness?

What Can We Do About It?

Evaluation

Recommendations

Future Work

Once is not enough: Why we need replication

Abstract

Motivating Example and Tips

Exploring the Unknown

Types of Empirical Results

Do's and Don'ts

Mere numbers aren't enough: A plea for visualization

Abstract

Numbers Are Good, but…

Case Studies on Visualization

What to Do

Don’t embarrass yourself: Beware of bias in your data

Abstract

Dewey Defeats Truman

Impact of Bias in Software Engineering

Identifying Bias

Assessing Impact

Which Features Should I Look At?

Operational data are missing, incorrect, and decontextualized

Abstract

Background

Examples

A Life of a Defect

What to Do?

Data science revolution in process improvement and assessment?

Abstract

Correlation is not causation (or, when not to scream “Eureka!”)

Abstract

What Not to Do

Example

Examples from Software Engineering

What to Do

In Summary: Wait and Reflect Before You Report

Software analytics for small software companies: More questions than answers

Abstract

The Reality for Small Software Companies

Small Software Companies Projects: Smaller and Shorter

Different Goals and Needs

What to Do About the Dearth of Data?

What to Do on a Tight Budget?

Software analytics under the lamp post (or what Star Trek teaches us about the importance of asking the right questions)

Abstract

Prologue

Learning from Data

Which Bin is Mine?

Epilogue

What can go wrong in software engineering experiments?

Abstract

Operationalize Constructs

Evaluate Different Design Alternatives

Match Data Analysis and Experimental Design

Do Not Rely on Statistical Significance Alone

Do a Power Analysis

Find Explanations for Results

Follow Guidelines for Reporting Experiments

Improving the reliability of experimental results

One size does not fit all

Abstract

While models are good, simple explanations are better

Abstract

Acknowledgments

How Do We Compare a USB2 Driver to a USB3 Driver?

The Issue With Our Initial Approach

“Just Tell us What Is Different and Nothing More”

Looking Back

Users Prefer Simple Explanations

The white-shirt effect: Learning from failed expectations

Abstract

A Story

The Right Reaction

Practical Advice

Simpler questions can lead to better insights

Abstract

Introduction

Context of the Software Analytics Project

Providing Predictions on Buggy Changes

How to Read the Graph?

(Anti-)Patterns in the Error-Handling Graph

How to Act on (Anti-)Patterns?

Summary

Continuously experiment to assess values early on

Abstract

Most Ideas Fail to Show Value

Every Idea Can Be Tested With an Experiment

How Do We Find Good Hypotheses and Conduct the Right Experiments?

Key Takeaways

Lies, damned lies, and analytics: Why big data needs thick data

Abstract

How Great It Is, to Have Data Like You

Looking for Answers in All the Wrong Places

Beware the Reality Distortion Field

Build It and They Will Come, but Should We?

To Classify Is Human, but Analytics Relies on Algorithms

Lean in: How Ethnography Can Improve Software Analytics and Vice Versa

Finding the Ethnographer Within

The world is your test suite

Abstract

Watch the World and Learn

Crashes, Hangs, and Bluescreens

The Need for Speed

Protecting Data and Identity

Discovering Confusion and Missing Requirements

Monitoring Is Mandatory

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset