Perspectives on data science for software engineering
Software analytics and its application in practice
Six Perspectives of Software Analytics
Experiences in Putting Software Analytics into Practice
Seven principles of inductive software engineering: What we do is different
Principle #1: Humans Before Algorithms
Principle #3: Get Early Feedback
Principle #5: Be smart with your learning
Principle #6: Live With the Data You Have
Principle #7: Develop a Broad Skill Set That Uses a Big Toolkit
The need for data analysis patterns (in software engineering)
The Need for Data Analysis Patterns
Building Remedies for Data Analysis in Software Engineering Research
From software data to software theory: The path less traveled
Pathways of Software Repository Research
From Observation, to Theory, to Practice
In Summary: Find a Theory or Build One Yourself
Can We Minimize the USB Driver Test Suite?
Still Not Convinced? Here’s More
Dynamic Artifacts Are Here to Stay
Transforming Software Practice
Universal Release Criteria Model
Resource/Schedule/Content Model
Using Models in Release Management
Research to Implementation: A Difficult (but Rewarding) Journey
How to tame your online services
Measuring individual productivity
No Single and Simple Best Metric for Success/Productivity
Measure the Process, Not Just the Outcome
Goodhart’s Law and the Effect of Measuring
How to Measure Individual Productivity?
Stack traces reveal attack surfaces
Visual analytics for software engineering data
Gameplay data plays nicer when divided into cohorts
Cohort Analysis as a Tool for Gameplay Data
A success story in applying data science in practice
Communication Process—Best Practices
There's never enough time to do all the testing you want
The Impact of Short Release Cycles (There's Not Enough Time)
Learn From Your Test Execution History
The perils of energy mining: measure a bunch, compare just once
Let's ENERGISE Your Software Energy Experiments
Identifying fault-prone files in large industrial software systems
A tailored suit: The big opportunity in personalizing issue tracking
Developer Dashboards or “A Tailored Suit”
What counts is decisions, not numbers—Toward an analytics design sheet
Example: App Store Release Analysis
A large ecosystem study to understand the effect of programming languages on code quality
Code reviews are not for finding defects—Even established tools need occasional evaluation
Post-Interview Discussion and Notes
Look for state transitions in temporal data
Bikeshedding in Software Engineering
Card-sorting: From text to themes
Recommendations for Tool Building
Evidence-based software engineering
The Aim and Methodology of EBSE
Which machine learning method do you need?
Do Additional Data Arrive Over Time?
Are Changes Likely to Happen Over Time?
If You Have a Prediction Problem, What Do You Really Need to Predict?
Do You Have a Prediction Problem Where Unlabeled Data are Abundant and Labeled Data are Expensive?
Do You Need to Use Data From Different Sources?
Structure your unstructured data first!: The case of summarizing unstructured data with tag clouds
Unstructured Data in Software Engineering
Summarizing Unstructured Software Data
Parse that data! Practical tips for preparing your raw data for analysis
Print Information About Broken Records
Use Sets or Counters to Store Occurrences of Categorical Variables
Restart Parsing in the Middle of the Data Set
Test on a Small Subset of Your Data
Redirect Stdout and Stderr to Log Files
Store Raw Data Alongside Cleaned Data
Finally, Write a Verifier Program to Check the Integrity of Your Cleaned Data
Natural language processing is no free lunch
Natural Language Data in Software Projects
How to Apply NLP to Software Projects
Aggregating empirical evidence for more trustworthy decisions
What Does Data From Empirical Studies Look Like?
The Evidence-Based Paradigm and Systematic Reviews
How Far Can We Use the Outcomes From Systematic Review to Make Decisions?
If it is software engineering, it is (probably) a Bayesian factor
Causing the Future With Bayesian Networks
The Need for a Hybrid Approach in Software Analytics
Use the Methodology, Not the Model
Becoming Goldilocks: Privacy and data sharing in “just right” conditions
The wisdom of the crowds in predictive modeling for software engineering
So… How is That Related to Predictive Modeling for Software Engineering?
Examples of Ensembles and Factors Affecting Their Accuracy
Crowds for Transferring Knowledge and Dealing With Changes
Combining quantitative and qualitative methods (when mining software data)
Prologue: We Have Solid Empirical Evidence!
Correlation is Not Causation and, Even If We Can Claim Causation…
Collect Your Data: People and Artifacts
Conclusion: The Truth is Out There!
A process for surviving survey design and sailing through survey deployment
The Lure of the Sirens: The Attraction of Surveys
Navigating the Open Seas: A Successful Survey Process in Software Engineering
A Parable: The Blind Woman and an Elephant
Misinterpreting Phenomena in Software Engineering
Using Data to Expand Perspectives
Five steps for success: How to deploy data science in your organizations
Step 1. Choose the Right Questions for the Right Team
Step 2. Work Closely With Your Consumers
Step 3. Validate and Calibrate Your Data
Step 4. Speak Plainly to Give Results Business Value
Step 5. Go the Last Mile—Operationalizing Predictive Models
How the release process impacts your software analytics
Linking Defect Reports and Code Changes to a Release
How the Version Control System Can Help
Gotcha #1: Security is Negatively Defined
Gotcha #2: Having Vulnerabilities is Actually Normal
Gotcha #3: “More Vulnerabilities” Does not Always Mean “Less Secure”
Gotcha #4: Design Flaws are not Usually Tracked
Gotcha #5: Hackers are Innovative Too
Gotchas from mining bug reports
Do Bug Reports Describe Code Defects?
It's the User That Defines the Work Item Type
Do Developers Apply Atomic Changes?
Make visualization part of your analysis process
Leveraging Visualizations: An Example With Software Repository Histories
Don't forget the developers! (and be careful with your assumptions)
Are We Actually Helping Developers?
Some Observations and Recommendations
Limitations and context of research
Data Quality of Open Source Repositories
Lack of Industrial Representatives at Conferences
Actionable metrics are better metrics
What Would You Say… I Should DO?
Cyclomatic Complexity: An Interesting Case
Are Unactionable Metrics Useless?
Replicated results are more trustworthy
Reliability and Validity in Studies
So What Should Researchers Do?
So What Should Practitioners Do?
Diversity in software engineering research
What Is Diversity and Representativeness?
Once is not enough: Why we need replication
Mere numbers aren't enough: A plea for visualization
Don’t embarrass yourself: Beware of bias in your data
Impact of Bias in Software Engineering
Which Features Should I Look At?
Operational data are missing, incorrect, and decontextualized
Data science revolution in process improvement and assessment?
Correlation is not causation (or, when not to scream “Eureka!”)
Examples from Software Engineering
In Summary: Wait and Reflect Before You Report
Software analytics for small software companies: More questions than answers
The Reality for Small Software Companies
Small Software Companies Projects: Smaller and Shorter
What to Do About the Dearth of Data?
What can go wrong in software engineering experiments?
Evaluate Different Design Alternatives
Match Data Analysis and Experimental Design
Do Not Rely on Statistical Significance Alone
Follow Guidelines for Reporting Experiments
Improving the reliability of experimental results
While models are good, simple explanations are better
How Do We Compare a USB2 Driver to a USB3 Driver?
The Issue With Our Initial Approach
“Just Tell Us What Is Different and Nothing More”
Users Prefer Simple Explanations
The white-shirt effect: Learning from failed expectations
Simpler questions can lead to better insights
Context of the Software Analytics Project
Providing Predictions on Buggy Changes
(Anti-)Patterns in the Error-Handling Graph
How to Act on (Anti-)Patterns?
Continuously experiment to assess values early on
Every Idea Can Be Tested With an Experiment
How Do We Find Good Hypotheses and Conduct the Right Experiments?
Lies, damned lies, and analytics: Why big data needs thick data
How Great It Is, to Have Data Like You
Looking for Answers in All the Wrong Places
Beware the Reality Distortion Field
Build It and They Will Come, but Should We?
To Classify Is Human, but Analytics Relies on Algorithms
Lean in: How ethnography can improve software analytics and vice versa
Finding the Ethnographer Within
Crashes, Hangs, and Bluescreens