While models are good, simple explanations are better

Venkatesh-Prasad Ranganath    Kansas State University, Manhattan, KS, United States

Abstract

We have all heard the phrase “Keep It Simple, Stupid” (KISS). Even so, we need to be reminded of it time and again because “simple” varies with context. Here is an encounter with an incarnation of KISS while using data mining to enable software testing—tell me what I need and not all that you know.

Keywords

Pattern Mining; Software Testing; Structural; USB Protocol

Acknowledgments

The USB compatibility testing effort was carried out at Microsoft by Pankaj Gupta, Venkatesh-Prasad Ranganath, and Pradip Vallathol with advice from Randy Aull, Robbie Harris, Jane Lawrence, and Eliyas Yakub.

I am pretty sure you have heard the phrase “Keep It Simple, Stupid” (KISS). Even so, I am certain that you have been reminded of it time and again because “simple” varies with context. Here is my encounter with an incarnation of KISS while using data mining for software engineering—tell me what I need and not all that you know.

How Do We Compare a USB2 Driver to a USB3 Driver?

During the development of Windows 8, the USB team responsible for the USB driver in Windows had the option of extending the USB driver in Windows 7 (which we will refer to as the USB2 driver) to support the USB3 protocol. Instead, they decided to implement a new USB driver (which we will refer to as the USB3 driver) that supported all three versions of the USB protocol. Since the USB2 driver was time-tested on previous versions of Windows, the USB team decided to ship both drivers, with the USB2 driver servicing devices plugged into USB2 ports and the USB3 driver servicing devices plugged into USB3 ports.

The success of the USB3 driver depended on supporting the following use case: when a user plugs a USB2 device into a USB3 port on Windows 8, the user experience should be similar to that of the same device plugged into a USB2 port on Windows 8. In other words, when servicing a USB2 device, the behavior of the USB3 driver should be similar to that of the USB2 driver. Since the USB3 driver did not share any code with the USB2 driver, the problem amounted to ensuring/checking behavioral similarity between two independent programs at well-defined external interfaces.

Around this time, my team was working on mining structural and temporal patterns [1], and we were introduced to the preceding problem. After some discussion, the USB team suggested that it might suffice to compare the USB2 and USB3 drivers in terms of the request-response (traffic) patterns observed at their external interfaces when servicing the same device. The rationale was that the observable behavior of a driver is dictated by the requests and responses exchanged via its external interface. So, all we needed to do was mine patterns from the request-response (traffic) logs and compare the mined patterns.

The Issue With Our Initial Approach

After configuring the pattern mining algorithms to process USB driver traffic logs, we used them to mine patterns that took one of the following forms:

1. A conjunctive propositional predicate that describes an event. For example, the predicate method="fopen" && path="passwd.txt" && mode="r" describes an event in the log where fopen was invoked to open the passwd.txt file in read mode.

2. Two conjunctive propositional predicates linked by a simple temporal operator that describes the relative temporal ordering of two events. For example, (method="fopen" && path="passwd.txt" && return="0x1234") followed by (method="fclose" && file_handle="0x1234") describes two events e1 and e2 in the log, where the first predicate describes e1, the second predicate describes e2, and e1 is followed by e2 in the log.

The algorithms associated the mined patterns with numeric measures of significance such as support and confidence.
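To make the two pattern forms and these measures concrete, here is a minimal Python sketch. The event representation and the particular definitions of support and confidence below are illustrative assumptions for this essay, not the exact ones used by the algorithms in [1].

```python
# A minimal sketch of the two pattern forms. Events, attribute names, and
# the definitions of support/confidence are illustrative assumptions.

Event = dict  # an event is a map from attribute names to values


def matches(predicate: dict, event: Event) -> bool:
    """A conjunctive propositional predicate holds iff every
    attribute-value pair it mentions appears in the event."""
    return all(event.get(k) == v for k, v in predicate.items())


def support(predicate: dict, log: list) -> int:
    """Number of events in the log described by the predicate."""
    return sum(matches(predicate, e) for e in log)


def followed_by_confidence(p1: dict, p2: dict, log: list) -> float:
    """Fraction of p1-events that are later followed by a p2-event."""
    followed = 0
    for i, e in enumerate(log):
        if matches(p1, e) and any(matches(p2, f) for f in log[i + 1:]):
            followed += 1
    total = support(p1, log)
    return followed / total if total else 0.0


log = [
    {"method": "fopen", "path": "passwd.txt", "return": "0x1234"},
    {"method": "fclose", "file_handle": "0x1234"},
]
print(followed_by_confidence({"method": "fopen"}, {"method": "fclose"}, log))
```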

To compare the patterns mined from a pair of traffic logs, we mulled over how to rank patterns present in both logs and patterns unique to each log. We experimented with various thresholds on the difference between the measures of a pattern common to both logs. We pondered the order in which to consider the measures when comparing common patterns. Finally, we settled on some thresholds and presented our findings to the USB team.
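The following sketch illustrates the flavor of this threshold-based comparison; the patterns, the measure values, and the threshold are purely illustrative, not the ones we actually settled on.

```python
# Illustrative sketch of the initial, threshold-based comparison.
# The threshold value below is a hypothetical choice.

SUPPORT_DELTA = 0.2  # hypothetical threshold on the difference in support


def compare(patterns2: dict, patterns3: dict) -> dict:
    """patterns2/patterns3 map each mined pattern to its measure in a log."""
    common = set(patterns2) & set(patterns3)
    return {
        "only_in_usb2": set(patterns2) - set(patterns3),
        "only_in_usb3": set(patterns3) - set(patterns2),
        # Common patterns whose measures diverge beyond the threshold.
        "diverging": {p for p in common
                      if abs(patterns2[p] - patterns3[p]) > SUPPORT_DELTA},
    }


usb2 = {"open->close": 0.9, "reset->open": 0.7}
usb3 = {"open->close": 0.4}
print(compare(usb2, usb3))
```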

While the pattern-based comparison of logs [2] helped the USB team identify instances where the USB2 and USB3 drivers exhibited different traffic patterns, the instances did not make immediate sense, as the reasons for reporting them were not apparent (owing to the seemingly ad hoc choice of ranking orders and thresholds). Consequently, the developers could not easily identify the interesting instances that required further exploration; hence, the solution could not be used in that state.

We had the required information but it wasn’t readily accessible.

“Just Tell Us What Is Different and Nothing More”

At this point, we asked the USB team, “How could we improve the results?” They said, “Just tell us what is different and nothing more (no ranking, no statistics).” They suggested that we identify only:

1. patterns common to the USB2 driver that are not exhibited by the USB3 driver, and

2. patterns uncommon to the USB2 driver but exhibited by the USB3 driver.

In other words, given a set of USB2 devices, (1) amounted to identifying patterns observed with every USB2 device and USB2 driver combination but not observed with any USB2 device and USB3 driver combination, and (2) amounted to identifying patterns observed with some USB2 device and USB3 driver combination but not exhibited by any USB2 device and USB2 driver combination.
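In code, the suggestion boils down to two set computations. Below is a minimal sketch, where mined(driver, device) is a hypothetical function that returns the set of patterns mined from the traffic log of one driver/device combination.

```python
# Sketch of the "just the differences" computation described above.
# mined(driver, device) is a hypothetical accessor for the set of patterns
# mined from that driver/device traffic log; devices must be non-empty.

def behavioral_differences(devices, mined):
    # (1) Patterns the USB2 driver exhibits for *every* device ...
    common_usb2 = set.intersection(*(mined("usb2", d) for d in devices))
    # ... and the patterns each driver exhibits for *some* device.
    all_usb2 = set.union(*(mined("usb2", d) for d in devices))
    all_usb3 = set.union(*(mined("usb3", d) for d in devices))

    missing_in_usb3 = common_usb2 - all_usb3  # (1) expected but absent
    new_in_usb3 = all_usb3 - all_usb2         # (2) unexpected but present
    return missing_in_usb3, new_in_usb3


# Toy usage with made-up pattern sets.
mined_sets = {
    ("usb2", "mouse"): {"A", "B"},
    ("usb2", "keyboard"): {"A", "B", "C"},
    ("usb3", "mouse"): {"A", "D"},
    ("usb3", "keyboard"): {"A"},
}
missing, new = behavioral_differences(
    ["mouse", "keyboard"], lambda drv, dev: mined_sets[(drv, dev)])
print(missing, new)  # {'B'} {'D'}
```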

This suggestion led us to disregard the measures of significance and focus on identifying differences. The differing patterns identified this way had a simple explanation—the absence of common patterns and the presence of uncommon patterns. Consequently, the result was easy to process, and the differences worth exploring further were easy to identify. In the end, the USB team was happy with the resulting solution and used it to quickly identify behavioral differences between the USB2 and USB3 drivers that needed triaging.

While trimming the results down to their essence was crucial, we also needed to simplify the textual representation of patterns, remove redundant information, incorporate domain/expert knowledge, and work with the existing driver framework to collect data. All of this was necessary to make the results readily accessible to, and actionable by, the user.

Looking Back

When we started our effort to help the USB team, we incorrectly assumed that all of the information in the model of the system (ie, the set of mined patterns along with various measures of significance) generated by our approach was important. This might have been due to various reasons, such as our lack of expertise in USB, our past success in using all available information, or our incomplete understanding of the problem. Independent of the reasons and the complexity of the underlying model, we were excited by and interested in detailed models, while our customers, the USB team members, were interested in simple explanations of observations that could help them make quick project-planning decisions.

Time and again, I have heard similar arguments in discussions about which machine learning or data mining technique to use—support vector machines provide highly accurate but hard-to-interpret/explain models, while simple regression provides easy-to-interpret/explain but possibly less accurate models. In a way, the preceding experience provides a better perspective on these arguments.

For example, consider a linear regression model that captures both the polarity (ie, positive or negative) and the magnitude of the influence of independent variables (factors) on the dependent variable (outcome). Not all of the information in the model is relevant to a user who is only interested in knowing whether the effect of the factors on the outcome is linear. Similarly, the magnitude of influence is irrelevant to a user who is only interested in identifying the factors that negatively affect the outcome.
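To make the regression point concrete, here is a small sketch with synthetic data that fits a least-squares linear model and then reports only the signs of the coefficients, which is the simple explanation a user asking the second question would want. The data and factor names are made up for illustration.

```python
# Fit a linear model on synthetic data, then report only the coefficient
# signs: the "simple explanation" for a user asking which factors hurt
# the outcome. Data and coefficients below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # three hypothetical factors
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=100)

# Least-squares fit with an intercept column appended.
coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(100)], y, rcond=None)

# The full model carries magnitudes; the user only wants the polarity.
negative_factors = [i for i, c in enumerate(coef[:-1]) if c < 0]
print(negative_factors)  # factor 1 negatively affects the outcome
```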

Users Prefer Simple Explanations

To summarize, when using data science for software engineering (and possibly other purposes), we should keep in mind that users are interested in simple explanations (backed, of course, by robust and detailed models) of the observed phenomenon that can help them make good and quick decisions.

This insight proved to be rather useful in my subsequent flings with data analysis. Now, I hope you too will consider this insight in your next data science excursion and either be more effective or challenge the insight.
