Chapter 8

Analyzing Security Data

Andrew Meneely, Department of Software Engineering, Rochester Institute of Technology, Rochester, NY, USA

Abstract

Security is a challenging and strange property of software. Security is not about understanding how a customer might use the system; security is about ensuring that an attacker cannot abuse the system. Instead of defining what the system should do, security is about ensuring that the system does not do something malicious. As a result, applying traditional software analytics to security leads to some unique challenges and caveats. In this chapter, we will discuss four “gotchas” of analyzing security data, along with vulnerabilities and severity scoring. We will then describe a method commonly used for collecting security data in open source projects and survey some of the state of the art in analyzing security data today.

Methods: Analyzing Vulnerability Coverage.

Keywords

Security

Vulnerability

Mining

Repository

8.1 Vulnerability

Software engineers today are faced with a tough set of expectations. They are asked to develop their software on time, on budget, and with no bugs. The “no bugs” category, often called software quality, is a grab-bag of various types of mistakes that a developer can make. Bug reports might have statements like “the app crashes when I hit this button,” or “the installation script fails on this operating system.” Preventing, finding, and fixing bugs is an enormous portion of the software development lifecycle that manifests itself in activities like software testing, inspections, and design.

But bugs also have a darker, more sinister cousin: the vulnerability. Rather than the system failing to do what it’s supposed to do, the system is abused in a cleverly malicious way. Instead of registering an email address in a web application, an attacker may inject operating system commands that lead to root access on the server. Or a common segmentation fault becomes a denial of service attack when it can be reproduced over and over again.

Informally, a vulnerability is a software fault that has security consequences. Our formal definition of a vulnerability, adapted from [1] and based on the standard definition of a fault, is given in Definition 1.

Definition 1

A vulnerability is an instance of a fault that violates an implicit or explicit security policy.

A fault is the actual mistake a developer made in source code that results in a failure. The “security policy” of a system is an implicit or explicit understanding of how the system adheres to three properties: confidentiality, integrity, and availability. For example, if a healthcare system exposed patient records to the public, then the intended confidentiality of the system was violated. In some cases, a software development team may define a specific security policy as a part of their requirements document. In most cases, however, software development teams have a simple “I’ll know it when I see it” policy regarding security.

Though vulnerabilities are considered to be a subset of faults, they are a different breed of fault altogether. A typical bug might be considered as some behavior where the system falls short, whereas a vulnerability is one where the system exhibits behavior beyond the specification. For example, if a user is allowed to type executable JavaScript into a web application, then they can compromise other users who view that data, which is also known as a cross-site scripting (XSS) vulnerability. Or, as another example, if an attacker is able to hijack another user’s session, then they have provided themselves with another means of authentication that goes beyond the system’s specification.
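To make the XSS example concrete, here is a small, purely illustrative sketch (the strings and the Python rendering code are ours, not taken from any particular application): untrusted input echoed verbatim executes in other users’ browsers, whereas escaped input renders as inert text.

# Illustrative only: untrusted "email address" input containing script.
import html

untrusted = "<script>steal(document.cookie)</script>"

unsafe_page = "<p>Welcome, " + untrusted + "</p>"             # script runs in victims' browsers
safe_page = "<p>Welcome, " + html.escape(untrusted) + "</p>"  # rendered as harmless text
print(safe_page)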

To visualize the difference between vulnerabilities and typical bugs conceptually, consider Figure 8.1. The perfectly-rounded circle is the system “as it should be.” The squiggly circle is what the system actually is. In any practical setting the system will not match up perfectly to expectations, leading to two areas of mistakes: places where the system falls short of its expected functionality (typical bugs) and places where the system does more than what was specified. Vulnerabilities are in the areas where too much, or unintended, functionality is allowed.

Figure 8.1 Conceptual difference between typical bugs and vulnerabilities.

We note that “as it should be” is not necessarily the system’s specifications. What the system ought to be is often a combination of explicit statements and implicit assumptions. In fact, vulnerabilities often exist where the specifications themselves can be insecure.

Thus, with security, just getting a system working is not enough. The system must also not do what it is not supposed to. This conceptual difference between vulnerabilities and typical bugs not only alters the way developers approach software quality, it also introduces some “gotchas” regarding data analysis.

8.1.1 Exploits

Vulnerabilities only become dangerous when they are actually taken advantage of with malicious intent. Exploits are the manifestation of that malicious intent. A single vulnerability can have many, potentially infinite, exploits. In this chapter, our definition of an exploit is:

Definition 2

An exploit is a piece of software, a chunk of data, or a sequence of commands that takes advantage of a vulnerability in an effort to cause unintended or unanticipated behavior.

Exploits can come in many different forms. They can be a simple string that an attacker manually enters into a web application, or they can be sophisticated malware. One assumption that cannot be made about exploits is that a lack of exploits implies a lower risk. Just because no one has taken the time to write an exploit does not mean that a damaging one will not be written.

Exploit avoidance is a much different practice for software engineers than vulnerability prevention. Examples of exploit avoidance include intrusion detection systems and anti-virus systems that provide a layer of defense that can detect specific exploits. These systems, while important for users, cannot be relied upon fully.

8.2 Security Data “Gotchas”

Security data, especially vulnerability data, have many concepts that translate nicely from the software quality realm. Vulnerabilities can be tracked in the same way as bugs, e.g., using modern issue tracking systems. Vulnerabilities manifest themselves as design flaws or coding mistakes in the system, much like bugs. However, the malicious nature of their use and the conceptual difference of preventing unintended functionality mean that any analysis of vulnerabilities is subject to a variety of caveats.

8.2.1 Gotcha #1. Having Vulnerabilities is Normal

A common outsider’s assumption is that admitting a large software product has vulnerabilities is a liability. After all, damage to a company brand is at stake, so why make a big deal about a few wrong lines of code?

However, companies have matured beyond this approach to practicing responsible disclosure, i.e., disclosing the details about a vulnerability after it has been fixed. Responsible disclosure has led to a variety of benefits, such as the current cultural shift toward the assumption that having vulnerabilities is normal. In fact, the practice of responsible disclosure has been a significant driver in modern vulnerability research, as developers can learn from each other’s mistakes.

8.2.1.1 Analytical consequences of Gotcha #1

 Having no vulnerabilities reported does not mean no vulnerabilities exist.

 Having no vulnerabilities reported could mean the team is not focusing on finding, fixing, and preventing vulnerabilities.

8.2.2 Gotcha #2. “More Vulnerabilities” Does not Always Mean “Less Secure”

In the age of responsible disclosure, we have found that vulnerabilities are actually quite common. In 2013 alone, the Chromium project (the basis of Google Chrome) and the Linux kernel each self-reported over 150 vulnerabilities. More broadly, the US National Vulnerability Database (NVD) has increased in size dramatically over the past several years. If one were to strictly adhere to the assumptions of metrics such as “defect density,” one might conclude that this influx of vulnerabilities means software is becoming less secure, when in many cases projects are simply keeping better records.

However, vulnerabilities are a unique kind of defect due to several factors:

 Record-keeping practices have improved with the evolution of distributed version control systems, code review systems, and collaboration tools that maintain artifact traceability.

 Software projects are improving their responsible disclosure practices, leading to an increase in interest from the security enthusiast community.

 Due to the severe nature of vulnerabilities, prominent companies such as Google and Microsoft offer bounties in the thousands of US dollars for information leading to a vulnerability. Google currently pays out those bounties on a nearly monthly basis.

 Discovery of a single vulnerability often leads to the discovery of other, similar vulnerabilities since developers are learning security principles as they fix vulnerabilities.

 The availability and quality of comprehensive vulnerability taxonomies, such as the Common Weakness Enumeration, have improved.

 Improved security awareness among developers has led to developers retroactively labeling traditional defects as vulnerabilities.

8.2.2.1 Analytical consequences of Gotcha #2

The main consequence is that, at face value, you cannot assume that an increase in vulnerability discovery implies decaying security. This consequence applies at both the micro and macro levels. Software development projects both mature and decay at the same time, so an increase or decrease in vulnerabilities overall is more often the result of external circumstances than intrinsic quality.

For individual source code files, the “vulnerability density” may not be as robust as its “defect density” counterpart. Since developers often discover vulnerabilities in batches based on their knowledge of prior vulnerabilities, the chances that a vulnerability is discovered are skewed toward particular vulnerability types.

To mitigate this density problem, many academic studies regarding vulnerabilities opt for a binary labeling of “vulnerable” and “neutral” files, as opposed to trying to predict the total number of vulnerabilities in a file. A vulnerable file is defined as a file that has had at least one vulnerability fixed in it, and a neutral file is defined as a file that has had no known vulnerabilities fixed. This labeling shifts the analysis toward binary classification techniques rather than regression.
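As a small sketch of this framing (the per-file metrics and numbers below are invented for illustration, not drawn from any study, and scikit-learn is used purely as an off-the-shelf choice), the binary labels lend themselves to a classifier rather than a regression model:

# Hypothetical per-file metrics: [commits, distinct developers, churn].
# Label 1 = "vulnerable" (at least one vulnerability fixed), 0 = "neutral".
from sklearn.linear_model import LogisticRegression

X = [[120, 14, 900], [3, 1, 4], [55, 9, 210], [8, 2, 15]]
y = [1, 0, 1, 0]

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[60, 10, 300]]))  # estimated probability a new file is vulnerable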

8.2.3 Gotcha #3. Design-Level Flaws are not Usually Tracked

Vulnerabilities come in all sizes. A small, code-level mistake such as a format string vulnerability can be easily remedied at the line level, for example. Lacking the ability to provide audit logs to mitigate repudiation threats, however, is a much bigger problem. Historically, most vulnerabilities reported in databases such as the NVD tend to be code-level vulnerabilities. Design flaws, security-related or not, are rarely tracked in any consistent way.

8.2.3.1 Analytical consequences of Gotcha #3

While design vulnerabilities are common, they are often not tracked. Thus, most of the academic research surrounding vulnerability data focuses primarily on coding mistakes. With a lack of empirical results on security-related design flaws, research that provides security at the design level may not have any empirical support to validate against. Empirical studies of secure design are far behind studies of coding-level mistakes.

8.2.4 Gotcha #4. Security is Negatively Defined

The security of a software system is typically defined over three properties: Confidentiality, Integrity, and Availability. Confidentiality is the ability of a system to keep sensitive information from leaking out. Integrity is the ability of the system to prevent unauthorized tampering of data or functionality. Availability is the ability of the system to be continually accessible to the user.

Each of those properties, however, is defined according to what people should not be able to do. An attacker should not be able to steal passwords. An attacker should not be able to execute arbitrary code.

From a requirements engineering point of view, security is considered to be a constraint on the entire system that does not trace to any one feature. Instead, security applies to all features. However, security is not alone in being negatively defined. Other negatively defined non-functional requirements include safety and resilience, as they are properties the system must demonstrate in extreme circumstances.

Furthermore, security is an emergent property of software. An emergent property is one that builds upon many properties of the system and can be brought down by a single flaw. Consider pitching a tent in the rain. The “staying dry” property is not a single feature of the tent; it is a combination of many different factors: the tent must be leak-free, deployed properly, have its flap closed, and not be placed in a lake. Security must be achieved through a wide variety of means and can be compromised by a single problem.

For all negatively-defined properties, developers cannot simply execute a checklist to maintain those properties. Improving security does not mean “do A, B, and C”; instead, it means “nowhere should A, B, C or anything like them be allowed.”

8.2.4.1 Analytical consequences of Gotcha #4

Thus, many security practices today involve creating a potentially infinite list of past mistakes to avoid repeating. Avoiding past mistakes may improve the security of a system, but the overall security of a system cannot be fully defined in a single metric due to the above properties. Any methodology or metric provided for the assessment of security must account for more specific aspects of security. For example, a metric such as “the system was able to filter 96% of all exploit strings” must carry the caveat that the list of exploit strings is incomplete, and we do not know how many more would pass if we kept writing them. Thus, any assessment method for security must account for the fact that security is an emergent, negatively defined, non-functional requirement of software.

8.3 Measuring Vulnerability Severity

Like their defect counterparts, not all vulnerabilities are the same. A vulnerability that exposes already-public information should be considered lower severity than one that allows arbitrary code execution on a server, for example. To quantify these differences, several vulnerability scoring systems have emerged in recent years. In 2005, a group of security experts collaborated on the Common Vulnerability Scoring System (CVSS), which has since been adopted by NIST for the National Vulnerability Database.

8.3.1 CVSS Overview

The CVSSv2 breaks its metrics down into three groups: Base Metrics, Temporal Metrics, and Environmental Metrics. Base Metrics are intended to represent the unchanging characteristics of a vulnerability regardless of time or environment. Temporal Metrics are intended to represent characteristics that may evolve over time. Environmental Metrics are intended to provide context for a vulnerability in its given environment. All of the answers to the metrics are given an ordinal label. The vector of answers for a vulnerability may look something like this: “(AV:N/AC:M/Au:N/C:P/I:P/A:P)”

The CVSSv2 Base Metrics include Access Vector, Access Complexity, Authentication, and three Impact Metrics. Access Vector denotes whether the vulnerability is exploitable over a network (choices: Local, Adjacent Network, Network). Access Complexity (choices: high, medium, low) denotes the expected level of expertise required to construct an exploit for the vulnerability. Authentication denotes how many layers of authentication are required to exploit the vulnerability. Finally, the three Impact Metrics are Confidentiality Impact, Integrity Impact, and Availability Impact, each with choices of “none,” “partial,” or “complete.”

The CVSSv2 Temporal Metrics include measures of Exploitability, Remediation Level, and Report Confidence. Exploitability denotes whether or not an exploit is currently in the wild. Remediation Level denotes what the vendor has done to fix, mitigate, or work around the vulnerability. Report Confidence denotes the degree to which the vulnerability has been confirmed, since information may still be being gathered and the investigation may still be developing. In all three cases, the intent is to provide information as it becomes available so that the message can get out to users, system administrators, and other stakeholders.

The CVSSv2 Environmental Metrics are based on Collateral Damage Potential, Target Distribution, and the three Requirement Metrics of Confidentiality, Integrity, and Availability. Collateral Damage Potential denotes what other systems may be affected by the vulnerability. Target Distribution denotes what proportion of systems in the environment could be affected by the vulnerability. The Requirement Metrics provide a way to weight the vulnerability’s severity according to how important confidentiality, integrity, and availability are in the context of the larger product. Each of the three Requirement Metrics has choices of “low,” “medium,” or “high.”

The CVSSv2 also provides a weighting scheme to combine all of the Base Metrics into a single number from 0 to 10, and to provide Temporal and Environmental subscores.
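As an illustration of that weighting scheme, the following sketch implements the published CVSSv2 base equation (the weights come from the CVSSv2 specification; the parsing code and function name are our own) and applies it to the example vector shown above:

# CVSSv2 base score from a vector string; weights per the CVSSv2 specification.
WEIGHTS = {
    "AV": {"L": 0.395, "A": 0.646, "N": 1.0},    # Access Vector
    "AC": {"H": 0.35,  "M": 0.61,  "L": 0.71},   # Access Complexity
    "Au": {"M": 0.45,  "S": 0.56,  "N": 0.704},  # Authentication
    "C":  {"N": 0.0,   "P": 0.275, "C": 0.660},  # Confidentiality Impact
    "I":  {"N": 0.0,   "P": 0.275, "C": 0.660},  # Integrity Impact
    "A":  {"N": 0.0,   "P": 0.275, "C": 0.660},  # Availability Impact
}

def cvss2_base_score(vector: str) -> float:
    metrics = dict(part.split(":") for part in vector.strip("() ").split("/"))
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    impact = 10.41 * (1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"]))
    exploitability = 20 * w["AV"] * w["AC"] * w["Au"]
    f_impact = 0.0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)

print(cvss2_base_score("(AV:N/AC:M/Au:N/C:P/I:P/A:P)"))  # 6.8, the example vector above
print(cvss2_base_score("(AV:L/AC:M/Au:N/C:P/I:P/A:P)"))  # 4.4, the CVE-2011-3607 vector below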

8.3.2 Example CVSS Application

To demonstrate the application of the CVSS, consider the following vulnerability entry number CVE-2011-3607:

Integer overflow in the ap_pregsub function in server/util.c in the Apache HTTP Server 2.0.x through 2.0.64 and 2.2.x through 2.2.21, when the mod_setenvif module is enabled, allows local users to gain privileges via a .htaccess file with a crafted SetEnvIf directive, in conjunction with a crafted HTTP request header, leading to a heap-based buffer overflow.

The reported CVSSv2 base vector for this vulnerability was: (AV:L/AC:M/Au:N/C:P/I:P/A:P). The Access Vector was considered Local since the integer overflow in this situation was through a utility function and not directly over the network; an attacker would need local access to the server via an untrusted configuration file to exploit it. The Access Complexity was considered Medium, which according to the CVSSv2 guidelines indicates that the conditions would need to be somewhat specialized, such as a non-default configuration with the mod_setenvif module enabled. Since local access would need to be through an .htaccess file, authentication to HTTP Server would not be needed to exploit this vulnerability. The Confidentiality Impact was Partial, since memory corruption vulnerabilities such as integer overflows can leak some information about how memory is laid out. The Integrity Impact is also Partial since remote code execution is technically feasible with heap-based buffer overflows, although because Apache HTTP Server employs distrustful decomposition, the permissions of the exploited code would be limited. Finally, the Availability Impact is also Partial since any memory corruption vulnerability can result in a segmentation fault on the server process, killing the process.

This example demonstrates the many different dimensions along which a vulnerability can be measured. If we use the weighting scheme of the CVSSv2, the score for this vector computes to 4.4 out of 10.

8.3.3 Criticisms of the CVSS

Given its widespread adoption into databases such as the NVD, researchers [2–5] have raised some concerns. Common criticisms of the CVSS include:

 High CVSS scores have not historically aligned with the availability of exploits [5].

 Subjectivity of the levels. However, the CVSSv2 specification does provide many historical examples on which to base one’s decision [4].

 Reporters of vulnerabilities are not always those most familiar with the vulnerabilities.

Another concern we raise here about the CVSS scoring system is its practice of using numerical weights. The weighted average of the CVSSv2 does not appear to be grounded in cited, rigorous research, which makes the weighting numbers essentially arbitrary. Furthermore, vulnerability severity is a multi-dimensional concept, so distilling that complexity into a single number does not yield useful results. Thus, our recommendation for the CVSS is to use the vector labeling to compare two vulnerabilities in a pairwise fashion. This makes analysis of CVSS data more complex, but keeps it closer to the original meaning of vulnerability severity.

8.4 Method of Collecting and Analyzing Vulnerability Data

Vulnerability data can provide us with a rich history of some of the nastiest, most insidious bugs we’ve missed. The post-release vulnerabilities of a large open source product represent a list of mistakes the team made along the way. Mining this vulnerability data can provide some valuable insights into how the bugs were missed, found, and then fixed.

In this section, we will be demonstrating how to aggregate and process vulnerability data for useful empirical analysis. We will be using the Apache HTTP server as an example.

8.4.1 Step 1. Trace Reported Vulnerabilities Back to Fixes

To analyze vulnerabilities in aggregate, we need to know the precise version control changes that fixed each vulnerability. With these fixes, we will be able to see where vulnerabilities tend to reside and what source code issues were involved, and we can then gather various metrics that tend to correlate with vulnerabilities.

While many software projects follow responsible disclosure practices and report their vulnerabilities, the data is not always consistent. In the case of vulnerability data, the situation is often urgent, handled outside of the usual defect tracking process, and kept secret for a time until the fix is adequately disseminated. Thus, vulnerability data often requires extra manual treatment. Fortunately, vulnerabilities are also not so numerous, so we can actually handle them individually.

In the case of the Apache HTTP Server, the project provides a listing of the vulnerabilities it has fixed. In each case, there is an entry in the NVD with a CVE identifier. In some cases, the HTTPD community has provided a link to the original fix in the source code. In other cases, we will need to do our own digging. Thus, tracing vulnerabilities back to their original fix in the version control system is often a manual process.

For a given vulnerability, many different pieces of information can be gathered as a part of this investigation:

 The NVD entry stores links to vendor confirmations of the vulnerability. Often, these links do not tie directly to the fix commit, but they can provide valuable information for the investigation.

 Version control systems such as Git and Subversion offer many ways of searching their logs. For example, searching the commit messages will sometimes reveal the CVE number or similar language. Limit the search in the Git and Subversion logs to the dates near when the CVE entry was created.

 Projects often maintain their own STATUS file or CHANGELOG file that records each major change. Use tools like Git Blame to examine what commit introduced a given change in the changelog, which sometimes leads to the fix commit.

 If the vulnerability description mentions a particular module or source code file, examine the history of that file around the dates the vulnerability was reported.

 Examine the logs from open source distributors such as Red Hat or Ubuntu. These companies have to maintain their own copies of commonly used open source systems and often keep their own patches to backport to their own versions. Their package management system will often have these patches as a part of source builds.

 If credit was given to an external person for finding this vulnerability, consider searching for that person’s online presence in a blog or other article to gather more technical details.

Ultimately, the outcome of this step is the fix commit(s) for each vulnerability. With each fix commit, you will need to check whether that commit was an upstream commit or a backport. Understanding what versions are affected may impact your analysis later.

For example, the description of CVE-2011-3607, discussed in the CVSS section above, mentions the specific file that was affected by the vulnerability. Furthermore, the history of that file around the time the team fixed the vulnerability shows a CHANGELOG entry that was modified in response to it. Thus, the fix was easily found in the version control system.
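As a concrete illustration of these search heuristics, here is a minimal sketch (ordinary Git queries; the date range is approximate and only for illustration) that greps commit messages for the CVE identifier and then narrows to changes to the affected file around the reporting period:

# Search the HTTPD repository (run from its root) for the fix of CVE-2011-3607.
import subprocess

def git_log(*args):
    return subprocess.run(["git", "log", "--oneline", *args],
                          capture_output=True, text=True, check=True).stdout

# Commits whose messages mention the CVE identifier.
print(git_log("--all", "-i", "--grep=CVE-2011-3607"))

# Changes to the file named in the CVE description, near when the entry was created.
print(git_log("--since=2011-10-01", "--until=2012-03-01", "--", "server/util.c"))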

8.4.2 Step 2. Aggregate Source Control Logs

Once we have all of our vulnerabilities tied to fixes in the version control system, we can begin to reconstruct the timeline of the software project using version control. Systems such as Git and Subversion provide a rich history that allows us to understand who committed what changes and when. By aggregating the version control logs into a relational database, we can query the data for our metrics in later steps.

Using commands such as “git log --pretty” or “svn log --xml” allows us to output information from the version control systems into text files. The output formats for version control systems are configurable, so we can make our outputs easily parsed by scripts. Using a scripting language like Ruby or Python, we can develop scripts that step through the version control logs and insert the data into the database.

A typical schema for version control logs might involve three tables: Commits, CommitFilepaths, and Filepaths. The Commits table holds the commit date, commit message, and author. From our prior step we also add a field called “VulnFix” to indicate that the commit was fixing a vulnerability. The Filepaths table stores the files and their paths, and the CommitFilepaths table links the two tables together. For example, if one commit changed “util.c” and “log.c” and another commit changed just “util.c,” we would have two rows in the Commits table, three rows in the CommitFilepaths table, and two rows in the Filepaths table.
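The following sketch shows one way to populate such a schema (SQLite and the exact table and column names are our own choices for illustration; the “VulnFix” flag, named vuln_fix here, would be set afterward from the fix commits identified in Step 1):

# Parse "git log" into the three-table schema described above (run inside the repository).
import sqlite3
import subprocess

db = sqlite3.connect("vulnerability_history.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS Commits(
    commit_hash TEXT PRIMARY KEY, author TEXT, commit_date TEXT,
    message TEXT, vuln_fix INTEGER DEFAULT 0);
CREATE TABLE IF NOT EXISTS Filepaths(
    filepath_id INTEGER PRIMARY KEY AUTOINCREMENT, path TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS CommitFilepaths(commit_hash TEXT, filepath_id INTEGER);
""")

# One record per commit: a record separator, then hash/author/date/subject, then the files touched.
log = subprocess.run(
    ["git", "log", "--name-only", "--pretty=format:%x1e%H%x1f%an%x1f%aI%x1f%s"],
    capture_output=True, text=True, check=True).stdout

for record in log.split("\x1e"):
    if not record.strip():
        continue
    header, *paths = record.strip().splitlines()
    commit_hash, author, date, subject = header.split("\x1f")
    db.execute("INSERT OR IGNORE INTO Commits VALUES (?, ?, ?, ?, 0)",
               (commit_hash, author, date, subject))
    for path in (p for p in paths if p.strip()):
        db.execute("INSERT OR IGNORE INTO Filepaths(path) VALUES (?)", (path,))
        fid = db.execute("SELECT filepath_id FROM Filepaths WHERE path = ?", (path,)).fetchone()[0]
        db.execute("INSERT INTO CommitFilepaths VALUES (?, ?)", (commit_hash, fid))
db.commit()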

If we are only looking at a limited period of time, at this point we must also take into account the fact that source code sometimes has no commits for a long time. A file might be untouched for months or years at a time. If so, be sure that the Filepaths table still contains files that had no commits during the period.

With that data and schema in place, we can now examine the history of a given file and determine whether that file was later patched for a vulnerability. For example, a file patched for a vulnerability on January 20, 2011 would have had that vulnerability for some period prior to that date, and the team missed it. Thus, we can collect our metrics prior to January 20, 2011 and see whether they correlate with the vulnerable files.

Regarding when a vulnerability was introduced rather than fixed, see the work by Meneely et al. [9] in Section 8.5.2.

8.4.3 Step 3A. Determine Vulnerability Coverage

Once we know what files were affected by vulnerabilities, we can now see what parts of the system were affected by them. Here are some relevant questions, keeping in mind our Gotchas from earlier in this chapter.

 What percentage of files were affected by at least one vulnerability? This number typically lies between 1% and 5% [6].

 What percentage of the subsystems was affected by vulnerabilities?

 What percentage of our developers have ever worked on a file with at least one vulnerability?

 Were there any files that had “bursts” of vulnerabilities fixed? What happened there?

 Were there any files that had been steadily, consistently fixed over time for a vulnerability? Why?

We note here that we are collecting coverage data, not necessarily vulnerability count data, to keep in step with Gotcha #1 and Gotcha #2. Also, in the case of the Apache HTTP Server, we cannot say that these are necessarily design flaws, as those tend not to be tracked by the project, per Gotcha #3. We also recognize that files without vulnerabilities are not necessarily “invulnerable,” so we typically refer to them as “neutral,” per Gotcha #4.
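With the schema from Step 2 in place, coverage questions such as the first one above reduce to simple queries. This sketch reuses the illustrative table and column names from the Step 2 sketch:

# Percentage of files ever touched by a vulnerability-fixing commit.
import sqlite3

db = sqlite3.connect("vulnerability_history.db")
vulnerable, total = db.execute("""
    SELECT COUNT(DISTINCT CASE WHEN c.vuln_fix = 1 THEN cf.filepath_id END),
           (SELECT COUNT(*) FROM Filepaths)
    FROM CommitFilepaths cf
    JOIN Commits c ON c.commit_hash = cf.commit_hash
""").fetchone()
print(f"{100.0 * vulnerable / total:.1f}% of files were fixed for at least one vulnerability")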

Examining vulnerability coverage can provide an enormous benefit to understanding the security posture of a system. Entire sub-communities of developers can be very active in fixing vulnerabilities and end up learning from their mistakes, often leading to “bursts.” Or, some subsystems have a constant security concern (such as HTTPD’s “protocol.c” that parses untrusted network packet traffic), leading to more of a “steady trickle” effect.

Coverage data can also provide empirical insight into the system’s assets. An asset is an element of a running software system that has security consequences. A password table, for example, is an asset. The underlying file system for a web server is another asset. As more and more vulnerabilities are discovered, fixed, and aggregated, we can begin to see where the assets of the system tend to reside, and what elements of the architecture need restructuring.

8.4.4 Step 3C. Classify According to Engineering Mistake

Another use for the vulnerability fix data from Step 1 is to classify the vulnerabilities by the software engineering mistake behind them. Developers are capable of making a wide variety of mistakes, and understanding the types of mistakes that you and your team tend to make can provide benefit. If you are collecting vulnerability data for your project, consider the following questions:

 Was this vulnerability the result of missing functionality or wrong functionality?

 Ignore refactoring in the fix

 Missing functionality involves adding a new method, new feature, or security-related check

 Wrong functionality means the low-level design was there, but implemented incorrectly

 How large was the source code fix for this vulnerability?

 Consider examining the Churn of the fix (number of lines added plus number of lines deleted in the patch); a small sketch of computing churn appears at the end of this section

 Large changes are difficult to inspect

 Was this vulnerability the result of poor input validation?

 If fixing the vulnerability involved reducing the input space in any way, then the answer is “yes”

 Input validation should be the first, but not only line of defense against vulnerabilities

 Was this vulnerability the result of lack of sanitization?

 If fixing the vulnerability involved altering the input to be safer (e.g., escaping characters), then the answer is “yes”

 Sanitization is another layer of defense in addition to input validation

 Was this vulnerability domain-specific?

 A domain-specific vulnerability is one that would not make any sense outside of the context of the current project. For example, a secretary having access to employee salaries in the HR system is a vulnerability entirely driven by the domain, not the technology. Examples of domain-independent vulnerabilities include buffer overflows, SQL injection, and cross-site scripting.

 Domain expertise may be needed beyond hiring penetration testers

 Was this vulnerability the result of incorrect configuration?

 If the fix involved altering a configuration file, such as upgrading a dependency or changing a parameter, then the answer is “yes”

 Configuration files should be inspected alongside source code files

 Was this vulnerability the result of incorrect exception handling?

 If the context of the fix was exception handling and the system did not react to an alternate subflow properly, then the answer is “yes”

 Exception handling is difficult to reproduce for testing, but often involves integrity and availability concerns

 Were any of these vulnerabilities entries in the CWE?

 Common Weakness Enumeration is a taxonomy of security-related mistakes

 Common vulnerabilities may mean some education of developers is needed

The above questions are based on our experience analyzing vulnerability fixes in large open source systems and can provide a basis for discussion. Finding that many of your vulnerabilities fall under one of the above questions can help reveal weak spots in your process.
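As mentioned in the checklist above, churn is simple to compute from the fix commit. Here is a minimal sketch (the helper name is ours) that sums the added and deleted line counts that “git show --numstat” reports:

# Churn of a fix commit: lines added plus lines deleted across all changed files.
import subprocess

def fix_churn(commit_hash: str) -> int:
    out = subprocess.run(
        ["git", "show", "--numstat", "--pretty=format:", commit_hash],
        capture_output=True, text=True, check=True).stdout
    churn = 0
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():  # binary files report "-"
            churn += int(parts[0]) + int(parts[1])
    return churn

print(fix_churn("HEAD"))  # churn of the most recent commit, as an example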

8.5 What Security Data has Told Us Thus Far

Analysis of security data has led to a variety of fascinating conclusions. The general methods outlined in this chapter have led to some interesting recent studies. We cover two lines of research here [6–9], but several other interesting empirical security studies have been conducted. For example:

 Neuhaus et al. [10] predicted vulnerable components in Mozilla based on import statements; Nguyen and Tran [11] did a similar study based on dependency graphs

 Manadhata and Wing [12], Manadhata et al. [13] and Howard et al. [14] provided measurements of the “attack surface” of a software system, which examines the “shape” of a system based on the avenues of attack (i.e., inputs and outputs)

 Scarfone and Mell [4] and Fruhwirth and Mannisto [2] provided an analysis on the robustness of the CVSS v2, examining flaws in the scoring process

 Bozorgi et al. [5] provided a security exploit classifier to predict whether a given vulnerability will be exploited

 DaCosta et al. [15] analyzed the security vulnerability likelihood of OpenSSH based on its call graph

 Cavusoglu et al. [16] analyzed the efficiency of vulnerability disclosure mechanisms

 Beres et al. [17] used prediction techniques and simulation to assess security processes

8.5.1 Vulnerabilities have Socio-Technical Elements

With each new developer to a software development team comes a greater challenge to manage the communication, coordination, and knowledge transfer amongst teammates. Lack of team cohesion, miscommunications, and misguided effort can lead to all kinds of problems, including security vulnerabilities. In this research, the authors focus on examining the statistical relationships between development team structure and security vulnerabilities. The statistical relationships demonstrated in this research provide us with (a) predictive models for finding security vulnerabilities in software prior to its release; and (b) insight into how effective software development teams are organized.

In several studies [6–8], Meneely et al. applied social network analysis techniques to open source software products and discovered a consistent statistical association between metrics measuring developer activity and post-release security vulnerabilities. In three case studies of Linux, PHP, and Wireshark, the authors analyzed four metrics related to Linus’ Law and unfocused contributions. An empirical analysis of the data demonstrates the following observations:

(a) source code files changed by multiple, otherwise-separated clusters of developers are more likely to be vulnerable than those changed by a single cluster;

(b) files are likely to be vulnerable when changed by many developers who have made many changes to other files (i.e., unfocused contributions); and

(c) a Bayesian network predictive model can be used on one project by training it on other projects, possibly indicating the existence of a general predictive model.

For the clustering analysis (a), we used the Edge Betweenness metric computed over a developer network. A developer network is a graph in which developers are nodes that are connected to each other when they make commits to the same source code file in the same release cycle. Developers have been known to cluster [18, 19] in open source projects, forming sub-communities within a larger project. Edge Betweenness is one metric from social network analysis that can be used to identify connections between larger clusters. Edge Betweenness is based on the proportion of shortest paths that include a given edge (e.g., highways have a higher betweenness than residential streets because they connect traffic from neighborhood “clusters”). In that analysis, “highway” files with high betweenness were more likely to have vulnerabilities.
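As a small sketch of that construction (using the networkx library and toy data of our own invention), developers are connected when they commit to the same file in the same release cycle, and edge betweenness highlights the “highway” connections between clusters:

# Build a toy developer network and compute edge betweenness.
import itertools
import networkx as nx

# file -> developers who changed it during one release cycle (illustrative data)
commits = {
    "server/util.c":     {"alice", "bob", "carol"},
    "server/protocol.c": {"carol", "dave"},
    "modules/ssl.c":     {"dave", "erin"},
}

G = nx.Graph()
for developers in commits.values():
    G.add_edges_from(itertools.combinations(sorted(developers), 2))

# Edges with high betweenness connect otherwise-separated clusters of developers.
print(nx.edge_betweenness_centrality(G))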

For the unfocused contributions result (b), we used the node betweenness metric on a contribution network. We formed a bipartite graph where developers and files are nodes, and a developer is connected to a file when that developer committed to the file. Files with a high betweenness in a contribution network, then, are ones whose developers were working on many other files at the time. When a file was worked on by many such developers, the contribution it received was “unfocused” (note: the developers themselves were not necessarily unfocused, but the aggregate contribution was unfocused because the developers were working on many other files). Files with unfocused contributions also had a higher likelihood of having a vulnerability.
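A companion sketch (again with invented data) builds the bipartite contribution network and ranks files by node betweenness, the “unfocused contribution” measure described above:

# Bipartite contribution network: developers and files are nodes, edges are commits.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "server/util.c"), ("bob", "server/util.c"),
    ("bob", "server/protocol.c"), ("carol", "server/protocol.c"),
    ("carol", "modules/ssl.c"), ("dave", "modules/ssl.c"),
])

betweenness = nx.betweenness_centrality(G)
file_scores = {node: score for node, score in betweenness.items() if "/" in node}
print(sorted(file_scores.items(), key=lambda item: -item[1]))  # most "unfocused" files first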

Overall, while the results are statistically significant, the individual correlations indicate that developer activity metrics do not account for all vulnerable files. From a prediction standpoint, models are likely to perform best in the presence of metrics that capture other aspects of software products and processes. However, practitioners can use these observations about developer activity to prioritize security fortification efforts or to consider organizational changes among developers.

8.5.2 Vulnerabilities have Long, Complex Histories

Even in open source projects, vulnerable source code can remain unnoticed for years. In a recent study [9], we traced 68 vulnerabilities in the Apache HTTP Server back to the version control commits that originally contributed the vulnerable code, manually finding 124 Vulnerability-Contributing Commits (VCCs) spanning 17 years. In this exploratory study, we analyzed these VCCs quantitatively and qualitatively with the overarching question: “What could developers have looked for to identify security concerns in this commit?”

The methodology, which was adapted from related work [20, 21], can be summarized as follows:

1. Identify the fix commit(s) of the vulnerability;

2. From the fix, write an ad hoc detection script to identify the coding mistake for the given vulnerability automatically;

3. Use “git bisect” to binary search the commit history for the VCC; and

4. Inspect the potential VCC, revising the detection script and rerunning bisect as needed.
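To make steps 2 and 3 concrete, here is a hypothetical detection script (the file path, pattern, and revisions are placeholders, not the actual details of any CVE) structured so that “git bisect run” can drive the binary search: it exits 0 when the coding mistake is absent and 1 when it is present.

# detect_mistake.py -- hypothetical ad hoc detection script for one vulnerability.
import pathlib
import sys

TARGET = pathlib.Path("server/util.c")                         # file implicated by the fix (placeholder)
VULNERABLE_PATTERN = "<code pattern taken from the fix diff>"   # placeholder

if not TARGET.exists():
    sys.exit(0)        # file not yet present: the mistake cannot exist at this revision
sys.exit(1 if VULNERABLE_PATTERN in TARGET.read_text(errors="ignore") else 0)

# Usage, from the repository root (revisions are placeholders):
#   git bisect start <parent-of-fix-commit> <old-known-good-commit>
#   git bisect run python3 detect_mistake.py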

Specifically, we examined the size of each commit via code churn metrics, the amount developers overwrite each other’s code via interactive churn metrics, the exposure time between VCC and fix, and the dissemination of the VCC to the development community via release notes and voting mechanisms. Results from the exploratory study of Apache HTTPD [9] show that:

 VCCs are big commits: VCCs average 608.5 lines of churn (vs. 42.2 for non-VCCs), or 55% relative churn (vs. 23.1% for non-VCCs).

 VCCs are exposed for a long time. The median time from VCC to fix was 853 days, and the median number of commits between VCC and fix was 48, indicating significant missed opportunities to find the vulnerability. Figure 8.2 shows this timeline.

Figure 8.2 Timeline for source code files in HTTPD with vulnerabilities.

 Few VCCs are the original baseline import. Only 13.5% of VCCs were original source code imports.

 Few VCCs were to “known offender” files. Only 26.6% of VCCs were to files that had already been fixed for a prior vulnerability, covering only 20% of the total vulnerabilities.

8.6 Summary

Just as security is a difficult challenge for developers to face in their development, it also poses a challenge to those who analyze development data. Security is not a concept that is easily quantified holistically; rather, security is an overarching concern with many facets to be measured. Security data is very useful, but it has limitations, such as the lack of design flaw tracking and the fact that security is negatively defined. History has shown us that actionable metrics correlated with vulnerabilities can be extracted from project histories, but because vulnerabilities are small, rare, and severe, they are difficult to predict. In the age of responsible disclosure we have an enormous outpouring of vulnerability histories to learn from, yet we must also responsibly analyze those histories, knowing our limitations, to better understand how systems can be engineered to remain secure.

References

[1] Krsul IV. Software vulnerability analysis. PhD Dissertation; 1998.

[2] Fruhwirth C, Mannisto T. Improving CVSS-based vulnerability prioritization and response with context information. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement, Washington, DC, USA. 2009:535–544.

[3] Houmb SH, Franqueira VNL, Engum EA. Quantifying security risk level from CVSS estimates of frequency and impact. J Syst Softw. 2010;83(9):1622–1634.

[4] Scarfone K, Mell P. An analysis of CVSS version 2 vulnerability scoring. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement, Washington, DC, USA. 2009:516–525.

[5] Bozorgi M, Saul LK, Savage S, Voelker GM. Beyond heuristics: learning to classify vulnerabilities and predict exploits. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, New York, NY, USA. 2010:105–114.

[6] Shin Y, Meneely A, Williams L, Osborne JA. Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities. IEEE Trans Softw Eng. 2011;37(6):772–787.

[7] Meneely A, Williams L. Strengthening the empirical analysis of the relationship between Linus’ Law and software security. In: Empirical software engineering and measurement, Bolzano-Bozen, Italy. 2010:1–10.

[8] Meneely A, Williams L. Secure open source collaboration: an empirical study of Linus’ Law. In: International conference on computer and communications security (CCS), Chicago, Illinois, USA. 2009:453–462.

[9] Meneely A, Srinivasan H, Musa A, Tejeda AR, Mokary M, Spates B. When a patch goes bad: exploring the properties of vulnerability-contributing commits. In: Proceedings of the 2013 ACM-IEEE international symposium on empirical software engineering and measurement. 2013:65–74.

[10] Neuhaus S, Zimmermann T, Holler C, Zeller A. Predicting vulnerable software components. In: Computer and communications security, New York, NY, USA. 2007:529–540.

[11] Nguyen VH, Tran LMS. Predicting vulnerable software components with dependency graphs. In: Proceedings of the 6th international workshop on security measurements and metrics. 2010:3:1–3:8.

[12] Manadhata PK, Wing JM. An attack surface metric. IEEE Trans Softw Eng. 2011;37(3):371–386.

[13] Manadhata P, Wing J, Flynn M, McQueen M. Measuring the attack surfaces of two FTP daemons. In: Proceedings of the 2nd ACM workshop on quality of protection, New York, NY, USA. 2006:3–10.

[14] Howard M, Pincus J, Wing JM. Measuring relative attack surfaces. In: Lee DT, Shieh SP, Tygar JD, eds. Computer security in the 21st century. New York: Springer US; 2005.

[15] DaCosta D, Dahn C, Mancoridis S, Prevelakis V. Characterizing the ‘security vulnerability likelihood’ of software functions. In: Proceedings of the international conference on software maintenance, ICSM 2003. 2003:266–274.

[16] Cavusoglu H, Cavusoglu H, Raghunathan S. Efficiency of vulnerability disclosure mechanisms to disseminate vulnerability knowledge. IEEE Trans Softw Eng. 2007;33(3):171–185.

[17] Beres Y, Mont MC, Griffin J, Shiu S. Using security metrics coupled with predictive modeling and simulation to assess security processes. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement. 2009:564–573.

[18] Bird C, Pattison D, D’Souza R, Filkov V, Devanbu P. Latent social structure in open source projects. In: Proceedings of the 16th ACM SIGSOFT international symposium on foundations of software engineering (FSE), Atlanta, Georgia. 2008:24–35.

[19] Bird C, Gourley A, Devanbu P, Gertz M, Swaminathan A. Mining email social networks in postgres. In: International workshop on mining software repositories, Shanghai, China. 2006:185–186.

[20] Kim S, Zimmermann T, Pan K, Whitehead EJ. Automatic identification of bug-introducing changes. In: 21st IEEE/ACM international conference on automated software engineering, ASE ’06. 2006:81–90.

[21] Williams C, Spacco J. Szz revisited: verifying when changes induce fixes. In: Proceedings of the 2008 workshop on defects in large software systems, New York, NY, USA. 2008:32–36.
