Chapter 7. CERT-RMM Perspectives

This chapter consists of short essays written by invited contributing authors. The essays discuss how CERT-RMM can be applied for different purposes and, in one case, how CERT-RMM’s “tires” were kicked to see how it could be used to help improve an existing program.

Unfortunately, there are a lot of stories that cannot be told in this chapter. This is because when organizations use CERT-RMM for benchmarking, appraisal, or improvement purposes, their experiences often tell a story that can expose their weaknesses or conversely make them a target. For example, an organization that reports a low level of capability in incident management as a result of using the model may be exposing a significant weakness in its security posture; an organization performing at a high level of capability in this area may be inviting unwanted “tests” of its resilience.

For this reason, the authors have decided not to develop and publish case studies on CERT-RMM adoption or use. While this denies potential users informative material to help their adoption process, it is also indicative of the level of valuable information attained by users of the model for helping them improve and reinforce their security, continuity, and IT operations management processes.

Using CERT-RMM in the Utility Sector

Reliability and Resilience

Every industry has its own vocabulary, and the electric power industry is no different. These unique vocabularies facilitate communication between industry members but can sometimes act as roadblocks when translating ideas between groups outside of a particular industry. The usage of the words reliability and resilience in the electric power industry presents a potential roadblock for that sector in embracing a model for improving operational resilience such as CERT-RMM. The term reliability has a strong definition reinforced by regulatory structure. The term resilience, on the other hand, is not one that has significant resonance in this domain, although the generic concept is gaining some traction.

Effective transition of CERT-RMM to the stakeholders in this domain requires an understanding of the relationship between the concepts of reliability and resilience. This requires determining whether the concepts of reliability and resilience are equivalent and, if not, defining the relationship between them. Perhaps the best place to start is to look at how the terms are defined by their stakeholders. The North American Electric Reliability Corporation2 (NERC) defines reliability in terms of adequacy and security:

adequacy—the ability of the bulk power system to supply the aggregate electrical demand and energy requirements of customers at all times, taking into account scheduled and reasonably expected unscheduled outages of system elements

security—the ability of the bulk power system to withstand sudden disturbances such as electric short circuits or unanticipated loss of system elements from credible contingencies

CERT-RMM defines resilience as “the emergent property of an organization that can continue to carry out its mission after disruption that does not exceed its operational limit.”3

The definitions contain significant conceptual overlap; both require the protection and sustainment of a mission in the face of disruption. The definition of reliability perhaps goes slightly beyond the concept of resilience as defined by CERT-RMM, identifying the capacity for the system to provide electrical power based on demand. There is certainly enough conceptual overlap between the definitions that we do not think there would be much argument with the idea that a resilient system is more reliable than a system that is not resilient, nor with the idea that a system cannot be reliable unless it is resilient. It is probably not correct, however, to say that a resilient system is a reliable system, in the terms used by the power industry.

Resilience, however, is fundamental to reliability. This leads us to the conclusion that resilient operations are a necessity for the reliable operation of the electrical infrastructure. The bulk electric system will not be reliable unless all of its supporting operations are resilient. The focus of CERT-RMM, of course, is on understanding and improving operational resilience, and therefore it is potentially an important tool for a utility looking to ensure reliability of operations.

Regulation and Peer Pressure

The electric grid is a fundamental enabler of our nation’s economy and is recognized as an element of the nation’s critical infrastructure. Increasingly, our nation’s and the world’s electric infrastructure is becoming reliant upon distributed intelligence and communications capabilities as the grid continues to grow smarter. With these dependencies comes recognition of the need to protect the electrical infrastructure from not only physical but also cyber attack.

The United States faces a number of structural challenges to ensuring the adequate protection of its electrical infrastructure. One of the more significant challenges is that no single organization controls the operation of the electrical grid. The sector is made up of a number of public and private organizations, all of which must coordinate with the others to ensure the successful delivery of electric power to the customer. While the power system has been engineered to be reliable with outage events that are few and far between, the system is only as reliable as its weakest links. The failure in any one organization under the right circumstances can lead quickly to much larger impacts, as the blackout of August 2003 so kindly reminded us.

On January 17, 2008, the Federal Energy Regulatory Commission (FERC) took a number of steps that it hoped would increase the level of cyber security and therefore the reliability of the Bulk Electric System (BES).4 FERC mandated compliance with eight Critical Infrastructure Protection (CIP) Reliability Standards developed by NERC. The fines for non-compliance with these standards can exceed $1 million a day.

While most agree that the NERC CIP standards are a step in the right direction, significant debate continues as to whether standards alone are sufficient for ensuring the reliability of the grid. Requiring adherence to a standard can certainly raise the bar of practice within the sector, but ultimately all that can be measured is an organization’s adherence to the standard. It also often shifts an organization’s focus from managing its cyber security risks to managing compliance and the associated documentation. This focus can lead to a misapplication of limited organizational resources. Moreover, organizations may find themselves investing in technologies and other resources that help them maintain compliance instead of focusing on things that will help them manage their cyber security risks. This misdirection of resources can also incentivize organizations to take steps to avoid compliance requirements. Such behavior has already been seen with the NERC CIP standards, as some utilities have gone so far as to declare they have no critical cyber assets.

Mandatory standards are often necessary to raise the bar of a community, but caution is in order to avoid a “set it and forget it” mentality, which in many cases may be even more dangerous. Furthermore, the issues of compliance and the level of effort required to support audit are receiving increasing criticism.5

Utilities like nothing better than to simultaneously impress rate payers, utility commissions, shareholders, and regulators. Establishment of a common set of metrics for the electric utility industry would provide a means to do just that while also demonstrating satisfaction of the key criteria for NERC. CERT-RMM provides a metrics-driven method through which the electric utility community can view adequacy and reliability. For example, consider a simple subset of CERT-RMM process areas that address how an organization manages operational risk. These process areas might include

• Risk Management (RISK), to determine how operational risks are being identified, analyzed, and mitigated, and how the risk process is being managed

• Controls Management (CTRL), to determine how operational controls to protect and sustain infrastructure are being identified, implemented, managed, and continuously monitored

• Service Continuity (SC), to determine how service continuity is planned, tested, implemented, and improved as the infrastructure changes and the risk environment changes

Determining how far organizations that share the common bond of providing reliable electric service have matured their processes in these areas can inform us about the strength of the reliability and resilience of the electric system. It would also allow individual operators to do compensatory planning for interconnections with organizations that do not exhibit high levels of capability in these core processes. In other words, the “metric” can inform an organization about which organizations are the weaker links, enabling it to plan for addressing these weaknesses when necessary. In the end, the electric generation and transmission system benefits as a whole when all organizations raise their capabilities.
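As a rough illustration of how such a “metric” might be used, the sketch below flags the interconnected organizations whose lowest capability level in the three process areas falls under a chosen threshold. The organization names, the numeric levels, the threshold, and the function names are all hypothetical and are not taken from CERT-RMM or from any real appraisal.

```python
from dataclasses import dataclass

# Process areas named in the text above.
PROCESS_AREAS = ("RISK", "CTRL", "SC")


@dataclass(frozen=True)
class Appraisal:
    """Hypothetical record of one organization's capability levels."""
    organization: str
    levels: dict  # process area -> capability level (0 = lowest, assumed scale)


def weakest_area(appraisal):
    """Return the (process_area, level) pair with the lowest level."""
    return min(
        ((pa, appraisal.levels[pa]) for pa in PROCESS_AREAS),
        key=lambda pair: pair[1],
    )


def weaker_links(appraisals, threshold):
    """List organizations whose weakest process area is below the threshold."""
    flagged = []
    for appraisal in appraisals:
        pa, level = weakest_area(appraisal)
        if level < threshold:
            flagged.append((appraisal.organization, pa, level))
    # Lowest capability first, so planning starts with the weakest link.
    return sorted(flagged, key=lambda item: item[2])


# Illustrative, made-up data; not drawn from any real appraisal.
peers = [
    Appraisal("Utility A", {"RISK": 3, "CTRL": 2, "SC": 2}),
    Appraisal("Utility B", {"RISK": 1, "CTRL": 2, "SC": 0}),
    Appraisal("Utility C", {"RISK": 2, "CTRL": 1, "SC": 2}),
]

for org, pa, level in weaker_links(peers, threshold=2):
    print(f"{org}: plan compensating measures for {pa} (level {level})")
```

Reporting only the weakest process area per organization is a simplification chosen for brevity; a real comparison would likely look at every process area and at the services each interconnection supports.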

At the end of the day, what we all care about is that utilities are capable of managing their cyber security risks. Instead of providing a list of prescriptive controls with which a utility must comply (“one size fits all”), utilities would have the opportunity to demonstrate cyber security risk management capabilities. Use of CERT-RMM metrics would allow organizations to expend their energies on productive security measures and potentially obviate the need to spend so much effort on compliance and auditing.

Grid Modernization and Transformation

The energy delivery infrastructure in the United States and in much of the rest of the world is at the beginning of a significant modernization effort that will transform it from an aging analog technology base to a “smart grid” in which there is pervasive use of digital technology. This will enable features such as demand-side management, distributed generation, real-time pricing, and many others. As the infrastructure becomes increasingly dependent upon digital technologies for monitoring and control, requirements must likewise increase to ensure the resilient operations of the digital technology infrastructure.

A lot of work has been done to address the security, IT operations, and continuity issues for the digital technologies (technology assets, in CERT-RMM–speak) that are being deployed in the name of grid modernization, and it is recognized that much work remains to be done. In fact, both authors of this essay are involved in a number of efforts to help ensure the security and sustainability of technologies that will implement the smart grid. That being said, the ability of an organization to ensure the security and sustainability of these systems over time and with a reliable degree of predictability, particularly during times of stress, has not received sufficient attention at the community level, in the authors’ opinion.

CERT-RMM was built to help organizations improve their capability for managing operational resilience and therefore reliability. It provides a unique platform for an organization to identify, understand, and improve the organizational processes that support its operational resilience. By using CERT-RMM’s guidance for measuring current competencies across the entire scope of their resilience processes (security, IT operations, and business continuity), electric service organizations can understand their current state and identify a baseline for future evaluations. They can also use this guidance to set improvement targets and to establish plans for closing any gaps it identifies.

Grid modernization presents a unique opportunity for the electricity delivery infrastructure to embrace models such as CERT-RMM. Fully instantiating a process improvement model typically runs into organizational barriers that include issues such as the inability of the current organizational structure to support the necessary changes and the lack of funding resources. However, grid modernization is bringing significant changes to the organizational structures traditionally found in the utility space. We are seeing this today as organizations adapt and evolve to embrace and manage the new opportunities being created. The inclusion of process improvement efforts for resilient operations at the start of a project reduces the barriers to entry. The ability of the effort to be seen as an organizational investment and not as a costly commitment provides an opportunity to leverage support from senior managers. Smart grid projects are likely to lead to significant organizational change and will require support from senior managers. If the organization includes process improvement from the outset, senior managers’ support can ideally be leveraged as the organization changes.

Grid Modernization

The transformation to a smart grid will require changes not only to the structure of the electric power grid and to operational processes within electric utilities but also to the relationships between and among the electricity providers, consumers, and regulators. This grid modernization provides us with an excellent opportunity to examine, enhance, and perhaps enrich all of these relationships.

Addressing Resilience as a Key Aspect of Software Assurance Throughout the Software Life Cycle

Software assurance is defined as “the level of confidence that software is free from vulnerabilities, either intentionally designed into the software or accidentally inserted at any time during its life cycle, and that the software functions in the intended manner” [CNSS 2009].

In CERT’s work to develop a Master of Software Assurance curriculum, this definition is extended as follows:

Application of technologies and processes to achieve a required level of confidence that software systems and services function in the intended manner, are free from accidental or intentional vulnerabilities, provide security capabilities appropriate to the threat environment, and recover from intrusions and failures [Mead 2010].

Operational resilience is defined as “the organization’s ability to adapt to changing operational risk environments,” and operational resilience management is defined as “the process by which an organization designs, develops, implements, and manages the protection and sustainment of high-value services, related business processes, and associated assets such as people, information, technology, and facilities” [Caralli 2010].

Managing operational resilience is the focus of CERT-RMM. In comparing the definitions and intent, the connections and overlaps between software assurance and resilience become obvious when developing, acquiring, and operating software applications and software systems, two of the high-value technology assets addressed in CERT-RMM. Software cannot meet assurance requirements (security, safety, and reliability) if it is not resilient in the face of stress and disruption, including attack. Resilient software functions in the intended manner in the face of adversity, resists threats, provides features that support critical services when disrupted (sometimes in degraded mode), recovers from intrusions and failures, and is able to be restored to its pre-disruption state6 in a reasonable period of time.7 These are all quality attributes of assured and secure software.

Resilient software and systems do not become survivable and resistant to threat (that is, assured) without an organizational commitment to address resilience as part of assurance throughout development, acquisition, and operations life-cycle phases. These assets must be specifically designed, developed, and acquired with consideration of the types of threats they will face, the operating conditions and changing risk environment in which they will operate, and the priority and sustainment needs of the services they support. Typical software and system development and acquisition life cycles understandably focus on identifying and satisfying functional requirements; that is, most of the effort goes into defining what the software or system must do to fulfill its use case, purpose, objectives, and, ultimately, its mission. However, quality attributes such as security, sustainability, availability, performance, and reliability can in the long run be equally important to the usability and longevity of software and system assets and require considerable resources to address in the operations phase if they are not considered early in the development and acquisition life cycles.

Unfortunately, requirements for quality attributes such as assurance and resilience can be harder to define, design, and implement, and in many cases they require significant business impact and cost analysis up front to ensure that they are worth investing in. This leads to a tendency to ignore these requirements early in the development and acquisition life cycles and to bolt on solutions to address them in later life-cycle phases, when they are more costly, less effective, and typically harder to manage and sustain in an operational mode. The failure to consider quality attributes is a primary reason why software and systems in operation are subject to high levels of operational risk resulting from failed technology and processes. In essence, ignoring quality attributes creates additional security, continuity, and other related operational risks that must be managed in the operations phase of the life cycle, typically at higher cost, lower efficacy, and potentially increased consequences to the organization. In some cases, these problems may be so significant as to shorten the expected life of the software and systems, diminish the organization’s confidence in their ability to perform, and result in cumulatively lower than expected return on investment.

As an element of software assurance, developing and acquiring resilient software and systems requires a dedicated process that encompasses the asset’s life cycle. As described in CERT-RMM’s Resilient Technical Solution Engineering (RTSE) process area, the process is as follows:

• Establish a plan for addressing resilience as part of the organization’s (or supplier’s) regular development life cycle and integrate the plan into the organization’s corresponding development process. Plan development and execution include identifying and mitigating risks to the success of the project.

• Identify practice-based guidelines, such as threat analysis and modeling, that apply to all phases, as well as those that apply to a specific life-cycle phase.

• Elicit, identify, develop, and validate assurance and resilience requirements (using methods for representing attacker and defender perspectives, for example). Such processes, methods, and tools are performed alongside similar processes for functional requirements.

• Use architectures as the basis for design that reflect a resilience and assurance focus, including security, sustainment, and operations controls.

• Develop assured and resilient software and systems through processes that include secure coding of software, software defect detection and removal, and the development of resilience and assurance controls based on design specifications.

• Test assurance and resilience controls for software and systems and refer issues back to the design and development cycle for resolution.

• Conduct reviews throughout the development life cycle to ensure that resilience (as one aspect of assurance) is kept in the forefront and given adequate attention and consideration.

• Perform system-specific continuity planning and integrate related service continuity plans to ensure that software, systems, hardware, networks, telecommunications, and other technical assets that depend on one another are sustainable.

• Perform a post-implementation review of deployed systems to ensure that resilience (as well as assurance) requirements are being satisfied as intended.

• Monitor software and systems in operation to determine if there is variability that could indicate the effects of threats or vulnerabilities and to ensure that controls are functioning properly.

• Implement configuration management and change control processes to ensure that software and systems are kept up-to-date to address newly discovered vulnerabilities and weaknesses (particularly in vendor-acquired products and components) and to prevent the intentional or inadvertent introduction of malicious code and other exploitable vulnerabilities.

In addition to RTSE, there are a number of goals and practices in other CERT-RMM process areas that organizations should consider when developing and acquiring software and systems that need to meet assurance and resilience requirements. These are as follows:

• Resilience requirements for software and system technology assets in operation, including those that may influence quality attribute requirements in the development process, are developed and managed in the Resilience Requirements Development (RRD) and Resilience Requirements Management (RRM) process areas, respectively.

• Identifying and adding newly developed and acquired software and system assets to the organization’s asset inventory are addressed in the Asset Definition and Management (ADM) process area.

• The management of resilience for technology assets as a whole, particularly for deployed, operational assets, is addressed in the Technology Management (TM) process area. This includes, for example, asset fail-over, backup, recovery, and restoration.

• Acquiring software and systems from external entities and ensuring that such assets meet their resilience requirements throughout the asset life cycle are addressed in the External Dependencies Management (EXD) process area. That said, RTSE-specific goals and practices should be used to aid in evaluating and selecting external entities that are developing software and systems (EXD:SG3.SP3), formalizing relationships with such external entities (EXD:SG3.SP4), and managing an external entity’s performance when developing software and systems (EXD:SG4).

• Monitoring for events, incidents, and vulnerabilities that may affect software and systems in operation is addressed in the Monitoring (MON) process area.

• Service continuity plans are identified and created in the Service Continuity (SC) process area. These plans may be inclusive of software and systems that support the services for which planning is performed.

In terms of other model connections, RTSE is strongly influenced by two SEI Capability Maturity Model Integration (CMMI) process areas [CMMI Product Team 2006]:

• Requirements Development (RD), the purpose of which is to produce and analyze customer requirements and software and system product and product component requirements.

• Technical Solution (TS), the purpose of which is to design, develop, and implement solutions to software and system requirements. (Solutions, designs, and implementations encompass software and system products, product components, and product-related life-cycle processes, either singly or in combination as appropriate.)

RTSE is also strongly influenced by CERT’s ongoing research in software assurance and the work of other leaders in the software assurance and software security communities. There is a growing number of reputable sources to consider when identifying and selecting candidate guidelines for the development and acquisition of resilient software and systems across the life cycle, particularly for software security and assurance, such as

• Building Security In Maturity Model (BSIMM2) v2.0; http://bsimm2.com/

• Open Web Application Security Project (OWASP) Software Assurance Maturity Model (SAMM) v1.0; www.owasp.org/index.php/Category:Software_Assurance_Maturity_Model

• Microsoft’s Security Development Lifecycle, Version 4.1; www.microsoft.com/security/sdl/

• Department of Homeland Security Assurance for CMMI Process Reference Model; https://buildsecurityin.us-cert.gov/swa/procwg.html

The Department of Homeland Security Assurance for CMMI Process Reference Model (PRM) is the result of synthesizing a variety of existing software security life cycles, practices, and research into a set of practices for assurance that can be embedded within a diverse set of existing development approaches and processes. The practices identify recommended activities (the “what”) for enhancing a product or service that relies on software and systems. Organizations can use the model for a self-assessment or to aid in developing specifications for external entities that may be providing a technology solution.

Organizations using CERT-RMM can use the Assurance for CMMI PRM as a framework for enhancing their resilience technology solutions. Through a simple gap analysis, a project or organization can identify and prioritize areas of risk in the engineering of resilient technical solutions that PRM practices can help mitigate.
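A simple gap analysis of this kind could be sketched as follows. The two practice names are the ones quoted in this essay; the 0 to 3 implementation scale, the risk weights, the scores, and the placeholder third practice are assumptions made purely for illustration and are not part of the PRM.

```python
def gap_analysis(practices):
    """Rank practices by risk-weighted gap between target and current state.

    Each entry is (practice_name, current, target, risk_weight). The scale
    for current/target and the weights are assumed, not defined by the PRM.
    """
    gaps = []
    for name, current, target, risk_weight in practices:
        gap = max(target - current, 0)  # no credit for exceeding the target
        if gap > 0:
            gaps.append((name, gap * risk_weight))
    # Largest risk-weighted gap first: these are the priority areas.
    return sorted(gaps, key=lambda item: item[1], reverse=True)


# Hypothetical self-assessment results.
assessment = [
    ("Establish and maintain the strategic assurance training needs "
     "of the organization", 1, 3, 2),
    ("Identify deviations from assurance coding standards", 0, 2, 3),
    ("Placeholder practice already at target", 2, 2, 5),
]

for name, priority in gap_analysis(assessment):
    print(f"priority {priority}: {name}")
```

Multiplying the gap by a risk weight is only one possible prioritization rule; an organization might instead rank gaps by the criticality of the services the affected software supports.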

There are a variety of ways (the “how”) organizations can implement the assurance practices identified in the Assurance for CMMI PRM. The Building Security In Maturity Model (BSIMM2) v2.0, Open Web Application Security Project (OWASP) Software Assurance Maturity Model (SAMM) v1.0, and Microsoft’s Security Development Lifecycle, Version 4.1, are examples of how the practices can be implemented.

For example, an organization with a weakness associated with the Assurance for CMMI practice “Establish and maintain the strategic assurance training needs of the organization” could consider using detailed implementation approaches in supporting codes of practice, such as

• the Microsoft SDL training guidelines on basic concepts, common baseline, and custom training

• OpenSAMM practices for technical security awareness training, role-specific guidance, and comprehensive security training and certifications

• BSIMM2 training-related practices for creating the software security satellite, making customized, role-based training available on demand, and providing recognition for skills and career path progression

Similarly, an organization implementing secure coding practices may identify an improvement opportunity associated with the Assurance for CMMI practice “Identify deviations from assurance coding standards.” Detailed practices on maturing the use of static analysis tools during coding are provided in

• the Microsoft SDL practices for basic code scanning tools, use of static analysis tools, and in-house security tool customization

• OpenSAMM practices for creating review checklists from known security requirements, using automated code analysis tools, and customizing code analysis for application-specific concerns

• BSIMM2 practices for providing easily accessible security standards and (compliance-driven) requirements, enforcing standards through mandatory automated code review and centralized reporting, and building an automated code review factory with tailored rules

Organizations are continuing to gain a better understanding of techniques that successfully mitigate resilience risks related to the increased use of technology solutions (software and systems) to support organizational missions. As specific practices are identified to solve emerging challenges, the CERT-RMM RTSE process area and the Assurance for CMMI PRM can provide a framework for connecting the resilience management and technology development practices of an organization.8

Raising the Bar on Business Resilience

Introduction

As part of our journey toward improving Lockheed Martin Corporation’s resilience to disruptive events, small or large, natural or man-made, intentional or accidental, we have been in search of innovative techniques that we could add to our existing proven collection of tools in our “toolbox.” This journey has led us to examine a variety of new and not-so-new methods from such domains as disaster recovery, business continuity, crisis management, and related preparedness planning arenas. One such tool that we discovered, studied, tested, and have since added to our resilience toolbox is the CERT Resilience Management Model (CERT-RMM). This is a short description of our successful encounter with CERT-RMM.

Our Definition of Business Resilience

Enterprises, large or small, public or private, civilian or federal, continue to invest in a variety of preparedness planning activities, including IT disaster recovery, business continuity, pandemic planning, crisis management, and emergency management. Prior to encountering CERT-RMM, we had determined that one of the changes that we had to institutionalize across the enterprise was to approach all preparedness activities in an integrated fashion, as opposed to independent pursuits. We refer to this integrated approach to all these aspects of preparedness as “business resilience.”

We define business resilience management (BRM) as the practice of planning, developing, executing, and governing activities to ensure that an enterprise

• identifies and mitigates operational risks that can lead to business disruptions before they occur

• prepares for and responds to disruptive events (natural or man-made, accidental or intentional) in a manner that demonstrates command and control of incident response

• recovers and restores mission-critical business operations following a disaster within acceptable time frames

For us, BRM comprises such components as business continuity, IT disaster recovery, crisis management, emergency management, and pandemic planning.

Disruptive events may include fire, flood, earthquakes, severe weather, power outages, IT failures, data corruption, strikes or other labor actions, terrorist attacks, civil unrest, and chemical, biological, and nuclear hazards. Incidents requiring crisis management may include employee kidnappings, workplace violence, minor weather events, and business crises (for example, a product failure or the loss of a key customer, trading partner, or service provider).

The Need for a Management/Maturity Model

Given Lockheed Martin’s long history of process orientation and its extensive experience with CMMI, it was a natural step for us to identify the need for a management model that could be used to guide our journey in raising the bar on business resilience.

Operational management or maturity models are structured collections of elements that describe certain aspects of maturity within an organization. The principle behind maturity models is that an organization develops and adopts new processes and practices from which it learns, optimizes, and moves to the next level. The concept was proven with the inception of CMM and CMMI, created by Carnegie Mellon University’s Software Engineering Institute originally for government contractors to help evaluate and improve processes related to software engineering.

For our business resilience purposes, a maturity model would serve several purposes:

• To assess current level of competencies—A maturity model would serve as a common “ruler” across the enterprise to gauge the current posture of individual business entities and/or the enterprise as a whole through such techniques as self-assessment, assessment by an internal appraiser, assessment by an independent third party, and audits.

• To guide future direction and investments—A maturity model would facilitate business-centered objective setting by encouraging such questions as: Where do we want to be? and How good do we need to get? It would assist in determining the investment required to reach the next and/or desired state. This is a critical step, since it is not necessary for all organizations to reach the highest levels of maturity.

• To measure progress toward the desired goal—A maturity model would act as a program management instrument to ensure that investments are turning into improved capabilities.

• To ensure that plans and processes evolve to stay at the desired level—Once the desired level is reached, a maturity model facilitates implementation of necessary operation and maintenance activities to ensure that the organization stays at the desired level.

Selecting a Model

Following an environmental scan of the relevant fields and industries, we identified several existing maturity models that either were potentially applicable to business resilience or had been designed specifically for it. In keeping with our well-established systems engineering practices, we then conducted a detailed trade study comparing the candidate models.

Our trade study considered such criteria as

• applicability to Lockheed Martin’s business model

• completeness and comprehensiveness of the framework

• expandability beyond its current scope

• ease of customization

• integrated approach to components of business resilience

• applicability to operations processes other than business resilience

• openness of the framework

• consistency with national and/or international standards

• availability of a variety of assessment methodologies

• availability of full documentation

• availability of training material

• addressing the complete range of assets

• addressing governance and management structures

• similarity of the model to other management/maturity models already in use for other purposes

• usage by other industries of interest
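A trade study of this kind is commonly summarized as a weighted scoring matrix. The sketch below is illustrative only and does not reproduce the study described here: the criterion weights, the 1-to-5 scores, and the candidate names "Model A" and "Model B" are all hypothetical, and a weighted sum is just one of several ways to combine criteria.

```python
# Hypothetical weights for four of the criteria listed above (they sum to 1.0).
WEIGHTS = {
    "applicability": 0.4,
    "completeness": 0.3,
    "openness": 0.2,
    "training_material": 0.1,
}

# Hypothetical scores on a 1-5 scale; the model names are placeholders.
SCORES = {
    "Model A": {"applicability": 4, "completeness": 5,
                "openness": 3, "training_material": 2},
    "Model B": {"applicability": 3, "completeness": 3,
                "openness": 5, "training_material": 4},
}


def weighted_score(scores, weights):
    """Combine one candidate's per-criterion scores into a single weighted sum."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_candidates(all_scores, weights):
    """Return (candidate, total) pairs sorted from highest to lowest total."""
    totals = {name: weighted_score(s, weights) for name, s in all_scores.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank_candidates(SCORES, WEIGHTS):
        print(f"{name}: {total:.2f}")
```

Keeping the weights separate from the scores makes it easy to test how sensitive the ranking is to the relative importance assigned to each criterion.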

Our analysis identified CERT-RMM as the most promising model for use within our enterprise. These are some of the characteristics that set CERT-RMM apart from others:

• It promotes the convergence of information security, business continuity, and IT operations activities as a means to actively direct, control, and manage operational resilience and risk.

• It comprehensively models an enterprise from the perspective of the interrelationships among its mission, services and products, business processes, and assets, capturing the needs of large enterprises like ours.

• It has a strong risk management approach where operational risks resulting in asset disruption are well captured.

• It considers risks associated with the protection of assets (through information security techniques) and risks associated with the sustainment of assets (through disaster recovery, business continuity, pandemic planning, and crisis management techniques).

• It treats resilience activities (e.g., disaster recovery, business continuity, pandemic planning, and crisis management planning) as yet another class of business processes intended to manage operational risks.

• It focuses on measuring and institutionalizing resilience processes.

• Its developers captured best practices from the financial industry, which is known for its high-quality and effective resilience practices.

Kicking the Tires

As a large, distributed, complex, and process-matured enterprise, it is critical for Lockheed Martin to ensure that any potential new process-oriented management model is truly applicable to our business entities. In order to do so, we performed a trial during which we applied a subset of CERT-RMM process areas to disaster recovery operational processes and command media at one of our business entities.

The overall goal of the pilot was to evaluate the applicability and utility of CERT-RMM for use at Lockheed Martin with the specific goal of answering such questions as these:

• Does CERT-RMM align well with Lockheed Martin’s business model and operational practices?

• How can lessons learned from use of CERT-RMM in an organization that has attained CMMI Maturity Level 5 be understood?

• Would the use of CERT-RMM benefit attainment and maintenance of business continuity and disaster recovery readiness posture?

• What appraisal methods are efficient and economical enough for use along with CERT-RMM?

• Would a SCAMPI C appraisal using CERT-RMM be useful for evaluating business continuity and disaster recovery readiness?

• How well can CERT-RMM identify disaster recovery planning implementation gaps?

As part of the preparation for the pilot, we held a week-long CERT-RMM training course delivered by members of the CERT-RMM development team at CERT. This provided an opportunity to train both the business resilience subject matter experts and the pilot appraisal team on CERT-RMM.

At the time of our pilot, CERT-RMM did not define or specify an accompanying assessment methodology, but it was flexible enough to be deployed with a variety of assessment methodologies. For this pilot, the traditional CMMI SCAMPI C assessment methodology was used.

The pilot was very successful: it clearly demonstrated that the model and the appraisal process could reveal insights about both the disaster recovery practices at the piloted business unit and the applicable local and corporate command media.

Going Forward

We have identified a variety of ways in which CERT-RMM will have a role in our journey toward the goal of raising the bar on Lockheed Martin’s resilience to disruptive events and continually reducing the associated operational risks to our employees, our customers, and our assets. In particular, we envision CERT-RMM

• contributing to our common business resilience taxonomy and nomenclature

• serving as a contributing reference model for our integrated business resilience framework

• serving as a maturity model to gauge the preparedness posture of individual business entities and/or the enterprise as a whole in the areas of disaster recovery and business continuity

• serving as a mechanism to reveal insights about existing policies and guidelines

• serving as a guiding tool in developing new command media

• serving as a means to communicate key harmonization and convergence across business resilience and information security

In summary, we have found that CERT-RMM is a comprehensive and flexible framework that, in conjunction with project management methodologies, could provide an efficient and economical way to assist in improving the principles and practices associated with the protection of an enterprise.

Measuring Operational Resilience Using CERT-RMM

How might business leaders go about answering these two key questions:

• How resilient is my organization?

• Have our processes made us more resilient?

And to inform these, answering this question:

• What should be measured to determine if performance objectives for operational resilience are being achieved?

Consistent, timely, and accurate measurements are important feedback for managing any activity, including operational resilience. When conducting measurement and analysis (as defined in the Measurement and Analysis process area), the organization establishes the objectives for measurement (i.e., what it intends to accomplish) and determines the measures that are useful for managing operational resilience as well as for providing meaningful data to manage key processes such as governance, compliance, monitoring, and improvement.

The first step in defining a meaningful measurement program is to determine the required or desired level of operational resilience for the organization. The “organization” may be the enterprise, any line of business or organizational unit of the enterprise, or a supply chain or other form of business relationship that includes external entities. CERT-RMM provides a process-based structure of goals and practices at four levels of capability (incomplete, performed, managed, and defined) and a companion appraisal method. Defining the required or desired level of capability establishes the baseline against which operational resilience can be measured. Ideally, the required level for each process area is established during strategic and operational planning as well as when planning for continuity of operations, not as an afterthought during times of stress and service disruption. The required level should be no less than, and no more than, that which is required to meet business mission objectives.
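The idea of a required capability level as a baseline can be sketched as a simple comparison between required and assessed levels per process area. This is a minimal illustration, not part of CERT-RMM or its appraisal method: the four level names come from the text above, the numbering 0 to 3 is used here for ordering, and the required and assessed values are hypothetical.

```python
from enum import IntEnum


class CapabilityLevel(IntEnum):
    """The four capability levels named above, ordered from lowest to highest."""
    INCOMPLETE = 0
    PERFORMED = 1
    MANAGED = 2
    DEFINED = 3


def capability_gaps(required, assessed):
    """Return assessed minus required for each process area.

    A negative value is a shortfall against the baseline; a positive value
    means the area exceeds what the business mission requires.
    """
    return {area: int(assessed[area]) - int(required[area]) for area in required}


# Hypothetical baseline and appraisal results for three process areas.
required = {
    "Measurement and Analysis": CapabilityLevel.MANAGED,
    "Service Continuity": CapabilityLevel.DEFINED,
    "Incident Management and Control": CapabilityLevel.MANAGED,
}
assessed = {
    "Measurement and Analysis": CapabilityLevel.MANAGED,
    "Service Continuity": CapabilityLevel.MANAGED,
    "Incident Management and Control": CapabilityLevel.DEFINED,
}

if __name__ == "__main__":
    for area, gap in capability_gaps(required, assessed).items():
        status = "on target" if gap == 0 else ("shortfall" if gap < 0 else "above requirement")
        print(f"{area}: {gap:+d} ({status})")
```

Reporting both shortfalls and overshoots reflects the point that the required level should be neither less nor more than what the mission objectives demand.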

An effective measurement and analysis process includes the following activities and objectives:

• specifying the objectives of measurement and analysis such that they are aligned with identified information needs and objectives

• specifying the measures, analysis techniques, and mechanisms for data collection, data storage, reporting, and feedback

• implementing the collection, storage, analysis, and reporting of the data

• providing objective results that can be used in making informed decisions, and taking appropriate corrective actions
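The activities above (specify the objective and the measure, collect the data, analyze it, and report a result that supports a decision) can be sketched in a few lines. This is an assumed structure for illustration only: the `Measure` fields, the example measure "incident_closure_days", its target, and the sample values are hypothetical and are not taken from CERT-RMM.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Measure:
    """A hypothetical structured definition of one measure."""
    name: str
    objective: str            # the information need this measure supports
    unit: str
    target: float             # threshold agreed during planning
    higher_is_better: bool = True


def analyze(measure, samples):
    """Average the collected samples and compare the result with the target."""
    average = mean(samples)
    if measure.higher_is_better:
        met = average >= measure.target
    else:
        met = average <= measure.target
    return {
        "measure": measure.name,
        "average": average,
        "target": measure.target,
        "objective_met": met,
    }


# Hypothetical measure and collected data.
closure_time = Measure(
    name="incident_closure_days",
    objective="Track how quickly incidents are closed",
    unit="days",
    target=5.0,
    higher_is_better=False,
)
report = analyze(closure_time, [3.0, 6.0, 4.0])

if __name__ == "__main__":
    print(report)
```

Recording the objective alongside each measure keeps the link between what is collected and the information need it serves, which is the alignment the first activity calls for.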

Integrating measurement and analysis into the operational resilience management system supports

• planning, estimating, and executing operational resilience management activities

• tracking the performance of operational resilience management activities against established plans and objectives, including resilience requirements

• identifying and resolving issues in the operational resilience management processes

• providing a basis for incorporating measurement into additional operational resilience management processes in the future

Example metrics for answering the questions posed above appear in CERT-RMM v1.1 (refer to generic goal 2, generic practice 8, in each process area). They include key process metrics and performance indicators that can be used, in whole or in part, to demonstrate that the required level of operational resilience has been achieved for a given process area. Examples of metrics for 5 of the 26 process areas are included in Table 7.1.

Table 7.1. Examples of Metrics in Selected CERT-RMM Process Areas


As the model is used by more organizations, it will be increasingly possible to identify, define, deploy, pilot test, and measure effective security and resilience metrics, as well as collect measurement experiences in support of benchmarking. These metrics may be performance- or process-based. We will be working with collaborators and customers to determine what metrics are most useful for determining process effectiveness, to develop metrics templates and structured definitions, and to update the model to reflect results. Automated approaches for collecting and reporting metrics will be essential for long-term use and success.
