CHAPTER 7

The Edifice

Service Management Architecture

SUMMARY

This chapter describes two types of service management applications. The first type is an application that follows the source, collector, interpreter, and display architectural design pattern. A service knowledge management system (SKMS) is used as a detailed example. Most service management applications follow this pattern. The other type described is a policy or business process management application. This type also follows the source, collector, interpreter, and display pattern, but it has more complicated transactions that may require more complex interfaces. Cloud deployment both benefits and challenges implementations of both types.

This chapter describes the architectural considerations in building a service knowledge management system in an enterprise environment that combines cloud and on-premises architecture. Although an SKMS is interesting and valuable on its own, architecting an SKMS is an engineering project that exemplifies the challenges presented when building software to support cloud service management. This chapter explores service management architecture in a cloud environment using an SKMS as an example that ties together the entire service management superstructure.

Service Knowledge Management System

ITIL v3, published in 2007 and revised in 2011, added several new concepts to service management. One of those concepts was the service knowledge management system. The SKMS covers almost every aspect of IT and ITIL practices. It is also seldom, if ever, implemented with the full breadth described in the ITIL volumes.1

As you might guess, constructing SKMS architecture is of particular interest to architects because the SKMS connects all the critical pieces of software used in managing the IT service infrastructure and ITIL practice. Cloud implementations complicate the challenge because the IT service infrastructure is installed both on the enterprise premises and on clouds of all varieties: private, public, community, and hybrid clouds all may appear in the service management infrastructure. In addition, the cloud services used for implementation can be of any variety: IaaS, PaaS, SaaS, Storage as a Service, Network as a Service, and all the other “as a Services” that have appeared can be, and are, used in implementing service management. An enterprise with a firm grasp on the architecture for its SKMS has also faced the challenges of an architecture to support coordination of a collection of IT services.

This is distinctly not to say that building a software SKMS is the same as implementing ITIL or a service management practice. It is not. An organization is free to implement any or all ITIL practices. An SKMS ties together the service management practices and the software that supports them and is an ITIL practice itself, but software implementations are not required for ITIL practices; software makes implementing ITIL practices easier by helping tame the complexity and volume of services. However, even an SKMS ITIL practice could be implemented without software. In addition, an SKMS could be built to support an IT service system that avoids ITIL practices. The order in which practices are implemented is not set. One enterprise may choose to tie its SKMS tightly to existing ITIL practices. Others may choose to make an SKMS their first ITIL practice and use it to aid in operating and coordinating existing services that are not constructed following ITIL principles.

An SKMS need not be implemented with software. However, for organizations of any size, the volume of information and rate of change makes it difficult to implement the practices without some assistance from software. AXELOS, the current owner of ITIL, has a program for endorsing software that supports ITIL practices. The program works through licensed software assessors who review software submitted to them and determine whether the software possesses the features that the ITIL books require and supports ITIL practices and terminology.

AXELOS-accredited assessors certify off-the-shelf software as supporting different ITIL practices, such as incident management, problem management, capacity management, event management, and so on.2 An off-the-shelf package may be the best choice for most organizations, although an off-the-shelf SKMS may not be suitable for organizations with unique needs.

Here, I am not discussing building an SKMS to advocate taking on such a project. Building an SKMS illustrates most of the choices that must be made when planning any cloud service management software project. Since it is described in fair detail in the ITIL literature, it makes a good practical example of cloud service management development architecture.

SKMS Architecture

An SKMS brings together data from many sources that affects service management throughout its strategy, design, transition, and operation lifecycle, providing the information to support an effective continuous improvement Deming cycle. SKMS information is available and used at each stage of the service lifecycle. Information passes from one service management function to another via the SKMS. For example, the SKMS may collect and display information from capacity management tools. This information can help make day-to-day decisions in operations, scale testing during service transition, shape the design phase, and help determine long-term strategy.

The SKMS is a single collection point intended to synergistically combine information from different service management data sources. For example, an SKMS could combine data in service incidents with performance data to help determine appropriate service capacities.

The components of an SKMS can be located on a cloud or on the organization premises. From an architectural standpoint, the decision to deploy a given component on a cloud usually boils down to a few considerations. Clouds can provide greater compute and storage capacity than on-premises hardware. Clouds offer elasticity; in other words, clouds can provide increased capacity when loads go up. Clouds also are generally more accessible to users outside the corporate perimeter than on-premises installations. Whenever the need for any of these three capabilities appears, a cloud implementation may be the best choice.3

An SKMS has four architectural layers. A four-layer SKMS is similar to the classic Model-View-Controller (MVC) pattern long used in user interfaces. Like the MVC pattern, a four-layer SKMS separates the data sources from business logic and data display. The SKMS layers separate the data sources from the consolidation and interpretation of the data, and the display of the interpreted data is isolated from the interpretation. When implemented, an SKMS can become quite complex, but its basic architecture is straightforward (Figure 7-1).

9781430261667_Fig07-01.jpg

Figure 7-1. A basic SKMS architecture
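The layering can be made concrete with a short sketch. The following Python skeleton is illustrative only; the class names, the method signatures, and the single business rule are assumptions chosen for clarity, not part of ITIL or any product.

    # A minimal sketch of the four SKMS layers. All names are illustrative
    # assumptions, not part of any standard or product.
    from abc import ABC, abstractmethod


    class DataSource(ABC):
        """Layer 1: anything that holds service management data."""
        @abstractmethod
        def read(self) -> dict: ...


    class Collector:
        """Layer 2: consolidates raw data from many sources behind one call."""
        def __init__(self, sources: list[DataSource]):
            self.sources = sources

        def collect(self) -> list[dict]:
            return [src.read() for src in self.sources]


    class Interpreter:
        """Layer 3: applies business rules to turn raw data into information."""
        def interpret(self, records: list[dict]) -> dict:
            # Example rule: count open incidents across all sources.
            open_incidents = sum(r.get("open_incidents", 0) for r in records)
            return {"open_incidents": open_incidents}


    class Display:
        """Layer 4: presents interpreted information to SKMS consumers."""
        def render(self, info: dict) -> None:
            for key, value in info.items():
                print(f"{key}: {value}")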

Data Sources

An SKMS architecture differs from a typical MVC architecture in the multiplicity of its data sources and in the fact that many of those data sources are legacy systems that may require special integration, which is the job of the data collection layer.

Enterprises follow many different road maps in evolving their service management practices. A “greenfield” implementation of a service management system, including an SKMS, built with all new components and infrastructure is ideal but rare in practice. Most service management practices are built up over time. Sometimes services are first manual and then automated gradually by inserting applications into the service process where they are most cost effective. Services that began as a paper system often turn into fully automated electronic systems through this evolution.

Usually, an automated management system also grows over time. For example, a growing printing business may begin with a trouble-ticketing system that is just 3-by-5 cards in a shoebox that the maintenance staff uses to keep track of equipment problems. As the organization grows, to handle the volume, the maintenance department replaces the shoebox with an issue-tracking database. To speed response times, software is added to open tickets. Gradually, a section of the maintenance department transforms itself into an IT department. Eventually, the trouble-ticketing system is merged with internal and external customer service, and what was a shoebox becomes the core of a complete service desk system. In real life, the ghost of that shoebox lives on in the names and choice of fields in service incident reports. Quite possibly, the custom software written long ago to instrument the original printing press (as per our example) may still be there, generating events when bearings overheat or when the press stops suddenly. This pattern of gradual evolution from manual to automated with older technology wrapped into newer technology is frequently seen in IT. Managers are often disinclined to replace software that works, even if the overall system has advanced. Consequently, IT departments often deal with software and platforms of many vintages.

Suboptimal components linger for many reasons. Often budget constraints block their replacement. Sometimes a poorly performing component lives on because it supplies functionality that the enterprise needs and new components do not replace. A range of platforms and a mixture of outsourced and in-house services, each with its own requirements, often increase the obstacles. These services and platforms often have constituencies that resist new architectures. In some cases, training costs for new software or hardware can exceed the benefits of the new infrastructure. Even when new software or hardware is clearly advantageous, organizational resistance to change can be hard to oppose. In practice, addressing only the most egregious problems and most desperate needs is often all a technologist can expect. Patience and perseverance generally triumph, but only over the long haul.

Consequently, an SKMS architect must often weave together legacy software and platforms with new software to consolidate and interpret data. The data source layer is often where the greatest difficulty occurs, but even the information display may sometimes have to be assembled from legacy components.

The data source layer corresponds roughly to the model in an MVC pattern, although the data structure of individual SKMS data sources can vary widely in complexity and sophistication. At one extreme, the model may be unstructured data that requires extensive processing to yield meaningful information, such as big data projects that extract significant patterns from unstructured documents. At another extreme, the model may be an object-oriented application that presents a complex object model, such as the model exposed by some service applications, through a well-structured and easily used web service. Somewhere in the middle, there are many structured relational databases with diverse schemas that can be queried with SQL or other standard query mechanisms. At the worst, there may be no practical API for a source, and the data model may be inconsistent and idiosyncratic, which can be the case for legacy technology that predates data and API structuring principles and transport design. Fortunately, there is usually a way around all these obstacles.

There are two major classes of data source from the viewpoint of the SKMS architect: those with and those without APIs. A data source with an API has some intentional means of communicating with other programs rather than with users. Applications with APIs are easier to work with. Fortunately, good software engineering practice is to provide APIs for almost all applications. With the rise of the Internet, best practice has been to provide not only an API but an API that is accessible through the web, called a web service.

Unfortunately, there are still data sources in use that do not have APIs. Legacy components without APIs can usually be made to participate in the SKMS. The least automated and least satisfactory is the “yellow pad” method. A person physically goes to the source, copies a number from a gauge to a yellow pad, and goes back to their cubicle and enters the data into the SKMS. This method is slow, error-prone, and most likely expensive. However, when the number on that gauge is the key to a critical service’s performance, changes infrequently, and cannot be acquired any other way, the effort may be worth the trouble. With the advent of the Internet of Things (IoT), instrumentation has become easier since the number of physical devices that are connected to the network is increasing.

Next up the scale is the venerable and universally excoriated practice of screen scraping. Scraping a screen means to convert data intended to be read by humans into data that can be used programmatically. Mainframe programs from the 1970s and 1980s often displayed data only onscreen and on paper; transferring data directly to other programs was often limited to transferring tapes. A screen scraper cobbles together an API, reading screen buffers and programmatically converting them to usable data structures using screen positions and searching for labels. Needless to say, the process is slow, difficult to program, and brittle. Minor screen changes can break the pattern and cause a complete and tedious rewrite of the scraping code. Nevertheless, it works. Screen scraping is a possibility when nothing more reliable and efficient is available, which may be the case for older mainframe systems. A form of screen scraping is also useful for extracting data from web interfaces by parsing the HTML. Conceivably, a SaaS application that offers no usable APIs could be cracked by some form of screen scraping.4
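As a rough illustration of HTML screen scraping, the following sketch pulls one labeled value out of a status page using only the Python standard library. The page layout and the label are assumptions; real scraping code is tightly coupled to the actual screen or page and breaks when it changes.

    # A minimal sketch of scraping a value out of an HTML status page when no
    # API exists. The page layout (a table cell labeled "Queue depth") is an
    # assumption for illustration only.
    from html.parser import HTMLParser


    class LabelValueScraper(HTMLParser):
        def __init__(self, label: str):
            super().__init__()
            self.label = label
            self._seen_label = False
            self.value = None

        def handle_data(self, data: str):
            text = data.strip()
            if not text:
                return
            if self._seen_label and self.value is None:
                self.value = text            # first text following the label
            elif text.startswith(self.label):
                self._seen_label = True


    html_page = "<table><tr><td>Queue depth</td><td>42</td></tr></table>"
    scraper = LabelValueScraper("Queue depth")
    scraper.feed(html_page)
    print(scraper.value)  # "42" -- brittle: any change to the page breaks this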

File sharing is next up the scale and superior to yellow pads and screen scrapers when it is available. Applications sometimes write their data to files. More sophisticated applications write to databases, now most often relational databases. Although these files and databases may not have been intended to be read by external programs, they often can be. This is a much more reliable and robust way of extracting data from a source, but it requires understanding of the internals of the application and may break if the application is updated.

Data sources with intentional APIs are much easier to work with. Applications designed before web services became popular rely on Transmission Control Protocol/Internet Protocol (TCP/IP)5 or similar lower-level protocols that deal directly with the network. If a development team is prepared to work with the appropriate protocol stack, the major difficulty in tapping these applications for information is obtaining sufficient documentation. Unfortunately, the documentation for these APIs is often an internal technical manual or, worse, comments in code. Lacking documentation can be a nearly insurmountable obstacle.

When an application with an API based on a socket-level protocol, like TCP/IP, is deployed on a cloud, there can be difficulties in treating it as a data source. Communication between clouds is usually based on Hypertext Transfer Protocol (HTTP)–based APIs.6 Administrators have good reason to suspect that direct TCP/IP communication with an entity outside the corporate perimeter is a security breach waiting to happen. Consequently, connecting to a data source using a low-level protocol is likely to have issues with firewalls and other security mechanisms. Usually, additional ports have to be opened in the firewall, which administrators are loath to do, as they should be. In high-security situations, changes or additional layers may have to be added to the architecture to avoid the extra open ports. When Network Address Translation (NAT) is present, and it is almost always used now, an additional layer that retains network addresses in some form may also have to be added to handle translated addresses. These difficulties should not be underestimated, but such connections have the advantage of being relatively reliable and sustainable once they are in place.

Web-era applications are usually the easiest to deal with. They are likely to have HTTP-based APIs, which usually means REST or, to a lesser extent, SOAP-based7 APIs. If the application and the API are even moderately well designed, these present relatively few difficulties for data collection. SOAP- and REST-based APIs are by far the easiest way to collect data from applications deployed on clouds.
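A minimal sketch of collecting data through a REST API follows, using the widely available Python requests package. The endpoint, the bearer token, and the JSON fields are hypothetical; substitute whatever the actual data source exposes.

    # A minimal sketch of collecting data through an HTTP-based REST API.
    # The endpoint, token, and response fields are hypothetical.
    import requests

    BASE_URL = "https://servicedesk.example.com/api/v1"   # assumed endpoint
    TOKEN = "..."                                          # assumed credential


    def open_incident_count(service: str) -> int:
        response = requests.get(
            f"{BASE_URL}/incidents",
            params={"service": service, "status": "open"},
            headers={"Authorization": f"Bearer {TOKEN}"},
            timeout=10,
        )
        response.raise_for_status()
        return len(response.json()["incidents"])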

In the ITIL SKMS practice, any entity that contributes to the service management lifecycle can be an SKMS source. That is a wide range of sources. In practice, the scope of SKMS data sources depends on the data an enterprise has available, the effort required to acquire the data for the SKMS, and the enterprise goals for the SKMS. Typically, an enterprise can prioritize critical services and issues. Those priorities then determine the scope of the SKMS data sources.

The data content also varies widely. For example, some organizations may want to include data from the operating systems of critical servers that run service software. This data could come from real-time queries or from events from the operating system. If near-real-time data is not needed, the information could come from operating system log files. Any of these methods can be useful. Operations is likely to find the more immediate data from a direct query useful in making on-the-spot decisions, but strategic planners are likely to find trends extracted from logs more revealing.

Many organizations choose to concentrate on data closer to the services themselves rather than the infrastructure that supports the service, but the decision should always be driven by the enterprise goals for the SKMS. It is not an either-or decision. More data is better, but only if it is properly digested and interpreted. A wealth of data that is meaningfully combined, interpreted, and displayed is ideal. System monitoring facilities such as network monitors, security breach detection, and performance monitoring all are candidates as SKMS data sources.

At a minimum, most SKMSs include service portfolio systems; service desk incident, request, and problem management applications; configuration management databases; and change management records. Other sources, such as performance management, capacity management systems, asset management systems, and financial management may be used to focus on specific aspects of service management that are important to the enterprise. Practically, the investment needed to collect the data often helps decide which sources to include.

Data Collection and Consolidation

The SKMS collection layer brings the data from all the sources together. In a simple situation, the collection layer can be a pure pass-through, taking data from the data sources and passing it on to the upper layers without changing it in any way. The data sources and the data collection layers combined are the model in a Model-View-Controller (MVC) system. Unfortunately, data collection is often more complicated than a simple pass-through that homogenizes the individual application programming interfaces of the data sources into a single API that can be called by the upper layers. Usually, in an MVC, if there is more than one source of the data in the model, those sources are designed into the model rather than seen as separate data feeds. That is usually impractical for an SKMS because the data sources have their own structure that must be incorporated by the SKMS data collection layer, and the intention is to provide a window into service knowledge that may provide different views as IT services evolve. This usually involves something more complex than a simple pass-through.

The data collection layer must deliver data from all the data sources to the next layer up, the data interpretation layer, in a manner that the interpretation layer can use. The interpretation layer applies business logic to transform the collected data into meaningful business information. The collection layer may exist as a data warehouse or cache, or it may access the data sources directly. The collection layer may provide some data transformation. Translation to uniform measurement units is usual, and the data may have to be plugged into a common model.8 There are many variations and combinations, but the architect must choose between caching data and querying for data.

A basic heuristic says caching consumes resources but improves performance; querying is more flexible and usually requires fewer computing and storage resources but may perform and scale poorly.

A data collection layer that passes queries for data on to the data sources, perhaps as SQL queries or using Open Data Protocol (OData),9 can usually be constructed quickly. Using a protocol like OData, unanticipated requests for information may require no changes to the data collection layer, only a new query from the next layer up. However, this kind of data collection layer depends on the speed of the underlying data source and the transport of data from the source to the collection layer. This can be dishearteningly slow or erratic, and the SKMS architect has little control over the cause of the issues. It also may mask problems with differing data models.
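For illustration, such a pass-through collection layer might do little more than translate a request from the layer above into an OData query URL, as in the sketch below. The service root and entity set are assumptions; $filter, $select, and $top are standard OData system query options.

    # A minimal sketch of a pass-through collection layer that forwards a
    # request to a data source as an OData query. The service root and
    # entity set are assumptions.
    from urllib.parse import quote

    SERVICE_ROOT = "https://cmdb.example.com/odata"   # assumed OData service root


    def odata_query_url(entity_set: str, filter_expr: str, select: str, top: int) -> str:
        # $filter, $select, and $top are standard OData system query options.
        options = "&".join([
            "$filter=" + quote(filter_expr),
            "$select=" + quote(select, safe=","),
            "$top=" + str(top),
        ])
        return f"{SERVICE_ROOT}/{entity_set}?{options}"


    # For example, the interpretation layer asks for the fullest file systems:
    url = odata_query_url(
        "FileSystems",
        filter_expr="PercentUsed gt 80",
        select="Host,MountPoint,PercentUsed",
        top=50,
    )
    print(url)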

An alternative is to construct a cache that is filled by the source on the source’s schedule, which may be whenever the data is created or changes, or periodically at some appropriate interval. Using a cache, a query to the data collector depends on the speed of accessing the cache, not the performance of the data source. Usually, the cache will be much faster than the data source.

A cache gives the SKMS architect better control of performance, but the lunch is not free. There may be a lag between a value changing on the source and the value obtained from the cache, although a reasonably short lag is usually not important in SKMS. In addition, maintaining a consistent cache is often more difficult than it first appears. It often requires an in-depth knowledge of the inner workings of the data source. The cache itself requires resources, both in storage and computing capacity. Finally, when users request data not previously in the cache, the cache must support the new data. A simple and quickly built cache may require revision of the cache data layout to incorporate new data. If the cache code expects a fixed-data layout, coding may be required. In a dynamic organization, this can turn into a nightmare. That nightmare can be avoided with careful designs. Caches, often better called data warehouses, can be built to accommodate changing data schemas, but this can become a major project in itself.
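The following sketch shows the simplest form of such a cache: values are refreshed from the source only when they are older than a fixed interval. The interval and the shape of the cached records are assumptions; a production data warehouse would be far more elaborate.

    # A minimal sketch of a cache in the collection layer. Values are refreshed
    # from the source no more often than a fixed interval; the trade-off is a
    # possible lag between the source changing and the cache reflecting it.
    import time
    from typing import Callable


    class CachingCollector:
        def __init__(self, fetch: Callable[[str], dict], max_age_seconds: float = 300.0):
            self._fetch = fetch                  # function that queries the real source
            self._max_age = max_age_seconds
            self._cache: dict[str, tuple[float, dict]] = {}

        def get(self, key: str) -> dict:
            now = time.monotonic()
            cached = self._cache.get(key)
            if cached and now - cached[0] < self._max_age:
                return cached[1]                 # fresh enough: avoid hitting the source
            value = self._fetch(key)             # slow path: query the data source
            self._cache[key] = (now, value)
            return value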

The architecture of the data collection layer is often the most significant element in an overall SKMS design. There is no single best practice because it depends on the data sources, which vary from site to site.

In a greenfield implementation in which the architect can choose or build all the applications that contribute to service management, data sources can be chosen that have a uniform API for obtaining data. A protocol like OData is an excellent choice. The choice of OData also determines the transport used to transfer data from the source to the data collection layer because OData is generally considered a Representational State Transfer protocol, although it can be used in other contexts.

Other choices include SOAP,10 which is connection oriented.11 There are many SOAP toolkits, and many existing applications have SOAP interfaces. SOAP is also usually implemented over HTTP,12 although, like REST, other transports are possible. In the past few years, REST has often been favored over SOAP because REST implementations have a simpler software stack. In theory, a REST interface can be used with nothing more than a simple command-line tool like cURL.13 In a powerful development environment, SOAP APIs are easy to write, but the hidden complexity of the SOAP stack can be an obstacle to debugging and maintenance.

SOAP’s complexity also permits greater flexibility. REST servers do not retain client application state, which is not so important for SKMS but is important to other service management applications.14 A call to a REST server depends only on the state of the server, not the state of the client. In other words, the server treats each call as independent and does not keep track of previous calls. Of course, if a call from a client changes the state of the application on the server, say by inserting a row into a database, the state change may affect the server’s response to later calls, but that is the application state, not the server state. If, however, the client issues the identical GET commands repeatedly, the client will receive identical responses as long as the application state does not change. If the server were stateful, the server might keep track of the calls, and the client might get the first hundred rows, the next hundred rows, and so on, with each successive call.
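The practical consequence is visible in something as ordinary as paging through a large result set. In a stateless design, the client carries its position in every request, as in the sketch below; the endpoint and parameter names are assumptions.

    # A minimal sketch of paging against a stateless REST server: the client
    # carries its position (offset) in every request, so the server never has
    # to remember previous calls. Endpoint and parameter names are assumptions.
    import requests


    def fetch_all_rows(base_url: str, page_size: int = 100) -> list[dict]:
        rows, offset = [], 0
        while True:
            resp = requests.get(
                f"{base_url}/incidents",
                params={"offset": offset, "limit": page_size},
                timeout=10,
            )
            resp.raise_for_status()
            page = resp.json()["incidents"]
            rows.extend(page)
            if len(page) < page_size:      # last page reached
                return rows
            offset += page_size            # the client, not the server, tracks position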

The lack of server state makes writing REST servers somewhat simpler and more scalable. A REST server does not have to contain code to manage client state, and it does not have to unwind the client call stack for error recovery, which makes REST servers easier to write and REST stacks lighter weight. Stateless REST servers are also more easily load balanced because clients can easily switch between servers. The flip side of these arguments is that REST clients may contain more code to keep the client state, and SOAP supports extended transactions, which can reduce the number of messages sent between the client and the server and are necessary in some cases. Replacing many short exchanges between client and server with a few messages that carry more content can be important for performance over a network. In addition, developers trained in object-oriented programming often find SOAP transactions easier to understand when the complexity of the SOAP stack is hidden.

The distinctions between REST and SOAP become more significant when cloud implementations enter the picture. The “Cloud and Data Collection” section of this chapter discusses some of these challenges.

When considering an SKMS data collection architecture, an SKMS architect should begin by asking what APIs are available for the data sources that will be incorporated in the SKMS. Asking what data sources might be employed in the future is also worthwhile. Although having all APIs use a single protocol is ideal, building new APIs takes time and resources. Often a new API is impossible. For example, building a new API for an off-the-shelf proprietary application is completely infeasible in most cases. Even an open source or in-house application can present formidable obstacles to new API building.

As a consequence, the architect must, as usual, perform a balancing act by doing the minimum of work to make the data available and building a data collection layer that meets both performance and content requirements. Most often that ends up as a conglomeration of data protocols and direct queries mixed with some form of data warehouse.

Figure 7-2 is an example of an SKMS data collection layer that combines proprietary, SOAP, REST, and REST OData protocols. The example also combines both data cached in a data warehouse and data that is queried directly from the data source. In the Figure 7-2 example, the Uniform Access Facade is shown using SOAP. SOAP will work as the data collection layer protocol, as will REST or many other protocols. Currently, REST would be theoretically favored in many IT departments. However, in an environment where SOAP or some other protocol is used frequently, it is often a good choice to stick with the protocol best known among the developers who will work with it.

9781430261667_Fig07-02.jpg

Figure 7-2. For an SKMS data collection layer that combines several protocols, a mixture of direct querying and caching is often a good solution

The figure illustrates that much of the code required in data collection follows either the adapter or the facade pattern. The adapter pattern usually refers to code that translates one API protocol into another. A facade pattern usually describes an entity that provides a single unified interface to a number of sources. See Figure 7-3.

9781430261667_Fig07-03.jpg

Figure 7-3. Facades and adapters perform different functions

Most SKMSs have to address translating objects from one scheme to another. For example, one data source may have something called storage that models both internal and external disk drives that may be attached via a TCP/IP network or a Small Computer System Interface (SCSI) network.15 Another data source may have separate direct attached storage (DAS), network attached storage (NAS), and storage area networks (SAN). Some code must bring this data together in a comprehensible fashion.

This vexing problem can become complicated. A basic strategy for dealing with the complexity is to choose a common model for data collection and translate all the data source models into a common model. This requires some advanced planning, but a common model is a scalable approach that makes adding data sources relatively easy.

“Relatively” is a carefully chosen word. Adding a new data source can be easy, but it also can be maddeningly difficult, no matter how well-designed the data collection layer. If both the data model and the transport protocol are idiosyncratic, which happens all too often, developers are challenged to be smart and creative. However, it is always easier to add a data source to an established data collection layer based on a strong common model. Ad hoc model translation mechanisms without a common model are invariably fragile and difficult to understand and work with when a new source is added.
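The sketch below illustrates the adapter, common model, and facade roles using the storage example above. The common model, the field names, and the shapes of the two sources are all assumptions; the point is that each adapter converts its source into the common model once, and the facade exposes only that model.

    # A minimal sketch of adapters translating source-specific storage models
    # into one common model exposed through a facade. All names are assumptions.
    from dataclasses import dataclass


    @dataclass
    class Storage:                         # the common model
        host: str
        kind: str                          # "DAS", "NAS", or "SAN"
        capacity_gb: float


    class SourceAAdapter:
        """Source A reports one generic 'storage' record per device."""
        def __init__(self, raw_records: list[dict]):
            self.raw = raw_records

        def storage(self) -> list[Storage]:
            return [Storage(r["server"], r["attachment"], r["size_gb"]) for r in self.raw]


    class SourceBAdapter:
        """Source B keeps DAS, NAS, and SAN in separate lists."""
        def __init__(self, das: list[dict], nas: list[dict], san: list[dict]):
            self.groups = {"DAS": das, "NAS": nas, "SAN": san}

        def storage(self) -> list[Storage]:
            return [
                Storage(r["host"], kind, r["gb"])
                for kind, records in self.groups.items()
                for r in records
            ]


    class StorageFacade:
        """One uniform interface over every adapter, whatever its source."""
        def __init__(self, adapters):
            self.adapters = adapters

        def all_storage(self) -> list[Storage]:
            return [s for a in self.adapters for s in a.storage()]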

Some of these problems are addressed in dealing with configuration management databases (CMDBs),16 and SKMS architects can sometimes leverage federated CMDBs. Information models such as the Distributed Management Task Force (DMTF) Common Information Model (CIM)17 are also useful.

Adopting one of these general models as the standard for the facade can save work. CIM, for example, is already used in some IT management tools. If the data collection layer uses CIM as the model that is exposed in the facade, including management tools that use CIM in the SKMS is simplified because there is no translation between the source and the facade. A model that addresses services as well as infrastructure and software may be useful here.

Figure 7-4 shows data models in an SKMS data collection layer. In this example, a common model was chosen as the model behind the facade. The facade provides a uniform interface to other applications such as dashboards that will use the collected data. The data sources are in several forms, but they are all transformed so that they can be accessed through the uniform facade. A common model simplifies the facade and makes the system easier to modify by isolating conversions to the adapters and avoiding cross-model conversions. The example uses two data warehouses that are both addressed through the facade.

A system of adapters such as this need not be developed from scratch at each site. Products such as Teiid, an open source offering, provide software for much of this process.18

9781430261667_Fig07-04.jpg

Figure 7-4. Using general data models strategically can reduce coding

Designing and building a scalable SKMS data collection layer is challenging because IT service management has developed gradually and coalesced from many independent applications. The data collection layer is where much of the difficulty surfaces.

Cloud and Data Collection

So far, this discussion has not mentioned cloud computing. Today, in many IT departments, service management is likely to consist of a mixture of applications that are deployed locally, deployed virtually on local servers that are not organized as a cloud, and deployed on public and private clouds. Although cloud acceptance has increased steadily since the concept’s introduction, organizations are often hesitant to make radical and rapid changes to their infrastructures. CEOs and CFOs still fret over cloud security and governance. SKMS data collection layers will have to collect data from cloud and noncloud sources for at least the near future, although the number of noncloud sources is likely to decrease as time goes on. On-premises implementations may never completely disappear from systems with extraordinary security and secrecy requirements.

The architecture of a mixed local and cloud environment can be complex. Figure 7-5 is an example of a mixed environment. Several data sources are implemented on clouds; other data sources are implemented locally. In the example, the data collection layer is implemented on both cloud and local environments. The data collector uses a data warehouse implemented on a private IaaS cloud that was built by the enterprise, exists on the enterprise premises, and is administered as a cloud by enterprise IT personnel. Likely, a private cloud was chosen rather than a public cloud because the data in the SKMS warehouse is sensitive and management was unwilling to place it in a third party’s hands.

9781430261667_Fig07-05.jpg

Figure 7-5. The topology of an SKMS in a mixed local and cloud environment can be complex

It is also likely that this private cloud evolved from a simple virtual environment on a cluster of servers that was originally deployed to run a number of applications utilizing the combined computing power of the cluster more efficiently.

These private clouds can be hard to distinguish from simple virtualization. The key difference is primarily administrative. A private cloud offers an on-demand environment, usually in the form of virtual machines and storage. The consumers of the on-demand environment configure and use it as they want, installing and running their applications as if it was their own physical infrastructure. The mechanism by which the consuming service is charged for the use of the infrastructure and the control the user is given over the infrastructure are important characteristics of clouds. There is no great value in making sharp distinctions, but whether an application is running in a virtual environment or a private cloud can make a difference for integrating an application as a data source for an SKMS.

A mixed environment like Figure 7-5 poses several challenges. SaaS integration depends on the extent of the APIs that the SaaS provider is willing to supply. Old-style, on-premises, enterprise applications often provide access to a rich array of data and events in the application. This has proven to be a mixed blessing: opening up application data and events widens the surface on which consumers interact with the application. This level of access can enable tailored customer implementations that exactly match enterprise needs. However, future development of an application with a wide surface must continue to support the exposed surface without changes or risk breaking backward compatibility with existing installations.19 The wider the exposed surface, the harder to maintain complete backward compatibility. Without easy backward compatibility, the result is often customers who refuse to upgrade to new releases or difficult and brittle scripted upgrades that are expensive to build and test and often annoy customers. Over time, this can cripple the advancement of the application. Many SaaS providers have taken an opportunity to reduce this exposure by offering narrower APIs with less access to the internals of products. If the APIs are carefully chosen, this works well, but it can also make data collection for an SKMS difficult.

Issues with IaaS and PaaS cloud implementations are similar. Both are subject to the APIs of the applications running on the cloud. Most of these are HTTP-based REST or SOAP. However, older applications may have been ported directly to the cloud without changing their APIs. These may not be HTTP-based.

The cloud implementation itself may have data that must be collected for the SKMS. For example, a typical IaaS cloud charges consumers for the virtual machines that run on the cloud. These charges may be critical information for service management because they contribute to the cost of providing a service. The amount of cloud storage used, virtual network utilization, and other metrics all may be of value to the SKMS users. Therefore, the SKMS may be required to collect data from the cloud providers. Most providers make this information available through some kind of web service. The DMTF Cloud Infrastructure Management Interface (CIMI) provides a REST API for accessing such metrics.20 Proprietary interfaces usually provide similar access to this sort of information.
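As a simple illustration of folding provider metering data into the SKMS, the sketch below sums per-virtual-machine charges into per-service costs. The record fields and the mapping from virtual machines to services are assumptions; the real fields depend on the provider's API, whether CIMI or proprietary.

    # A minimal sketch of turning provider metering records into per-service
    # costs for the SKMS. Field names and the VM-to-service mapping are assumed.
    from collections import defaultdict


    def cost_per_service(metering_records: list[dict], vm_to_service: dict[str, str]) -> dict[str, float]:
        totals: dict[str, float] = defaultdict(float)
        for record in metering_records:
            service = vm_to_service.get(record["vm_id"], "unassigned")
            totals[service] += record["charge"]
        return dict(totals)


    # Example: two VMs supporting the "payroll" service, one unmapped VM.
    records = [
        {"vm_id": "vm-1", "charge": 12.50},
        {"vm_id": "vm-2", "charge": 8.25},
        {"vm_id": "vm-9", "charge": 3.00},
    ]
    print(cost_per_service(records, {"vm-1": "payroll", "vm-2": "payroll"}))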

Data Interpretation

The data collection layer is the workhorse of the SKMS, but the data interpretation layer is the business intelligence of the system. Data interpretation transforms raw data into information that can be used by the SKMS consumer. Data interpretation relies on the business rules and practices of the enterprise to give meaning to the data that is collected. Data interpretation can range from a simple pass-through of data values to an intense analysis of large amounts of structured and unstructured data. There are many possibilities: a data series can be examined for trends, big data–style analysis may reveal hidden relationships in seemingly unconnected data, or commonly used ratios such as mean time between failure (MTBF) can be calculated.
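For example, one small interpretation-layer rule might compute MTBF from the failure timestamps collected for a service, as in the sketch below. The input format is an assumption.

    # A minimal sketch of one interpretation-layer rule: mean time between
    # failures computed from failure timestamps collected for a service.
    from datetime import datetime


    def mean_time_between_failures(failure_times: list[datetime]) -> float:
        """Return MTBF in hours; requires at least two failures."""
        ordered = sorted(failure_times)
        gaps = [
            (later - earlier).total_seconds() / 3600.0
            for earlier, later in zip(ordered, ordered[1:])
        ]
        return sum(gaps) / len(gaps)


    failures = [
        datetime(2015, 3, 1, 2, 0),
        datetime(2015, 3, 8, 14, 0),
        datetime(2015, 3, 20, 2, 0),
    ]
    print(mean_time_between_failures(failures))  # average gap in hours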

Perhaps the most important capability of an SKMS is to combine similar data from disparate sources. Such information might contribute to strategic and design decisions. An SKMS can make it possible to compare various aspects of productivity for different services. For example, an SKMS can normalize metrics for customer satisfaction for similar services and compare the services, judging them on their relative contributions to the enterprise reputation. The total cost of ownership for service assets used for similar purposes in different services can be compared, and better purchase strategies can be developed for the future.

The testing history and rollout issues in one service may be useful in planning transitions for another service. Combining data from similar technical implementations can give operations personnel insight into overlaps and conflicts in resource allocations that affect service delivery. Sharing service incidents from the service desk can yield clues to incipient weaknesses in similar services. Information in one service’s solutions database can often be useful to other services with similar issues. The possible combinations and benefits grow as further data sources are added to the SKMS.

Bringing together information from different sources and synergistically combining them into insights that are greater than the sum of the parts is exactly why enterprises choose to build SKMSs. Combined information can be important for making strategic decisions and even low-level operational decisions that affect more than one service. These comparisons can be critical to strategic decisions for enhancing or replacing services.

Data interpretation depends upon a thorough understanding of the business. Without that understanding, the data from the collectors are facts that cannot be transformed into information useful to the enterprise. The business of the enterprise determines what is useful and what is not. The architecture of the data collection system must not hinder business input into the interpretation of the data. This is probably more of a documentation challenge than a technical challenge: the business side of the house must be able to understand exactly what the data is in order to decide how it is to be interpreted. The architect and developer must work closely with the business side in order to provide useful interpretation. This is often frustrating because neither understands the other well and may have little motivation to make the effort to understand.

Both sides must be patient and keep an open mind. It is hard to overemphasize the importance of mutual understanding in the interpretation layer. Without substantial understanding, the value of an SKMS will be limited to the business imagination of the developers of the system because the developers are the implementers. Without meaningful business input, the implementation can be a catastrophe because its information will not be of value to its business consumers who are likely to be the majority of users and the most influential.

Presentation

The two ingredients that make up a good SKMS display are convenience and understandability for its users. The lower levels (the data sources and the data interpretation and collection) are technical, unseen, unappreciated, largely unknown, and nonetheless essential.

No matter how well-designed, efficient, and maintainable the data collection and interpretation layers, an SKMS without a good display will be a failure. Developers often assume that the SKMS display is only “eye-candy” and not necessary to the effective use of the SKMS. This has a grain of truth; a good SKMS cannot succeed without effective data collection and interpretation. A great display cannot compensate for inadequate data. The harmonious colors, elegant fonts, graphs, bubble charts, and dynamic screens are all useless without the right information to display. On the other hand, excellent data that is not presented as clear and useful information is equally useless.

An SKMS must be conveniently accessible. For an SKMS, display means more than the traditional liquid crystal or cathode ray tube monitor. There are more display media available today than in the past. An SKMS can make use of many different modes of display, some cutting-edge and others more traditional. Each has its place, and a well-designed SKMS display can use them all.

Paper reports, the oldest of computer displays, are still often favored for displaying long-term trends for planners and executives developing strategic policies and evaluating service portfolios. Traditional “fat client” displays in cubicles, on the racks in the data center, and on large displays in control rooms are important in some organizations. However, web browser–based displays, which are more device-independent, have largely replaced fat clients. Web apps and mobile apps on portable devices such as mobile phones and tablets offer real-time insight into service performance to roving managers and operators and untether employees for offsite work. The multiplicity of displays can tempt developers and designers into poor presentation when they cut corners instead of using the characteristics of each display to advantage. Displaying a PDF of a paper report unreadably on a mobile device and claiming to have a mobile app is an example of a misdirected rush to a new means of display.

An SKMS architect should plan for all of these types of displays. In this respect, SKMS architecture does not differ from that of any other application that must display information to users. However, because the SKMS can often display information from legacy sources that do not have flexible user interfaces, the value of an SKMS increases when it provides a window into older sources whose information is harder to get to.

Service Workflow and Business Process Management

Not all service management applications fit the source, collector, interpreter, and display pattern. In addition to displaying information used for management, some service management applications manage business processes, enforcing and executing policies and actively managing objects that make up services.

Management and control are usually based on events and alerts. (See the following sidebar.)

ALERTS AND EVENTS

Following ITIL terminology, events are significant changes of the state of a system. Any change that affects the management of the system can be called an event. An event might occur when a file system reaches 80 percent capacity, or an event might be a periodic reading of CPU temperature or even a measurement of current inventory. An important aspect of events is that they indicate the state of the system at a specific time. Not all events indicate an actual or imminent malfunction. Some events indicate benign conditions that may be useful for management.

Event is occasionally used interchangeably with alert, but usually an alert is the notification that may be generated when an event occurs. Also, alert is often reserved for notifications that indicate an actual or imminent malfunction or situation that requires attention.

Service incident management is an example of event management. Many incident management applications, usually part of a service desk, automatically create incident reports based on policies or business rules and input from automated instrumentation of the virtual or physical infrastructure. Human sources and security systems also create and provide events for incident reports. The incident management application displays the information collected from users and automated agents, but it also automatically manages incidents, following policy rules to determine priority, assign technicians based on expertise and availability, escalate issues, and manage other aspects of the incident lifecycle. Policies may automatically close issues. Policy management, enforcement, and execution are often performed by a workload, workflow, or business process management component. Sometimes these policies are called workflow policies to distinguish them from other types of policies such as security policies; sometimes they are simply called business rules. Policy determines actions taken on the incident, and incident management applications have a more active role than collection, interpretation, and display applications.
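A minimal sketch of such workflow policy follows: a few rules set the priority, route the ticket, and flag it for escalation. The rule set and the incident fields are assumptions; real service desks express these as configurable business rules rather than code.

    # A minimal sketch of workflow policy applied to a new incident: rules set
    # the priority and route the ticket. Rules and fields are assumptions.
    def apply_incident_policy(incident: dict) -> dict:
        # Priority rules
        if incident["service_tier"] == "gold" or incident["impact"] == "outage":
            incident["priority"] = 1
        elif incident["impact"] == "degraded":
            incident["priority"] = 2
        else:
            incident["priority"] = 3

        # Assignment rules
        routing = {"network": "network-ops", "storage": "storage-team"}
        incident["assigned_group"] = routing.get(incident["category"], "service-desk")

        # Escalation rule
        incident["escalate"] = incident["priority"] == 1
        return incident


    ticket = {"service_tier": "gold", "impact": "degraded", "category": "network"}
    print(apply_incident_policy(ticket))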

Change management and other aspects of service desks such as problem and request management also enforce and execute policies. Other service management applications, such as financial asset, service portfolio, service-level agreement management, or service catalog may also include a business process management component.

These applications face all the challenges of an SKMS because they collect, interpret, and display information from data sources. They must integrate with different data sources with different architectures and different APIs or lack of APIs. And they confront differing data models that must be reconciled and rationalized. They gain the same advantages and disadvantages from cloud deployments.

Figure 7-6 represents a general architecture for a process control system. The architecture is actually similar to the SKMS architecture in Figure 7-1. Events are a special form of data, and they are consolidated just as data is collected and consolidated in an SKMS. Event consolidation is usually called correlation. The correlation function in event management attempts to identify events that stem from the same originating condition, often called the root cause. Sometimes that is easily done, as when a storm of events repeatedly contains the same information from the same source. Other times, correlation reaches the level of artificial intelligence and sophisticated data analysis when the linkage between events is subtle.
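The simplest form of correlation can be sketched in a few lines: events carrying the same source and condition within a short window are folded into one correlated group. The event fields and the window length are assumptions, and real correlation engines go far beyond this.

    # A minimal sketch of the simplest form of event correlation: events with
    # the same source and condition inside a short window are folded into one
    # correlated group. Fields and window length are assumptions.
    WINDOW_SECONDS = 300


    def correlate(events: list[dict]) -> list[dict]:
        correlated: list[dict] = []
        latest: dict[tuple, dict] = {}          # most recent group per (source, condition)
        for event in sorted(events, key=lambda e: e["time"]):
            key = (event["source"], event["condition"])
            group = latest.get(key)
            if group and event["time"] - group["last_seen"] <= WINDOW_SECONDS:
                group["count"] += 1             # same storm, assume same root cause
                group["last_seen"] = event["time"]
            else:
                group = {"source": event["source"], "condition": event["condition"],
                         "first_seen": event["time"], "last_seen": event["time"],
                         "count": 1}
                correlated.append(group)
                latest[key] = group
        return correlated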

9781430261667_Fig07-06.jpg

Figure 7-6. Service management and control service overview

Event correlation and data consolidation face some of the same challenges with different emphasis. An SKMS tends to be challenged by reconciling different data models and repositories that refer to the same physical or virtual object. Event correlation has the same challenge, but usually the greatest challenge for event correlation is attaining a deep understanding of the dynamic relationships between events as they occur. For example, a rise in network traffic through a certain router may not be significant unless there is also an increase in the load on a certain server, which might indicate a denial-of-service attack on the server, rather than a general increase in activity. This level of inference goes far beyond the data in the event.

The most significant difference between Figure 7-1 and Figure 7-6 is the control arrows pointing back to the controlled application or component. An SKMS is a data collection mechanism that ties together data from many elements of the IT system. A system management application is a feedback loop that takes in event data and uses the data to exercise intelligent control of the managed applications and components.

The control is usually guided by management policies and rules. The management and control module must interpret incoming events and apply the policies and rules to the ongoing process. Sometimes this is a manual process performed by operators to view processed event data and use their experience and expertise to exercise the appropriate controls. In other cases, especially when instant reaction is required, the management and control module will respond automatically. An example is a system that automatically activates a fire control system when excess temperature is detected.

Unique management and control application challenges stem from the complex interaction that management requires. Often, control consists of a workflow rather than a single control message. In a workflow, the control activity may involve many steps, each one of which may generate its own events and require a change in the direction of the flow. All transactions in a “collect-and-display” application move data from the source to the destination. Either the source or destination may start the transaction. When the destination requests data, the transaction is a pull. When the source sends the data to the destination without a request, the transaction is a push. When executing a workflow, transactions are more complicated, involving more than a single push or pull step. Often, the managing application sends a request to a worker application and then waits for the worker to reply with a status on the effect of the request. These interactions can be complicated, especially when the system or network has elements of unpredictability such as a network that can slow down or even break. Management and control applications can also be highly critical, as in our fire control system example.

The topology of a cloud implementation can critically affect control mechanisms, especially intricate control workflows. Interaction between public clouds and on-premises components is affected by the network that connects them. When the network is the Internet, the workflow must be designed to be resilient to slowdowns and interruptions. One way of addressing this issue is to place components so that a network interruption can be compensated for by a component with an uninterrupted path, which may mean designing in redundancy and parallel processing. Often, parallel processing can serve a dual purpose of also improving scalability.

As much as possible, workflow transactions should be designed to be stateless and idempotent.21 Even within a corporate network, mishaps sometimes occur, and the target of a message does not receive the request, the target never responds, or the sender dies before it gets its response. A stateless transaction can be repeated with the same effect each time, which makes recovery from interrupted transactions much easier. When some components of the architecture are on clouds, the likelihood of some kind of network interruption increases. Therefore, stateless transactions are even more important when clouds are present.
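One common way to make a workflow step idempotent is for the client to generate a request identifier and reuse it on every retry, so the receiving application can recognize and ignore duplicates. The sketch below assumes a hypothetical restart endpoint and an "Idempotency-Key" header; the header name is a convention, not something every receiving application supports.

    # A minimal sketch of making a workflow step idempotent: the client attaches
    # a request identifier it generates, so a retry after a network failure
    # cannot cause the action to be performed twice. Endpoint and header name
    # are assumptions.
    import uuid
    import requests


    def restart_service(base_url: str, service_id: str, attempts: int = 3) -> dict:
        request_id = str(uuid.uuid4())          # same id reused on every retry
        for attempt in range(attempts):
            try:
                resp = requests.post(
                    f"{base_url}/services/{service_id}/restart",
                    headers={"Idempotency-Key": request_id},
                    timeout=10,
                )
                resp.raise_for_status()
                return resp.json()              # server ignores duplicates of request_id
            except requests.RequestException:
                if attempt == attempts - 1:
                    raise                        # give up after the last attempt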

When choosing between REST and SOAP, it is important to keep the statelessness of REST architectures in mind. It is also wise to remember that well-designed REST APIs have a stateless server, but sometimes the label REST is used loosely and the API is not stateless. On the other hand, SOAP architectures can be designed to be stateless.

Sometimes designing a stateless transaction is simply not possible. A classic all-or-nothing accounting transaction that must complete or be completely reversed as if the transaction never happened is difficult to implement as a stateless transaction. If the instigator, and perhaps other players in the transaction, does not acknowledge the success of the transaction, the transaction must fail completely and return the server to the state before the transaction started. Recovering the prior state is difficult if the system must remain stateless. Usually, service management transactions are not as strict as accounting transactions, so statelessness is not as hard to achieve, but when stateful transactions are not avoidable, SOAP is connection oriented and will support stateful transactions. In that case, SOAP may be the best choice for APIs with the service management application.

Managing stateful transactions is the most difficult difference between applications that follow the source, collector, interpreter, and display pattern and applications that manage workflow and policy.

Although clouds raise the threat of more difficult recovery from network interruption, clouds may still be even more beneficial to workflow management applications than “collect-and-display” applications. Service desks, for example, are often in need of cloud elasticity. Like many other service management applications, service desk volumes reflect the volume of activity in the enterprise, which varies by season, account-closing dates, and even day of the week. Since a service desk is often a mission-critical service that is most needed during flurries of activity when volume is at its peak, a service desk application must be provisioned to perform well when it is heavily used, even though that may result in over-provisioning the vast majority of the time. Cloud elasticity makes over-provisioning unnecessary, if the application is designed to scale by increasing the number of virtual machines.

The accessibility offered by a cloud implementation is often the most important benefit of a cloud deployment when management and control applications are used widely, both inside and outside the enterprise.

Conclusion

Clouds can reduce up-front capital investments, make service management applications more accessible outside the organization perimeter, and increase flexibility and scalability, but they can also make integration more challenging and complicate governance and security. An SKMS is an information collection, interpretation, and display application that faces many integration challenges and is characteristic of many service management applications. Other applications, such as service desk or service catalog, implement policies and manage workflows and business processes. These applications face additional challenges in a cloud environment, but they also gain similar benefits from cloud deployment.

EXERCISES

  1. What is a service knowledge management system?
  2. Describe the four architectural layers that make up an SKMS.
  3. What is the general architectural pattern of an SKMS?
  4. Describe another service management application that follows the same pattern as an SKMS.
  5. Describe some of the challenges presented by legacy applications to implementing an SKMS.
  6. List some challenges presented by cloud implementation of SKMS data sources.
  7. List some advantages provided by cloud implementations.
  8. Discuss some service management applications that do not follow the SKMS pattern.

1ITIL Service Transition. London: TSO, 2011. 181–195.

2For the full list, see www.itil-officialsite.com/SoftwareScheme/EndorsedSoftwareTools/EndorsedSoftwareTools.asp. Accessed June 2014.

3These are architectural reasons for using a cloud. There are, of course, also business and administrative reasons, such as capitalization and amortization strategies.

4Tapping into the Domain Object Model (DOM) in the browser is usually more effective, but that is another subject entirely.

5Marvin Waschke. Cloud Standards. New York: Apress, 2012. Pages 226–232 discuss TCP/IP.

6Marvin Waschke. Cloud Standards. New York: Apress, 2012. Pages 245–259 are a summary of the HTTP.

7SOAP once stood for Simple Object Access Protocol. SOAP has evolved to be not simple, not confined to object access, and not exactly a protocol. Therefore, the World Wide Web Consortium (W3C) SOAP working group declared SOAP no longer an acronym. REST stands for Representational State Transfer. Both these are discussed in more detail in the next section, “Data Collection and Consolidation.”

8More on this in a moment, but a simple example of a model problem is a financial asset application that differentiates between servers and desktops and a configuration management application that treats both as computers. If the data collector is simply a pass-through, the data interpretation may have three entities (servers, desktops, and computers) that are hard to interpret.

9OData is a standard initially developed by Microsoft and now maintained by Organization for the Advancement of Structured Information Standards (OASIS). See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=odata . Accessed July 2014.

10SOAP once stood for Simple Object Access Protocol. As SOAP developed, it became neither simple nor primarily used for object access, so now SOAP officially is no longer an acronym. The protocol is now a World Wide Web Consortium (W3C) standard. See www.w3.org/TR/soap/. Accessed July 2014.

11See Marvin Waschke. Cloud Standards. New York: Apress, 2012, pp. 229–232, for some of the issues involved in connection-oriented and non-connection-oriented interaction.

12HTTP has drifted almost as far from its roots as SOAP. HTTP is now used in ways that do not much involve hypertext, passing everything from relational data to blocks of code.

13cURL is a command-line client for Uniform Resource Locators (URLs) written in C and available on many platforms. On the command line, it is spelled curl. cURL sends and receives raw HTTP from a command line, and it is all that is needed to use REST, although for most purposes, more elaborate software is employed. Because it is simple and offers a transparent window into activity on the wire, cURL is useful for testing and debugging. For further information, see http://curl.haxx.se/. Accessed September 2015.

14Chapters 11 and 13 also discuss state.

15Marvin Waschke. Cloud Standards. New York: Apress, 2012. Pages 174–185. This book describes SCSI and various forms of network storage.

16The Distributed Management Task Force has published a standard that addresses some aspects of this problem. See http://dmtf.org/standards/cmdbf. Accessed July 2014.

17See http://dmtf.org/standards/cim. Accessed July 2014.

18For more on Teiid, see http://teiid.jboss.org/about/. Accessed September 2015.

19In this case, backward compatibility means that an SKMS data collector built to work with the API of an early release of an application should work equally well with a later release without modification to the collector. In addition, the information in the SKMS display from a later release should be as valuable as the information from an earlier release. The latter requirement is often more difficult for an application builder to maintain than the former. The first requirement places constraints only on the form of the API. The latter requirement also places constraints on changes to the data structure of the application, which can seriously hold back the evolution of the application.

20See http://dmtf.org/standards/cmwg. Accessed July 2014.

21Statelessness and idempotency are discussed in Chapter 13. An idempotent message can be repeated with the same effect as issuing it only once. Web applications that request that you click only once when submitting a transaction are not idempotent. The danger is that the second click will cause the transaction to be repeated.
