DevOps perspective
This chapter reviews how Walmart uses cloud services to address the challenge of variable speed IT.
Allowing rapidly evolving service-consuming applications to move from development to production, while also evolving those services in response to consumer feedback, required the platform engineers to take on the responsibility for release management.
This process, also known as DevOps, sits at the crossroads between development, product management, quality assurance, and other engineering efforts, including automation.
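This chapter includes the following topics:
7.1, “The challenge of variable speed IT”
7.2, “The role of DevOps”
7.3, “Cloud services”
7.4, “Promotion to production”
7.5, “Process and governance”
7.6, “Service versioning”
7.7, “Summary”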
7.1 The challenge of variable speed IT
Walmart is very much a technology company. In fact, Walmart uses many innovative technologies in some capacity. Walmart’s rich history of technology investment continues unabated through today and provides a good basis for a discussion about the concept of variable speed IT.
If you consider the requirements that are associated with current trends, such as e-commerce and mobile, you can see a recognizable example of variable speed IT. These requirements are highly variable and dynamic in functionality and demand. They also interact with core systems that support brick and mortar traditional systems, such as inventory and replenishment.
Figure 7-1 shows the different perspectives of the two worlds of new applications and traditional applications and some broadly generalized categorizations.
Figure 7-1 Variable speed IT: Two worlds?
The two worlds of variable speed IT exhibit the following contrasting characteristics:
New applications versus traditional applications.
Systems of engagement (directly accessible and tied to retail customer experience) versus systems of record (core applications, master data management, back office, human resources, and so on).
New applications that tend to be more modularized versus traditional applications that tend to be more monolithic and intertwined with other core systems.
The primary goal of a new application is to enable experimentation versus the goals of sustaining functionality and managing risk, which are predominant in a traditional application.
Work management in a new application tends to be accomplished with agile methodologies versus waterfall methodologies in a traditional application.
Although these characteristics provide a nice and neat delineation between new applications and traditional applications, this delineation does not necessarily reflect the real world in the experience of the authors of this book. In reality, various enterprise systems fall somewhere on a spectrum between application or system types.
For this reason, the term variable speed IT is preferred over bimodal or two-speed IT. It is more accurate, and it describes the approach that Walmart took to tackle these challenges from an IBM z/OS perspective. However, maintaining a simple, two-world view such as this one is beneficial for our purposes.
The challenge is dealing with the disparity in speed when involvement from both types of approaches is needed to achieve an objective. In a theoretical example, systems of engagement developers who are responding to a market demand might need to quickly add or enhance a mobile application feature that depends on certain systems of record components.
Aversion to operational risk, complexity, or highly stringent governance around the traditional system of record components hinders the ability to move with the necessary speed. This friction is not good for the business, and it is all the more troublesome because each of these IT speeds is valid and appropriate for what it represents. So, what can you do? Embracing DevOps provides a solution.
7.2 The role of DevOps
DevOps is not a technology challenge. Although technology plays a part, DevOps is largely about people, culture, and process. Walmart does not endorse any specific implementation or particular tool, but offers some points to consider based on their experiences.
Shifting to an organizational perspective, it is worth noting that the two speeds of IT align naturally with the development and operations teams. Figure 7-2 on page 70 shows the priorities that are associated with each team.
Figure 7-2 DevOps implications
The development and operations teams are concerned with a particular set of attributes that are relevant to their respective areas of responsibility. Consider the following points:
The development team must progress and evolve applications, while the operations team needs to ensure that capabilities are maintained.
Development must be agile and quick, while operations must ensure that reliability, availability, and serviceability (RAS) is not compromised.
Development must provide new features, while operations must deliver scalability and stability.
Development must ensure quality in new business services, while operations must ensure quality in capabilities.
Both teams are focused on the same attributes. They might approach these attributes differently, but they both care about quality of service. In fact, both teams (and all other siloed factions within the organization) ultimately care about the same goal: to provide high-quality, reliable, value-adding technology solutions that support and enhance their business. However, limited visibility and narrow specialization leave each group disconnected from the tasks of the others, which can affect end-to-end quality. To improve this situation, these groups must be brought together to converge around their shared goals and perspectives.
The DevOps idea focuses on a philosophy that is built around enhancing communication and collaboration between the development and operations areas. This philosophy is supported by automation and tools that enable tracking and feedback. The result is the regular delivery of high-quality, reliable, scalable technology solutions that enhance the business.
This is also where the z/OS cloud services team at Walmart focuses and where contributions become relevant. How do you support rapid development to meet new business objectives (while preserving the sanctity of core systems) given that many of the intellectual property assets within those systems remain highly valuable to existing business? How do you engage the personnel responsible for these systems when they are often exclusively operationally focused? Walmart chose to develop cloud services.
7.3 Cloud services
The approach at Walmart is to establish a layer that acts as a differential to accommodate the different speeds of new development versus the needs of traditional systems. As shown in Figure 7-3, this layer consists of the following primary components:
Figure 7-3 Supporting isolation, abstraction, and services with feedback, culture, integration, and automation
Isolation
Walmart used the advanced virtualization capabilities of the z/OS platform to establish hosting environments for new workloads that are isolated from traditional workloads, while maintaining proximity to valuable data assets in core systems.
These isolation zones consist of separate sysplexes and hosting environments (such as dedicated region clusters) within sysplexes that are hosting traditional workloads. Virtualized isolation gives the operational staff the confidence that core systems are protected. At the same time, virtualized isolation provides locally optimized access to the related assets.
Abstraction
Walmart used various mechanisms to abstract the complexities of the core systems and the platform from the developer community to address skills gaps and promote integration.
The primary mechanism is a RESTful API. Generally, developers do not need to know that the services are hosted on a mainframe. This approach enabled Walmart to move services from one data center to another with no effect on users. This configuration provides the freedom for consumers (the developers) and service providers to change their implementations as needed.
Services
Walmart established a suite of utility services that was designed and developed following the cloud computing service delivery model, as defined by National Institute of Standards and Technology (NIST) publication SP 800-145 (see footnote 1), including on-demand self-service and rapid elasticity.
All services are provisioned and made available by the use of RESTful APIs, which provide accessibility from any application platform with abstraction. Providing capabilities in this manner also promotes more loosely coupled application architectures that result in increased agility.
The approach of using z/OS and CICS to deliver the cloud services solution provides developers with quick, easy access to an array of services that give them more autonomy when interacting with core systems. Previous sections in this book described caching, object stores, and ID generators, but Walmart developed more services, including data access services and messaging. Delivering these services through the fully automated provisioning of standardized components reduces risk while maintaining a level of operational control and confidence.
Some of the services provide lightweight entry points for data assets and can be used in place of heavier, more resource-intensive calls to the true back-end systems of record. Walmart noted a reduction in net CPU consumption by as much as 70% for some applications. Consuming these resources more efficiently reduces operational costs, makes environments more stable, and frees up resources to satisfy other business needs.
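The idea of a lightweight entry point that stands in front of a heavier system of record call can be illustrated with a minimal read-through cache sketch. This sketch is not Walmart's implementation; the class name, the stand-in lookup function, and the record format are all hypothetical and exist only to show why repeated reads no longer reach the back end.

```python
from typing import Callable, Dict


class LightweightLookup:
    """Hypothetical read-through cache in front of a system of record."""

    def __init__(self, fetch_from_record: Callable[[str], str]) -> None:
        self._fetch = fetch_from_record
        self._cache: Dict[str, str] = {}
        self.backend_calls = 0  # counts how often the heavy call is made

    def get(self, key: str) -> str:
        # Serve repeated reads from the cache without touching the back end.
        if key in self._cache:
            return self._cache[key]
        # First read of a key: make the expensive call once and remember it.
        self.backend_calls += 1
        value = self._fetch(key)
        self._cache[key] = value
        return value


def slow_inventory_lookup(item_id: str) -> str:
    # Stand-in (assumption) for a resource-intensive system of record call.
    return f"inventory-record-for-{item_id}"


lookup = LightweightLookup(slow_inventory_lookup)
first = lookup.get("item-42")
second = lookup.get("item-42")  # answered from the cache
```

In this sketch, the second read is answered without a second back-end call, which is the kind of saving that the CPU reduction described above depends on. A real service would also need expiry and invalidation, which are omitted here.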
As described in 7.2, “The role of DevOps” on page 69, DevOps is not only about technology; it also is about people, culture, and process, which can be addressed by the following factors:
Feedback
Feedback is important in various ways. The cloud services team spent many hours engaging directly with application developers to get their input and feedback about services and features.
The application developers are guiding the development efforts of the cloud services team. The team also works closely with the systems administrators to ensure that their needs and concerns are addressed.
The cloud services team, and some of the operations groups, developed dashboard services to provide immediate feedback about system and service activities. Communication and information were critical.
Integration
You must understand how services are to be consumed. Application developers might not be familiar with the mainframe, so choose open standards that are familiar to a wide audience. For more information, see Chapter 2, “The service consumer” on page 7.
Culture (or attitude)
Culture was likely the most challenging aspect of the work to develop cloud services, particularly in the operational groups, largely because of the concerns that were described in this chapter regarding the role of DevOps, such as maintaining the integrity of systems. Despite the challenges, Walmart made, and continues to make, progress.
Automation
Automation is vital. Self-service provisioning must work at the speed of IT. For more information, see Chapter 3, “The service provider” on page 13.
By providing on-demand, self-service access to services, the cloud services team allows developers to move faster. New applications have been deployed to production in as little as one week after acquiring resources in the development and quality assurance (QA) environments. This time frame is expected for all new development.
The ability to move fast enables companies to become agile. By using automation, the time and effort that are invested in acquiring a service instance are nearly eliminated, which leads to a degree of freedom that promotes experimentation and rapid, iterative development. Some of the services were used in ways that were never considered during design and creation.
The cloud services team at Walmart is small (only a few people) but with much talent and a good mix of skills. Through this approach and with the correct attitude, they made a substantial impact with relatively little investment. This benefit is realized downstream and through removal of waste from processes.
7.4 Promotion to production
As described in Chapter 2, “The service consumer” on page 7, application developers at Walmart use a web portal to provision an instance of a cloud service for development. At the same time, the automation behind the portal also provisions an instance of the service in a QA environment. This approach enables the application to be submitted for testing without any further interaction. When the developers who requested the service are ready to go live, they approach the cloud services team with an approved change control record.
The resources that a service instance consumes in QA, such as CPU and storage, are used as a model to provision a service instance in production quickly. The process is automated, but currently requires a member of the cloud services team to start it. The long-term plan is for the provisioning process to automatically forward the usage estimates that are provided by the developer (for example, transactions per second) to the capacity planning team.
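As a rough illustration of using QA consumption as a model for production, the following sketch scales measured QA figures by the transaction rate that the developer estimates. The field names, the headroom factor, and the scaling rule are assumptions made for this example; the actual sizing logic at Walmart is not described in this book.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class QaUsage:
    """Hypothetical measurements taken from a QA service instance."""
    cpu_ms_per_txn: float  # CPU milliseconds consumed per transaction
    storage_mb: float      # storage that the instance occupies


@dataclass(frozen=True)
class ProductionSizing:
    cpu_ms_per_second: float
    storage_mb: int


def size_production(qa: QaUsage, expected_tps: float,
                    headroom: float = 1.5) -> ProductionSizing:
    """Estimate production resources from QA usage (illustrative only)."""
    if expected_tps < 0:
        raise ValueError("expected_tps must not be negative")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    # Assumption: CPU scales linearly with the estimated transaction rate.
    cpu = qa.cpu_ms_per_txn * expected_tps * headroom
    # Assumption: storage is sized from QA usage plus headroom only.
    storage = math.ceil(qa.storage_mb * headroom)
    return ProductionSizing(cpu_ms_per_second=cpu, storage_mb=storage)


sizing = size_production(QaUsage(cpu_ms_per_txn=2.0, storage_mb=100.0),
                         expected_tps=50)
```

The same estimate could be forwarded to a capacity planning team, as the long-term plan describes, instead of being acted on directly.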
After the service instance is available in production, the developer can promote the application. To access this production version, all a developer must do is change the host and port for the service. As described in Chapter 3, “The service provider” on page 13, the rest of the Uniform Resource Identifier (URI) stays the same.
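Because only the host and port differ between environments, an application can keep the service path constant and select the endpoint from configuration. The following sketch shows one way to do so. The host names, ports, scheme, and path are hypothetical placeholders, not Walmart's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


# Hypothetical endpoints: only these values change at promotion time.
ENDPOINTS = {
    "qa": Endpoint("qa.services.example.com", 8080),
    "production": Endpoint("prod.services.example.com", 8443),
}

# Hypothetical scheme and path: both stay the same in every environment.
SCHEME = "https"
SERVICE_PATH = "/root/type/org/tenant/resources"


def service_uri(environment: str, path: str = SERVICE_PATH) -> str:
    """Build the service URI for the named environment."""
    endpoint = ENDPOINTS[environment]
    return f"{SCHEME}://{endpoint.host}:{endpoint.port}{path}"


qa_uri = service_uri("qa")
production_uri = service_uri("production")
```

With this arrangement, promoting the application means changing the environment name that is passed to service_uri (or the configuration behind it), with no change to the code that calls the service.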
The development and QA instances of cloud services are free in Walmart’s environment, a practice that is in line with most public cloud providers with which the on-premises Walmart services compete. This practice, along with the speed of provisioning, removes a critical barrier to adoption by developers.
At Walmart, all development service instances are hosted in the same pool of CICS regions. There is no security (authentication or SSL) in this environment because configuring security can be another potential barrier to adoption by developers. Full security is available as an option in QA for testing a configuration that is then carried forward into production. Although this configuration puts much responsibility on the developers, it enables them to move faster. In production, applications might be allocated to their own pool of CICS regions or in close proximity to their data source for improved performance. For more information, see Chapter 6, “Operational considerations” on page 59.
7.5 Process and governance
Before requesting a cloud service in production, there are some governance steps a developer must follow. At Walmart, developers must obtain approval from a senior developer and their management. They must also demonstrate that funds are available by providing a valid project ID. The process today uses change control procedures and is not fully automated. However, the provisioning process takes only seconds.
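The governance steps can be thought of as a checklist that must be complete before production provisioning starts. The following sketch models that checklist. The record fields and the function name are hypothetical, and the project ID check only tests that a value is present because the real validation rules are not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceRequest:
    """Hypothetical record of a request for a production service instance."""
    service: str
    senior_developer_approved: bool
    management_approved: bool
    project_id: Optional[str]
    change_record: Optional[str]


def missing_governance_steps(request: ServiceRequest) -> List[str]:
    """Return the governance steps that are still outstanding."""
    missing = []
    if not request.senior_developer_approved:
        missing.append("senior developer approval")
    if not request.management_approved:
        missing.append("management approval")
    # Assumption: a real check would confirm funding against the project ID.
    if not request.project_id:
        missing.append("valid project ID")
    if not request.change_record:
        missing.append("approved change control record")
    return missing


request = ServiceRequest(
    service="cache",
    senior_developer_approved=True,
    management_approved=False,
    project_id="PRJ-0001",   # hypothetical identifier format
    change_record=None,
)
outstanding = missing_governance_steps(request)
```

An empty list means that the request can proceed to provisioning. A workflow like the one that Walmart plans could run such a check automatically before it generates the change control record.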
Walmart plans to streamline the process by generating change control records by using an auto-progressing workflow. This workflow will also notify resource owners (for example, storage and CPU owners) of the requirements for each new service instance.
 
Note: IBM UrbanCode™ Deploy provides automation for the process and governance of moving application changes from development through QA to production, including any required approvals by team members. CICS provides a plug-in for deploying all styles of application. For more information, see CICS and DevOps: What You Need to Know, SG24-8339.
7.6 Service versioning
With the two-week sprints that are used at Walmart, it is necessary to deliver capability in small, reliable pieces, which means regularly updating each service.
Changes are first pushed to development and then QA before production. All code is under version control, with access limited to the cloud services team. Today, all service instances, and hence all users, receive the update at the same time. To date, these changes have been incremental and backward compatible. In the future, it might be necessary to make a significant, non-backward-compatible change to a service interface, such as a new parameter or data format, which requires developers to change their applications. So, how can changes be made without disrupting developers and users of the applications they provide?
The solution that is being considered at Walmart is semantic versioning. This solution includes version, release, and patch identification, which relates to the implementation of the provided service. For more information, see this Semantic Versioning technical white paper:
At a minimum, the version and release levels will be included in new URIs going forward, as shown in the following example:
(Host:port)/root/type/version/org/tenant/resources
This approach would allow the developer to choose when to upgrade. It would also allow the cloud services team to introduce breaking changes. The ability to independently deploy services and the applications that use them is the most important principle of microservices architectures.
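The following sketch shows how version, release, and patch levels might be parsed and how the version and release might be placed in the URI. The segment format (v1.4) and the function names are assumptions for illustration; the source states only that the version and release levels will appear in the version segment. The breaking-change test follows the semantic versioning convention that a change in the first number signals an incompatible interface.

```python
from typing import NamedTuple


class Version(NamedTuple):
    version: int  # incremented for incompatible (breaking) changes
    release: int  # incremented for compatible new function
    patch: int    # incremented for compatible fixes


def parse_version(text: str) -> Version:
    """Parse a 'version.release.patch' string such as '1.4.2'."""
    parts = text.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a version.release.patch string: {text!r}")
    return Version(*(int(part) for part in parts))


def is_breaking(current: Version, candidate: Version) -> bool:
    """Treat a different version level as an incompatible change."""
    return candidate.version != current.version


def versioned_path(root: str, service_type: str, level: Version,
                   org: str, tenant: str, resources: str) -> str:
    """Build the path part of the URI with version and release included."""
    # Assumption: the version segment is written as v<version>.<release>.
    segment = f"v{level.version}.{level.release}"
    return f"/{root}/{service_type}/{segment}/{org}/{tenant}/{resources}"


current = parse_version("1.4.2")
path = versioned_path("root", "type", current, "org", "tenant", "resources")
```

Because the patch level is left out of the path, compatible fixes reach all consumers of a given version and release, while a breaking change is published under a new path that developers adopt when they are ready.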
 
Note: Walmart is evaluating CICS application multi-versioning. This capability was introduced in CICS TS V5.2 and allows several different versions of an application or service to be installed onto the same set of CICS regions without changes to load module names. Users can access different versions of a service by using URI naming conventions, as described in this section. This use allows the introduction of breaking changes without disruption to developers or the use of more infrastructure. For more information, see Cloud Enabling IBM CICS, SG24-8114.
7.7 Summary
It might be said that Walmart adopted a DevOps-style approach to address the variable speed IT challenges. At a minimum, Walmart tried to address many of the same components that are regularly highlighted in the DevOps vernacular. Ultimately, Walmart is not too interested in the label. They are interested in the results, and they achieved some positive results by using this approach.
 

1 For more information about the NIST Definition of Cloud Computing, see this publication: http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf