Introduction to technical cloud computing
This chapter introduces the concept of technical computing, the value of cloud computing, and the types of cloud for enterprises.
This chapter includes the following sections:
1.1 What is Technical Computing
1.2 Why use clouds?
1.3 Types of clouds
1.1 What is Technical Computing
This section describes Technical Computing.
1.1.1 History
This section introduces the history of high-performance computing (HPC) and how Technical Computing became mainstream.
Traditional high-performance computing (HPC)
The IT industry has always tried to maintain a balance between business demands for service delivery and the cost of hardware and software assets. On one hand, business growth depends on information technology (IT) being able to provide accurate, timely, and reliable services. On the other hand, there are costs associated with running IT services. These concerns have led to the growth and development of HPC.
HPC has traditionally been the domain of powerful computers (called “supercomputers”) owned by governments and large multinationals. These systems processed data and provided meaningful information by working as single systems with multiple parallel processing units, and their limits were set by hardware and software processing capabilities. Because of the cost of such intensive hardware, usage was limited to a few nations and corporate entities.
The advent of the workflow-based processing model and virtualization, together with the high availability concepts of clustering and parallel processing, has enabled existing hardware to provide the performance of traditional supercomputers. New technologies such as graphics processing units (GPUs) have pushed the power of existing hardware to perform more complicated functions faster than previously possible. Virtualization and clustering have made it possible to provide a greater level of complexity and availability of IT services. Virtualization has also made it possible to share resources to reduce cost. There has been a move from a traditionally static IT model based on maximum load sizing to a leaner IT model based on workflow-based resource allocation through smart clusters. With the introduction of cloud technology, resources are increasingly requested on demand rather than provisioned against forecasted demand, which optimizes cost considerations further.
These technological innovations have made it possible to push the performance limits of existing IT resources to provide high-performance output. Computing results that once required supercomputers can now be achieved with much cheaper hardware by using smart clusters and grids of shared hardware. With workflow-based resource allocation, high performance can be achieved from a set of relatively inexpensive hardware working together as a cluster. Performance can be enhanced further by breaking down silos of IT resources that would otherwise lie dormant, providing on-demand computing power wherever it is required. Data-intensive industries such as engineering and life sciences can now use the computing power on demand that workflow-based technology provides. By using parallel processing across heterogeneous resources that work as one unit under smart clusters, complex unstructured data can be processed to feed usable information into the system.
Mainstream Technical Computing
With the reduction in the cost of hardware resources, the demand for HPC has spread technical computing from scientific labs to mainstream commercial applications (Figure 1-1). Technical computing is now in demand in sectors such as aerodynamics, automobile design, engineering, financial services, and the oil and gas industry. Improvements in cooling technology and power management for these superfast computing grids have allowed users to extract more efficiency and performance from existing hardware.
The increased complexity of applications and the demand for faster analysis of data have led Technical Computing to become widely available. Thus, IBM Technical Computing is focused on helping clients transform their IT infrastructure to accelerate results. The goal of Technical Computing in mainstream industries is to meet the challenges of applications that require high-performance computing, faster access to data, and intelligent workload management.
Figure 1-1 Technical Computing goes mainstream
Defining clusters, grids, and clouds
The following provides a description of the terminology used in this book.
Cluster: Typically a group of linked computers that work together as a single system, whose primary aim is to provide improved performance and availability at a lower cost compared with a single computing system.
Grid: Typically a distributed system of homogeneous or heterogeneous computer resources for general parallel processing of related workflows, usually scheduled by using advanced management policies.
Cloud: A system (private or public) that allows on-demand self-service, such as resource creation on demand, dynamic sharing of resources, and elasticity of resource sizing, based on advanced workflow models.
IBM Platform Computing solutions have evolved from cluster to grid to cloud because of their ability to manage the complexities of heterogeneous, distributed computing resources. Figure 1-2 shows the evolution of clusters, grids, and HPC clouds.
Figure 1-2 Cluster, grid and High Performance Computing (HPC) cloud evolution
IBM Platform Computing provides solutions for mission-critical applications that require complex workload management across heterogeneous environments, for diverse industries from life sciences to engineering and the financial sector with its complex risk analysis. IBM Platform Computing has a 20-year history of working on highly complex solutions for some of the largest multinational companies, with proven examples of robust management of highly complex workflows across large distributed environments that deliver results.
1.1.2 Infrastructure
This section provides a brief overview of the components (hardware, software, storage) available to help deploy a technical computing cloud environment. The following sections provide a subset of the possible solutions.
Hardware (computational hardware)
IBM HPC and IBM Technical Computing provide flexibility in your choice of hardware and software:
IBM System x
IBM Power Systems
IBM General Parallel File System (GPFS)
Virtual infrastructure: OpenStack
Software
In addition to this list, IBM Platform Computing provides support to heterogeneous cluster environments with extra IBM or third-party software (Figure 1-3):
IBM Platform LSF®
IBM Platform Symphony®
IBM Platform Computing Management Advanced Edition (PCMAE)
IBM InfoSphere BigInsights
IBM GPFS
Bare Metal Provisioning through xCAT
Solaris Grid Engine
Open Source Apache Hadoop
Third-party schedulers
Figure 1-3 Overview of Technical Computing and analytics clouds solution architecture
Networking (high bandwidth, low latency)
IBM Cluster Manager tools help use the bandwidth of network devices to lower latency levels. The following are some of the supported devices:
IBM RackSwitch™ G8000, G8052, G8124, and G8264
Mellanox InfiniBand Switch System IS5030, SX6036, and SX6512
Cisco Catalyst 2960 and 3750 switches
Storage (parallel storage and file systems)
IBM Cluster Manager tools use storage devices that are capable of highly parallel I/O to help provide efficient I/O-related operations in the cloud environment. The following are some of the storage devices that are used:
IBM DCS3700
IBM System x GPFS Storage Server
1.1.3 Workloads
Technical computing workloads have the following characteristics:
Large number of systems
Heavy resource usage including I/O
Long running workloads
Dependent on parallel storage
Dependent on attached storage
High bandwidth, low latency networks
Compute intensive
Data intensive
The following sections describe a few technologies that support technical computing workloads.
Message Passing Interface (MPI)
HPC clusters frequently employ a distributed memory model to divide a computational problem into elements that can run simultaneously in parallel on the hosts of a cluster. This often requires the hosts to share progress information and partial results by using the cluster’s interconnect fabric, which is most commonly accomplished with a message passing mechanism. The most widely adopted standard for this type of message passing is the MPI standard, which is maintained by the MPI Forum (http://www.mpi-forum.org).
IBM Platform MPI is a high-performance, production-quality implementation of the MPI standard. It fully complies with the MPI-2.2 standard, and it provides enhancements over other implementations, such as low-latency, high-bandwidth point-to-point and collective communication routines.
For more information about IBM Platform MPI, see the IBM Platform MPI User’s Guide, SC27-4758-00.
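To make the model concrete, the following is a minimal sketch of MPI-style message passing using the open source mpi4py Python bindings (an assumption made for illustration; the same pattern applies to any MPI-2.2 compliant implementation, including IBM Platform MPI). Each process computes a partial result on its own slice of the problem, and a collective reduction combines the partial results on the root rank:

# Minimal MPI sketch: each rank computes a partial sum, and a
# collective reduce combines the results on rank 0.
# Assumes the open source mpi4py bindings are installed.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the communicator
size = comm.Get_size()   # total number of parallel processes

# Each rank sums a disjoint slice of the range 0..999.
partial = sum(range(rank, 1000, size))

# Combine the partial results across the interconnect fabric.
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"Sum computed by {size} processes: {total}")

A program like this is typically launched across the hosts of a cluster with a command such as mpirun -np 4 python partial_sum.py (the file name here is hypothetical).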
Service-oriented architecture (SOA)
SOA is a software architecture in which business logic is encapsulated and defined as services. These services can be used and reused by one or more systems that participate in the architecture. SOA implementations are generally platform-independent, which means that infrastructure considerations do not get in the way of deploying new systems or enhancing existing ones. Many financial institutions deploy a range of technologies, so the heterogeneous nature of SOA is particularly important.
IBM Platform Symphony combines a fast service-oriented application middleware component with a highly scalable grid management infrastructure. Its design delivers reliability and flexibility, while also ensuring low levels of latency and high throughput between all system components.
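As a simple illustration of the idea of encapsulating business logic as a reusable service (a hypothetical sketch, not the IBM Platform Symphony service API), the following Python fragment exposes a piece of business logic behind a small, platform-independent HTTP/JSON interface that any participating system can call:

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def price_quote(symbol: str, quantity: int) -> dict:
    # Encapsulated business logic: callers depend only on the service
    # contract, not on how or where the logic is implemented.
    # The pricing rule here is invented purely for illustration.
    return {"symbol": symbol, "quantity": quantity, "price": 42.0 * quantity}

class QuoteService(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length))
        body = json.dumps(price_quote(request["symbol"], request["quantity"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), QuoteService).serve_forever()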
MapReduce
MapReduce is a programming model for applications that process large volumes of data in parallel by dividing the work into a set of independent tasks across many systems. MapReduce programs generally transform lists of input data elements into lists of output data elements in two phases: map and reduce.
MapReduce is widely used in data-intensive computing, such as business analytics and life sciences. Within IBM Platform Symphony, the MapReduce framework supports data-intensive workload management by using a special implementation of service-oriented application middleware to manage MapReduce workloads.
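The canonical MapReduce example is word counting. The following single-process Python sketch shows the two phases conceptually (an illustration only; it does not use the IBM Platform Symphony MapReduce API, which distributes the same pattern across a grid):

from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # Map: transform each input element into (key, value) pairs.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reduce: combine all values that share the same key.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

documents = ["the quick brown fox", "the lazy dog", "the fox"]
print(dict(reduce_phase(map_phase(documents))))
# {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}

In a real deployment, the map tasks run in parallel across many hosts, the framework sorts and shuffles the intermediate pairs by key, and the reduce tasks aggregate each key’s values.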
Parallel workflows
A workflow is a task that is composed of a sequence of connected steps. In HPC clusters, many workflows run in parallel to complete a job or to respond to a batch of requests. As complexity increases, workflows become more complicated. Workflow automation is becoming increasingly important for the following reasons (a minimal scheduling sketch follows this list):
Jobs must run at the correct time and in the correct order
Mission critical processes have no tolerance for failure
There are inter-dependencies between steps across systems
Clients need an easy-to-use and cost-efficient way to develop and maintain workflows
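The following minimal Python sketch (a conceptual illustration, not the interface of any IBM workflow product) uses the standard library to order workflow steps so that every step runs only after the steps it depends on:

from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must finish before it runs.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "simulate": {"clean"},
    "render": {"clean"},
    "report": {"simulate", "render"},
}

# static_order() yields the steps in a dependency-respecting order.
for step in TopologicalSorter(workflow).static_order():
    print(f"running step: {step}")

Steps whose dependencies are all satisfied (“simulate” and “render” in this example) can be dispatched to different hosts in parallel; TopologicalSorter also provides get_ready() and done() methods for exactly this pattern.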
Visualization
Visualization is a typical workload in engineering for airplane and automobile designers. The designers create large computer-aided design (CAD) environments to run their 2D/3D graphic calculations and simulations for the products. These workloads demand a large hardware environment that includes graphic workstations, storage, and software tools. In addition to the hardware, the software licenses are also expensive. Thus, designers are looking to reduce costs, and expect to share the infrastructure between computer-aided engineering (CAE) and CAD.
1.2 Why use clouds?
Implementing a cloud infrastructure can be the ideal solution for companies that do not want to invest in a separate cluster infrastructure for technical computing workloads. It can reduce, among other things, extra hardware and software costs, and it avoids the burden of administering another cluster. Cloud also provides the benefit of requesting resources on demand and releasing them when the work is completed, which saves deployment time and, to an extent, expense. For technical computing, the hardware requirements are usually large, considering the workloads that must be managed. Although physical hardware traditionally performs better in HPC environments, evolving virtualization technologies have started to provide room for HPC solutions as well. Using a computing cloud for HPC environments can help eliminate static usage of the infrastructure, and it can provide a way to use hardware resources dynamically, as the computing requirements dictate.
1.2.1 Flexible infrastructure
Cloud computing provides the flexibility to use resources when they are required. In a technical computing cloud environment, cloud computing not only provides the flexibility to use resources on demand, but also helps provision the computing nodes that an application requires to manage its workload. By implementing and using IBM Platform Computing Manager (PCM), dynamic provisioning of the computing nodes with the desired operating systems is easily achieved. This dynamic provisioning solution helps to better use the hardware resources and to fulfill various technical computing requirements for managing workloads. Figure 1-4 shows the infrastructure of an HPC cloud.
Figure 1-4 Flexible infrastructure with cloud
1.2.2 Automation
Cloud computing can significantly reduce the manual effort of installation, provisioning, configuration, and similar tasks. When performed by hand, these computing resource management steps can take a significant amount of time. A cloud-computing environment can dramatically reduce system management complexity by implementing automation, business workflows, and resource abstractions.
IBM PCMAE provides many automation features to help reduce the complexity of managing a cloud-computing environment:
Rapid deployment of multiple heterogeneous HPC clusters in a shared hardware pool.
Self-service, which allows users to request a custom cluster, specifying size, type, and time frame.
Dynamically grow and shrink (flex up and down) the size of a deployed cluster based on workload demand, calendar, and sharing policies.
Share hardware across clusters by rapidly reprovisioning the resources to meet the infrastructure needs (for example, Windows and Linux, or a different version of Linux).
These automation features reduce the time that is required to make the resources available to clients.
1.2.3 Monitoring
In a cloud-computing environment, many computers, network devices, storage systems, and applications are running. To achieve high availability, throughput, and resource utilization, clouds include monitoring mechanisms. Monitoring measures service and resource usage, which is key for chargeback to users. System statistics are collected and reported to the cloud provider or user, and dashboards can be generated from these figures.
Monitoring provides the following benefits:
Avoids outages by checking the health of the cloud-computing environment
Improves resource usage to help lower costs
Identifies performance bottlenecks and optimizes workloads
Predicts usage trends
IBM SmartCloud Monitoring 7.1 is a bundle of established IBM Tivoli infrastructure management products, including IBM Tivoli Monitoring and IBM Tivoli Monitoring for Virtual Environments. The software delivers dynamic usage trending and health alerts for pooled hardware resources in the cloud infrastructure. The software includes sophisticated analytics, and capacity reporting and planning tools. You can use these tools to ensure that the cloud is handling workloads quickly and efficiently.
1.3 Types of clouds
There are three different cloud-computing architectures:
Private clouds
Public clouds
Hybrid clouds
A private cloud is an architecture in which the client encapsulates its IT capacities “as a service” over an intranet for its exclusive use. The cloud is owned by the client, and is managed and hosted by the client or a third party. The client defines the ways to access the cloud. The advantage is that the client controls the cloud, so security and privacy can be ensured. Also, the client can customize the cloud infrastructure based on its business needs. A private cloud can be cost-effective for a company that owns many computing resources.
A public cloud provides standardized services for public use over the Internet. Usually it is built on standard and open technologies, providing web pages, APIs, or SDKs for consumers to use the services. Benefits include standardization, capital preservation, flexibility, and improved time to deploy.
Clients can integrate a private cloud and a public cloud to deliver computing services, which is called hybrid cloud computing. Figure 1-5 highlights the differences and relationships of these three types of clouds.
Figure 1-5 Types of clouds
Why an IBM HPC cloud
IBM HPC clouds can help transform both your IT infrastructure and your business. Based on an HPC cloud’s potential impact, clients are actively evolving their infrastructure toward private clouds, and beginning to consider public and hybrid clouds. Clients are transforming their existing infrastructure to HPC clouds to enhance the responsiveness, flexibility, and cost effectiveness of their environment. This transformation helps clients take an integrated approach to improving computing resource capacity while preserving capital. Eventually, clients can access extra cloud capacity by using the cloud models described in Figure 1-5.
In a public cloud environment, HPC must overcome a number of significant challenges as shown in Table 1-1.
Table 1-1 Challenges of HPC in a public cloud
Security: Cloud providers do not provide guarantees for data protection; intellectual property (IP) is in-flight outside the firewall and on storage devices.
Application licenses: Legal agreements (LTUs) can limit licenses to geographic areas or corporate sites; unlimited licenses can be significantly more expensive.
Business advantage: Cloud resources are expensive compared to local resources if used incorrectly; building and automating business policy for using the cloud can be difficult.
Performance: If applications run poorly in a private cloud, they will not improve in public clouds.
Data movement: Data must be replicated in the cloud before jobs can run; providers charge for data in/out and storage.
With private clouds, HPC might not suffer from these public cloud barriers, but there are other common issues, as shown in Table 1-2.
Table 1-2 Issues that a private cloud can address for High Performance Computing (HPC)
Inefficiency: Less than fully used hardware; high labor cost to install, monitor, and manage HPC environments; constrained space, power, and cooling.
Lack of flexibility: Resource silos that are tied to a specific project, department, or location; dependency on specific individuals to run technical tasks.
Delayed time to value: Long provisioning times; limited ability to fulfill peak demand; constrained access to special-purpose devices (for example, GPUs).
Figure 1-6 shows the IBM HPC cloud reference model.
Figure 1-6 IBM HPC cloud
The HPC private cloud has three hosting models: Private cloud, managed private cloud, and hosted private cloud. Table 1-3 describes the characteristics of these models.
Table 1-3 Private cloud models
Private cloud: Client self-hosted and managed.
Managed private cloud: Client self-hosted, but managed by a third party.
Hosted private cloud: Hosted and managed by a third party.