IBM Spectrum Scale Erasure Code Edition use cases
Many use cases are well-suited for IBM Spectrum Scale Erasure Code Edition (ECE) storage. In this chapter, we focus on several specific examples that best take advantage of the unique features of ECE.
The use cases that are presented in this chapter are not an exhaustive list of the applications of ECE. In general, any workload that is suited for IBM Spectrum Scale works well with ECE storage if the ECE servers are configured with the appropriate combination of storage, compute, and network resources.
Contact IBM Support for more information about how to configure ECE for your storage use cases.
This chapter includes the following topics:
2.1 High-performance tier for ML/DL and analytics
Machine learning (ML), deep learning (DL), and analytics applications that run across multiple processors and GPUs demand high I/O performance. ECE’s ability to provide a reliable and efficient storage system from NVMe and SAS SSD devices in storage-rich servers makes it an excellent choice to meet these needs.
By combining fast storage with low-cost storage and by using IBM Spectrum Scale tiering or AFM, data can automatically be moved to this high-speed tier as required. This combination can provide cost-effective capacity with high-speed file access, all in the same global namespace.
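For example, policy-driven tiering between a fast pool and a capacity pool can be expressed with a few rules in the IBM Spectrum Scale policy language. The following sketch is only an illustration; the pool names system (NVMe or SSD) and capacity (HDD) are placeholders for the pools in your file system:

   /* Place newly created files on the fast NVMe/SSD pool (pool names are examples). */
   RULE 'place-hot' SET POOL 'system'

   /* When the fast pool reaches 80% full, migrate the least recently accessed
      files to the capacity pool until utilization drops to 60%. */
   RULE 'migrate-cold' MIGRATE FROM POOL 'system'
       THRESHOLD(80,60)
       WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
       TO POOL 'capacity'

Such a policy is installed with the mmchpolicy command; threshold-driven migrations are then carried out by mmapplypolicy, either on a schedule or through a low-space callback.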
To take advantage of low-latency storage, such as NVMe, a low-latency network is critical. InfiniBand and RDMA are best suited for a high-speed low-latency connection by eliminating overhead that is involved in TCP connections. If an InfiniBand network is not possible, Ethernet can be used. With Ethernet, a dedicated high-speed network with minimal switch hops and no competing traffic provides the best performance.
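As an illustration, RDMA for the IBM Spectrum Scale daemon is enabled with configuration commands such as the following; the adapter port names (mlx5_0/1 and mlx5_1/1) are placeholders for the InfiniBand ports that are installed in your servers:

   mmchconfig verbsRdma=enable
   mmchconfig verbsPorts="mlx5_0/1 mlx5_1/1"

These settings generally take effect after the IBM Spectrum Scale daemon is restarted on the affected nodes.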
A typical high-performance solution consists of one or more ECE building blocks, each containing servers with multiple NVMe drives. A building block must have at least 12 devices and can contain up to 512 devices. Because of the parallel operation of IBM Spectrum Scale, multiple building blocks can be used to scale out capacity and performance, with data striped across the building blocks.
High-capacity, low-cost HDD drives can be used to extend file system capacity with a tier for cold data, which can be provided by ECE, or by another subsystem, such as a high capacity model of ESS. When deploying a mixed-storage system, two options are available: HDD and NVMe/SSD storage can be mixed in the same ECE building block, or HDD and NVMe/SSD can be contained in separate ECE building blocks.
When configuring building blocks, it is important to remember that all the nodes within a single building block must contain identical hardware. Splitting HDD and NVMe devices into separate building blocks can include a greater up-front hardware cost. However, it provides more flexibility for future growth because the two storage tiers can be grown independently of each other.
2.2 High-performance file serving with CES protocol nodes
We typically recommend the use of the native IBM Spectrum Scale Network Shared Disk (NSD) protocol whenever possible, which means installing the IBM Spectrum Scale native client on your application nodes. However, certain scenarios exist in which this configuration is not practical. For these situations, IBM Spectrum Scale provides the Cluster Export Services (CES) functionality to support data access to users outside of the cluster by using industry standard protocols.
The CES nodes work as highly available gateways and provide multiple front-end protocols, always using the NSD protocol on the back end. CES protocol nodes allow clients to access an IBM Spectrum Scale file system by using Network File System (NFS), Server Message Block (SMB), and OpenStack Swift as front-end protocols.
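For example (the node names, IP addresses, and export paths shown here are placeholders), CES nodes are designated and the NFS and SMB services are enabled with commands such as the following:

   mmchnode --ces-enable -N ces1,ces2,ces3
   mmces service enable NFS
   mmces service enable SMB
   mmces address add --ces-ip 192.0.2.10,192.0.2.11,192.0.2.12
   mmnfs export add /gpfs/fs1/projects --client "*(Access_Type=RW)"
   mmsmb export add projects /gpfs/fs1/projects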
For more information about the CES protocol nodes, see IBM Knowledge Center.
Consider the following points as guidelines for your solution:
Separate the front-end protocol traffic from the back-end NSD traffic by using different network interfaces (see the configuration sketch after this list).
The backend network must provide at least as much bandwidth as the front end network.
Although running CES within an ECE node is supported by using the RPQ process, we recommend dedicated CES protocol nodes for high-performance workloads.
When CES nodes are separated from ECE, you can connect CES nodes to multiple ECE nodes that belong to different building blocks or Recovery Groups (RGs).
ECE nodes within a recovery group must be configured alike. If you run CES on the ECE nodes, you must run it on all of them. Because SMB limits the number of protocol nodes to 16, the use of SMB on ECE nodes limits the size of the recovery group to 16 nodes.
Clients that use different protocols (NSD, NFS, SMB, Swift Object, HDFS) can all share access to the same data, subject to some limitations because of the differences between the protocols.
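The following sketch shows one way to implement the network separation from the first guideline: the subnets attribute steers the daemon (NSD) traffic onto a dedicated back-end subnet, while the CES addresses remain on the front-end network. The subnet, cluster name, and IP address are examples only:

   # Route intra-cluster NSD traffic over the dedicated back-end network
   mmchconfig subnets="10.10.0.0/ece-cluster.example.com"
   # CES addresses, and therefore NFS/SMB client traffic, stay on the front-end network
   mmces address add --ces-ip 192.0.2.20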
 
Note: Hadoop HDFS protocol also is supported natively in IBM Spectrum Scale, which allows clusters of Hadoop nodes to access data that is stored in an IBM Spectrum Scale filesystem. This configuration does not rely on the use of CES protocol nodes.
Figure 2-1 shows an example setup with 16 ECE nodes that form the backend cluster.
Figure 2-1 ECE nodes and separate CES nodes
The configuration with 16 ECE nodes and three CES nodes that is shown in Figure 2-1 is only an example. Up to 32 ECE nodes can be in each recovery group, and multiple recovery groups can be used. Up to 16 CES nodes can be used if SMB is used, or up to 32 if SMB is not used.
 
The back-end network (green in Figure 2-1 on page 15) must provide at least as much bandwidth as the front-end network (orange in Figure 2-1 on page 15). On the CES nodes, the front-end and back-end networks must be on separate network ports.
Although we recommend separating CES and ECE functions onto different servers when possible, we also support running them on the same servers. Figure 2-2 shows an example of CES and ECE functions running on the same nodes.
Figure 2-2 ECE nodes and converged CES nodes
Because all nodes in an ECE recovery group must be configured alike, all of them must run the CES function in the converged configuration. When SMB protocol is used, the number of nodes is limited to 16.
For performance reasons, it is advantageous to separate the front-end and back-end networks, as shown in Figure 2-2.
For more information about protocol nodes, see IBM Knowledge Center.
2.3 High-capacity data storage
ECE can scale out to multiple petabytes of usable storage that is presented as a single unified namespace. This scale out is possible by using a large number of high-density storage nodes with high capacity hard drives that are configured in multiple building blocks.
When planning a high-capacity system, it might be necessary to divide the system into multiple ECE building blocks. Each building block consists of 4 - 32 nodes, and up to 512 drives. As of this writing, a limit of 24 internal drives per node is imposed.
VDisks and declustered arrays span the nodes of the building block and provide failure protection for the associated recovery group. For example, an 8+3P vdisk in a 16-node recovery group can tolerate failure of three of the 16 nodes. If the cluster consists of more than one of these recovery groups, each of them can independently tolerate failure of three of its 16 nodes.
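A minimal sketch of creating such a vdisk set with the mmvdisk command follows. The recovery group, vdisk set, and file system names are illustrative, and parameters such as block size and set size depend on the workload and available capacity:

   mmvdisk vdiskset define --vdisk-set vs01 --recovery-group rg1 --code 8+3p --block-size 8m --set-size 80%
   mmvdisk vdiskset create --vdisk-set vs01
   mmvdisk filesystem create --file-system fs1 --vdisk-set vs01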
By distributing the nodes of the recovery groups across racks or other high-level hardware failure domains, you can take advantage of this property to implement rack-level or higher fault tolerance. When planning such configurations, request assistance from IBM to ensure that the failure domains are understood and configured.
In a high-capacity deployment, it is recommended to include a smaller high-speed tier of storage for Spectrum Scale file system metadata. A metadata tier improves the performance of file system scans, file creates and deletes, directory listings, and other metadata-intensive workloads. Lower-cost storage can then be used for most of the user data.
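For example (all names here are placeholders), a small replicated vdisk set on an NVMe-based recovery group can be dedicated to metadata in the system pool, while a large erasure-coded vdisk set on an HDD-based recovery group holds user data in a capacity pool:

   mmvdisk vdiskset define --vdisk-set vsMeta --recovery-group rgNVMe --code 4WayReplication --block-size 1m --set-size 5% --nsd-usage metadataOnly --storage-pool system
   mmvdisk vdiskset define --vdisk-set vsData --recovery-group rgHDD --code 8+3p --block-size 16m --set-size 90% --nsd-usage dataOnly --storage-pool capacity
   mmvdisk filesystem create --file-system fs1 --vdisk-set vsMeta,vsData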
It is also possible to use tape or cloud storage as an even less expensive tier for archival data. The IBM Spectrum Scale Information Lifecycle Management (ILM) function automatically manages these tiers, moving data between them according to changeable policies.
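The following policy fragment sketches a migration to an external pool, which is how ILM addresses tape or cloud tiers. The pool name, interface script path, and 180-day threshold are placeholders for whichever external storage product and criteria are in use:

   RULE EXTERNAL POOL 'archive' EXEC '/var/mmfs/etc/mmpolicyExec-archive' OPTS '-v'
   RULE 'archive-cold' MIGRATE FROM POOL 'capacity'
       TO POOL 'archive'
       WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 180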
High-capacity ECE systems can be easily expanded without interruption to user applications. Expansion can be done by adding nodes to recovery groups, or by adding recovery groups to the system. The nodes within each recovery group must have identical hardware configurations; however, no such restriction exists between recovery groups. Therefore, newly added recovery groups can take advantage of capacity and speed improvements as new technologies become available.
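For example, after a new server joins the cluster, it can be added to an existing recovery group with mmvdisk commands along the following lines (the node and recovery group names are examples); ECE then rebalances data onto the new node in the background:

   mmvdisk server configure -N ece-node17 --recycle one
   mmvdisk recoverygroup add --recovery-group rg1 -N ece-node17
   mmvdisk recoverygroup add --recovery-group rg1 --complete-node-add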
Spectrum Scale storage pools can be used to organize and move data seamlessly between different types of hardware and storage tiers, which gives users flexibility in managing the storage while keeping everything together in a single file system namespace.
 
 
 
 
 