IBM Platform High Performance Computing
In this chapter, we introduce and describe the IBM Platform High Performance Computing (HPC) product offering. IBM Platform HPC is a complete management product that includes elements of the IBM Platform cluster and workload management capabilities integrated into an easy-to-install, easy-to-use offering.
IBM Platform HPC facilitates quick, efficient implementation of clusters typically used for traditional HPC applications or other applications requiring a managed set of similar computers connected by a private network. Rather than have to install multiple packages and integrate them, IBM Platform HPC provides a single installer for creating a cluster, and a “kit” packaging concept that simplifies the addition of new functions or resources. It also provides a single unified web portal through which both administrators and users can access and manage the resources in the cluster.
The following topics are discussed in this chapter:
- 6.1, “Overview”
- 6.2, “Implementation”
- 6.3, “References”
6.1 Overview
Clusters based on open source software and the Linux operating system dominate high performance computing (HPC). This is due in part to their cost-effectiveness, their flexibility, and the rich set of open source applications available. The same factors that make open source software the choice of HPC professionals also make it less accessible to smaller centers. The complexity and associated cost of deploying open source clusters threaten to erode the very cost benefits that made them compelling in the first place. It is not unusual for a modest-sized cluster to be managed by someone who is primarily a user of that cluster rather than a dedicated IT professional, and whose time is better spent on their primary responsibilities than on cluster administration.
IBM Platform HPC enables clients to sidestep many of the overhead, cost, and support issues that often plague open source environments, and to deploy powerful, easy-to-use clusters without having to integrate a set of components from different sources. It provides an integrated environment with a single point of support from IBM. Figure 6-1 shows the relationship of the components of IBM Platform HPC.
Figure 6-1 IBM Platform HPC components
6.1.1 Unified web portal
The most frequently used functions of IBM Platform HPC are available through a unified web-based administrator and user portal. This “single pane of glass” approach gives a common view of system resources and tools, rather than requiring the use of multiple interfaces for different functions. The web portal is based on the common Platform Management Console elements that are used in other IBM Platform interfaces, but is tailored specifically to small to moderate-sized HPC clusters. The interface includes an integrated help facility that provides a rich, hypertext set of documentation on the configuration and use of the product.
We advise reviewing the online help before using and configuring the product. It is available through the web portal GUI on the master host immediately after installation, by directing your web browser to the address of the master host. Figure 6-2 shows the administrator’s initial view of the GUI with the help panel drop-down menu highlighted.
The web portal includes vertical tabs on the left for sections that relate to jobs (work submitted to the cluster), resources (elements of the cluster), and settings (relating to the web portal itself). A user without administrator privileges sees only the Jobs tab.
 
Plug-ins: The web portal uses Adobe Flash and Java Runtime browser plug-ins to render graphs and control job submission. For proper operation of the Portal on a Linux x86_64 machine, we used Flash 11.2 and either IBM or Oracle Java 7 Update 5 plug-ins with the Firefox 10.0.5 browser. If you are installing Flash on a Microsoft Windows system, read the installation dialog carefully or you might install unwanted additional software that is not required for IBM Platform HPC.
Figure 6-2 Administrator’s initial view of the GUI
6.1.2 Cluster provisioning
There are a number of software elements that must be installed and managed to successfully operate a cluster. These elements include the Linux operating system, drivers and software for an InfiniBand or other high-speed network fabric, message passing libraries, and the applications that are used on the cluster. It is essential that each of these elements is installed in a consistent manner on every system in the cluster, and that these configurations can be easily reproduced in the event of a hardware failure or the addition of more hardware to the cluster. You might also need to support different versions of these elements to support different applications or different users.
This cluster provisioning function is provided by the elements of IBM Platform Cluster Manager included in the IBM Platform HPC product. As with other IBM Platform Computing provisioning tools, physical machines (hosts) are provisioned via network boot (Dynamic Host Configuration Protocol (DHCP)) and image transfer (TFTP/HTTP).
 
Important: IBM Platform HPC does not support provisioning of virtual machines.
Provisioning can be done to the local disk of the host either by the native package manager of the distribution (a “packaged” installation) or by installing a predefined image from the master host (a “disked” or “imaged” installation). You might also install a predefined image from the master host directly into a memory resident disk image on the target host, leaving the contents of the local disk undisturbed (a “diskless” installation). Table 6-1 lists the advantages and disadvantages of each provisioning method.
Table 6-1 Provisioning method advantages and disadvantages
Method      Advantages                                       Disadvantages
Packaged    One template can cover different hardware.       Slower than image-based methods.
            Non-disruptive package additions.
Imaged      Fast provisioning.                               Requires reprovisioning to add packages.
                                                             Might carry hardware dependencies.
Diskless    Fast provisioning.                               Same as Imaged, plus:
            Eliminates the requirement for a disk.           Reduces memory available for applications.
            Might leave an existing OS in place on disk.     Requires careful tuning to minimize the memory footprint.
The provisioning method, as well as the specific packages and configuration to be used on the host, are controlled by a “Provisioning Template” (in the web portal) or “node group” (in the command-line interface (CLI)). These terms are equivalent. Several templates are provided with the product; you can create your own custom templates by copying and editing one of the provided templates, either through the web portal (Figure 6-3 on page 185) or through the ngedit CLI (Figure 6-4 on page 185).
Figure 6-3 Selecting and editing a custom template
Figure 6-4 Editing the template via the CLI
6.1.3 Workload scheduling
To effectively share the resources of a cluster among multiple users, and to maintain a queue of work to keep your cluster busy, some form of batch scheduling is needed. IBM Platform HPC includes batch job scheduling and workload management with the equivalent scheduling functions of the IBM Platform Load Sharing Facility (LSF) Express Edition. However, unlike Express Edition, it is not limited to 100 nodes in a cluster. For a complete description of the features of IBM Platform LSF Express Edition, see Chapter 4, “IBM Platform Load Sharing Facility (LSF) product family” on page 27.
Integrated application scripts and templates
IBM Platform HPC includes a facility for defining templates for frequently used applications to simplify the submission of jobs that use these applications. This is a simpler version of the IBM Platform Application Center that is discussed in 4.2.1, “IBM Platform Application Center” on page 36. This version does not support complex job flows as provided by IBM Platform Process Manager. A set of sample application templates shown in Figure 6-5 is provided with the installation. Use the “Save As” and “Modify” controls to create your own application templates from these samples.
Figure 6-5 Application templates
6.1.4 Workload and system monitoring and reporting
After a cluster is provisioned, IBM Platform HPC provides the means to monitor the status of the cluster resources and jobs, to display alerts when there are resource shortages or abnormal conditions, and to produce reports on the throughput and utilization of the cluster. With these tools, you can quickly understand how the cluster resources are being used, by whom, and how effectively the available capacity is utilized. These monitoring facilities are a simplified subset of those facilities provided by the IBM Platform Application Center that is described in 4.2.1, “IBM Platform Application Center” on page 36.
6.1.5 MPI libraries
HPC clusters frequently employ a distributed memory model to divide a computational problem into elements that can be processed simultaneously on the hosts of a cluster. This usually requires that the hosts share progress information and partial results over the cluster’s interconnect fabric, which is most commonly accomplished through a message passing mechanism. The most widely adopted standard for this type of message passing is the Message Passing Interface (MPI) standard, which is maintained by the MPI Forum.
IBM Platform HPC includes a robust, commercial implementation of the MPI standard, IBM Platform MPI. This implementation comes pre-integrated with the LSF workload manager element of IBM Platform HPC, giving the workload scheduler full control over MPI resource scheduling.
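Because the scheduler and the MPI library are integrated, an MPI job is normally launched by requesting slots from LSF and starting the Platform MPI mpirun launcher inside the job. The following sketch shows the general pattern only; the slot count, output file, and application name are placeholders, and the exact mpirun options that you need depend on your Platform MPI configuration:
# Request eight slots from LSF and launch the MPI application inside the job.
# The mpirun options required depend on your Platform MPI setup.
bsub -n 8 -o mpi_job.%J.out mpirun ./my_mpi_application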
6.1.6 GPU scheduling
The use of special-purpose computational accelerators that are based on high performance graphics processing units (GPUs), which are also sometimes designated as general-purpose graphics processing units (GPGPUs), is popular for HPC applications. The optional IBM Platform HPC GPU Scheduling kit adds the component-platform-lsf-gpu component to recognize and classify NVIDIA GPUs as LSF resources for scheduling purposes. This kit also adds monitoring of GPU temperature and error correction code (ECC) counts.
6.2 Implementation
IBM Platform HPC utilizes a single unified installer for all of the standard elements of the product. Instead of requiring the Cluster Manager (PCM), Workload Manager (LSF), and MPI library to be installed and integrated separately, the unified installer speeds up implementation and provides a set of standard templates from which a cluster can be built quickly. The installation is handled as a set of “kits”, and the standard kits are included in the unified installer. Other kits can be added later, for example to upgrade the LSF component to a more advanced edition. A kit can be thought of as a meta-package that can include RPMs and rules to describe their relationships.
To IBM Platform HPC, a base OS distribution is abstracted into a kit just as are the elements that are added to that base OS. Figure 6-6 on page 188 illustrates the composition of a kit. Related kits are collected into named repositories that are anchored around a specific OS distribution. You can take a snapshot image of a repository at any time to create a reference point for a specific deployment configuration. You can create your own software kits to automate the installation of specific functions and all of their dependencies. For example, you might want to bundle an ISV software application with the library RPMs that are required to support it.
The IBM Platform HPC distribution includes the document, IBM Platform Cluster Manager Kit Builders Guide, which describes how to create your own software kits.
Figure 6-6 Software kits
6.2.1 Installation on the residency cluster
This section describes the installation of the residency cluster.
Preparing for installation
We followed the instructions that are provided in Installing IBM Platform HPC, SC22-5380-00. In the section “Configure and Test Switches”, this document describes the use of the PortFast setting on Cisco switches. Other switch manufacturers can have different names for this setting, but it involves enabling or disabling the Spanning Tree Protocol (STP) on switch ports. STP is intended to prevent routing loops in a complex switch fabric, but it can add a considerable delay between the time that a server activates its Ethernet port and the time that the port is ready to accept traffic. Setting PortFast or its equivalent eliminates this delay. Set it globally on a switch only if that switch connects to no other switches, or only to switches one level higher. Otherwise, set it only on the ports that connect to hosts, and leave STP enabled on the ports that connect to other switches.
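As an illustration, on a Cisco IOS switch the host-facing ports can be set individually, which is the safer approach when the switch also has uplinks. The interface range below is a placeholder for your own host ports:
! Enable PortFast only on the ports that connect to cluster hosts
interface range GigabitEthernet1/0/1 - 36
 spanning-tree portfast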
Installing the software
We installed IBM Platform HPC on a cluster of IBM dx360m3 iDataPlex servers, described in Figure 3-2 on page 21. In addition to the public and private networks that are shown in that diagram, each of our servers has a hardware management connection that is implemented through a shared access VLAN on the public network. The basic installation process is shown in Example 6-1 on page 189 with typed inputs shown in bold.
Example 6-1 Installation process
[root@hpcmh01 HPC-3.2]# python pcm-installer
Preparing PCM installation... [ OK ]
International Program License Agreement
 
Part 1 - General Terms
 
BY DOWNLOADING, INSTALLING, COPYING, ACCESSING, CLICKING ON
AN "ACCEPT" BUTTON, OR OTHERWISE USING THE PROGRAM,
LICENSEE AGREES TO THE TERMS OF THIS AGREEMENT. IF YOU ARE
ACCEPTING THESE TERMS ON BEHALF OF LICENSEE, YOU REPRESENT
AND WARRANT THAT YOU HAVE FULL AUTHORITY TO BIND LICENSEE
TO THESE TERMS. IF YOU DO NOT AGREE TO THESE TERMS,
 
* DO NOT DOWNLOAD, INSTALL, COPY, ACCESS, CLICK ON AN
"ACCEPT" BUTTON, OR USE THE PROGRAM; AND
 
* PROMPTLY RETURN THE UNUSED MEDIA, DOCUMENTATION, AND
 
Press Enter to continue viewing the license agreement, or
enter "1" to accept the agreement, "2" to decline it, "3"
to print it, "4" to read non-IBM terms, or "99" to go back
to the previous screen.
1
Checking the exist of entitlement file [ OK ]
Checking hardware architecture [ OK ]
Checking for OS compatibility [ OK ]
Checking if SELinux is disabled [ OK ]
Checking for presence of '/depot' [ OK ]
Checking for presence of kusudb database [ OK ]
Checking for presence of Kusu RPMs [ OK ]
Checking for required RPMs [ OK ]
Checking for at least 2 statically configured NIC [ OK ]
Checking for the public hostname [ OK ]
Checking for md5 password encryption algorithm [ OK ]
Checking for NetworkManager service [ OK ]
Checking for existing DNS server [ OK ]
Checking for existing DHCP server [ OK ]
Probing for the language/locale settings [ OK ]
Probing for DNS settings [ OK ]
Checking if at least 2.5GB of RAM is present [WARNING]
Select one of the following interfaces to use for the provisioning network:
1) Interface: eth1, IP: 192.168.102.200, Netmask: 255.255.0.0
2) Interface: eth2, IP: 192.168.102.201, Netmask: 255.255.0.0
Select the interface to be used for provisioning [1]: 2
Select one of the following interfaces to use for the public network:
1) Interface: eth1, IP: 192.168.102.200, Netmask: 255.255.0.0
Select the interface to be used for public [1]: 1
Specify private cluster domain [private.dns.zone]: clusterhpc.itso.org
Do you want to set up HPC HA now? (Y/N) [N]: N
Checking for valid mount point for '/depot' [ OK ]
Checking for valid mount point for '/var'
Select one of the following mount points where Kusu should place its '/depot':
1) mount point: '/' FreeSpace: '44GB'
Select the mount point to be used [1]: 1
Adding Kit: 'base'... [ OK ]
Adding Kit: 'os-ofed'... [ OK ]
Adding Kit: 'pcm'... [ OK ]
Adding Kit: 'platform-hpc-web-portal'... [ OK ]
Adding Kit: 'platform-isf-ac'... [ OK ]
Adding Kit: 'platform-lsf'... [ OK ]
Adding Kit: 'platform-lsf-gpu'... [ OK ]
Adding Kit: 'platform-mpi'... [ OK ]
Select the media to install the Operating System from. The Operating System version
must match the installed Operating System version on the installer:
1) DVD drive
2) ISO image or mount point
[1] >> 1
Insert the DVD media containing your Operating System. Press ENTER to continue...
Verifying that the Operating System is a supported
distribution, architecture, version...
[rhel 6 x86_64] detected: [ OK ]
Copying Operating System media. This may take some time [ OK ]
Successfully added Operating System to repository.
Choose one of the following actions:
1) List installed kits
2) Delete installed kits
3) Add extra kits
4) Continue
[4] >> 1
Installed kits:
base-2.2-x86_64
os-ofed-3.0.1-x86_64
pcm-3.2-x86_64
platform-hpc-web-portal-3.2-x86_64
platform-isf-ac-1.0-x86_64
platform-lsf-8.3-x86_64
platform-lsf-gpu-1.0-x86_64
platform-mpi-8.3.0-x86_64
rhel-6-x86_64
Choose one of the following actions:
1) List installed kits
2) Delete installed kits
3) Add extra kits
4) Continue
[4] >> 4
Refreshing the repository [rhel-6.2-x86_64].
This may take some time... [ OK ]
Installing Kusu RPMs. This may take some time... [ OK ]
Running kusurc scripts to finalize installation.
Setting up Kusu db: [ OK ]
Setting up hostname: [ OK ]
Starting initial network configuration: [ OK ]
Setting up High-Availability service: [ OK ]
Setting up httpd: [ OK ]
Setting up dhcpd: [ OK ]
Generating hosts, hosts.equiv, and resolv.conf: [ OK ]
Setting up iptables: [ OK ]
Config mail mechanism for kusu: [ OK ]
Setting up named: [ OK ]
Setting up ntpd: [ OK ]
Preparing repository for compute node provisioning: [ OK ]
Setting up rsyncd for Kusu: [ OK ]
Setting up rsyslog: [ OK ]
Setting up passwordless SSH access: [ OK ]
Setting up SSH host file: [ OK ]
Setting up user skel files: [ OK ]
Setting up xinetd: [ OK ]
Setting up yum repos: [ OK ]
Setting up network routes: [ OK ]
Setting up shared home NFS export: [ OK ]
Setting up syslog on PCM installer: [ OK ]
Set up kusu snmpd configuration.: [ OK ]
Setting up CFM. This may take some time...: [ OK ]
Post actions when failover: [ OK ]
Setting up default Firefox homepage: [ OK ]
Setting up minimum UID and GID: [ OK ]
Setting up fstab for home directories: [ OK ]
Synchronizing System configuration files: [ OK ]
Creating images for imaged or diskless nodes: [ OK ]
Setting appglobals variables: [ OK ]
Disabling unneeded services: [ OK ]
Patch kusu pxe files: [ OK ]
Starting initial configuration procedure: [ OK ]
Setting up motd for PCM: [ OK ]
Running S11lsf-genconfig: [ OK ]
Running S12lsf-filesync.sh: [ OK ]
Increasing ulimit memlock: [ OK ]
Running S55platform-isf-ac-lsf.sh: [ OK ]
Setting npm service for HPC HA: [ OK ]
Running S70SetupPCMGUI.sh: [ OK ]
Running S97SetupGUIHA.sh: [ OK ]
Running S99IntegratePCMGUI.sh: [ OK ]
All existing repos in /etc/yum.repos.d have been disabled. Do re-enable any required repos manually.
The os-ofed kit installs some new kernel modules, you must reboot the installer node to load the new modules.
The installation of Platform HPC is complete.
A complete log of the installation is available at /var/log/pcminstall.log
Run 'source /opt/kusu/bin/pcmenv.sh' to source the required environment variables for this session. This is not required for new login sessions.
Notes on the installation
The installation instructions indicate that the /home directory on the master host must be writable. If this is an NFS-mounted directory, it must be writable by the root account on the master host (exported with no_root_squash, or equivalent). After the installer creates the hpcadmin account, this root access is no longer required. For instructions about how to provision an externally hosted NFS directory on your cluster hosts, see “Configuring additional shared directories” on page 196.
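For reference, an export entry of the following form on the external file server grants that access; the server-side path and the client network shown here are assumptions, so substitute your own values:
# /etc/exports on the external NFS server (illustrative)
/home 10.0.1.0/255.255.255.0(rw,sync,no_root_squash)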
Creating a rack configuration
A simple four-rack configuration is included with the base product. You probably want to modify this rack configuration to match your environment by using the hpc-rack-tool CLI command and a text editor. Extract the current configuration by using hpc-rack-tool export, edit the file to match your configuration, and restore it by using hpc-rack-tool import. We named our rack positions to match the iDataPlex nomenclature, designating the two columns of the iDataPlex rack as 1A and 1C. The rack configuration file for our cluster is shown in Example 6-2, and the resulting web portal display is shown in Figure 6-7 on page 193.
Example 6-2 Rack configuration
<?xml version="1.0" encoding="UTF-8"?>
<layout>
<rack description="Column A, Rack 1" name="Rack1A" size="43"/>
<rack description="Column C, Rack 1" name="Rack1C" size="43"/>
</layout>
By default, a rack cannot exceed 42U in height. If your racks are larger, you can change the limit RACKVIEW_MAX_UNIT_NUMBER in the file /usr/share/pmc/gui/conf on the master host. You can then place your provisioned hosts in the correct rack locations. Unmanaged devices, while they reserve an IP address, cannot be assigned a position in your rack.
Figure 6-7 Web portal display of the cluster
Configuring hardware management
Most modern server hardware includes some form of automated management for monitoring and controlling the server independently of the installed operating system. This functionality generally includes power on and off, hardware reset, temperature and power monitoring, hardware error logging, and remote (serial-over-LAN) console. The most common standard is the Intelligent Platform Management Interface (IPMI).
The management element in a server is designated as the Baseboard Management Controller (BMC), and this term is generally used to describe such embedded control points even if they do not conform to the IPMI standard. Power control can also be accomplished through smart power distribution units that implement their own protocols. IBM Platform HPC supports server management via standard IPMI BMC interfaces, as well as other BMC protocols and managed power distribution units through plug-ins. These plug-ins are written in python and can be found in /opt/kusu/lib/python/kusu/powerplugins. Our IBM iDataPlex hardware uses the IPMI v2 plug-in.
To implement hardware management, you need to define a BMC network (see next section) and possibly edit the power control configuration files that are located in the directory /opt/kusu/etc. The /opt/kusu/etc/kusu-power.conf table is populated when a host is added, using the management type that is defined in /opt/kusu/etc/power_defaults, and the management password that is defined in /opt/kusu/etc/.ipmi.passwd. The management user ID is currently fixed at “kusuipmi”. During the provisioning network boot process, this user ID and password combination is added to the BMC on the host in addition to the default account and any others that might be defined. If you have security concerns, you might still need to remove any default accounts or change default passwords. The power table file for our iDataPlex servers is shown in Example 6-3. When a new host is added to the cluster, an entry is created in the table if an entry does not exist for that host name. Old entries are not automatically purged when you remove hosts. Note the extra entry that was added automatically for our unmanaged host i05n36 by using a dummy IP address value.
Example 6-3 The kusu-power.conf file
device ipmi20 ipmi lanplus
# Dynamic adds
node i05n44 ipmi20 129.40.127.44 kusuipmi xs-2127pw
node i05n39 ipmi20 129.40.127.39 kusuipmi xs-2127pw
node i05n38 ipmi20 129.40.127.38 kusuipmi xs-2127pw
node i05n40 ipmi20 129.40.127.40 kusuipmi xs-2127pw
node i05n41 ipmi20 129.40.127.41 kusuipmi xs-2127pw
node i05n42 ipmi20 129.40.127.42 kusuipmi xs-2127pw
node i05n36 ipmi20 IP.Of.Power.Mgr  kusuipmi xs-2127pw
node i05n37 ipmi20 129.40.127.37 kusuipmi xs-2127pw
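To verify that a BMC is reachable with the credentials in this table independently of IBM Platform HPC, you can query it with the standard ipmitool utility (assuming ipmitool is installed on the master host); the address and password here are taken from our example file:
ipmitool -I lanplus -H 129.40.127.44 -U kusuipmi -P xs-2127pw chassis power status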
Configuring auxiliary networks
At a minimum, your cluster must include a provisioning network common to all hosts, and a public network on the master host. It can also include other networks, such as management networks, InfiniBand networks, or other Ethernet networks. To provision these additional networks, you must provide network descriptions and add them to the provisioning template for your hosts. This can be done either through the web portal or through the CLI. We found that the web portal did not allow us to add our InfiniBand network to the master host, and that the CLI tool kusu-net-tool does not provide an option to define a BMC network. Because it is generally not a good idea to power manage the master host from within the web portal, not having the BMC network defined on the master host is not a problem, unless your master BMC IP address falls within the range of your compute hosts.
While IP addresses for the provisioning network can be directly specified in a host configuration file, all other network addresses are assigned sequentially according to the network definition that you provide. Consider this carefully when defining networks and adding hosts to your cluster. We suggest that you keep your master host and any unmanaged devices outside of the IP address range that is used by your compute hosts to avoid addressing conflicts. The definition dialog that we used for our BMC network is shown in Figure 6-8 on page 195.
Figure 6-8 Network definition dialog
We started the address range at the IP address that corresponds to the first compute host in our cluster. From there, it is important to add the hosts in ascending order to keep the IP addresses aligned. It is not possible to reserve IP addresses on networks other than the provisioning network by using the Unmanaged Devices dialog. In our configuration, we needed to add two additional networks: a BMC network for hardware control and an IP over IB (IPoIB) network on the QLogic InfiniBand fabric. The CLI commands that we used to create the IPoIB network and connect it to the master host are shown in Example 6-4. There are differences between this example and the example that is shown in Administering IBM Platform HPC, SC22-5379-00.
Example 6-4 Defining an IPoIB network and connecting the master host
[root@i05n43 ~]# cat /sys/class/net/ib0/address
80:00:00:03:fe:80:00:00:00:00:00:00:00:11:75:00:00:78:3a:a6
kusu-net-tool addinstnic ib0 --netmask 255.255.255.0 --ip-address=129.40.128.43 --start-ip=192.40.128.37 --desc "IPoIB" --other --macaddr="80:00:00:03:fe:80:00:00:00:00:00:00:00:11:75:00:00:78:3a:a6"
Added NIC ib0 successfully
...
Device: ib0
Description: "IPoIB"
Inet addr: 129.40.128.43 Bcast: 129.40.128.255 Mask: 255.255.255.0
Gateway: 192.40.128.37
Type: provision
Network ID: 10
 
Please do the following steps:
- Restart network service
- Run "kusu-addhost -u" to update the configuration for installed kits
- Reboot installer node
Our complete network definition is shown in Figure 6-9. You might also need to provide name resolution for hosts on your external networks by using the procedure that is described in the “Make external hosts to the cluster visible to compute hosts” section of Administering IBM Platform HPC, SC22-5379-00.
Figure 6-9 Network definitions
Configuring additional shared directories
By default, the non-HA installation assumes that the /home directory is local to the master host and is to be NFS exported to the compute cluster. This might not be appropriate in all instances, and there might also be a requirement to mount other shared directories on the compute hosts. This can be accomplished by adding the /etc/cfm/<template name>/etc/fstab.append file, as shown in Example 6-5. (This file is incorrectly identified as fstab.kusuappend in the initial version of Administering IBM Platform HPC, SC22-5379-00.)
Example 6-5 Configuring additional NFS mount points
[root@io5n43]# cd /etc/cfm/xs-2127-compute-diskless-rhel-6.2-x86_64
[root@i05n43]# cat etc/fstab.append
# Local path to file server
10.0.1.36:/home /home nfs defaults 0 0
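The same mechanism handles any other shared directory that you want the compute hosts to mount. For example, a hypothetical application directory exported by the same file server would add one more line in standard fstab format:
# Additional shared directory (illustrative path)
10.0.1.36:/apps /apps nfs defaults 0 0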
Configuring LDAP users
It is often a preferred practice in an HPC cluster to maintain user IDs and group IDs on the master host, either manually or through a distributed password system, such as Lightweight Directory Access Protocol (LDAP) or Network Information Service (NIS). These IDs and passwords can then be statically distributed to the compute hosts by using the cfm facility, eliminating any user ID synchronization overhead through the master gateway. This is particularly appropriate where there are relatively few users of the cluster and changes are infrequent. In other environments, it might be necessary to extend an enterprise password system directly to the compute hosts. In our test environment, normal user accounts are managed through an LDAP server. Administering IBM Platform HPC, SC22-5379-00, offers some guidance on integrating LDAP accounts, but we found that those instructions were not sufficient to enable LDAP on our Red Hat Enterprise Linux 6.2 test environment. This is a good example of how the cfm facility and the post-installation scripts that are provided by IBM Platform HPC can be used to customize your installation. Example 6-6 on page 197 shows the links that we created to enable LDAP in our provisioning template xs-2127-compute-diskless-rhel-6.2-x86_64, as well as the installation postscript enable_ldap.sh that we added to enable the required subsystem.
Example 6-6 Elements that are used to enable LDAP authentication on our compute hosts
[root@i05n43 ~] # cd /etc/cfm/xs-2127-compute-diskless-rhel-6.2-x86_64/etc
[root@i05n43 etc]# ls -latrR
 
lrwxrwxrwx 1 root root 15 Aug 6 11:07 nslcd.conf -> /etc/nslcd.conf
lrwxrwxrwx 1 root root 18 Aug 2 18:35 nsswitch.conf -> /etc/nsswitch.conf
lrwxrwxrwx 1 root root 18 Aug 2 18:43 pam_ldap.conf -> /etc/pam_ldap.conf
 
./openldap:
lrwxrwxrwx 1 root root 23 Aug 2 18:35 ldap.conf -> /etc/openldap/ldap.conf
 
./pam.d:
lrwxrwxrwx 1 root root 22 Aug 2 18:36 system-auth -> /etc/pam.d/system-auth
lrwxrwxrwx 1 root root 25 Aug 2 18:36 system-auth-ac -> /etc/pam.d/system-auth-ac
 
./sysconfig:
lrwxrwxrwx 1 root root 25 Aug 2 18:32 authconfig -> /etc/sysconfig/authconfig
 
 
[root@i05n43 hpcadmin]# cat /home/hpcadmin/enable_ldap.sh
#!/bin/bash
# Enables daemon required for LDAP password
chkconfig nslcd on
service nslcd start
Adding additional software kits, OS packages, and scripts
As discussed in 6.1.2, “Cluster provisioning” on page 184, IBM Platform HPC includes a set of basic provisioning templates. For our implementation, we needed to add support for InfiniBand as well as additional networks. We chose to work with diskless images, so we started by creating a copy of the provisioning template compute-diskless-rhel-6.2-x86_64, which is created by the installation scripts. Using the web portal, we created a copy of this template named xs-2127-compute-diskless-rhel-6.2-x86_64 as shown in Figure 6-10.
Figure 6-10 Copying the provisioning template
After creating our new provisioning template, we accepted it and modified it to add the software kits for OFED support (Figure 6-11). Through the Packages tab on this dialog, we can also specify kernel modules, RPM packages from the base OS distribution, additional networks, and post-installation scripts. This tab is where we added our post-installation script as well as the apr-util-ldap, compat-openldap, nss-pam-ldapd, openldap-clients, and openldap-devel RPMs from the operating system repository.
On the General tab, we also added the kernel boot parameter console=ttyS0,115200 to allow us to observe the boot process through the remote console that is provided by our BMC connection. (This console can be found under the Console drop-down list in the detailed host description under the Devices → Hosts section of the web portal.)
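With the console redirected to ttyS0, the boot can also be watched from any host that has the ipmitool utility installed by opening a serial-over-LAN session directly to the BMC, which can be useful when the web portal is not available. The address and credentials below are the ones from our power configuration file:
ipmitool -I lanplus -H 129.40.127.37 -U kusuipmi -P xs-2127pw sol activate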
Figure 6-11 Adding components to the provisioning template
Adding other RPM and scripts
In addition to the software components that are provided by IBM Platform HPC and the base operating system, you might need to add other RPM-based elements to your cluster. For our cluster, we installed the IBM General Parallel File System (GPFS) client code on our image. Because of the special installation requirements of the GPFS RPMs, we used a standard RPM method and a post-installation script together, which also serves to illustrate both approaches. GPFS requires the installation of a base level of GPFS before a fix level can be applied, which means installing two RPMs with the same package name; this cannot be done in a single invocation of the yum or rpm commands. We installed the base 3.4.0.0 level GPFS package gpfs.base and the service levels of gpfs.doc and gpfs.msg from the repository, and then applied the gpfs.base service level and the gpfs.gplbin kernel abstraction layer through a post-installation script.
To add extra RPMs to the operating system repository, copy them to the /depot/contrib directory that corresponds to the desired operating system repository. In Example 6-7 on page 199, we listed the repositories, determined that directory 1000 contains our rhel-6.2-x86_64 repository, and copied our GPFS RPMs to the operating system repository.
Example 6-7 Adding external RPMs to the operating system repository
[root@i05n43 contrib]# kusu-repoman -l
Repo name: rhel-6.2-x86_64
Repository: /depot/repos/1000
Installers: 129.40.126.43;10.0.1.43
Ostype: rhel-6-x86_64
Kits: base-2.2-2-x86_64, os-ofed-3.0.1-2-x86_64,
pcm-3.2-1-x86_64, platform-hpc-web-portal-3.2-1-x86_64,
platform-isf-ac-1.0-3-x86_64, platform-lsf-8.3-1-x86_64,
platform-lsf-gpu-1.0-3-x86_64, platform-mpi-8.3.0-1-x86_64,
rhel-6.2-x86_64
 
Repo name: rhel-63
Repository: /depot/repos/1006
Installers: 129.40.126.43;10.0.1.43
Ostype: rhel-6-x86_64
Kits: base-2.2-2-x86_64, os-ofed-3.0.1-2-x86_64,
pcm-3.2-1-x86_64, platform-lsf-8.3-1-x86_64,
platform-mpi-8.3.0-1-x86_64, rhel-6.3-x86_64
 
[root@i05n43 contrib]# cd 1000
[root@i05n43 1000]# cp /home/GPFS-base/gpfs.base-3.4.0-0.x86_64.rpm .
[root@i05n43 1000]# cp /home/GPFS-fixes/RPMs/gpfs.docs-3.4.0-14.noarch.rpm .
[root@i05n43 1000]# cp /home/GPFS-fixes/RPMs/gpfs.msg.en_US-3.4.0-14.noarch.rpm .
[root@i05n43 1000]# kusu-repoman -ur rhel-6.2-x86_64
Refreshing repository: rhel-6.2-x86_64. This may take a while...
[root@i05n43 1000]#
Next, we used the web portal to add these GPFS RPMs to our provisioning template, as shown in Figure 6-12.
Figure 6-12 Adding RPMs to a provisioning template
Finally, we used the post-installation script in Example 6-8 to complete the update of GPFS and install the required kernel abstraction layer. Because we are using stateless nodes in this system, it is also necessary to restore the node-specific GPFS configuration database each time that the node is rebooted.
This simple example assumes that the host was previously defined to GPFS and is just being re-imaged. It also assumes that you added the appropriate entries to /root/.ssh/authorized_keys on your GPFS primary and secondary configuration server hosts to allow password-less Secure Shell (SSH) from the newly provisioned hosts.
The script does not work for new nodes, which have not yet been defined to GPFS. For such nodes, you can use the same post-installation script, but remove the files in /var/mmfs/gen before you attempt to add the nodes to your GPFS cluster for the first time. Otherwise, GPFS determines that these nodes are already members of a GPFS cluster.
Example 6-8 Post-installation script for GPFS
#!/bin/bash
# GPFS Patch level
GPFSVER=3.4.0-14.x86_64
# Probable GPFS primary configuration server
PRIMARY=i05n67.pbm.ihost.com
# Determine kernel level
KVER=$(uname -r)
# Need home to get files
mount /home
# Update gpfs.base code
rpm -Uvh /home/GPFS-fixes/RPMs/gpfs.base-${GPFSVER}.update.rpm
# Remove any existing gpl layer for the current kernel
rpm -e $(rpm -qa | grep gpfs.gplbin-${KVER})
# Install the correct gpl layer for this GPFS build and kernel
rpm -ivh /home/GPFS-fixes/RPMs/gpfs.gplbin-${KVER}-${GPFSVER}.rpm
# Force addition of the GPFS configuration server to known_hosts
# while obtaining list of members of the GPFS cluster; update P & S
CLUSTER=$(ssh -o "StrictHostKeyChecking=no" ${PRIMARY} /usr/lpp/mmfs/bin/mmlscluster)
# Determine if current node is a member of the GPFS cluster
MYIPS=$(ip addr | grep "inet " | awk '{print $2}' | cut -f 1 -d /)
for IP in $MYIPS
do
    GPFSHOSTNAME=$(echo "$CLUSTER" | grep $IP | awk '{print $2}')
    if [ "$GPFSHOSTNAME" ]
    then
        break
    fi
done
if [ "$GPFSHOSTNAME" ]
then
    # This node is defined to GPFS; restore the GPFS database
    /usr/lpp/mmfs/bin/mmsdrrestore -p ${PRIMARY} -F /var/mmfs/gen/mmsdrfs -R /usr/bin/scp
fi
 
Post-installation scripts: Post-installation scripts are stored in the provisioning engine’s database. To modify a post-installation script, you must delete it and re-add the modified copy, either with a different name or in a different Portal modify step. For imaged provisioning modes, changing the post-installation scripts requires rebuilding the host image; using a different name avoids waiting for a second image rebuild. Packaged installations do not require this image rebuild, so it might be quicker to test post-installation scripts on packaged images first. To use the same script for multiple provisioning templates while meeting the unique name requirement that is imposed by the database, we created a separate, uniquely named symbolic link for each revision of each provisioning template, all pointing to a common script, as shown in the following example.
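A pattern such as the following keeps a single maintained script while satisfying the unique-name requirement; the file and directory names are only illustrative:
# One real script, plus uniquely named links for each template and revision
ln -s /home/hpcadmin/scripts/node_config.sh node_config_diskless_v2.sh
ln -s /home/hpcadmin/scripts/node_config.sh node_config_packaged_v2.sh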
Adding hosts to the cluster
After you define an appropriate provisioning template for your HPC hosts, you are ready to add these hosts to your cluster. IBM Platform HPC can add new hosts through either the Auto-detect or Pre-defined file procedures. This mode is chosen through the Devices → Hosts → Add → Add Hosts dialog (Figure 6-13).
Figure 6-13 Adding a host with auto detection
In Auto-detect mode (Figure 6-13), you simply specify the provisioning template to be used and optionally the physical rack location of the server. IBM Platform HPC then monitors Dynamic Host Configuration Protocol (DHCP) requests on the provisioning network, and it adds the next unknown host that asks for an IP address assignment to the cluster using the specified values. If your hosts are set to automatically network boot, this occurs the next time you power up or reboot the targeted host. This is a very quick and simple way to add hosts, but you must be careful to add the hosts in ascending order if you want to maintain consistent and ascending IP addresses.
Figure 6-14 Adding hosts with a pre-defined host file
In the pre-defined file mode (Figure 6-14), a set of hosts is added sequentially by using the provisioning-network Media Access Control (MAC) addresses that are provided in a text file, one line per host server. If the MAC addresses are provided to you, for example on the configuration CD that comes with an IBM Intelligent Cluster iDataPlex solution, you can very quickly add a full rack of servers. With this file, you can also assign host names and rack locations. The file format is described in the online help, and the file that we used is shown in Example 6-9. Comments can be included only at the beginning of the file by prefixing lines with the # character. The host names that are specified must comply with the Host Name Format that is defined in the provisioning template, or they are replaced by generated names that use the template format and the next available host numbers.
Example 6-9 Pre-defined host list
#Format: MAC Address, IP, Name, uid, bmc_ip, rack, chassis, starting unit, server height
#============= ====== ======= ========== ======
E4:1F:13:EF:A9:D5,10.0.1.37,i05p37, ,129.40.127.37,Rack1A,,37,1
E4:1F:13:EF:AE:D3,10.0.1.38,i05p38, ,129.40.127.38,Rack1A,,38,1
E4:1F:13:EF:73:27,10.0.1.39,i05p39, ,129.40.127.39,Rack1A,,39,1
E4:1F:13:EF:AA:77,10.0.1.40,i05p40, ,129.40.127.40,Rack1A,,40,1
E4:1F:13:EF:BF:C5,10.0.1.41,i05p41, ,129.40.127.41,Rack1A,,41,1
E4:1F:13:EF:A9:69,10.0.1.42,i05p42, ,129.40.127.42,Rack1A,,42,1
E4:1F:13:EF:96:D9,10.0.1.44,i05p44, ,129.40.127.44,Rack1C,,2,1
 
BMC addresses: At the time of writing this book, IBM Platform HPC did not honor the bmc_ip that is specified in this file. BMC network address assignment is done as described in “Configuring auxiliary networks” on page 194. BMC addresses are assigned sequentially from the starting address that is specified in the network definition.
In either mode, power on or reset these hosts to cause a Preboot Execution Environment (PXE) network boot, and they are provisioned with the template that you selected. After a few moments, these hosts appear as “Installing” in the web portal (Figure 6-15). When the installation is complete, the host status box changes to a green “OK”.
Figure 6-15 Compute hosts in the Installing state
Diagnosing problems during installation
While defining and adding hosts and images, the provisioning engine records logs in the /var/log/kusu directory on the master host. The kusu-events.log file records events that result from provisioning activities. The cfmclient.log file records events that are associated with synchronizing files from the master host to the provisioned hosts. During the host provisioning process, the provisioning engine maintains a log on the host that is being installed in the /var/log/kusu directory. The kusurc.log shows the sequence of operations and contains error output from any post-installation scripts that you defined. There are also logs in the /var/log/httpd directory on the master host that record file transfers during provisioning.
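When provisioning appears to stall, it can be useful to watch these logs in real time from the master host; for example (paths as listed above):
tail -f /var/log/kusu/kusu-events.log /var/log/kusu/cfmclient.log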
6.2.2 Modifying the cluster
This section describes how to modify the cluster.
Implementing high availability
IBM Platform HPC can be configured to provide failover services to protect both the provisioning engine and the workload scheduler. This can be defined at the time of the initial installation, or added later by provisioning an additional installer candidate host. We used the process in the Administering IBM Platform HPC manual in the section “Enable HA post-installation” to convert one of our hosts to a failover master.
Moving to an HA configuration requires that you place the configuration and software repository on a shared (NFS) device. When converting from a standard configuration to an HA one, the repository that was previously created on the local disk is copied to the NFS server that is designated during the installation. We found that if both the NFS server and the host OS supported NFSv4, the mount defaulted to that protocol version and failed due to authentication problems. To resolve this situation, we disabled the NFSv4 client protocol on the master and failover hosts by uncommenting the line “# Defaultvers=4” in the file /etc/nfsmount.conf and changing it to “Defaultvers=3”.
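A one-line edit, run on both the master and the failover host, makes that change; this is the workaround that we used rather than a documented product step, so verify the result before reconverting:
sed -i 's/^# *Defaultvers=4/Defaultvers=3/' /etc/nfsmount.conf
grep Defaultvers /etc/nfsmount.conf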
To enable an HA configuration, you must define virtual IP addresses on your provisioning and public networks to which service requests are directed. Those IP addresses are assumed by whichever host is the active installer and master, across the same network connections that are used for the static IP configuration. These addresses can be any otherwise unused addresses on the same subnets, but choose them so that they do not interfere with the range that you intend to use for your compute hosts. After converting to HA, direct your web browser to the virtual public IP address to access the web portal.
After following the HA installation procedure and reprovisioning the failover host by using the installer-failover-rhel-6.2-x86_64 template that is provided with the base installation, we were able to force the new failover host to take over as the active installer as shown in Example 6-10.
Example 6-10 Manually failing over the active provisioning host
[root@i05n44 hpcadmin]# kusu-failto
Are you sure you wish to failover from node 'i05n43' to node 'i05n44'? [<y/N>]: y
Installer Services running on 'i05n43'
Syncing and configuring database...
Starting kusu. This may take a while...
Starting initial network configuration: [ OK ]
Generating hosts, hosts.equiv, and resolv.conf: [ OK ]
Config mail mechanism for kusu: [ OK ]
Setting up ntpd: [ OK ]
Setting up SSH host file: [ OK ]
Setting up user skel files: [ OK ]
Setting up network routes: [ OK ]
Setting up syslog on PCM installer: [ OK ]
Setting HPC HA: [ OK ]
Running S11lsf-genconfig: [ OK ]
Increasing ulimit memlock: [ OK ]
Setting npm service for HPC HA: [ OK ]
Running S70SetupPCMGUI.sh: [ OK ]
Post actions when failover: [ OK ]
Setting up fstab for home directories: [ OK ]
Running S97SetupGUIHA.sh: [ OK ]
Synchronizing System configuration files: [ OK ]
Starting initial configuration procedure: [ OK ]
 
Restart kusu service on 'i05n43', it may take a while...
Installer Services now running on 'i05n44'
[root@i05n44 network-scripts]# kusu-failinfo
Installer node is currently set to: i05n44 [Online]
Failover node is currently set to: i05n43 [Online]
Failover mode is currently set to: Auto
KusuInstaller services currently running on: i05n44
 
[root@i05n44 hpcadmin]#
6.2.3 Submitting jobs
After you add and provision a set of hosts, your cluster is ready for job submission. The installation process includes the installation of the IBM Platform LSF batch scheduler and the configuration of a basic set of job queues. The IBM Platform HPC web portal includes an integrated Jobs function that allows users to submit generic or specific application jobs into the job management system. This is described in “Submitting jobs” on page 58. The Administrator is also provided the facility to define or modify custom jobs through the Application Template function that is described in 4.3.2, “IBM Platform Application Center implementation” on page 54.
In addition to the web portal, IBM Platform HPC (and LSF) support job submission through a traditional command-line interface (CLI) by using the bsub command, as well as job and queue monitoring by using bjobs and bqueues. The bsub command can read a job script from its input stream by using the syntax bsub < jobscript, or cat jobscript | bsub. This provides powerful scripting options and the ability to embed LSF job control directives in your scripts by using the #BSUB notation. You can also use the more familiar bsub jobscript syntax, but in this mode the #BSUB directives are not processed.
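A minimal job script of this form is sketched below; the job name, slot count, and application are placeholders rather than values from our cluster:
#!/bin/bash
#BSUB -J sample_job          # job name
#BSUB -n 4                   # number of job slots
#BSUB -o sample_job.%J.out   # standard output file (%J is the job ID)
./my_application             # placeholder for the real workload
Submitting the script with bsub < jobscript.sh causes the #BSUB lines to be processed; submitting it with bsub jobscript.sh does not.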
6.2.4 Operation and monitoring
The IBM Platform HPC web portal also includes monitoring functions for cluster resources. Through the web portal, you can easily see what jobs are running on your cluster, what the usage pattern has been, and what classes of work are being scheduled.
Monitoring workload throughput
The Job Reports section of the Jobs tab provides both administrators and users with tools for historical reporting on job throughput. These reports are similar to the reports that are described for the IBM Platform Application Center (PAC) in 4.2.1, “IBM Platform Application Center” on page 36, but they are located on the Jobs tab instead of having their own tab (Figure 6-16 on page 206), and a smaller set of default reports is provided.
Figure 6-16 Job reports
Monitoring availability
The resource reports feature of the web portal provides a means of reporting on resource availability. Figure 6-17 shows a graph of the number of available hosts over time.
Figure 6-17 Host resource report
Monitoring resources
The web portal Dashboard can be configured to display a number of specific host resource values by using a color scale on the graphical representation of your cluster. In Figure 6-7 on page 193, the default values of CPU Usage and Baseboard Temperature are shown. On our iDataPlex dx360 M3 hardware, temperature monitoring is a function of a shared chassis resource, so a temperature value appears only on the odd-numbered hosts. You can change the resource that is displayed by using the drop-down menu on each color scale, as illustrated in Figure 6-18. With the Dashboard, you can select a condensed view without host labels, which presents more hosts in a window, or the heatmap view, which further increases density by dropping the labels and displaying only a single resource value per host.
Figure 6-18 Resource drop-down menu
Modifying alerts
IBM Platform HPC provides a comprehensive set of pre-defined alerts. These alerts are easily modified to best suit the specific environment of the cluster. For example, the standard set of alerts includes a Free Memory Low alert if the amount of unused RAM on the master host falls below a set limit. Using the web portal, we selected the Resource Alerts → Alert Definitions panel, selected the Free Memory Low alert, and used the Modify dialog to extend this alert to all the hosts in the cluster and to change the alert threshold to 100 MB (Figure 6-19).
Figure 6-19 Modifying an alert definition
6.3 References
The IBM Platform HPC documentation is listed in Table 6-2.
Table 6-2 IBM Platform HPC documentation
Title                                                                 Publication number
Getting Started with IBM Platform HPC (Administrator)                GI13-1888-00
Getting Started with IBM Platform HPC (Users)                        GI13-1889-00
Release Notes for IBM Platform HPC                                   GI13-3102-00
Administering IBM Platform HPC                                       SC22-5379-00
Installing IBM Platform HPC                                          SC22-5380-00
Installing and Managing the IBM Platform HPC Web Portal Kit          SC22-5381-00
Installing and Managing the IBM Platform HPC Workload Scheduler Kit  SC22-5387-00
Installing and Managing the IBM Platform HPC GPU Scheduling Kit      SC22-5392-00
Installing and Managing the IBM Platform HPC Dynamic Multiboot Kit   SC22-5393-00
Installing and Managing the IBM Platform Cluster Manager Base Kit    SC22-5394-00
The Hidden Cost of Open Source                                       DCW03023-USEN-00
Additionally, the documentation that is listed in Table 6-3 is included with the product.
Table 6-3 Additional IBM Platform HPC documentation
Title                                                                           Publication number
Installing and Managing the IBM Platform High Performance Computing Tools Kit  SC22-5391-00
Installing and Managing the Intel Cluster Checker Kit                          SC22-5390-00
Installing and Managing the Intel Runtime Kit                                  SC22-5389-00
Installing and Managing the Java JRE Kit                                       SC22-5388-00
Installing and Managing the Nagios Kit                                         SC22-5386-00
Installing and Managing the NVIDIA CUDA Kit                                    SC22-5385-00
Installing and Managing the OFED Kit                                           SC22-5384-00
Installing and Managing the IBM Platform HPC OS OFED Kit                       SC22-5395-00
Installing and Managing the IBM Platform Cluster Manager Kit                   SC22-5383-00
Administering IBM Platform HPC                                                 SC22-5379-00
 