Planning for IBM PowerVM
This chapter includes guidance for planning for PowerVM before you start to implement the solution.
This chapter covers the following topics:
3.1 PowerVM prerequisites
PowerVM requires a valid license (feature code) before its features can be used. All IBM Power9 and IBM Power10 processor-based systems include the PowerVM Enterprise Edition feature.
The following sections present the hardware and operating system requirements that are associated with available PowerVM features.
3.1.1 Hardware requirements
PowerVM features are supported on most of the Power offerings with a few exceptions.
The availability of Capacity on Demand (CoD) offerings varies based on the Power server model. Support for CoD features can be found in 2.9, “Capacity on Demand” on page 66 and 2.10, “Power Enterprise Pools” on page 70.
Support for a few PowerVM features is discontinued. Table 1-3 on page 6 lists PowerVM features that are discontinued.
Hardware Management Console (HMC) hardware and supported code combinations for Power servers are listed in the Power Code Matrix - Supported HMC Hardware, found at:
3.1.2 Software requirements
PowerVM supports running AIX, IBM i, and Linux operating systems on Power servers. PowerVM offers Virtual I/O Server (VIOS) to facilitate I/O virtualization for client VMs. The supported versions of the operating systems, system firmware, I/O adapter firmware, HMC code level, and VIOS code levels depend on the Power server model.
The supported OS versions in PowerVM are as follows:
AIX
AIX 7.1, AIX 7.2, and AIX 7.3 and later
IBM i
IBM i 7.2, IBM i 7.3, IBM i 7.4, and IBM i 7.5 and later
Linux
 – Red Hat Enterprise Linux 7 for Power, and Red Hat Enterprise Linux 8 or later
 – SUSE Linux Enterprise Server 12, and SUSE Linux Enterprise Server 15 or later
Supported code combinations of HMC and system firmware levels for all IBM Power Systems are listed in the Power Code Matrix, found at:
For compatible system software combinations with POWER processors, see System Software Maps, found at:
Plan a successful Power system upgrade or migration by finding the minimum system software requirements at IBM Power Systems Prerequisites, found at:
3.2 Processor virtualization planning
PowerVM hypervisor (PHYP) can map a whole physical processor core or it can time slice a physical processor core. PHYP time slices shared processor partitions (also known as
IBM Micro-Partitioning) on the physical CPUs by dispatching and undispatching the various virtual processors for the partitions that run in the shared pool. The minimum processing capacity per processor is 0.05 of a physical processor core, with a further granularity of 0.01. The PHYP uses a 10 millisecond (ms) time slicing dispatch window for scheduling all shared processor partitions' virtual processor queues to the PHYP physical processor core queues.
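To illustrate the dispatch arithmetic, the following minimal Python sketch (not an IBM tool; the names and values are assumptions for illustration) converts an entitled capacity into the guaranteed physical processor time per 10 ms dispatch window.

DISPATCH_WINDOW_MS = 10.0   # PHYP dispatch window length
MIN_ENTITLEMENT = 0.05      # minimum processing capacity per partition

def dispatch_time_ms(entitled_capacity: float) -> float:
    """Guaranteed physical core time per dispatch window for a partition."""
    if entitled_capacity < MIN_ENTITLEMENT:
        raise ValueError("entitlement is below 0.05 processing units")
    return entitled_capacity * DISPATCH_WINDOW_MS

# A partition entitled to 0.5 processing units is guaranteed 5 ms of
# physical core time in every 10 ms dispatch window.
print(dispatch_time_ms(0.5))   # 5.0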
Partitions are created by using the HMC or PowerVM NovaLink and orchestrated by
IBM Power Virtualization Center (PowerVC). When you start creating a partition, you must choose between a shared processor and a dedicated processor logical partition (LPAR).
3.2.1 Dedicated processors planning
Dedicated-processor LPARs can be allocated only in whole numbers. Therefore, the maximum number of dedicated-processor LPARs in a system is equal to the number of physical activated processors.
For dedicated processor partitions, you configure these attributes:
Minimum number of processors
Allocated number of processors
Maximum number of processors
Allocated processors define the number of processors that you want for this partition. When the partition is activated, the hypervisor tries to allocate this number of processors to the partition. If not enough available processors are left in the system, then the hypervisor tries to allocate as many as possible from the remaining capacity. If the available processors in the system are less than the minimum value, the partition cannot be activated. Minimum and maximum values also set the limits of dynamic logical partitioning (DLPAR) operations while the partition is active. You cannot allocate more processors than the maximum value to the partition, and you can change only minimum and maximum values when the partition is inactive.
Consider setting the maximum value high enough to support the future requirements of growing workloads. Similarly, ensure that the minimum value is set low enough in case you must reduce the resources of this partition, but also set the minimum value high enough to prevent the scenario where the application is starving for CPU because the partition is activated with fewer processors than the number that is required for the application.
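The activation behavior that is described above can be summarized in a short sketch. It is an illustrative simplification (the function name and inputs are assumptions), not HMC or hypervisor code.

def allocate_dedicated(minimum: int, desired: int, maximum: int, available: int):
    """Return the number of dedicated processors that are allocated at
    activation, or None if the partition cannot be activated."""
    if not (minimum <= desired <= maximum):
        raise ValueError("profile must satisfy minimum <= desired <= maximum")
    if available >= desired:
        return desired        # the hypervisor satisfies the allocated value
    if available >= minimum:
        return available      # allocate as many processors as possible
    return None               # fewer than minimum available: activation fails

# With 3 processors left in the system, a profile of 2/4/8 activates with 3.
print(allocate_dedicated(minimum=2, desired=4, maximum=8, available=3))   # 3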
When a dedicated processor partition is powered off, its processors are donated to the default shared processor pool (SPP) by default. It is possible to disable this attribute in the partition properties windows. It also is possible to enable donating unused processing cycles of dedicated processors while the dedicated processor partition is running. You can change these settings at any time without having to shut down and restart the LPAR.
Processor resources within Power Enterprise Pools (PEP) 2.0 are tracked based on the assignment of dedicated processors to active partitions. If sharing of unused capacity is not enabled, the whole core is marked as consumed. If the processor is set to dedicated-donating mode, then actual consumption is reported in the same way as for shared processor VMs. It is a best practice to allow sharing of unused capacity of dedicated processors for cost efficiency, especially in PEP 2.0 environments.
3.2.2 Shared processors planning
For shared processor partitions, you configure these additional attributes:
Minimum, wanted, and maximum processing units of capacity
The processing sharing mode, either capped or uncapped
Minimum, wanted, and maximum virtual processors
Processing units of capacity
Processing capacity can be configured in fractions of 0.01 processors. The minimum amount of processing capacity that must be assigned to a micro-partition is 0.05 processors. On the HMC, processing capacity is specified in terms of processing units. The minimum capacity of 0.05 processors is specified as 0.05 processing units. To assign a processing capacity that represents 75% of a processor, 0.75 processing units are specified on the HMC.
On a system with two processors, a maximum of 2.0 processing units can be assigned to a micro-partition. Processing units that are specified on the HMC are used to quantify the minimum, wanted, and maximum amount of processing capacity for a shared processor partition.
After a shared processor partition is activated, processing capacity is usually referred to as capacity entitlement or entitled capacity. A shared processor partition is guaranteed to receive its capacity entitlement under all systems and processing circumstances.
Capacity entitlement must be correctly configured for normal production operation and, if the partition is capped, to cover the workload during peak time. Having enough capacity entitlement is important so that operating system performance is not affected.
Capped and uncapped mode
Shared processor partitions have a specific processing mode that determines the maximum processing capacity that is given to them from their SPP.
The processing modes are as follows:
Uncapped mode The processing capacity can exceed the entitled capacity when extra resources are available in their SPP. Extra capacity is distributed on a weighted basis. An uncapped weight value is assigned to each uncapped partition when it is created.
Capped mode The processing capacity that is given can never exceed the entitled capacity of the shared processor partition.
If multiple uncapped partitions are competing for more processing capacity, the hypervisor distributes the remaining unused processor capacity in the processor pool to the eligible partitions in proportion to their uncapped weight. The higher the uncapped weight value, the more processing capacity the partition receives.
The uncapped weight must be an integer 0 - 255. The default uncapped weight for uncapped micro-partitions is 128. Uncapped weight provides information to the hypervisor on how unused capacity must be distributed across partitions. A partition with an uncapped weight of 100 is 100 times more likely to receive some of the unused capacity than a partition with an uncapped weight of 1.
 
Important: If you set the uncapped weight at 0, the hypervisor treats the micro-partition as a capped micro-partition. A micro-partition with an uncapped weight of 0 cannot be allocated more processing capacity beyond its entitled capacity.
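The weighted distribution can be illustrated with the following Python sketch. It is a simplification of the hypervisor behavior, which operates per dispatch window; the function and partition names are assumptions.

def distribute_unused_capacity(unused_units, weights):
    """Split unused pool capacity among uncapped partitions in proportion to
    their uncapped weights; a weight of 0 behaves as capped (gets nothing)."""
    shares = {name: 0.0 for name in weights}
    total_weight = sum(w for w in weights.values() if w > 0)
    if total_weight == 0:
        return shares
    for name, weight in weights.items():
        if weight > 0:
            shares[name] = unused_units * weight / total_weight
    return shares

# 2.02 spare processing units split between weights 100, 1, and 0:
# lparA receives about 2.0 units, lparB about 0.02, and lparC none.
print(distribute_unused_capacity(2.02, {"lparA": 100, "lparB": 1, "lparC": 0}))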
3.2.3 Virtual processors planning
A virtual processor is a depiction or a representation of a physical processor that is presented to the operating system that runs in a micro-partition. The processing entitlement capacity that is assigned to a micro-partition, whether it is a whole or a fraction of a processing unit, is distributed by the server firmware equally between the virtual processors within the micro-partition to support the workload. For example, if a micro-partition has 1.60 processing units and two virtual processors, each virtual processor has the capacity of 0.80 processing units.
A virtual processor cannot have a greater processing capacity than a physical processor. The capacity of a virtual processor is equal to or less than the processing capacity of a physical processor.
A micro-partition must have enough virtual processors to satisfy its assigned processing capacity. This capacity can include its entitled capacity and any additional capacity beyond its entitlement if the micro-partition is uncapped.
So, the upper boundary of processing capacity in a micro-partition is determined by the number of virtual processors that it possesses. For example, if you have a partition with
0.50 processing units and one virtual processor, the partition cannot exceed 1.00 processing units. However, if the same partition with 0.50 processing units is assigned two virtual processors and processing resources are available, the partition can use an extra
1.50 processing units.
The maximum number of processing units that can be allocated to a virtual processor is always 1.00. Additionally, the number of processing units cannot exceed the total number of processing units in an SPP.
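The following sketch (illustrative only; the function name is an assumption) shows how entitlement is spread across virtual processors and how the virtual processor count and the pool size cap the usable capacity of an uncapped micro-partition.

def micro_partition_capacity(entitled_units, virtual_processors, pool_units):
    """Per-VP entitlement and the ceiling on usable capacity: each virtual
    processor can use at most 1.00 processing unit, and the partition cannot
    exceed the capacity of its shared processor pool."""
    per_vp = entitled_units / virtual_processors
    max_usable = min(float(virtual_processors), pool_units)
    return per_vp, max_usable

print(micro_partition_capacity(1.60, 2, 16.0))   # (0.8, 2.0)
print(micro_partition_capacity(0.50, 2, 16.0))   # (0.25, 2.0): up to 1.50 extra units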
Number of virtual processors
In general, the value of the minimum, wanted, and maximum virtual processor attributes must parallel the values of the minimum, wanted, and maximum capacity attributes in some fashion. A special allowance must be made for uncapped micro-partitions because they are allowed to consume more than their capacity entitlement.
If the micro-partition is uncapped, the administrator might want to define the wanted and maximum virtual processor attributes greater than the corresponding capacity entitlement attributes. The exact value is installation-specific, but 50 - 100 percent more is reasonable.
In general, it is a best practice to assign enough processing units to an uncapped partition to satisfy average workloads and set virtual processors high enough to address peak demands.
Because the number of virtual processors defines the number of physical cores that the partition has access to, it also sets the number of simultaneous multithreading (SMT) threads that are available to the partition. Therefore, you might want to adjust the number of virtual processors based on application requirements.
Selecting the optimal number of virtual processors depends on the workload in the partition. A high number of virtual processors might negatively affect the system performance. The number of virtual processors also can impact software licensing, for example, if the subcapacity licensing model is used.
Virtual processor folding
Virtual processor folding effectively puts idle virtual processors into a hibernation state so that they do not consume any resources. This feature provides several benefits, such as improved processor affinity, reduced hypervisor workload, and increased average time a virtual processor runs on a physical processor.
The characteristics of the virtual processor folding feature are:
Idle virtual processors are not dynamically removed from the partition. They are hibernated, and only awoken when more work arrives.
This feature provides no benefit when partitions are busy.
If the feature is turned off, all virtual processors that are defined for the partition are dispatched to physical processors.
Virtual processors that have attachments, such as bindprocessor or rset command attachments in AIX, are not excluded from being disabled.
The feature can be turned off or on, and the default is on.
When a virtual processor is disabled, threads are not scheduled to run on it unless a thread is bound to that processor.
Virtual processor folding is controlled through the vpm_xvcpus tuning setting, which can be configured by using the schedo command.
3.2.4 Shared processor pools capacity planning
This section describes the capacity attributes of SPPs and provides examples of the capacity resolution according to the server load.
Capacity attributes
The following attributes are used to calculate the pool capacity of SPPs:
Maximum Pool Capacity (MPC)
Each SPP has a maximum capacity that is associated with it. The MPC defines the upper boundary of the processor capacity that can be used by the set of micro-partitions in the SPP. The MPC must be represented by a whole number of processor units.
Reserved Pool Capacity (RPC)
The system administrator can assign an entitled capacity to an SPP to reserve processor capacity from the physical SPP for the express usage of the micro-partitions in the SPP. The RPC is in addition to the processor capacity entitlements of the individual micro-partitions in the SPP. The RPC is distributed among uncapped micro-partitions in the SPP according to their uncapped weighting. The default value for the RPC is zero.
Entitled Pool Capacity (EPC)
The EPC of an SPP defines the guaranteed processor capacity that is available to the group of micro-partitions in the SPP. The EPC is the sum of the entitlement capacities of the micro-partitions in the SPP plus the RPC.
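As a quick illustration of these attributes, the following sketch (the names and values are assumptions) computes the EPC of a pool; the MPC is simply an upper bound expressed as a whole number of processors.

def entitled_pool_capacity(entitlements, reserved_pool_capacity=0.0):
    """EPC = sum of member micro-partition entitlements + RPC."""
    return sum(entitlements) + reserved_pool_capacity

# Three micro-partitions entitled to 0.5, 1.2, and 0.8 units plus an RPC of
# 0.5 give an EPC of 3.0 processing units; an MPC of, say, 4 caps what the
# pool's micro-partitions can consume in total.
print(entitled_pool_capacity([0.5, 1.2, 0.8], reserved_pool_capacity=0.5))   # 3.0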
The default shared processor pool
The default SPP (SPP0) is automatically activated by the system and is always present. Its MPC is set to the capacity of the physical SPP. For SPP0, the RPC is always 0.
The default SPP has the same attributes as a user-defined SPP except that these attributes are not directly under the control of the system administrator; their values are fixed.
The maximum capacity of SPP0 can change indirectly through system administrator action such as powering on a dedicated-processor partition or dynamically moving physical processors in or out of the physical SPP.
Levels of processor capacity resolution
Two levels of processor capacity resolution are implemented by the PHYP and multiple shared processor pools (MSPP):
Level0 The first level, Level0, is the resolution of capacity within the same SPP. Unused processor cycles from within an SPP are harvested and then redistributed to any eligible micro-partition within the same SPP.
Level1 When all Level0 capacities are resolved within the MSPP, the hypervisor harvests unused processor cycles and redistributes them to eligible micro-partitions regardless of the MSPP structure. Level1 is the second level of processor capacity resolution.
 
Important: When user-defined SPPs are configured, the MPC is not deducted from default SPP (SPP0). The default pool size stays the same. If the MPC is bigger than the sum of entitled capacity in the pool and RPC, partitions in the user-defined SPP might still compete for more processing capacity with partitions that are not in the same user-defined pool.
3.2.5 Software licensing in a virtualized environment
The following sections describe the factors to be considered when you plan the license model that you will use. A licensing factors summary is presented at the end.
Licensing factors in a virtualized system
With the mainstream adoption of virtualization, more independent software vendors (ISVs) are adapting their licensing to accommodate the new virtualization technologies. Several different models exist, varying with the ISVs. When you calculate the cost of licensing and evaluate which virtualization technology to use, consider the following factors:
ISV recognition of virtualization technology and capacity capping method
ISV subcapacity licensing available for selected software products
ISV method for monitoring and management of subcapacity licensing
ISV flexibility as license requirements change
Cost of software licenses
A careful consideration of the licensing factors in advance can help reduce the overall cost in providing business applications. Traditional software licensing is based on a fixed machine with a fixed number of resources. The new PowerVM technologies present some challenges to this model:
It is possible to migrate partitions between different physical machines (with different speeds and numbers of total processors that are activated).
Consider a number of partitions that, at different times, each use four processors. They can be grouped by using MSPP technology, which caps their combined processor consumption at four CPUs in total.
When the ISV support for these technologies is in place, it is anticipated that it will be possible to increase the utilization within a fixed cost of software licenses.
Active processors and hardware boundaries
The upper boundary for licensing is always the quantity of active processors in the physical system (assigned and unassigned) because only active processors can be real engines for software.
Most software vendors consider each partition as a stand-alone server and depending on whether it is using dedicated processors or micro-partitioning, they license software per partition.
The quantity of processors for a certain partition can vary over time, for example, with dynamic partition operations. But, the overall licenses must equal or exceed the total number of processors that are used by the software at any point. If you are using uncapped micro-partitions, then the licensing must consider the fact that the partition can use extra processor cycles beyond the initial capacity entitlement.
Capacity capping
Two kinds of models for licensing software are available:
A pre-pay license based on server capacity or number of users.
A post-pay license based on auditing and accounting for actual capacity that is used.
Most software vendors offer the pre-pay method, and the question that they ask is about how much capacity a partition can use. The following sections illustrate how to calculate the amount of processing power that a partition can use.
Dedicated or dedicated-donating partitions
In a partition with dedicated processors, the initial licensing must be based on the number of processors that are assigned to the partition at activation. Depending on the partition profile maximums, if extra active processors or Capacity Upgrade on Demand (CUoD) processors are available in the system, these processors can be added dynamically, which allows operators to increase the quantity of processors.
Consider the number of software licenses before any additional processors are added, even temporarily, for example, with dynamic partition operations. Some ISVs might require licenses for the maximum number of processors for each of the partitions where the software is installed (the maximum quantity of processors in the partition profile).
Sharing idle processor cycles from running dedicated processor partitions does not change the licensing considerations.
Capacity capping of micro-partitions
Several factors must be considered when you calculate the capacity of micro-partitions. To allow the hypervisor to create micro-partitions, the physical processors are presented to the operating system as virtual processors. As micro-partitions are allocated processing time by the hypervisor, these virtual processors are dispatched on physical processors on a time-share basis.
With each logical processor mapping to a physical processor, the maximum capacity that an uncapped micro-partition can use is the number of available virtual processors, with the following assumptions:
This capacity does not exceed the number of active processors in the physical system.
This capacity does not exceed the available capacity in the SPP.
The following sections describe the different configurations that are possible and the licensing implications of each one.
Capped micro-partition
For a micro-partition, the wanted entitled capacity is a guaranteed capacity of computing power that a partition is given on activation. For a capped micro-partition, the entitled capacity also is the maximum processing power that the partition can use.
By using dynamic LPAR operations, you can vary the entitled capacity between the maximum and minimum values in the profile.
Uncapped micro-partition without MSPP technology
The entitled capacity that is given to an uncapped micro-partition is not necessarily a limit on the processing power. An uncapped micro-partition can use more than the entitled capacity if some resources within the system are available.
In this case, on a Power server that uses SPPs or that uses only the default SPP, the limiting factor for uncapped micro-partition is the number of virtual processors. The micro-partition can use up to the number of physical processors in the SPP because each virtual processor is dispatched to a physical processor.
With a single pool, the total resources that are available in the SPP are equal to the activated processors in the machine minus any dedicated (nondonating) partitions. The assumption is that, at that point, all other partitions are idle.
The total licensing liability for an uncapped partition without MSPP technology is either the number of virtual processors or the number of processors in the default SPP, whichever is smallest.
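The boundary rule above can be expressed as a small sketch. It is illustrative only and not an ISV licensing rule; the function name and inputs are assumptions.

def license_boundary(virtual_processors, pool_processors, capped=False,
                     entitled_units=None):
    """Upper boundary of processor capacity for licensing purposes: a capped
    micro-partition is bounded by its entitlement, and an uncapped
    micro-partition by the smaller of its virtual processor count and the
    number of processors in its shared processor pool."""
    if capped:
        return entitled_units
    return min(virtual_processors, pool_processors)

# Uncapped partition with 6 virtual processors in a 4-processor default pool.
print(license_boundary(virtual_processors=6, pool_processors=4))   # 4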
Uncapped micro-partition with MSPP technology
Similarly, the entitled capacity for an uncapped micro-partition is not necessarily a limit on the processing power. An uncapped micro-partition can use more than the entitled capacity if some resources are available within the system.
By using MSPP technology, it is possible to group micro-partitions and place a limit on the overall group maximum processing units. After an SPP group is defined, operators can group specific micro-partitions that are running the same software (if the software licensing terms permit it). This approach allows a pool of capacity that can be shared among several different micro-partitions.
System with CoD processors
Processors in the CoD pool do not count for licensing purposes until the following events happen:
They become temporarily or permanently active and assigned to partitions.
They become temporarily or permanently active in systems with PowerVM technology, and they can be used by micro-partitions.
Clients can provision licenses of selected software for temporary or permanent usage on their systems. Such licenses can be used to align with the possible temporary or permanent usage of CoD processors in existing or new AIX, IBM i, or Linux partitions.
Summary of licensing factors
Depending on the licensing model that is supported by the software vendor, it is possible to work out licensing costs based on these factors:
Capped versus uncapped micro-partitions.
Number of virtual processors.
Unused processing cycles that are available in the machine, from dedicated-donating partitions and other micro-partitions.
Multiple shared processor pool maximum.
Active physical processors in the system.
An example of the license boundaries is illustrated in Figure 3-1 on page 87.
Figure 3-1 License boundaries with different processor and pool modes
IBM i software licensing
It is possible to use workload groups to limit the processing capacity of a workload to a subset of processor cores in a partition. This capability requires the workload groups’ PTFs.
Therefore, workload groups can be used to reduce license costs for a processor usage type-licensed program by completing the following steps:
Create a workload group with a maximum processor core limit that is less than the number of processor cores that are configured for the partition.
Add the licensed program to the newly created workload group.
Identify the workloads that are associated with the licensed program and associate the workloads with the newly created workload group.
The licensed program owner must accept the reduced processor core capacity.
Linux software licensing
The license terms and conditions of Linux operating system distributions are provided by the Linux distributor, but all base Linux operating systems are licensed under the GPL. Distributor pricing for Linux includes media, packaging, shipping, and documentation costs, and distributors can offer extra programs under other licenses, and bundled service and support.
Clients or authorized IBM Business Partners are responsible for the installation of the Linux operating system, with orders handled according to license agreements between the client and the Linux distributor.
Clients must consider the quantity of virtual processors in micro-partitions for scalability and licensing purposes (uncapped partitions) when Linux is installed in a virtualized Power server.
Each Linux distributor sets its own pricing method for their distribution, service, and support. For more information, check the distributor's website and the following resources:
SUSE Linux Enterprise Server, found at:
Red Hat, found at:
For more information about Linux licensing, contact an IBM sales representative and see Enterprise Linux on Power, found at:
3.3 Memory virtualization planning
This section describes the points that you need to plan and verify before you configure the server and implement the Active Memory Expansion (AME) memory virtualization features in your environment.
3.3.1 Hypervisor memory planning
The PHYP uses some of the memory that is activated in a Power server to manage memory that is assigned to individual partitions, manage I/O requests, and support virtualization requests. The amount of memory that is required by the hypervisor to support these features varies based on various configuration options that are chosen.
The assignment of the memory to the hypervisor ensures secure isolation between LPARs because the only allowed access to the memory contents is through security-validated hypervisor interfaces. In Figure 3-2, 128 GB is installed in the system, 128 GB is licensed memory (Configurable), and 3.5 GB (Reserved) memory is assigned to the hypervisor.
Figure 3-2 System memory properties
Components that contribute to hypervisor memory usage
The three main components that contribute to the overall usage of memory by the hypervisor are:
1. Memory that is required for hardware page tables (HPTs).
2. Memory that is required to support I/O devices.
3. Memory that is required for virtualization.
Memory usage for hardware page table
Each partition on the system has its own HPT that contributes to hypervisor memory usage. The HPT is used by the operating system to translate from effective addresses to physical real addresses in the hardware. This translation from effective to real addresses allows multiple operating systems to all run simultaneously in their own logical address space. The amount of memory for the HPT is based on the maximum memory size of the partition and the HPT ratio. The default HPT ratio is either 1/64 of the maximum (for IBM i partitions) or 1/128 (for AIX, VIOS, and Linux partitions) of the maximum memory size of the partition. AIX, VIOS, and Linux use larger page sizes (16 K, 64 K, and such) instead of using 4 K pages. Using larger page sizes reduces the overall number of pages that must be tracked so the overall size of the HPT can be reduced. For example, for an AIX partition with a maximum memory size of 256 GB, the HPT is 2 GB.
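The default HPT ratios translate into a simple estimate, as in the following sketch (illustrative; the helper name is an assumption).

def hpt_size_gb(max_memory_gb, os_type):
    """Estimate the HPT size from the partition's maximum memory size and the
    default HPT ratio: 1/64 for IBM i, 1/128 for AIX, VIOS, and Linux."""
    ratio = 64 if os_type == "IBM i" else 128
    return max_memory_gb / ratio

print(hpt_size_gb(256, "AIX"))     # 2.0 GB, matching the example above
print(hpt_size_gb(256, "IBM i"))   # 4.0 GB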
When a partition is defined, the maximum memory size that is specified must be based on the amount of memory that can be dynamically added to the partition (DLPAR) without having to change the configuration and restart the partition.
Memory usage for I/O devices
To support I/O operations, the hypervisor maintains structures that are called Translation Control Entries (TCEs), which provide an information path between I/O devices and partitions. The TCEs provide the address of the I/O buffer, indication of read versus write requests, and other I/O-related attributes. Many TCEs are in use per I/O device so that multiple requests can be active simultaneously against the same physical device. For physical I/O devices, the base amount of space for the TCEs is defined by the hypervisor, based on the number of I/O devices that are supported.
Memory usage for virtualization features
Virtualization requires extra memory to be allocated by the hypervisor for hardware statesave areas and all the various virtualization technologies. For example, on Power8 processor-based servers and later servers, each processor core supports up to eight SMT threads of execution, and each thread contains over 80 different registers. The hypervisor must set aside save areas for the register contents for the maximum number of virtual processors that are configured. The greater the number of physical hardware devices, the greater the number of virtual devices, the greater the amount of virtualization, and the more hypervisor memory is required. For efficient memory consumption, wanted and maximums for various attributes (processors, memory, and virtual adapters) must be based on business needs, and not set to values that are higher than actual requirements.
Predicting memory usage
The IBM System Planning Tool (SPT) is a resource that can be used to estimate the amount of hypervisor memory that is required for a specific server configuration. After the SPT executable file is downloaded and installed, a configuration can be defined by selecting the appropriate hardware platform, installed processors, and memory, and by defining partitions and partition attributes. Given a configuration, the SPT can estimate the amount of memory that will be assigned to the hypervisor. This capability can help when you change an existing configuration or deploy new servers.
For more information about SPT, see IBM System Planning Tool for Power processor-based systems, found at:
3.3.2 Active Memory Expansion planning
When a partition with AME is configured, the following two settings define how much memory is available:
Physical memory The amount of physical memory that is available to the partition. Usually, it corresponds to the wanted memory in the partition profile.
Memory expansion factor Defines how much of the physical memory is expanded.
 
Tip: The memory expansion factor can be defined individually for each partition.
AME relies on compression of in-memory data to increase the amount of data that can be placed into memory and thus expands the effective memory capacity of Power servers. The in-memory data compression is managed by the operating system, and this compression is transparent to applications and users.
The amount of memory that is available to the operating system can be calculated by multiplying the physical memory with the memory expansion factor. For example, in a partition that has 10 GB of physical memory and configured with a memory expansion factor of 1.5, the operating system sees 15 GB of available memory.
The compression and decompression activities require CPU cycles. Therefore, when AME is enabled, spare CPU resources must be available in the partition for AME.
AME does not compress file cache pages and pinned memory pages.
If the expansion factor is too high, the target-expanded memory size cannot be achieved and a memory deficit forms. The effect of a memory deficit is the same as the effect of configuring a partition with too little memory. When a memory deficit occurs, the operating system might have to resort to paging out virtual memory to the paging space.
 
Note: When AME is enabled, by default the AIX operating system uses 4 KB pages. However, if you are running IBM AIX 7.2 with Technology Level 1 or later on a Power9 or a Power10 processor-based server, you can use the vmo command with the ame_mpsize_support parameter to enable 64 KB page size.
AME factor
You can configure the degree of memory expansion that you want to achieve for the LPAR by setting the AME factor in a partition profile of the LPAR. The expansion factor is a multiplier of the amount of memory that is assigned to the LPAR.
When AME is configured, a single configuration option must be set for the LPAR, which is the memory expansion factor. An LPAR's memory expansion factor specifies the target effective memory capacity for the LPAR. This target memory capacity provides an indication to the operating system of how much memory is made available with memory compression. The target memory capacity that is specified is referred to as the expanded memory size. The memory expansion factor is specified as a multiplier of an LPAR's true memory size, as shown in the following equation:
LPAR_expanded_mem_size = LPAR_true_mem_size * LPAR_mem_exp_factor
For example, an LPAR’s memory expansion factor of 2.0 indicates that memory compression must be used to double the LPAR's memory capacity. If an LPAR is configured with a memory expansion factor of 2.0 and a memory size of 20 GB, then the expanded memory size for the LPAR is 40 GB, as shown in the following equation:
40 GB = 20 GB * 2.0
The operating system compresses enough in-memory data to fit 40 GB of data into 20 GB of memory. The memory expansion factor and the expanded memory size can be dynamically changed at run time by using the HMC through dynamic LPAR operations. The expanded memory size is always rounded down to the nearest logical memory block (LMB) multiple.
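The expansion arithmetic, including the rounding down to an LMB multiple, can be sketched as follows. The LMB size used here (0.25 GB) is only an assumed example; the actual LMB size depends on the system configuration.

def expanded_memory_gb(true_memory_gb, expansion_factor, lmb_gb=0.25):
    """Expanded memory size = true memory size x expansion factor, rounded
    down to the nearest logical memory block multiple."""
    raw = true_memory_gb * expansion_factor
    return (raw // lmb_gb) * lmb_gb

print(expanded_memory_gb(20, 2.0))    # 40.0 GB, as in the example above
print(expanded_memory_gb(10, 1.55))   # 15.5 GB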
 
Note: You do not need to check whether your application is certified for AME; the compression is handled within the AIX kernel and is transparent to applications.
Memory deficit
When the memory expansion factor for an LPAR is configured, it is possible that the chosen memory expansion factor is too large and cannot be achieved based on the compressibility of the workload.
When the memory expansion factor for an LPAR is too large, then a memory expansion deficit forms, which indicates that the LPAR cannot achieve its memory expansion factor target. For example, if an LPAR is configured with a memory size of 20 GB and a memory expansion factor of 1.5, the result is a total target expanded memory size of 30 GB. However, the workload that runs in the LPAR does not compress well, and the workload's data compresses only by a ratio of 1.4 to 1. In this case, it is impossible for the workload to achieve the targeted memory expansion factor of 1.5. The operating system limits the amount of physical memory that can be used in a compressed pool to a maximum of 95%. This value can be adjusted by using the vmo command with the ame_min_ucpool_size parameter. In this example, with an LPAR memory size of 20 GB, if the ame_min_ucpool_size parameter value is set to 90, 18 GB is reserved for the compressed pool. The maximum achievable expanded memory size is 27.2 GB (2 GB + 1.4 x 18 GB). The result is a 2.8 GB shortfall, which is referred to as the memory deficit.
The effect of a memory deficit is the same as the effect of configuring an LPAR with too little memory. When a memory deficit occurs, the operating system cannot achieve the expanded memory target that is configured for the LPAR. In this case, the operating system might have to resort to paging out virtual memory pages to paging space. Thus, in the previous example, if the workload uses more than 27.2 GB of memory, the operating system starts paging out virtual memory pages to paging space.
To get an indication of whether a workload can achieve its expanded memory size, the operating system reports a memory deficit metric. This deficit is a “hole” in the expanded memory size that cannot be achieved. If this deficit is zero, the target memory expansion factor can be achieved, and the LPAR's memory expansion factor is configured correctly. If the expanded memory deficit metric is nonzero, then the workload falls short of achieving its expanded memory size by the size of the deficit.
To eliminate a memory deficit, the LPAR's memory expansion factor must be reduced. However, reducing the memory expansion factor reduces the LPAR's expanded memory size. Thus, to keep the LPAR's expanded memory size the same, the memory expansion factor must be reduced and more memory must be added to the LPAR. Both the LPAR's memory size and memory expansion factor can be changed dynamically.
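The memory deficit example above can be recreated with the following sketch. It is illustrative only: the compressed pool percentage follows the figures in the example and is not an exact model of the ame_min_ucpool_size tunable.

def memory_deficit_gb(true_memory_gb, expansion_factor, compression_ratio,
                      compressed_pool_pct):
    """Achievable expanded memory and resulting deficit when a fixed share of
    physical memory is reserved for the compressed pool."""
    target = true_memory_gb * expansion_factor
    compressed_pool = true_memory_gb * compressed_pool_pct / 100.0
    uncompressed = true_memory_gb - compressed_pool
    achievable = uncompressed + compressed_pool * compression_ratio
    return achievable, max(target - achievable, 0.0)

# 20 GB LPAR, factor 1.5, data compresses 1.4:1, 90% compressed pool:
# about 27.2 GB achievable and a deficit of about 2.8 GB.
print(memory_deficit_gb(20, 1.5, 1.4, compressed_pool_pct=90))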
AME planning
The benefit of AME to a workload varies based on the workload's characteristics. Some workloads can get a higher level of memory expansion than other workloads. The AME Planning and Advisory Tool amepat helps plan the deployment of a workload in the AME environment. It also provides guidance on the level of memory expansion that a workload can achieve.
The AME Planning Tool (located in /usr/bin/amepat) serves two primary purposes. They are:
To plan an initial AME configuration.
To monitor and fine-tune an active AME configuration.
The AME Planning Tool can run on LPARs with and without AME enabled. In an LPAR where AME was not enabled, run amepat with a representative workload. Set amepat to monitor the workload for a meaningful period. For example, the amepat tool is set to run during a workload's peak resource usage. After it completes, the tool displays a report with various potential memory expansion factors and the expected CPU utilization attributable to an AME for each factor. The tool also provides a recommended memory expansion factor that seeks to maximize memory savings while minimizing extra CPU utilization.
Figure 3-3 on page 93 shows an amepat output sample report.
Figure 3-3 An amepat output sample report
The report and recommendation can be a useful initial configuration for an AME deployment. In an LPAR where AME is enabled, amepat serves a similar purpose. When it is run at peak time for a representative workload, the tool provides a report with the actual CPU utilization attributable to AME at the current memory expansion factor. It also displays memory deficit information if it is present. Because the AME is enabled, the tool can also provide a more accurate representation of what CPU utilization levels can be expected at different memory expansion factors. A new recommendation based on this information is presented to the user.
For more information about the amepat report, see Active Memory Expansion (AME), found at:
3.4 Virtual I/O Server planning
This section describes the details to consider for planning a VIOS.
3.4.1 Specifications that are required to create the VIOS
To activate the VIOS, the PowerVM Editions hardware feature is required. An LPAR with enough resources to share with other LPARs also is required. Table 3-1 shows a list of minimum hardware requirements that must be available to create the VIOS.
Table 3-1 Resources that are required for VIOS
Resource: Requirement
HMC: The HMC is required to create the LPAR and assign resources.
Storage adapter: The server LPAR needs at least one storage adapter.
Physical disk: The disk must be at least 30 GB. This disk can be shared.
Ethernet adapter: To route network traffic from Virtual Ethernet Adapters (VEAs) to a Shared Ethernet Adapter (SEA), you need an Ethernet adapter.
Memory: A general rule for the minimum memory requirement for VIOS 3.1 is 4 GB. A minimum current memory requirement might support a configuration with a minimum number of devices or a small maximum memory configuration. However, to support shared storage pools (SSPs), the minimum memory requirement is 4 GB. More devices increase the minimum current memory requirement.
Processor: At least 0.05 processing units are required.
Table 3-2 defines the limitations for storage management:
Table 3-2 Limitations for storage management
Category: Limit
Volume groups: 4096 per system.
Physical volumes: 1024 per volume group.
Physical partitions: 1024 per volume group.
Logical volumes: 1024 per volume group.
LPARs: No limit.
Limitations and restrictions of the VIOS configuration
Consider the following items when you implement virtual SCSI (vSCSI):
vSCSI supports the following connection standards for backing devices: Fibre Channel (FC), SCSI, SCSI RAID, iSCSI, SAS, SATA, Universal Serial Bus (USB), and IDE.
The SCSI protocol defines mandatory and optional commands. Although vSCSI supports all the mandatory commands, not all the optional commands are supported.
There might be utilization implications when you use vSCSI devices. Because the client/server model is made up of layers of function, vSCSI can consume more processor cycles when processing I/O requests.
The VIOS is a dedicated LPAR that is used only for VIOS operations. Other applications cannot run in the VIOS LPAR.
If there is a resource shortage, performance degradation might occur. If a VIOS is serving many resources to other LPARs, ensure that enough processor power is available. In case of high workload across VEAs and virtual disks, LPARs might experience delays in accessing resources.
Logical volumes and files that are exported as vSCSI disks are always configured as single path devices on the client LPAR.
Logical volumes or files that are exported as vSCSI disks that are part of the root volume group (rootvg) are not persistent if you reinstall the VIOS. However, they are persistent if you update the VIOS to a new Service Pack (SP). Therefore, before you reinstall the VIOS, ensure that you back up the corresponding clients' virtual disks. When exporting logical volumes, it is best to export logical volumes from a volume group other than the root volume group. When exporting files, it is best to create file storage pools and the virtual media repository in a parent storage pool other than the root volume group.
Consider the following items when you implement virtual adapters:
Only Ethernet adapters can be shared.
IP forwarding is not supported on the VIOS.
The maximum number of virtual adapters can be any value 2 - 65,536. However, if you set the maximum number of virtual adapters to a high value, the server firmware requires more system memory to manage the virtual adapters. Setting the maximum number of virtual adapters to an excessive value might even cause an LPAR to fail to activate.
Consider the following items when you increase the virtual I/O slot limit:
 – The maximum number of virtual I/O slots that is supported on AIX, IBM i, and Linux partitions is 32,767.
 – The maximum number of virtual adapters can be any value 2 - 32767. However, higher maximum values require more system memory to manage the virtual adapters.
Sizing of processor and memory
The sizing of the processor and memory resources for VIOS depends on the amount and type of workload that the VIOS must process. For example, network traffic that goes through a SEA requires more processor resources than vSCSI traffic. Also, when a VIOS is used as a mover service partition (MSP), it requires more processor and memory resources during an active Live Partition Mobility (LPM) operation.
Table 3-3 can be used as a starting point for the environment.
 
Rules: The following examples are only starting points when you set up an environment that uses the VIOS for the first time. The actual sizing might vary depending on the level of the virtualization and configuration of the system.
Table 3-3 Virtual I/O Server sizing examples
Environment: CPU (example) / Virtual CPU (example) / Memory (example)
Small environment: 0.25 - 0.5 processors (uncapped) / 1 - 2 virtual CPUs / 4 GB
Large environment: 1 - 2 processors (uncapped) / 2 - 4 virtual CPUs / 6 GB
Environment that uses SSPs: At least one processor (uncapped) / 4 - 6 virtual CPUs / 8 GB
 
Monitoring: When the environment is in production, the processor and memory resources on the VIOS must be monitored regularly, and adjusted if necessary, to make sure that the configuration fits the workload. For more information about monitoring CPU and memory on the VIOS, see IBM PowerVM Virtualization Managing and Monitoring, SG24-7590.
The VIOS is designed for selected configurations that include specific models of IBM and other vendor storage products. Consult your IBM representative or IBM Business Partner for the latest information and included configurations.
List of supported adapters
Virtual devices that are exported to client partitions by the VIOS must be attached through supported adapters. An updated list of supported adapters and storage devices is available at the following websites:
Adapter information by feature code for the 9043-MRX, 9080-HEX, 9105-22A, 9105-22B, 9105-41B, 9105-42A, 9786-22H, or 9786-42H system and EMX0 PCIe3 expansion drawers, found at:
Adapter information by feature code for the 5105-22E, 9008-22L, 9009-22A, 9009-22G, 9009-41A, 9009-41G, 9009-42A, 9009-42G, 9040-MR9, 9080-M9S, 9223-22H, 9223-22S, 9223-42H, 9223-42S system, and EMX0 PCIe3 expansion drawers, found at:
Plan carefully before you begin the configuration and installation of your VIOS and client partitions. Depending on the type of workload and the needs of an application, it is possible to mix virtual and physical devices in the client partitions.
For more information about planning for the VIOS, see Planning for the Virtual I/O Server, found at:
3.4.2 Redundancy considerations
This section describes requirements for providing high availability (HA) for VIOSs.
Redundancy options are available at several levels in the virtual I/O environment. Multipathing, mirroring, and RAID redundancy options exist for the VIOS and client LPARs. Ethernet link aggregation (LA) (also called Etherchannel) is also an option for the client LPARs, and the VIOS provides SEA failover and single-root I/O virtualization (SR-IOV) with virtual Network Interface Controllers (vNICs) failover. SR-IOV with vNIC is described in 2.4.4, “SR-IOV with virtual Network Interface Controller” on page 50.
Support for node failover (by using IBM PowerHA SystemMirror or VM Recovery Manager (VMRM)) is available for nodes that use virtual I/O resources.
This section contains information about redundancy for both the client LPARs and the VIOS. Although these configurations help protect the LPARs and VIOS from the failure of one of the physical components, such as a disk or network adapter, they might cause the client LPAR to lose access to its devices if the VIOS fails. The VIOS can be made redundant by running a second instance in another LPAR. When you run two instances of the VIOS, you can use logical volume mirroring (LVM), multipath input/output (MPIO), Network Interface Backup (NIB), or multipath routing with Dead Gateway Detection (DGD) in the client LPAR to provide HA access to virtual resources that are hosted in separate VIOS LPARs.
In a dual-VIOS configuration, vSCSI, virtual Fibre Channel (VFC) (NPIV), SEA, and SR-IOV with vNIC failover can be configured in a redundant fashion. This approach allows system maintenance such as restarts, software updates, or even reinstallation to be performed on a VIOS without causing outage to virtual I/O clients. This reason is the main one to implement dual VIOSs.
With proper planning and architecture implementation, maintenance can be performed on a VIOS and any external device to which it connects, such as a network or storage area network (SAN) switch, removing the layer of physical resource dependency.
When the client partition uses multipathing and SEA or SR-IOV with vNIC failover, no actions need to be performed on the client partition during the VIOS maintenance, or after it completes. This approach results in improved uptime and reduced system administration efforts for the client partitions.
 
Tip: A combination of multipathing for disk redundancy and SEA failover or SR-IOV with vNIC failover for network redundancy are industry best practices.
Upgrading and rebooting a VIOS, network switch, or SAN switch is simpler and more compartmentalized because the client no longer depends on the availability of the entire environment.
In Figure 3-4, a client partition has vSCSI devices and a VEA that is backed by two VIOSs. The client has multipathing implemented across the vSCSI devices and SEA failover for the virtual Ethernet.
Figure 3-4 Redundant Virtual I/O Servers before maintenance
When VIOS 2 is shut down for maintenance, as shown in Figure 3-5 on page 99, the client partition continues to access the network and SAN storage through VIOS 1.
Figure 3-5 Redundant Virtual I/O Servers during maintenance
When VIOS 2 returns to a full running state, these events occur:
An AIX client continues to use the MPIO path through VIOS 1 unless the MPIO path is manually changed to VIOS 2.
An IBM i or Linux multipathing client, which uses a round-robin multipathing algorithm, automatically starts to use both paths when the path to VIOS 2 becomes operational again.
If VIOS 2 is the primary SEA, client network traffic that goes through the backup SEA on VIOS 1 automatically resumes on VIOS 2.
In addition to continuous availability, a dual-VIOS setup also separates or balances the virtual I/O load, which spreads resource consumption more evenly across the VIOSs.
Virtual Ethernet traffic is generally heavier on the VIOS than vSCSI traffic. Virtual Ethernet connections generally take up more CPU cycles than connections through physical Ethernet adapters. The reason is that modern physical Ethernet adapters contain many functions to offload some work from the system’s CPUs, for example, checksum computation and verification, interrupt modulation, and packet reassembly.
In a configuration that runs MPIO and a single SEA per VIOS, the traffic is typically separated so that the virtual Ethernet traffic goes through one VIOS and the vSCSI traffic goes through the other. This separation is done by defining the SEA trunk priority and the MPIO path priority.
 
Important: Do not turn off SEA threading on a VIOS that is used for both storage and network virtualization.
In an MPIO configuration with several SEAs per VIOS, you typically balance the network and vSCSI traffic between VIOSs.
 
Paths: Use the storage configuration commands to check that the preferred paths on the storage subsystem are in accordance with the path priorities that are set in the virtual I/O clients.
Figure 3-6 shows an example configuration where network and disk traffic are separated:
VIOS 1 has priority 1 for the network and priority 2 for the disk.
VIOS 2 has priority 2 for the network and priority 1 for the disk.
Figure 3-6 Separating disk and network traffic
3.5 Storage virtualization planning
The following sections explain how to plan storage virtualization in a PowerVM environment that uses vSCSI and VFC (NPIV).
 
Note: The configurations that are described in this section are not a complete list of all available supported configurations.
3.5.1 Virtual SCSI planning
By using vSCSI, client LPARs can share disk storage and tape or optical devices that are assigned to the VIOS LPAR.
Physical storage devices such as disk, tape, USB mass storage, or optical devices that are attached to the VIOS LPAR can be shared by one or more client LPARs. The VIOS provides access to storage subsystems by using logical unit numbers (LUNs) that are compliant with the SCSI protocol. The VIOS can export a pool of heterogeneous physical storage as a homogeneous pool of block storage in the form of SCSI disks. The VIOS is a storage subsystem. Unlike typical storage subsystems that are physically in the SAN, the SCSI devices that are exported by the VIOS are limited to the domain within the server. Therefore, although the SCSI LUNs are SCSI-compliant, they might not meet the needs of all applications, particularly those applications that exist in a distributed environment.
The following SCSI peripheral device types are supported:
Disk that is backed by a logical volume.
Disk that is backed by a file.
Disk that is backed by a logical unit (LU) in SSPs.
Optical CD-ROM, DVD-RAM, and DVD-ROM.
Optical DVD-RAM backed by file.
Tape devices.
USB mass storage devices.
vSCSI is based on a client/server relationship model, as described in the following points.
The VIOS owns the physical resources and the vSCSI server adapter, and acts as a server, or SCSI target device. The client LPARs have a SCSI initiator that is referred to as the vSCSI client adapter, and accesses the vSCSI targets as standard SCSI LUNs.
Virtual disk resources can be configured and provisioned by using the HMC or the VIOS command-line interface (CLI).
Physical disks that are owned by the VIOS can be exported and assigned to a client LPAR as a whole, added to an SSP, or partitioned into parts, such as logical volumes or files. Then, the logical volumes and files can be assigned to different LPARs. Therefore, by using vSCSI, you can share adapters and disk devices.
Virtual devices that are backed by logical volumes or files prevent the client partition from participating in LPM. To make a physical volume, logical volume, or file available to a client LPAR, it must be assigned to a vSCSI server adapter on the VIOS. The client LPAR accesses its assigned disks through a vSCSI client adapter, which recognizes standard SCSI devices and LUNs.
For more information about vSCSI, see Planning for virtual SCSI, found at:
Performance considerations
If sufficient CPU processing capacity is available, the performance of vSCSI should be comparable to dedicated I/O devices.
Virtual Ethernet, which has nonpersistent traffic, runs at a higher priority than the vSCSI on the VIOS. To make sure that high volumes of networking traffic do not starve vSCSI of CPU cycles, a threaded mode of operation is implemented for the VIOS by default since
Version 1.2.
For more information about performance differences between physical and virtual I/O, see Planning for virtual SCSI, found at:
Maximum number of slots
vSCSI itself does not have any maximums in terms of number of supported devices or adapters. The VIOS supports a maximum of 1024 virtual I/O slots per VIOS. A maximum of 256 virtual I/O slots can be assigned to a single client partition.
Every I/O slot needs some physical server resources to be created. Therefore, the resources that are assigned to the VIOS put a limit on the number of virtual adapters that can be configured.
For more information about limitation restrictions, see Limitations and restrictions of the Virtual I/O Server configuration, found at:
Naming conventions
A well-planned naming convention is key in managing the information. One strategy for reducing the amount of data that must be tracked is to make settings match on the virtual I/O client and server wherever possible.
The naming convention might include corresponding volume group, logical volume, and virtual target device (VTD) names. Integrating the virtual I/O client hostname into the VTD name can simplify tracking on the server.
Virtual device slot numbers
All vSCSI and Virtual Ethernet devices have slot numbers. In complex systems, there tends to be far more storage devices than network devices because each vSCSI device can communicate only with one server or client.
As shown in Figure 3-7 on page 103, the default maximum number of virtual adapters is 10 when you create an LPAR. The appropriate number for your environment depends on the number of virtual servers and adapters that are expected on each system. Each unused virtual adapter slot consumes a small amount of memory, so the allocation must be balanced. It is a best practice to set the maximum number of virtual adapters to at least 100 on the VIOS.
 
Important: When you plan for the number of virtual I/O slots on your LPAR, the maximum number of virtual adapter slots that is available on a partition is set by the partition’s profile. To increase the maximum number of virtual adapters, you must change the profile, stop the partition (not just a restart), and start the partition.
To add virtual I/O clients without shutting down the LPAR or VIOS partition, leave plenty of room for expansion when the maximum number of slots are set.
The maximum number of virtual adapters must not be set higher than 1024 because that setting can cause performance problems.
Figure 3-7 Setting the maximum limits in the partition’s properties
For AIX virtual I/O client partitions, each adapter pair can handle up to 85 virtual devices with the default queue depth of three.
For IBM i clients, up to 16 virtual disk and 16 optical devices are supported.
For Linux clients, by default, up to 192 vSCSI targets are supported.
In situations where virtual devices per partition are expected to exceed these numbers, or where the queue depth on certain devices might be increased over the default, reserve extra adapter slots for the VIOS and the virtual I/O client partition.
When queue depths are tuned, the vSCSI adapters have a fixed queue depth. There are 512 command elements, of which two are used by the adapter, three are reserved for each vSCSI LUN for error recovery, and the rest are used for I/O requests. Thus, the default queue depth of 3 for vSCSI LUNs allows for up to 85 LUNs to use an adapter: (512 - 2) / (3 + 3) = 85, rounding down. If you need higher queue depths for the devices, the number of LUNs per adapter is reduced. For example, a queue depth of 25 allows 510 / 28 = 18 LUNs per adapter for an AIX client partition.
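As an illustration of this tuning, the queue depth of a vSCSI disk on an AIX client partition can be displayed and changed with commands similar to the following ones; hdisk0 is a hypothetical device name, and the -P flag applies the change at the next restart:
# lsattr -El hdisk0 -a queue_depth
# chdev -l hdisk0 -a queue_depth=25 -P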
For Linux clients, the maximum number of LUNs per vSCSI adapter is decided by the max_id and max_channel parameters. The max_id is set to 3 by default, which can be increased to 7. The max_channel parameter is set to 64 by default, which is the maximum value. With the default values, the Linux client can have 3 * 64 = 192 vSCSI targets. If you overload an adapter, your performance is reduced.
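On a Linux client, these values are typically handled as parameters of the ibmvscsi driver. The following sketch assumes that the driver exposes max_id and max_channel as module parameters under /sys/module and accepts them as modprobe options; adjust the paths for your distribution:
# cat /sys/module/ibmvscsi/parameters/max_id
# cat /sys/module/ibmvscsi/parameters/max_channel
# echo "options ibmvscsi max_id=7" >> /etc/modprobe.d/ibmvscsi.conf
The new value takes effect when the driver is reloaded or at the next restart.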
Adding multiple adapters between a VIOS and a client must be considered when you are using mirroring on the virtual I/O client across multiple storage subsystems for availability.
For more information about capacity planning for latency, bandwidth, and sizing considerations, see Planning for virtual SCSI, found at:
Virtual SCSI limitations for IBM i
The vSCSI limitations for IBM i are as follows:
The IBM i 7.1 TR8 or later client LPARs can have up to 32 disk units (logical volumes, physical volumes, or files) and up to 16 optical units under a single virtual adapter.
The maximum virtual disk size is 2 TB minus 512 bytes. If you are limited to one adapter and you have a storage requirement of 32 TB, for example, you might need to make your virtual disks the maximum size of 2 TB. However, in general, consider spreading the storage over multiple virtual disks with smaller capacities. This approach can help improve concurrency.
Mirroring and multipathing through up to eight VIOS partitions are the redundancy options for client LPARs. You also can use multipathing and RAID on the VIOS for redundancy.
You must assign the tape device to its own VIOS adapter because tape devices often send large amounts of data that might affect the performance of any other device on the adapter.
For more information, see Multipathing and disk resiliency with vSCSI in a dual VIOS configuration, found at:
3.5.2 Virtual Fibre Channel planning
N_Port ID Virtualization (NPIV) is an industry-standard technology that helps you to configure an NPIV-capable FC adapter with multiple, virtual worldwide port names (WWPNs). This technology is also called VFC. Similar to vSCSI, VFC is a method to securely share a physical FC adapter among multiple client partitions.
From an architectural perspective, the key difference between VFC and vSCSI is that the VIOS does not act as a SCSI emulator to its client partitions. Instead, it acts as a direct FC pass-through for the Fibre Channel Protocol (FCP) I/O traffic through the hypervisor. The client partitions are presented with full access to the physical SCSI target devices of SAN disk or tape storage systems. The benefit of VFC is that the physical target device characteristics, such as vendor or model information, remain fully visible to the client partition. Hence, you do not change the device drivers (such as multipathing software), middleware (such as copy services), or storage management applications that rely on the physical device characteristics.
For each VFC client adapter, two unique, virtual WWPNs, starting with the letter c, are generated by the HMC. After the activation of the client partition, the WWPNs log in to the SAN similar to other WWPNs from a physical port.
Role of Virtual I/O Server
For VFC, the VIOS acts as an FC pass-through instead of a SCSI emulator, such as when vSCSI is used. A comparison between vSCSI and VFC is shown in Figure 3-8.
Figure 3-8 Comparing virtual SCSI and Virtual Fibre Channel
Two unique virtual WWPNs starting with the letter “c” are generated by the HMC for the VFC client adapter. After activation of the client partition, these WWPNs log in to the SAN like any other WWPNs from a physical port. Therefore, disk or tape storage target devices can be assigned to them as though they were physical FC ports.
Planning considerations for Virtual Fibre Channel
Consider the following information when you use the VFC:
One VFC client adapter per physical port per client partition. This strategy helps to avoid a single point of failure (SPOF).
For 16 GBps or slower FC adapters, a maximum of 64 active VFC client adapters per physical port. The virtual adapters per physical port can be reduced due to other VIOS resource constraints.
For 32 GBps or faster FC adapters, a maximum of 255 VFC client adapters per physical port. The virtual adapters per physical port can be reduced because of other VIOS resource constraints.
Maximum of 64 targets per VFC adapter.
32,000 unique WWPN pairs per system. Removing a VFC client adapter does not reclaim its WWPNs. You can manually reclaim WWPNs by specifying the virtual_fc_adapters attribute with the mksyscfg and chhwres commands.
To enable VFC on the managed system, create the required VFC adapters and connections by using HMC, as described in Chapter 4, “Implementing IBM PowerVM” on page 139.
The HMC generates WWPNs based on the range of names that is available for use with the prefix in the vital product data on the managed system. You can get the 6-digit prefix when you purchase the managed system. The 6-digit prefix includes 32,000 pairs of WWPNs. When you remove a VFC adapter from a client partition, the hypervisor deletes the WWPNs that are assigned to the VFC adapter on the client partition. The HMC does not reuse the deleted WWPNs to generate WWPNs for VFC adapters. If you require more WWPNs, you must obtain an activation code that includes another prefix that has another 32,000 pairs of WWPNs.
To avoid configuring the physical FC adapter to be a SPOF for the connection between the client partition and its physical storage on the SAN, do not connect two VFC adapters from the same client partition to the same physical FC adapter. Instead, connect each VFC adapter to a different physical FC adapter.
On a server that is managed by the HMC, you can dynamically add and remove VFC adapters to and from the VIOS and from each client partition. You can also view information about the virtual and physical FC adapters and the WWPNs by using VIOS commands.
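For example, from the VIOS command line, the lsnports command lists the NPIV-capable physical FC ports and the number of available virtual ports, the lsmap command shows the mappings between VFC server adapters and physical ports, and the vfcmap command creates such a mapping. The device names vfchost0 and fcs0 are examples:
lsnports
lsmap -all -npiv
vfcmap -vadapter vfchost0 -fcp fcs0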
Virtual Fibre Channel limitations for IBM i
The VFC limitations for IBM i are as follows:
The IBM i client partition supports up to 128 target port connections per VFC adapter.
The IBM i 7.3 and IBM i 7.4 client partitions support up to 127 SCSI devices per VFC adapter. The 127 SCSI devices can be any combination of disk units or tape libraries. With tape libraries, each control path is counted as a unique SCSI device in addition to a single SCSI device per tape drive.
For IBM i client partitions, the LUNs of the physical storage that is connected with VFC require a storage-specific device driver and do not use the generic vSCSI device driver.
The IBM i client partition supports up to eight multipath connections to a single FC disk unit. Each multipath connection can be made with a VFC adapter or with FC I/O adapter hardware that is assigned to the IBM i partition.
IBM i supports mapping the same physical FC port to multiple VFC adapters in the same IBM i client. All LUNs (disk or tape) that are associated with that physical FC adapter must be unique so that no multipath is created within the same physical port. To use LPM or the remote restart capability, you can map the physical port only twice to the same IBM i LPAR. The VIOS must be at Version 3.1.2.0 or later, and the HMC must be at Version 9.2.950 or later. These versions are required for LPM and for restarting the LPAR with double-mapped ports.
With VIOS, you can install IBM i in a client LPAR on Power9 or Power10 processor-based systems. IBM i client LPARs have unique system and storage requirements and considerations.
For more information about VFC limitations for IBM i, see Limitations and restrictions for IBM i client logical partitions, found at:
3.5.3 Redundancy configurations for virtual Fibre Channel adapters
To implement highly reliable virtual I/O storage configurations, plan the following redundancy configurations to protect your virtual I/O production environment from physical adapter failures and from VIOS failures.
With NPIV, you can configure the managed system so that multiple LPARs can access independent physical storage through the same physical FC adapter. Each VFC adapter is identified by a unique WWPN, which means that you can connect each VFC adapter to independent physical storage on a SAN.
Host bus adapter redundancy
Similar to vSCSI redundancy, VFC redundancy can be achieved by using multipathing or mirroring at the client LPAR. The difference between redundancy with vSCSI adapters and the VFC technology that uses VFC client adapters is that the redundancy occurs at the client because only the virtual I/O client LPAR recognizes the disk. The VIOS is just an FC pass-through managing the data transfer through the hypervisor.
Host bus adapter (HBA) failover provides a basic level of redundancy for the client LPAR. Figure 3-9 shows the connectivity example.
Figure 3-9 Host bus adapter connectivity
The SAN connects physical storage to two physical FC adapters that are on the managed system.
The physical FC adapters are assigned to the VIOS and support NPIV.
The physical FC ports are each connected to a VFC adapter on the VIOS. The two VFC adapters on the VIOS are connected to ports on two different physical FC adapters to provide redundancy for the physical adapters.
Each VFC adapter on the VIOS is connected to one VFC adapter on a client LPAR. Each VFC adapter on each client LPAR receives a pair of unique WWPNs. The client LPAR uses one WWPN to log in to the SAN at any specific time. The other WWPN is used when you move the client LPAR to another managed system.
The VFC adapters always have a one-to-one relationship between the client LPARs and the VFC adapters on the VIOS LPAR. That is, each VFC adapter that is assigned to a client LPAR must connect to only one VFC adapter on the VIOS, and each VFC adapter on the VIOS must connect to only one VFC adapter on a client LPAR.
 
Note: As a best practice, configure VFC adapters from multiple LPARs to the same HBA, or configure VFC adapters from the same LPAR to different HBAs.
Host bus adapter and Virtual I/O Server redundancy
An HBA and VIOS redundancy configuration provides a more advanced level of redundancy for the virtual I/O client partition. Figure 3-10 on page 109 shows the following connections:
The SAN connects physical storage to two physical FC adapters that are on the managed system.
Two VIOS LPARs provide redundancy at the VIOS level.
The physical FC adapters are assigned to their respective VIOS and support NPIV.
The physical FC ports are each connected to a VFC adapter on the VIOS.
The two virtual FC adapters on the VIOS are connected to ports on two different physical FC adapters to provide redundancy for the physical adapters. A single adapter might have multiple ports.
Each VFC adapter on the VIOS is connected to one VFC adapter on a client LPAR. Each VFC adapter on each client LPAR receives a pair of unique WWPNs. The client LPAR uses one WWPN to log in to the SAN at any specific time. The other WWPN is used when you move the client LPAR to another managed system.
Figure 3-10 A host bus adapter and Virtual I/O Server redundancy
The client can write to the physical storage through VFC adapter 1 or 2 on the client LPAR through VIOS 2. The client also can write to physical storage through VFC adapter 3 or 4 on the client LPAR through VIOS 1. If a physical FC adapter fails on VIOS 1, the client uses the other physical adapter that is connected to VIOS 1 or uses the paths that are connected through VIOS 2. If VIOS 1 fails, then the client uses the path through VIOS 2. This example does not show redundancy in the physical storage, but assumes it is built into the SAN.
Other considerations for virtual Fibre Channel
These examples can become more complex as you add physical storage redundancy and multiple clients, but the concepts remain the same. Consider the following points:
To avoid configuring the physical FC adapter to be a SPOF for the connection between the client LPAR and its physical storage on the SAN, do not connect two VFC adapters from the same client LPAR to the same physical FC adapter. Instead, connect each VFC adapter to a different physical FC adapter.
Consider load-balancing when a VFC adapter on the VIOS is mapped to a physical port on the physical FC adapter.
Consider what level of redundancy exists in the SAN to determine whether to configure multiple physical storage units.
Consider the usage of two VIOS LPARs. Because the VIOS is central to communication between LPARs and the external network, it is important to provide a level of redundancy for the VIOS. Multiple VIOS LPARs require more resources, so plan for them accordingly.
Using their unique WWPNs and the VFC connections to the physical FC adapter, the client operating system that runs in the virtual I/O client partitions discovers, instantiates, and manages the physical storage that is on the SAN as though it were natively connected to the SAN storage device. The VIOS provides the virtual I/O client partitions with a connection to the physical FC adapters on the managed system.
A one-to-one relationship always exists between the VFC client adapter and the VFC server adapter.
The SAN uses zones to provide access to the targets based on WWPNs. VFC client adapters are created by using the HMC with a unique set of WWPNs. VFC adapters can be zoned for SAN access, just like physical FC adapters.
Redundancy configurations help to increase the serviceability of your VIOS environment. With VFC, you can configure the managed system so that multiple virtual I/O client partitions can independently access physical storage through the same physical FC adapter. Each VFC client adapter is identified by a unique WWPN, which means that you can connect each virtual I/O partition to independent physical storage on a SAN.
 
Mixtures: Though any mixture of VIOS native SCSI, vSCSI, and VFC I/O traffic is supported on the same physical FC adapter port, consider the implications that this mixed configuration might have for manageability and serviceability.
IBM i Virtual Fibre Channel recommendations
For more information about recommendations that can be followed to ensure that VFC and NPIV environments perform as well as possible when they are connected to supported external storage systems, see IBM i Virtual Fibre Channel Performance Best Practices, found at:
For more information about Virtual Fibre Channel planning, see How to prepare for SAN changes in a Virtualized Fibre Channel NPIV environment, found at:
For more information about recommended device attributes for redundancy, see Configuring a VIOS for client storage, found at:
3.5.4 Virtual SCSI and Virtual Fibre Channel comparison
vSCSI and VFC both offer significant benefits by enabling shared utilization of physical I/O resources. The following sections compare both capabilities and provide guidance for selecting the most suitable option.
Overview
Table 3-4 shows a high-level comparison of vSCSI and VFC.
Table 3-4 Virtual SCSI and Virtual Fibre Channel comparison
Feature                                          Virtual SCSI   VFC
Server-based storage virtualization              Yes            No
Adapter-level sharing                            Yes            Yes
Device-level sharing                             Yes            No
LPM-capable                                      Yes            Yes
SSP-capable                                      Yes            No
SCSI-3 compliant (persistent reserve)            No¹            Yes
Generic device interface                         Yes            No
Tape library and LAN-free backup support         No             Yes
Virtual tape and virtual optical support         Yes            No
Support for IBM PowerHA SystemMirror for i²      No             Yes

¹ Unless SSPs are used.
² Applies only to IBM i partitions.
Components and features
The following section describes the various components and features.
Device types
vSCSI provides virtualized access to disk devices, optical devices, and tape devices. With VFC, SAN disk devices and tape libraries can be attached. The access to tape libraries enables the usage of LAN-free backup, which is not possible with vSCSI.
Adapter and device sharing
vSCSI allows sharing of physical storage adapters. It also allows sharing of storage devices by creating storage pools that can be partitioned to provide logical volume or file-backed devices.
VFC technology allows sharing of physical FC adapters only.
Hardware requirements
VFC implementation requires VFC-capable FC adapters on the VIOS and VFC-capable SAN switches.
vSCSI supports a broad range of physical adapters.
Storage virtualization
vSCSI provides server-based storage virtualization. Storage resources can be aggregated and pooled on the VIOS.
When VFC is used, the VIOS only passes I/O through to the client partition. Storage virtualization is done on the storage infrastructure in the SAN.
Storage assignment
With vSCSI, the storage is assigned (zoned) to the VIOSs. From a storage administration perspective, there is no end-to-end view of which storage is allocated to which client partition. When new disks are added to an existing client partition, they must be mapped on the VIOS. When LPM is used, storage must be assigned to the VIOSs on the target server.
With VFC, the storage is assigned to the client partitions, as in an environment where physical adapters are used. No intervention is required on the VIOS when new disks are added to an existing partition. When LPM is used, storage moves to the target server without requiring a reassignment because the VFCs have their own WWPNs that move with the client partitions to the target server.
Support of PowerVM capabilities
Both vSCSI and VFC support most PowerVM capabilities, such as LPM.
VFC does not support virtualization capabilities that are based on the SSP, such as thin-provisioning.
Client partition considerations
vSCSI uses a generic device interface, which means that regardless of the backing device that is used, the devices appear in the same way in the client partition. When vSCSI is used, no additional device drivers need to be installed in the client partition. vSCSI does not support load-balancing across virtual adapters in a client partition.
With VFC, device drivers that match the SAN disk or tape devices must be installed in the client partition. Native AIX MPIO allows load-balancing across virtual adapters. Upgrading these drivers requires special attention when you use SAN devices as boot disks for the operating system.
Worldwide port names
With the redundant configurations that use two VIOSs and two physical FC adapters that are explained in 3.5.3, “Redundancy configurations for virtual Fibre Channel adapters” on page 107, up to eight WWPNs are used. Some SAN storage devices have a limit on the number of WWPNs that they can manage. Therefore, before VFC is deployed, verify that the SAN infrastructure can support the planned number of WWPNs. vSCSI uses only WWPNs of the physical adapters on the VIOS.
Hybrid configurations
vSCSI and VFC can be deployed in hybrid configurations. The following two examples show how both capabilities can be combined in real-world scenarios:
1. In an environment that is constrained by the number of WWPNs, vSCSI can be used to provide access to disk devices. For partitions that require LAN-free backup, access to tape libraries can be provided by using VFC.
2. To simplify the upgrade of device drivers, VFC can be used to provide access to application data, and vSCSI can be used for access to the operating system boot disks.
3.5.5 Availability planning for virtual storage
This section provides planning details that are required to set up redundancy for virtual storage.
Virtual storage redundancy
VFC or vSCSI redundancy can be achieved by using MPIO and LVM mirroring at the client partition and VIOS level.
Figure 3-10 on page 109 depicts a redundant VFC configuration. Review the description under that figure to understand how to implement highly reliable virtual I/O storage configurations that are based on VFC technology.
Figure 3-11 depicts an advanced vSCSI redundancy setup that uses both MPIO and LVM mirroring in the client partition concurrently, with two VIOSs hosting disks for the client partition. The client is using MPIO to access a SAN disk and LVM mirroring to access two SCSI disks. From the client perspective, the following situations can be handled without causing downtime for the client:
Either path to the SAN disk can fail, but the client still can access the data on the SAN disk through the other path. If MPIO is configured, no action is required to reintegrate the failed path to the SAN disk after the repair.
The failure of a SCSI disk causes stale partitions on AIX for the volume group with the assigned virtual disks, a suspended disk unit on IBM i, or a disk marked as failed on Linux. The client partition still can access the data on the second copy of the mirrored disk. After the failed disk is available again, the stale partitions must be synchronized on the AIX client by using the varyonvg command. The IBM i client automatically resumes mirrored protection, while on the Linux client, the command mdadm and a rescan of the devices are required.
Either VIOS can be restarted for maintenance. This action results in a temporary simultaneous failure of one path to the SAN disk and stale partitions for the volume group on the SCSI disks, as described before.
Figure 3-11 Virtual SCSI redundancy by using multipathing and mirroring
Considerations for redundancy
Consider the following points:
If mirroring and multipathing are both configurable in your setup, multipathing is the preferred method for adding disk connection redundancy to the client. Mirroring causes stale partitions on AIX or Linux and suspended disk units on IBM i, which require synchronization, but multipathing does not. Depending on the RAID level that is used on the SAN disks, the disk space requirements for mirroring can be higher. However, mirroring across two storage systems can even enhance the redundancy that RAID technology provides within a single storage system.
Two FC adapters in each VIOS allow for adapter redundancy.
The following sections describe the usage of mirroring across two VIOSs for AIX, IBM i, and Linux client partitions.
AIX LVM mirroring in the client partition
To provide storage redundancy in the AIX client partition, AIX LVM mirroring can be used for VFC devices or vSCSI devices.
When vSCSI and AIX client partition mirroring is used between two storage subsystems, in certain situations, errors on hdisks that are on a single storage subsystem can cause all hdisks that are connected to a vSCSI adapter to become inaccessible.
To avoid losing access to mirrored data, a best practice is to provide the disks of each mirror copy through a different vSCSI adapter, as shown in Figure 3-12 on page 115.
Figure 3-12 LVM mirroring with two storage subsystems
Volume group mirroring in the AIX client is also a best practice when a logical volume in the VIOS is used as a vSCSI device on the client. In this case, the vSCSI devices are associated with different SCSI disks, each one controlled by one of the two VIOSs, as shown in Figure 3-12. Mirroring logical volumes in the VIOS is not necessary when the data is mirrored in the AIX client.
IBM i mirroring in the client partition
IBM i mirroring in the client partition to enable storage redundancy, ideally across two VIOSs and two separate storage systems, is supported for vSCSI LUNs and for IBM DS8000® LUNs that are attached by VFC.
vSCSI LUNs are presented by the VIOS as unprotected LUNs of type-model 6B22-050 to the IBM i client, so they are eligible for IBM i mirroring. DS8000 series LUNs that are attached by VFC, as with DS8000 series native attachment, must be created as unprotected models (IBM OS/400® model A8x) on the DS8000 series to be eligible for IBM i mirroring.
 
Important: Currently, all vSCSI or FC adapters report on IBM i under the same bus number 255, which allows for IOP-level mirrored protection only. To implement the concept of bus-level mirrored protection for virtual LUNs with larger configurations with more than one virtual IOP per mirror side and not compromise redundancy, consider iteratively adding LUNs from one IOP pair at a time to the auxiliary storage pool by selecting the LUNs from one virtual IOP from each mirror side.
Linux mirroring in the client partition
Mirroring on Linux partitions is implemented with a Linux software RAID function that is provided by an md (Multiple Devices) device driver. The md driver combines devices in one array for performance improvements and redundancy.
An md device with RAID 1 indicates a mirrored device with redundancy. RAID devices on Linux are represented as md0, md1, and so on.
Linux software RAID devices are managed and listed with the mdadm command. You also can list RAID devices with the cat /proc/mdstat command.
All devices in a RAID 1 array should have the same size; otherwise, only the capacity of the smallest device is used, and any extra space on larger devices is wasted.
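A minimal sketch of creating such a mirrored device on a Linux client follows; /dev/sdb and /dev/sdc are hypothetical vSCSI disks that are provided by two different VIOSs:
# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
# cat /proc/mdstat
# mdadm --detail /dev/md0
The first command creates the RAID 1 array, and the last two commands display its status and details.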
3.5.6 Shared storage pools planning
This section describes the necessary planning details for implementing SSPs in a PowerVM environment.
The following sections list the prerequisites for creating SSPs.
Prerequisites
Ensure that the following prerequisites are met:
VIOS
HMC
Minimum 20 GB of available storage space for a storage pool
Storage requirements of your storage vendor
Configuring the Virtual I/O Server logical partitions
Configure the VIOS LPARs as follows:
There must be at least one CPU and one physical CPU of entitlement.
The LPARs must be configured as VIOS LPARs.
The LPARs must consist of at least 4 GB of memory.
The LPARs must consist of at least one physical FC adapter.
The rootvg device for a VIOS LPAR cannot be included in storage pool provisioning.
The VIOS LPARs in the cluster require access to all the SAN-based physical volumes in the SSP of the cluster.
Scalability limits
The scalability limits of an SSP cluster on VIOS 3.1.3.0 are as follows:
Maximum number of nodes in a cluster: 16
Maximum number of physical disks in the pool: 1024
Maximum number of virtual disks: 8192
Maximum number of client LPARs per VIOS: 250 (requires that each VIOS has at least four CPUs and 8 GB of memory)
Maximum capacity of a physical disk in the pool: 16 TB
Minimum/maximum storage capacity of the storage pool: 20 GB / 512 TB
Maximum capacity of a virtual disk (LU) in the pool: 4 TB
Configuring client logical partitions
Configure the client partitions with the following characteristics:
The client LPARs must be configured as AIX or Linux client systems.
They must have at least 1 GB of minimum memory.
The associated rootvg device must be installed with the appropriate AIX or Linux system software.
Each client LPAR must be configured with enough vSCSI adapter connections to map to the virtual server SCSI adapter connections of the required VIOS LPARs.
Network addressing considerations
Uninterrupted network connectivity is required for SSP operations. The network interface that is used for the SSP configuration must be on a highly reliable network that is not congested.
Ensure that both the forward and reverse lookup for the hostname that is used by the VIOS LPAR for clustering resolves to the same IP address.
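For example, before you create the cluster, the forward and reverse lookups can be verified from the root shell (oem_setup_env) of each VIOS; the hostname and IP address are examples:
# host viosA1.example.com
# host 10.1.1.11
Both commands must return the same hostname and IP address pair.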
 
Notes:
The SSP cluster can be created on an IPv6 configuration. Therefore, VIOS LPARs in a cluster can have hostnames that resolve to an IPv6 address. To set up an SSP cluster on an IPv6 network, IPv6 stateless autoconfiguration is suggested. You can have VIOS LPARs that are configured with either an IPv6 static configuration or an IPv6 stateless autoconfiguration. A VIOS that has both IPv6 static configuration and IPv6 stateless autoconfiguration is not supported.
The hostname of each VIOS LPAR that belongs to the same cluster must resolve to the same IP address family, which is either IPv4 or IPv6.
To change the hostname of a VIOS LPAR in the cluster, you must remove the partition from the cluster and change the hostname. Then, you can add the partition back to the cluster with the new hostname.
Commands on VIOS (mktcpip, rmtcpip, chtcpip, hostmap, chdev, and rmdev) are enhanced to configure more than one network interface without disturbing its existing network configuration. In the SSP environment, this feature helps the user to configure multiple network interfaces without causing any harm to the existing SSP setup. In the presence of multiple network interfaces, the primary interface might not be the interface that is used for cluster communication. In such an SSP environment, the user is not restricted from altering the network configuration of other interfaces.
Storage provisioning to Virtual I/O Server partitions
When a cluster is created, you must specify one physical volume for the repository disk and at least one physical volume for the storage pool. The storage pool physical volumes are used to provide storage to the data that is generated by the client partitions. The repository disk is used to perform cluster communication and store the cluster configuration. The maximum client storage capacity matches the total storage capacity of all storage pool physical volumes. The repository disk must have at least 10 GB of available storage space. The physical volumes in the storage pool must have at least 20 GB of available storage space in total.
Use any method that is available for the SAN vendor to create each physical volume with at least 20 GB of available storage space. Map the physical volume to the LPAR’s FC adapter for each VIOS in the cluster. The physical volumes must be mapped only to the VIOS LPARs that are connected to the SSP.
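As an illustration, a cluster with one repository disk and one storage pool disk might be created from the VIOS command line as follows, and a second VIOS added to it; the cluster, pool, disk, and host names are examples:
cluster -create -clustername clusterA -repopvs hdisk2 -spname poolA -sppvs hdisk3 -hostname viosA1
cluster -addnode -clustername clusterA -hostname viosA2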
 
Note: Each of the VIOS LPARs assigns hdisk names to all physical volumes that are available through the FC ports, such as hdisk0 and hdisk1. A VIOS LPAR might assign different hdisk numbers to the same volumes than another VIOS LPAR in the same cluster. For example, the viosA1 VIOS LPAR can have hdisk9 assigned to a specific SAN disk, and the viosA2 VIOS LPAR can have the hdisk3 name assigned to that same disk. For some tasks, the unique device ID (UDID) can be used to distinguish the volumes. Use the chkdev command to obtain the UDID for each disk. It is also possible to rename the devices by using the rendev command.
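For example, to display the UDID of a disk on each VIOS (hdisk9 is an example device name):
chkdev -dev hdisk9 -verbose
Comparing the UDID output on both VIOSs confirms whether the hdisk names refer to the same physical volume.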
Set the FC adapters parameters as follows:
chdev -dev fscsi0 -attr dyntrk=yes -perm
chdev -dev fscsi0 -attr fc_err_recov=fast_fail -perm
You do not need to set the no_reserve attribute on the repository disk or on any of the SSP disks. The Cluster Aware AIX (CAA) layer on the VIOS does this task.
3.6 Network virtualization planning
The following sections describe available network virtualization options and provide planning guidance for them.
3.6.1 Virtual Ethernet planning
Virtual Ethernet technology facilitates IP-based communication between LPARs on the same system by using virtual local area network (VLAN)-capable software switch systems. Using SEA technology, LPARs can communicate with other systems outside the hardware unit without being assigned physical Ethernet slots.
You can create VEAs by using the HMC. You can add, remove, or modify the existing set of VLANs for a VEA that is assigned to an active partition by using the HMC.
Consider using virtual Ethernet on the VIOS in the following situations:
When the capacity or the bandwidth requirement of the individual LPAR is inconsistent with or is less than the total bandwidth of a physical Ethernet adapter. LPARs that use the full bandwidth or capacity of a physical Ethernet adapter must either use SR-IOV technology or use dedicated Ethernet adapters.
When you need an Ethernet connection, but no free slot is available where you can install a dedicated adapter.
When advanced PowerVM virtualization technologies such as LPM or partition Simplified Remote Restart (SRR) are used, you cannot assign dedicated physical I/O devices to the client partitions. In this case, use virtual Ethernet with a SEA on the VIOS.
3.6.2 Virtual LAN planning
In many situations, the physical network topology must account for the physical constraints of the environment, such as rooms, walls, floors, and buildings.
However, VLANs can be independent of the physical topology. Figure 3-13 shows two VLANs (VLAN 1 and 2) that are defined on three switches (Switch A, B, and C). Seven hosts (A-1, A-2, B-1, B-2, B-3, C-1, and C-2) are connected to the three switches.
Figure 3-13 Multiple VLANs example
The physical network topology of the LAN forms a tree, which is typical for a nonredundant LAN:
Switch A:
 – Node A-1
 – Node A-2
 – Switch B:
 • Node B-1
 • Node B-2
 • Node B-3
 – Switch C:
 • Node C-1
 • Node C-2
Although nodes C-1 and C-2 are physically connected to the same switch C, traffic between the two nodes is blocked because they belong to different VLANs:
VLAN 1:
 – Node A-1
 – Node B-1
 – Node B-2
 – Node C-1
VLAN 2:
 – Node A-2
 – Node B-3
 – Node C-2
To enable communication between VLAN 1 and 2, L3 routing or inter-VLAN bridging must be established between the VLANs. The bridging is typically provided by an L3 device, for example, a router or firewall that is plugged into switch A.
Consider the uplinks between the switches: they carry traffic for both VLANs 1 and 2. Thus, only one physical uplink is needed from B to A, not one per VLAN. The switches do not mix up the traffic of the different VLANs because packets that travel through the trunk ports over the uplink are tagged.
VLANs also have the potential to improve network performance. By splitting up a network into different VLANs, you also split up broadcast domains. Thus, when a node sends a broadcast, only the nodes on the same VLAN are interrupted by receiving the broadcast. The reason is that normally broadcasts are not forwarded by routers. Consider this fact if you implement VLANs and want to use protocols that rely on broadcasting, such as Boot Protocol (BOOTP) or Dynamic Host Configuration Protocol (DHCP) for IP autoconfiguration.
It also is a best practice to use VLANs if Gigabit Ethernet jumbo frames are implemented in an environment where not all nodes or switches can use or are compatible with jumbo frames. Jumbo frames allow for a maximum transmission unit (MTU) size of 9000 instead of Ethernet's default of 1500. This feature can improve throughput and reduce processor load on the receiving node in a heavily loaded scenario, such as backing up files over the network.
VLANs can provide extra security by allowing an administrator to block packets from one domain to another domain on the same switch. This approach provides more control over what LAN traffic is visible to specific Ethernet ports on the switch. Packet filters and firewalls can be placed between VLANs, and Network Address Translation (NAT) can be implemented between VLANs. VLANs can make the system less vulnerable to attacks.
3.6.3 Virtual switches planning
The PHYP switch is consistent with IEEE 802.1Q. It works on OSI-Layer 2 and supports up to 4094 networks (4094 VLAN IDs).
When a message arrives at a logical LAN switch port from a logical LAN adapter, the hypervisor caches the message's source MAC address to use as a filter for future messages to the adapter. Then, the hypervisor processes the message depending on whether the port is configured for IEEE VLAN headers. If the port is configured for VLAN headers, the VLAN header is checked against the port's allowable VLAN list. If the message-specified VLAN is not in the port's configuration, the message is dropped. After the message passes the VLAN header check, it proceeds to destination MAC address processing.
If the port is not configured for VLAN headers, the hypervisor inserts a 2-byte VLAN header (based on the VLAN number that is configured in the port) into the message. Next, the destination MAC address is processed by searching the table of cached MAC addresses.
If a match for the MAC address is not found and if no trunk adapter is defined for the specified VLAN number, the message is dropped. Otherwise, if a match for the MAC address is not found and if a trunk adapter is defined for the specified VLAN number, the message is passed on to the trunk adapter. If a MAC address match is found, then the associated switch port's allowable VLAN number table is scanned. It looks for a match with the VLAN number that is in the message's VLAN header. If a match is not found, the message is dropped.
Next, the VLAN header configuration of the destination switch port is checked. If the port is configured for VLAN headers, the message is delivered to the destination logical LAN adapters, including any inserted VLAN header. If the port is configured for no VLAN headers, the VLAN header is removed before it is delivered to the destination logical LAN adapter.
Figure 3-14 on page 122 shows a graphical representation of the behavior of the virtual Ethernet when processing packets.
Figure 3-14 Flow chart of virtual Ethernet
Multiple virtual switches
Power servers support multiple virtual switches. By default, a single virtual switch that is named “Ethernet0” is configured. This name can be changed dynamically, and more virtual switches can be created with a name of your choice.
Extra virtual switches can be used to provide an extra layer of security or increase the flexibility of a virtual Ethernet configuration.
For example, to isolate traffic in a DMZ from an internal network without relying entirely on VLAN separation, two virtual switches can be used. The virtual adapters of the systems that participate in the DMZ network are configured to use one virtual switch, and systems that participate in the internal network are configured to use another virtual switch.
Consider the following points when multiple virtual switches are used:
A VEA can be associated only with a single virtual switch.
Each virtual switch supports the full range of VLAN IDs (1 - 4094).
The same VLAN ID can exist in all virtual switches independently of each other.
Virtual switches can be created and removed dynamically. However, a virtual switch cannot be removed if an active VEA is using it.
Virtual switch names can be modified dynamically without interruption to connected VEAs.
With LPM, virtual switch names must match between the source and target systems. The validation phase fails if names do not match.
All virtual adapters in a SEA must be members of the same virtual switch.
 
Important: When a SEA is used, the name of the virtual switch is recorded in the configuration of the SEA on the VIOS at creation time. If the virtual switch name is modified, the name change is not reflected in this configuration until the VIOS is restarted, or the SEA device is reconfigured. The rmdev -l command followed by cfgmgr is sufficient to update the configuration. If the configuration is not updated, it can cause a Live Partition Migration validation process to fail because the VIOS still refers to the old name.
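For example, from the VIOS root shell (oem_setup_env), the SEA configuration can be refreshed as follows; ent5 is a hypothetical SEA device name:
# rmdev -l ent5
# cfgmgr
Expect a brief interruption of the bridging function on that VIOS while the SEA is reconfigured, so perform this task on one VIOS at a time in a SEA failover configuration.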
3.6.4 Shared Ethernet Adapter planning
A SEA can be used to bridge a physical Ethernet network to a virtual Ethernet network. It also provides the ability for several client partitions to share one physical adapter. Using a SEA, you can connect internal and external VLANs by using a physical adapter. The SEA that is hosted in the VIOS acts as a layer-2 bridge between the internal and external network.
A SEA is a layer-2 network bridge to securely transport network traffic between virtual Ethernet networks and physical network adapters. The SEA service runs in the VIOS. It cannot be run in a general-purpose AIX or Linux partition.
 
Tip: A Linux partition also can provide a bridging function with the brctl command.
The SEA allows partitions to communicate outside the system without having to dedicate a physical I/O slot and a physical network adapter to a client partition. The SEA has the following characteristics:
Virtual Ethernet MAC addresses of VEAs are visible to outside systems (by using the arp -a command).
Unicast, broadcast, and multicast are supported. Therefore, protocols that rely on broadcast or multicast, such as Address Resolution Protocol (ARP), DHCP, BOOTP, and Neighbor Discovery Protocol (NDP), can work across an SEA.
To bridge network traffic between the virtual Ethernet and external networks, the VIOS must be configured with at least one physical Ethernet adapter. One SEA can be shared by multiple VEAs, and each one can support multiple VLANs. A SEA can include up to 16 VEAs on the VIOS that share the physical access.
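For example, a SEA that bridges the physical adapter ent0 and the VEA ent2 can be created on the VIOS with a command similar to the following one; the device names and the default VLAN ID are examples:
mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1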
 
Tip: An IP address does not need to be configured on a SEA to perform the Ethernet bridging function. However, it is convenient to configure an IP address on the VIOS so that the VIOS can be reached over TCP/IP, for example, to perform dynamic LPAR operations or to enable remote login. The IP address can be configured directly on the SEA device, or it can be defined on an extra VEA in the VIOS that carries the IP address. The latter approach leaves the SEA without the IP address, which allows for maintenance on the SEA without losing IP connectivity if SEA failover is configured. Neither approach has a noticeable impact on Ethernet performance.
SEA availability
PowerVM offers a range of configurations to maintain service availability. The following sections present some example scenarios.
Virtual Ethernet redundancy
In a single VIOS configuration, communication to external networks ceases if the VIOS loses connection to the external network. Client partitions experience this disruption if they use the SEA as a means to access the external networks. Communication through the SEA is, for example, suspended when the physical network adapter in the VIOS fails or loses connectivity to the external network due to a switch failure.
Another reason for a failure might be a planned shutdown of the VIOS for maintenance purposes. Communication resumes when the VIOS regains connectivity to the external network. Internal communication between partitions through virtual Ethernet connections continues unaffected while access to the external network is unavailable. Virtual I/O clients do not have to be restarted or otherwise reconfigured to resume communication through the SEA. The effect on the clients is similar to unplugging and replugging an uplink of a physical Ethernet switch.
If the temporary failure of communication with external networks is unacceptable, more than a single forwarding instance and some function for failover must be implemented in the VIOS.
Several approaches can be used to achieve HA for shared Ethernet access to external networks. Most commonly used are SEA failover and SEA failover with load sharing, which are described in detail in the following sections.
Other approaches can be used to achieve HA for shared Ethernet access by leveraging configurations that are also used in physical network environments, such as:
IP multipathing with Dead Gateway Detection (DGD) or virtual IP addresses (VIPAs) and dynamic routing protocols, such as Open Shortest Path First (OSPF).
IP Address Takeover (IPAT), with High Availability Cluster Management or Automation Software, such as PowerHA SystemMirror for AIX.
SEA failover
SEA failover offers Ethernet redundancy to the client at the VIOS level. In a SEA failover configuration, two VIOSs have the bridging functions of the SEA. They use a control channel to determine which of them is supplying the Ethernet service to the client. If one SEA loses access to the external network through its physical Ethernet adapter or one VIOS is shut down for maintenance, it automatically fails over to the other VIOS SEA. You also can trigger a manual failover.
The client partition has one VEA that is bridged by two VIOSs. The client partition requires no special protocol or software configuration and uses the VEA as though it were bridged by only one VIOS.
SEA failover supports IEEE 802.1Q VLAN tagging.
As shown in Figure 3-15, both VIOSs attach to the same virtual and physical Ethernet networks and VLANs. Both VEAs of both SEAs have the access external network (in a later HMC version, it is “Use this adapter for Ethernet bridging”) flag enabled and a trunk priority (in a later HMC version, it is “priority”) set.
Figure 3-15 Basic SEA failover configuration
An extra virtual Ethernet connection is required as a separate VLAN between the two VIOSs. It must be attached to the SEA as a control channel, not as a regular member of the SEA. This VLAN serves as a channel for the exchange of keep-alive or heartbeat messages between the two VIOSs, which controls the failover of the bridging function. When control channel adapters are not configured, VLAN ID 4095 in the virtual switch is automatically used as a simplified SEA failover design. This approach allows the SEA partners to exchange heartbeats without dedicated VEAs for the control channel.
You must select different priorities for the two SEAs by setting all VEAs of each SEA to that priority value. The priority value defines which of the two SEAs is the primary (active) and which one is the backup (standby). The lower the priority value, the higher the priority, so priority=1 means the highest priority.
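As a sketch, such a SEA failover configuration might be created on each VIOS with a command similar to the following one; ent0 is the physical adapter, ent2 the bridged VEA (trunk priority 1 on the primary VIOS and 2 on the backup), and ent3 the control channel VEA, all example device names:
mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1 -attr ha_mode=auto ctl_chan=ent3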
 
Support: SEA failover configurations are supported only on dual-VIOS configurations.
Some types of network failures might not trigger a failover of the SEA because keepalive messages are sent only over the control channel. No keepalive messages are sent over other SEA networks, especially not over the external network. The SEA failover feature can be configured to periodically check the reachability of a specific IP address. The SEA periodically pings this IP address to detect some other network failures. This approach is similar to the IP address ping function that can be configured with Network Interface Backup (NIB).
 
Important: To use this periodic reachability test, the SEAs must have network interfaces, with IP addresses that are associated. These IP addresses must be unique, and you must use different IP addresses on the two SEAs.
Here are the four cases that initiate a SEA failover:
The standby SEA detects that keepalive messages from the active SEA are no longer received over the control channel.
The active SEA detects that a loss of the physical link is reported by the physical Ethernet adapter’s device driver.
On the VIOS with the active SEA, a manual failover can be initiated by setting the active SEA to standby mode.
The active SEA detects that it cannot ping a specific IP address anymore.
An end of the keepalive messages occurs when the VIOS with the primary SEA is shut down or halted, stops responding, or is deactivated from the HMC.
 
Important: You might experience up to a 30-second failover delay when SEA failover is used. The behavior depends on the network switch and the spanning tree settings. Any of the following three hints can help in reducing this delay to a minimum:
For all AIX client partitions, set up DGD on the default route:
a. Set up DGD on the default route:
# route change default -active_dgd
b. Add the command route change default -active_dgd to the /etc/rc.tcpip file to make this change permanent.
c. Set the interval between pings of a gateway by DGD to 2 seconds (the default is 5 seconds; setting this parameter to 1 or 2 seconds allows faster recovery):
# no -p -o dgd_ping_time=2
On the network switch, enable Rapid Spanning-Tree (RSTP) or PortFast while legacy Spanning Tree is on, or disable Spanning Tree.
On the network switch, set the channel group for your ports to Active if they are currently set to Passive.
Figure 3-16 shows an alternative setup where the IP address of the VIOSs is configured on a separate physical Ethernet adapter.
Figure 3-16 Alternative configuration for SEA failover
Network Interface Backup in the client partition
NIB in the client partition can be used to achieve network redundancy when two Virtual I/O Servers (VIOSs) are used. An Etherchannel with only one primary adapter and one backup adapter is said to be operating in NIB mode.
Figure 3-17 shows an NIB setup for an AIX client partition. The client partition uses two VEAs to create an Etherchannel that consists of one primary adapter and one backup adapter. The interface is defined on the Etherchannel. If the primary adapter becomes unavailable, the NIB switches to the backup adapter.
Figure 3-17 Network redundancy by using two Virtual I/O Servers and NIB
A link aggregation (LA) of more than one active VEA is not supported. Only one primary VEA plus one backup VEA are supported. To increase the bandwidth of a VEA, LA must be done on the VIOS.
When NIB is configured in a client partition, each VEA must be configured on a different VLAN.
 
Important: When NIB is used with VEAs on AIX, you must use the ping-to-address feature to detect network failures. The reason is that there is no hardware link failure for VEAs to trigger a failover to the other adapter.
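A minimal sketch of creating such an NIB Etherchannel from the AIX command line follows; ent0 is the primary VEA, ent1 the backup VEA, and 10.1.1.1 the gateway address that is pinged to detect failures, all example values:
# mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0 -a backup_adapter=ent1 -a netaddr=10.1.1.1
The resulting pseudo-adapter (for example, ent2) is then configured with the IP address of the client partition.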
For IBM i, an equivalent solution to NIB can be implemented by using VIPA failover with a virtual-to-VEA failover script. The same solution can be implemented on Linux VMs by using Ethernet connection bonding.
SEA failover with load sharing
The VIOS provides a load-sharing function to enable the usage of the bandwidth of the backup SEA.
In a SEA failover configuration, the backup SEA is in standby mode, and is used only when the primary SEA fails. The bandwidth of the backup SEA is not used in normal operation.
Figure 3-15 on page 125 shows a basic SEA failover configuration. All network packets of all Virtual I/O clients are bridged by the primary VIOS.
A SEA failover with load sharing effectively uses the backup SEA bandwidth, as shown in Figure 3-18. In this example, network packets for VLANs 12 and 14 are bridged by VIOS2, while VLANs 11 and 13 are bridged by VIOS1.
Figure 3-18 SEA failover with load-sharing
Prerequisites for SEA failover with load sharing are as follows:
Both primary and backup VIOSs are at Version 2.2.1.0 or later.
Two or more trunk adapters are configured for the primary and backup SEA pairs.
Load-sharing mode must be enabled on both the primary and backup SEA pair.
The VLAN definitions of the trunk adapters are identical between the primary and backup SEA pair.
 
Important: You must set the same priority to all trunk adapters under one SEA. The primary and backup priority definitions are set at the SEA level, not at the trunk adapters level.
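For example, assuming that the SEA device is named ent5 on both VIOSs (a hypothetical name), load sharing can be enabled by changing the ha_mode attribute, first on the primary SEA and then on the backup SEA:
chdev -dev ent5 -attr ha_mode=sharing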
Using link aggregation on the Virtual I/O Server
LA is a network port aggregation technology that allows several Ethernet adapters to be aggregated together to form a single pseudo-Ethernet adapter. This technology can be used on the VIOS to increase the bandwidth compared to when a single network adapter is used. It also avoids bottlenecks when one network adapter is shared among many client partitions.
The main benefit of an LA is that it has the network bandwidth of all its adapters in a single network presence. If an adapter fails, the packets are automatically sent to the next available adapter without disruption to existing user connections. The adapter is automatically returned to service on the LA when it recovers. Thus, LA also provides some degree of increased availability. A link or adapter failure leads to a performance degradation, but not a disruption.
Depending on the manufacturer, LA is not a complete HA networking solution because all the aggregated links must connect to the same switch. By using a backup adapter, you can add a single extra link to the LA, which is connected to a different Ethernet switch with the same VLAN. This single link is used only as a backup.
As an example for LA, ent0 and ent1 can be aggregated to ent2. The system considers these aggregated adapters as one adapter. Then, interface en2 is configured with an IP address. Therefore, IP is configured as on any other Ethernet adapter. In addition, all adapters in the LA are given the same hardware (MAC) address so that they are treated by remote systems as though they were one adapter.
Two variants of LA are supported:
Cisco Etherchannel
IEEE 802.3ad Link Aggregation
Although Etherchannel is a Cisco-specific implementation of adapter aggregation, LA follows the IEEE 802.3ad standard. Table 3-5 shows the main differences between Etherchannel and LA.
Table 3-5 Main differences between Etherchannel and Link Aggregation
Cisco Etherchannel                               IEEE 802.3ad Link Aggregation
Cisco-specific.                                  Open standard.
Requires switch configuration.                   Little, if any, switch configuration is required
                                                 to form the aggregation. Some initial setup of
                                                 the switch might be required.
Supports different packet distribution modes.    Supports only the standard distribution mode.
Using IEEE 802.3ad Link Aggregation allows for the use of Ethernet switches, which support the IEEE 802.3ad standard but might not support Etherchannel. The benefit of Etherchannel is the support of different packet distribution modes. This support means that it is possible to influence the load-balancing of the aggregated adapters. In the remainder of this publication, we use LA where possible because that is considered a more universally understood term.
 
Note: When IEEE 802.3ad Link Aggregation is used, ensure that your Ethernet switch hardware supports the IEEE 802.3ad standard. In VIOS, configuring an Ethernet interface to use the 802.3ad mode requires that the Ethernet switch ports also are configured in IEEE 802.3ad mode.
Figure 3-19 shows the aggregation of two plus one adapters to a single pseudo-Ethernet device, including a backup feature.
Figure 3-19 Link aggregation (Etherchannel) on the Virtual I/O Server
The Ethernet adapters ent0 and ent1 are aggregated for bandwidth and must be connected to the same Ethernet switch, and ent2 connects to a different switch. ent2 is used only for backup, for example, if the main Ethernet switch fails. The adapters ent0 and ent1 are exclusively accessible through the pseudo-Ethernet adapter ent5 and its interface en5. You cannot, for example, attach a network interface en0 to ent0 if ent0 is a member of an Etherchannel or LA.
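As an illustration, the aggregation that is shown in Figure 3-19 might be created on the VIOS with a command similar to the following one; the adapter names match the figure, the mode and backup adapter are attributes of the LA device, and the number of the resulting pseudo-adapter is assigned by the system:
mkvdev -lnagg ent0,ent1 -attr mode=8023ad backup_adapter=ent2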
 
Support: An LA or Etherchannel of VEAs is not supported, but you can use the NIB feature of LA with VEAs.
An LA with only one primary Ethernet adapter and one backup adapter operates in NIB mode.
For examples and scenarios of networking configurations for the VIOS LPAR and the client LPARs, see Scenarios: Configuring the Virtual I/O Server, found at:
SEA quality of service
The SEA can enforce quality of service (QoS) based on the IEEE 802.1q standard. This section explains how QoS works for SEA and how it can be configured.
SEA QoS provides a means where the VLAN tagged egress traffic is prioritized among seven priority queues. However, QoS comes into play only when contention is present.
Each SEA instance has some threads (currently seven) for multiprocessing. Each thread has nine queues to take care of network jobs. Each queue takes care of jobs at a different priority level. One queue is kept aside and used when QoS is disabled.
 
Important: QoS works only for tagged packets, that is, all packets that emanate from the VLAN pseudo-device of the virtual I/O client. Therefore, because virtual Ethernet does not tag packets, its network traffic cannot be prioritized. The packets are placed in queue 0, which is the default queue at priority level 1.
Each thread independently follows the same algorithm to determine from which queue to send a packet. A thread sleeps when no packets are available on any of the nine queues.
Note the following points:
If QoS is enabled, SEA checks the priority value of all tagged packets and puts that packet in the corresponding queue.
If QoS is not enabled, then regardless of whether the packet is tagged or untagged, SEA ignores the priority value and places all packets in the disabled queue. This approach ensures that the packets that are enqueued while QoS is disabled are not sent out of order when QoS is enabled.
When QoS is enabled, two algorithms are available to schedule jobs: strict mode and loose mode.
Strict mode
In strict mode, all packets from higher priority queues are sent before any packets from a lower priority queue. The SEA examines the highest priority queue for any packets to send out. If any packets are available to send, the SEA sends that packet. If no packets are available to send in a higher priority queue, the SEA checks the next highest priority queue for any packets to send out.
After a packet from the highest priority queue with packets is sent out, the SEA starts the algorithm over again. This approach allows for high priorities to be serviced before the lower priority queues.
Loose mode
It is possible in strict mode that lower priority packets are never serviced if higher-priority packets are always present. To address this issue, the loose mode algorithm was devised.
With loose mode, after the allowed number of bytes has been sent from one priority queue, the SEA checks all lower priority queues at least once for packets to send before it again sends packets from the higher priority queue.
When packets are initially sent out, the SEA checks its highest priority queue. It continues to send packets from the highest priority queue until either the queue is empty or the cap is reached. After either of those two conditions is met, the SEA moves on to service the next priority queue, and it continues by using the same algorithm until either of the two conditions is met in that queue. At that point, it moves on to the next priority queue. On a fully saturated network, this process allocates a certain percentage of bandwidth to each priority. The caps for each priority are distinct and not configurable.
A cap is placed on each priority level so that after a number of bytes is sent for each priority level, the following level is serviced. This method ensures that all packets are eventually sent. More important traffic is given less bandwidth with this mode than with strict mode. However, the caps in loose mode are such that more bytes are sent for the more important traffic, so it still gets more bandwidth than less important traffic. Set loose mode by using this command:
chdev -dev <SEA device name> -attr qos_mode=loose
You can dynamically configure the QoS priority of a VEA of a running LPAR by using the HMC. You can prioritize the LPAR network traffic by specifying the value of IEEE 802.1Q priority level for each VEA.
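As a minimal sketch that assumes the SEA device on the VIOS is ent5, QoS can be enabled and verified with commands such as the following ones:
chdev -dev ent5 -attr qos_mode=strict
lsdev -dev ent5 -attr qos_mode
The qos_mode attribute accepts the values disabled, strict, and loose. Changing the mode affects only how the priority queues are serviced; the IEEE 802.1Q priority values themselves are assigned on the client side or through the HMC, as described above.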
SEA performance considerations
When virtual networking is used, some performance implications must be considered. Networking configurations are site-specific, so no guaranteed rules for performance tuning exist.
The following considerations apply to VEA and SEA:
The usage of a VEA in a partition does not by itself increase its CPU requirement. However, high levels of network traffic within a partition increase CPU utilization. This behavior is not specific to virtual networking configurations.
The usage of SEA in a VIOS increases the CPU utilization of the partition due to the bridging function of the SEA.
Keep the threading option enabled (default) on the SEA when the VIOS also is hosting virtual storage (vSCSI or NPIV).
SEA configurations that use high-speed physical adapters can be demanding on CPU resources within the VIOS. Ensure that you assign sufficient CPU capacity to the VIOS.
To reduce CPU processing overhead for TCP workloads on the VIOS and client partitions and to better use the wire speed of high-speed Ethernet adapters:
 – Enable large send offload (LSO) on the client partition's interface (on the VIOS it is enabled by default).
 – Enable large receive offload on the SEA of the VIOS.
 
Notes:
For IBM i, large receive offload is supported by IBM i 7.1 TR5 and later.
Large receive offload is disabled by default on the VIOS's SEA to avoid incompatibility with older Linux distributions. Consider enabling large receive offload on the SEA when supported Linux distributions are used.
Consider the usage of jumbo frames and increasing the MTU to 9000 bytes if possible when high-speed adapters are used. Jumbo frames enable higher throughput for fewer CPU cycles. However, the external network also must be configured to support the larger frame size.
For more information about tuning network performance throughput, see IBM PowerVM Virtualization Managing and Monitoring, SG24-7590.
SEA network requirements planning
For network planning guidance for SEA network design with high-speed adapters, see Network requirements, found at:
The attributes and performance characteristics of various types of Ethernet adapters help you select which adapters to use in your environment. For more information, see Adapter selection, found at:
Processor allocation guidelines exist for both dedicated processor LPARs and shared processor LPARs. Because Ethernet running an MTU size of 1500 bytes consumes more processor cycles than Ethernet running jumbo frames (MTU 9000), the guidelines differ for each situation. In general, the processor utilization for large packet workloads on jumbo frames is approximately half of that required for MTU 1500. For more information, see Processor allocation, found at:
In general, 512 MB of memory per LPAR is sufficient for most configurations. Enough memory must be allocated for the VIOS data structures. Ethernet adapters and virtual devices use dedicated receive buffers. These buffers are used to store the incoming packets, which are then sent over the outgoing device.
A physical Ethernet adapter typically uses 4 MB for MTU 1500 or 16 MB for MTU 9000 for dedicated receive buffers for gigabit Ethernet. Other Ethernet adapters are similar. Virtual Ethernet typically uses 6 MB for dedicated receive buffers. However, this number can vary based on workload. Each instance of a physical or virtual Ethernet needs memory for this number of buffers. In addition, the system has a mbuf buffer pool per processor that is used if extra buffers are needed. These mbufs typically occupy 40 MB. For more information, see Planning for Shared Ethernet Adapters, found at:
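As a rough worked example that is based on the figures above (treat it as an estimate only, because buffer usage varies with workload), a VIOS with two physical adapters running MTU 9000 (approximately 16 MB each), three virtual Ethernet adapters (approximately 6 MB each), and the mbuf pools (approximately 40 MB) needs about (2 x 16) + (3 x 6) + 40 = 90 MB for network receive buffers, which fits comfortably within the general 512 MB recommendation.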
Enabling largesend and jumbo_frames
IBM AIX allows you to transmit large packets and frames through a network. To send a large data chunk over the network, TCP breaks it down into multiple segments, which requires multiple calls down the stack and results in higher processor utilization on the host processor. You can address the issue by using the TCP LSO option, which allows the AIX TCP layer to build a TCP message that is up to 64 KB long.
For example, without the TCP LSO option, sending 64 KB of data takes 44 calls down the stack by using 1500-byte Ethernet frames. With the TCP LSO option enabled, the TCP option can send up to 64 KB of data to the network interface card (NIC) in a single transmit-receive call. In a real-time scenario, the required number of processor cycles is controlled by the application and depends on the speed of the physical network. With faster networks, the usage of LSO reduces the host processor utilization and increases throughput.
A jumbo frame is an Ethernet frame with a payload greater than the standard MTU of 1,500 bytes and can be as large as 9,000 bytes. It has the potential to reduce processor usage.
TCP LSO and jumbo_frames in AIX are independent of each other. They can be used together or in isolation. For more information about how to enable largesend and jumbo_frames, see Enabling largesend and jumbo_frames in IBM AIX to reduce processor usage, found at:
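The following commands are a minimal sketch of how these options are typically enabled. The device names are examples (ent0 as the physical adapter and ent5 as the SEA on the VIOS, and en0 as the client interface), and attribute names and values can vary by VIOS and AIX level, so verify them with lsdev or lsattr before you apply changes.
On the VIOS, enable largesend and large receive offload on the SEA, and jumbo frames on the physical adapter and the SEA:
chdev -dev ent5 -attr largesend=1
chdev -dev ent5 -attr large_receive=yes
chdev -dev ent0 -attr jumbo_frames=yes
chdev -dev ent5 -attr jumbo_frames=yes
On the AIX client partition, enable largesend on the virtual Ethernet interface and increase the MTU:
chdev -l en0 -a mtu_bypass=on
chdev -l en0 -a mtu=9000
Remember that jumbo frames are effective only when every device in the path, including the external switches, is configured for the larger frame size.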
3.6.5 SR-IOV planning
An SR-IOV architecture defines virtual replicas of PCI functions that are known as virtual functions (VFs). An LPAR can connect directly to an SR-IOV adapter VF without going through a virtual intermediary (VI) such as the PHYP or a VIOS. Avoiding the VI provides a lower latency and lower CPU utilization alternative.
An SR-IOV-capable adapter can be assigned to an LPAR in dedicated mode or enabled for shared mode. The management console provides an interface to enable SR-IOV shared mode.
An SR-IOV-capable adapter in shared mode is assigned to the hypervisor for management of the adapter and provisioning of adapter resources to LPARs. With the management console, along with the hypervisor, you can manage the adapter's physical Ethernet ports and logical ports (LPs).
To connect an LPAR to an SR-IOV Ethernet adapter VF, create an SR-IOV Ethernet LP for the LPAR. When you create an Ethernet LP for a partition, select the adapter physical Ethernet port to connect to the LPAR and specify the resource requirements for the LP. Each LPAR can have one or more LPs from each SR-IOV adapter in shared mode. The number of LPs for all configured LPARs cannot exceed the adapter LP limit.
 
Note: An LPAR that uses an SR-IOV VF directly does not support LPM unless the VF is assigned to a SEA or used together with vNIC.
For an SR-IOV adapter in shared mode, the physical port switch mode can be configured in Virtual Ethernet Bridge (VEB) mode, which is the default setting, or Virtual Ethernet Port Aggregator (VEPA) mode. If the switch is configured in VEB mode, the traffic between the LPs is not visible to the external switch. If the switch is configured in VEPA mode, the traffic between LPs must be routed back to the physical port by the external switch. Before you enable the physical port switch in VEPA mode, ensure that the switch that is attached to the physical port is supported and enabled for reflective relay.
When bridging between VEAs and a physical Ethernet adapter, an SR-IOV Ethernet LP might be used as the physical Ethernet adapter to access the outside network. When an LP is configured as the physical Ethernet adapter for bridging, promiscuous permission must be enabled in the LP. For example, if you create an LP for a VIOS LPAR and the intent is to use the LP as the physical adapter for the SEA, you must select the promiscuous permission for the LP.
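For example, assuming that the promiscuous SR-IOV LP appears in the VIOS as ent0 and that the trunk VEA is ent4 (the device names are illustrative), the SEA can be created in the usual way:
mkvdev -sea ent0 -vadapter ent4 -default ent4 -defaultid 1
From the perspective of the SEA, the LP behaves like any other physical Ethernet device; the promiscuous permission is what allows it to bridge traffic for MAC addresses other than its own.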
Configuration requirements
Consider the following configuration requirements when an Ethernet LP is used as the physical Ethernet device for SEA bridging:
If diverting all network traffic to flow through an external switch is required, consider the following requirements:
 – The hypervisor virtual switch must be set to the VEPA switching mode, and the SR-IOV Ethernet adapter physical port switch mode must be set to the VEPA switching mode.
 – The LP must be the only LP that is configured for the physical port.
When you create an Ethernet LP, you can specify a capacity value. The capacity value specifies the required capacity of the LP as a percentage of the capability of the physical port. The capacity value determines the number of resources that are assigned to the LP from the physical port. The assigned resources determine the minimum capability of the LP. Physical port resources that are not used by other LPs might be temporarily used by the LP when the LP exceeds its assigned resources to allow extra capability. System or network limitations can influence the amount of throughput an LP can achieve. The maximum capacity that can be assigned to an LP is 100%. The sum of the capacity values for all the configured LPs on a physical port must be less than or equal to 100%. To minimize the configuration effort while more LPs are added, you might want to reserve physical port capacity for extra LPs.
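For example (the values are illustrative), on a 25 Gb physical port with four LPs that are configured at 20% capacity each, each LP is entitled to a minimum of approximately 5 Gb under contention, the configured total is 80%, and 20% of the port capacity remains available for future LPs. Any LP can still temporarily use idle capacity beyond its entitlement.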
When an Ethernet LP is used as a physical adapter for bridging VEAs, the parameter values such as the number of client virtual adapters and expected throughput must be considered when a capacity value is chosen.
An Ethernet LP can be granted diagnostics permission, which allows the LP to run diagnostics on the adapter and physical port. Select this permission only while diagnostics are run by using the LP.
Verifying that the server supports single-root I/O virtualization
Before you enable SR-IOV shared mode for an SR-IOV-capable adapter, verify that the server supports the SR-IOV feature by using the HMC.
To verify that the server supports SR-IOV, complete the following steps:
1. In the navigation pane, click Resources.
2. Click All Systems. The All Systems window opens.
3. In the work pane, select the system and select Actions → View System Properties. The Properties window opens.
4. Click Licensed Capabilities. The Licensed Capabilities window lists the features that are supported by the server.
5. In the Licensed Capabilities window, verify the list of features that are displayed:
 – If SR-IOV Capable is marked with the check mark icon (which indicates that the feature is available on this server), the SR-IOV adapter can be configured in shared mode and shared by multiple LPARs.
 – If SR-IOV Capable is marked with the -- icon (which indicates that the feature is not available on this server), the SR-IOV adapter can be configured in shared mode, but it can be used by only one LPAR.
 – If SR-IOV Capable is not displayed, the server does not support the SR-IOV feature.
6. Click OK.
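The same check can be made from the HMC command line. As a sketch (the managed system name is a placeholder, and the exact capability strings depend on the HMC level), list the system capabilities and look for the SR-IOV-related entries:
lssyscfg -r sys -m <managed system> -F name,capabilities
The capabilities field is a comma-separated list; an SR-IOV-capable entry indicates that adapters on that server can be placed in shared mode.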
Verifying the logical port limit and the owner of the SR-IOV adapter
You can view the LP limit and the owner of the SR-IOV adapter by using the HMC. To view the LP limit and the owner of the SR-IOV adapter, complete the following steps:
1. In the navigation pane, click Resources.
2. Click All Systems. The All Systems window opens.
3. In the work pane, select the system and select Actions → View System Properties. The Properties window opens.
4. In the Properties area, click the Processor, Memory, I/O tab. In the Physical I/O Adapters area, the table displays the SR-IOV capable (Logical Port Limit) and the Owner details about the SR-IOV adapter.
The SR-IOV capable (Logical Port Limit) column displays whether the slot or the adapter is SR-IOV capable, and the maximum number of LPs that this slot or the adapter can support. If the slot or the adapter is SR-IOV-capable but is assigned to a partition, the SR-IOV capable (Logical Port Limit) column indicates that the slot or the adapter is in the dedicated mode.
The Owner column displays the name of the current owner of the physical I/O. The value of this column can be any of the following values:
 – When an SR-IOV adapter is in shared mode, a hypervisor is displayed in this column.
 – When an SR-IOV adapter is in dedicated mode, Unassigned is displayed when the adapter is not assigned to any partition as a dedicated physical I/O.
 – When an SR-IOV adapter is in dedicated mode, the LPAR name is displayed when the adapter is assigned to any LPAR as a dedicated physical I/O.
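Similar information is available from the HMC command line. As a sketch (the output fields vary by HMC release), the SR-IOV adapters, their mode, and their configured LPs can be listed as follows:
lshwres -r sriov --rsubtype adapter -m <managed system>
lshwres -r sriov --rsubtype logport -m <managed system> --level eth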
3.6.6 SR-IOV with vNIC planning
To configure a vNIC client, an adapter must first be configured in SR-IOV shared mode. In addition, LPs and physical port capacity must be available; that is, the total of the activated LP capacity values for the physical port must be less than 100%.
Some limits also apply to the number of vNIC adapters for a partition. FW840.10 allows 10 client vNIC adapters per partition.
Partitions that are configured with vNIC adapters are compatible with LPM and Simplified Remote Restart (SRR) technologies.
Some minimum code levels that support vNIC for each operating system are required. For more information about the exact requirements for your target platform, see PowerVM vNIC and vNIC Failover FAQs, found at:
LA is supported if a vNIC client has a single backing device. A vNIC client with multiple backing devices (vNIC failover) in combination with LA technologies such as IEEE 802.3ad/802.1AX (LACP), AIX NIB, or Linux bonding active-backup mode is not supported. SR-IOV LA limitations apply to client vNIC adapters.
vNIC failover considerations
vNIC failover allows a vNIC client to be configured with up to six backing devices. One backing device is active while the others are inactive standby devices. If the hypervisor detects that the active backing device is no longer operational, a failover is initiated to the most favored (lowest Failover Priority value) operational backing device.
Some minimum code levels are required for vNIC failover. In general, HMC, system firmware, and operating systems with support for Power10 processor-based servers include support for vNIC failover.
Backing devices can be dynamically added to and removed from a vNIC client.
When you design the backing devices, consider combining separate SR-IOV adapters with different VIOSs for the same partition for redundancy purposes. Example backing devices for a single client vNIC adapter are as follows:
vNIC Backing Device 1: SR-IOV Adapter 1 that uses VIOS1
vNIC Backing Device 2: SR-IOV Adapter 2 that uses VIOS2
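On recent HMC levels, the configured vNIC clients and their backing devices can also be reviewed from the HMC command line. The following command is an illustrative sketch; check the lshwres documentation for the exact form that your HMC level supports:
lshwres -r virtualio --rsubtype vnic -m <managed system> --level lpar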
A comparison of network virtualization technologies can be found in Table 3-6.
Table 3-6 A comparison of network virtualization technologies
Technology                            LPM support  QoS     Direct-access  Redundancy  Server-side     Requires
                                                           performance    options     redundancy      VIOS
SR-IOV                                No 1         Yes     Yes            Yes 2       No              No
vNIC                                  Yes          Yes     No 3           Yes 2       vNIC Failover   Yes
SEA or virtual Ethernet               Yes          Yes 4   No             Yes         SEA Failover    Yes
Hybrid Network Virtualization (HNV)   Yes          Yes     Yes            Yes         No              No 5

1 SR-IOV optionally can be used as the backing device of SEA in VIOS to use higher-level virtualization functions like LPM. However, the client partition does not receive the performance or QoS benefit.
2 Some limitations apply. For more information, see FAQs on LA, found at https://community.ibm.com/community/user/power/viewdocument/sr-iov-vnic-and-hnv-information.
3 Generally, provides better performance and requires fewer system resources compared to SEA or virtual Ethernet.
4 SR-IOV has a superior QoS capability compared to SEA.
5 VIOS is not required during regular operations. However, VIOS is used to host the backup (vNIC) adapter during LPM operations.
3.7 Further considerations
For more information about best practices, recommendations, and special considerations, see the following resources:
IBM Power Virtualization Best Practices Guide, found at:
Power10 Performance Quick Start Guides, found at:
Power10 Performance Best Practices - A brief checklist, found at:
 