CHAPTER 13


Optimizing Linux for Oracle Databases

Companies are migrating away from large IBM, HP, or Sun hardware to commodity hardware running Linux. With 18-core Intel chips readily available at affordable prices, companies can run massive Linux compute nodes at a fraction of the cost of enterprise UNIX servers from IBM, HP, and Sun. Linux has proven to be a rock-solid OS, providing reliability comparable to its UNIX counterparts such as IBM AIX, HP-UX, and Sun Solaris. With proper optimizations, Linux can scale like its counterparts and run Oracle databases even faster. Think about a single compute node on an Exadata X5-2 configuration: it comes with 36 CPU cores and up to 768GB of physical memory. If you run a 2-node RAC for a Quarter Rack, you can harness 72 cores of processing power and approximately 1.5TB of memory.

Optimizing the Linux OS is an integral part of running an Oracle database server; a well-tuned Linux engine is crucial to database performance. Companies incorporate best practices and optimizations into a reference architecture for Linux builds. You will focus on creating a lean and secure Linux infrastructure for Oracle that performs consistently with each build.

Laying out a well-designed Linux architecture for an Oracle database is a critical factor for success. This chapter lays out the foundation for an optimal Linux reference architecture for an Oracle database. You will learn how to optimize Linux from a memory perspective, tweak kernel parameters for additional throughput, choose the right I/O scheduler, and increase network bandwidth for Oracle.

13-1. Minimizing the Linux Environment for Performance and Security

Problem

You want to install only the required packages that are needed to install and configure Oracle Database 12c.

Solution

During the installation of Oracle or Red Hat Linux, you want to choose the minimal installation option. You can select the Minimal Install option from the top-left corner of the Software Selection screen (see Figure 13-1).


Figure 13-1. Software selection for Linux installation

With the Minimal Install method, you get only the basic functionality of the Linux OS. You will follow up after the minimal Linux installation and leverage the Oracle 12c RDBMS Server PreInstall RPM to take the build to the next level. The Oracle RDBMS PreInstall RPM installs only the required set of RPMs needed to install Oracle. A majority of Linux database servers don't need many more RPMs beyond what is installed by the Oracle RDBMS PreInstall RPM.

Note  For information on leveraging the Oracle 12c RDBMS Server PreInstall RPM for automating Linux server configurations for Oracle databases, please review Chapter 11.

You can then build additional infrastructure—such as hugepages, I/O optimizations, network optimizations, and kernel optimizations—to take the Linux optimizations to another level.

How It Works

You want to install just the minimum packages to effectively limit the amount of miscellaneous processes running on the database server and to run only database-related processes on the database server. Lots of customers make the mistake of installing the full GUI desktop on the Linux server. Although this method is great for educational purposes, you should never install the GUI desktop on a production database server. From a performance perspective, minimal installation helps by keeping the server “lean and mean.” In addition, the less you have on the server, the less you have to secure. When it comes to patching the Linux server, the less you have installed, the fewer times you have to patch the server with updates.

A leaner server with less software usually runs faster than a server with all the software installed on it. For example, X-Server or VNC Server is not installed with the minimal installation, and you don’t see X-Server processes running on a database server. At the same time, if you execute the chkconfig --list command to review all the services with run level settings, a majority of the non-Oracle related services should be turned off. You should also disable services such as http on the database server. A good rule to follow for optimal database performance is to separate out database-server processes from application-server processes.

On database servers, you should run in run level 3 (Multiuser mode with networking) and never run a database server in run level 5, which is GUI Desktop mode (the same as run level 3 + display manager). The default run level is controlled in the /etc/inittab file. In general, the fewer processes you have running in the background, the better off you will be. The parameter to control the run level looks like this:

id:3:initdefault:
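
To check which run level the server is currently operating in, you can use the runlevel command (or who -r), which prints the previous and current run levels. The output below is from a hypothetical server that boots into run level 3:

# runlevel
N 3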

13-2. Configuring Hugepages

Problem

You want to configure hugepages to save memory and increase the performance of Oracle running on the Linux server.

Solution

Setting up hugepages requires one kernel parameter change and modifications to two entries in the /etc/security/limits.conf file. The kernel parameter that has to be modified is vm.nr_hugepages in the /etc/sysctl.conf file.

The /etc/security/limits.conf file has to be adjusted to account for soft and hard limits for Oracle, respectively, to increase the max locked memory limit:

oracle          soft    memlock        50331648
oracle          hard    memlock        50331648

After the kernel parameters are set, it is highly recommended that you reboot the server.

How It Works

Leveraging hugepages provides great advantages on the Linux server. You will realize performance gains through increased Translation Lookaside Buffer (TLB) hits. Hugepages ensure that System Global Area (SGA) components are locked in memory and are never swapped out. Hugepages also reduce the bookkeeping overhead the kernel incurs when managing a large number of small pages.

The default kernel page size is 4KB. By increasing the page size to 2MB, you effectively reduce the number of pages the kernel has to manage by a factor of roughly 500. Hugepages are allocated in granules of 2MB. For example, if you are allocating 48GB for hugepages, you have to set the kernel parameter to 24576 (48*1024/2) in the /etc/sysctl.conf file.

There is no such thing as a free meal: you can't use the Automatic Memory Management (AMM) feature of Oracle when you implement hugepages. More specifically, the MEMORY_TARGET initialization parameter of the Oracle database can't be used. Because the Program Global Area (PGA) is no longer managed automatically as part of AMM, the SGA_TARGET and SGA_MAX_SIZE parameters have to be used when hugepages are in place. The good news is that PGA_AGGREGATE_TARGET is a dynamic parameter and can be adjusted as needed. You can set this parameter intelligently by reviewing Oracle's advisor view, V$PGA_TARGET_ADVICE.
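
For example, the following sketch queries V$PGA_TARGET_ADVICE from SQL*Plus as the oracle user; the column selection and formatting are illustrative, and you would compare the estimated cache hit percentage and overallocation count across the candidate target sizes before adjusting PGA_AGGREGATE_TARGET:

$ sqlplus -s / as sysdba <<EOF
set linesize 120 pagesize 100
column target_mb format 999,999,999
select round(pga_target_for_estimate/1024/1024) as target_mb,
       estd_pga_cache_hit_percentage,
       estd_overalloc_count
from   v\$pga_target_advice
order  by pga_target_for_estimate;
EOF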

Settings for hugepages should be based on the amount of physical memory and the total SGA of all the databases on the Linux server. As a standard, you can define default hugepage settings for new server builds as a starting point, based on the amount of physical memory on the server. The amount of memory to allocate for hugepages varies with different databases and applications; some applications may need more PGA memory allocation. For example, you can specify the following set of guidelines for your company:

  • 20GB of physical memory or less: 1/2 of the physical memory will be configured for hugepages
  • More than 20GB of physical memory: 3/4 of the physical memory will be configured for hugepages

Automating Hugepages Server Configuration

Here’s a handy script to add to your arsenal, which will one day save you tons of valuable time. Whether you are setting hugepages on a new server or an existing server with multiple databases, you can set up hugepages for the server with a single shell script. This script, set_hugepages.ksh, accepts a single command-line option for the amount of memory in MB that you want to allocate for hugepages. If you don’t specify the amount of memory to allocate for hugepages, the script will determine the total physical memory from the server and take a majority percentage of the memory specified by the DEFAULT_HP_PERCENTAGE variable. The current script is set to 60% of the physical memory.

The goal of the following script is to determine the values for the vm.nr_hugepages kernel parameter and the soft and hard memlock parameters for the security limits configuration files:

# cat set_hugepages.ksh
#!/bin/ksh
export HP_SIZE_MB=$1
DEFAULT_HP_PERCENTAGE=60

if [ "$HP_SIZE_MB" = “” ]; then
  echo "HugePage Value not specified in the command-line options."
  echo "We will set the HugePages value based on $DEFAULT_HP_PERCENTAGE % of physical memory"
  SERVER_MEMORY_KB=$(grep ^MemTotal /proc/meminfo |sed -e 's/[^0-9]*//g' -e 's/ //g')
  let SERVER_MEMORY_MB=$SERVER_MEMORY_KB/1024
  echo "Server Memory: $SERVER_MEMORY_MB"

  let DEFAULT_HP_SIZE=$SERVER_MEMORY_MB*$DEFAULT_HP_PERCENTAGE/100
  echo "Default HugePage size based on $DEFAULT_HP_PERCENTAGE %: $DEFAULT_HP_SIZE"

  export HP_SIZE_MB=$DEFAULT_HP_SIZE
fi

LINUX_VER=$(cat /etc/redhat-release |sed -e 's/[^0-9]*//g' -e 's/ //g')
echo "Linux Version is: $LINUX_VER"
echo ""

echo "Checking to see if HugePages is already set"
grep -i vm.nr_hugepages /etc/sysctl.conf
RC=$?
echo ""

function calc_hp {
let HP_KB=$HP_SIZE_MB*1024
echo "HugePages KB = $HP_KB"
let HP_PRESETTING=$HP_KB/2048
let HP_SETTING=$HP_PRESETTING+6
echo "HP Settings:  $HP_PRESETTING $HP_SETTING"
echo "New HugePage Setting for /etc/sysctl.conf"
echo "vm.nr_hugepages=$HP_SETTING"
}
calc_hp

export TMP_SYSCTL=/tmp/sysctl.conf.tmp
if [ "$RC" -eq 1 ]; then
  echo "Return Code for HugePages: $RC"
  echo "HugePages is not set!"
  echo "# -- HugePage Setting for Oracle Databases -- #" >>/etc/sysctl.conf
  echo "vm.nr_hugepages=$HP_SETTING" >>/etc/sysctl.conf

elif [ "$RC" -eq 0 ]; then
  echo "HugePages is set..."
  cp /etc/sysctl.conf /tmp/sysctl.conf.$$
  cat /etc/sysctl.conf |grep -v "vm.nr_hugepages" >$TMP_SYSCTL
  echo "vm.nr_hugepages=$HP_SETTING" >>$TMP_SYSCTL
  cp $TMP_SYSCTL /etc/sysctl.conf
fi

let MEMLOCK_VALUE=$HP_SETTING*2048
cat /etc/security/limits.conf |grep -v ^# |grep -i memlock |grep -v grep 2>/dev/null
export MEMLOCK_RC=$?
if [ "$MEMLOCK_RC" -eq 0 ]; then
  export SECURITY_LIMITS_FILE=/etc/security/limits.conf
else
  export SECURITY_LIMITS_FILE=$(grep -il memlock /etc/security/limits.d/*.conf)
fi

# -- MEMLOCK has never been set so we need to find the limits.conf file
cat /etc/security/limits.conf |egrep -v "^#" |grep nproc
export NPROC_RC=$?
if [ "$SECURITY_LIMITS_FILE" = “” ]; then
  if [ "$NPROC_RC" -eq 0 ]; then
    export SECURITY_LIMITS_FILE=/etc/security/limits.conf
  else
    # -- We need to find the limits file for RHEL 6 directory structure
    export SECURITY_LIMITS_FILE=$(grep -i nproc /etc/security/limits.d/* |awk -F ":" '{print $1}' |tail -1)
    [ "$SECURITY_LIMITS_FILE" = "" ] && export SECURITY_LIMITS_FILE=/etc/security/limits.d/90-memlock.conf
  fi
fi

  export TMP_LIMITS_FILE=/tmp/limits.conf.tmp
  #echo "Security Limits File:  $SECURITY_LIMITS_FILE"
  cp $SECURITY_LIMITS_FILE /tmp/limits.conf.$$
  cat $SECURITY_LIMITS_FILE |egrep -v "memlock" >$TMP_LIMITS_FILE

  echo ""
  echo "# -- HugePage Setting for Oracle Databases -- #" >>$TMP_LIMITS_FILE
  echo "# -- Here’s the changes that were made to the $SECURITY_LIMITS_FILE"
  echo "oracle soft memlock $MEMLOCK_VALUE" >>$TMP_LIMITS_FILE
  echo "oracle hard memlock $MEMLOCK_VALUE" >>$TMP_LIMITS_FILE
  cp $TMP_LIMITS_FILE $SECURITY_LIMITS_FILE
  grep -i memlock $SECURITY_LIMITS_FILE

echo ""
echo "# ------------------------------------------------------- #"
echo "# Your system has been set for hugepages.  "
echo "# Please reboot your server to see the changes!"
echo ""

Before the script makes changes to the current kernel parameters and configuration files, it backs up the files by copying them to the /tmp directory. It also creates a temporary file in the /tmp directory with the .tmp extension to massage before overlaying the original files. You can download this script from the DBAExpert.com web site. This script is evolving and will be improved upon with each release of Oracle and Red Hat Linux.

Executing the previous set_hugepages.ksh script produces the following output:

# ./set_hugepages.ksh
HugePage Value not specified in the command-line options.
We will set the HugePages value based on 60 % of physical memory
Server Memory: 3587
Default HugePage size based on 60 %: 2152
Linux Version is: 66

Checking to see if HugePages is already set
vm.nr_hugepages=15006

HugePages KB = 2203648
HP Settings:  1076 1082
New HugePage Setting for /etc/sysctl.conf
vm.nr_hugepages=1082
HugePages is set...

# -- Here's the changes that were made to the /etc/security/limits.d/oracle-rdbms-server-12cR1-preinstall.conf
oracle soft memlock 2215936
oracle hard memlock 2215936

# ------------------------------------------------------- #
# Your system has been set for hugepages.
# Please reboot your server to see the changes!

To specify an exact value for hugepages, pass in a parameter into the script with a value in megabytes:

# ./set_hugepages.ksh 64000

You must reboot the server to recognize hugepage changes. If you log in as the Oracle Linux account owner, you can confirm the hugepage settings by typing the ulimit command with the -l option. The -l option reports the maximum size that can be locked into memory:

# su - oracle
[oracle@dal66a ~]$ ulimit -l
30732288

Notice that the value reported by ulimit -l should match the soft and hard memlock values from the security limits configuration files.
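
After the reboot, and once the database instances are started, you can also verify the hugepage allocation and usage by inspecting /proc/meminfo. The values below are illustrative; the key indicators are that HugePages_Total matches your vm.nr_hugepages setting and that HugePages_Free drops (with HugePages_Rsvd reserved for the SGA) once the instances are up:

$ grep HugePages /proc/meminfo
HugePages_Total:   15006
HugePages_Free:     2310
HugePages_Rsvd:     1254
HugePages_Surp:        0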

Computing the Value of Huge Pages on an Existing Linux Server

Oracle Support provides a nifty shell script that calculates the total hugepages for a server that has one or more Oracle databases running. In a nutshell, the script computes recommended values for the HugePages/HugeTLB configuration on the current database server by rolling up all the shared memory segments. Please review the shell script for calculating the recommended Linux HugePages/HugeTLB configuration in Oracle Support note Doc ID 401749.1.

You can copy and paste the script, called hugepages_settings.sh, and execute it on a server that already has databases running. The script depends on the database instances being up and having allocated shared memory segments that are visible with the ipcs -m command. The script also assumes that the databases are not running in AMM mode. Here's a sample output of the hugepages_settings.sh script:

$ ./hugepages_settings.sh

This script is provided by Doc ID 401749.1 from My Oracle Support
(http://support.oracle.com) where it is intended to compute values for
the recommended HugePages/HugeTLB configuration for the current shared
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The ’pga_aggregate_target’ is outside the SGA and
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size,
   as the new SGA will not fit in the previous HugePages configuration,
   it had better disable the whole HugePages,
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m

Press Enter to proceed...

Recommended setting: vm.nr_hugepages = 603

Disabling Transparent Huge Pages

Red Hat and Oracle Linux 6 introduced a feature called Transparent Huge Pages (THP), which is enabled by default. THP was intended to simplify the configuration of hugepages because manual configuration can be difficult for SAs new to Oracle. Standard hugepages are reserved at boot time and are usually used for the highly static memory allocation of Oracle databases; THP, in contrast, is managed dynamically at runtime by the khugepaged thread in the kernel.

For Oracle databases, Oracle recommends disabling THP on database servers by executing the following command:

# echo never > /sys/kernel/mm/transparent_hugepage/enabled

Please refer to the Oracle document “ALERT: Disable Transparent HugePages on SLES11, RHEL6, OL6 and UEK2 Kernels” (Doc ID 1557478.1). To permanently disable transparent huge pages, add transparent_hugepage=never to the kernel boot line in /etc/grub.conf and reboot the server:

# cat /etc/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/mapper/vg_rac01-lv_root
#          initrd /initrd-[generic-]version.img
#boot=/dev/sda
default=2
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server Red Hat Compatible Kernel (2.6.32-358.23.2.el6.x86_64.debug)
      root (hd0,0)
      kernel /vmlinuz-2.6.32-358.23.2.el6.x86_64.debug ro root=/dev/mapper/vg_rac01-lv_root rd_NO_LUKS rd_LVM_LV=vg_rac01/lv_root LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16  rd_LVM_LV=vg_rac01/lv_swap  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet numa=off transparent_hugepage=never crashkernel=auto
      initrd /initramfs-2.6.32-358.23.2.el6.x86_64.debug.img
title Oracle Linux Server Unbreakable Enterprise Kernel (3.8.13-16.2.1.el6uek.x86_64)
      root (hd0,0)
      kernel /vmlinuz-3.8.13-16.2.1.el6uek.x86_64 ro root=/dev/mapper/vg_rac01-lv_root rd_NO_LUKS rd_LVM_LV=vg_rac01/lv_root LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_rac01/lv_swap  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet numa=off transparent_hugepage=never
      initrd /initramfs-3.8.13-16.2.1.el6uek.x86_64.img
title Oracle Linux Server Red Hat Compatible Kernel (2.6.32-431.el6.x86_64)
      root (hd0,0)
      kernel /vmlinuz-2.6.32-431.el6.x86_64 ro root=/dev/mapper/vg_rac01-lv_root rd_NO_LUKS rd_LVM_LV=vg_rac01/lv_root LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_rac01/lv_swap  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet numa=off transparent_hugepage=never
      initrd /initramfs-2.6.32-431.el6.x86_64.img

Optionally, you can add the following lines in the /etc/rc.local file to disable THP at boot time.

[ -f /sys/kernel/mm/transparent_hugepage/enabled ] && echo never > /sys/kernel/mm/transparent_hugepage/enabled
[ -f /sys/kernel/mm/transparent_hugepage/defrag ] &&  echo never > /sys/kernel/mm/transparent_hugepage/defrag
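
To verify the current THP state, inspect the same sysfs files; the value in brackets is the active setting. The output below assumes THP has already been disabled:

# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
# cat /sys/kernel/mm/transparent_hugepage/defrag
always madvise [never]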

13-3. Enabling Jumbo Frames

Problem

You want to increase the packet size on your private network transmission unit to increase network performance. You also want to enable jumbo frames on the dedicated network for NFS or iSCSI storage performance optimizations.

Solution

To enable jumbo frames on the Linux server, as the root user, execute the ifconfig command to set the MTU to 9000:

# ifconfig eth0 mtu 9000

Modify the network interface configuration file specific to the network interface to make the changes permanent. The following example demonstrates that by adding the parameter line MTU=9000 at the end of the network interface file, you are permanently setting jumbo frames:

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
IPADDR=10.0.0.100
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
VLAN=yes
MTU=9000

After you make changes to the network interface configuration file, you have to restart the network interface, eth0 in the example, by executing the ifdown and ifup commands:

# ifdown eth0
# ifup eth0

Alternatively, you can restart all the network interfaces with the service network restart command. Typically, you must coordinate the jumbo frame configuration with your network engineers: jumbo frame support must be enabled on the switches, and if multiple switches sit between the source and target database servers, every switch in the path must be enabled to support jumbo frames. The ping command will test end-to-end connectivity between the source and target hostnames or IPs.
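
Before running the ping test, you can confirm that the interface actually picked up the new MTU. The following check uses the eth0 interface from the example above; the output line is illustrative:

# ip link show eth0 | grep -i mtu
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000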

Once jumbo frames are enabled for the OS and network switches, perform a simple ping test with the -M do and -s options to test jumbo frame connectivity from end to end:

$ ping -M do -s 8972 -c 2 dalrac01a-priv
$ ping -M do -s 8972 -c 2 dalrac01b-priv
$ ping -M do -s 8972 -c 2 dalrac02a-priv
$ ping -M do -s 8972 -c 2 dalrac02b-priv
PING dalrac01a (10.17.33.31) 8972(9000) bytes of data.
8980 bytes from dalrac01a-priv (10.17.33.31): icmp_seq=1 ttl=64 time=0.017 ms
8980 bytes from dalrac01a-priv (10.17.33.31): icmp_seq=2 ttl=64 time=0.018 ms

The -s option specifies the packet size. You can specify a payload of only 8,972 bytes because the remaining 28 bytes of the 9,000-byte MTU are header information: 20 bytes of IP header and 8 bytes of ICMP header. Sending a larger packet results in an error from the ping command because the -M do option prohibits fragmentation. If you don't specify the packet size with the -s option, the ping command sends its default payload, which happens to be 56 bytes, excluding the ICMP and IP header bytes. The example also specifies the -c option to send just two iterations of the ping command.

The general recommendation is to enable jumbo frames for the private network interfaces for RAC configurations. When you are dealing with NFS or iSCSI disks, setting jumbo frames on the network associated with the dedicated network for NFS or iSCSI traffic significantly improves performance.

You can take advantage of the cluvfy command with the healthcheck option to verify whether jumbo frames are configured on the RAC database server. To check adherence to best practices, which include the setup of jumbo frames, add the -bestpractice option to the healthcheck check.

Here's a short snippet of the syntax and high-level output of the cluvfy command with the healthcheck and -bestpractice options:

$ cluvfy comp healthcheck -collect cluster -bestpractice -deviations
Verifying OS Best Practice
Verifying Hardware Clock synchronization at shutdown ...warning
Verifying Clusterware Best Practice
Verifying Ethernet Jumbo Frames ...warning
Verifying disk free space for Oracle Clusterware home "/u01/app/12.1.0/grid" ...passed
...
...
...

How It Works

In a nutshell, jumbo frames are Ethernet frames that carry more than 1,500 bytes of payload. With jumbo frames, you can effectively increase the Ethernet Maximum Transmission Unit (MTU) from the IEEE 802 specification of 1,500 bytes to a value of up to 9,000 bytes. When an application such as the Cluster Interconnect or the Parallel Query engine sends a message greater than 1,500 bytes, the message is fragmented into 1,500-byte or smaller frames as it travels from one endpoint to another. By setting the MTU size to 9,000 bytes, you can improve network throughput: because each packet is larger, fewer packets are sent across the network for the same application data, resulting in faster transfers and less CPU overhead on both the transmitting and receiving servers.

Jumbo frames should be enabled as part of your standard for the Oracle RAC interconnect. As mentioned earlier, jumbo frames have to be enabled from end to end; not doing so correctly results in suboptimal performance. Jumbo frames are not just for RAC; other use cases include running Oracle database files over NFS or iSCSI protocols. Enabling jumbo frames for databases running on NFS or iSCSI significantly improves performance. If your RMAN backups go to a distributed NFS, you should definitely run jumbo frames between the database server and the network attached storage (NAS) server.

Make sure that you are running Oracle's Direct NFS client to maximize all the performance and configuration benefits. For RAC interconnect traffic, network interfaces correctly configured with jumbo frames improve performance by reducing the TCP and UDP overhead that occurs when large messages have to be broken up into the smaller frames of standard Ethernet. Properly setting jumbo frames is especially important if you are working with 10Gb Ethernet (10GbE) or faster network interfaces.

13-4. Determining and Implementing the Right I/O Scheduler

Problem

You want to maximize I/O potential for Oracle databases by choosing the right I/O scheduler.

Solution

Starting with Red Hat/Oracle Linux 5, you can dynamically modify the I/O scheduler for block devices without a server reboot. For best performance, Oracle recommends the deadline scheduler for devices that service heavy, I/O-intensive database workloads; it is highly recommended for Oracle databases to achieve higher throughput and lower latency. For flash disks or solid-state drives (SSDs), set the I/O scheduler to noop. You can dynamically set the I/O scheduler by writing to the /sys/block/BLOCKDEVICE_NAME/queue/scheduler file. As the root or privileged user, execute the following command for each of the disks:

# echo deadline > /sys/block/BLOCKDEVICE1/queue/scheduler
# echo deadline > /sys/block/BLOCKDEVICE2/queue/scheduler

Setting the I/O scheduler with the preceding syntax for block devices will not persist after a server reboot. Changes can be made permanent by adding the boot parameter elevator=deadline (or elevator=noop for flash disks and SSDs) to the active kernel line in /etc/grub.conf. Here's a sample grub.conf file that demonstrates how the configuration can be set at server startup:

default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-238.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-238.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet elevator=deadline
        initrd /initrd-2.6.18-238.el5.img

If you are running the Oracle Unbreakable Enterprise Kernel (UEK), the settings for /etc/grub.conf for the I/O deadline scheduler are automatically set as default settings.
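
As an alternative to the grub boot parameter, the scheduler can be persisted on a per-device basis with a udev rule. The following is a minimal sketch; the rule file name and the sd[a-z] device match are assumptions that you would adapt to your device naming (for example, multipath devices):

# cat /etc/udev/rules.d/60-io-scheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"
# udevadm control --reload-rules
# udevadm trigger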

The following one-liner script can be leveraged to validate that disks are correctly configured with the right I/O scheduler. Notice from the output that the block devices are set with the deadline scheduler:

$ find /sys/block/*/queue -name scheduler -exec sh -c 'echo -n "$0 : "; cat $0' {} \; | tail -10
/sys/block/sda/queue/scheduler : noop anticipatory [deadline] cfq
/sys/block/sdb/queue/scheduler : noop anticipatory [deadline] cfq
/sys/block/sdc/queue/scheduler : noop anticipatory [deadline] cfq
/sys/block/sdd/queue/scheduler : noop anticipatory [deadline] cfq
/sys/block/sde/queue/scheduler : noop anticipatory [deadline] cfq
/sys/block/sdf/queue/scheduler : noop anticipatory [deadline] cfq
/sys/block/sdg/queue/scheduler : noop anticipatory [deadline] cfq
/sys/block/sdh/queue/scheduler : noop anticipatory [deadline] cfq
/sys/block/sdi/queue/scheduler : noop anticipatory [deadline] cfq
/sys/block/sr0/queue/scheduler : noop anticipatory deadline [cfq]

How It Works

With Red Hat and Oracle Linux, different I/O schedulers are available, with options suited to perform better under heavy Oracle database workload conditions. The default I/O scheduler shipped with Red Hat is the Completely Fair Queuing (CFQ) scheduler, which provides a good compromise between latency and throughput. The default I/O scheduler that comes with the Oracle UEK is the deadline scheduler. The noop I/O scheduler is ideal for SSDs or flash-based storage, where there is no mechanical read/write head whose movement affects application performance. The anticipatory I/O scheduler is similar to the deadline scheduler, but it is heuristic; it can improve performance, but it can also decrease it. The deadline I/O scheduler puts hard limits on latency and guarantees a start service time for each request.

For databases running on virtualized environments such as VMware, the general recommendation is to set the I/O scheduler to noop. In a virtualized infrastructure, the hypervisor also performs I/O scheduling and optimizations. You don’t want both the hypervisor and the guest OS (VM) to perform I/O scheduling; the guest OS should relinquish I/O scheduling to the hypervisor.

13-5. Setting Pertinent Kernel Parameters for Oracle Databases

Problem

You want to optimize pertinent Linux kernel parameters to effectively run the Oracle database(s).

Solution

This recipe reviews the kernel parameters in the /etc/sysctl.conf file that are most relevant to Oracle databases. In your environment, if the current value of a parameter is higher than the value listed below, do not lower it. For range parameters (such as net.ipv4.ip_local_port_range, exposed as /proc/sys/net/ipv4/ip_local_port_range), the configured range should cover at least the recommended range.

The /proc/sys/net/ipv4/ip_local_port_range value defines the local port range that is used by TCP and UDP traffic. You have to set this parameter with two numbers: the first number represents the first local port allowed for TCP and UDP traffic on the server, and the second represents the last local port number. In previous releases of Oracle, the recommended values for net.ipv4.ip_local_port_range were 1024 and 65500.

Parameters such as SHMMAX should be adjusted according to the amount of physical memory on the database server. A good starting value is 1/2 to 2/3 of physical memory or more, depending on how much physical memory the server has, how much PGA is needed, and the number of dedicated server processes expected. SHMALL is derived from the physical RAM size divided by the page size. The key thing to note is that the value of SHMMAX is set in bytes, but the value of SHMALL is set in pages. To determine the page size for a system, execute getconf, as shown here:

# getconf PAGE_SIZE
4096

In the following example, you opt to allocate approximately 66% of the physical memory (approximately 170GB) for SHMALL on a server with 256GB of RAM. You can use the following equation to derive the SHMALL value (in 4,096-byte pages):

kernel.shmall = 170 * 1024 * 1024 * 1024 / 4096 = 44564480
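
The following is a small sketch in the same spirit; the 66% allocation is an assumption carried over from the example above, and the script simply prints starting values for kernel.shmmax (in bytes) and kernel.shmall (in pages) derived from the physical memory and page size of the server:

#!/bin/ksh
# Derive starting values for kernel.shmmax (bytes) and kernel.shmall (pages)
PAGE_SIZE=$(getconf PAGE_SIZE)
MEM_KB=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
SHMMAX=$(awk -v kb=$MEM_KB 'BEGIN {printf "%d", kb * 1024 * 0.66}')
SHMALL=$(awk -v bytes=$SHMMAX -v page=$PAGE_SIZE 'BEGIN {printf "%d", bytes / page}')
echo "kernel.shmmax = $SHMMAX"
echo "kernel.shmall = $SHMALL"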

Here's a comprehensive list of the kernel parameters modified by the oracle-rdbms-server-12cR1-preinstall RPM:

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

# oracle-rdbms-server-12cR1-preinstall setting for fs.file-max is 6815744
fs.file-max = 6815744

# oracle-rdbms-server-12cR1-preinstall setting for kernel.sem is ’250 32000 100 128’
kernel.sem = 250 32000 100 128

# oracle-rdbms-server-12cR1-preinstall setting for kernel.shmmni is 4096
kernel.shmmni = 4096

# oracle-rdbms-server-12cR1-preinstall setting for kernel.shmall is 1073741824 on x86_64

# oracle-rdbms-server-12cR1-preinstall setting for kernel.shmmax is 4398046511104 on x86_64
kernel.shmmax = 4398046511104

# oracle-rdbms-server-12cR1-preinstall setting for net.core.rmem_default is 262144
net.core.rmem_default = 262144

# oracle-rdbms-server-12cR1-preinstall setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304

# oracle-rdbms-server-12cR1-preinstall setting for net.core.wmem_default is 262144
net.core.wmem_default = 262144

# oracle-rdbms-server-12cR1-preinstall setting for net.core.wmem_max is 1048576
net.core.wmem_max = 1048576

# oracle-rdbms-server-12cR1-preinstall setting for fs.aio-max-nr is 1048576
fs.aio-max-nr = 1048576

# oracle-rdbms-server-12cR1-preinstall setting for net.ipv4.ip_local_port_range is 9000 65500
net.ipv4.ip_local_port_range = 9000 65500

You have to adjust the settings for SHMALL and SHMMAX according to SGA requirements. The other kernel settings set by the oracle-rdbms-server-12cR1-preinstall RPM should be adequate for a majority of database servers. If you have requirements for a large number of concurrent dedicated sessions, you may have to adjust the settings for semmsl and semmns.

How It Works

This section focuses on the parameters mentioned in the Solution of this recipe. Table 13-1 describes all the relevant kernel parameters for Oracle databases:

Table 13-1. Pertinent Kernel Parameters for Oracle Databases

kernel.shmall: Represents the maximum total shared memory, in pages (typically 4KB each).

fs.file-max: Represents the maximum number of open files.

kernel.sem: Contains four values (in order): SEMMSL, SEMMNS, SEMOPM, and SEMMNI:

    •  semmsl specifies the maximum number of semaphores per set.

    •  semmns specifies the maximum number of semaphores system-wide.

    •  semopm specifies the maximum operations per semop call.

    •  semmni specifies the maximum number of semaphore sets.

kernel.shmmni: Represents the maximum number of shared memory segments.

kernel.shmmax: Represents the maximum size of a single shared memory segment, in bytes. We recommend a starting value of 1/2 to 2/3 of the size of physical memory and going as high as 90% on some of the larger enterprise servers.

net.core.rmem_default: Represents the default OS receive buffer size.

net.core.rmem_max: Represents the maximum OS receive buffer size.

net.core.wmem_default: Represents the default OS send buffer size.

net.core.wmem_max: Represents the maximum OS send buffer size.

fs.aio-max-nr: Represents the maximum number of concurrent outstanding asynchronous I/O requests.

net.ipv4.ip_local_port_range: Represents the range of local ports to be used for client connections.

For Oracle Database 11g and later, set the following network-related kernel parameters in the /etc/sysctl.conf file:

  • net.core.rmem_default to 262144
  • net.core.wmem_default to 262144
  • net.core.rmem_max to 4194304
  • net.core.wmem_max to 1048576

In Linux, the kernel is designed to overcommit memory beyond its physical memory to make memory usage more efficient. The overcommit model sometimes becomes problematic when all available memory, including disk swap space, is consumed. When this state is reached, the kernel will start killing processes to stay operational. It is the job of the Linux out-of-memory (OOM) killer to sacrifice one or more processes to free up memory for the system.

For Red Hat/Oracle Linux 5 customers, it is important to set the vm.min_free_kbytes kernel parameter to protect against the OOM killer condition. In this example, you set it to a value of 51200KB (50MB), which tells the kernel to reserve 50MB of memory at all times. To activate the new settings in the running kernel, run the sysctl -p command as root.
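
For example, the following commands (assuming the 50MB reservation described above) append the setting to /etc/sysctl.conf and activate it in the running kernel:

# echo "vm.min_free_kbytes = 51200" >> /etc/sysctl.conf
# sysctl -p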

For Red Hat/Oracle Linux 6 customers, set the panic_on_oops kernel parameter. When the kernel encounters a fatal bug (an oops), the server can be left in an unpredictable state. If this parameter is set to 0, the kernel attempts to keep running, with unpredictable consequences; if it is set to 1, the kernel enters a panic state and shuts down/reboots. As the root user, execute the following command:

# echo 1 > /proc/sys/kernel/panic_on_oops

To make this kernel parameter change permanent, modify the /etc/sysctl.conf file and add the following entry:

kernel.panic_on_oops = 1

Optimizing virtual memory requires changes to a set of kernel parameters that impact Oracle database performance by affecting the rate at which virtual memory is consumed and released: vm.swappiness, vm.dirty_background_ratio, vm.dirty_ratio, vm.dirty_expire_centisecs, and vm.dirty_writeback_centisecs.

Some enterprise Linux SAs don't like the idea of disabling swap. The vm.swappiness parameter determines how aggressively memory pages are swapped to disk. The value can range from 0 to 100, and the default value is 60; the higher the value, the more aggressively inactive pages are swapped out of physical memory. We recommend setting the vm.swappiness parameter to 0, which is also the Oracle-recommended value for Red Hat 6. You can monitor swap usage with native commands such as top, free, and vmstat.

The vm.dirty_background_ratio parameter dictates the percentage of memory at which the pdflush background write-back daemon starts writing out dirty data. The default value is 10; the Oracle-recommended value is 3.

The vm.dirty_ratio parameter dictates the percentage of total pages at which a process that is generating disk writes starts writing out dirty data itself. This is the ratio at which dirty pages created by application disk writes are flushed out to disk. The default value is 20; the Oracle-recommended value is 80.

The vm.dirty_expire_centisecs parameter dictates when dirty in-memory data is old enough to be eligible for writeout by the kernel flusher threads. Dirty in-memory data older than the value of this parameter will be written out the next time a flusher thread wakes up. The default value is 3000, expressed in hundredths of a second; the Oracle-recommended value is 500.

The vm.dirty_writeback_centisecs parameter dictates the interval at which writes of dirty in-memory data are flushed out to disk. The default value is 500, expressed in hundredths of a second; the Oracle-recommended value is 100.

In the /etc/sysctl.conf file, the following Linux kernel parameters can be set for database servers:

vm.swappiness = 0
vm.dirty_background_ratio = 3
vm.dirty_ratio = 80
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100

With different database workloads, you can tweak these kernel parameters as needed.

13-6. Configuring NTP for Oracle

Problem

You want to configure network time protocol (NTP) to keep your server clock in sync with all the servers.

Solution

You should synchronize the system time between your RAC nodes, and even between your primary and standby database servers, by enabling the ntp daemon. The daemon should be configured with the -x option to accept gradual time changes, also referred to as slewing. Slewing is mandatory for RAC and is also recommended for Data Guard configurations. To set up ntpd with the -x option, modify the /etc/sysconfig/ntpd file, add the flag to the OPTIONS variable, and then restart the service with the service ntpd restart command:

# Drop root to id ’ntp:ntp’ by default.
#OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
# SYNC_HWCLOCK=no

Some SAs choose to update the system clock by executing the ntpdate command via a scheduled job in cron, which does a brute-force update of the system clock: if the system clock is off by 40 minutes, the time is immediately corrected. For RAC environments, setting the system clock with ntpdate can cause problems. As time is instantly caught up or sudden variances in time are detected, the clusterware can evict nodes.

After you modify the ntpd file with the -x (slewing) option, you have to push the file across all the database servers:

# for i in rac02 rac03 rac04; do scp ntpd ${i}:$PWD; done

ntpd                                          100%  255     0.3KB/s   00:00
ntpd                                          100%  255     0.3KB/s   00:00
ntpd                                          100%  255     0.3KB/s   00:00

You can check your current NTP configuration by checking the process status and filtering on the ntp daemon. In the following example, the ntpd service starts and checks to confirm that the settings are correct with the ps command:

[root@rac1 sysconfig]# service ntpd start
Starting ntpd:                                             [  OK  ]

[root@rac1 sysconfig]# ps -ef |grep -i ntp
ntp       3496     1  0 10:38 ?        00:00:00 ntpd -x -u ntp:ntp -p /var/run/ntpd.pid
root      3500  2420  0 10:39 pts/1    00:00:00 grep -i ntp

Configure the NTP process for time synchronization on the host so that it restarts after a server reboot. After enabling the ntpd daemon, execute the chkconfig command to validate that it is set up to start at the appropriate run levels:

# chkconfig ntpd on
# chkconfig --list | grep ntpd
ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
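
Once ntpd has been running for a few minutes, you can also verify that the server is actually synchronizing against its configured time sources with the ntpq -p command. The peer names, delays, and offsets below are illustrative; the asterisk marks the currently selected time source:

# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.example.com 192.0.2.10       2 u   45   64  377    0.451    0.082   0.031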

How It Works

Time consistency across database servers is essential for every aspect of managing, troubleshooting, running, and debugging database events because everything is centered on time. You have to make sure that you have a point of reference when you are reviewing incidents and logfiles. You have to maintain an accurate system clock with NTP so that the OS time on the application server and the database server is the same. If you are also leveraging database links, you have to make sure that the source server and the remote server times are in sync.

Starting with Oracle Clusterware 11g Release 2 (11.2), Oracle provides another option for time synchronization, intended for customers who can't leverage NTP services: Oracle Cluster Time Synchronization Service (ctssd). If you leverage NTP, Oracle ctssd starts up only in observer mode. If you don't leverage NTP, ctssd starts up in active mode and synchronizes time across all the RAC nodes. If you plan to leverage ctssd, deactivate NTP with the following commands:

# service ntpd stop
# chkconfig ntpd off

To confirm the mode in which the ctss daemon (ctssd) is working, use the following command:

$ crsctl check ctss

13-7. Bonding Network Interfaces

Problem

You want to pair two network interface cards (NICs) to increase bandwidth and provide high availability.

Solution

This recipe reviews the process of combining multiple network interfaces (known as channel bonding or Ethernet bonding) to provide redundancy for your database server. Bonding NICs is synonymous with port trunking and is relatively straightforward. First, you have to configure the Linux bonding driver: in Red Hat 5, you modify the /etc/modprobe.conf file to enable the bonding driver; in Red Hat 6, you modify the /etc/modprobe.d/bonding.conf file. You must add entries for each of the logical interfaces that resemble the following:

alias bond0 bonding
alias bond1 bonding
options bonding miimon=100 mode=1

A mode value of 0 indicates that you want a balanced round robin. A mode value of 1 indicates that you want an active backup for fault protection, in which only one of the slave interfaces is active at a given time; a different slave interface becomes active only when the active slave fails. A mode value of 5 specifies adaptive transmit load balancing: outgoing traffic is distributed according to the load on each slave interface, and if one of the devices fails, another device assumes responsibility and completes the network request. This configuration is popular because network switch support is not required. The miimon value of 100 specifies the interval, in milliseconds, at which the link is checked for failure.

In this particular solution, you are adding two bonded interfaces: one for the private interconnect and the other for the public network. You also have four network interfaces: eth0, eth1, eth2, and eth3.

If you have not bonded network interfaces before, most likely the bonding module is not loaded into the kernel. As root, execute the insmod bonding.ko command from the /lib/modules/$(uname -r)/kernel/drivers/net/bonding directory to insert the module into the kernel. To confirm that the bonding module is loaded, you can leverage the lsmod command piped to the grep command, as shown here, to provide the status of the module in the kernel:

# lsmod |grep -i bonding
bonding     65128  0

Once you confirm that the bonding module is loaded into the kernel, you can proceed by configuring the logical interfaces by creating or modifying two configuration files in the /etc/sysconfig/network-scripts directory: ifcfg-bond0 and ifcfg-bond1. The entries for ifcfg-bond0 look like this for the private network:

DEVICE=bond0
IPADDR=192.168.1.20
NETWORK=192.168.1.0
NETMASK=255.255.255.0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes

You must modify the ifcfg-eth0 and ifcfg-eth1 files, which are the NICs for ifcfg-bond0. Start by modifying the ifcfg-eth0 file with these settings:

DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

Similarly, you can modify the ifcfg-eth1 file so it looks like what is shown here:

DEVICE=eth1
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

Now you have to repeat the procedures described earlier to configure the ifcfg-bond1 interface for the public network interface. The ifcfg-bond1 interface file has to resemble this:

DEVICE=bond1
IPADDR=72.99.67.100
NETWORK=72.99.67.0
NETMASK=255.255.255.0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes

The key differences between ifcfg-bond0 and ifcfg-bond1 are the IPADDR, NETWORK, and NETMASK lines. After the ifcfg-bond1 file is created, you can proceed to modify the ifcfg-eth2 and ifcfg-eth3 files. You can create these two files to look like ifcfg-eth0 and ifcfg-eth1 and modify the DEVICE and MASTER names accordingly.

To enable the newly configured bonded network, you have to bounce the networking services. You can shut down all the interfaces with the service network stop command. As the final step, you have to start the bonded network interfaces by executing the service network start command.
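
After the network services are restarted, you can confirm that the bond came up in the expected mode and that both slave interfaces joined it by inspecting the bonding driver's status file. The output below is abbreviated and illustrative:

# cat /proc/net/bonding/bond0
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Slave Interface: eth0
MII Status: up
Slave Interface: eth1
MII Status: up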

How It Works

The Linux kernel comes with a bonding module that provides NIC teaming capabilities. The kernel bonding module teams multiple physical interfaces to a single logical interface.

Bonding or pairing a network is an important concept for RAC. Network interfaces that are not bonded are a single point of failure. Just as every other component of the RAC is built for redundancy, the network infrastructure must be, too.

In the /etc/modprobe.conf file, you specified options bonding miimon=100 mode=1. The miimon parameter, which stands for Media Independent Interface Monitor, represents the frequency of link monitoring. The value for miimon is specified in milliseconds (ms) and defaults to 0, which disables link monitoring.

The mode parameter specifies the type of configuration to be deployed. A value of 0, which is the default, indicates that a round-robin policy will be implemented, and each of the interfaces will take turns servicing requests. You can use a round-robin policy for load balancing. A value of 1 indicates that an active backup policy will be deployed. In an active backup policy, only one slave in the bond is active. One and only one device will transmit at any given moment. A value of 6 indicates adaptive load balancing.

In the ifcfg-eth[x] files, the MASTER parameter indicates the logical interface to which the particular NIC belongs. The SLAVE parameter indicates that the participating NIC is a member of a bonded interface. A slave can belong to only one master.

13-8. Enabling Network Services Cache Daemon (nscd)

Problem

You want to enable nscd to better tolerate network failures associated with NAS devices or NFS mount points.

Solution

To enable nscd, start by modifying the /etc/nscd.conf configuration file. You have to disable the nscd caching options for passwd, group, and netgroup by setting the enable-cache lines to no, as shown here:

# cat /etc/nscd.conf |grep enable-cache |grep -v ^# |sort -k3
enable-cache      netgroup   no
enable-cache      group      no
enable-cache      passwd     no
enable-cache      services   yes
enable-cache      hosts      yes

As the root or privileged user, start the nscd service with the service nscd start command. You have to enable nscd to start when the system is rebooted by issuing the following chkconfig command:

# chkconfig --level 345 nscd on

To confirm that nscd is enabled, issue the nscd command with the -g parameter, which prints the current configuration statistics. To see all valid options for nscd, pass the -? parameter.

If you performed a minimal OS install, you have to manually install nscd with the following yum command:

# yum -y install nscd

How It Works

nscd is a small-footprint daemon that provides a caching facility for the most common name service requests. nscd can help ride out network hiccups, and only minimal changes are needed to enable it. Oracle database servers can house heavy network-based workloads and issue lots of name lookups over NFS or in a clustered environment. The goal is to reduce the latency of name service requests and reduce the impact on a shared infrastructure with nscd. You can also expect performance improvements when using naming services such as DNS, NIS, NIS+, and LDAP.

13-9. Aligning Disk Partitions Correctly

Problem

Many DBAs are not aware that they need to create a properly aligned partition on the database disks. Correctly aligned partitions improve the performance of database workloads.

Solution

There are several tools for properly configuring partition alignment: parted, fdisk, or sfdisk. In this example, you will focus on the parted partition editor to create a partition on a LUN that will house database files for Oracle.

Note  In Oracle Database 11g, the maximum size of an Oracle ASM disk can't be larger than 2TB. Starting with Oracle Database 12c, the maximum size of an ASM disk can be up to 32 petabytes (PB) for an allocation unit (AU) size of 8MB. For other AU sizes, the maximum ASM disk sizes are 4PB for a 1MB AU, 8PB for a 2MB AU, and 16PB for a 4MB AU.

The examples use the parted command because parted can handle disk sizes larger than 2TB. For examples of leveraging the sfdisk command, please visit http://www.dbaexpert.com/blog/partition-alignment-with-sfdisk/. You can add, remove, clone, or modify partitions on a disk with the parted command. This example creates a partition with a 1MB alignment using the GNU parted executable:

# /sbin/parted -s /dev/sdb mklabel gpt mkpart /dev/sdb1 asm1 2048s 32.0GB

In the preceding example, quite a number of parameters are passed. You start with the -s option, which specifies that you want to script the command-line options instead of using the interactive prompt. You also pass the device name that you are manipulating, followed by the gpt value for the label type, which is the GUID partition table; the mklabel option creates a new disk label for the partition table. The mkpart option tells the parted command to create a partition on the specified device with the partition name, file system type, and start and end points for the partition. The start and end points can be specified in sectors or in megabytes. The parameter 2048s represents 2,048 sectors, which equates to 1MB because each sector represents 512 bytes. In the example, you chose to create a partition of 32GB in size. In your environment, you will probably create the partition to the maximum size of the LUN.

After you create partitions for the ASM disks, double-check the settings to ensure that the partition alignment is set up correctly. To check for partition alignment on an existing device, use the following:

# /sbin/parted -s /dev/sdb print
Model: VMware, VMware Virtual S (scsi)
Disk /dev/sdb: 34.4GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name       Flags
 1      1049kB  32.0GB  32.0GB               /dev/sdb1
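
To double-check the alignment itself, print the partition table in sectors and confirm that the starting sector is a multiple of 2,048 (a 1MB boundary with 512-byte sectors). On parted versions that support it, the align-check subcommand reports the result directly; partition number 1 is assumed here:

# /sbin/parted -s /dev/sdb unit s print
# /sbin/parted -s /dev/sdb align-check optimal 1
1 aligned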

How It Works

Misaligned ASM disks can cause suboptimal performance for Oracle database workloads. By default on Linux, the first 63 sectors are reserved for the master boot record (MBR), so the first data partition starts at an offset of 31.5KB, which is derived from 63 sectors multiplied by 512 bytes. The offset of 31.5KB can cause misalignment with many storage arrays' cache or RAID stripe boundaries, causing suboptimal performance due to overlapping I/Os. You want the partition offset to be 1MB or 4MB for Oracle databases.

Most importantly, use parted instead of fdisk or even sfdisk because you can create partitions for disks larger than 2TB with parted. Prior to Oracle Database 12c, you had the flexibility to use sfdisk because an ASM LUN couldn’t be larger than 2TB in size. As of Oracle Database 12c, the 2TB limitation for LUN size is lifted, and you have to start migrating to commands such as parted.
