Digging deeper with perf

In most cases, working through this short checklist will help you track down the majority of problems quickly and efficiently. However, even the information extracted from the database engine is sometimes not enough.

The perf tool is a profiling tool for Linux that allows you to see directly which C functions are consuming CPU time on your system. It is usually not installed by default, so you will probably have to install it first. To use perf on your server, log in as root and run the following command:

perf top

The screen will refresh itself every couple of seconds, and you will have a chance to see what is going on live. The following listing shows you what a standard, read-only benchmark might look like:

Samples: 164K of event 'cycles:ppp', Event count (approx.): 109789128766 
Overhead Shared Object Symbol
3.10% postgres [.] AllocSetAlloc
1.99% postgres [.] SearchCatCache
1.51% postgres [.] base_yyparse
1.42% postgres [.] hash_search_with_hash_value
1.27% libc-2.22.so [.] vfprintf
1.13% libc-2.22.so [.] _int_malloc
0.87% postgres [.] palloc
0.74% postgres [.] MemoryContextAllocZeroAligned
0.66% libc-2.22.so [.] __strcmp_sse2_unaligned
0.66% [kernel] [k] _raw_spin_lock_irqsave
0.66% postgres [.] _bt_compare
0.63% [kernel] [k] __fget_light
0.62% libc-2.22.so [.] strlen

You can see that no single function takes too much CPU time in our sample, which tells us that the system is just fine.
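The check described above (does any single function dominate the profile?) can be sketched in a few lines of Python. This is only an illustration: the function names parse_perf_rows and hottest are made up for this example, the 20 percent threshold is an arbitrary assumption rather than an official limit, and the sample is a shortened copy of the listing shown earlier.

```python
import re

# Matches a perf top row such as "3.10% postgres [.] AllocSetAlloc".
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+\[(.)\]\s+(\S+)")

def parse_perf_rows(text):
    """Return (overhead_percent, shared_object, symbol) for each data row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header lines and blank lines simply do not match
            overhead, shared_object, _kind, symbol = m.groups()
            rows.append((float(overhead), shared_object, symbol))
    return rows

def hottest(rows):
    """Return the row with the largest overhead."""
    return max(rows, key=lambda r: r[0])

# Shortened copy of the listing above.
SAMPLE = """\
Overhead Shared Object Symbol
3.10% postgres [.] AllocSetAlloc
1.99% postgres [.] SearchCatCache
1.51% postgres [.] base_yyparse
0.66% [kernel] [k] _raw_spin_lock_irqsave
"""

THRESHOLD = 20.0  # assumption: treat one symbol above 20% as suspicious

rows = parse_perf_rows(SAMPLE)
top = hottest(rows)
dominated = top[0] >= THRESHOLD
print(f"hottest symbol: {top[2]} ({top[0]:.2f}%), dominated: {dominated}")
```

With the sample above, the hottest symbol is AllocSetAlloc at roughly 3 percent, so nothing is flagged.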

However, this may not always be the case. A fairly common problem is spinlock contention. Spinlocks are used by the PostgreSQL core to synchronize things such as buffer access. A spinlock is a lightweight lock built on atomic CPU instructions: a waiting process keeps retrying in a tight loop rather than asking the operating system to put it to sleep, which pays off for small operations (such as incrementing a number). If you are facing spinlock contention, the typical symptoms are as follows:

  • Really high CPU load.
  • Incredibly low throughput (queries that usually take milliseconds suddenly take seconds).
  • I/O is usually low, because the CPU is busy trading locks.
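To see why these symptoms go together, here is a toy spinlock in Python. It is not how PostgreSQL implements spinlocks (those are written in C on top of hardware instructions); the SpinLock class is hypothetical, and threading.Lock merely stands in for the atomic flag. The point is only that a waiter burns CPU in a loop instead of sleeping.

```python
import threading

class SpinLock:
    """Toy spinlock: a waiter loops until the flag is free instead of sleeping."""

    def __init__(self):
        # threading.Lock stands in for the atomic test-and-set flag
        # that a real spinlock builds on a CPU instruction.
        self._flag = threading.Lock()
        self.spins = 0  # failed attempts; a rough count, not synchronized

    def acquire(self):
        # Non-blocking attempt: returns True or False immediately.
        while not self._flag.acquire(blocking=False):
            self.spins += 1  # CPU time burned without doing useful work

    def release(self):
        self._flag.release()


lock = SpinLock()
counter = 0

def worker(iterations):
    global counter
    for _ in range(iterations):
        lock.acquire()
        counter += 1  # the small operation being protected
        lock.release()

threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"counter={counter}, wasted spins={lock.spins}")
```

The counter always ends up correct, but every failed attempt counted in spins is CPU time that produced nothing. When many processes fight over the same lock, that wasted share grows, which is what high CPU load combined with low throughput looks like from the outside.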

In many cases, spinlock contention happens suddenly. Your system is just fine, and all of a sudden, the load goes up and the throughput drops like a stone. The perf top command will reveal that most of the time is spent in a C function called s_lock. If this is the case, take a look at the following setting in postgresql.conf:

huge_pages = try               # on, off, or try

Change huge_pages from try to off. It can also be a good idea to turn off huge pages altogether at the operating system level. In general, it seems that some kernels are more prone to producing these kinds of problems than others. The 2.6.32 kernel series shipped by Red Hat seems to be especially bad (note that I have used the word seems here).
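If you want to script the configuration change described above, a minimal sketch could look like this. It works on the text of a configuration file held in a string; the helper name set_huge_pages is made up for this example, and the shared_buffers line in the sample is only there as filler.

```python
import re

def set_huge_pages(conf_text, value="off"):
    """Return conf_text with the huge_pages setting changed to `value`."""
    # Matches both an active line and a commented-out "#huge_pages = ..." line.
    pattern = re.compile(r"^([ \t]*)#?[ \t]*huge_pages[ \t]*=[ \t]*\S+", re.MULTILINE)
    new_text, count = pattern.subn(rf"\g<1>huge_pages = {value}", conf_text)
    if count == 0:
        # The setting is not mentioned at all, so append it.
        new_text = conf_text.rstrip("\n") + f"\nhuge_pages = {value}\n"
    return new_text

# Hypothetical file content; only the huge_pages line comes from the text above.
sample = (
    "shared_buffers = 128MB\n"
    "huge_pages = try               # on, off, or try\n"
)

updated = set_huge_pages(sample)
print(updated)
```

The trailing comment on the line is left untouched. Keep in mind that huge_pages is only read when the server starts, so the change takes effect after a restart, not after a reload.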

The perf tool is also interesting if you are using PostGIS. If the top functions in the list are all GIS-related (as in, from some underlying library), you know that the problem is most likely not coming from bad PostgreSQL tuning, but is simply related to expensive operations that take time to complete.
