9.10. Run-time library functions

In addition to the directives described in previous sections, OpenMP provides a set of run-time library functions. It includes run-time execution functions and lock functions. The run-time functions allow an application to specify the mode in which to run. An application developer may wish to maximize the throughput performance of the system rather than minimize the time to completion of an individual program. In such cases, the developer may tell the system to dynamically set the number of threads used to execute parallel regions. This can have a dramatic effect on the throughput performance of a system with only a minimal impact on the time to completion for a program.

The run-time functions also allow a developer to specify when to enable nested parallelism. Enabling nested parallelism allows the implementation to create additional teams of threads when it encounters nested parallel constructs. Conversely, by disabling nested parallelism, a developer can write a parallel library that behaves in an easily predictable fashion, whether it is called from within or outside a parallel region.

For further information about these library functions, please refer to the “Parallel Processing Support” section of the VisualAge C++ for AIX Compiler Reference, SC09-4959.

9.10.1. Execution environment functions

This section describes the following important OpenMP run-time functions supported on AIX. These functions affect and monitor threads, processors, and the parallel environment:

  • omp_set_num_threads

  • omp_get_num_threads

  • omp_get_max_threads

  • omp_get_thread_num

  • omp_get_num_procs

  • omp_set_dynamic

  • omp_get_dynamic

  • omp_in_parallel

  • omp_set_nested

  • omp_get_nested

An example program using these functions is provided in 9.10.3, “Example usage of run-time library functions” on page 370.

omp_set_num_threads

The omp_set_num_threads() function sets the default number of threads to use for subsequent parallel regions that do not specify a num_threads clause. The syntax is as follows:

void omp_set_num_threads(int num_threads);

The dynamic threads mechanism modifies the effect of this routine.

  • Enabled: Specifies the maximum number of threads that can be used for any parallel region.

  • Disabled: Specifies the exact number of threads to use until the next call to this routine.

This routine can only be called from the serial portions of the code. This call has precedence over the OMP_NUM_THREADS environment variable (see 9.11, “Environment variables” on page 373).
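
The following minimal sketch (not part of the Redbook examples; the function name and the thread count of four are arbitrary) shows the typical calling pattern: the routine is called from serial code to set the thread count for the parallel region that follows.

#include <omp.h>

void compute(void)
{
    omp_set_num_threads(4);    /* request four threads for subsequent regions */

    #pragma omp parallel
    {
        /* With dynamic threads disabled, the team has exactly four threads; */
        /* with dynamic threads enabled, four is only an upper bound.        */
    }
}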

omp_get_num_threads

The omp_get_num_threads() function returns the number of threads currently in the team executing the parallel region from which it is called. The syntax is as follows:

int omp_get_num_threads(void);

The num_threads clause, the omp_set_num_threads() function, and the OMP_NUM_THREADS environment variable control the number of threads in a team. If the number of threads has not been explicitly set by the user, a default value is chosen; on AIX, the default is the number of processors on the system. This function binds to the closest enclosing parallel directive. If called from a serial portion of a program, or from a nested parallel region that is serialized, this function returns 1.
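
As an illustration of this precedence (a sketch written for this discussion, not taken from the product documentation), a num_threads clause on a particular parallel directive overrides the value set by omp_set_num_threads() for that region only:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);                    /* default for later regions */
    printf("%d\n", omp_get_num_threads());     /* serial region: prints 1   */

    #pragma omp parallel num_threads(2)        /* clause overrides the call */
    {
        #pragma omp single
        printf("%d\n", omp_get_num_threads()); /* prints at most 2          */
    }
    return 0;
}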

omp_get_max_threads

The omp_get_max_threads() function returns the maximum value that can be returned by calls to the omp_get_num_threads() function. The syntax is as follows:

int omp_get_max_threads(void);

This function returns the maximum value, whether executing from a serial region or from a parallel region. If a program uses omp_set_num_threads to change the number of threads, subsequent calls to omp_get_max_threads() will return the new value.

When dynamic thread adjustment is enabled with omp_set_dynamic(), we can use omp_get_max_threads() to allocate data structures that are large enough to hold an entry for every thread that could be used.
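
A hedged sketch of that sizing idiom follows (the helper name alloc_per_thread_sums and the per-thread sum array are illustrations introduced here): one slot is allocated per potential thread before the region is entered, and each thread later indexes its own slot with omp_get_thread_num().

#include <stdlib.h>
#include <omp.h>

double *alloc_per_thread_sums(void)
{
    /* Large enough even if dynamic adjustment later shrinks the team. */
    return (double *)calloc((size_t)omp_get_max_threads(), sizeof(double));
}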

omp_get_thread_num

The omp_get_thread_num() function returns the thread number, within its team, of the thread executing the function. The thread number lies between 0 and omp_get_num_threads() – 1, inclusive. The master thread of the team is thread 0. The syntax is as follows:

int omp_get_thread_num(void);

This function binds to the closest enclosing parallel directive. The function returns zero when called from a serial region or from within a nested parallel region that is serialized.
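
For example, a common pattern is to combine omp_get_thread_num() and omp_get_num_threads() to divide an index range among the team by hand. The sketch below is illustrative only (the array a and its length n are assumptions):

#include <omp.h>

void scale(double *a, int n, double factor)
{
    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int tid      = omp_get_thread_num();
        int chunk    = (n + nthreads - 1) / nthreads;   /* ceiling division */
        int start    = tid * chunk;
        int end      = (start + chunk < n) ? start + chunk : n;
        int i;

        for (i = start; i < end; i++)   /* each thread scales its own block */
            a[i] *= factor;
    }
}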

omp_get_num_procs

The omp_get_num_procs() function returns the number of processors that are available to the program at the time the function is called. The syntax is as follows:

int omp_get_num_procs(void);

omp_set_dynamic

The omp_set_dynamic() function enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. The syntax is as follows:

void omp_set_dynamic(int dynamic_threads);

To obtain the best use of system resources, certain run-time environments automatically adjust the number of threads that are used for executing subsequent parallel regions. This adjustment is enabled only if the dynamic_threads argument evaluates to TRUE (nonzero). If the argument evaluates to FALSE (zero), dynamic adjustment is disabled.

When dynamic adjustment is enabled, the number of threads specified by the user becomes the maximum thread count. The number of threads remains fixed throughout each parallel region and is reported by omp_get_num_threads(). A call to omp_set_dynamic() has precedence over the OMP_DYNAMIC environment variable.

The default for dynamic thread adjustment is implementation dependent; on AIX, dynamic adjustment is enabled by default. Code that depends on a specific number of threads for correct execution should explicitly disable dynamic threads. Implementations are not required to provide the ability to dynamically adjust the number of threads, but they are required to provide the interface in order to support portability across platforms.
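
A minimal sketch of that recommendation is shown below (the routine name and the team size of eight are arbitrary); dynamic adjustment is disabled before the thread count is set so that the requested team size is used exactly.

#include <omp.h>

void needs_exact_team(void)
{
    omp_set_dynamic(0);       /* disable dynamic adjustment of the team size  */
    omp_set_num_threads(8);   /* request exactly 8 threads for the next region */

    #pragma omp parallel
    {
        /* code that relies on a team of exactly eight threads */
    }
}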

omp_get_dynamic

The omp_get_dynamic() function returns a nonzero value if dynamic adjustment of threads is enabled, and returns 0 otherwise. The syntax is as follows:

int omp_get_dynamic(void);

If an implementation does not support dynamic adjustment of the number of threads, this function always returns 0.

omp_in_parallel

The omp_in_parallel() function returns a nonzero value if it is called within the dynamic extent of a parallel region executing in parallel; otherwise, it returns 0. The syntax is as follows:

int omp_in_parallel(void);

The omp_in_parallel() function determines whether a region is executing in parallel. A parallel region that is serialized is not considered to be a region executing in parallel.
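
This test is what lets a library routine choose between spawning its own parallel region and running serially, as discussed at the beginning of this section. The following sketch is hypothetical (the routine lib_transform is not part of any product library):

#include <omp.h>

void lib_transform(double *a, int n)
{
    int i;

    if (omp_in_parallel()) {
        /* Already inside an active parallel region: run serially in the
         * calling thread rather than creating a nested region. */
        for (i = 0; i < n; i++)
            a[i] = a[i] * a[i];
    } else {
        #pragma omp parallel for private(i)
        for (i = 0; i < n; i++)
            a[i] = a[i] * a[i];
    }
}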

omp_set_nested

The omp_set_nested() function enables or disables nested parallelism. The syntax is as follows:

void omp_set_nested(int nested);

If the value of the nested argument evaluates to TRUE (nonzero), nested parallelism is enabled. If it evaluates to FALSE (zero), nested parallelism is disabled, and nested parallel regions are serialized and executed by the current thread. The default is FALSE.

omp_get_nested

The omp_get_nested() function returns a nonzero value if nested parallelism is enabled and 0 if it is disabled. The syntax is as follows:

int omp_get_nested(void);

If an implementation does not implement nested parallelism, this function always returns 0.

Note

In the current implementation, nested parallel regions are always serialized. As a result, omp_set_nested() does not have any effect, and omp_get_nested() always returns 0 on AIX.


9.10.2. Lock functions

The functions described in this section manipulate locks used for synchronization. For the following functions, the lock variable must have type omp_lock_t and must only be accessed through these functions. Each of these lock functions takes an argument that is a pointer to type omp_lock_t:

  • omp_init_lock

  • omp_destroy_lock

  • omp_set_lock

  • omp_unset_lock

  • omp_test_lock

For the following functions, the lock variable must have type omp_nest_lock_t and must only be accessed through these functions. Each of these nestable lock functions takes an argument that is a pointer to type omp_nest_lock_t:

  • omp_init_nest_lock

  • omp_destroy_nest_lock

  • omp_set_nest_lock

  • omp_unset_nest_lock

  • omp_test_nest_lock

An example program using these functions is provided in 9.10.3, “Example usage of run-time library functions” on page 370.

omp_init_lock and omp_init_nest_lock

These functions provide the only means of initializing a lock. Each function initializes the lock associated with the parameter lock for use in subsequent calls. The syntax is as follows:

void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);

The initial state is unlocked (that is, no thread owns the lock). For a nestable lock, the initial nesting count is zero. It is noncompliant to call either of these routines with a lock variable that has already been initialized.

omp_destroy_lock and omp_destroy_nest_lock

These functions ensure that the lock variable pointed to by lock is uninitialized. The syntax is as follows:

void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);

It is noncompliant to call either of these routines with a lock variable that is uninitialized or still locked.

omp_set_lock and omp_set_nest_lock

Each of these functions blocks the thread executing the function until the specified lock is available and then sets the lock. A simple lock is available if it is unlocked. A nestable lock is available if it is unlocked or if it is already owned by the thread executing the function. The syntax is as follows:

void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);

For a simple lock, the argument to the omp_set_lock() function must point to an initialized lock variable. Ownership of the lock is granted to the thread executing the function. For a nestable lock, the argument to the omp_set_nest_lock() function must point to an initialized lock variable. The nesting count is incremented, and the thread is granted, or retains, ownership of the lock.

omp_unset_lock and omp_unset_nest_lock

These functions provide the means of releasing ownership of a lock. The syntax is as follows:

void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);

The argument to each of these functions must point to an initialized lock variable owned by the thread executing the function. The behavior is undefined if the thread does not own that lock.

For a simple lock, the omp_unset_lock() function releases the thread executing the function from ownership of the lock. For a nestable lock, the omp_unset_nest_lock() function decrements the nesting count, and releases the thread executing the function from ownership of the lock if the resulting count is zero.
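
Because a nestable lock may be re-acquired by its owner, it is useful when a routine that takes the lock can also be called from code that already holds it. The following sketch illustrates this (the variables nlock and total and the functions add() and add_pair() are introduced here for illustration; nlock is assumed to have been initialized elsewhere with omp_init_nest_lock()):

#include <omp.h>

omp_nest_lock_t nlock;   /* assumed initialized elsewhere                 */
int total = 0;

void add(int value)
{
    omp_set_nest_lock(&nlock);     /* nesting count is incremented         */
    total += value;
    omp_unset_nest_lock(&nlock);   /* count decremented; lock is released  */
}                                  /* only when the count reaches zero     */

void add_pair(int a, int b)
{
    omp_set_nest_lock(&nlock);     /* the same thread may re-acquire the   */
    add(a);                        /* lock, so calling add() here does not */
    add(b);                        /* deadlock as it would with a simple   */
    omp_unset_nest_lock(&nlock);   /* lock                                 */
}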

omp_test_lock and omp_test_nest_lock

These functions attempt to set a lock but do not block execution of the thread. The syntax is as follows:

int omp_test_lock(omp_lock_t *lock);
int omp_test_nest_lock(omp_nest_lock_t *lock);

The argument must point to an initialized lock variable. These functions attempt to set a lock in the same manner as omp_set_lock() and omp_set_nest_lock(), except that they do not block execution of the thread.

For a simple lock, the omp_test_lock() function returns a nonzero value if the lock is successfully set; otherwise, it returns zero. For a nestable lock, the omp_test_nest_lock() function returns the new nesting count if the lock is successfully set; otherwise, it returns zero.
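
One common use of omp_test_lock() is to let a thread continue with independent work when the lock is busy instead of blocking. The fragment below is a sketch with hypothetical names (count_with_backoff, shared_count):

#include <omp.h>

void count_with_backoff(omp_lock_t *lock, int *shared_count)
{
    int done = 0;

    while (!done) {
        if (omp_test_lock(lock)) {     /* nonzero: the lock was acquired */
            (*shared_count)++;
            omp_unset_lock(lock);
            done = 1;
        } else {
            /* lock busy: do some independent local work, then try again */
        }
    }
}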

9.10.3. Example usage of run-time library functions

This section provides two example programs to demonstrate the usage of the run-time library functions.

Note

Before calling any OpenMP run-time library functions, the omp.h header file must be included in the source code.


Usage of run-time execution functions

The sample code shown in Example 9-8 illustrates the basic use of the various run-time routines that we have discussed so far. It displays the values returned by the run-time routines both in the serial region and in the parallel region. For simplicity, the single directive is used so that the values are printed by only one executing thread.

Example 9-8. omp_runtime.c
#include <stdio.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    printf("Before forking a parallel region.\n");
    printf("---------------------------------\n");
    printf("omp_get_num_threads returns  %d\n", omp_get_num_threads());
    printf("omp_get_max_threads returns  %d\n", omp_get_max_threads());
    printf("omp_get_thread_num returns   %d\n", omp_get_thread_num());
    printf("omp_get_num_procs returns    %d\n", omp_get_num_procs());
    printf("omp_get_dynamic returns      %d\n", omp_get_dynamic());
    printf("omp_in_parallel returns      %d\n", omp_in_parallel());
    printf("omp_get_nested returns       %d\n", omp_get_nested());
    printf("\nAfter forking a parallel region.\n");
    printf("---------------------------------\n");

    omp_set_num_threads(6); /* set the number of threads at run time to 6.  */
    omp_set_nested(1);      /* has no effect on AIX; shown for illustration. */

    #pragma omp parallel
    {
        #pragma omp single /* to print the values only once. */
        {
            printf("omp_get_num_threads returns  %d\n", omp_get_num_threads());
            printf("omp_get_max_threads returns  %d\n", omp_get_max_threads());
            printf("omp_get_thread_num returns   %d\n", omp_get_thread_num());
            printf("omp_get_num_procs returns    %d\n", omp_get_num_procs());
            printf("omp_get_dynamic returns      %d\n", omp_get_dynamic());
            printf("omp_in_parallel returns      %d\n", omp_in_parallel());
            printf("omp_get_nested returns       %d\n", omp_get_nested());
        }
    } /* All threads join the master thread and terminate. */
    return 0;
}

When executed, the program prints the following output:

Before forking a parallel region.
---------------------------------
omp_get_num_threads returns  1
omp_get_max_threads returns  3
omp_get_thread_num returns   0
omp_get_num_procs returns    3
omp_get_dynamic returns      1
omp_in_parallel returns      0
omp_get_nested returns       0

After forking a parallel region.
---------------------------------
omp_get_num_threads returns  6
omp_get_max_threads returns  6
omp_get_thread_num returns   4
omp_get_num_procs returns    3
omp_get_dynamic returns      1
omp_in_parallel returns      1
omp_get_nested returns       0

Usage of lock functions

The race condition problem that we discussed in 9.9.6, “reduction clause” on page 358 can also be solved by using the lock functions discussed in this section. In the modified version of the program shown in Example 9-9 on page 372, we lock and unlock the inner for loop using the omp_set_lock() and omp_unset_lock() functions, respectively. This prevents threads from entering the inner for block simultaneously; therefore, the variable result is incremented correctly.

Example 9-9. omp_lock.c
#include <stdio.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int i = 0, j = 0;
    int result = 0;
    omp_lock_t lock;      /* lock variable to be initialized. */

    omp_init_lock(&lock); /* Initializes the lock before using it. */

    #pragma omp parallel for private(i)
    for (i = 0; i < 3; i++) {
        omp_set_lock(&lock); /* acquires the lock here. */
        for (j = i + 1; j < 4; j++) {
            /*
             * The lock is already set, so omp_test_lock() returns zero
             * without blocking the execution of the thread.
             */
            if (!omp_test_lock(&lock)) {
                printf("Hello.\n");
                result = result + 1;
            }
        }
        omp_unset_lock(&lock); /* releases the lock. */
    }
    printf("Number of times printed Hello = %d\n", result);
    omp_destroy_lock(&lock); /* Finally, destroys the lock. */
    return 0;
}

When executed, the program prints the following output:

Hello.
Hello.
Hello.
Hello.
Hello.
Hello.
Number of times printed Hello = 6

This is the same output produced by Example 9-7 on page 361.
