Using the STATPACK library

This chapter describes how to compile and run programs that use the STATPACK library, and introduces its main conventions.

Example program

Note, first, that in order to use one of the STATPACK parameterized kind types, routines or constants in your program, you must include an use Statpack statement in your Fortran program, like:

use Statpack, only: stnd, i4b, solve_lin

The module Statpack, used in the example above, exports all the constants, routines and functions publicly available in STATPACK. Alternatively, you can also refer directly to the Fortran module, which contains the routine you want to use in your use statement, but it is much less convenient since you must remember for each STATPACK entity, the module which contains this entity.

The following complete program illustrates the use of the STATPACK function solve_lin() for solving a system of linear equations.

After, using an appropriate use Statpack statement for all the needed STATPACK entities, the code first allocates a square matrix mat, a solution vector x, a right hand side vector y and two working vectors, x2 and x2, of real kind stnd.

Next, the matrix mat and solution vector x are filled up with real random numbers of kind stnd and the corresponding right hand side vector y is generated by a matrix multiplication using the intrinsic matmul() function.

The resulting linear system composed by the coefficient matrix mat and the right hand side y is solved with the help of the solve_lin() STATPACK function to recover the solution vector x in the vector x2.

The accuracy of the solution is checked and the program finally prints some information collected during the process (e.g. error and timing of the computations).

program example_using_statpack
!
!
! Purpose
! =======
!
!   This program is intended to demonstrate the use of fonction SOLVE_LIN
!   in STATPACK.
!
! LATEST REVISION : 13/06/2018
!
! ================================================================================================
!
!
! USED MODULES
! ============
!
    use Statpack, only : i4b, stnd, solve_lin, merror, allocate_error
!
!
! STRONG TYPING IMPOSED
! =====================
!
    implicit none
!
!
! PARAMETERS
! ==========
!
! prtunit IS THE PRINTING UNIT, n IS THE SIZE OF THE LINEAR SYSTEM
!
    integer(i4b), parameter :: prtunit=6, n=4000
!
    character(len=*), parameter :: name_proc='Example of solve_lin'
!
!
! SPECIFICATIONS FOR VARIABLES
! ============================
!
    real(stnd)                              :: err, eps, elapsed_time
    real(stnd), dimension(:,:), allocatable :: a
    real(stnd), dimension(:),   allocatable :: b, x, x2, res
!
    integer :: iok, istart, iend, irate
!
!
! EXECUTABLE STATEMENTS
! =====================
!
!
!   EXAMPLE 1 : REAL MATRIX AND ONE RIGHT HAND-SIDE.
!
!   SET THE REQUIRED PRECISION OF THE RESULTS.
!
    eps = sqrt( epsilon( err ) )
!
!   ALLOCATE WORK ARRAYS.
!
    allocate( a(n,n), b(n), x(n), x2(n), res(n), stat=iok )
!
    if ( iok/=0 ) then
        call merror( name_proc//allocate_error )
    end if
!
!   GENERATE A n-by-n RANDOM DATA MATRIX a .
!
    call random_number( a )
!
!   GENERATE A n RANDOM SOLUTION VECTOR x .
!
    call random_number( x )
!
!   COMPUTE THE MATRIX-VECTOR PRODUCT b = a*x .
!
    b(:n) = matmul( a(:n,:n), x(:n) )
!
!   START TIMING THE COMPUTATIONS.
!
    call system_clock( count=istart, count_rate=irate )
!
!   COMPUTE THE SOLUTION VECTOR FOR LINEAR SYSTEM
!
!            a*x = b .
!
!   BY COMPUTING THE LU DECOMPOSITION WITH PARTIAL PIVOTING AND
!   IMPLICIT ROW SCALING OF MATRIX a WITH FUNCTION solve_lin.
!   ARGUMENTS a AND b ARE NOT MODIFIED BY THE FUNCTION.
!
    x2(:n) = solve_lin( a(:n,:n), b(:n) )
!
!   STOP THE TIMER.
!
    call system_clock( count=iend )
!
    elapsed_time = real( iend - istart, stnd )/real( irate, stnd )
!
!   CHECK THE RESULTS FOR SMALL RESIDUALS.
!
    res(:n) = x2(:n) - x(:n)
    err     = sum( abs(res(:n)) ) / sum( abs(x(:n)) )
!
!   DEALLOCATE WORK ARRAYS.
!
    deallocate( a, b, x, x2, res )
!
!   CHECK THE RESULTS FOR SMALL RESIDUALS.
!
    if ( err<=eps ) then
        write (prtunit,*) name_proc//' is correct'
    else
        write (prtunit,*) name_proc//' is incorrect'
    end if
!
    write (prtunit,*)
    write (*,'(a,i5,a,0pd12.4,a)')    &
      'The elapsed time for computing the solution of a linear real system of size ', &
       n, ' is', elapsed_time, ' seconds'
!
!
! END OF PROGRAM example_using_statpack
! =====================================
!
end program example_using_statpack

Assuming that this sample program is in the file example_using_statpack.f90, the steps to compile and link this sample program are detailed in the following section.

Compiling and linking

The simplest way is to copy the source file example_using_statpack.f90 (or your own program) in the $STATPACKDIR/myprograms directory. For illustration purpose, a copy of example_using_statpack.f90 is already stored in this directory.

To see the programs, which can be compiled and/or executed in the $STATPACKDIR/myprograms directory (the files must have the suffix .f90 or .F90), enter the make command:

$ make list

in this directory. The program example_using_statpack will appear as a target in the list. If you want to compile and execute directly the program example_using_statpack, just enter the make command:

$ make example_using_statpack

This command creates the executable example_using_statpack.out in the current directory, based on the informations given in your $STATPACKDIR/make.inc file, and executes it.

Alternatively, if you just want to compile this program, just enter the make command:

$ make example_using_statpack.compil

This also creates the executable example_using_statpack.out in the current directory, but without executing it. Moreover, the exact command used for creating the executable is printed on the screen and you can use this command as a model to compile and link any program using your STATPACK library outside from the $STATPACKDIR/myprograms directory.

To see the list of programs which can be compiled (e.g. the files with the suffix .f90 or .F90), enter the make command:

$ make list_compil

Finally, to clean the $STATPACKDIR/myprograms directory, enter the make command:

$ make clean

Shared libraries

To run a program linked with the shared version of the STATPACK library, the operating system must be able to locate the corresponding .so file at runtime. This shared version of the library can be created with the make dynlib command under the $STATPACKDIR directory (see the section Basic installation for more details).

If a shared library cannot be found (for example the STATPACK library), the following error will occur:

$ ./example_using_statpack.out
./example_using_statpack.out: error while loading shared libraries:
lib_statpack.so: cannot open shared object file: No such file or directory

Typically, this means that the Shell variable LOADFLAGS in your $STATPACKDIR/make.inc file is not correctly defined and you must correct it. To avoid this error, either modify your $STATPACKDIR/make.inc file or define the Shell variable LD_LIBRARY_PATH to include the directory where the library is installed.

For example, in the Bourne Shell (e.g. /bin/sh or /bin/bash), the library search path can be updated with the following command:

$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/statpack/library

where /path/to/statpack/library is the directory where the STATPACK library is installed. In the C-shell (e.g. /bin/csh or /bin/tcsh) the equivalent command is:

% setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH:/path/to/statpack/library

The standard prompt for the C-shell in the example above is the percent character %, and should not be typed as part of the command. To save retyping these commands each session, they can be placed in an individual or system-wide login file.

Finally, remember that the shared library dependencies of an executable can be listed with the ldd command (on a Unix system):

$ ldd ./example_using_statpack.out
linux-vdso.so.1 =>  (0x00007ffec16ed000)
lib_statpack.so => /usr/home/terray/statpack2/lib_statpack.so (0x00002b7193d05000)
libopenblas.so.0 => /usr/home/terray/lib-OpenBLAS-0.2.20-icc-ifort-bulldozer/lib/libopenblas.so.0 (0x00002b71946f5000)
libifport.so.5 => /opt/intel/15.0.6.233/composer_xe_2015.6.233/compiler/lib/intel64/libifport.so.5 (0x00002b7195975000)
libifcoremt.so.5 => /opt/intel/15.0.6.233/composer_xe_2015.6.233/compiler/lib/intel64/libifcoremt.so.5 (0x00002b7195ba5000)
libimf.so => /opt/intel/15.0.6.233/composer_xe_2015.6.233/compiler/lib/intel64/libimf.so (0x00002b7195f0d000)
libsvml.so => /opt/intel/15.0.6.233/composer_xe_2015.6.233/compiler/lib/intel64/libsvml.so (0x00002b71963cd000)
libm.so.6 => /lib64/libm.so.6 (0x00002b71972c5000)
libiomp5.so => /opt/intel/15.0.6.233/composer_xe_2015.6.233/compiler/lib/intel64/libiomp5.so (0x00002b71975cd000)
libintlc.so.5 => /opt/intel/15.0.6.233/composer_xe_2015.6.233/compiler/lib/intel64/libintlc.so.5 (0x00002b7197915000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b7197b75000)
libc.so.6 => /lib64/libc.so.6 (0x00002b7197d95000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b7198165000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b719837d000)
libirng.so => /opt/intel/15.0.6.233/composer_xe_2015.6.233/compiler/lib/intel64/libirng.so (0x00002b7198585000)
/lib64/ld-linux-x86-64.so.2 (0x000055e74098c000)

Parallel execution

Users may request a specific number of OpenMP threads to distribute the work done by an application using the STATPACK library, when OpenMP support has been activated at compilation of STATPACK.

As a general rule, don’t request more OpenMP threads than the number of processors available on your machine (excluding also processors used for hyperthreading), this will result in large loss of performance. Keep also in mind that the efficiency of shared memory parallelism as implemented in STATPACK with OpenMP also depends heavily on the workload of your shared memory computer at runtime.

More generally, threading performance of an application using STATPACK will depend on a variety of factors including the compiler, the version of the OpenMP library, the processor type, the number of cores, the amount of available memory, whether hyperthreading is enabled and the mix of applications that are executing concurrently with the application.

At the simplest level, the number of OpenMP threads used by an OpenMP multi-threaded application can be controlled by setting the OMP_NUM_THREADS OpenMP environment variable to the desired number of threads and the number of threads will be the same throughout the execution of the application. The OMP_NUM_THREADS OpenMP environment variable must be defined before the execution of the multi-threaded application to activate OpenMP parallelism.

Setting OpenMP environment variables is done the same way you set any other environment variables, and depends upon which Shell you use:

Setting the number of OpenMP threads to be used
Shell
Command line
csh/tcsh
setenv OMP_NUM_THREADS 8
sh/bash
export OMP_NUM_THREADS=8

In some cases, an OpenMP program will perform better if its OpenMP threads are bound to processors/cores (this is called “thread affinity”, “thread binding” or “processor affinity”) because this can result in better cache utilization, thereby reducing costly memory accesses. OpenMP version 3.1 API provides an environment variable to turn processor binding “on” or “off”. For example, to turn “on” thread binding you can use:

$ export OMP_PROC_BIND=TRUE   #if you are using a sh/bash Shell

Keep also in mind, that the OpenMP standard does not specify how much stack space an OpenMP thread should have. Consequently, implementations will differ in the default thread stack size and the default thread stack size can be easily exhausted for moderate/large applications on some systems. Threads that exceed their stack allocation may give a segmentation fault or the application may continue to run while data is being corrupted. If your OpenMP environment supports the OpenMP 3.0 OMP_STACKSIZE environment variable, you can use it to set the thread stack size prior to program execution. For example:

$ export OMP_STACKSIZE=10M   #if you are using a sh/bash Shell
$ export OMP_STACKSIZE=3000k #if you are using a sh/bash Shell

More generally, the run-time behaviour of an OpenMP multi-threaded application is also determined by setting some other OpenMP environment variables (e.g. OMP_NESTED or OMP_DYNAMIC for example) just before the execution of the application. See the official OpenMP documentation available at OpenMP or the more friendly tutorial OpenMP Environment Variables for more details and examples about OpenMP environment variables you can use.

All this management of the OpenMP threads can also be controlled and done inside your Fortran program with the help of the OpenMP API run-time library routines [openmp]. Consult the relevant information here OpenMP Run Time Library

Note, in particular, that the STATPACK routines may use OpenMP nested parallelism if the OMP_NESTED variable is set to TRUE or if the OpenMP run-time routine omp_set_nested() is used in your program to enable nested parallelism (e.g. calling the OpenMP subroutine omp_set_nested() with the value .true. will enabled nested parallelism after the call at runtime).

However, the usage of OpenMP nested parallelism is not recommended if you have compiled the STATPACK library with BLAS support and you have linked with a multi-threaded version of BLAS, such as [gotoblas], [openblas] or vendor BLAS like Intel MKL [mkl]. In such cases, it is strongly recommended to first desactivate OpenMP nested parallelism before executing of your application by using first the command:

$ export OMP_NESTED=FALSE   #if you are using a sh/bash Shell

and also to let OpenMP controls the multi-threading in the BLAS library, if possible.

In the case of OpenBLAS [openblas] or GotoBLAS [gotoblas], this can be done by using the makefile USE_OPENMP=1 option when compiling OpenBLAS or GotoBLAS. Consult the OpenBLAS manual for more details [openblas].

On the other hand, if your OpenBLAS or GotoBLAS library has already been compiled with multi-threading enabled, but no support for OpenMP (this is the default setting), it is strongly recommended to make sure that the number of threads used by these libraries is equal to one when STATPACK routines are called. Otherwise, OpenMP will not control the multi-threading in the BLAS routines called by the STATPACK routines and this will likely results in large loss of performance. To do this, use a command like (for OpenBLAS):

$ export OPENBLAS_NUM_THREADS=1   #if you are using a sh/bash Shell

or (for GotoBLAS):

$ export GOTO_NUM_THREADS=1   #if you are using a sh/bash Shell

before executing your application. In both cases, OpenBLAS or GotoBLAS will use only one thread throughout the execution of your program/application. Executing call openblas_set_num_threads(1) or call gotoblas_set_num_threads(1) right before a call to a STATPACK routine will do also. These calls have higher priority than the OPENBLAS_NUM_THREADS and GOTOBLAS_NUM_THREADS environment variables, respectively, and allow a finer control over the parallelism in your application (see the OpenBLAS or GotoBLAS documentation for more details).

Similarly, for Intel MKL [mkl], it is better to let OpenMP controls the multi-threading in the MKL BLAS. This can be done simply by undefining the Shell variable MKL_NUM_THREADS, which controls the number of threads (cores) for the Intel MKL BLAS library, before executing your application:

$ unset MKL_NUM_THREADS   #if you are using sh/bash Shell
Flag Counter