Using the STATPACK library¶
This chapter describes how to compile and run programs that use the STATPACK library, and introduces its main conventions.
Example program¶
Note, first, that in order to use one of the STATPACK parameterized kind types, routines or constants in your program, you must include an
use Statpack
statement in your Fortran program, like:
use Statpack, only: stnd, i4b, solve_lin
The module Statpack, used in the example above, exports all the constants, routines and functions publicly available in STATPACK.
Alternatively, you can also refer directly to the Fortran module, which contains the routine you want to use in your use
statement, but it is much less
convenient since you must remember for each STATPACK entity, the module which contains this entity.
The following complete program illustrates the use of the STATPACK function solve_lin()
for solving a system of linear equations.
After, using an appropriate use Statpack
statement for all the needed STATPACK entities, the code first allocates
a square matrix mat
, a solution vector x
, a right hand side vector y
and two working vectors, x2
and x2
,
of real kind stnd.
Next, the matrix mat
and solution vector x
are filled up with real random numbers of kind stnd and the corresponding right
hand side vector y
is generated by a matrix multiplication using the intrinsic matmul() function.
The resulting linear system composed by the coefficient matrix mat
and the right hand side y
is solved with the help of the solve_lin()
STATPACK function to recover the solution vector x
in the vector x2
.
The accuracy of the solution is checked and the program finally prints some information collected during the process (e.g. error and timing of the computations).
program example_using_statpack
!
!
! Purpose
! =======
!
! This program is intended to demonstrate the use of fonction SOLVE_LIN
! in STATPACK.
!
! LATEST REVISION : 13/06/2018
!
! =================================================================================
!
!
! USED MODULES
! ============
!
use Statpack, only : i4b, stnd, solve_lin, merror, allocate_error
!
!
! STRONG TYPING IMPOSED
! =====================
!
implicit none
!
!
! PARAMETERS
! ==========
!
! prtunit IS THE PRINTING UNIT, n IS THE SIZE OF THE LINEAR SYSTEM
!
integer(i4b), parameter :: prtunit=6, n=4000
!
character(len=*), parameter :: name_proc='Example of solve_lin'
!
!
! SPECIFICATIONS FOR VARIABLES
! ============================
!
real(stnd) :: err, eps, elapsed_time
real(stnd), dimension(:,:), allocatable :: a
real(stnd), dimension(:), allocatable :: b, x, x2, res
!
integer :: iok, istart, iend, irate
!
!
! EXECUTABLE STATEMENTS
! =====================
!
!
! EXAMPLE 1 : REAL MATRIX AND ONE RIGHT HAND-SIDE.
!
! SET THE REQUIRED PRECISION OF THE RESULTS.
!
eps = sqrt( epsilon( err ) )
!
! ALLOCATE WORK ARRAYS.
!
allocate( a(n,n), b(n), x(n), x2(n), res(n), stat=iok )
!
if ( iok/=0 ) then
call merror( name_proc//allocate_error )
end if
!
! GENERATE A n-by-n RANDOM DATA MATRIX a .
!
call random_number( a )
!
! GENERATE A n RANDOM SOLUTION VECTOR x .
!
call random_number( x )
!
! COMPUTE THE MATRIX-VECTOR PRODUCT b = a*x .
!
b(:n) = matmul( a(:n,:n), x(:n) )
!
! START TIMING THE COMPUTATIONS.
!
call system_clock( count=istart, count_rate=irate )
!
! COMPUTE THE SOLUTION VECTOR FOR LINEAR SYSTEM
!
! a*x = b .
!
! BY COMPUTING THE LU DECOMPOSITION WITH PARTIAL PIVOTING AND
! IMPLICIT ROW SCALING OF MATRIX a WITH FUNCTION solve_lin.
! ARGUMENTS a AND b ARE NOT MODIFIED BY THE FUNCTION.
!
x2(:n) = solve_lin( a(:n,:n), b(:n) )
!
! STOP THE TIMER.
!
call system_clock( count=iend )
!
elapsed_time = real( iend - istart, stnd )/real( irate, stnd )
!
! CHECK THE RESULTS FOR SMALL RESIDUALS.
!
res(:n) = x2(:n) - x(:n)
err = sum( abs(res(:n)) ) / sum( abs(x(:n)) )
!
! DEALLOCATE WORK ARRAYS.
!
deallocate( a, b, x, x2, res )
!
! CHECK THE RESULTS FOR SMALL RESIDUALS.
!
if ( err<=eps ) then
write (prtunit,*) name_proc//' is correct'
else
write (prtunit,*) name_proc//' is incorrect'
end if
!
write (prtunit,*)
write (*,'(a,i5,a,0pd12.4,a)') &
'The elapsed time for computing the solution of a linear real system of size ',&
n, ' is', elapsed_time, ' seconds'
!
!
! END OF PROGRAM example_using_statpack
! =====================================
!
end program example_using_statpack
Assuming that this sample program is in the file example_using_statpack.f90
, the steps to compile and link this sample program
are detailed in the following section.
Compiling and linking¶
The simplest way is to copy the source file example_using_statpack.f90
(or your own program) in the $STATPACKDIR/myprograms
directory.
For illustration purpose, a copy of example_using_statpack.f90
is already stored in this directory.
To see the programs, which can be compiled and/or executed in the $STATPACKDIR/myprograms
directory (the files must have the suffix .f90 or .F90),
enter the make command:
$ make list
in this directory. The program example_using_statpack will appear as a target in the list. If you want to compile and execute directly the program example_using_statpack, just enter the make command:
$ make example_using_statpack
This command creates the executable example_using_statpack.out
in the current directory, based on the informations given in
your $STATPACKDIR/make.inc
file, and executes it.
Alternatively, if you just want to compile this program, just enter the make command:
$ make example_using_statpack.compil
This also creates the executable example_using_statpack.out
in the current directory, but without executing it. Moreover, the exact command used
for creating the executable is printed on the screen and you can use this command as a model to compile and link any program using your STATPACK library
outside from the $STATPACKDIR/myprograms
directory.
To see the list of programs which can be compiled (e.g. the files with the suffix .f90 or .F90), enter the make command:
$ make list_compil
Finally, to clean the $STATPACKDIR/myprograms
directory, enter the make command:
$ make clean
Parallel execution¶
Users may request a specific number of OpenMP threads to distribute the work done by an application using the STATPACK library, when OpenMP support has been activated at compilation of STATPACK.
As a general rule, don’t request more OpenMP threads than the number of processors available on your machine (excluding also processors used for hyperthreading), this will result in large loss of performance. Keep also in mind that the efficiency of shared memory parallelism as implemented in STATPACK with OpenMP also depends heavily on the workload of your shared memory computer at runtime.
More generally, threading performance of an application using STATPACK will depend on a variety of factors including the compiler, the version of the OpenMP library, the processor type, the number of cores, the amount of available memory, whether hyperthreading is enabled and the mix of applications that are executing concurrently with the application.
At the simplest level, the number of OpenMP threads used by an OpenMP multi-threaded application can be controlled by setting
the OMP_NUM_THREADS
OpenMP environment variable to the desired number of threads
and the number of threads will be the same throughout the execution of the application. The OMP_NUM_THREADS
OpenMP
environment variable must be defined before the execution of the multi-threaded application to activate OpenMP parallelism.
Setting OpenMP environment variables is done the same way you set any other environment variables, and depends upon which Shell you use:
|
|
---|---|
|
|
|
|
In some cases, an OpenMP program will perform better if its OpenMP threads are bound to processors/cores (this is called “thread affinity”, “thread binding” or “processor affinity”) because this can result in better cache utilization, thereby reducing costly memory accesses. OpenMP version 3.1 API provides an environment variable to turn processor binding “on” or “off”. For example, to turn “on” thread binding you can use:
$ export OMP_PROC_BIND=TRUE #if you are using a sh/bash Shell
Keep also in mind, that the OpenMP standard does not specify how much stack space an OpenMP thread should have. Consequently,
implementations will differ in the default thread stack size and the default thread stack size can be easily exhausted for moderate/large
applications on some systems. Threads that exceed their stack allocation may give a segmentation fault or the application may continue to run while
data is being corrupted. If your OpenMP environment supports the OpenMP 3.0 OMP_STACKSIZE
environment variable, you can use
it to set the thread stack size prior to program execution. For example:
$ export OMP_STACKSIZE=10M #if you are using a sh/bash Shell
$ export OMP_STACKSIZE=3000k #if you are using a sh/bash Shell
More generally, the run-time behaviour of an OpenMP multi-threaded application is also determined by setting some other OpenMP
environment variables (e.g. OMP_NESTED
or OMP_DYNAMIC
for example) just before the execution of the application.
See the official OpenMP documentation available at OpenMP or the more friendly tutorial
OpenMP Environment Variables for more details
and examples about OpenMP environment variables you can use.
All this management of the OpenMP threads can also be controlled and done inside your Fortran program with the help of the OpenMP API run-time library routines [openmp]. Consult the relevant information here OpenMP Run Time Library
Note, in particular, that the STATPACK routines may use OpenMP nested parallelism if the OMP_NESTED
variable is set to TRUE
or if the OpenMP run-time routine omp_set_nested() is used in your program to enable nested parallelism (e.g. calling the OpenMP subroutine
omp_set_nested() with the value .true.
will enabled nested parallelism after the call at runtime).
However, the usage of OpenMP nested parallelism is not recommended if you have compiled the STATPACK library with BLAS support and you have linked with a multi-threaded version of BLAS, such as [gotoblas], [openblas] or vendor BLAS like Intel MKL [mkl]. In such cases, it is strongly recommended to first desactivate OpenMP nested parallelism before executing of your application by using first the command:
$ export OMP_NESTED=FALSE #if you are using a sh/bash Shell
and also to let OpenMP controls the multi-threading in the BLAS library, if possible.
In the case of OpenBLAS [openblas] or GotoBLAS [gotoblas], this can be done by using the makefile USE_OPENMP=1
option when compiling OpenBLAS or GotoBLAS.
Consult the OpenBLAS manual for more details [openblas].
On the other hand, if your OpenBLAS or GotoBLAS library has already been compiled with multi-threading enabled, but no support for OpenMP (this is the default setting), it is strongly recommended to make sure that the number of threads used by these libraries is equal to one when STATPACK routines are called. Otherwise, OpenMP will not control the multi-threading in the BLAS routines called by the STATPACK routines and this will likely results in large loss of performance. To do this, use a command like (for OpenBLAS):
$ export OPENBLAS_NUM_THREADS=1 #if you are using a sh/bash Shell
or (for GotoBLAS):
$ export GOTO_NUM_THREADS=1 #if you are using a sh/bash Shell
before executing your application. In both cases, OpenBLAS or GotoBLAS will use only one thread throughout the execution of your program/application.
Executing call openblas_set_num_threads(1)
or call gotoblas_set_num_threads(1)
right before a call to a STATPACK routine will do also.
These calls have higher priority than the OPENBLAS_NUM_THREADS
and GOTOBLAS_NUM_THREADS
environment variables, respectively, and
allow a finer control over the parallelism in your application (see the OpenBLAS or GotoBLAS documentation for more details).
Similarly, for Intel MKL [mkl], it is better to let OpenMP controls the multi-threading in the MKL BLAS. This can be done simply
by undefining the Shell variable MKL_NUM_THREADS
, which controls the number of threads (cores) for the Intel MKL
BLAS library, before executing your application:
$ unset MKL_NUM_THREADS #if you are using sh/bash Shell
Using long integers¶
If you are using huge data arrays (i.e., if indexing exceeds 2^32-1
), it may be useful or even mandatory to define the i4b integer type used in STATPACK library as 64 bit.
In order to do this, the first step is to select the appropriate parameterized i4b integer kind type used in STATPACK. By default, the i1b, i2b, i4b and i8b integer types used in STATPACK refer to 1-, 2-, 4- and 8-bytes integers, respectively; but these default correspondances can be altered simply by commenting/uncommenting lines in the Select_Parameters module, as in the following example:
! ! use The_Kinds, only : i1b , i2b , i4b , i8b ! ! use The_Kinds, only : i1b=>i4b1, i2b=>i4b2, i4b , i8b ! ! use The_Kinds, only : i1b=>i4b1, i2b=>i4b2, i4b , i8b=>i4b8 ! ! use The_Kinds, only : i1b=>i8b1, i2b=>i8b2, i4b=>i8b4, i8b ! use The_Kinds, only : i1b , i2b , i4b=>i8b4, i8b
The last statement redefines all the i4b integers used in STATPACK as i8b integers. Once you have customized appropriately the file $STATPACKDIR/sources/Module_Select_Parameters.F90
with these choices for the integer kind types (i1b, i2b, i4b and i8b) used in STATPACK, execute the make command:
$ make lib
in the $STATPACKDIR
or $STATPACKDIR/sources
directory to recompile the full STATPACK library with 64-bit integers. See the section Basic installation for more details.
Note, however, that if you want to use your new 64-bit integer STATPACK library with BLAS support and you plan to link your 64-bit integer STATPACK library with a (multi-threaded) version of BLAS, such as [gotoblas], [openblas] or vendor BLAS like Intel MKL [mkl], both the STATPACK and BLAS libraries should also be compiled with the -i8
compiler flag (this compiler option defines automatically default integers used in a Fortran program as 64 bit); otherwise the generic interfaces for the BLAS subroutines defined in the module BLAS_interfaces will not work properly. In the case of Intel MKL BLAS [mkl], this means that you must link STATPACK with the ilp64 version of the Intel MKL library, which defines default integers as 64 bit (the standard lp64 version of Intel MKL library assumes that default integers are standard 32 bit). In the case of OpenBLAS [openblas] or GotoBLAS [gotoblas], this can be done by using the makefile INTERFACE64=1
and LIBNAMESUFFIX=64
options when compiling OpenBLAS or GotoBLAS. Consult the OpenBLAS manual for more details [openblas].
These considerations apply also if you are planning to use both the STATPACK and LAPACK libraries [lapack] in your Fortran code with the help of the generic interfaces for the LAPACK subroutines defined in the module Lapack_interfaces.