Introduction¶
Presentation¶
STATPACK is a Fortran 95/2003 multi-threaded library for solving the most commonly occurring mathematical and statistical problems in the processing of climate model outputs and datasets and more generally in the analysis of huge datasets. It is a freely-available software, and started from version 2, STATPACK is released under the GNU LGPL license. Details about the license can be found at LGPL License.
All the information related to STATPACK can be found at the following web site STATPACK. The distribution tar file of the software is available for download at this site. Other information is provided there, such as installation instructions and contact information. Instructions for installing the software can also be found below, in the chapter Installation.
The distribution tar file of STATPACK contains the Fortran 95/2003 sources of the library and the associated test and example programs. It also contains an ensemble of portable makefiles, which allows fast and automatic compilation of the STATPACK library on most UNIX/Linux systems, AIX and Mac OSX. This ensemble of portable makefiles also defines a friendly and useful environment for compiling (and executing) Fortran 95/2003 programs on a computer.
The routines available in STATPACK currently include:
- Numerical linear algebra subroutines for statistical computations (QR and QL decompositions, linear solvers, least square solvers, full and partial eigenvalue and singular value decompositions, …)
- Approximate partial eigenvalue and singular value decompositions based on randomized power subspace and block Krylov iterations
- Probability functions and their inverses
- Out of core statistical univariate and multivariate functions and subroutines
- Random numbers generation and Monte Carlo procedures
- Fast Fourier Transforms for both real and complex data of general length
- Time series analysis functions and subroutines
- Utilities for printing and sorting matrices and vectors
- Utilities for manipulating strings, dates and times
- …
The emphasis of the software is on fast and robust methods appropriate for problems in which the associated matrices are large and dense, for example, those arising in the statistical analysis of very high resolution climate model outputs for which standard commercial or free softwares currently used in the climate scientific community (e.g. MATLAB, IDL, Grads, Ferret, Ncl, …) may have difficulties.
Note also that prior to building STATPACK library, none other packages are required or must be installed. In other words, STATPACK is a fully portable software and the efficiency of STATPACK on a specific machine does not depend directly on the implementation of other pre-installed softwares such as the BLAS (e.g. Basic Linear Algebra Subprograms; [blas] ), but only on the quality/performance of the Fortran 95/2003 compiler (in particular of the performance of the built-in intrinsic procedures like the Fortran 90 matmul() and dot_product() functions) and its OpenMP [openmp] parallelism features. Optionally, however, the STATPACK library can also benefit from an optimized/multi-threaded BLAS library [blas] for enhanced performance at the user option. See the section Parallelism and BLAS for details.
STATPACK has been built successfully on a variety of UNIX systems (including Mac OSX and AIX) and with different Fortran 95/2003 compilers. It is believed that STATPACK is a portable software.
Finally, for most users in the climate community, best flexibility and usability are simply achieved with the use of the NCSTAT software [ncstat] in addition to the STATPACK library. The NCSTAT software is a collection of many UNIX stand-alone operators for statistical processing and analysis of huge climate model outputs and datasets stored in the NetCDF format [netcdf]. These stand-alone operators are also written in Fortran 90/95 using the NetCDF Fortran 90 interface [netcdf-f90] of the NetCDF library [netcdf] for input/output data transfer and the STATPACK software for numerical and parallel computations. More information about NCSTAT can be found at the following web site NCSTAT.
Language¶
The STATPACK library consists of numerical algorithms written in pure and portable Fortran 95/2003 language constructs without any obsolescent Fortran77 features [Fortran]. STATPACK uses all the new features of the Fortran 95/2003 standard, including:
- Fortran 90 array data types
- Symbolic names for the parameterization of the kind parameters for real, complex, integer and logical data
- Fortran 90 modules
- Fortran 90 explicit interfaces
- Overloading of procedures and functions for ease of use (e.g. generic routines)
- Allocatable and automatic arrays (dynamical storage)
- Optional arguments to functions and subroutines
- Assumed-shape arrays for arguments passing
- …
At the user option, new Fortran 2003 constructs and intrinsic modules, like the IEEE_EXCEPTIONS, IEEE_ARITHMETIC and IEEE_FEATURES modules [Fortran], can also be used in STATPACK. See the section Preprocessor cpp macros for more details.
Thus, to use this product you should be familiar with the Fortran 95/2003 language and you must have access to a Fortran 95/2003 compiler to build the STATPACK library. For more information on Fortran 95/2003, see [Fortran] or consult one of the many tutorials available on the Web, for example Fortran tutorial.
While many current standard Fortran packages, like LAPACK or SCALAPACK (e.g. two famous libraries of Fortran routines for solving problems in numerical linear algebra on shared-memory and distributed-memory architectures, respectively), include different versions of the subroutines and functions for different Fortran data types (e.g. real or complex single- and double-precision arithmetic), the systematic use of Fortran 95/2003 parameterized data types in STATPACK allows the user to choose exactly the precision of the version of the STATPACK library he wants [Buckley:1994a] [Buckley:1994b]. See the Fortran program ex1_svd_cmp.F90 for an illustration of the use of Fortran 95/2003 parameterized data types in STATPACK.
Currently, it is possible to build single, double and also quadruple precision versions of STATPACK, as well as specific versions requesting precise and portable precision specifications for real and complex computations included in the software [Buckley:1994a]. The choice for the precision specification for a particular version of the library is done when building the library. See the chapter Installation for more details.
The interface loading features of the Fortran 95/2003 language are also heavily used in STATPACK, an useful feature,
which is also missing in standard Fortran packages. As an illustration, the generic solve_lin()
function exported by the module
Lin_Procedures, which can be used to solve a linear system with one or multiple right hand sides, accepts the following calls:
x(:n) = solve_lin( mat(:n,:n) , b(:n) , tol=tol ) x(:n,:m) = solve_lin( mat(:n,:n) , b(:n,:m) , tol=tol )
The advantages of the overloading are obvious since the same interface is used for one or several right hand sides, or for different precisions.
Parallelism and BLAS¶
STATPACK is a parallel, multi-threaded software based on the OpenMP standard. Therefore, it will run on multi-core or, more generally, shared memory multiprocessor computers. It is also possible to build sequential versions of STATPACK (e.g. if OpenMP compilation is disabled or if an OpenMP-enabled Fortran compiler is not available), even if it is not at all recommended for efficiency reasons. STATPACK does not run on distributed memory (e.g. clusters) parallel computers.
In both the LAPACK [lapack] and ScaLAPACK [scalapack] libraries, the exploitation of parallelism comes from the availability of a parallel BLAS implementation. In the LAPACK case, a number of BLAS libraries can be used to take advantage of multiple processing units on shared-memory systems; for example, the freely distributed ATLAS [atlas], GotoBLAS [gotoblas] and OpenBlas [openblas] libraries or other vendor BLAS like Intel MKL [mkl] are popular choices. In the ScaLAPACK case, parallelism is exploited by PBLAS [pblas], which is a parallel BLAS implementation that uses the Message Passing Interface (MPI; [mpi]) for communications on a distributed memory system.
In both cases, parallelism is enclosed inside the BLAS routines. In a typical multi-core implementation this means that each BLAS routine contains at least one parallel section and for each call to BLAS, a whole set of threads is started and stopped at least with each BLAS call. This thread management overhead is relatively small for level 3 BLAS routines [blas3], but it could be very significant for level 1 and 2 BLAS operations [blas1] [blas2] due to the low computational intensity in these level 1 and 2 BLAS kernels.
On the other hand, STATPACK does not rely directly on the BLAS to obtain high performance in its numerical linear algebra subroutines, since STATPACK uses parameterized data types and allows the user to build quadruple-precision version of the library (remember that BLAS, LAPACK and ScaLAPACK exist only in single- and double-precision). Instead, relatively good performance is obtained in STATPACK by reformulating old algorithms or developing new algorithms in a way that their implementations can be easily mapped on recent multi-core systems and take advantage of shared memory parallelization at a level well-above to the BLAS level.
In this spirit, most of the computer intensive methods offered by the STATPACK library are parallelized with the OpenMP Application Program Interface (OpenMP API; [openmp]), which is one of the most common techniques for shared memory parallelization available with modern Fortran 95/2003 compilers. The OpenMP API is a collection of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in C, C++ and Fortran programs. More information about OpenMP can be found at the following web site OpenMP or in the more friendly tutorial available at OpenMP tutorial. The best place to view OpenMP support by a large range of Fortran compilers is OpenMP compilers. Support for at least OpenMP 2.5 is requested for activation of OpenMP parallelism in STATPACK and most Fortran compilers are currently supporting the OpenMP 2.5 standard.
As explained above, the STATPACK library has been designed to take advantages of OpenMP shared-memory parallelism at a high level (e.g. well above the BLAS level) in the algorithms offered in the software. In most cases, this means that each STATPACK routine contains only one or two “global” OpenMP parallel regions and the critical parts of the algorithm are implemented with OpenMP synchronization and barrier primitives inside each parallel region. The programming challenge is thus to minimize the number of these artificial synchronization points in each OpenMP parallel region/routine. In this way, STATPACK users can benefit of good speedup in their programs on (shared memory) parallel computers for all choices of the parameterized Fortran real/complex data types.
Note, however, that the parallel paradigm used in STATPACK is still based on the classical fork-and-join scheduling model and, thus, differs from the more advanced methods used in the ongoing PLASMA project. PLASMA lays out matrices in small square tiles, such that each tile occupies a continuous memory region, and is based on new algorithms working on tiles [plasma].
Currently, PLASMA also relies on the BLAS and OpenMP for dynamic, task-based, scheduling [YarKhan_etal:2016], but offers only a collection of routines for solving linear systems of equations and linear least square problems [Abalenkovs_etal:2017], which are not sufficient for the goals of STATPACK. Furthermore, at least OpenMP 4.0 is required for compiling PLASMA and only a few Fortran compilers are currently supporting the full OpenMP 4.0 standard. This is why STATPACK is not currently using PLASMA for linear algebra computations for maximum portability across current Fortran compilers/platforms.
On the other hand, through the use of the OpenMP 2.5 API, it is expected that the STATPACK software will achieve portability to a wide range of platforms with good (parallel) performance and maximum flexibility and usability.
Moreover, optionally for single- and double-precision versions of the library, STATPACK can also benefit from an optimized/multi-threaded BLAS library (e.g. [gotoblas], [mkl], [gotoblas], …) and includes generic interfaces for several drivers in LAPACK [lapack], if these libraries are available on your computer. Note, however, that STATPACK does not share any code with the BLAS and LAPACK packages and it is a completely independent software. An optimized BLAS library will provide enhanced speedup with STATPACK if the quality of your Fortran 95/2003 compiler is not enough to obtain the best performance on your computer. This is typical for many compilers, for example with the GNU gfortran compiler, but will also restrict the available precisions of STATPACK since the BLAS (and also LAPACK) software is only available in single- and double-precisions.
Most of the modern Fortran 90 compilers (e.g. gfortran, flang, ifort, pgfortran, xlf95, nagfor, …) include a simple command line option to the compiler that activates and allows interpretation of all OpenMP directives included in the STATPACK library. If parallel computing is required, it is the responsibility of the user to include the relevant OpenMP command line options to compile the STATPACK library on his (parallel) computer. See the section OpenMP compilation below, on how to do this.
Finally, the number of processors used when executing an OpenMP conforming program using the STATPACK library and more generally the behaviour of such programs are determined
by setting some OpenMP environment variables (e.g. OMP_NUM_THREADS
, OMP_NESTED
, OMP_DYNAMIC
, …) just before the execution of the program.
See the OpenMP documentation available at OpenMP, OpenMP tutorial or the section
Parallel Execution for more details and some examples.