Release History¶

Release: 2.3

Date: Mar 18, 2024

This is a major release with many bug corrections, speed and accuracy improvements, improved documentation in linear algebra subroutines and a set of new routines, including new fast SVD drivers in module SVD_Procedures (e.g., see subroutines svd_cmp7() and svd_cmp8()), new faster and accurate subroutines for computing singular values of real bidiagonal and, more generally, rectangular matrices based on the differential quotient difference with shifts (dqds) algorithm (e.g., see subroutines bd_lasq1() and bd_dqds() in module SVD_Procedures), a new (deterministic) SVD-based linear least squares solver based the one-sided Rhala-Barlow bidiagonalization algorithm in module LLSQ_Procedures and, finally, new or updated routines for generation of random general, symmetric or orthogonal matrices in module Random.

Changes and updates from version 2.2 of Statpack:

module Lapack_Interfaces:

Generic interfaces have been added for the following BLAS subroutines: Xgeqr2(), Xgeqrf(), Xgeqp3(), Xtpqrt(), Xtpmqrt() where X can be s, d, c and z. These Lapack subroutines perform various (blocked) QR factorizations. The new generic interfaces have the form: geqr2, geqrf, geqp3, tpqrt, tpmqrt.

module Select_Parameters:

Changes of the default values of the public parameters max_num_threads_symtrid_cmp, max_num_threads_symtrid_cmp2, max_num_threads_bd_cmp and max_num_threads_bd_cmp2 from 18 to 128 to deliver maximum performance on many systems.

module Num_Constants:

Two new cpp processor macros are defined, _SCALE and _NOT_DENORMALIZED. The first one allows to select how numerical constants defined as integer powers of two are computed in the module. If the processor macro _SCALE is activated at compilation these numerical constants are computed with the scale() intrinsic function instead of direct exponentiation. This macro improves the portability of Statpack for compilers which do not implement correctly exponentiation with integers or, alternatively, the scale() intrinsic function. The cpp processor macro _NOT_DENORMALIZED can be activated for improving the accuracy of some scaling constants for computing Euclidean or Frobenius norms if you are sure that denormalized floating point numbers do not exist in the floating point arithmetic of your computer.

module String_Procedures:

Subroutine int_to_string() has been modified for bug corrections with more recent Fortran compilers.

module Time_Procedures:

Functions ymd_to_dayweek() et daynum_to_dayweek() have been modified for bug corrections.

Documentation of the function time_to_string() has been modified to give the correct format of the string result of the function.

module QR_Procedures:

The CPP macros CCP _WHERE and _USE_INTEL are no more used in the module.

Improved use of numerical constants directly imported from module Num_Constants in the subroutines of the module.

A new module parameter has been included, eps_qr_pivoting, which allows to control when the norms of the columns of a matrix must be recomputed in subroutines performing a (partial) QR with column pivoting (e.g., qrfac(), partial_qr_cmp() and partial_qr_cmp_fixed_precision()).

module Lin_Procedures:

Speed improvements in the chol_cmp() and gchol_cmp() subroutines when the optional logical argument upper is used and set to false. These routines now use the intrinsic function transpose() or, alternatively, the Statpack function transpose2() (if the cpp processor macro _TRANSPOSE is defined at compilation) instead of a simple do loop for transposing the input matrix.

module FFT_Procedures:

“select case” constructs have been replaced by simple “if then else” constructs in (generic) subroutines fftxy_dim() and fft_dim() when the input (real or complex) matrix is tridimensional. This improves the portability of Statpack for some compilers when OpenMP is used.

module Random:

Improved documentation of subroutines random_qr_cmp(), gen_random_sym_mat() and gen_random_mat().

A new subroutine, gen_random_ortho_mat(), which allows the fast generation of a random orthogonal matrix (or only its first columns) has been added in the module. This subroutine generates the random orthogonal matrix as a Householder transformation matrix applied to a (uniform) random vector.

An optional logical parameter, hous, has been added to the arguments of subroutines gen_random_sym_mat() and gen_random_mat(). This argument allows the user to choose the method for the generation of the random eigenvectors in gen_random_sym_mat() or of the singular vectors in gen_random_mat(). If hous=true, the eigenvectors/singular vectors are generated as Householder transformation matrices; if hous=false, they are generated as pseudo-random orthogonal matrices following the Haar distribution from the group of orthogonal matrices (this is the method used in previous versions of Statpack).

Improved BLAS and OpenMP supports for subroutines gen_random_sym_mat() and gen_random_mat().

Improved dynamical allocations in subroutines gen_random_mat() and gen_random_ortho_mat().

Improved use of numerical constants directly imported from module Num_Constants in the subroutines included in the module.

The codes of the subroutines partial_rqr_cmp() and partial_rqr_cmp_fixed_precision() have been modified to deal with a bug in the new ifx Fortran Intel compiler (version 23.1) related to the allocate statement when we reallocate an allocatable array with a new (larger) size.

module EIG_Procedures:

Improved code, BLAS and OpenMP supports (and documentation) in subroutine symtrid_cmp2(), which leads to speed improvements for large matrices. A bug when BLAS support is activated has also been corrected in subroutine symtrid_cmp2().

Improved documentation in subroutines symtrid_ratqri2(), symtrid_bisect() and select_eigval_cmp2().

Minor changes in the code of the generic subroutines select_eigval_cmp(), select_eigval_cmp2() and select_eigval_cmp3().

One new cpp processor macro is defined, _SCALE. This allows to select how numerical constants defined as integer powers of two are computed in the module. If the processor macro _SCALE is activated at compilation these numerical constants are computed with the scale() intrinsic function instead of direct exponentiation. This macro improves the portability of Statpack for compilers which do not implement correctly exponentiation with integers or, alternatively, the scale() intrinsic function.

module SVD_Procedures:

Two new public subroutines for computing singular values of a real bidiagonal matrix based on the differential quotient difference with shifts (dqds) algorithm are now included in the module (e.g., bd_lasq1() and bd_dqds()). The bd_lasq1() subroutine is (much) faster than bd_svd(), bd_singval() and bd_singval2() subroutines, which can also compute singular values of bidiagonal matrices, in most cases. Furthermore, bd_lasq1() and bd_dqds() deliver about the same accuracy as bd_singval() and bd_singval2(), which are based on the bisection algorithm.

Two new very fast SVD drivers, svd_cmp7() and svd_cmp8(), based on the dqds and inverse iteration algorithms have been added to the module. These two new routines allow the user to compute the full or partial SVD of a real matrix as the svd_cmp6() driver already included in the previous version of Statpack. However, these two new SVD drivers are usually faster than all other SVD drivers included in Statpack, but are less robust and accurate than svd_cmp(), svd_cmp2() or svd_cmp5() for matrices with multiple singular values or with a very high condition number.

A new subroutine, las2(), for computing the singular values of a 2-by-2 real matrix, is now included in the module.

All the drivers and computational routines for computing the full SVD of a real matrix now offer the possibility of computing singular values with the dqds algorithm as the default choice or at the user option. This concerns the following routines: singvalues(), bd_svd(), bd_svd2(), svd_cmp(), svd_cmp2(), svd_cmp3(), svd_cmp4(), svd_cmp5(), svd_cmp6(), svd_cmp7(), svd_cmp8(), rsvd_cmp(), rsvd_cmp_fixed_precision(), reig_pos_cmp(), rqr_svd_cmp(), rqr_svd_cmp_fixed_precision(), rqlp_svd_cmp(), rqlp_svd_cmp2(), rqlp_svd_cmp_fixed_precision(), select_singval_cmp(), select_singval_cmp2(), select_singval_cmp3(), select_singval_cmp4(), product_svd_cmp(), ginv(), comp_ginv().

Updates of the code of the module for correcting deficiencies in new versions (from version 21.4.0) of the Intel ifort fortran compiler related to the evaluation of IF clauses in OpenMP directives.

The module parameter thr_bd_svd_blas3 has now a smaller value of 2, which leads to more frequent use of a blocked (e.g., BLAS3 if BLAS support is activated) algorithm when updating singular vectors in the generic subroutines bd_svd() and bd_svd2(). This gives speed improvements if BLAS support is activated on many machines. These changes also give speed improvements in the SVD drivers included in the module or in the SVD-based linear least squares solvers in module LLSQ_Procedures.

Improved code, BLAS and OpenMP supports (and documentation) in the generic subroutines bd_cmp2() and bd_cmp3(), which leads to speed improvements for large matrices. These changes also give speed improvements in the SVD drivers or SVD linear least squares solvers (in module LLSQ_Procedures) based on the one-sided Rhala-Barlow bidiagonalization algorithm.

An optional logical parameter, reortho, has been added to the arguments of the generic subroutine bd_cmp2(), which allows now the user to control if reorthogonalization is activated or not in the one-sided Rhala-Barlow bidiagonalization algorithm if a “deficient” matrix is processed (in previous versions of Statpack, reorthogonalization was always activated in such case in bd_cmp2() ).

Corrections in the documentation of the generic subroutines select_singval_cmp3() and select_singval_cmp4() concerning the description of the one-sided Rhala-Barlow bidiagonalization algorithm.

An optional logical parameter, reortho, has been added to the arguments of the generic SVD drivers select_singval_cmp3(), select_singval_cmp4(), svd_cmp3() and svd_cmp4(), which uses the one-sided Rhala-Barlow bidiagonalization algorithm due to the above modification in the generic subroutine bd_cmp2(). The documentation of these drivers has also been updated accordingly.

Introduction of 2 new module parameters, eps_bd_cmp2 and eps_bd_cmp3, which are used in the new tests of “deficiency” in generic subroutines bd_cmp2() and bd_cmp3() (e.g., the output value of the logical argument failure in bd_cmp2() and bd_cmp3() ).

The output value of the logical argument failure is now computed in the same coherent way in subroutines bd_cmp2() and bd_cmp3() and the documentation of these subroutines has been updated to reflect these changes.

Update of the condition for activating the OpenMP parallelization in the subroutines bd_svd() and bd_svd2() when computing singular vectors.

Improved BLAS support for the generic subroutine comp_ginv().

Change of the meaning and value of the module parameter thr_bt, which controls the variants of the back-transformation algorithm for updating singular vectors in (generic) subroutines bd_inviter2() and bd_deflate2(). If almost the full set of singular vectors is requested the initial orthogonal factors of the bidiagonal decomposition of the input matrix are generated and the final singular vectors are computed by matrix multiplication (if enough memory is available) instead of applying directly the Householder transformations to the singular vectors of the bidiagonal form of the input matrix. This gives large speed improvements in subroutines bd_inviter2() and bd_deflate2() when BLAS support and/or OpenMP are activated in many cases.

Improved documentation of generic subroutines bd_inviter2() and bd_deflate2().

Improved accuracy of of generic subroutines bd_inviter() and bd_inviter2() for exactly rank deficient real matrices.

Cleaning of the importation of numerical constants from the Num_Constants module in all routines of the module.

One new cpp processor macro is defined, _SCALE. This allows to select how numerical constants defined as integer powers of two are computed in the module. If the processor macro _SCALE is activated at compilation these numerical constants are computed with the scale() intrinsic function instead of direct exponentiation. This macro improves the portability of Statpack for compilers which do not implement correctly exponentiation with integers or, alternatively, the scale() intrinsic function.

module LLSQ_Procedures:

Improved BLAS and OpenMP supports in the generic subroutine llsq_svd_solve(), which computes minimun 2-norm solutions of least square problems with a SVD of the coefficient matrix.

A new generic subroutine, llsq_svd_solve2() has been added in the module. This generic subroutine computes the minimun 2-norm solutions of least square problems with a SVD of the coefficient matrix as subroutine llsq_svd_solve(), but uses a SVD algorithm based on the one-sided Rhala-Barlow bidiagonalization algorithm for this purpose. This SVD-based linear least squares driver will be faster than llsq_svd_solve() for very large coefficient matrix, but is restricted to the case that coefficient matrix has more rows than columns.

The generic subroutines llsq_svd_solve() and llsq_svd_solve2() now offer the possibility to compute singular values by the dqds algorithm at the user option.

Updates of the code of many routines for correcting deficiencies in new versions (from version 21.4.0) of the Intel ifort fortran compiler related to the evaluation of IF clauses in OpenMP directives.

module Mul_Stat_Procedures:

The codes of the generic subroutine phase_scramble_cor() have been modified to deal with a bug in the new Fortran ifx Intel compiler (version 23.1) related to the allocate statement when we reallocate an allocatable array with a new (larger) size.

Improved documentation of the subroutines included in the module.

subroutine comp_ortho_rot_eof() has been slightly modified to deliver more accurate results and the checking of the input arguments has been improved.

examples directory:

Inclusion of new example programs and update of most older example programs to illustate the new features added in Statpack2.3 or to better illustrate the use of other existing Statpack subroutines.

tests directory:

Inclusion of new test programs and update of older test programs to test the new features added in Statpack2.3 or to improve the testing of other Statpack subroutines.

Release: 2.2

Date: Apr 27, 2022

This version is a major release with large accuracy and speed improvements in linear algebra subroutines and a large set of new subroutines, including very fast subroutines for computing a variety of randomized (and deterministic) matrix factorizations: full or partial QR decompositions, partial SVD decomposition, full or partial QB and QLP decompositions, column Interpolative (ID), two-sided Interpolative (tsID) and CUR decompositions. Fast randomized linear least squares solvers are now also included in Statpack. New deterministic SVD subroutines, which are faster and more accurate, are also provided in this release (e.g., see subroutines svd_cmp4(), svd_cmp5() and svd_cmp6() in module SVD_Procedures). Finally, state-of-the-art routines for computing Euclidean/Frobenius norms of vectors and matrices with due regard to under/overflows and routines for performing orthogonal rotations of a partial Principal Component Analysis (PCA) model (see subroutines comp_ortho_rot_eof(), comp_filt_rot_pc(), comp_lfc_rot_pc() and comp_smooth_rot_pc() in module Mul_Stat_Procedures) are also included.

Updated June 29, 2022 to correct for:

deficiencies in new versions (from version 21.4.0) of the Intel ifort fortran compiler related to the evaluation of IF clauses in OpenMP directives. The updates concern only small changes in the SVD_Procedures and LLSQ_Procedures modules;

bug corrections in the Random and EIG_Procedures modules. The bug corrections concern the gen_random_sym_mat() subroutine in the Random module (its optional argument eigvec) and the symtrid_cmp2() subroutine in the EIG_Procedures module when BLAS support is activated (the first dimension of the vmat array argument in the calls to the gemm() BLAS routine in symtrid_cmp2() was not correctly specified);

the specifications of the default values of some parameters in the Select_Parameters module (e.g. the max_num_threads_symtrid_cmp, max_num_threads_symtrid_cmp2, max_num_threads_bd_cmp and max_num_threads_bd_cmp2 parameters have now a default value of 128).

Changes and updates from version 2.1 of Statpack:

module BLAS_Interfaces:

Generic interfaces have been added for the following BLAS subroutines: Xsymv(), Xspmv() where X can be s, d. The new generic interfaces have the form: symv, spmv

module Select_Parameters:

Integration of new public parameters (max_num_threads_symtrid_cmp, max_num_threads_symtrid_cmp2, max_num_threads_bd_cmp, max_num_threads_bd_cmp2), which can be used to limit the number of cores in the OpenMP path of subroutines symtrid_cmp, symtrid_cmp2 (in module EIG_Procedures) and of subroutines bd_cmp, bd_cmp2 and bd_cmp3 (in module SVD_Procedures). This is useful on some systems with a large number of cores per node, as on these systems, these subroutines do not always scale well when the number of cores is large.

Optimization of the block parameters blksz_qr, blksz_eig, blksz2_eig, blksz2_svd, max_francis_steps_svd and max_francis_steps_eig in order to deliver maximum performance on most systems.

Improved documentation in the module.

module Real_constants:

Inclusion of new real constants for use in the new subroutines included in this release.

module Char_constants:

Inclusion of new error messages for the new subroutines in modules QR_Procedures, Random, SVD_Procedures and Mul_Stat_Procedures.

module Num_Constants:

New numerical constants (e.g., safmin, safmax, machsmlnum, machbignum, rtmin, rtmax, tbig, sbig, ubig, tsml, ssml and usml) used for computing Euclidean/Frobenius norms of matrices or vectors, sum of squares and in scaling subroutines are now included in the module Num_Constants. See the paper by E. Anderson (2018, https://doi.org/10.1145/3061665) for more details about the use of these new numerical constants for computing norms. Moreover, with the definition of these new numerical constants, all the calls to the inquiry function lamch() in Statpack have been replaced by a module approach in which the required numerical constants in each Statpack routine are imported via a single USE Num_Constants instruction.

The values of the numerical constants machulp and macheps have been interchanged to be consistent with the definitions in the paper by E. Anderson (2018) and the code of the subroutine comp_mach() has been modified accordingly.

A new cpp processor macro is defined, _ISNAN, which allows the use of the intrinsic function isnan(), when it is supported by the compiler (e.g., Gnu and Intel compilers) and when the intrinsic function ieee_is_nan() is not used (e.g., when the cpp processor macro _F2003 is not activated at compilation) for detecting NaNs in the generic is_nan() replace_nan() routines included in module Num_Constants.

The documentation of the lamch(), mach(), is_nan() and replace_nan() routines has been improved.

modules Utilities:

All the routines in the module, which compute Euclidean/Frobenius norms of vectors/matrices and scaled sum of squares have been updated and extended (e.g., generic routines abse(), lassq(), norm(), lassqe(), norme(), lassq2e() and norm2e() ). Now, in Statpack2.2:

the generic function abse() computes norms with a (one-pass) compensated summation algorithm;

the generic routines norm() and lassq() compute, respectively, norms and sums of squares with a (one-pass) Blue’s algorithm;

the generic routines norme() and lassqe() compute norms and sums of squares with a (one-pass) Blue’s algorithm and also use compensated summation for improved accuracy;

the generic routines norm2e() and lassq2e() compute norms and sums of squares with a (two-pass) LAPACK3E’s algorithm and also use compensated summation for improved accuracy.

For more details on Blue, LAPACK3E and compensated summation algorithms, see the papers by E. Anderson (2018, https://doi.org/10.1145/3061665) and Hanson and Hopkins (2018, https://doi.org/10.1145/3134441).

All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.

Improved BLAS support in the generic subroutine matmul2() (e.g., use of the BLAS2 Xgemv() subroutine instead of the BLAS1 Xdot() function when one of the arguments to matmul2() is a real or complex vector and the cpp processor macro _BLAS is activated at compilation).

An improved scaling algorithm (with respect to NaNs) is now used in the generic subroutine lascl().

The subroutine pythag(), which computes sqrt( a * a + b * b ), has been updated. The computations are now performed in extended precision (e.g., extd kind type) rather than at the standard precision (e.g., stnd kind type) defined in the module Select_Parameters.

A new subroutine pythage() has also been included, which computes also sqrt( a * a + b * b ), but without destructive underflow or overflow using the Blue’s scaling method. See the paper by E. Anderson (2018, https://doi.org/10.1145/3061665) for more details.

modules Prob_Procedures, Giv_Procedures, Lin_Procedures and Stat_Procedures:

All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.

modules Hous_Procedures:

All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.

The codes in the generic subroutines hous1() and hous2() have been updated. The new versions use improved algorithms for computing the Euclidean norm of vectors as implemented in the new generic routine norm2e() (e.g., LAPACK3E’s two-pass algorithm with compensated summation), now included in the module Utilities, see details above. The new algorithms use compensated summations for improving accuracy and better scaling in order to avoid overflows or underflows. See the papers by E. Anderson (2018, https://doi.org/10.1145/3061665) and Hanson and Hopkins (2018, https://doi.org/10.1145/3134441) for more details.

The codes in the generic subroutines h1() and h2() have been updated. The new versions use improved algorithms for computing the Euclidean norm of vectors. The new algorithms use compensated summations for improving accuracy, see the paper by Hanson and Hopkins (2018, https://doi.org/10.1145/3134441) for more details.

Improved BLAS and OpenMP supports in the generic subroutines apply_hous1(), apply_hous2(), h1(), h2(), apply_h1() and apply_h2() when applying Housholder reflections to matrices or vectors. These changes deliver large speed improvements for these subroutines when BLAS and OpenMP supports is activated.

module QR_Procedures:

Correction of a bug in lq_cmp() subroutine when use_qr=false and OpenMP are used and when the number of columns of the matrix is less than the block size blksz_qr specified in module Select_Parameters. In such a case, the serial algorithm is now used in the Statpack2.2 version of lq_cmp().

All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.

Speed improvements in qrfac() and qr_cmp2() subroutines for computing the minimum-norm solutions of least squares problems for rank deficient matrices. Moreover, the codes in these subroutines now use the generic routine norm2e() (e.g., LAPACK3E’s two-pass algorithm with compensated summation), included in the module Utilities, for computing Euclidean norm of vectors for improved accuracy.

Improved BLAS support (e.g., systematic use of Xdot() and Xaxpy() BLAS1 subroutines) in subroutines lq_cmp(), ortho_gen_lq(), apply_q_lq(), qr_cmp(), ortho_gen_qr() and apply_q_qr() included in the module.

Inclusion of 2 new subroutines partial_qr_cmp() and partial_qr_cmp_fixed_precision(), which perform a partial QR decomposition with column pivoting of a matrix based on a deterministic BLAS2 algorithm. These two new subroutines can be used to solve the fixed-rank and fixed-precision problems, respectively, by deterministic methods. Similar, but faster, routines based on randomized methods are also now provided in module Random, see below for details.

Improved documentation of the subroutines in the module.

module Random:

All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.

Improved documentation of subroutines random_qr_cmp(), ortho_gen_random_qr(), gen_random_sym_mat() and gen_random_mat()

A test for checking that all elements of the input vector argument singval(:) are positive is now included in subroutine gen_random_mat().

Inclusion of the new subroutines partial_rqr_cmp(), partial_rqr_cmp2(), partial_rtqr_cmp(), partial_rqr_cmp_fixed_precision(), rqb_cmp(), rqb_cmp_fixed_precision(), id_cmp(), ts_id_cmp() and cur_cmp(), which perform a variety of randomized (and deterministic) matrix decompositions, including randomized full or partial QR decompositions, randomized or deterministic QB decompositions, randomized or deterministic column Interpolative Decompositions (ID), randomized or deterministic two-sided Interpolative Decompositions (tsID) and randomized or deterministic CUR decompositions.

Improved BLAS support in the module.

module SVD_Procedures:

All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.

Improved BLAS support (e.g., systematic use of the BLAS subroutines Xdot(), Xaxpy() and Xgemv() if the cpp processor macro _BLAS is defined at compilation) in the (generic) subroutines bd_cmp(), bd_cmp2(), bd_cmp3(), bd_inviter2() and bd_deflate2() (and also in the private subroutine ortho_vec()).

Significant speed improvement in the serial version of the bd_svd2() subroutine for computing singular vectors.

Importation of the parameter blksz_qr from module Select_Parameters for use in subroutines bd_cmp2() and bd_cmp3(). as a blocking factor for applying Householder transformations.

Importation of the new parameters max_num_threads_bd_cmp and max_num_threads_bd_cmp2 from module Select_Parameters for use in subroutines bd_cmp(), bd_cmp2() and bd_cmp3() for eventually limiting the number of cores used in the OpenMP path of these subroutines.

Improved argument checking in subroutines bd_svd() and bd_svd2().

The subroutines bd_singval() and bd_singval2(), which compute singular values of bidiagonal matrices, now include a check for NaNs in the computed singular values.

A new subroutine for computing the largest singular value of a bidiagonal matrix by a bisection method, bd_max_singval(), is now included in the module.

Inclusion of a new optional logical argument bisect in subroutines bd_svd(), bd_svd2(), svd_cmp(), svd_cmp2(), svd_cmp3(), svd_cmp4(), rsvd_cmp(), rsvd_cmp_fixed_precision(), reig_pos_cmp(), rqr_svd_cmp(), rqr_svd_cmp_fixed_precision(), rqlp_svd_cmp(), rqlp_svd_cmp2(), product_svd_cmp(), ginv() and comp_ginv(). When bisect=true and perfect_shift=true, the singular values in these subroutines are computed to higher precision using an accurate bisection algorithm (instead of the faster, but less accurate, Pal-Walker-Kahan algorithm used by default). This gives higher accuracy in the final (partial) SVD results in all these subroutines at the expense of a slight increase in the execution time.

Accuracy improvements in rsvd_cmp_fixed_precision() subroutine, which performs a randomized partial SVD to solve the fixed-precision problem. The new version is also more robust with respect to possible overflows.

Two new SVD drivers have been added, svd_cmp5() and svd_cmp6(), which give more accurate results than previous SVD drivers included in Statpack. The svd_cmp4() driver has also been updated for improved accuracy.

Inclusion of the new subroutines rqr_svd_cmp(), rqr_svd_cmp_fixed_precision(), rqlp_svd_cmp(), rqlp_svd_cmp2() and rqlp_svd_cmp_fixed_precision(), which perform randomized or deterministic partial SVD to solve the fixed-rank or fixed-precision problems using QR-SVD or QLP-SVD algorithms. The rqlp_svd_cmp(), rqlp_svd_cmp2() and rqlp_svd_cmp_fixed_precision() subroutines are more accurate (and as fast for rqlp_svd_cmp2()) than the rsvd_cmp() and rsvd_cmp_fixed_precision() subroutines already included in the module, which perform the same task.

Inclusion of the new subroutines qlp_cmp(), qlp_cmp2() and rqlp_cmp(), which compute randomized or deterministic (partial) QLP decompositions.

The (generic) subroutines bd_svd() and bd_svd2(), which compute singular values and vectors of bidiagonal matrices, now use a faster BLAS3 algorithm when updating the singular vectors during the bidiagonal QR iterations if the number of elements in the singular vectors is sufficiently large. The shift from the standard BLAS1 algorithm to the new BLAS3 algorithm is controlled by the new module parameter thr_bd_svd_blas3, specified at the beginning of the module. These improvements in subroutines bd_svd() and bd_svd2() lead also to speed improvements in the generic routines svd_cmp(), svd_cmp2(), svd_cmp3(), svd_cmp4(), ginv() and comp_ginv() (and also marginally in subroutines rsvd_cmp(), rsvd_cmp_fixed_precision() and other randomized partial SVD subroutines).

Speed improvements and better BLAS support in routines ginv() and comp_ginv().

Large speed improvements in subroutine bd_inviter() when large clusters of singular values are present. A fast QR factorization is now used to reorthogonalize the singular vectors in this case instead of a slower Gram-Schmidt algorithm.

Inclusion of a new optional logical argument tol_reortho in generic subroutines bd_inviter2() and bd_deflate2(). This argument controls the reorthogonalization of the left singular vectors in the versions of these generic subroutines, which assume inputs from the Rhala-Barlow one-sided bidiagonalization algorithm.

Improved documentation of the routines in the module (especially the generic subroutines bd_svd(), bd_svd2(), svd_cmp(), svd_cmp2(), svd_cmp3(), svd_cmp4() and bd_inviter2() ).

module EIG_Procedures:

All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.

Improved BLAS support (e.g., systematic use of the BLAS subroutines Xdot(), Xaxpy(), Xgemv(), Xsymv(), Xspmv(), Xspr2() if the cpp processor macro _BLAS is defined at compilation) in the (generic) subroutines symtrid_cmp(), symtrid_cmp2(), ortho_gen_symtrid(), apply_q_symtrid(), trid_inviter() and also in the private subroutine ortho_vec().

The generic routine abse() (e.g., one-pass algorithm with compensated summation), included in module Utilities, is now used for computing Euclidean norm of vectors for improved accuracy in the generic subroutine symtrid_cmp().

The generic routine norm2e() (e.g., LAPACK3E’s two-pass algorithm with compensated summation), included in the module Utilities, is now used for computing Euclidean norm of vectors for improved accuracy in the (generic) subroutines reig_cmp() and trid_inviter().

The scaling algorithm has been improved in the generic subroutines eig_cmp(), eig_cmp2(), eig_cmp3(), eigvalues(), eigval_cmp(), eigval_cmp2(), eigval_cmp3(), select_eigval_cmp(), select_eigval_cmp2(), and select_eigval_cmp3().

The subroutines symtrid_qri2() and symtrid_qri3(), which compute eigenvalues and eigenvectors of symmetric tridiagonal matrices, now use a faster BLAS3 algorithm when updating the eigenvectors during the QR iterations if the number of elements in the eigenvectors is sufficiently large. The shift from the standard BLAS1 algorithm to the new BLAS3 algorithm is controlled by the new module parameter thr_trid_eig_blas3, specified at the beginning of the module. These improvements in subroutines symtrid_qri2() and symtrid_qri3() lead also to speed improvements in the generic subroutines eig_cmp2() and eig_cmp3().

Improved documentation of the routines in the module and more specifically in the generic subroutines symtrid_qri(), symtrid_qri2() and symtrid_qri3().

Importation of the parameters blksz_qr from module Select_Parameters for use in subroutines symtrid_cmp2(), apply_q_symtrid(), ortho_gen_symtrid(), trid_deflate() and trid_inviter() as a blocking factor for applying Householder transformations.

Importation of the parameters max_num_threads_symtrid_cmp and max_num_threads_symtrid_cmp2 from module Select_Parameters for use in subroutines symtrid_cmp() and symtrid_cmp2() for eventually limiting the number of cores used in the OpenMP path of these subroutines.

Large speed improvements in subroutine trid_inviter() when large clusters of eigenvalues are present. A fast QR factorization is now used to reorthogonalize the eigenvectors in this case instead of a slower Gram-Schmidt algorithm.

Improved documentation in the module.

module LLSQ_Procedures:

All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.

Improved BLAS support and speed improvements in the generic subroutines solve_llsq(), llsq_qr_solve(), llsq_qr_solve2() and qr_solve2() when computing minimun 2-norm solutions of least square problems.

The generic routine norm2e() (e.g., LAPACK3E’s two-pass algorithm with compensated summation), included in the module Utilities, is now used for computing Euclidean norm of vectors for improved accuracy in the subroutines included in the module.

Inclusion of the new generic subroutine rqb_solve(), which solves overdetermined or underdetermined real linear systems, using a randomized QR or complete orthogonal factorization as computed by subroutine rqb_cmp() in module Random.

Inclusion of a new optional logical argument bisect in subroutine llsq_svd_solve(). When bisect=true and perfect_shift=true, the singular values in this subroutine are computed to higher precision using an accurate bisection algorithm (instead of the faster, but less accurate, Pal-Walker-Kahan algorithm used by default). This gives higher accuracy in the final solution of the least square problem at the expense of a slight increase in the time of execution.

Improved documentation of the subroutines in the module.

module Mul_Stat_Procedures:

Inclusion of 4 new subroutines comp_ortho_rot_eof(), comp_smooth_rot_pc(), comp_lfc_rot_pc() and comp_filt_rot_pc(), which perform orthogonal rotations of Principal Components from a Principal Component Analysis according to a large set of different criteria.

All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.

Improved documentation of the subroutines in the module.

modules Sort_Procedures, Prob_Procedures, Lin_Procedures, EIG_Procedures and FFT_Procedures, QR_Procedures, SVD_Procedures, LLSQ_Procedures, Mul_Stat_Procedures and Random:

The calls to the deprecated OpenMP function omp_get_nested() have been replaced by calls to the OpenMP function omp_get_max_active_levels().

examples directory:

Inclusion of new example programs and update of many older example programs to illustate the new features added in Statpack2.2 or to better illustrate the use of other existing Statpack subroutines.

tests directory:

Inclusion of new test programs and update of many older test programs to test the new features added in Statpack2.2 or to improve the testing of other Statpack subroutines.

Release: 2.1

Date: Feb 12, 2021

This version provides bug corrections, large speed improvements for linear algebra subroutines and new routines based on randomized algorithms for computing restricted singular value and eigenvalue decompositions.

Updated June 07, 2021 to correct for deficiencies in new versions of the PGI fortran compiler related to the managing of (internal) memory. The updates concern only small changes in the SVD_Procedures and LLSQ_Procedures modules and the STATPACK testing programs in the tests directory.

Updated Dec 20, 2021 for bug corrections. The bug corrections concern only a small change for the lq_cmp() subroutine in the QR_Procedures module when used in a multi-threaded environment.

Changes and updates from version 2.0 of Statpack:

module BLAS_Interfaces:

Generic interfaces have been added for the following BLAS subroutines: Xsyr2(), Xher2(), Xspr2(), Xhpr2(), Xsymm(), Xhemm(), Xtrsm(), Xtrmm(), Xsyrk(), Xherk(), Xsyr2k(), Xher2k() where X can be s, d, c and z. The new generic interfaces have the form: syr2, her2, spr2, hpr2, symm, hemm, trsm, trmm, syrk, herk, syr2k, her2k

All COMPLEX*16 declarations in the module have been replaced by DOUBLE COMPLEX declarations.

module Char_constants:

Inclusion of new error messages for modules SVD_Procedures and QR_Procedures.

module Select_Parameters:

Integration of new public parameters, which specify the default block sizes for the different blocked algorithms in modules Utilities, FFT_Procedures, Lin_Procedures, QR_Procedures, Eig_Procedures and SVD_Procedures.

module Sort_Procedures:

Improved BLAS support and speed improvement for reorder() generic subroutine. A faster and in-place algorithm is now used.

Improved documentation of the subroutines in the module.

module Giv_Procedures:

Large speed improvement for givens_mat_left(), givens_vec_mat_left(), fastgivens_mat_left() and fastgivens_vec_mat_left() subroutines. A faster transpose algorithm is now used by default.

Improved documentation of the subroutines in the module.

module Lin_Procedures:

Improved BLAS support for the following subroutines: lu_solve(), solve_lin(), lin_lu_solve(), triang_solve(), comp_uut_ltl() and comp_triang_inv().

Improved documentation of the subroutines in the module.

module QR_Procedures:

Improved BLAS support and speed improvements for the following subroutines: ortho_gen_lq(), apply_q_lq(), qr_cmp(), ortho_gen_qr() and apply_q_qr(). New blocked algorithms (with BLAS3 support) have been implemented in subroutines qr_cmp(), ortho_gen_qr() and apply_q_qr().

Speed improvements and new formulations in qrfac() and qr_cmp2() subroutines.

Improved documentation of the subroutines in the module.

module Random:

Improved BLAS support for the following subroutines: gen_random_sym_mat() and gen_random_mat().

module SVD_Procedures:

Large speed improvements in generic bd_cmp2() subroutine, which performs bidiagonalization of a rectangular matrix using the one-sided Ralha-Barlow bidiagonal reduction algorithm. A general blocked algorithm is now used.

Inclusion of a new subroutine, bd_cmp3(), which performs a partial bidiagonalization of a rectangular matrix using the one-sided Ralha-Barlow bidiagonal reduction algorithm. The subroutine computes the bidiagonal and the right orthogonal matrices only. A general blocked algorithm is also used in bd_cmp3().

Correction of a small bug in generic bd_cmp2() subroutine when the first column of the matrix is zero.

A specific version of the generic bd_cmp() subroutine, which does a preliminary QR or LQ factorization before bidiagonalization, has been added in the module (e.g., bd_cmp_c() subroutine).

Large speed improvements in the generic svd_cmp(), svd_cmp2(), svd_cmp3() and svd_cmp4() subroutines when the number of rows and columns of the input matrix differ significantly and, in all the cases, for svd_cmp3() and svd_cmp4() subroutines because of the use of the new bd_cmp2() subroutine. All these SVD subroutines benefit also from the speed improvements done in module QR_Procedures.

Large speed improvements in the subroutines apply_q_bd(), apply_p_bd(), ortho_gen_bd(), ortho_gen_bd2(), ortho_gen_q_bd() and ortho_gen_p_bd(), which also benefit from the speed improvements done in module QR_Procedures.

New forms of calls, which allow pre-processing of the input matrix. by a preliminary QR or LQ factorization, and large speed improvements in the (now generic) select_singval_cmp(), select_singval_cmp2(), select_singval_cmp3() and select_singval_cmp4() subroutines.

Large speed improvements in the generic bd_inviter2() and bd_deflate2() subroutines and new forms of calls, which can be used with the new forms of calls of select_singval_cmp(), select_singval_cmp2() and select_singval_cmp3() and select_singval_cmp4() subroutines to obtain the (partial) SVD of a matrix after a pre-processing by a preliminary QR or LQ decomposition. This results in large speed improvement when the number of rows and columns of the input matrix differ significantly.

Inclusion of three new subroutines rsvd_cmp(), rsvd_cmp_fixed_precision() and reig_pos_cmp(), which compute, respectively, approximations of partial SVD of rectangular matrices and partial EVD of symmetric positive semi-definite matrices using randomized algorithms.

Improved documentation of the subroutines in the module.

module LLSQ_Procedures:

Improved BLAS support for the following subroutines: solve_llsq(), llsq_qr_solve() and llsq_qr_solve2().

New and faster formulation for computing mimimun 2-norm solutions in subroutines solve_llsq(), llsq_qr_solve(), llsq_qr_solve2() and qr_solve2() subroutines.

All subroutines also benefit from the speed improvements in modules QR_Procedures and SVD_Procedures.

Improved documentation of the subroutines in the module.

module EIG_Procedures:

Inclusion of a new subroutine, symtrid_cmp2(), which reduces a real n-by-n symmetric matrix cross-product using the one-sided Ralha tridiagonal reduction algorithm and without forming this cross-product. A general blocked version of the algorithm is used.

Correction of a bug in the generic apply_q_symtrid() subroutine with the optional logical argument upper and when the matrix is in packed form (upper was used in the apply_q_symtrid_c() subroutine, but its presence in the call of the subroutine was not tested). In this version of the library, the old apply_q_symtrid_c() subroutine has been replaced by the subroutines apply_q_symtrid_c() and apply_q_symtrid_d(), which offer the same functionalities and the bug is removed.

Improved BLAS support in the generic symtrid_cmp() subroutine.

Improved BLAS support and and large speed improvements in the generic subroutines ortho_gen_symtrid() and apply_q_symtrid() when the argument upper is not used. A more efficient blocked algorithm is now used (with BLAS3 support).

Large speed improvements in the generic eig_cmp(), eig_cmp2(), eig_cmp3(), trid_deflate() and trid_inviter() subroutines due to the updates of ortho_gen_symtrid() and apply_q_symtrid() subroutines when the argument upper is not used.

Inclusion of a new subroutine, reig_cmp(), which computes approximations of partial EVD of symmetric matrices using randomized algorithms.

Improved documentation of the subroutines in the module.

modules Time_Procedures, Time_Series_Procedures, Stat_Procedures, Mul_Stat_Procedures, Prob_Procedures, Random, Sort_Procedures, Giv_Procedures, Hous_Procedures:

Improved documentation of the subroutines in the modules.

examples directory:

Inclusion of new example programs illustrating the new features described above: ex1_bd_cmp3.F90, ex1_symtrid_cmp2.F90, ex2_symtrid_cmp2.F90, ex1_rsvd_cmp.F90, ex1_rsvd_cmp_fixed_precision.F90, ex1_reig_cmp.F90, ex1_reig_pos_cmp.F90, ex1_random_svd.F90, ex1_random_eig.F90, ex1_random_eig_pos.F90 ex1_random_svd_with_blas.F90, ex1_random_eig_with_blas.F90, ex1_random_eig_pos_with_blas.F90 ex1_bd_inviter2_bis.F90, ex1_select_singval_cmp3_bis.F90, ex1_select_singval_cmp4_bis.F90 ex2_select_singval_cmp.F90, ex2_select_singval_cmp2.F90, ex2_select_singval_cmp3.F90, ex2_select_singval_cmp3_bis.F90, ex2_select_singval_cmp4.F90, ex2_select_singval_cmp4_bis.F90.

Update of the following example programs illustrating the new features described above: ex1_bd_deflate2.F90, ex1_bd_deflate2_bis.F90, ex1_bd_inviter2.F90, ex1_select_singval_cmp.F90, ex1_select_singval_cmp2.F90, ex1_select_singval_cmp3.F90, ex1_select_singval_cmp4.F90.

Update of the following example programs: ex1_eigval_cmp.F90, ex1_eigval_cmp2.F90, ex1_eigval_cmp3.F90 ex1_select_eigval_cmp.F90, ex1_select_eigval_cmp2.F90, ex1_select_eigval_cmp3.F90 ex2_eigval_cmp.F90, ex2_eigval_cmp2.F90, ex2_eigval_cmp3.F90 ex2_select_eigval_cmp.F90, ex2_select_eigval_cmp2.F90, ex2_select_eigval_cmp3.F90.

tests directory:

Inclusion of new test programs and update of some older test programs to test the new features added in statpack2.1 or to improve the testing of other statpack subroutines.

Release: 2.0

Date: Dec 3, 2018

First open source release

Updated Sep 23, 2020 for bug corrections

Updated Dec 20, 2021 for bug corrections. The bug corrections concern only a small change for the lq_cmp() subroutine in the QR_Procedures module when used in a multi-threaded environment.

Release: 1.1

Date: Jan 21, 2004

Release: 1.0

Date: May 7, 2003