Release History¶
Release: 2.2 Date: Apr 27, 2022 This version is a major release with large accuracy and speed improvements in linear algebra subroutines and a large set of new subroutines, including very fast subroutines for computing a variety of randomized (and deterministic) matrix factorizations: full or partial QR decompositions, partial SVD decomposition, full or partial QB and QLP decompositions, column Interpolative (ID), two-sided Interpolative (tsID) and CUR decompositions. Fast randomized linear least squares solvers are now also included in Statpack. New deterministic SVD subroutines, which are faster and more accurate, are also provided in this release (e.g., see subroutines svd_cmp4(), svd_cmp5() and svd_cmp6() in module SVD_Procedures). Finally, state-of-the-art routines for computing Euclidean/Frobenius norms of vectors and matrices with due regard to under/overflows and routines for performing orthogonal rotations of a partial Principal Component Analysis (PCA) model (see subroutines comp_ortho_rot_eof(), comp_filt_rot_pc(), comp_lfc_rot_pc() and comp_smooth_rot_pc() in module Mul_Stat_Procedures) are also included.
Updated June 29, 2022 to correct for:
(1) deficiencies in new versions (from version 21.4.0) of the Intel ifort fortran compiler related to the evaluation of IF clauses in OpenMP directives. The updates concern only small changes in the SVD_Procedures and LLSQ_Procedures modules;
(2) bug corrections in the Random and EIG_Procedures modules. The bug corrections concern the gen_random_sym_mat() subroutine in the Random module (its optional argument eigvec) and the symtrid_cmp2() subroutine in the EIG_Procedures module when BLAS support is activated (the first dimension of the vmat array argument in the calls to the gemm() BLAS routine in symtrid_cmp2() was not correctly specified);
(3) the specifications of the default values of some parameters in the Select_Parameters module (e.g. the max_num_threads_symtrid_cmp, max_num_threads_symtrid_cmp2, max_num_threads_bd_cmp and max_num_threads_bd_cmp2 parameters have now a default value of 128).
Changes and updates from version 2.1 of Statpack:
module BLAS_Interfaces:
- Generic interfaces have been added for the following BLAS subroutines: Xsymv(), Xspmv() where X can be s, d. The new generic interfaces have the form: symv, spmv
module Select_Parameters:
- Integration of new public parameters (max_num_threads_symtrid_cmp, max_num_threads_symtrid_cmp2, max_num_threads_bd_cmp, max_num_threads_bd_cmp2), which can be used to limit the number of cores in the OpenMP path of subroutines symtrid_cmp, symtrid_cmp2 (in module EIG_Procedures) and of subroutines bd_cmp, bd_cmp2 and bd_cmp3 (in module SVD_Procedures). This is useful on some systems with a large number of cores per node, as on these systems, these subroutines do not always scale well when the number of cores is large.
- Optimization of the block parameters blksz_qr, blksz_eig, blksz2_eig, blksz2_svd, max_francis_steps_svd and max_francis_steps_eig in order to deliver maximum performance on most systems.
- Improved documentation in the module.
module Real_constants:
- Inclusion of new real constants for use in the new subroutines included in this release.
module Char_constants:
- Inclusion of new error messages for the new subroutines in modules QR_Procedures, Random, SVD_Procedures and Mul_Stat_Procedures.
module Num_Constants:
- New numerical constants (e.g., safmin, safmax, machsmlnum, machbignum, rtmin, rtmax, tbig, sbig, ubig, tsml, ssml and usml) used for computing Euclidean/Frobenius norms of matrices or vectors, sum of squares and in scaling subroutines are now included in the module Num_Constants. See the paper by E. Anderson (2018, https://doi.org/10.1145/3061665) for more details about the use of these new numerical constants for computing norms. Moreover, with the definition of these new numerical constants, all the calls to the inquiry function lamch() in Statpack have been replaced by a module approach in which the required numerical constants in each Statpack routine are imported via a single USE Num_Constants instruction.
- The values of the numerical constants machulp and macheps have been interchanged to be consistent with the definitions in the paper by E. Anderson (2018) and the code of the subroutine comp_mach() has been modified accordingly.
- A new cpp processor macro is defined, _ISNAN, which allows the use of the intrinsic function isnan(), when it is supported by the compiler (e.g., Gnu and Intel compilers) and when the intrinsic function ieee_is_nan() is not used (e.g., when the cpp processor macro _F2003 is not activated at compilation) for detecting NaNs in the generic is_nan() replace_nan() routines included in module Num_Constants.
- The documentation of the lamch(), mach(), is_nan() and replace_nan() routines has been improved.
modules Utilities:
All the routines in the module, which compute Euclidean/Frobenius norms of vectors/matrices and scaled sum of squares have been updated and extended (e.g., generic routines abse(), lassq(), norm(), lassqe(), norme(), lassq2e() and norm2e() ). Now, in Statpack2.2:
- the generic function abse() computes norms with a (one-pass) compensated summation algorithm;
- the generic routines norm() and lassq() compute, respectively, norms and sums of squares with a (one-pass) Blue’s algorithm;
- the generic routines norme() and lassqe() compute norms and sums of squares with a (one-pass) Blue’s algorithm and also use compensated summation for improved accuracy;
- the generic routines norm2e() and lassq2e() compute norms and sums of squares with a (two-pass) LAPACK3E’s algorithm and also use compensated summation for improved accuracy.
For more details on Blue, LAPACK3E and compensated summation algorithms, see the papers by E. Anderson (2018, https://doi.org/10.1145/3061665) and Hanson and Hopkins (2018, https://doi.org/10.1145/3134441).
All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.
Improved BLAS support in the generic subroutine matmul2() (e.g., use of the BLAS2 Xgemv() subroutine instead of the BLAS1 Xdot() function when one of the arguments to matmul2() is a real or complex vector and the cpp processor macro _BLAS is activated at compilation).
An improved scaling algorithm (with respect to NaNs) is now used in the generic subroutine lascl().
The subroutine pythag(), which computes sqrt( a * a + b * b ), has been updated. The computations are now performed in extended precision (e.g., extd kind type) rather than at the standard precision (e.g., stnd kind type) defined in the module Select_Parameters.
A new subroutine pythage() has also been included, which computes also sqrt( a * a + b * b ), but without destructive underflow or overflow using the Blue’s scaling method. See the paper by E. Anderson (2018, https://doi.org/10.1145/3061665) for more details.
modules Prob_Procedures, Giv_Procedures, Lin_Procedures and Stat_Procedures:
- All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.
modules Hous_Procedures:
- All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.
- The codes in the generic subroutines hous1() and hous2() have been updated. The new versions use improved algorithms for computing the Euclidean norm of vectors as implemented in the new generic routine norm2e() (e.g., LAPACK3E’s two-pass algorithm with compensated summation), now included in the module Utilities, see details above. The new algorithms use compensated summations for improving accuracy and better scaling in order to avoid overflows or underflows. See the papers by E. Anderson (2018, https://doi.org/10.1145/3061665) and Hanson and Hopkins (2018, https://doi.org/10.1145/3134441) for more details.
- The codes in the generic subroutines h1() and h2() have been updated. The new versions use improved algorithms for computing the Euclidean norm of vectors. The new algorithms use compensated summations for improving accuracy, see the paper by Hanson and Hopkins (2018, https://doi.org/10.1145/3134441) for more details.
- Improved BLAS and OpenMP supports in the generic subroutines apply_hous1(), apply_hous2(), h1(), h2(), apply_h1() and apply_h2() when applying Housholder reflections to matrices or vectors. These changes deliver large speed improvements for these subroutines when BLAS and OpenMP supports is activated.
module QR_Procedures:
- Correction of a bug in lq_cmp() subroutine when use_qr=false and OpenMP are used and when the number of columns of the matrix is less than the block size blksz_qr specified in module Select_Parameters. In such a case, the serial algorithm is now used in the Statpack2.2 version of lq_cmp().
- All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.
- Speed improvements in qrfac() and qr_cmp2() subroutines for computing the minimum-norm solutions of least squares problems for rank deficient matrices. Moreover, the codes in these subroutines now use the generic routine norm2e() (e.g., LAPACK3E’s two-pass algorithm with compensated summation), included in the module Utilities, for computing Euclidean norm of vectors for improved accuracy.
- Improved BLAS support (e.g., systematic use of Xdot() and Xaxpy() BLAS1 subroutines) in subroutines lq_cmp(), ortho_gen_lq(), apply_q_lq(), qr_cmp(), ortho_gen_qr() and apply_q_qr() included in the module.
- Inclusion of 2 new subroutines partial_qr_cmp() and partial_qr_cmp_fixed_precision(), which perform a partial QR decomposition with column pivoting of a matrix based on a deterministic BLAS2 algorithm. These two new subroutines can be used to solve the fixed-rank and fixed-precision problems, respectively, by deterministic methods. Similar, but faster, routines based on randomized methods are also now provided in module Random, see below for details.
- Improved documentation of the subroutines in the module.
module Random:
- All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.
- Improved documentation of subroutines random_qr_cmp(), ortho_gen_random_qr(), gen_random_sym_mat() and gen_random_mat()
- A test for checking that all elements of the input vector argument singval(:) are positive is now included in subroutine gen_random_mat().
- Inclusion of the new subroutines partial_rqr_cmp(), partial_rqr_cmp2(), partial_rtqr_cmp(), partial_rqr_cmp_fixed_precision(), rqb_cmp(), rqb_cmp_fixed_precision(), id_cmp(), ts_id_cmp() and cur_cmp(), which perform a variety of randomized (and deterministic) matrix decompositions, including randomized full or partial QR decompositions, randomized or deterministic QB decompositions, randomized or deterministic column Interpolative Decompositions (ID), randomized or deterministic two-sided Interpolative Decompositions (tsID) and randomized or deterministic CUR decompositions.
- Improved BLAS support in the module.
module SVD_Procedures:
- All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.
- Improved BLAS support (e.g., systematic use of the BLAS subroutines Xdot(), Xaxpy() and Xgemv() if the cpp processor macro _BLAS is defined at compilation) in the (generic) subroutines bd_cmp(), bd_cmp2(), bd_cmp3(), bd_inviter2() and bd_deflate2() (and also in the private subroutine ortho_vec()).
- Significant speed improvement in the serial version of the bd_svd2() subroutine for computing singular vectors.
- Importation of the parameter blksz_qr from module Select_Parameters for use in subroutines bd_cmp2() and bd_cmp3(). as a blocking factor for applying Householder transformations.
- Importation of the new parameters max_num_threads_bd_cmp and max_num_threads_bd_cmp2 from module Select_Parameters for use in subroutines bd_cmp(), bd_cmp2() and bd_cmp3() for eventually limiting the number of cores used in the OpenMP path of these subroutines.
- Improved argument checking in subroutines bd_svd() and bd_svd2().
- The subroutines bd_singval() and bd_singval2(), which compute singular values of bidiagonal matrices, now include a check for NaNs in the computed singular values.
- A new subroutine for computing the largest singular value of a bidiagonal matrix by a bisection method, bd_max_singval(), is now included in the module.
- Inclusion of a new optional logical argument bisect in subroutines bd_svd(), bd_svd2(), svd_cmp(), svd_cmp2(), svd_cmp3(), svd_cmp4(), rsvd_cmp(), rsvd_cmp_fixed_precision(), reig_pos_cmp(), rqr_svd_cmp(), rqr_svd_cmp_fixed_precision(), rqlp_svd_cmp(), rqlp_svd_cmp2(), product_svd_cmp(), ginv() and comp_ginv(). When bisect=true and perfect_shift=true, the singular values in these subroutines are computed to higher precision using an accurate bisection algorithm (instead of the faster, but less accurate, Pal-Walker-Kahan algorithm used by default). This gives higher accuracy in the final (partial) SVD results in all these subroutines at the expense of a slight increase in the execution time.
- Accuracy improvements in rsvd_cmp_fixed_precision() subroutine, which performs a randomized partial SVD to solve the fixed-precision problem. The new version is also more robust with respect to possible overflows.
- Two new SVD drivers have been added, svd_cmp5() and svd_cmp6(), which give more accurate results than previous SVD drivers included in Statpack. The svd_cmp4() driver has also been updated for improved accuracy.
- Inclusion of the new subroutines rqr_svd_cmp(), rqr_svd_cmp_fixed_precision(), rqlp_svd_cmp(), rqlp_svd_cmp2() and rqlp_svd_cmp_fixed_precision(), which perform randomized or deterministic partial SVD to solve the fixed-rank or fixed-precision problems using QR-SVD or QLP-SVD algorithms. The rqlp_svd_cmp(), rqlp_svd_cmp2() and rqlp_svd_cmp_fixed_precision() subroutines are more accurate (and as fast for rqlp_svd_cmp2()) than the rsvd_cmp() and rsvd_cmp_fixed_precision() subroutines already included in the module, which perform the same task.
- Inclusion of the new subroutines qlp_cmp(), qlp_cmp2() and rqlp_cmp(), which compute randomized or deterministic (partial) QLP decompositions.
- The (generic) subroutines bd_svd() and bd_svd2(), which compute singular values and vectors of bidiagonal matrices, now use a faster BLAS3 algorithm when updating the singular vectors during the bidiagonal QR iterations if the number of elements in the singular vectors is sufficiently large. The shift from the standard BLAS1 algorithm to the new BLAS3 algorithm is controlled by the new module parameter thr_bd_svd_blas3, specified at the beginning of the module. These improvements in subroutines bd_svd() and bd_svd2() lead also to speed improvements in the generic routines svd_cmp(), svd_cmp2(), svd_cmp3(), svd_cmp4(), ginv() and comp_ginv() (and also marginally in subroutines rsvd_cmp(), rsvd_cmp_fixed_precision() and other randomized partial SVD subroutines).
- Speed improvements and better BLAS support in routines ginv() and comp_ginv().
- Large speed improvements in subroutine bd_inviter() when large clusters of singular values are present. A fast QR factorization is now used to reorthogonalize the singular vectors in this case instead of a slower Gram-Schmidt algorithm.
- Inclusion of a new optional logical argument tol_reortho in generic subroutines bd_inviter2() and bd_deflate2(). This argument controls the reorthogonalization of the left singular vectors in the versions of these generic subroutines, which assume inputs from the Rhala-Barlow one-sided bidiagonalization algorithm.
- Improved documentation of the routines in the module (especially the generic subroutines bd_svd(), bd_svd2(), svd_cmp(), svd_cmp2(), svd_cmp3(), svd_cmp4() and bd_inviter2() ).
module EIG_Procedures:
- All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.
- Improved BLAS support (e.g., systematic use of the BLAS subroutines Xdot(), Xaxpy(), Xgemv(), Xsymv(), Xspmv(), Xspr2() if the cpp processor macro _BLAS is defined at compilation) in the (generic) subroutines symtrid_cmp(), symtrid_cmp2(), ortho_gen_symtrid(), apply_q_symtrid(), trid_inviter() and also in the private subroutine ortho_vec().
- The generic routine abse() (e.g., one-pass algorithm with compensated summation), included in module Utilities, is now used for computing Euclidean norm of vectors for improved accuracy in the generic subroutine symtrid_cmp().
- The generic routine norm2e() (e.g., LAPACK3E’s two-pass algorithm with compensated summation), included in the module Utilities, is now used for computing Euclidean norm of vectors for improved accuracy in the (generic) subroutines reig_cmp() and trid_inviter().
- The scaling algorithm has been improved in the generic subroutines eig_cmp(), eig_cmp2(), eig_cmp3(), eigvalues(), eigval_cmp(), eigval_cmp2(), eigval_cmp3(), select_eigval_cmp(), select_eigval_cmp2(), and select_eigval_cmp3().
- The subroutines symtrid_qri2() and symtrid_qri3(), which compute eigenvalues and eigenvectors of symmetric tridiagonal matrices, now use a faster BLAS3 algorithm when updating the eigenvectors during the QR iterations if the number of elements in the eigenvectors is sufficiently large. The shift from the standard BLAS1 algorithm to the new BLAS3 algorithm is controlled by the new module parameter thr_trid_eig_blas3, specified at the beginning of the module. These improvements in subroutines symtrid_qri2() and symtrid_qri3() lead also to speed improvements in the generic subroutines eig_cmp2() and eig_cmp3().
- Improved documentation of the routines in the module and more specifically in the generic subroutines symtrid_qri(), symtrid_qri2() and symtrid_qri3().
- Importation of the parameters blksz_qr from module Select_Parameters for use in subroutines symtrid_cmp2(), apply_q_symtrid(), ortho_gen_symtrid(), trid_deflate() and trid_inviter() as a blocking factor for applying Householder transformations.
- Importation of the parameters max_num_threads_symtrid_cmp and max_num_threads_symtrid_cmp2 from module Select_Parameters for use in subroutines symtrid_cmp() and symtrid_cmp2() for eventually limiting the number of cores used in the OpenMP path of these subroutines.
- Large speed improvements in subroutine trid_inviter() when large clusters of eigenvalues are present. A fast QR factorization is now used to reorthogonalize the eigenvectors in this case instead of a slower Gram-Schmidt algorithm.
- Improved documentation in the module.
module LLSQ_Procedures:
- All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.
- Improved BLAS support and speed improvements in the generic subroutines solve_llsq(), llsq_qr_solve(), llsq_qr_solve2() and qr_solve2() when computing minimun 2-norm solutions of least square problems.
- The generic routine norm2e() (e.g., LAPACK3E’s two-pass algorithm with compensated summation), included in the module Utilities, is now used for computing Euclidean norm of vectors for improved accuracy in the subroutines included in the module.
- Inclusion of the new generic subroutine rqb_solve(), which solves overdetermined or underdetermined real linear systems, using a randomized QR or complete orthogonal factorization as computed by subroutine rqb_cmp() in module Random.
- Inclusion of a new optional logical argument bisect in subroutine llsq_svd_solve(). When bisect=true and perfect_shift=true, the singular values in this subroutine are computed to higher precision using an accurate bisection algorithm (instead of the faster, but less accurate, Pal-Walker-Kahan algorithm used by default). This gives higher accuracy in the final solution of the least square problem at the expense of a slight increase in the time of execution.
- Improved documentation of the subroutines in the module.
module Mul_Stat_Procedures:
- Inclusion of 4 new subroutines comp_ortho_rot_eof(), comp_smooth_rot_pc(), comp_lfc_rot_pc() and comp_filt_rot_pc(), which perform orthogonal rotations of Principal Components from a Principal Component Analysis according to a large set of different criteria.
- All the calls to the inquiry function lamch() have been replaced by a module approach in which the required numerical constants in each routine are now included via a single USE Num_Constants instruction.
- Improved documentation of the subroutines in the module.
modules Sort_Procedures, Prob_Procedures, Lin_Procedures, EIG_Procedures and FFT_Procedures, QR_Procedures, SVD_Procedures, LLSQ_Procedures, Mul_Stat_Procedures and Random:
- The calls to the deprecated OpenMP function omp_get_nested() have been replaced by calls to the OpenMP function omp_get_max_active_levels().
examples directory:
- Inclusion of new example programs and update of many older example programs to illustate the new features added in Statpack2.2 or to better illustrate the use of other existing Statpack subroutines.
tests directory:
- Inclusion of new test programs and update of many older test programs to test the new features added in Statpack2.2 or to improve the testing of other Statpack subroutines.
Release: 2.1 Date: Feb 12, 2021 This version provides bug corrections, large speed improvements for linear algebra subroutines and new routines based on randomized algorithms for computing restricted singular value and eigenvalue decompositions.
Updated June 07, 2021 to correct for deficiencies in new versions of the PGI fortran compiler related to the managing of (internal) memory. The updates concern only small changes in the SVD_Procedures and LLSQ_Procedures modules and the STATPACK testing programs in the tests directory.
Updated Dec 20, 2021 for bug corrections. The bug corrections concern only a small change for the lq_cmp() subroutine in the QR_Procedures module when used in a multi-threaded environment.
Changes and updates from version 2.0 of Statpack:
module BLAS_Interfaces:
- Generic interfaces have been added for the following BLAS subroutines: Xsyr2(), Xher2(), Xspr2(), Xhpr2(), Xsymm(), Xhemm(), Xtrsm(), Xtrmm(), Xsyrk(), Xherk(), Xsyr2k(), Xher2k() where X can be s, d, c and z. The new generic interfaces have the form: syr2, her2, spr2, hpr2, symm, hemm, trsm, trmm, syrk, herk, syr2k, her2k
- All COMPLEX*16 declarations in the module have been replaced by DOUBLE COMPLEX declarations.
module Char_constants:
- Inclusion of new error messages for modules SVD_Procedures and QR_Procedures.
module Select_Parameters:
- Integration of new public parameters, which specify the default block sizes for the different blocked algorithms in modules Utilities, FFT_Procedures, Lin_Procedures, QR_Procedures, Eig_Procedures and SVD_Procedures.
module Sort_Procedures:
- Improved BLAS support and speed improvement for reorder() generic subroutine. A faster and in-place algorithm is now used.
- Improved documentation of the subroutines in the module.
module Giv_Procedures:
- Large speed improvement for givens_mat_left(), givens_vec_mat_left(), fastgivens_mat_left() and fastgivens_vec_mat_left() subroutines. A faster transpose algorithm is now used by default.
- Improved documentation of the subroutines in the module.
module Lin_Procedures:
- Improved BLAS support for the following subroutines: lu_solve(), solve_lin(), lin_lu_solve(), triang_solve(), comp_uut_ltl() and comp_triang_inv().
- Improved documentation of the subroutines in the module.
module QR_Procedures:
- Improved BLAS support and speed improvements for the following subroutines: ortho_gen_lq(), apply_q_lq(), qr_cmp(), ortho_gen_qr() and apply_q_qr(). New blocked algorithms (with BLAS3 support) have been implemented in subroutines qr_cmp(), ortho_gen_qr() and apply_q_qr().
- Speed improvements and new formulations in qrfac() and qr_cmp2() subroutines.
- Improved documentation of the subroutines in the module.
module Random:
- Improved BLAS support for the following subroutines: gen_random_sym_mat() and gen_random_mat().
module SVD_Procedures:
- Large speed improvements in generic bd_cmp2() subroutine, which performs bidiagonalization of a rectangular matrix using the one-sided Ralha-Barlow bidiagonal reduction algorithm. A general blocked algorithm is now used.
- Inclusion of a new subroutine, bd_cmp3(), which performs a partial bidiagonalization of a rectangular matrix using the one-sided Ralha-Barlow bidiagonal reduction algorithm. The subroutine computes the bidiagonal and the right orthogonal matrices only. A general blocked algorithm is also used in bd_cmp3().
- Correction of a small bug in generic bd_cmp2() subroutine when the first column of the matrix is zero.
- A specific version of the generic bd_cmp() subroutine, which does a preliminary QR or LQ factorization before bidiagonalization, has been added in the module (e.g., bd_cmp_c() subroutine).
- Large speed improvements in the generic svd_cmp(), svd_cmp2(), svd_cmp3() and svd_cmp4() subroutines when the number of rows and columns of the input matrix differ significantly and, in all the cases, for svd_cmp3() and svd_cmp4() subroutines because of the use of the new bd_cmp2() subroutine. All these SVD subroutines benefit also from the speed improvements done in module QR_Procedures.
- Large speed improvements in the subroutines apply_q_bd(), apply_p_bd(), ortho_gen_bd(), ortho_gen_bd2(), ortho_gen_q_bd() and ortho_gen_p_bd(), which also benefit from the speed improvements done in module QR_Procedures.
- New forms of calls, which allow pre-processing of the input matrix. by a preliminary QR or LQ factorization, and large speed improvements in the (now generic) select_singval_cmp(), select_singval_cmp2(), select_singval_cmp3() and select_singval_cmp4() subroutines.
- Large speed improvements in the generic bd_inviter2() and bd_deflate2() subroutines and new forms of calls, which can be used with the new forms of calls of select_singval_cmp(), select_singval_cmp2() and select_singval_cmp3() and select_singval_cmp4() subroutines to obtain the (partial) SVD of a matrix after a pre-processing by a preliminary QR or LQ decomposition. This results in large speed improvement when the number of rows and columns of the input matrix differ significantly.
- Inclusion of three new subroutines rsvd_cmp(), rsvd_cmp_fixed_precision() and reig_pos_cmp(), which compute, respectively, approximations of partial SVD of rectangular matrices and partial EVD of symmetric positive semi-definite matrices using randomized algorithms.
- Improved documentation of the subroutines in the module.
module LLSQ_Procedures:
- Improved BLAS support for the following subroutines: solve_llsq(), llsq_qr_solve() and llsq_qr_solve2().
- New and faster formulation for computing mimimun 2-norm solutions in subroutines solve_llsq(), llsq_qr_solve(), llsq_qr_solve2() and qr_solve2() subroutines.
- All subroutines also benefit from the speed improvements in modules QR_Procedures and SVD_Procedures.
- Improved documentation of the subroutines in the module.
module EIG_Procedures:
- Inclusion of a new subroutine, symtrid_cmp2(), which reduces a real n-by-n symmetric matrix cross-product using the one-sided Ralha tridiagonal reduction algorithm and without forming this cross-product. A general blocked version of the algorithm is used.
- Correction of a bug in the generic apply_q_symtrid() subroutine with the optional logical argument upper and when the matrix is in packed form (upper was used in the apply_q_symtrid_c() subroutine, but its presence in the call of the subroutine was not tested). In this version of the library, the old apply_q_symtrid_c() subroutine has been replaced by the subroutines apply_q_symtrid_c() and apply_q_symtrid_d(), which offer the same functionalities and the bug is removed.
- Improved BLAS support in the generic symtrid_cmp() subroutine.
- Improved BLAS support and and large speed improvements in the generic subroutines ortho_gen_symtrid() and apply_q_symtrid() when the argument upper is not used. A more efficient blocked algorithm is now used (with BLAS3 support).
- Large speed improvements in the generic eig_cmp(), eig_cmp2(), eig_cmp3(), trid_deflate() and trid_inviter() subroutines due to the updates of ortho_gen_symtrid() and apply_q_symtrid() subroutines when the argument upper is not used.
- Inclusion of a new subroutine, reig_cmp(), which computes approximations of partial EVD of symmetric matrices using randomized algorithms.
- Improved documentation of the subroutines in the module.
modules Time_Procedures, Time_Series_Procedures, Stat_Procedures, Mul_Stat_Procedures, Prob_Procedures, Random, Sort_Procedures, Giv_Procedures, Hous_Procedures:
- Improved documentation of the subroutines in the modules.
examples directory:
- Inclusion of new example programs illustrating the new features described above: ex1_bd_cmp3.F90, ex1_symtrid_cmp2.F90, ex2_symtrid_cmp2.F90, ex1_rsvd_cmp.F90, ex1_rsvd_cmp_fixed_precision.F90, ex1_reig_cmp.F90, ex1_reig_pos_cmp.F90, ex1_random_svd.F90, ex1_random_eig.F90, ex1_random_eig_pos.F90 ex1_random_svd_with_blas.F90, ex1_random_eig_with_blas.F90, ex1_random_eig_pos_with_blas.F90 ex1_bd_inviter2_bis.F90, ex1_select_singval_cmp3_bis.F90, ex1_select_singval_cmp4_bis.F90 ex2_select_singval_cmp.F90, ex2_select_singval_cmp2.F90, ex2_select_singval_cmp3.F90, ex2_select_singval_cmp3_bis.F90, ex2_select_singval_cmp4.F90, ex2_select_singval_cmp4_bis.F90.
- Update of the following example programs illustrating the new features described above: ex1_bd_deflate2.F90, ex1_bd_deflate2_bis.F90, ex1_bd_inviter2.F90, ex1_select_singval_cmp.F90, ex1_select_singval_cmp2.F90, ex1_select_singval_cmp3.F90, ex1_select_singval_cmp4.F90.
- Update of the following example programs: ex1_eigval_cmp.F90, ex1_eigval_cmp2.F90, ex1_eigval_cmp3.F90 ex1_select_eigval_cmp.F90, ex1_select_eigval_cmp2.F90, ex1_select_eigval_cmp3.F90 ex2_eigval_cmp.F90, ex2_eigval_cmp2.F90, ex2_eigval_cmp3.F90 ex2_select_eigval_cmp.F90, ex2_select_eigval_cmp2.F90, ex2_select_eigval_cmp3.F90.
tests directory:
- Inclusion of new test programs and update of some older test programs to test the new features added in statpack2.1 or to improve the testing of other statpack subroutines.
Release: 2.0 Date: Dec 3, 2018 First open source release
Updated Sep 23, 2020 for bug corrections
Updated Dec 20, 2021 for bug corrections. The bug corrections concern only a small change for the lq_cmp() subroutine in the QR_Procedures module when used in a multi-threaded environment.
Release: 1.1 Date: Jan 21, 2004
Release: 1.0 Date: May 7, 2003