Module_SVD_Procedures¶

This file is part of statpack.

statpack is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

statpack is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You can find a copy of the GNU Lesser General Public License in the statpack/doc directory.

Authors: Pascal Terray (LOCEAN/IPSL, Paris, France)

MODULE EXPORTING SUBROUTINES AND FUNCTIONS FOR COMPUTING FULL, PARTIAL SVD OR QLP DECOMPOSITIONS AND GENERALIZED INVERSE OF A MATRIX.

SUBROUTINES FOR COMPUTING PARTIAL EIGENVALUE, SVD OR QLP DECOMPOSITIONS BASED ON RANDOMIZED ALGORITHMS ARE ALSO PROVIDED.

LATEST REVISION : 12/03/2024

`subroutine bd_cmp ( mat, d, e, tauq, taup )`¶

Purpose¶

BD_CMP reduces a general m-by-n matrix MAT to upper or lower bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal. If:

m >= n, BD is upper bidiagonal;

m < n, BD is lower bidiagonal.

BD_CMP computes BD, Q and P.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the general m-by-n matrix to be reduced.

On exit, if:

m >= n, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors;

m < n, the elements below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements on and above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors.

See Further Details.

D (OUTPUT) real(stnd), dimension(:)

The diagonal elements of the bidiagonal matrix BD

The size of D must be min( size(MAT,1) , size(MAT,2) ).

E (OUTPUT) real(stnd), dimension(:)

The off-diagonal elements of the bidiagonal matrix BD:

if m >= n, E(i) = BD(i-1,i) for i = 2,3,…,n;

if m < n, E(i) = BD(i,i-1) for i = 2,3,…,m.

The size of E must be min( size(MAT,1) , size(MAT,2) ).
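As an illustration of this storage convention, the following Python sketch rebuilds the k-by-k bidiagonal block of BD from D and E, with k = min(m,n). It is not part of statpack: the function name `bd_from_d_e` is hypothetical, indices are shifted to Python's 0-based convention, and the first element of E, whose content is not specified above, is simply ignored.

```python
def bd_from_d_e(d, e, m, n):
    """Rebuild the k-by-k bidiagonal block of BD, k = min(m, n).

    d and e follow the layout documented above, shifted to 0-based
    indexing: e[i] is the off-diagonal element paired with d[i] for
    i = 1..k-1, and e[0] is ignored (assumption of this sketch).
    """
    k = min(m, n)
    assert len(d) == k and len(e) == k
    bd = [[0.0] * k for _ in range(k)]
    for i in range(k):
        bd[i][i] = d[i]
    for i in range(1, k):
        if m >= n:
            bd[i - 1][i] = e[i]   # upper bidiagonal: E(i) = BD(i-1,i)
        else:
            bd[i][i - 1] = e[i]   # lower bidiagonal: E(i) = BD(i,i-1)
    return bd
```

For example, `bd_from_d_e([1.0, 2.0, 3.0], [0.0, 4.0, 5.0], 4, 3)` places 4.0 and 5.0 on the superdiagonal, while the same D and E with m = 3 and n = 4 place them on the subdiagonal.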

TAUQ (OUTPUT) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix Q. See Further Details.

The size of TAUQ must be min( size(MAT,1) , size(MAT,2) ).

TAUP (OUTPUT) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix P. See Further Details.

The size of TAUP must be min( size(MAT,1) , size(MAT,2) ).

Further Details¶

The matrices Q and P are represented as products of elementary reflectors:

If m >= n,

Q = H(1) * H(2) * … * H(n) and P = G(1) * G(2) * … * G(n-1)

Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors; u(1:i-1) = 0 and u(i:m) is stored on exit in MAT(i:m,i); v(1:i) = 0 and v(i+1:n) is stored on exit in MAT(i,i+1:n); tauq is stored in TAUQ(i) and taup in TAUP(i).

If m < n,

Q = H(1) * H(2) * … * H(m-1) and P = G(1) * G(2) * … * G(m)

Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors; u(1:i) = 0 and u(i+1:m) is stored on exit in MAT(i+1:m,i); v(1:i-1) = 0 and v(i:n) is stored on exit in MAT(i,i:n); tauq is stored in TAUQ(i) and taup in TAUP(i).
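The form I + tau * u * u’ can be checked numerically. The Python sketch below builds one reflector that maps a vector x onto a multiple of the first unit vector, using tau = -2 / (u’ * u), which makes I + tau * u * u’ symmetric and orthogonal. The scaling of u and tau actually used by BD_CMP is not stated here, so this normalization, as well as the function names `reflector` and `apply_reflector`, are assumptions made for illustration only.

```python
import math

def reflector(x):
    """Return (u, tau) such that (I + tau*u*u') x = alpha*e1, |alpha| = ||x||."""
    norm = math.sqrt(sum(t * t for t in x))
    if norm == 0.0:
        return list(x), 0.0            # nothing to annihilate: H = I
    alpha = -math.copysign(norm, x[0])  # sign chosen to avoid cancellation
    u = list(x)
    u[0] -= alpha
    tau = -2.0 / sum(t * t for t in u)  # note the sign: H = I + tau*u*u'
    return u, tau

def apply_reflector(u, tau, x):
    """Compute (I + tau*u*u') x without forming the matrix."""
    s = tau * sum(a * b for a, b in zip(u, x))
    return [xi + s * ui for xi, ui in zip(x, u)]
```

Applying the reflector built from x = (3, 4) to x itself gives a vector whose second component vanishes and whose first component has magnitude 5; applying it a second time recovers x, since such a reflector is its own inverse.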

The contents of MAT on exit are illustrated by the following examples:

m = 6 and n = 5 (m >= n):

( u1 v1 v1 v1 v1 )

( u1 u2 v2 v2 v2 )

( u1 u2 u3 v3 v3 )

( u1 u2 u3 u4 v4 )

( u1 u2 u3 u4 u5 )

( u1 u2 u3 u4 u5 )

m = 5 and n = 6 (m < n):

( v1 v1 v1 v1 v1 v1 )

( u1 v2 v2 v2 v2 v2 )

( u1 u2 v3 v3 v3 v3 )

( u1 u2 u3 v4 v4 v4 )

( u1 u2 u3 u4 v5 v5 )

where ui denotes an element of the vector defining H(i), and vi an element of the vector defining G(i).

This subroutine is adapted from the routine DGEBD2 in LAPACK. An efficient variant of the classic Golub and Kahan Householder bidiagonalization algorithm is used. This variant reduces the traffic on the data bus from four reads and two writes per column-row elimination of the bidiagonalization process to one read and one write. Furthermore, the algorithm is parallelized if OPENMP is used.
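For orientation, the classic (unblocked) Golub and Kahan Householder reduction for the case m >= n can be sketched in a few lines of Python. This is not the cache-efficient variant used by BD_CMP and it is not statpack code: the names `house` and `bidiagonalize` are hypothetical, the reflectors are not saved, and only D and E are returned, with the first element of E left at zero.

```python
import math

def house(x):
    """(u, tau) with (I + tau*u*u') x = alpha*e1; tau = 0 if x is zero."""
    norm = math.sqrt(sum(t * t for t in x))
    if norm == 0.0:
        return list(x), 0.0
    u = list(x)
    u[0] += math.copysign(norm, x[0])
    return u, -2.0 / sum(t * t for t in u)

def bidiagonalize(a):
    """Classic Golub-Kahan reduction for m >= n; returns (d, e), e[0] unused."""
    a = [row[:] for row in a]          # work on a copy
    m, n = len(a), len(a[0])
    for i in range(n):
        # Left reflector: zero column i below the diagonal.
        u, tau = house([a[r][i] for r in range(i, m)])
        for c in range(i, n):
            s = tau * sum(u[r - i] * a[r][c] for r in range(i, m))
            for r in range(i, m):
                a[r][c] += s * u[r - i]
        if i < n - 2:
            # Right reflector: zero row i to the right of the superdiagonal.
            v, tau = house(a[i][i + 1:])
            for r in range(i, m):
                s = tau * sum(v[c - i - 1] * a[r][c] for c in range(i + 1, n))
                for c in range(i + 1, n):
                    a[r][c] += s * v[c - i - 1]
    d = [a[i][i] for i in range(n)]
    e = [0.0] + [a[i - 1][i] for i in range(1, n)]
    return d, e
```

Because orthogonal transformations preserve the Frobenius norm, the sum of the squares of the returned D and E elements equals the sum of the squares of the elements of the input matrix, which gives a cheap sanity check on the reduction.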

For further details on the bidiagonal reduction algorithm and its use or the efficient variant used here, see:

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Howell, G.W., Demmel, J., Fulton, C.T., Hammarling, S., and Marmol, K., 2008:

Cache efficient bidiagonalization using BLAS 2.5 operators. ACM Transactions on Mathematical Software (TOMS) Volume 34, Issue 3.

`subroutine bd_cmp ( mat, d, e, tauq )`¶

Purpose¶

BD_CMP reduces a general m-by-n matrix MAT to upper or lower bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal. If:

m >= n, BD is upper bidiagonal;

m < n, BD is lower bidiagonal.

BD_CMP computes only BD and Q.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the general m-by-n matrix to be reduced.

On exit, if:

m >= n, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal are destroyed;

m < n, the elements below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements on and above the diagonal are destroyed.

See Further Details.

D (OUTPUT) real(stnd), dimension(:)

The diagonal elements of the bidiagonal matrix BD

The size of D must be min( size(MAT,1) , size(MAT,2) ).

E (OUTPUT) real(stnd), dimension(:)

The off-diagonal elements of the bidiagonal matrix BD:

if m >= n, E(i) = BD(i-1,i) for i = 2,3,…,n;

if m < n, E(i) = BD(i,i-1) for i = 2,3,…,m.

The size of E must be min( size(MAT,1) , size(MAT,2) ).

TAUQ (OUTPUT) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix Q. See Further Details.

The size of TAUQ must be min( size(MAT,1) , size(MAT,2) ).

Further Details¶

The matrix Q is represented as a product of elementary reflectors:

If m >= n,

Q = H(1) * H(2) * … * H(n)

Each H(i) has the form:

H(i) = I + tauq * u * u’

where tauq is a real scalar and u is a real vector; u(1:i-1) = 0 and u(i:m) is stored on exit in MAT(i:m,i); tauq is stored in TAUQ(i).

If m < n,

Q = H(1) * H(2) * … * H(m-1)

Each H(i) has the form:

H(i) = I + tauq * u * u’

where tauq is a real scalar and u is a real vector; u(1:i) = 0 and u(i+1:m) is stored on exit in MAT(i+1:m,i); tauq is stored in TAUQ(i).

The contents of MAT on exit are illustrated by the following examples:

m = 6 and n = 5 (m >= n):

( u1 xx xx xx xx )

( u1 u2 xx xx xx )

( u1 u2 u3 xx xx )

( u1 u2 u3 u4 xx )

( u1 u2 u3 u4 u5 )

( u1 u2 u3 u4 u5 )

m = 5 and n = 6 (m < n):

( xx xx xx xx xx xx )

( u1 xx xx xx xx xx )

( u1 u2 xx xx xx xx )

( u1 u2 u3 xx xx xx )

( u1 u2 u3 u4 xx xx )

where ui denotes an element of the vector defining H(i) and xx denotes an element that is destroyed on exit: the elements above the diagonal if m >= n, and the elements on and above the diagonal if m < n.

This subroutine is adapted from the routine DGEBD2 in LAPACK. An efficient variant of the classic Golub and Kahan Householder bidiagonalization algorithm is used. This variant reduces the traffic on the data bus from four reads and two writes per column-row elimination of the bidiagonalization process to one read and one write. Furthermore, the algorithm is parallelized if OPENMP is used.

For further details on the bidiagonal reduction algorithm and its use or the efficient variant used here, see:

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Howell, G.W., Demmel, J., Fulton, C.T., Hammarling, S., and Marmol, K., 2008:

Cache efficient bidiagonalization using BLAS 2.5 operators. ACM Transactions on Mathematical Software (TOMS) Volume 34, Issue 3.

`subroutine bd_cmp ( mat, d, e, tauq, taup, rlmat, tauo )`¶

Purpose¶

BD_CMP reduces a general m-by-n matrix MAT to upper bidiagonal form BD by a two-step algorithm:

If m >= n, a QR factorization of the real m-by-n matrix MAT is first computed

MAT = O * R

where O is orthogonal and R is upper triangular. In a second step, the n-by-n upper triangular matrix R is reduced to upper bidiagonal form BD by an orthogonal transformation:

Q’ * R * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

If m < n, an LQ factorization of the real m-by-n matrix MAT is first computed

MAT = L * O

where O is orthogonal and L is lower triangular. In a second step, the m-by-m lower triangular matrix L is reduced to upper bidiagonal form BD by an orthogonal transformation:

Q’ * L * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

BD_CMP computes O, BD, Q and P. The matrix O is stored in factored form if the optional argument TAUO is present or explicitly computed if this argument is absent.
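The benefit of the first step can be seen in a small Python sketch for the case m >= n. Since O is orthogonal, R’ * R = MAT’ * MAT, so the n-by-n factor R has the same singular values as MAT and the bidiagonal reduction can work on the smaller matrix. The sketch uses an unblocked Householder QR, not the blocked algorithm of BD_CMP; the names `qr_r_factor` and `gram` are hypothetical.

```python
import math

def qr_r_factor(a):
    """Return the n-by-n upper triangular R of A = O*R (m >= n)."""
    a = [row[:] for row in a]          # work on a copy
    m, n = len(a), len(a[0])
    for i in range(n):
        x = [a[r][i] for r in range(i, m)]
        norm = math.sqrt(sum(t * t for t in x))
        if norm == 0.0:
            continue                   # column already zero below row i
        x[0] += math.copysign(norm, x[0])
        tau = -2.0 / sum(t * t for t in x)
        # Apply W(i) = I + tau*x*x' to the trailing columns.
        for c in range(i, n):
            s = tau * sum(x[r - i] * a[r][c] for r in range(i, m))
            for r in range(i, m):
                a[r][c] += s * x[r - i]
    return [[a[r][c] if c >= r else 0.0 for c in range(n)] for r in range(n)]

def gram(b):
    """Return B' * B."""
    rows, cols = len(b), len(b[0])
    return [[sum(b[k][i] * b[k][j] for k in range(rows))
             for j in range(cols)] for i in range(cols)]
```

Comparing `gram(qr_r_factor(a))` with `gram(a)` for a tall matrix `a` confirms, up to rounding, that the triangular factor carries the same singular values as the original matrix.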

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the general m-by-n matrix to be reduced.

On exit, if:

m >= n, the elements on and below the diagonal, with the array TAUO, represent the orthogonal matrix O of the QR factorization of MAT, as a product of elementary reflectors, if the argument TAUO is present. Otherwise, the argument MAT contains the first n columns of O on output.

m < n, the elements on and above the diagonal, with the array TAUO, represent the orthogonal matrix O of the LQ factorization of MAT, as a product of elementary reflectors, if the argument TAUO is present. Otherwise, the argument MAT contains the first m rows of O on output.

See Further Details.

D (OUTPUT) real(stnd), dimension(:)

The diagonal elements of the bidiagonal matrix BD

The size of D must be min( size(MAT,1) , size(MAT,2) ).

E (OUTPUT) real(stnd), dimension(:)

The off-diagonal elements of the bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,min(m,n).

The size of E must be min( size(MAT,1) , size(MAT,2) ).

TAUQ (OUTPUT) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix Q. See Further Details.

The size of TAUQ must be min( size(MAT,1) , size(MAT,2) ).

TAUP (OUTPUT) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix P. See Further Details.

The size of TAUP must be min( size(MAT,1) , size(MAT,2) ).

RLMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors;

See Further Details.

The shape of RLMAT must verify:

size( RLMAT, 1 ) = size( RLMAT, 2 ) = min( size(MAT,1) , size(MAT,2) ).

TAUO (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix O of the QR or LQ decomposition of MAT.

If the optional argument TAUO is present, the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on exit.

If the optional argument TAUO is absent, the orthogonal matrix O is explicitly generated and stored in the argument MAT on exit.

See description of the argument MAT above and Further Details below.

The size of TAUO must be min( size(MAT,1) , size(MAT,2) ).

Further Details¶

If m >= n, the matrix O of the QR factorization of MAT is represented as a product of elementary reflectors

O = W(1) * W(2) * … * W(n)

Each W(i) has the form

W(i) = I + tauo * ( v * v’ ) ,

where tauo is a real scalar and v is a real m-element vector with v(1:i-1) = 0. v(i:m) is stored on exit in MAT(i:m,i) and tauo in TAUO(i). If the optional argument TAUO is absent, the first n columns of O are generated and stored in the argument MAT.

If m < n, the matrix O of the LQ factorization of MAT is represented as a product of elementary reflectors

O = W(m) * … * W(2) * W(1)

Each W(i) has the form

W(i) = I + tauo * ( v * v’ ) ,

where tauo is a real scalar and v is a real n-element vector with v(1:i-1) = 0. v(i:n) is stored on exit in MAT(i,i:n) and tauo in TAUO(i).

A blocked algorithm is used for computing the QR or LQ factorization of MAT. Furthermore, the computations are parallelized if OPENMP is used.

After the initial QR or LQ factorization of MAT, the (upper or lower) triangular matrix is reduced to upper bidiagonal form BD.

The matrices Q and P of the bidiagonal factorization of the triangular matrix R or L are represented as products of elementary reflectors:

Q = H(1) * H(2) * … * H(k) and P = G(1) * G(2) * … * G(k-1)

where k = min( size(MAT,1) , size(MAT,2) ). Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors; u(1:i-1) = 0 and u(i:min(m,n)) is stored on exit in RLMAT(i:min(m,n),i); v(1:i) = 0 and v(i+1:min(m,n)) is stored on exit in RLMAT(i,i+1:min(m,n)); tauq is stored in TAUQ(i) and taup in TAUP(i).

The contents of RLMAT on exit are illustrated by the following example:

m = 6 and n = 5 (m >= n):

( u1 v1 v1 v1 v1 )

( u1 u2 v2 v2 v2 )

( u1 u2 u3 v3 v3 )

( u1 u2 u3 u4 v4 )

( u1 u2 u3 u4 u5 )

where ui denotes an element of the vector defining H(i), and vi an element of the vector defining G(i).

An efficient variant of the classic Golub and Kahan Householder bidiagonalization algorithm is used. This variant reduces the traffic on the data bus from four reads and two writes per column-row elimination of the bidiagonalization process to one read and one write. Furthermore, the algorithm is parallelized if OPENMP is used.

For further details on the bidiagonal reduction algorithm and its use or the variant used here, see:

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Howell, G.W., Demmel, J., Fulton, C.T., Hammarling, S., and Marmol, K., 2008:

Cache efficient bidiagonalization using BLAS 2.5 operators. ACM Transactions on Mathematical Software (TOMS) Volume 34, Issue 3.

`subroutine bd_cmp ( mat, d, e )`¶

Purpose¶

BD_CMP reduces a general m-by-n matrix MAT to upper or lower bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal. If:

m >= n, BD is upper bidiagonal;

m < n, BD is lower bidiagonal.

BD_CMP computes only BD and the matrices Q and P are not saved.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the general m-by-n matrix to be reduced.

On exit, the general m-by-n matrix is destroyed.

D (OUTPUT) real(stnd), dimension(:)

The diagonal elements of the bidiagonal matrix BD

The size of D must be min( size(MAT,1) , size(MAT,2) ).

E (OUTPUT) real(stnd), dimension(:)

The off-diagonal elements of the bidiagonal matrix BD:

if m >= n, E(i) = BD(i-1,i) for i = 2,3,…,n;

if m < n, E(i) = BD(i,i-1) for i = 2,3,…,m.

The size of E must be min( size(MAT,1) , size(MAT,2) ).

Further Details¶

This subroutine is adapted from the routine DGEBD2 in LAPACK. An efficient variant of the classic Golub and Kahan Householder bidiagonalization algorithm is used. This variant reduces the traffic on the data bus from four reads and two writes per column-row elimination of the bidiagonalization process to one read and one write. Furthermore, the algorithm is parallelized if OPENMP is used.

For further details on the bidiagonal reduction algorithm and its use or the efficient variant used here, see:

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Howell, G.W., Demmel, J., Fulton, C.T., Hammarling, S., and Marmol, K., 2008:

Cache efficient bidiagonalization using BLAS 2.5 operators. ACM Transactions on Mathematical Software (TOMS) Volume 34, Issue 3.

`subroutine bd_cmp2 ( mat, d, e, p, failure, gen_p, reortho )`¶

Purpose¶

BD_CMP2 reduces an m-by-n matrix MAT with m >= n to upper bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal.

BD_CMP2 computes BD, Q and P using the one-sided Ralha-Barlow bidiagonal reduction algorithm.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the general m-by-n matrix to be reduced.

On exit, the first n columns of Q are stored in MAT(1:m,1:n).

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

D (OUTPUT) real(stnd), dimension(:)

The diagonal elements of the bidiagonal matrix BD

The size of D must be size( MAT, 2) = n .

E (OUTPUT) real(stnd), dimension(:)

The off-diagonal elements of the bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,n;

The size of E must be size( MAT, 2 ) = n .

P (OUTPUT) real(stnd), dimension(:,:)

On exit, the n-by-n matrix P.

The shape of P must verify: size( P, 1 ) = size( P, 2 ) = n .

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE = false : indicates successful exit ;

FAILURE = true : indicates that MAT is nearly singular and some loss of orthogonality of Q can be expected in the Ralha-Barlow one-sided algorithm.

See further details.

GEN_P (INPUT, OPTIONAL) logical(lgl)

If the optional argument GEN_P is used and is set to true, the orthogonal matrix P is generated on output of the subroutine. If this argument is set to false, the orthogonal matrix is stored in factored form as products of elementary reflectors in the lower triangle of the array P.

See further details.

The default is GEN_P = true.

REORTHO (INPUT, OPTIONAL) logical(lgl)

If the optional argument REORTHO is set to true, reorthogonalization of the matrix Q is performed (when needed). If this optional logical argument is set to false, reorthogonalization of Q is never performed and the columns of the matrix Q can be far from orthogonal to each other for input matrices MAT with a large condition number.

See further details.

The default is REORTHO = true.

Further Details¶

This subroutine is an implementation of the Ralha-Barlow one-sided method to reduce a rectangular matrix MAT to bidiagonal form BD. Q is computed by a recurrence relationship and P as a product of n-1 elementary reflectors (i.e., Householder transformations):

P = G(1) * G(2) * … * G(n-1)

Each G(i) has the form:

G(i) = I + taup * v * v’

where taup is a real scalar, and v is a real vector. If GEN_P is used and set to false, the n-1 G(i) elementary reflectors are stored in the lower triangle of the array P. For the G(i) reflector, taup is stored in P(i+1,1) and v is stored in P(i+1:n,i+1). If GEN_P is set to true, P is generated in P(:n,:n).

In addition, P(1,1) is set to -1 if GEN_P=false and is equal to 1 if GEN_P=true. In other words, the value of P(1,1) indicates if the orthogonal matrix P is stored in factored form or not. Note that if n is equal to 1, no elementary reflectors are needed and consequently P(1,1) is set to 1, independently of the value of GEN_P.
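The factored layout described above can be unpacked as in the following Python sketch, which rebuilds P = G(1) * G(2) * … * G(n-1) from an array holding the reflectors. It is a reading of the storage description, not statpack code: the function name `generate_p` is hypothetical, indices are 0-based, and the sketch assumes that the components of v not stored in the array (rows 1 to i for G(i)) are zero. Under that assumption the first row and column of P are those of the identity matrix, which is why the element P(1,1) is free to serve as a flag.

```python
def generate_p(fact):
    """Rebuild P from the factored layout (0-based indices).

    fact[0][0] == -1 flags factored storage; for reflector i = 0..n-2,
    tau is read from fact[i+1][0] and the nonzero part of v from
    fact[i+1:n][i+1]. If the flag is 1, fact already holds P.
    """
    n = len(fact)
    if fact[0][0] == 1.0:
        return [row[:] for row in fact]
    p = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i in range(n - 1):
        tau = fact[i + 1][0]
        v = [0.0] * (i + 1) + [fact[r][i + 1] for r in range(i + 1, n)]
        # Right-multiply: P <- P * (I + tau*v*v'), one row at a time.
        for row in p:
            s = tau * sum(a * b for a, b in zip(row, v))
            for c in range(i + 1, n):
                row[c] += s * v[c]
    return p
```

With n = 3, a first reflector defined by v = (0, 1, 1) and taup = -1, and a second one defined by v = (0, 0, 1) and taup = -2, the factored array is `[[-1, 0, 0], [-1, 1, 0], [-2, 1, 1]]` and the sketch returns an orthogonal matrix whose (1,1) element is 1.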

This is the blocked version of the algorithm. See the references (1), (2) and (3) for further details. Note also that the blocked algorithm implemented here is more efficient than the version described in the reference (3). Furthermore the algorithm is parallelized if OPENMP is used.

Since Q is computed by a recurrence relationship, a loss of orthogonality of Q can be observed when the rectangular matrix MAT is singular or nearly singular or has a large condition number; see reference (2) for details.

To partly correct this deficiency, partial reorthogonalization can be performed to ensure orthogonality at the expense of speed of computation. The reorthogonalization uses the Gram-Schmidt method described in the reference (4) and is activated if the optional logical argument REORTHO is set to true.
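The idea of reorthogonalization can be illustrated with a column-wise Gram-Schmidt step that is simply repeated a second time. This Python sketch is not the block Gram-Schmidt method of reference (4) and is not statpack code; the function name `orthogonalize` and the fixed number of passes are assumptions chosen for illustration.

```python
import math

def orthogonalize(q_cols, w, passes=2):
    """Orthogonalize w against the orthonormal vectors in q_cols.

    The projection step is repeated `passes` times; the second pass
    removes the components reintroduced by rounding in the first one.
    Returns the normalized result.
    """
    w = list(w)
    for _ in range(passes):
        for q in q_cols:
            h = sum(a * b for a, b in zip(q, w))
            w = [wi - h * qi for wi, qi in zip(w, q)]
    norm = math.sqrt(sum(t * t for t in w))
    if norm == 0.0:
        raise ValueError("w lies in the span of q_cols")
    return [t / norm for t in w]
```

The extra pass costs a second sweep over the existing columns, which is the speed penalty mentioned above for enabling reorthogonalization.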

The reference (2) also explains how to handle the case of an exactly singular matrix MAT (a very rare event). However, in this subroutine, the partial reorthogonalization described in the reference (4), which is used when REORTHO is set to true, automatically corrects this problem.

For further details, see:

(1) Ralha, R.M.S., 2003:

One-sided reduction to bidiagonal form. Linear Algebra Appl., No 358, pp. 219-238.

(2) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(3) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

(4) Stewart, G.W., 2007:

Block Gram-Schmidt Orthogonalization. Report TR-4823, Department of Computer Science, College Park, University of Maryland.

`subroutine bd_cmp2 ( mat, d, e, failure, reortho )`¶

Purpose¶

BD_CMP2 reduces an m-by-n matrix MAT with m >= n to upper bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal.

BD_CMP2 computes BD and Q using the one-sided Ralha-Barlow bidiagonal reduction algorithm.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the general m-by-n matrix to be reduced.

On exit, the first n columns of Q are stored in MAT(1:m,1:n).

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

D (OUTPUT) real(stnd), dimension(:)

The diagonal elements of the bidiagonal matrix BD

The size of D must be size( MAT, 2 ) = n .

E (OUTPUT) real(stnd), dimension(:)

The off-diagonal elements of the bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,n;

The size of E must be size( MAT, 2 ) = n .

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE = false : indicates successful exit ;

FAILURE = true : indicates that MAT is nearly singular and some loss of orthogonality of Q can be expected in the Ralha-Barlow one-sided algorithm.

See further details.

REORTHO (INPUT, OPTIONAL) logical(lgl)

If the optional argument REORTHO is set to true, reorthogonalization of the matrix Q is performed (when needed). If this optional logical argument is set to false, reorthogonalization of Q is never performed and the columns of the matrix Q can be far from orthogonal to each other for input matrices MAT with a large condition number.

See further details.

The default is REORTHO = true.

Further Details¶

This subroutine is an implementation of the Ralha-Barlow one-sided method to reduce a rectangular matrix MAT to bidiagonal form BD. Q is computed by a recurrence relationship.

This is the blocked version of the algorithm. See the references (1), (2) and (3) for further details. Note also that the blocked algorithm implemented here is more efficient than the version described in the reference (3). Furthermore the algorithm is parallelized if OPENMP is used.

Since Q is computed by a recurrence relationship, a loss of orthogonality of Q can be observed when the rectangular matrix MAT is singular or nearly singular or has a large condition number; see reference (2) for details.

To partly correct this deficiency, partial reorthogonalization can be performed to ensure orthogonality at the expense of speed of computation. The reorthogonalization uses the Gram-Schmidt method described in the reference (4) and is activated if the optional logical argument REORTHO is set to true.

The reference (2) also explains how to handle the case of an exactly singular matrix MAT (a very rare event). However, in this subroutine, the partial reorthogonalization described in the reference (4), which is used when REORTHO is set to true, automatically corrects this problem.

For further details, see:

(1) Ralha, R.M.S., 2003:

One-sided reduction to bidiagonal form. Linear Algebra Appl., No 358, pp. 219-238.

(2) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(3) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

(4) Stewart, G.W., 2007:

Block Gram-Schmidt Orthogonalization. Report TR-4823, Department of Computer Science, College Park, University of Maryland.

`subroutine bd_cmp3 ( mat, d, e, gen_p, failure )`¶

Purpose¶

BD_CMP3 reduces an m-by-n matrix MAT with m >= n to upper bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal.

BD_CMP3 computes BD and P using the one-sided Ralha-Barlow bidiagonal reduction algorithm.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the general m-by-n matrix to be reduced.

On exit, P is stored in MAT(1:n,1:n). See Further Details.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

D (OUTPUT) real(stnd), dimension(:)

The diagonal elements of the bidiagonal matrix BD

The size of D must be size( MAT, 2) = n .

E (OUTPUT) real(stnd), dimension(:)

The off-diagonal elements of the bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,n;

The size of E must be size( MAT, 2 ) = n .

GEN_P (INPUT) logical(lgl)

If:

GEN_P = true : the orthogonal matrix P is generated in MAT(1:n,1:n) on output of the subroutine.

GEN_P = false : the orthogonal matrix is stored in factored form as products of elementary reflectors in the lower triangle of the array MAT(1:n,1:n).

See further details.

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE = false : indicates successful exit ;

FAILURE = true : indicates that MAT is nearly singular.

Further Details¶

This subroutine is an implementation of the Ralha-Barlow one-sided method to reduce a rectangular matrix MAT to bidiagonal form BD. Q is computed by a recurrence relationship (but is not stored) and P as a product of n-1 elementary reflectors (i.e., Householder transformations):

P = G(1) * G(2) * … * G(n-1)

Each G(i) has the form:

G(i) = I + taup * v * v’

where taup is a real scalar, and v is a real vector. If GEN_P is set to false, the n-1 G(i) elementary reflectors are stored in the lower triangle of the array MAT. For the G(i) reflector, taup is stored in MAT(i+1,1) and v is stored in MAT(i+1:n,i+1). If GEN_P is set to true, P is generated in MAT(:n,:n).

In addition, MAT(1,1) is set to -1 if GEN_P=false and is equal to 1 if GEN_P=true. In other words, the value of MAT(1,1) indicates whether or not the orthogonal matrix P is stored in MAT in factored form. Note that if n is equal to 1, no elementary reflectors are needed and consequently MAT(1,1) (i.e., P(1,1)) is set to 1, independently of the value of GEN_P.

This is the blocked version of the algorithm. See the references (1), (2) and (3) for further details. Note also that the blocked algorithm implemented here is more efficient than the version described in the reference (3). Furthermore the algorithm is parallelized if OPENMP is used.

The reference (2) also explains how to handle the case of an exactly singular matrix MAT (a very rare event). However, this case is not implemented here as this subroutine outputs only BD and P.

For further details, see:

(1) Ralha, R.M.S., 2003:

One-sided reduction to bidiagonal form. Linear Algebra Appl., No 358, pp. 219-238.

(2) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(3) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

`subroutine ortho_gen_bd ( mat, tauq, taup, p )`¶

Purpose¶

ORTHO_GEN_BD generates the real orthogonal matrices Q and P determined by BD_CMP when reducing an m-by-n real matrix MAT to bidiagonal form:

MAT = Q * BD * P’.

Q and P are defined as products of elementary reflectors H(i) and G(i), respectively, determined by BD_CMP and stored in its array arguments MAT, TAUQ and TAUP.

If m >= n:

Q = H(1) * H(2) * … * H(n) and ORTHO_GEN_BD returns the first n columns of Q in MAT;

P = G(1) * G(2) * … * G(n-1) and ORTHO_GEN_BD returns P as an n-by-n matrix in P.

If m < n:

Q = H(1) * H(2) * … * H(m-1) and ORTHO_GEN_BD returns Q as an m-by-m matrix in MAT(1:m,1:m);

P = G(1) * G(2) * … * G(m) and ORTHO_GEN_BD returns the first m columns of P, in P.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the vectors which define the elementary reflectors H(i) and G(i), as returned by BD_CMP in its array argument MAT.

On exit, the first min(m,n) columns of Q are stored in MAT(1:m,1:min(m,n)).

TAUQ (INPUT) real(stnd), dimension(:)

TAUQ(i) must contain the scalar factor of the elementary reflector H(i), which determines Q, as returned by BD_CMP in its array argument TAUQ.

The size of TAUQ must verify: size( TAUQ ) = min(m,n) .

TAUP (INPUT) real(stnd), dimension(:)

TAUP(i) must contain the scalar factor of the elementary reflector G(i), which determines P, as returned by BD_CMP in its array argument TAUP.

The size of TAUP must verify: size( TAUP ) = min(m,n) .

P (OUTPUT) real(stnd), dimension(:,:)

On exit, the first min(m,n) columns of the n-by-n matrix P.

The shape of P must verify:

size( P, 1 ) = n ,

size( P, 2 ) = min(m,n).

Further Details¶

This subroutine uses a blocked algorithm for aggregating the Householder transformations (i.e., the elementary reflectors) stored in MAT and generating the orthogonal matrices Q and P of the bidiagonal decomposition of MAT.

Furthermore, the computations are parallelized if OPENMP is used.

For further details on the bidiagonal reduction algorithm and its use or the blocked algorithm used here, see:

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Dongarra, J.J., Sorensen, D.C., and Hammarling, S.J., 1989:

Block reduction of matrices to condensed form for eigenvalue computations. J. of Computational and Applied Mathematics, Vol. 27, pp. 215-227.

Walker, H.F., 1988:

Implementation of the GMRES method using Householder transformations. SIAM J. Sci. Stat. Comput., Vol. 9, No 1, pp. 152-163.

`subroutine ortho_gen_bd2 ( mat, tauq, taup, q_pt )`¶

Purpose¶

ORTHO_GEN_BD2 generates the real orthogonal matrices Q and P’ determined by BD_CMP when reducing an m-by-n real matrix MAT to bidiagonal form:

MAT = Q * BD * P’

Q and P’ are defined as products of elementary reflectors H(i) and G(i), respectively, determined by BD_CMP and stored in its array arguments MAT, TAUQ and TAUP.

If m >= n:

Q = H(1) * H(2) * … * H(n) and ORTHO_GEN_BD2 returns the first n columns of Q in MAT;

P’ = G(n-1) * … * G(2) * G(1) and ORTHO_GEN_BD2 returns P’ as an n-by-n matrix in Q_PT.

If m < n:

Q = H(1) * H(2) * … * H(m-1) and ORTHO_GEN_BD2 returns Q as an m-by-m matrix in Q_PT;

P’ = G(m) * … * G(2) * G(1) and ORTHO_GEN_BD2 returns the first m rows of P’, in MAT.
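The reversed order of the factors in P’ follows from the symmetry of each reflector: since G(i)’ = G(i), transposing a product of reflectors reverses the order of its factors. The Python sketch below checks this on two small reflectors. The helper names and the sample vectors are hypothetical and serve only to illustrate the identity; no statpack routine is called.

```python
def reflector_matrix(v, tau):
    """Form the dense matrix I + tau * v * v'."""
    n = len(v)
    return [[(1.0 if r == c else 0.0) + tau * v[r] * v[c]
             for c in range(n)] for r in range(n)]

def matmul(a, b):
    """Plain dense matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(a):
    """Return a'."""
    return [list(col) for col in zip(*a)]

# Two reflectors with tau = -2 / (v' * v), so that each one is orthogonal.
g1 = reflector_matrix([0.0, 1.0, 1.0], -1.0)
g2 = reflector_matrix([0.0, 0.0, 1.0], -2.0)

p = matmul(g1, g2)     # P  = G(1) * G(2)
p_t = matmul(g2, g1)   # P' = G(2) * G(1)
```

Here `transpose(p)` and `p_t` hold the same matrix, which is the two-reflector instance of P’ = G(n-1) * … * G(2) * G(1).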

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the vectors which define the elementary reflectors H(i) and G(i), as returned by BD_CMP in its array argument MAT.

On exit:

the first n columns of Q if m >= n ;

the first m rows of P’ if m < n .

TAUQ (INPUT) real(stnd), dimension(:)

TAUQ(i) must contain the scalar factor of the elementary reflector H(i), which determines Q, as returned by BD_CMP in its array argument TAUQ.

The size of TAUQ must verify: size( TAUQ ) = min(m,n) .

TAUP (INPUT) real(stnd), dimension(:)

TAUP(i) must contain the scalar factor of the elementary reflector G(i), which determines P’, as returned by BD_CMP in its array argument TAUP.

The size of TAUP must verify: size( TAUP ) = min(m,n) .

Q_PT (OUTPUT) real(stnd), dimension(:,:)

On exit:

the n-by-n matrix P’ if m >= n ;

the m-by-m matrix Q if m < n .

The shape of Q_PT must verify: size( Q_PT, 1 ) = size( Q_PT, 2 ) = min(m,n).

Further Details¶

This subroutine uses a blocked algorithm for aggregating the Householder transformations (i.e., the elementary reflectors) stored in MAT and generating the orthogonal matrices Q and P of the bidiagonal decomposition of MAT.

Furthermore, the computations are parallelized if OPENMP is used.

For further details on the bidiagonal reduction algorithm and its use or the blocked algorithm, see:

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Dongarra, J.J., Sorensen, D.C., and Hammarling, S.J., 1989:

Block reduction of matrices to condensed form for eigenvalue computations. J. of Computational and Applied Mathematics, Vol. 27, pp. 215-227.

Walker, H.F., 1988:

Implementation of the GMRES method using Householder transformations. SIAM J. Sci. Stat. Comput., Vol. 9, No. 1, pp. 152-163.

`subroutine ortho_gen_q_bd ( mat, tauq )`¶

Purpose¶

ORTHO_GEN_Q_BD generates the real orthogonal matrix Q determined by BD_CMP when reducing an m-by-n real matrix MAT to bidiagonal form:

MAT = Q * BD * P’

Q is defined as a product of elementary reflectors H(i) determined by BD_CMP and stored in its array arguments MAT and TAUQ.

If m >= n:

Q = H(1) * H(2) * … * H(n) and ORTHO_GEN_Q_BD returns the first n columns of Q in MAT.

If m < n:

Q = H(1) * H(2) * … * H(m-1) and ORTHO_GEN_Q_BD returns Q as an m-by-m matrix in MAT(:m,:m).

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the vectors which define the elementary reflectors H(i), as returned by BD_CMP.

On exit, the first min(m,n) columns of Q.

TAUQ (INPUT) real(stnd), dimension(:)

TAUQ(i) must contain the scalar factor of the elementary reflector H(i), which determines Q, as returned by BD_CMP in its array argument TAUQ.

The size of TAUQ must verify: size( TAUQ ) = min(m,n) .

Further Details¶

This subroutine uses a blocked algorithm for aggregating the Householder transformations (i.e., the elementary reflectors) stored in MAT and generating the orthogonal matrix Q of the bidiagonal decomposition of MAT.

Furthermore, the computations are parallelized if OPENMP is used.

For further details on the bidiagonal reduction algorithm and its use or the blocked algorithm, see:

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Dongarra, J.J., Sorensen, D.C., and Hammarling, S.J., 1989:

Block reduction of matrices to condensed form for eigenvalue computations. J. of Computational and Applied Mathematics, Vol. 27, pp. 215-227.

Walker, H.F., 1988:

Implementation of the GMRES method using Householder transformations. SIAM J. Sci. Stat. Comput., Vol. 9, No. 1, pp. 152-163.

`subroutine ortho_gen_p_bd ( mat, taup, p )`¶

Purpose¶

ORTHO_GEN_P_BD generates the real orthogonal matrix P determined by BD_CMP when reducing an m-by-n real matrix MAT to bidiagonal form:

MAT = Q * BD * P’

P is defined as a product of elementary reflectors G(i) determined by BD_CMP and stored in its array arguments MAT and TAUP.

If m >= n:

P = G(1) * G(2) * … * G(n-1) and ORTHO_GEN_P_BD returns P as an n-by-n matrix in P.

If m < n:

P = G(1) * G(2) * … * G(m) and ORTHO_GEN_P_BD returns the first m columns of P, in P.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the vectors which define the elementary reflectors G(i), as returned by BD_CMP in its array argument MAT.

TAUP (INPUT) real(stnd), dimension(:)

TAUP(i) must contain the scalar factor of the elementary reflector G(i), which determines P, as returned by BD_CMP in its array argument TAUP.

The size of TAUP must verify: size( TAUP ) = min(m,n) .

P (OUTPUT) real(stnd), dimension(:,:)

On exit, the first min(m,n) columns of the n-by-n matrix P.

The shape of P must verify:

size( P, 1 ) = n ,

size( P, 2 ) = min(m,n).

Further Details¶

This subroutine uses a blocked algorithm for aggregating the Householder transformations (i.e., the elementary reflectors) stored in MAT and generating the orthogonal matrix P of the bidiagonal decomposition of MAT.

Furthermore, the computations are parallelized if OPENMP is used.

For further details on the bidiagonal reduction algorithm and its use or the blocked algorithm, see:

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Dongarra, J.J., Sorensen, D.C., and Hammarling, S.J., 1989:

Block reduction of matrices to condensed form for eigenvalue computations. J. of Computational and Applied Mathematics, Vol. 27, pp. 215-227.

Walker, H.F., 1988:

Implementation of the GMRES method using Householder transformations. SIAM J. Sci. Stat. Comput., Vol. 9, No. 1, pp. 152-163.

`subroutine apply_q_bd ( mat, tauq, c, left, trans )`¶

Purpose¶

APPLY_Q_BD overwrites the general real m-by-n matrix C with:

Q * C if LEFT = true and TRANS = false ;

Q’ * C if LEFT = true and TRANS = true ;

C * Q if LEFT = false and TRANS = false ;

C * Q’ if LEFT = false and TRANS = true .

Here Q is the orthogonal matrix determined by BD_CMP when reducing a real matrix MAT to bidiagonal form:

MAT = Q * BD * P’

and Q is defined as a product of elementary reflectors H(i).

Let nq = m if LEFT = true and nq = n if LEFT = false. Thus nq is the order of the orthogonal matrix Q that is applied. MAT is assumed to have been an nq-by-k matrix and

Q = H(1) * H(2) * … * H(k) , if nq >= k ;

or

Q = H(1) * H(2) * … * H(nq-1) , if nq < k .
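The four LEFT/TRANS combinations can be summarised with a dense reference computation. The Python sketch below is only an illustration of the result that is expected in C. It forms Q explicitly, which APPLY_Q_BD does not need to do since it works from the reflectors stored in MAT. The function name `apply_orthogonal` and the test matrices are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def apply_orthogonal(q, c, left, trans):
    """Dense reference for the four cases: Q*C, Q'*C, C*Q and C*Q'."""
    op = transpose(q) if trans else q
    return matmul(op, c) if left else matmul(c, op)


# A 2-by-2 rotation stands in for Q; C is square so every case is conformable.
q = [[0.0, -1.0], [1.0, 0.0]]
c = [[1.0, 2.0], [3.0, 4.0]]

for left in (True, False):
    for trans in (False, True):
        print(left, trans, apply_orthogonal(q, c, left, trans))
```

In the sketch, the order of Q must match the number of rows of C when LEFT is true and the number of columns of C when LEFT is false, which corresponds to the quantity nq defined above.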

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

The vectors which define the elementary reflectors H(i), whose products determine the matrix Q, as returned by BD_CMP. MAT must be specified as in BD_CMP and is not modified by the routine.

The shape of MAT must verify:

if LEFT = true : size( C, 1 ) = size( MAT, 1 ) = nq ;

if LEFT = false : size( C, 2 ) = size( MAT, 1 ) = nq .

TAUQ (INPUT) real(stnd), dimension(:)

TAUQ(i) must contain the scalar factor of the elementary reflector H(i) which determines Q, as returned by BD_CMP in the array argument TAUQ.

The size of TAUQ must verify: size( TAUQ ) = min( size(MAT,1) , size(MAT,2) ) .

C (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m by n matrix C.

On exit, C is overwritten by Q * C or Q’ * C or C * Q or C * Q’.

The shape of C must verify:

if LEFT = true : size( C, 1 ) = size( MAT, 1 ) = nq ;

if LEFT = false : size( C, 2 ) = size( MAT, 1 ) = nq .

LEFT (INPUT) logical(lgl)

On entry, if:

LEFT= true : apply Q or Q’ from the left

LEFT= false : apply Q or Q’ from the right

TRANS (INPUT) logical(lgl)

On entry, if:

TRANS = false : apply Q (no transpose)

TRANS = true : apply Q’ (transpose)

Further Details¶

This subroutine is adapted from the routine DORMBR in LAPACK.

This subroutine uses a blocked algorithm for aggregating the Householder transformations (i.e., the elementary reflectors) stored in the lower triangle of MAT and applying the orthogonal matrix Q of the bidiagonal factorization to the real m-by-n matrix C.

Furthermore, the computations are parallelized if OPENMP is used.

For further details on the bidiagonal reduction algorithm and the blocked version of the algorithm used here, see:

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Dongarra, J.J., Sorensen, D.C., and Hammarling, S.J., 1989:

Block reduction of matrices to condensed form for eigenvalue computations. J. of Computational and Applied Mathematics, Vol. 27, pp. 215-227.

Walker, H.F., 1988:

Implementation of the GMRES method using Householder transformations. SIAM J. Sci. Stat. Comput., Vol. 9, No. 1, pp. 152-163.

`subroutine apply_p_bd ( mat, taup, c, left, trans )`¶

Purpose¶

APPLY_P_BD overwrites the general real m-by-n matrix C with:

P * C if LEFT = true and TRANS = false ;

P’ * C if LEFT = true and TRANS = true ;

C * P if LEFT = false and TRANS = false ;

C * P’ if LEFT = false and TRANS = true .

Here P is the orthogonal matrix determined by BD_CMP when reducing a real matrix MAT to bidiagonal form:

MAT = Q * BD * P’

and P is defined as a product of elementary reflectors G(i).

Let np = m if LEFT = true and np = n if LEFT = false. Thus np is the order of the orthogonal matrix P that is applied. MAT is assumed to have been a k-by-np matrix and

P = G(1) * G(2) * … * G(k) , if k < np ;

or

P = G(1) * G(2) * … * G(np-1) , if k >= np .
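The number of reflectors entering Q and P depends only on the shape of the matrix MAT that was reduced by BD_CMP. The small Python helper below (a hypothetical name, not a statpack routine) restates the rules given in this documentation for an m-by-n matrix MAT, with m = k and n = np in the notation used above.

```python
def reflector_counts(m, n):
    """Return (number of H(i) in Q, number of G(i) in P) for an m-by-n MAT.

    m >= n (upper bidiagonal): Q = H(1)*...*H(n),   P = G(1)*...*G(n-1)
    m <  n (lower bidiagonal): Q = H(1)*...*H(m-1), P = G(1)*...*G(m)
    """
    if m >= n:
        return n, n - 1
    return m - 1, m


for shape in ((5, 3), (3, 5), (4, 4)):
    print(shape, reflector_counts(*shape))
```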

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

The vectors which define the elementary reflectors G(i), whose products determine the matrix P, as returned by BD_CMP. MAT must be specified as in BD_CMP and is not modified by the routine.

The shape of MAT must verify:

if LEFT = true : size( C, 1 ) = size( MAT, 2 ) = np ;

if LEFT = false : size( C, 2 ) = size( MAT, 2 ) = np .

TAUP (INPUT) real(stnd), dimension(:)

TAUP(i) must contain the scalar factor of the elementary reflector G(i) which determines P, as returned by BD_CMP in the array argument TAUP.

The size of TAUP must verify: size( TAUP ) = min( size(MAT,1) , size(MAT,2) ) .

C (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m by n matrix C.

On exit, C is overwritten by P * C or P’ * C or C * P or C * P’.

The shape of C must verify:

if LEFT = true : size( C, 1 ) = size( MAT, 2 ) = np ;

if LEFT = false : size( C, 2 ) = size( MAT, 2 ) = np .

LEFT (INPUT) logical(lgl)

On entry, if:

LEFT= true : apply P or P’ from the left

LEFT= false : apply P or P’ from the right

TRANS (INPUT) logical(lgl)

On entry, if:

TRANS = false : apply P (no transpose)

TRANS = true : apply P’ (transpose)

Further Details¶

This subroutine is adapted from the routine DORMBR in LAPACK.

This subroutine uses a blocked algorithm for aggregating the Householder transformations (i.e., the elementary reflectors) stored in the upper triangle of MAT and applying the orthogonal matrix P of the bidiagonal factorization to the real m-by-n matrix C.

Furthermore, the computations are parallelized if OPENMP is used.

For further details on the bidiagonal reduction algorithm and the blocked version of the algorithm used here, see:

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Dongarra, J.J., Sorensen, D.C., and Hammarling, S.J., 1989:

Block reduction of matrices to condensed form for eigenvalue computations. J. of Computational and Applied Mathematics, Vol. 27, pp. 215-227.

Walker, H.F., 1988:

Implementation of the GMRES method using Householder transformations. SIAM J. Sci. Stat. Comput., Vol. 9, No. 1, pp. 152-163.

`subroutine bd_svd ( upper, d, e, failure, u, v, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

BD_SVD computes the singular value decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal matrix B:

B = Q * S * P’

where S is a diagonal matrix with non-negative diagonal elements (the singular values of B), and Q and P are orthogonal matrices (P’ denotes the transpose of P).

The routine computes S, U * Q, and V * P, for given real input matrices U, V.

Arguments¶

UPPER (INPUT) logical(lgl)

On entry, if:

UPPER = true : B is upper bidiagonal ;

UPPER = false : B is lower bidiagonal.

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix B.

On exit, D contains the singular values of B.

E (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix whose SVD is desired. E(1) is arbitrary.

On exit, E is destroyed.

The size of E must verify: size( E ) = size( D ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit ;

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of B.

U (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the matrix U.

On exit, U is overwritten by U * Q.

The shape of U must verify:

size( U, 1 ) > 0 ;

size( U, 2 ) = size( D ) = n .

V (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the matrix V.

On exit, V is overwritten by V * P.

The shape of V must verify:

size( V, 1 ) > 0 ;

size( V, 2 ) = size( D ) = n .

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.
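The phrase "rearranged accordingly" means that the columns of the singular vector matrices follow the permutation applied to the singular values. The Python sketch below illustrates this with a hypothetical helper `sort_svd`. Leaving the data unchanged for any other value of the flag is an assumption of the sketch, not a documented behaviour of BD_SVD.

```python
def sort_svd(s, u, v, sort):
    """Return sorted copies of s, u and v; columns of u and v follow s."""
    key = sort.lower()
    if key not in ("a", "d"):
        # Assumption of this sketch: any other flag leaves the data as is.
        return list(s), [row[:] for row in u], [row[:] for row in v]
    order = sorted(range(len(s)), key=lambda i: s[i], reverse=(key == "d"))
    s_sorted = [s[i] for i in order]
    u_sorted = [[row[i] for i in order] for row in u]
    v_sorted = [[row[i] for i in order] for row in v]
    return s_sorted, u_sorted, v_sorted


# Column j of u and v belongs to s[j]; the entries are labels, not real vectors.
s = [2.0, 5.0, 1.0]
u = [[10, 20, 30], [11, 21, 31]]
v = [[1, 2, 3]]
print(sort_svd(s, u, v, "D"))
```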

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the algorithm. The algorithm fails to converge if the number of QR sweeps exceeds MAXITER * n. Convergence usually occurs in about 2 * n QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the implicit QR algorithm. MAX_FRANCIS_STEPS must be a strictly positive integer; otherwise the default value is used.

The default is the minimum of n and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated n-by-n symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated n-by-n symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.
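The relation between B and the tridiagonal matrix B * B’ mentioned for BISECT and DQDS can be made concrete. The Python sketch below builds the diagonal and off-diagonal of B * B’ for an upper bidiagonal B. It assumes that E(i) holds the element B(i-1,i) for i = 2 to n, which is an interpretation of "E(1) is arbitrary" and not something stated explicitly here. The helper names are hypothetical, and the eigenvalues are obtained with a closed formula for a 2-by-2 case, not with the Pal-Walker-Kahan algorithm.

```python
import math


def bbt_tridiagonal(d, e):
    """Diagonal and off-diagonal of B * B' for an upper bidiagonal B.

    d[i] is B(i,i) and e[i+1] is B(i,i+1) (0-based); e[0] is never read.
    """
    n = len(d)
    diag = [d[i] ** 2 + (e[i + 1] ** 2 if i + 1 < n else 0.0) for i in range(n)]
    off = [d[i + 1] * e[i + 1] for i in range(n - 1)]
    return diag, off


def eig_sym_2x2(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean + radius, mean - radius


# B = [[1, 1], [0, 1]]: its singular values are the golden ratio and its inverse.
d = [1.0, 1.0]
e = [0.0, 1.0]
diag, off = bbt_tridiagonal(d, e)
lam1, lam2 = eig_sym_2x2(diag[0], off[0], diag[1])
sigma = (math.sqrt(lam1), math.sqrt(lam2))
print(sigma)
```

Working with squared quantities as in this sketch loses accuracy for small singular values, which is consistent with the documentation presenting bisection and dqds as the more accurate alternatives.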

Further Details¶

If, on entry, arguments U and V are n-by-n identity matrices, on exit they are replaced by Q and P, respectively.

This subroutine is adapted from subroutine QRBD given in the reference (1), with modifications suggested in the references (2) and (3) for the application of a set of Givens rotations to the singular vectors, and extensions to the bidiagonal case of the perfect shift strategy presented in the references (4) and (5) for the tridiagonal case.

Furthermore, the computation of the singular vectors is parallelized if OPENMP is used.

Note, finally, that the bidiagonal matrix is not scaled before computing the singular values and vectors. If some of the elements of the bidiagonal matrix are very small or large, it may be appropriate to scale the bidiagonal matrix before calling BD_SVD.
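One possible way of scaling is sketched below in Python. It relies on the identity that the singular values of c * B are |c| times those of B and that the singular vectors are unchanged for c > 0. The choice of the largest magnitude as scale factor, the helper names, and the 2-by-2 closed formula used as a stand-in for the SVD routine are all assumptions of this illustration. The indexing of E follows the convention that E(1) is not used.

```python
import math


def singular_values_2x2_upper(d1, d2, e2):
    """Singular values of [[d1, e2], [0, d2]] from the eigenvalues of B * B'."""
    a = d1 * d1 + e2 * e2
    c = d2 * d2
    b = e2 * d2
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))


def scale_bidiagonal(d, e):
    """Divide d and e[1:] by the largest magnitude; return copies and the factor."""
    scale = max(abs(x) for x in d + e[1:])
    if scale == 0.0:
        return list(d), list(e), 1.0
    return [x / scale for x in d], [e[0]] + [x / scale for x in e[1:]], scale


# Entries this large would overflow when squared; the scaled problem does not.
d = [3.0e200, 4.0e200]
e = [0.0, 1.0e200]
d_s, e_s, scale = scale_bidiagonal(d, e)
s1, s2 = singular_values_2x2_upper(d_s[0], d_s[1], e_s[1])
print(s1 * scale, s2 * scale)   # singular values of the original matrix
```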

For further details, see:

(1) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

(2) Lang, B., 1998:

Using level 3 BLAS in rotation-based algorithms. SIAM J. Sci. Comput., Vol. 19, pp. 626-634.

(3) Van Zee, F.G., Van de Geijn, R., and Quintana-Orti, G., 2011:

Restructuring the QR Algorithm for High-Performance Application of Givens Rotations. FLAME Working Note 60. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-11-36.

(4) Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

(5) Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

`subroutine bd_svd2 ( upper, d, e, failure, u, vt, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

BD_SVD2 computes the singular value decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal matrix B:

B = Q * S * P’

where S is a diagonal matrix with non-negative diagonal elements (the singular values of B), and Q and P are orthogonal matrices (P’ denotes the transpose of P).

The routine computes S, U * Q, and P’ * VT, for given real input matrices U, VT.
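The only difference with the variant of BD_SVD that takes V is the orientation of the right factor: P’ * VT is the transpose of V * P when VT = V’. The Python sketch below checks this identity on a small example. The matrices are arbitrary illustrative values and the helper names are hypothetical. It assumes that both routines would compute the same orthogonal matrix P.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


p = [[0.6, -0.8], [0.8, 0.6]]                 # an orthogonal matrix P (n = 2)
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]      # any matrix with n columns
vt = transpose(v)                             # the same data stored as VT

v_out = matmul(v, p)                          # V  overwritten by V * P
vt_out = matmul(transpose(p), vt)             # VT overwritten by P' * VT

diff = max(abs(x - y)
           for ra, rb in zip(transpose(v_out), vt_out)
           for x, y in zip(ra, rb))
print(diff)
```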

Arguments¶

UPPER (INPUT) logical(lgl)

On entry, if:

UPPER = true : B is upper bidiagonal ;

UPPER = false : B is lower bidiagonal.

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix B.

On exit, D contains the singular values of B.

E (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix whose SVD is desired. E(1) is arbitrary.

On exit, E is destroyed.

The size of E must verify: size( E ) = size( D ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit ;

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of B.

U (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the matrix U.

On exit, U is overwritten by U * Q.

The shape of U must verify:

size( U, 1 ) > 0 ;

size( U, 2 ) = size( D ) = n .

VT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the matrix VT.

On exit, VT is overwritten by P’ * VT.

The shape of VT must verify:

size( VT, 1 ) = size( D ) = n ;

size( VT, 2 ) > 0 .

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the algorithm. The algorithm fails to converge if the number of QR sweeps exceeds MAXITER * n. Convergence usually occurs in about 2 * n QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the implicit QR algorithm. MAX_FRANCIS_STEPS must be a strictly positive integer; otherwise the default value is used.

The default is the minimum of n and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated n-by-n symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated n-by-n symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

If arguments U and VT are n-by-n identity matrices, on exit they are replaced by Q and P’, respectively.

This subroutine is adapted from subroutine QRBD given in the reference (1), with modifications suggested in the references (2) and (3) for the application of a set of Givens rotations to the singular vectors, and extensions to the bidiagonal case of the perfect shift strategy presented in the references (4) and (5) for the tridiagonal case.

Furthermore, the computation of the singular vectors is parallelized if OPENMP is used.

Note, finally, that the bidiagonal matrix is not scaled before computing the singular values and vectors. If some of the elements of the bidiagonal matrix are very small or large, it may be appropriate to scale the bidiagonal matrix before calling BD_SVD2.

For further details, see:

(1) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

(2) Lang, B., 1998:

Using level 3 BLAS in rotation-based algorithms. SIAM J. Sci. Comput., Vol. 19, pp. 626-634.

(3) Van Zee, F.G., Van de Geijn, R., and Quintana-Orti, G., 2011:

Restructuring the QR Algorithm for High-Performance Application of Givens Rotations. FLAME Working Note 60. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-11-36.

(4) Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

(5) Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

`subroutine bd_svd ( upper, d, e, failure, u, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

BD_SVD computes the singular value decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal matrix B:

B = Q * S * P’

where S is a diagonal matrix with non-negative diagonal elements (the singular values of B), and Q and P are orthogonal matrices (P’ denotes the transpose of P).

The routine computes S and U * Q for a given real input matrix U.

Arguments¶

UPPER (INPUT) logical(lgl)

On entry, if:

UPPER = true : B is upper bidiagonal ;

UPPER = false : B is lower bidiagonal.

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix B.

On exit, D contains the singular values of B.

E (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix whose SVD is desired. E(1) is arbitrary.

On exit, E is destroyed.

The size of E must verify: size( E ) = size( D ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit ;

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of B.

U (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the matrix U.

On exit, U is overwritten by U * Q.

The shape of U must verify:

size( U, 1 ) > 0 ;

size( U, 2 ) = size( D ) = n .

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’. The singular vectors U are rearranged accordingly.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the algorithm. The algorithm fails to converge if the number of QR sweeps exceeds MAXITER * n. Convergence usually occurs in about 2 * n QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the implicit QR algorithm. MAX_FRANCIS_STEPS must be a strictly positive integer; otherwise the default value is used.

The default is the minimum of n and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated n-by-n symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated n-by-n symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

If argument U is an n-by-n identity matrix, on exit it is replaced by Q.

This subroutine is adapted from subroutine QRBD given in the reference (1), with modifications suggested in the references (2) and (3) for the application of a set of Givens rotations to the singular vectors, and extensions to the bidiagonal case of the perfect shift strategy presented in the references (4) and (5) for the tridiagonal case.

Furthermore, the computation of the singular vectors is parallelized if OPENMP is used.

Note, finally, that the bidiagonal matrix is not scaled before computing the singular values and vectors. If some of the elements of the bidiagonal matrix are very small or large, it may be appropriate to scale the bidiagonal matrix before calling BD_SVD.

For further details, see:

(1) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

(2) Lang, B., 1998:

Using level 3 BLAS in rotation-based algorithms. SIAM J. Sci. Comput., Vol. 19, pp. 626-634.

(3) Van Zee, F.G., Van de Geijn, R., and Quintana-Orti, G., 2011:

Restructuring the QR Algorithm for High-Performance Application of Givens Rotations. FLAME Working Note 60. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-11-36.

(4) Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

(5) Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

`subroutine bd_svd ( upper, d, e, failure, sort, maxiter )`¶

Purpose¶

BD_SVD computes the singular values, S, of a real n-by-n (upper or lower) bidiagonal matrix B:

B = Q * S * P’

where S is a diagonal matrix with non-negative diagonal elements (the singular values of B), and Q and P are orthogonal matrices (P’ denotes the transpose of P).

The singular values are computed by the bidiagonal implicit QR method. See the references (1) and (2) for details.

Arguments¶

UPPER (INPUT) logical(lgl)

On entry, if:

UPPER = true : B is upper bidiagonal ;

UPPER = false : B is lower bidiagonal.

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix B.

On exit, D contains the singular values of B.

E (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix whose singular values are desired. E(1) is arbitrary.

On exit, E is destroyed.

The size of E must verify: size( E ) = size( D ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit ;

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of B.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the algorithm. The algorithm fails to converge if the number of QR sweeps exceeds MAXITER * n. Convergence usually occurs in about 2 * n QR sweeps.

The default is 10.

Further Details¶

This subroutine is adapted from subroutine QRBD in the reference (1).

For further details, see:

(1) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

(2) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

`subroutine bd_singval ( d, e, nsing, s, failure, sort, vector, abstol, ls, theta, scaling, init )`¶

Purpose¶

BD_SINGVAL computes all or some of the greatest singular values of a real n-by-n (upper or lower) bidiagonal matrix B by a bisection algorithm.

The Singular Value Decomposition of B is:

B = Q * S * P’

where S is a diagonal matrix with non-negative diagonal elements (the singular values of B), and, Q and P are orthogonal matrices (P’ denotes the transpose of P).
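The idea behind a bisection method for singular values can be sketched in Python as follows. This is a textbook illustration, not the statpack implementation: it counts, with a Sturm sequence, the eigenvalues of the 2n-by-2n Golub-Kahan tridiagonal matrix (zero diagonal, off-diagonal d1, e2, d2, …, en, dn, eigenvalues equal to plus and minus the singular values of B) that lie below a trial value, and halves a bracketing interval accordingly. The function names, the `pivmin` safeguard, the default tolerance and the step limit are assumptions of the sketch.

```python
import math


def golub_kahan_offdiag(d, e):
    """Off-diagonal d1, e2, d2, ..., en, dn of T(GK); e[0] is never read."""
    off = []
    for i in range(len(d)):
        if i > 0:
            off.append(e[i])
        off.append(d[i])
    return off


def count_below(off, x, pivmin=1e-150):
    """Sturm count: number of eigenvalues of T(GK) smaller than x."""
    q = -x
    if abs(q) < pivmin:
        q = -pivmin
    count = 1 if q < 0.0 else 0
    for b in off:
        q = -x - b * b / q          # the diagonal of T(GK) is zero
        if abs(q) < pivmin:
            q = -pivmin             # avoid a division by zero at the next step
        if q < 0.0:
            count += 1
    return count


def kth_largest_singular_value(d, e, k, abstol=1e-14, max_steps=200):
    """Bisection for the k-th largest singular value (k = 1 is the largest)."""
    n = len(d)
    off = golub_kahan_offdiag(d, e)
    padded = [0.0] + [abs(b) for b in off] + [0.0]
    lo = 0.0
    hi = max(padded[j] + padded[j + 1] for j in range(2 * n))  # Gershgorin bound
    for _ in range(max_steps):
        if hi - lo <= abstol:
            break
        mid = 0.5 * (lo + hi)
        # 2n - count_below(mid) singular values are greater than or equal to mid.
        if 2 * n - count_below(off, mid) >= k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# B = [[1, 1], [0, 1]] has the golden ratio and its inverse as singular values.
d = [1.0, 1.0]
e = [0.0, 1.0]
print(kth_largest_singular_value(d, e, 1), kth_largest_singular_value(d, e, 2))
```

Each singular value is bracketed independently of the others, which is what makes it possible to compute only some of the largest ones.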

Arguments¶

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix B.

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix whose singular values are desired. E(1) is arbitrary.

The size of E must verify: size( E ) = size( D ) .

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible.

If neither of the optional arguments LS and THETA is used, NSING is set to size(D) and all the singular values are computed.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the NSING greatest singular values of B. The other values in S ( S(NSING+1:size(D)) ) are flagged by a quiet NAN.

The size of S must verify: size( S ) = size( D ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection algorithm converged for all the computed singular values to the desired accuracy ;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’. For other values of SORT nothing is done and S(:nsing) may not be sorted in decreasing order of magnitude.

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used.

The default is VECTOR=false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located if it has been determined to lie in an interval whose width is ABSTOL or less.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | T(GK) | will be used, where | T(GK) | means the 1-norm of the GOLUB-KAHAN tridiagonal form of the bidiagonal matrix B and ULP is the machine precision (distance from 1 to the next larger floating point number).

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold, sqrt(LAMCH(‘S’)), not zero.
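As a rough illustration of the default tolerance described above, the following Python sketch evaluates ULP * | T(GK) | for a small bidiagonal matrix. It is not statpack code: the helper names `gk_offdiag`, `default_abstol` and `accurate_abstol` are hypothetical, IEEE double precision stands in for real(stnd), T(GK) is assumed to have a zero diagonal with off-diagonal entries D(1), E(2), D(2), ..., E(n), D(n), and `sys.float_info.min` is assumed to play the role of LAMCH(‘S’).

```python
import math
import sys

def gk_offdiag(d, e):
    """Off-diagonal of the assumed 2n-by-2n Golub-Kahan form; e[0] is ignored."""
    off = []
    for i, di in enumerate(d):
        off.append(di)
        if i + 1 < len(d):
            off.append(e[i + 1])
    return off

def default_abstol(d, e):
    """ULP times the 1-norm (largest absolute column sum) of T(GK)."""
    padded = [0.0] + [abs(t) for t in gk_offdiag(d, e)] + [0.0]
    # Column j of T(GK) holds at most two nonzeros: padded[j] and padded[j + 1].
    norm1 = max(padded[j] + padded[j + 1] for j in range(len(padded) - 1))
    return sys.float_info.epsilon * norm1

def accurate_abstol():
    """Square root of the smallest positive normalized double (assumed LAMCH('S'))."""
    return math.sqrt(sys.float_info.min)

tol = default_abstol([1.0, 1.0], [0.0, 1.0])  # B = [[1, 1], [0, 1]]
```

For this 2-by-2 example the 1-norm of T(GK) is 2, so the sketch returns twice the machine epsilon.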

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may differ from LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine stops with an error message.

LS must be greater than 0 and less than or equal to size( D ) .

The default is LS = size(D).

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values is greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine stops with an error message.

The default is THETA = 0.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix B is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated B’ * B tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

The default is not to use the Pal-Walker-Kahan algorithm.

Further Details¶

Let S(i), i=1,…,N=size(D), be the N singular values of the bidiagonal matrix B in decreasing order of magnitude. BD_SINGVAL then computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of B by a bisection method (see the reference (1) below, Sec.8.5). The bisection method is applied to an associated 2N-by-2N symmetric tridiagonal matrix T (the so-called Golub-Kahan form of B) whose eigenvalues are the singular values of B and their negatives (see the reference (2) below, Sec.3.3).
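The idea of bisection on the Golub-Kahan form can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BD_SINGVAL implementation: the function names are hypothetical, T is assumed to have a zero diagonal and off-diagonal entries D(1), E(2), D(2), ..., E(n), D(n), the eigenvalue count uses a plain Sturm sequence, and none of the scaling, vectorization or initial-interval options of the subroutine is reproduced.

```python
def gk_offdiag(d, e):
    """Off-diagonal of the assumed 2n-by-2n Golub-Kahan form; e[0] is ignored."""
    off = []
    for i, di in enumerate(d):
        off.append(di)
        if i + 1 < len(d):
            off.append(e[i + 1])
    return off

def count_below(off, x, pivmin=1e-300):
    """Sturm count: number of eigenvalues of T (zero diagonal) smaller than x."""
    count, q = 0, -x
    if abs(q) < pivmin:
        q = -pivmin
    if q < 0.0:
        count += 1
    for b in off:
        q = -x - b * b / q          # next pivot of the LDL' factorization of T - x*I
        if abs(q) < pivmin:
            q = -pivmin
        if q < 0.0:
            count += 1
    return count

def kth_largest_singular_value(d, e, k, tol=1e-14):
    """Bisection for the k-th largest singular value (k = 1 is the largest)."""
    n = len(d)
    off = gk_offdiag(d, e)
    j = 2 * n - k + 1               # rank of sigma_k among the 2n eigenvalues, ascending
    padded = [0.0] + [abs(t) for t in off] + [0.0]
    lo, hi = 0.0, max(padded[i] + padded[i + 1] for i in range(2 * n))
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if count_below(off, mid) >= j:
            hi = mid                # at least j eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# B = [[1, 1], [0, 1]]: its singular values are (sqrt(5) + 1)/2 and (sqrt(5) - 1)/2.
sigma_max = kth_largest_singular_value([1.0, 1.0], [0.0, 1.0], 1)
```

Because only the top N eigenvalues of T are non-negative, the search interval can start at zero and end at a Gershgorin bound on T.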

For further details, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

`subroutine bd_singval2 ( d, e, nsing, s, failure, sort, vector, abstol, ls, theta, scaling, init )`¶

Purpose¶

BD_SINGVAL2 computes all or some of the greatest singular values of a real n-by-n (upper or lower) bidiagonal matrix B by a bisection algorithm.

The Singular Value Decomposition of B is:

B = Q * S * P’

where S is a diagonal matrix with non-negative diagonal elements (the singular values of B), and, Q and P are orthogonal matrices (P’ denotes the transpose of P).

Arguments¶

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix B.

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix whose singular values are desired. E(1) is arbitrary.

The size of E must verify: size( E ) = size( D ) .

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible.

If neither of the optional arguments LS and THETA is used, NSING is set to size(D) and all the singular values are computed.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the NSING greatest singular values of B. The other values in S ( S(NSING+1:size(D)) ) are flagged by a quiet NAN.

The size of S must verify: size( S ) = size( D ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection algorithm converged for all the computed singular values to the desired accuracy ;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’. For other values of SORT nothing is done and S(:nsing) may not be sorted in decreasing order of magnitude.

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used.

The default is VECTOR=false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located if its square has been determined to lie in an interval whose width is ABSTOL or less.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | B’ * B | will be used, where | B’ * B | means the 1-norm of the tridiagonal matrix B’ * B ( B’ means the transpose of B) and ULP is the machine precision (distance from 1 to the next larger floating point number).

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold, sqrt(LAMCH(‘S’)), not zero.

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may differ from LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine stops with an error message.

LS must be greater than 0 and less than or equal to size( D ) .

The default is LS = size( D ).

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values is greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine stops with an error message.

The default is THETA = 0.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix B is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated B’ * B tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

The default is not to use the Pal-Walker-Kahan algorithm.

Further Details¶

Let S(i), i=1,…,N=size(D), be the N singular values of the bidiagonal matrix B in decreasing order of magnitude. BD_SINGVAL2 then computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of B by a bisection method (see the reference (1) below, Sec.8.5). The bisection method is applied (implicitly) to the associated N-by-N symmetric tridiagonal matrix B’ * B, whose eigenvalues are the squares of the singular values of B, by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (2) below, Sec.3.1).

BD_SINGVAL2 is faster than BD_SINGVAL; however, if relative accuracy for small singular values is required, BD_SINGVAL (which is based on the Golub-Kahan form of the bidiagonal matrix) is the best choice.
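The following Python sketch illustrates bisection on B’ * B without ever forming that product. It is a simplified illustration, not the BD_SINGVAL2 code: the function names are hypothetical, B is assumed to be upper bidiagonal with E(i) = B(i-1,i), and the count below is a bare differential stationary qd recurrence without scaling or other safeguards.

```python
import math

def count_squares_below(d, e, x, pivmin=1e-300):
    """Number of eigenvalues of B'B (squared singular values) smaller than x."""
    n = len(d)
    count, t = 0, -x
    for i in range(n):
        q = d[i] * d[i] + t         # i-th pivot of the LDL' factorization of B'B - x*I
        if abs(q) < pivmin:
            q = -pivmin
        if q < 0.0:
            count += 1
        if i + 1 < n:
            t = t * (e[i + 1] * e[i + 1] / q) - x
    return count

def kth_largest_singular_value2(d, e, k, tol=1e-14):
    """Bisection on the squares; returns the k-th largest singular value."""
    n = len(d)
    j = n - k + 1                   # rank of sigma_k**2 among the n eigenvalues, ascending
    col = max(abs(d[i]) + (abs(e[i]) if i > 0 else 0.0) for i in range(n))
    row = max(abs(d[i]) + (abs(e[i + 1]) if i + 1 < n else 0.0) for i in range(n))
    lo, hi = 0.0, col * row         # sigma_max**2 <= norm1(B) * norminf(B)
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if count_squares_below(d, e, mid) >= j:
            hi = mid
        else:
            lo = mid
    return math.sqrt(0.5 * (lo + hi))

sigma_min = kth_largest_singular_value2([1.0, 1.0], [0.0, 1.0], 2)
```

In this sketch the interval width is controlled on the squared singular values. An interval of width w around sigma**2 corresponds to a width of roughly w / (2 * sigma) around sigma, which is one way to see why small singular values lose relative accuracy with this approach.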

For further details, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

`subroutine bd_max_singval ( d, e, nsing, s, failure, abstol, scaling )`¶

Purpose¶

BD_MAX_SINGVAL computes the greatest singular value of a real n-by-n (upper or lower) bidiagonal matrix B by a bisection algorithm.

The Singular Value Decomposition of a bidiagonal matrix B is:

B = Q * S * P’

where S is a diagonal matrix with non-negative diagonal elements (the singular values of B), and, Q and P are orthogonal matrices (P’ denotes the transpose of P).

Arguments¶

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix B.

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix whose singular values are desired. E(1) is arbitrary.

The size of E must verify: size( E ) = size( D ) .

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than 1 if multiple singular values make unique selection of the greatest singular value impossible.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the first NSING greatest singular values of B. The other values in S ( S(NSING+1:size(D)) ) are flagged by a quiet NAN.

The size of S must verify: size( S ) = size( D ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection algorithm converged for all the computed singular values to the desired accuracy ;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located if its square has been determined to lie in an interval whose width is ABSTOL or less.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | B’ * B | will be used, where | B’ * B | means the 1-norm of the tridiagonal matrix B’ * B ( B’ means the transpose of B) and ULP is the machine precision (distance from 1 to the next larger floating point number).

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold, sqrt(LAMCH(‘S’)), not zero.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix B is scaled before computing the greatest singular values.

The default is to scale the bidiagonal matrix.

Further Details¶

BD_MAX_SINGVAL computes the largest singular value of B by a bisection method (see the reference (1) below, Sec.8.5 ).

The bisection method is applied (implicitly) to the associated n-by-n symmetric tridiagonal matrix B’ * B whose eigenvalues are the squares of the singular values of B by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (2) below, Sec.3.1 ).

For further details, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

`subroutine bd_lasq1 ( d, e, failure, maxiter, sort, scaling, ieee, aggdef2, max_win, freq, info )`¶

Purpose¶

BD_LASQ1 computes the singular values of a real n-by-n bidiagonal matrix BD (with diagonal D and off-diagonal E) by the differential quotient difference with shifts (dqds) algorithm and, optionally, with an early aggressive deflation strategy.

The singular values are computed to high relative accuracy, in the absence of denormalization, underflow and overflow. The algorithm was first presented in the reference (1) and the present implementation is described in the references (2), (3) and (4). The early aggressive deflation strategy is described in the reference (4).

Arguments¶

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD whose SVD is desired.

On normal exit (i.e., if FAILURE is false), D contains the singular values of BD. On abnormal exit (i.e., if FAILURE is true), D is overwritten only if INFO is equal to 2.

See description of FAILURE and INFO arguments for further details.

E (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, E(2:) contains the off-diagonal elements of the bidiagonal matrix BD whose singular values are desired. E(1) is arbitrary.

On normal exit, E is unchanged, but, on abnormal exit, E(2:) is overwritten if INFO is equal to 2.

See description of FAILURE and INFO arguments for further details.

The size of E must verify: size( E ) = size( D ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit;

FAILURE = true : indicates that the algorithm did not converge. The specific reasons for the failure are given by the optional argument INFO described below if this argument is specified in the call to BD_LASQ1.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of iterations in the inner loop of the dqds algorithm (see BD_LASQ2 and BD_LASQ2_AGGDEF2 subroutines). MAXITER must be greater than 10, otherwise the default value is used.

The default is 30.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true, the bidiagonal matrix BD is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

IEEE (INPUT, OPTIONAL) logical(lgl)

On entry, logical flag for IEEE or non-IEEE arithmetic. When IEEE is set to true, the division in the dqds loop is not protected against exceptions (division by zero or overflow), since arithmetic units conforming to the IEEE 754 floating-point standard do not hold up the computation when an exception occurs. At the end of the loop, the code tests whether an infinity or a NaN (not a number) occurred and acts appropriately.

The default is to test inside the subroutine if IEEE arithmetic is supported or not.

AGGDEF2 (INPUT, OPTIONAL) logical(lgl)

On entry, if AGGDEF2=true, early aggressive deflation is used in conjunction with the dqds algorithm.

The default is to use early aggressive deflation only if the dimension of the input bidiagonal matrix is sufficiently large (e.g., if size(D)>=20000).

MAX_WIN (INPUT, OPTIONAL) integer(i4b)

On entry, the maximum window size to be used in early aggressive deflation. MAX_WIN must be greater than or equal to 5.

This optional argument has an effect only if AGGDEF2 is equal to true.

The default value is max( int(sqrt(real(size(D)))), 5).

FREQ (INPUT, OPTIONAL) integer(i4b)

On entry, the frequency with which early aggressive deflation will be invoked. FREQ must be greater than or equal to 1.

This optional argument has an effect only if AGGDEF2 is equal to true.

The default value is 16.

INFO (OUTPUT, OPTIONAL) integer(i4b)

On exit:

INFO = 0 : indicates successful exit;

INFO = 1 : indicates a failure because a split was marked by a positive value in E;

INFO = 2 : indicates that the algorithm did not converge and that the maximum number of iterations in the inner loop of the dqds algorithm is exceeded. On exit, D and E represent a matrix with the same singular values which the calling subroutine could use to finish the computation, or even feed back into BD_LASQ1;

INFO = 3 : indicates that the termination criterion of the outer while loop of the algorithm (in BD_LASQ2 or BD_LASQ2_AGGDEF2 subroutines) was not met (i.e., the program created more than n=size(D) unreduced blocks);

INFO = 4 : indicates that the maximum number of iterations in the inner loop of the dqds algorithm is exceeded (as if INFO=2), but that a bidiagonal matrix with the same singular values as the original one cannot be recovered.

Further Details¶

This subroutine is adapted from the DLASQ1 subroutine in LAPACK version 3.12.0 and the codes (e.g., DLASQDD1 subroutine) for performing early aggressive deflation given in the reference (4).

This subroutine is a variant of the dqds algorithm described in the references (2), (3) and (4). Early aggressive deflation strategy, as described in the reference (4), is included in this version of the dqds algorithm when the optional logical argument AGGDEF2 is used with the value true.
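To give a feel for the dqds transformation itself, here is a deliberately minimal Python sketch. It is not the BD_LASQ1 algorithm: the function names are hypothetical, every sweep uses a zero shift (the plain dqd variant), there is no deflation, splitting, scaling or aggressive early deflation, and all entries of D and E(2:) are assumed to be nonzero so that no division by zero occurs.

```python
import math

def dqds_sweep(q, e, shift=0.0):
    """One differential qd sweep on the arrays q(i) = d(i)**2 and e(i) = offdiag(i)**2."""
    n = len(q)
    qh, eh = [0.0] * n, [0.0] * (n - 1)
    t = q[0] - shift
    for i in range(n - 1):
        qh[i] = t + e[i]
        ratio = q[i + 1] / qh[i]
        eh[i] = e[i] * ratio
        t = t * ratio - shift
    qh[n - 1] = t
    return qh, eh

def singular_values_dqd(d, e, sweeps=50):
    """Zero-shift iteration: q converges to the squared singular values, e to zero."""
    q = [x * x for x in d]
    f = [x * x for x in e[1:]]      # e[0] is ignored, as E(1) is arbitrary
    for _ in range(sweeps):
        q, f = dqds_sweep(q, f)
    return [math.sqrt(x) for x in q]

# B = [[1, 1], [0, 1]]: the values converge to (sqrt(5) + 1)/2 and (sqrt(5) - 1)/2.
sv = singular_values_dqd([1.0, 1.0], [0.0, 1.0])
```

With a zero shift the off-diagonal entries shrink only linearly, at a rate given by the ratios of consecutive squared singular values, which is why practical dqds codes rely on carefully chosen shifts and deflation.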

For further details on the dqds algorithm, see:

(1) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(2) Parlett, B.N., and Marques, O.A., 2000:

An implementation of the dqds algorithm (positive case). Linear Algebra Appl., Volume 309, pp. 217-259.

(3) Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

(4) Nakatsukasa, Y., Aishima, K., and Yamazaki, I., 2012:

dqds with aggressive early deflation. SIAM Journal on Matrix Analysis Applications, Volume 33, No 1, pp. 22-51.

`subroutine bd_dqds ( d, e, failure, maxiter, sort, scaling )`¶

Purpose¶

BD_DQDS computes all the singular values, S, of a real n-by-n (upper or lower) bidiagonal matrix BD:

BD = Q * S * P’

where S is a diagonal matrix with non-negative diagonal elements (the singular values of BD), and Q and P are orthogonal matrices (P’ denotes the transpose of P).

The singular values are computed to high relative precision in the absence of denormalization, underflow and overflow with a variant of the differential quotient difference with shifts (dqds) algorithm described in the reference (3).

Arguments¶

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD.

On exit, D contains the singular values of BD if FAILURE is false, but is unchanged if FAILURE is true.

See description of FAILURE argument for further details.

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD whose singular values are desired. E(1) is arbitrary.

On exit, E is unchanged.

The size of E must verify: size( E ) = size( D ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit ;

FAILURE = true : indicates that the algorithm did not converge and that the maximum number of iterations in the inner loop of the algorithm is exceeded. In that case, D is unchanged on exit.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of iterations in the inner loop of the dqds algorithm. MAXITER must be greater than 5, otherwise the default value is used.

The default is 30.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

Further Details¶

This subroutine is adapted from the variant of the dqds algorithm described in the reference (3), which uses a new shift strategy in the dqds algorithm.

BD_DQDS is a Fortran95/2003 translation and improved version of a Fortran77 subroutine available at:

http://syskiso.fuee.u-fukui.ac.jp/~kkimur/LAPROGNC/LAPROGNC.html

For further details on the dqds algorithm and its implementation, see:

(1) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(2) Parlett, B.N., and Marques, O.A., 2000:

An implementation of the dqds algorithm (positive case). Linear Algebra Appl., Volume 309, pp. 217-259.

(3) Yamashita, T., Kimura, K., Takata, M., and Nakamura, Y., 2013:

An application of the Kato-Temple inequality on matrix eigenvalues to the dqds algorithm for singular values. JSIAM Letters, Volume 5, pp. 21-24.

`subroutine las2 ( f, g, h, ssmin, ssmax )`¶

Purpose¶

LAS2 computes singular values of a 2-by-2 triangular matrix.

Arguments¶

F (INPUT) real(stnd)

On entry, the (1,1) element of the 2-by-2 matrix.

G (INPUT) real(stnd)

On entry, the (1,2) element of the 2-by-2 matrix.

H (INPUT) real(stnd)

On entry, the (2,2) element of the 2-by-2 matrix.

SSMIN (OUTPUT) real(stnd)

On exit, the smaller singular value.

SSMAX (OUTPUT) real(stnd)

On exit, the larger singular value.

Further Details¶

This subroutine is adapted from the DLAS2 subroutine from LAPACK version 3.12.0.

Barring over/underflow, all output quantities are correct to within a few units in the last place (ulps), even in the absence of a guard digit in addition/subtraction.

In IEEE arithmetic, the code also works correctly if one matrix element is infinite.
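For reference, the two singular values of the triangular matrix [[F, G], [0, H]] have a simple closed form, sketched below in Python. This is an illustration only: `singvals_2x2` is a hypothetical name, the formula is the textbook one rather than the scaled formulation used by DLAS2, and it does not claim the same accuracy or overflow behaviour.

```python
import math

def singvals_2x2(f, g, h):
    """Return (ssmin, ssmax) for the upper triangular matrix [[f, g], [0, h]]."""
    fa, ga, ha = abs(f), abs(g), abs(h)
    # (ssmax + ssmin)**2 = (|f| + |h|)**2 + g**2 and (ssmax - ssmin)**2 = (|f| - |h|)**2 + g**2
    a = math.hypot(fa + ha, ga)
    b = math.hypot(fa - ha, ga)
    ssmax = 0.5 * (a + b)
    if ssmax == 0.0:
        return 0.0, 0.0
    # ssmin * ssmax = |f * h|; dividing first avoids forming the product f * h.
    ssmin = (min(fa, ha) / ssmax) * max(fa, ha)
    return ssmin, ssmax

ssmin, ssmax = singvals_2x2(1.0, 1.0, 1.0)
```

Computing the smaller value from the product identity, rather than as 0.5 * (a - b), avoids cancellation when the two square roots are close.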

`function singvalues ( mat, sort, mul_size, maxiter, dqds )`¶

Purpose¶

Function SINGVALUES computes the singular values of a real m-by-n matrix MAT. The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper or lower bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal. The singular values SIGMA of the bidiagonal matrix BD are then computed by the bidiagonal implicit QR algorithm (if DQDS=false) or the dqds algorithm (if DQDS=true). If these algorithms fail to converge, function SINGVALUES returns a min(m,n)-vector filled with a quiet NaN.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= max(m,n), otherwise a default value is used. For better performance, at the expense of more workspace, a large value can be used.

The default is 32.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form BD of MAT fails to converge if the number of QR sweeps exceeds MAXITER*min(m,n). Convergence usually occurs in about 2*min(m,n) QR sweeps.

This argument has no effect if DQDS is equal to true.

The default is 10.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values of the intermediate min(m,n)-by-min(m,n) bidiagonal matrix BD are computed.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition. Moreover, the dqds algorithm is usually also faster than the bidiagonal implicit QR algorithm.

If DQDS is set to false, singular values are computed with the bidiagonal implicit QR algorithm applied to the associated min(m,n)-by-min(m,n) bidiagonal matrix BD.

The default is true.

Further Details¶

Computing the singular values of a rectangular matrix in function SINGVALUES consists of two steps:

reduction of the rectangular matrix to bidiagonal form BD, see the references (1) and (2);

computation of the singular values of the min(m,n)-by-min(m,n) bidiagonal matrix BD by the bidiagonal implicit QR or dqds algorithms, see the references (1), (2), (3) and (4).
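The first of these two steps can be illustrated with an unblocked Householder (Golub-Kahan) bidiagonalization. The Python sketch below is not the statpack implementation: the function names are hypothetical, only the case m >= n is handled, the matrix is a list of rows, the reflectors are applied on the fly instead of being stored in packed form, and no blocking (MUL_SIZE) or OpenMP parallelism is attempted.

```python
import math

def householder(x):
    """Return (v, beta) such that (I - beta*v*v') x is a multiple of the first unit vector."""
    norm = math.sqrt(sum(t * t for t in x))
    if norm == 0.0:
        return list(x), 0.0
    v = list(x)
    v[0] += math.copysign(norm, x[0])   # sign chosen to avoid cancellation
    return v, 2.0 / sum(t * t for t in v)

def bidiagonalize(mat):
    """Reduce an m-by-n matrix (m >= n) to upper bidiagonal form; return (d, e)."""
    a = [row[:] for row in mat]
    m, n = len(a), len(a[0])
    for k in range(n):
        # Left reflector: annihilate a[k+1:, k].
        v, beta = householder([a[i][k] for i in range(k, m)])
        for j in range(k, n):
            s = beta * sum(v[i - k] * a[i][j] for i in range(k, m))
            for i in range(k, m):
                a[i][j] -= s * v[i - k]
        if k < n - 2:
            # Right reflector: annihilate a[k, k+2:].
            v, beta = householder(a[k][k + 1:])
            for i in range(k, m):
                s = beta * sum(v[j - k - 1] * a[i][j] for j in range(k + 1, n))
                for j in range(k + 1, n):
                    a[i][j] -= s * v[j - k - 1]
    d = [a[i][i] for i in range(n)]
    e = [0.0] + [a[i - 1][i] for i in range(1, n)]   # e[i] = BD(i-1, i); e[0] is unused
    return d, e

d, e = bidiagonalize([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0], [2.0, 0.0, 1.0]])
```

Since only orthogonal transformations are applied, the sum of the squares of the entries of d and e equals the squared Frobenius norm of the input matrix, which gives a quick sanity check.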

Note that if max(m,n) is much larger than min(m,n), the rectangular matrix is first reduced to upper or lower triangular form by a QR or LQ factorization and the bidiagonal reduction algorithm is applied to the resulting triangular factor. The singular values of the rectangular matrix are then obtained from those of the triangular factor.

If the SVD algorithm did not converge and full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form BD of MAT or in the dqds algorithm applied to BD, function SINGVALUES returns a min(m,n)-vector filled with a quiet NaN.

For further details on the SVD of a rectangular matrix and the algorithms to compute it or simply its singular values, see the references (1), (2), (3) or (4). In function SINGVALUES, the reduction to bidiagonal form by orthogonal transformations is parallelized if OpenMP is used, but not the computation of the singular values.

For more information, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Lawson, C.L., and Hanson, R.J., 1974:

Solving least squares problems. Prentice-Hall.

(3) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(4) Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

`subroutine select_singval_cmp ( mat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauq, taup, scaling, init, dqds )`¶

Purpose¶

SELECT_SINGVAL_CMP computes all or some of the greatest singular values of a real m-by-n matrix MAT.

The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper or lower bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal (see the reference (1) below).

The singular values SIGMA of the bidiagonal matrix BD, which are also the singular values of MAT, are then computed by a bisection algorithm applied to the Tridiagonal Golub-Kahan (TGK) form of the bidiagonal matrix BD (see the reference (2) below, Sec.3.3) or the differential quotient difference with shifts (dqds) algorithm (see the references (3) and (4)).

The routine outputs (parts of) SIGMA and optionally Q and P (in packed form), and BD for a given matrix MAT. SIGMA, Q, P and BD may then be used to obtain selected or all singular vectors of MAT with subroutines BD_INVITER2 or BD_DEFLATE2.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is destroyed and if TAUQ or TAUP are present MAT is overwritten as follows:

if m >= n, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors;

if m < n, the elements below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements on and above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors.

See Further Details.

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible and bisection is used.

If neither of the optional arguments LS and THETA is used, NSING is set to min( size(MAT,1) , size(MAT,2) ) and all the singular values are computed.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the first NSING singular values of MAT. The other values in S ( S(NSING+1:) ) are flagged by a quiet NAN.

The size of S must be min( size(MAT,1) , size(MAT,2) ).

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection or dqds algorithm converged for all the computed singular values to the desired accuracy ;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= n, otherwise a default value is used. For better performance, at the expense of more workspace, a large value can be used.

The default is min( 32, n ) .

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used to compute the singular values SIGMA of the bidiagonal matrix BD.

This argument has no effect if DQDS is equal to true.

The default is VECTOR = false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located in the bisection algorithm if it has been determined to lie in an interval whose width is ABSTOL or less.

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold (i.e., sqrt(safmin) = sqrt(LAMCH(‘S’))), not zero.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | T(GK) | will be used, where | T(GK) | means the 1-norm of the GOLUB-KAHAN tridiagonal form of the bidiagonal matrix BD and ULP is the machine precision (distance from 1 to the next larger floating point number).

This argument has no effect if DQDS is equal to true.

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may differ from LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine stops with an error message.

LS must be greater than 0 and less than or equal to min( size(MAT,1) , size(MAT,2) ).

This argument has no effect if DQDS is equal to true.

The default is LS = min( size(MAT,1) , size(MAT,2) ).

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values are greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; otherwise the subroutine will stop with an error message.

This argument has no effect if DQDS is equal to true.

The default is THETA = 0.

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate bidiagonal matrix BD.

The size of D must be min( size(MAT,1) , size(MAT,2) ).

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate bidiagonal matrix BD:

if m >= n, E(i) = BD(i-1,i) for i = 2,3,…,n;

if m < n, E(i) = BD(i,i-1) for i = 2,3,…,m.

The size of E must be min( size(MAT,1) , size(MAT,2) ).

TAUQ (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix Q. See Further Details.

The size of TAUQ must be min( size(MAT,1) , size(MAT,2) ).

TAUP (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix P. See Further Details.

The size of TAUP must be min( size(MAT,1) , size(MAT,2) ).

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated BD’ * BD tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

This argument has no effect if DQDS is equal to true.

The default is not to use the Pal-Walker-Kahan algorithm.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed.

If DQDS is set to true, all singular values are computed with the dqds algorithm. When all singular values are requested, the dqds algorithm is in many cases much faster than bisection and delivers about the same accuracy.

If DQDS is set to false, the requested singular values are computed by bisection.

The default is to use the dqds algorithm if all the singular values are requested and to use bisection if selected singular values are requested (e.g., if the optional arguments LS or THETA are present).

Further Details¶

The matrices Q and P are represented as products of elementary reflectors:

If m >= n,

Q = H(1) * H(2) * … * H(n) and P = G(1) * G(2) * … * G(n-1)

Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors. Moreover, u(1:i-1) = 0 and v(1:i) = 0.

If TAUQ or TAUP are present:

u(i:m) is stored on exit in MAT(i:m,i);

v(i+1:n) is stored on exit in MAT(i,i+1:n).

If TAUQ is present : tauq is stored in TAUQ(i).

If TAUP is present : taup is stored in TAUP(i).

If m < n,

Q = H(1) * H(2) * … * H(m-1) and P = G(1) * G(2) * … * G(m)

Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors. Moreover, u(1:i) = 0 and v(1:i-1) = 0.

If TAUQ or TAUP are present:

u(i+1:m) is stored on exit in MAT(i+1:m,i);

v(i:n) is stored on exit in MAT(i,i:n).

If TAUQ is present : tauq is stored in TAUQ(i).

If TAUP is present : taup is stored in TAUP(i).

The contents of MAT on exit, if TAUQ or TAUP are present, are illustrated by the following examples:

m = 6 and n = 5 (m >= n):

( u1 v1 v1 v1 v1 )

( u1 u2 v2 v2 v2 )

( u1 u2 u3 v3 v3 )

( u1 u2 u3 u4 v4 )

( u1 u2 u3 u4 u5 )

( u1 u2 u3 u4 u5 )

m = 5 and n = 6 (m < n):

( v1 v1 v1 v1 v1 v1 )

( u1 v2 v2 v2 v2 v2 )

( u1 u2 v3 v3 v3 v3 )

( u1 u2 u3 v4 v4 v4 )

( u1 u2 u3 u4 v5 v5 )

where ui denotes an element of the vector defining H(i), and vi an element of the vector defining G(i).
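As an illustration of the reflector form H = I + tau * u * u’ used above, the following Python sketch builds one reflector for a vector x and checks that H * x is zero below its first entry. It assumes the unnormalized choice u = x + sign(x(1)) * ||x|| * e1 with tau = -2 / (u’ * u); the scaling of u and tau actually stored by the library is not stated here and may differ. The helper names `householder` and `apply_reflector` are hypothetical.

```python
import math

def householder(x):
    """Return (tau, u) such that (I + tau*u*u') x is a multiple of e1."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return 0.0, list(x)              # H = I, nothing to annihilate
    u = list(x)
    u[0] += math.copysign(norm, x[0])    # same sign as x[0]: no cancellation
    tau = -2.0 / sum(v * v for v in u)   # makes I + tau*u*u' orthogonal
    return tau, u

def apply_reflector(tau, u, x):
    """Compute (I + tau*u*u') x without forming the matrix."""
    s = tau * sum(a * b for a, b in zip(u, x))
    return [xi + s * ui for xi, ui in zip(x, u)]

x = [3.0, 4.0, 0.0, 12.0]
tau, u = householder(x)
y = apply_reflector(tau, u, x)           # only y[0] is nonzero
```

Applying the reflector twice returns the original vector, which is a quick check that H is its own inverse.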

Now, let SIGMA(i), i=1,…,N=min(m,n), be the singular values of the intermediate bidiagonal matrix BD in decreasing order of magnitude.

If DQDS is equal to false, the subroutine computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of BD by a bisection method (see the reference (1) below, Sec.8.5). The bisection method is applied to an associated 2N-by-2N symmetric tridiagonal matrix T (the so-called GOLUB-KAHAN form of BD) whose eigenvalues are the singular values of BD and their negatives (see the reference (2) below).

On the other hand, if DQDS is equal to true, all the singular values of BD are computed by the dqds algorithm (see the references (3) and (4)).
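The bisection idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's implementation: BD is taken to be upper bidiagonal with diagonal `d` (length N) and superdiagonal `e` (length N-1), the Golub-Kahan matrix is represented by its off-diagonal d1, e1, d2, ..., dN (its diagonal is zero), and a plain Sturm count is used. All function names are hypothetical.

```python
def tgk_offdiag(d, e):
    """Off-diagonal of the 2N-by-2N Golub-Kahan matrix: d1, e1, d2, ..., dN."""
    off = []
    for i, di in enumerate(d):
        off.append(di)
        if i < len(e):
            off.append(e[i])
    return off

def count_below(off, x):
    """Sturm count: number of eigenvalues of the zero-diagonal matrix below x."""
    count = 0
    q = -x
    for k in range(len(off) + 1):
        if k > 0:
            q = -x - off[k - 1] * off[k - 1] / q
        if q == 0.0:
            q = -1e-300                  # nudge an exact zero pivot
        if q < 0.0:
            count += 1
    return count

def kth_largest_singular_value(d, e, k, abstol=1e-14):
    """Bisection for the k-th largest singular value (k = 1 is the largest)."""
    off = tgk_offdiag(d, e)
    target = 2 * len(d) - k + 1          # rank of sigma_k among the 2N eigenvalues
    lo = 0.0
    hi = 2.0 * max(abs(v) for v in off) + abstol   # Gershgorin upper bound
    for _ in range(200):                 # safety cap on the number of halvings
        if hi - lo <= abstol:
            break
        mid = 0.5 * (lo + hi)
        if count_below(off, mid) >= target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

sigma1 = kth_largest_singular_value([3.0, 2.0], [1.0], 1)
sigma2 = kth_largest_singular_value([3.0, 2.0], [1.0], 2)
```

For this 2-by-2 example the squared singular values are the roots of t**2 - 14*t + 36, that is 7 + sqrt(13) and 7 - sqrt(13).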

For further details, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

(3) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(4) Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

`subroutine select_singval_cmp ( mat, rlmat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauo, tauq, taup, scaling, init, dqds )`¶

Purpose¶

SELECT_SINGVAL_CMP computes all or some of the greatest singular values of a real m-by-n matrix MAT.

The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper bidiagonal form by a two-step algorithm :

If m >= n, a QR factorization of the real m-by-n matrix MAT is first computed

MAT = O * R

where O is orthogonal and R is upper triangular. In a second step, the n-by-n upper triangular matrix R is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * R * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

If m < n, an LQ factorization of the real m-by-n matrix MAT is first computed

MAT = L * O

where O is orthogonal and L is lower triangular. In a second step, the m-by-m lower triangular matrix L is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * L * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

SELECT_SINGVAL_CMP computes O, BD, Q and P.

The singular values SIGMA of the bidiagonal matrix BD, which are also the singular values of MAT, are then computed by a bisection algorithm applied to the Tridiagonal Golub-Kahan (TGK) form of the bidiagonal matrix BD (see the reference (2) below, Sec.3.3) or the differential quotient difference with shifts (dqds) algorithm (see the references (3) and (4)).

The routine outputs (parts of) SIGMA, and optionally O, Q and P (in packed form), and BD for a given matrix MAT. The matrix O is stored in factored form in the argument MAT if the optional argument TAUO is present or explicitly computed if this argument is absent. SIGMA, O, Q, P and BD may then be used to obtain selected singular vectors of MAT with subroutines BD_INVITER2 or BD_DEFLATE2.
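The first step of the reduction works because an orthogonal factor does not change singular values: if MAT = O * R then MAT’ * MAT = R’ * R. The Python sketch below checks this identity on a small example. It uses modified Gram-Schmidt purely for brevity, which is a swapped-in technique and not the blocked Householder factorization described in this section; `mgs_qr` and `gram` are hypothetical helper names.

```python
import math

def mgs_qr(a):
    """Thin QR of an m-by-n matrix (m >= n, full column rank) by modified Gram-Schmidt."""
    m, n = len(a), len(a[0])
    q = [row[:] for row in a]            # columns are orthonormalized in place
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j):
            r[i][j] = sum(q[k][i] * q[k][j] for k in range(m))
            for k in range(m):
                q[k][j] -= r[i][j] * q[k][i]
        r[j][j] = math.sqrt(sum(q[k][j] ** 2 for k in range(m)))
        for k in range(m):
            q[k][j] /= r[j][j]
    return q, r

def gram(x):
    """Return x' * x for a matrix stored as a list of rows."""
    rows, cols = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]

a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q, r = mgs_qr(a)
ata, rtr = gram(a), gram(r)              # equal up to rounding
```

Since the two Gram matrices agree, the n-by-n triangular factor has the same singular values as the m-by-n input, and the more expensive bidiagonal reduction can be applied to the smaller matrix.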

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, if:

m >= n, the elements on and below the diagonal, with the array TAUO, represent the orthogonal matrix O of the QR factorization of MAT, as a product of elementary reflectors, if the argument TAUO is present. Otherwise, the argument MAT contains the first n columns of O on output.

m < n, the elements on and above the diagonal, with the array TAUO, represent the orthogonal matrix O of the LQ factorization of MAT, as a product of elementary reflectors, if the argument TAUO is present. Otherwise, the argument MAT contains the first m rows of O on output.

See Further Details.

RLMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors.

See Further Details.

The shape of RLMAT must satisfy: size( RLMAT, 1 ) = size( RLMAT, 2 ) = min( size(MAT,1) , size(MAT,2) ).

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible.

If none of the optional arguments LS and THETA are used, NSING is set to min( size(MAT,1) , size(MAT,2) ) and all the singular values are computed.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the first NSING singular values of MAT. The other values in S ( S(NSING+1:) ) are flagged by a quiet NAN.

The size of S must be min( size(MAT,1) , size(MAT,2) ).

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection or dqds algorithm converged for all the computed singular values to the desired accuracy;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.
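The conventions above (quiet NaN for values that were not computed, negative sign for values that failed to converge) suggest a simple way of post-processing S. The Python sketch below mirrors that logic on a plain list; `classify_singular_values` is a hypothetical helper, and indices are 0-based here whereas the Fortran arrays are 1-based.

```python
import math

def classify_singular_values(s, failure):
    """Split S into trusted values and indices flagged as not converged."""
    trusted, suspect = [], []
    for i, v in enumerate(s):
        if math.isnan(v):
            continue                     # entry beyond NSING: not computed
        if failure and v < 0.0:
            suspect.append(i)            # negative sign marks a failed value
        else:
            trusted.append(v)
    return trusted, suspect

nan = float("nan")
trusted, suspect = classify_singular_values([3.0, -1.5, nan], failure=True)
```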

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must satisfy 1 <= MUL_SIZE <= n; otherwise a default value is used. A larger value can improve performance at the expense of more workspace.

The default is min( 32, n ) .

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used to compute the singular values SIGMA of the bidiagonal matrix BD.

This argument has no effect if DQDS is equal to true.

The default is VECTOR = false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located in the bisection algorithm if it has been determined to lie in an interval whose width is ABSTOL or less.

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold (e.g., sqrt(safmin) = sqrt(LAMCH(‘S’))), not zero.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | T(GK) | will be used, where | T(GK) | means the 1-norm of the GOLUB-KAHAN tridiagonal form of the bidiagonal matrix BD and ULP is the machine precision (distance from 1 to the next larger floating point number).

This argument has no effect if DQDS is equal to true.

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may differ from LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; otherwise the subroutine will stop with an error message.

LS must be greater than 0 and less than or equal to min( size(MAT,1) , size(MAT,2) ).

This argument has no effect if DQDS is equal to true.

The default is LS = min( size(MAT,1) , size(MAT,2) ).

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values are greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; otherwise the subroutine will stop with an error message.

This argument has no effect if DQDS is equal to true.

The default is THETA = 0.

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate bidiagonal matrix BD.

The size of D must be min( size(MAT,1) , size(MAT,2) ).

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate upper bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,k, where k = min( size(MAT,1) , size(MAT,2) ).

The size of E must be min( size(MAT,1) , size(MAT,2) ).

TAUO (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix O of the QR or LQ decomposition of MAT.

If the optional argument TAUO is present, the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on exit.

If the optional argument TAUO is absent, the orthogonal matrix O is explicitly generated and stored in the argument MAT on exit.

See description of the argument MAT above and Further Details below.

The size of TAUO must be min( size(MAT,1) , size(MAT,2) ).

TAUQ (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix Q. See Further Details.

The size of TAUQ must be min( size(MAT,1) , size(MAT,2) ).

TAUP (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix P. See Further Details.

The size of TAUP must be min( size(MAT,1) , size(MAT,2) ).

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the singular values.

This argument has no effect if DQDS is equal to true.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated BD’ * BD tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

The default is not to use the Pal-Walker-Kahan algorithm.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed.

If DQDS is set to true, all singular values are computed with the dqds algorithm. When all singular values are requested, the dqds algorithm is in many cases much faster than bisection and delivers about the same accuracy.

If DQDS is set to false, the requested singular values are computed by bisection.

The default is to use the dqds algorithm if all the singular values are requested and to use bisection if selected singular values are requested (e.g., if the optional arguments LS or THETA are present).

Further Details¶

If m >= n, the matrix O of the QR factorization of MAT is represented as a product of elementary reflectors

O = W(1) * W(2) * … * W(n)

Each W(i) has the form

W(i) = I + tauo * ( v * v’ ) ,

where tauo is a real scalar and v is a real m-element vector with v(1:i-1) = 0. v(i:m) is stored on exit in MAT(i:m,i) and tauo in TAUO(i). If the optional argument TAUO is absent, the first n columns of O are generated and stored in the argument MAT.

If m < n, the matrix O of the LQ factorization of MAT is represented as a product of elementary reflectors

O = W(m) * … * W(2) * W(1)

Each W(i) has the form

W(i) = I + tauo * ( v * v’ ) ,

where tauo is a real scalar and v is a real n-element vector with v(1:i-1) = 0. v(i:n) is stored on exit in MAT(i,i:n) and tauo in TAUO(i). If the optional argument TAUO is absent, the first m rows of O are generated and stored in the argument MAT.

The matrix O is stored in factored form if the optional argument TAUO is present or explicitly computed if this argument is absent.

A blocked algorithm is used for computing the QR or LQ factorization of MAT. Furthermore, the computations are parallelized if OPENMP is used.

After the initial QR or LQ factorization of MAT, the (upper or lower) triangular matrix is reduced to upper bidiagonal form BD.

The matrices Q and P of the bidiagonal factorization of the triangular matrix R or L are represented as products of elementary reflectors:

Q = H(1) * H(2) * … * H(k) and P = G(1) * G(2) * … * G(k-1)

where k = min( size(MAT,1) , size(MAT,2) ). Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors; u(1:i-1) = 0 and u(i:min(m,n)) is stored on exit in RLMAT(i:min(m,n),i); v(1:i) = 0 and v(i+1:min(m,n)) is stored on exit in RLMAT(i,i+1:min(m,n)); tauq is stored in TAUQ(i) and taup in TAUP(i).

The contents of RLMAT on exit are illustrated by the following example:

m = 6 and n = 5 (m >= n):

( u1 v1 v1 v1 v1 )

( u1 u2 v2 v2 v2 )

( u1 u2 u3 v3 v3 )

( u1 u2 u3 u4 v4 )

( u1 u2 u3 u4 u5 )

where ui denotes an element of the vector defining H(i), and vi an element of the vector defining G(i).

Now, let SIGMA(i), i=1,…,N=min(m,n), be the singular values of the intermediate bidiagonal matrix BD in decreasing order of magnitude.

If DQDS is equal to false, the subroutine computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of BD by a bisection method (see the reference (1) below, Sec.8.5). The bisection method is applied to an associated 2N-by-2N symmetric tridiagonal matrix T (the so-called GOLUB-KAHAN form of BD) whose eigenvalues are the singular values of BD and their negatives (see the reference (2) below).

On the other hand, if DQDS is equal to true, all the singular values of BD are computed by the dqds algorithm (see the references (3) and (4)).
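To give a feel for the qd family of algorithms, the Python sketch below iterates the zero-shift differential qd step (dqd) on the squared entries of an upper bidiagonal matrix. It is a deliberately simplified illustration: the dqds algorithm of references (3) and (4) adds shifts, deflation and convergence tests, none of which appear here, and nonzero diagonal entries are assumed. The function names and the fixed number of sweeps are hypothetical choices.

```python
import math

def dqd_step(q, e):
    """One zero-shift differential qd step on q(i) = d(i)**2, e(i) = e(i)**2."""
    n = len(q)
    q_new, e_new = [0.0] * n, [0.0] * (n - 1)
    d = q[0]
    for i in range(n - 1):
        q_new[i] = d + e[i]
        t = q[i + 1] / q_new[i]
        e_new[i] = e[i] * t
        d = d * t
    q_new[n - 1] = d
    return q_new, e_new

def singular_values_dqd(diag, offdiag, sweeps=60):
    """Iterate dqd until the e(i) are negligible; q(i) then holds sigma(i)**2."""
    q = [v * v for v in diag]
    e = [v * v for v in offdiag]
    for _ in range(sweeps):
        q, e = dqd_step(q, e)
    return sorted((math.sqrt(v) for v in q), reverse=True)

sv = singular_values_dqd([3.0, 2.0], [1.0])
```

Each step only adds, multiplies and divides positive quantities, which is the property behind the high relative accuracy of qd-type methods. Without shifts, convergence is only linear, which is why practical implementations use the shifted variant.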

For further details, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

(3) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(4) Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

`subroutine select_singval_cmp2 ( mat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauq, taup, scaling, init, dqds )`¶

Purpose¶

SELECT_SINGVAL_CMP2 computes all or some of the greatest singular values of a real m-by-n matrix MAT.

The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper or lower bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal (see the reference (1) below).

The singular values SIGMA of the bidiagonal matrix BD, which are also the singular values of MAT, are then computed by a bisection algorithm (see the reference (1) below, Sec.8.5). The bisection method is applied (implicitly) to the associated min(m,n)-by-min(m,n) symmetric tridiagonal matrix BD’ * BD whose eigenvalues are the squares of the singular values of BD by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (2) below, Sec.3.1). Alternatively, at the user's option, all singular values can be computed by the differential quotient difference with shifts (dqds) algorithm (see the references (3) and (4)).

The routine outputs (parts of) SIGMA and optionally Q and P (in packed form), and BD for a given matrix MAT. SIGMA, Q, P and BD may then be used to obtain selected or all singular vectors of MAT with subroutines BD_INVITER2 or BD_DEFLATE2.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is destroyed and if TAUQ or TAUP are present MAT is overwritten as follows:

if m >= n, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors;

if m < n, the elements below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements on and above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors.

See Further Details.

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible.

If none of the optional arguments LS and THETA are used, NSING is set to min( size(MAT,1) , size(MAT,2) ) and all the singular values are computed.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the first NSING singular values of MAT. The other values in S ( S(NSING+1:) ) are flagged by a quiet NAN.

The size of S must be min( size(MAT,1) , size(MAT,2) ).

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection or dqds algorithm converged for all the computed singular values to the desired accuracy;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must satisfy 1 <= MUL_SIZE <= n; otherwise a default value is used. A larger value can improve performance at the expense of more workspace.

The default is min( 32, n ) .

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used to compute the singular values SIGMA of the bidiagonal matrix BD.

This argument has no effect if DQDS is equal to true.

The default is VECTOR = false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located in the bisection algorithm if it has been determined to lie in an interval whose width is ABSTOL or less.

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold (e.g., sqrt(safmin) = sqrt(LAMCH(‘S’))), not zero.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | BD’ * BD | will be used, where | BD’ * BD | means the 1-norm of the tridiagonal matrix BD’ * BD ( BD’ means the transpose of BD) and ULP is the machine precision (distance from 1 to the next larger floating point number).

This argument has no effect if DQDS is equal to true.
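The default tolerance ULP * | BD’ * BD | can be evaluated directly from the bidiagonal entries. The Python sketch below does so under two assumptions that are not part of this documentation: BD is upper bidiagonal with diagonal `d` and superdiagonal `e` (length one less than `d`), and real(stnd) corresponds to IEEE double precision, so that ULP equals `sys.float_info.epsilon`. The function name is hypothetical.

```python
import sys

def default_abstol_btb(d, e):
    """ULP times the 1-norm of the tridiagonal matrix BD' * BD."""
    n = len(d)
    # Diagonal and off-diagonal of BD' * BD for an upper bidiagonal BD.
    diag = [d[i] ** 2 + (e[i - 1] ** 2 if i > 0 else 0.0) for i in range(n)]
    off = [d[i] * e[i] for i in range(n - 1)]
    norm1 = 0.0
    for j in range(n):                   # largest absolute column sum
        col = abs(diag[j])
        if j > 0:
            col += abs(off[j - 1])
        if j < n - 1:
            col += abs(off[j])
        norm1 = max(norm1, col)
    return sys.float_info.epsilon * norm1

tol = default_abstol_btb([3.0, 2.0], [1.0])
```

For d = (3, 2) and e = (1), BD’ * BD has rows (9, 3) and (3, 5), so its 1-norm is 12.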

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may differ from LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; otherwise the subroutine will stop with an error message.

LS must be greater than 0 and less than or equal to min( size(MAT,1) , size(MAT,2) ).

This argument has no effect if DQDS is equal to true.

The default is LS = min( size(MAT,1) , size(MAT,2) ).

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values are greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; otherwise the subroutine will stop with an error message.

This argument has no effect if DQDS is equal to true.

The default is THETA = 0.

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate bidiagonal matrix BD.

The size of D must be min( size(MAT,1) , size(MAT,2) ).

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate bidiagonal matrix BD:

if m >= n, E(i) = BD(i-1,i) for i = 2,3,…,n;

if m < n, E(i) = BD(i,i-1) for i = 2,3,…,m.

The size of E must be min( size(MAT,1) , size(MAT,2) ).

TAUQ (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix Q. See Further Details.

The size of TAUQ must be min( size(MAT,1) , size(MAT,2) ).

TAUP (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix P. See Further Details.

The size of TAUP must be min( size(MAT,1) , size(MAT,2) ).

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated BD’ * BD tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

This argument has no effect if DQDS is equal to true.

The default is not to use the Pal-Walker-Kahan algorithm.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed.

If DQDS is set to true, all singular values are computed with the dqds algorithm. When all singular values are requested, the dqds algorithm is in many cases much faster than bisection and delivers about the same accuracy.

If DQDS is set to false, the requested singular values are computed by bisection.

The default is to use the dqds algorithm if all the singular values are requested and to use bisection if selected singular values are requested (e.g., if the optional arguments LS or THETA are present).
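The interplay between DQDS, LS and THETA described above can be summarized as a small decision function. The Python sketch below is only a reading of this documentation, not the library's code: `choose_method` is a hypothetical name, `None` stands for an absent optional argument, and whether the error for specifying both LS and THETA is raised before or after DQDS is examined is an assumption.

```python
def choose_method(ls=None, theta=None, dqds=None):
    """Return which algorithm would compute the singular values."""
    if ls is not None and theta is not None:
        # The documentation states that the subroutine stops in this case.
        raise ValueError("LS and THETA must not both be specified")
    if dqds is None:
        # Default: dqds when all values are requested, bisection otherwise.
        dqds = ls is None and theta is None
    return "dqds" if dqds else "bisection"

all_values = choose_method()             # 'dqds'
selected = choose_method(ls=3)           # 'bisection'
forced = choose_method(theta=0.5, dqds=True)   # 'dqds': THETA has no effect
```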

Further Details¶

The matrices Q and P are represented as products of elementary reflectors:

If m >= n,

Q = H(1) * H(2) * … * H(n) and P = G(1) * G(2) * … * G(n-1)

Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors. Moreover, u(1:i-1) = 0 and v(1:i) = 0.

If TAUQ or TAUP are present:

u(i:m) is stored on exit in MAT(i:m,i);

v(i+1:n) is stored on exit in MAT(i,i+1:n).

If TAUQ is present : tauq is stored in TAUQ(i). If TAUP is present : taup is stored in TAUP(i).

If m < n,

Q = H(1) * H(2) * … * H(m-1) and P = G(1) * G(2) * … * G(m)

Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors. Moreover, u(1:i) = 0 and v(1:i-1) = 0.

If TAUQ or TAUP are present:

u(i+1:m) is stored on exit in MAT(i+1:m,i);

v(i:n) is stored on exit in MAT(i,i:n).

If TAUQ is present : tauq is stored in TAUQ(i).

If TAUP is present : taup is stored in TAUP(i).

The contents of MAT on exit, if TAUQ or TAUP are present, are illustrated by the following examples:

m = 6 and n = 5 (m >= n):

( u1 v1 v1 v1 v1 )

( u1 u2 v2 v2 v2 )

( u1 u2 u3 v3 v3 )

( u1 u2 u3 u4 v4 )

( u1 u2 u3 u4 u5 )

( u1 u2 u3 u4 u5 )

m = 5 and n = 6 (m < n):

( v1 v1 v1 v1 v1 v1 )

( u1 v2 v2 v2 v2 v2 )

( u1 u2 v3 v3 v3 v3 )

( u1 u2 u3 v4 v4 v4 )

( u1 u2 u3 u4 v5 v5 )

where ui denotes an element of the vector defining H(i), and vi an element of the vector defining G(i).

Now, let SIGMA(i), i=1,…,N=min(m,n), be the singular values of the intermediate bidiagonal matrix BD in decreasing order of magnitude.

If DQDS is equal to false, the subroutine computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of BD by a bisection method (see the reference (1) below, Sec.8.5). The bisection method is applied (implicitly) to the associated N-by-N symmetric tridiagonal matrix BD’ * BD whose eigenvalues are the squares of the singular values of BD by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (2) below, Sec.3.1).

On the other hand, if DQDS is equal to true, all the singular values of BD are computed by the dqds algorithm (see the references (3) and (4)).

When bisection is used, the SELECT_SINGVAL_CMP2 subroutine is less accurate, but faster, than the SELECT_SINGVAL_CMP subroutine, since SELECT_SINGVAL_CMP works on the 2N-by-2N symmetric tridiagonal GOLUB-KAHAN form of BD, while SELECT_SINGVAL_CMP2 works implicitly on the associated N-by-N symmetric tridiagonal matrix BD’ * BD whose eigenvalues are the squares of the singular values of BD.
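Working implicitly on BD’ * BD means that the counting step uses only the entries of BD and never forms the product. The Python sketch below shows one such count, written in the spirit of a stationary differential qd recurrence for BD’ * BD - x**2 * I. It assumes an upper bidiagonal BD with diagonal `d` and superdiagonal `e`; the exact recurrence and safeguards used by the library are those of reference (2) and may differ in detail. The function names are hypothetical.

```python
def count_sv_below(d, e, x):
    """Number of singular values of BD that are smaller than x (x > 0)."""
    shift = x * x
    n = len(d)
    count = 0
    s = -shift
    for i in range(n):
        pivot = d[i] * d[i] + s          # i-th pivot of BD'*BD - shift*I
        if pivot == 0.0:
            pivot = -1e-300              # nudge an exact zero pivot
        if pivot < 0.0:
            count += 1
        if i < n - 1:
            s = s * (e[i] * e[i] / pivot) - shift
    return count

def nsing_for_theta(d, e, theta):
    """Number of singular values greater than or equal to THETA."""
    return len(d) - count_sv_below(d, e, theta)

d, e = [3.0, 2.0], [1.0]                 # singular values about 3.2566 and 1.8424
n_above = nsing_for_theta(d, e, 2.5)
```

The matrix is half the size of the Golub-Kahan form, which is consistent with the speed advantage noted above, while the shift enters as x**2, which is one way to see where accuracy on small singular values can be lost.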

For further details, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

(3) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(4) Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

`subroutine select_singval_cmp2 ( mat, rlmat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauo, tauq, taup, scaling, init, dqds )`¶

Purpose¶

SELECT_SINGVAL_CMP2 computes all or some of the greatest singular values of a real m-by-n matrix MAT.

The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper bidiagonal form by a two-step algorithm :

If m >= n, a QR factorization of the real m-by-n matrix MAT is first computed

MAT = O * R

where O is orthogonal and R is upper triangular. In a second step, the n-by-n upper triangular matrix R is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * R * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

If m < n, an LQ factorization of the real m-by-n matrix MAT is first computed

MAT = L * O

where O is orthogonal and L is lower triangular. In a second step, the m-by-m lower triangular matrix L is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * L * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

SELECT_SINGVAL_CMP2 computes O, BD, Q and P.

The singular values SIGMA of the bidiagonal matrix BD, which are also the singular values of MAT, are then computed by a bisection algorithm (see the reference (1) below, Sec.8.5). The bisection method is applied (implicitly) to the associated min(m,n)-by-min(m,n) symmetric tridiagonal matrix BD’ * BD whose eigenvalues are the squares of the singular values of BD by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (2) below, Sec.3.1). Alternatively, at the user's option, all singular values can be computed by the differential quotient difference with shifts (dqds) algorithm (see the references (3) and (4)).

The routine outputs (parts of) SIGMA, and optionally O, Q and P (in packed form), and BD for a given matrix MAT. The matrix O is stored in factored form in the argument MAT if the optional argument TAUO is present or explicitly computed if this argument is absent. SIGMA, O, Q, P and BD may then be used to obtain selected singular vectors of MAT with subroutines BD_INVITER2 or BD_DEFLATE2.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, if:

m >= n, the elements on and below the diagonal, with the array TAUO, represent the orthogonal matrix O of the QR factorization of MAT, as a product of elementary reflectors, if the argument TAUO is present. Otherwise, the argument MAT contains the first n columns of O on output.

m < n, the elements on and above the diagonal, with the array TAUO, represent the orthogonal matrix O of the LQ factorization of MAT, as a product of elementary reflectors, if the argument TAUO is present. Otherwise, the argument MAT contains the first m rows of O on output.

See Further Details.

RLMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors.

See Further Details.

The shape of RLMAT must verify: size( RLMAT, 1 ) = size( RLMAT, 2 ) = min( size(MAT,1) , size(MAT,2) ).

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible.

If none of the optional arguments LS and THETA are used, NSING is set to min( size(MAT,1) , size(MAT,2) ) and all the singular values are computed.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the first NSING singular values of MAT. The other values in S ( S(NSING+1:) ) are flagged by a quiet NAN.

The size of S must be min( size(MAT,1) , size(MAT,2) ).

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection algorithm converged for all the computed singular values to the desired accuracy ;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.
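The flags carried by S can be decoded as in the following sketch. It is written in plain Python with made-up values, purely to illustrate the conventions stated above (NaN past NSING, negative sign for values that did not converge); the helper name `classify_singular_values` is hypothetical and not part of statpack.

```python
import math


def classify_singular_values(s, nsing):
    """Split the S output into accepted values, suspect values and unused slots.

    Entries past NSING are expected to be NaN, and values that failed to
    converge are expected to carry a negative sign.
    """
    computed = s[:nsing]
    accepted = [v for v in computed if v >= 0.0]
    suspect = [abs(v) for v in computed if v < 0.0]   # magnitudes of flagged values
    unused = sum(1 for v in s[nsing:] if math.isnan(v))
    return accepted, suspect, unused


# Made-up output values, for illustration only.
s = [4.0, -2.5, 1.0, float("nan"), float("nan")]
accepted, suspect, unused = classify_singular_values(s, nsing=3)
```

Here `accepted` is `[4.0, 1.0]`, `suspect` is `[2.5]` and `unused` is `2`. A caller would normally test FAILURE first and inspect the signs only when it is true.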

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= n, otherwise a default value is used. For better performance, at the expense of more workspace, a large value can be used.

The default is min( 32, n ) .

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used to compute the singular values SIGMA of the bidiagonal matrix BD.

This argument has no effect if DQDS is equal to true.

The default is VECTOR = false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located in the bisection algorithm if it has been determined to lie in an interval whose width is ABSTOL or less.

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold, sqrt(LAMCH(‘S’)), not zero.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | BD’ * BD | will be used, where | BD’ * BD | means the 1-norm of the tridiagonal matrix BD’ * BD ( BD’ means the transpose of BD) and ULP is the machine precision (distance from 1 to the next larger floating point number).
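The default tolerance can be reproduced by hand for a small bidiagonal matrix. The sketch below is plain Python and assumes double precision (statpack's `stnd` kind may differ) and a 0-based layout in which `e` holds the n-1 superdiagonal entries; the function name `default_abstol` is hypothetical.

```python
import sys


def default_abstol(d, e):
    """Return ULP * ||BD' * BD||_1 for an upper bidiagonal matrix BD.

    d: the n diagonal entries, e: the n-1 superdiagonal entries (0-based).
    """
    n = len(d)
    # BD' * BD is symmetric tridiagonal with these diagonal and off-diagonal entries.
    diag = [d[i] ** 2 + (e[i - 1] ** 2 if i > 0 else 0.0) for i in range(n)]
    off = [d[i] * e[i] for i in range(n - 1)]
    norm1 = 0.0
    for j in range(n):
        col = abs(diag[j])             # absolute column sum of column j
        if j > 0:
            col += abs(off[j - 1])
        if j < n - 1:
            col += abs(off[j])
        norm1 = max(norm1, col)
    ulp = sys.float_info.epsilon       # distance from 1.0 to the next larger double
    return ulp * norm1
```

For `d = [1.0, 1.0]` and `e = [1.0]`, BD’ * BD has rows (1, 1) and (1, 2), so its 1-norm is 3 and the default tolerance is three times the machine precision.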

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may be different than LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine will stop with an error message.

LS must be greater than 0 and less than or equal to min( size(MAT,1) , size(MAT,2) ).

This argument has no effect if DQDS is equal to true.

The default is LS = min( size(MAT,1) , size(MAT,2) ).

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values are greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine will stop with an error message.

This argument has no effect if DQDS is equal to true.

The default is THETA = 0.

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate bidiagonal matrix BD.

The size of D must be min( size(MAT,1) , size(MAT,2) ).

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate upper bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,k, where k = min( size(MAT,1) , size(MAT,2) ).

The size of E must be min( size(MAT,1) , size(MAT,2) ).

TAUO (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix O of the QR or LQ decomposition of MAT.

If the optional argument TAUO is present, the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on exit.

If the optional argument TAUO is absent, the orthogonal matrix O is explicitly generated and stored in the argument MAT on exit.

See description of the argument MAT above and Further Details below.

The size of TAUO must be min( size(MAT,1) , size(MAT,2) ).

TAUQ (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix Q. See Further Details.

The size of TAUQ must be min( size(MAT,1) , size(MAT,2) ).

TAUP (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix P. See Further Details.

The size of TAUP must be min( size(MAT,1) , size(MAT,2) ).

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated BD’ * BD tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

This argument has no effect if DQDS is equal to true.

The default is not to use the Pal-Walker-Kahan algorithm.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed.

If DQDS is set to true, all singular values are computed with the dqds algorithm. In many cases, the dqds algorithm is much faster than bisection when all singular values are requested, and it delivers about the same accuracy as bisection.

If DQDS is set to false, the requested singular values are computed by bisection.

The default is to use the dqds algorithm if all the singular values are requested and to use bisection if selected singular values are requested (e.g., if the optional arguments LS or THETA are present).
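The interplay between LS, THETA and DQDS described above can be summarized by a small decision function. This is only a sketch of the documented rules in plain Python, with `None` standing for an absent optional argument; the name `resolve_method` and the use of exceptions in place of the Fortran error stop are assumptions.

```python
def resolve_method(n, ls=None, theta=None, dqds=None):
    """Return "dqds" or "bisection" following the documented defaults.

    n is min(size(MAT,1), size(MAT,2)); None means the optional
    argument is absent.
    """
    if ls is not None and theta is not None:
        raise ValueError("LS and THETA cannot both be specified")
    if ls is not None and not 0 < ls <= n:
        raise ValueError("LS must satisfy 0 < LS <= min(m, n)")
    if dqds is None:
        # Default: dqds only when every singular value is requested.
        dqds = ls is None and theta is None
    return "dqds" if dqds else "bisection"
```

With these rules, `resolve_method(5)` selects dqds, `resolve_method(5, ls=2)` selects bisection, and `resolve_method(5, ls=2, dqds=True)` selects dqds, in which case LS has no effect and all singular values are computed.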

Further Details¶

If m >= n, the matrix O of the QR factorization of MAT is represented as a product of elementary reflectors

O = W(1) * W(2) * … * W(n)

Each W(i) has the form

W(i) = I + tauo * ( v * v’ ) ,

where tauo is a real scalar and v is a real m-element vector with v(1:i-1) = 0. v(i:m) is stored on exit in MAT(i:m,i) and tauo in TAUO(i). If the optional argument TAUO is absent, the first n columns of O are generated and stored in the argument MAT.

If m < n, the matrix O of the LQ factorization of MAT is represented as a product of elementary reflectors

O = W(m) * … * W(2) * W(1)

Each W(i) has the form

W(i) = I + tauo * ( v * v’ ) ,

where tauo is a real scalar and v is a real n-element vector with v(1:i-1) = 0. v(i:n) is stored on exit in MAT(i,i:n) and tauo in TAUO(i). If the optional argument TAUO is absent, the first m rows of O are generated and stored in the argument MAT.

The matrix O is stored in factored form if the optional argument TAUO is present or explicitly computed if this argument is absent.
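To make the form W = I + tau * ( v * v’ ) concrete, the sketch below builds one reflector and applies it without forming the matrix. It is plain Python and uses the textbook, unnormalized choice of v with tau = -2 / (v’ * v), which is the value that makes W orthogonal; the scaling of v and tau actually stored by statpack is not stated in this section and may differ. The helper names are hypothetical, and the input vector is assumed to be nonzero.

```python
import math


def apply_reflector(tau, v, x):
    """Return (I + tau * v * v') * x without forming the matrix."""
    scale = tau * sum(vi * xi for vi, xi in zip(v, x))
    return [xi + scale * vi for vi, xi in zip(v, x)]


def make_reflector(x):
    """Build (tau, v) so that the reflector maps x onto a multiple of e1."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    v = list(x)
    v[0] += math.copysign(norm, x[0])     # sign chosen to avoid cancellation
    vtv = sum(vi * vi for vi in v)
    tau = -2.0 / vtv                      # makes I + tau * v * v' orthogonal
    return tau, v


tau, v = make_reflector([3.0, 4.0])
y = apply_reflector(tau, v, [3.0, 4.0])   # approximately [-5.0, 0.0]
```

The reflector preserves the Euclidean norm (5 in this example) and zeroes every entry below the first, which is the operation repeated column by column in a QR factorization and row by row in an LQ factorization.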

A blocked algorithm is used for computing the QR or LQ factorization of MAT. Furthermore, the computations are parallelized if OPENMP is used.

After the initial QR or LQ factorization of MAT, the (upper or lower) triangular matrix is reduced to upper bidiagonal form BD.

The matrices Q and P of the bidiagonal factorization of the triangular matrix R or L are represented as products of elementary reflectors:

Q = H(1) * H(2) * … * H(k) and P = G(1) * G(2) * … * G(k-1)

where k = min( size(MAT,1) , size(MAT,2) ). Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors; u(1:i-1) = 0 and u(i:min(m,n)) is stored on exit in RLMAT(i:min(m,n),i); v(1:i) = 0 and v(i+1:min(m,n)) is stored on exit in RLMAT(i,i+1:min(m,n)); tauq is stored in TAUQ(i) and taup in TAUP(i).

The contents of RLMAT on exit are illustrated by the following example:

m = 6 and n = 5 (m >= n):

( u1 v1 v1 v1 v1 )

( u1 u2 v2 v2 v2 )

( u1 u2 u3 v3 v3 )

( u1 u2 u3 u4 v4 )

( u1 u2 u3 u4 u5 )

where ui denotes an element of the vector defining H(i), and vi an element of the vector defining G(i).
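The storage scheme can also be expressed as index arithmetic. The following plain-Python sketch regenerates the picture above for any k and shows which slices of RLMAT hold the stored parts of u and v for a given reflector index; the function names are hypothetical and the array is modeled as a list of rows.

```python
def rlmat_layout(k):
    """Label each entry of the k-by-k RLMAT array with its reflector."""
    return [
        [f"u{c + 1}" if r >= c else f"v{r + 1}" for c in range(k)]
        for r in range(k)
    ]


def reflector_vectors(rlmat, i):
    """Return the stored parts of u and v for the 1-based reflector index i."""
    k = len(rlmat)
    u = [rlmat[r][i - 1] for r in range(i - 1, k)]   # RLMAT(i:k, i)
    v = [rlmat[i - 1][c] for c in range(i, k)]       # RLMAT(i, i+1:k)
    return u, v


for row in rlmat_layout(5):
    print("( " + " ".join(row) + " )")
```

The loop prints the same five rows as the example, and `reflector_vectors(rlmat_layout(5), 5)` returns an empty list for v, consistent with P being a product of only k-1 reflectors.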

Now, let SIGMA(i), i=1,…,N=min(m,n), be the singular values of the intermediate bidiagonal matrix BD in decreasing order of magnitude.

If DQDS is equal to false, the subroutine computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of BD by a bisection method (see the reference (1) below, Sec. 8.5). The bisection method is applied (implicitly) to the associated N-by-N symmetric tridiagonal matrix BD’ * BD, whose eigenvalues are the squares of the singular values of BD, by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (2) below, Sec. 3.1).

On the other hand, if DQDS is equal to true, all the singular values of BD are computed by the dqds algorithm (see the references (3) and (4)).

When bisection is used, the SELECT_SINGVAL_CMP2 subroutine is less accurate, but faster, than the SELECT_SINGVAL_CMP subroutine, since SELECT_SINGVAL_CMP works on the 2N-by-2N symmetric tridiagonal GOLUB-KAHAN form of BD, while SELECT_SINGVAL_CMP2 works implicitly on the associated N-by-N symmetric tridiagonal matrix BD’ * BD whose eigenvalues are the squares of the singular values of BD.

For further details, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, No 2, pp. 373-399.

(3) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Vol. 67, No 2, pp. 191-229.

(4) Li, S., Gu, M., and Parlett, B.N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Vol. 36, No 3, C290-C308.

`subroutine select_singval_cmp3 ( mat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, p, gen_p, reortho, scaling, init, dqds, failure_bd )`¶

Purpose¶

SELECT_SINGVAL_CMP3 computes all or some of the largest singular values of a real m-by-n matrix MAT with m>=n.

The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal (see the reference (5) below). The Ralha-Barlow one-sided method is used for this purpose (see the references (1) to (3) below).

The singular values SIGMA of the bidiagonal matrix BD, which are also the singular values of MAT, are then computed by a bisection algorithm applied to the Tridiagonal Golub-Kahan (TGK) form of the bidiagonal matrix BD (see the reference (6) below, Sec. 3.3) or by the differential quotient difference with shifts (dqds) algorithm (see the references (7) and (8)).
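The TGK form mentioned here is the 2n-by-2n symmetric tridiagonal matrix with a zero diagonal whose off-diagonal interleaves the diagonal and superdiagonal of BD; its eigenvalues are the singular values of BD and their negatives. The plain-Python sketch below counts singular values below a positive threshold with a standard Sturm sequence on that matrix. It is an illustration under stated assumptions (0-based `d`/`e` with `e` of length n-1, a crude zero-pivot guard, hypothetical function names), not statpack's bisection code.

```python
import sys


def tgk_offdiagonal(d, e):
    """Interleave d and e to form the off-diagonal of the 2n-by-2n TGK matrix."""
    off = []
    for i, di in enumerate(d):
        off.append(di)
        if i < len(e):
            off.append(e[i])
    return off


def tgk_count_below(d, e, x):
    """Count the singular values of BD that are < x, for x > 0."""
    off = tgk_offdiagonal(d, e)
    pivmin = sys.float_info.min             # crude guard against a zero pivot
    count = 0
    q = -x                                  # the TGK matrix has a zero diagonal
    for j in range(len(off) + 1):
        if j > 0:
            q = -x - off[j - 1] ** 2 / q    # Sturm sequence recurrence
        if abs(q) < pivmin:
            q = -pivmin
        if q < 0.0:
            count += 1
    # The n negative eigenvalues -sigma_i always lie below a positive x.
    return count - len(d)
```

For `d = [1.0, 1.0]` and `e = [1.0]`, whose singular values are about 1.618 and 0.618, the count is 0 at x = 0.5, 1 at x = 1.2 and 2 at x = 2.0. Bisection on x then isolates each requested singular value.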

The routine outputs (parts of) SIGMA, Q and optionally P (in packed form) and BD for a given matrix MAT. SIGMA, Q, P and BD may then be used to obtain selected singular vectors of MAT with subroutines BD_INVITER2 or BD_DEFLATE2.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is overwritten with the first n columns of Q (stored column-wise), the orthogonal matrix used to reduce MAT to bidiagonal form as returned by subroutine BD_CMP2 in its argument MAT.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible.

If none of the optional arguments LS and THETA are used, NSING is set to size(MAT,2) and all the singular values are computed.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the first NSING singular values of MAT. The other values in S ( S(NSING+1:) ) are flagged by a quiet NAN.

The size of S must be equal to size( MAT, 2 ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection or dqds algorithm converged for all the computed singular values to the desired accuracy ;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= n, otherwise a default value is used. For better performance, at the expense of more workspace, a large value can be used.

The default is min( 32, n ).

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used to compute the singular values SIGMA of the bidiagonal matrix BD.

This argument has no effect if DQDS is equal to true.

The default is VECTOR = false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located in the bisection algorithm if it has been determined to lie in an interval whose width is ABSTOL or less.

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold (i.e., sqrt(safmin) = sqrt(LAMCH(‘S’))), not zero.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | T(GK) | will be used, where | T(GK) | means the 1-norm of the GOLUB-KAHAN tridiagonal form of the bidiagonal matrix BD and ULP is the machine precision (distance from 1 to the next larger floating point number).

This argument has no effect if DQDS is equal to true.

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may be different than LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine will stop with an error message.

LS must be greater than 0 and less than or equal to size( MAT, 2 ) = n .

This argument has no effect if DQDS is equal to true.

The default is LS = size( MAT, 2 ) = n .

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values are greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine will stop with an error message.

This argument has no effect if DQDS is equal to true.

The default is THETA = 0 .

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate bidiagonal matrix BD.

The size of D must be equal to size( MAT, 2 ) = n .

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate upper bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,n;

E(1) is arbitrary.

The size of E must be equal to size( MAT, 2 ) = n .

P (OUTPUT, OPTIONAL) real(stnd), dimension(:,:)

On exit, P is overwritten with the n-by-n orthogonal matrix P (stored column-wise or in packed form), the orthogonal matrix used to reduce MAT to bidiagonal form as returned by subroutine BD_CMP2 in its argument P.

The shape of P must verify: size( P, 1 ) = size( P, 2 ) = n .

GEN_P (INPUT, OPTIONAL) logical(lgl)

On entry, this optional argument has an effect only if the optional argument P is also used.

In this case, if the optional argument GEN_P is used and is set to true, the orthogonal matrix P used to reduce MAT to bidiagonal form is generated on output of the subroutine in its argument P.

If GEN_P is set to false, the orthogonal matrix P is stored in factored form as products of elementary reflectors in the lower triangle of the array P.

See the description of BD_CMP2 subroutine for more details.

The default is true.

REORTHO (INPUT, OPTIONAL) logical(lgl)

If the optional argument REORTHO is set to true, reorthogonalization of the matrix Q, used to reduce MAT to bidiagonal form, is performed (when needed). If this optional logical argument is set to false, reorthogonalization of Q is never performed and the columns of the matrix Q can be far from orthogonal to each other for input matrices MAT with a large condition number.

See the description of BD_CMP2 subroutine for more details.

The default is REORTHO = false.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated BD’ * BD tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

This argument has no effect if DQDS is equal to true.

The default is not to use the Pal-Walker-Kahan algorithm.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed.

If DQDS is set to true, all singular values are computed with the dqds algorithm. In many cases, the dqds algorithm is much faster than bisection when all singular values are requested, and it delivers about the same accuracy as bisection.

If DQDS is set to false, the requested singular values are computed by bisection.

The default is to use the dqds algorithm if all the singular values are requested and to use bisection if selected singular values are requested (e.g., if the optional arguments LS or THETA are present).

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT.

FAILURE_BD = true : indicates that MAT is nearly singular and some loss of orthogonality can be expected in the Ralha-Barlow bidiagonalization algorithm.

See description of the BD_CMP2 subroutine for further details.

Further Details¶

The matrices Q, P and BD are computed with the help of the Ralha-Barlow one-sided method. Q is computed by a recurrence relationship and the first n columns of Q are stored in the argument MAT on exit. P is computed as a product of n-1 elementary reflectors (e.g. Householder transformations):

P = G(1) * G(2) * … * G(n-1)

Each G(i) has the form:

G(i) = I + taup * v * v’

where taup is a real scalar, and v is a real vector. If GEN_P is used and set to false, the n-1 G(i) elementary reflectors are stored in the lower triangle of the array P.

For the G(i) reflector, taup is stored in P(i+1,1) and v is stored in P(i+1:n,i+1). In addition, P(1,1) is set to -1 if GEN_P=false and is equal to 1 if GEN_P=true.

In other words, the value of P(1,1) indicates if the orthogonal matrix P is stored in factored form or not. Note that if n is equal to 1, elementary reflectors are not needed and consequently P(1,1) is set to 1, independently of the value of GEN_P.
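The packed layout of P can be read back as in the sketch below, written in plain Python with the array modeled as a list of rows. It follows the indices given above (taup in P(i+1,1), v in P(i+1:n,i+1), flag in P(1,1)); the function names are hypothetical, the sample values are made up, and the assumption that the leading entries of v are zero is not stated explicitly in this section.

```python
def is_factored(p):
    """True when P(1,1) = -1, i.e. P holds reflectors rather than the matrix."""
    return p[0][0] == -1.0


def unpack_reflectors(p):
    """Return [(taup, v_tail)] for G(1), ..., G(n-1) from a factored array P."""
    n = len(p)
    reflectors = []
    for i in range(1, n):                         # i is the 1-based reflector index
        taup = p[i][0]                            # P(i+1, 1)
        v_tail = [p[r][i] for r in range(i, n)]   # P(i+1:n, i+1)
        reflectors.append((taup, v_tail))
    return reflectors


# Made-up packed array for n = 3, for illustration only.
p = [[-1.0, 0.0, 0.0],
     [-0.5, 2.0, 0.0],
     [-1.0, 1.0, 3.0]]
reflectors = unpack_reflectors(p) if is_factored(p) else []
```

In this example `reflectors` is `[(-0.5, [2.0, 1.0]), (-1.0, [3.0])]`. Checking the flag first avoids misreading an explicitly generated P as a set of reflectors.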

This is the blocked version of the Ralha-Barlow one-sided algorithm. See the references (1), (2) and (3) for further details. Furthermore, the algorithm is parallelized if OPENMP is used.

Since Q is computed by a recurrence relationship, a loss of orthogonality of Q can be observed when the rectangular matrix MAT is singular or nearly singular or has a large condition number, see the reference (2) for details.

To partly correct this deficiency, partial reorthogonalization can be performed to ensure orthogonality at the expense of speed of computation. The reorthogonalization uses the Gram-Schmidt method described in the reference (4) and is activated if the optional logical argument REORTHO is set to true.

The reference (2) also explains how to handle the case of an exactly singular matrix MAT (a very rare event). However, in this subroutine, the partial reorthogonalization described in the reference (4), which is used when REORTHO is set to true, automatically corrects this problem.

Now, let SIGMA(i), i=1,…,n, be the singular values of the intermediate bidiagonal matrix BD in decreasing order of magnitude.

If DQDS is equal to false, the subroutine computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of BD by a bisection method (see the reference (5) below, Sec. 8.5). The bisection method is applied to an associated 2n-by-2n symmetric tridiagonal matrix T (the so-called GOLUB-KAHAN form of BD) whose eigenvalues are the singular values of BD and their negatives (see the reference (6) below).

On the other hand, if DQDS is equal to true, all the singular values of BD are computed by the dqds algorithm (see the references (7) and (8)).

For further details, see:

(1) Ralha, R.M.S., 2003:

One-sided reduction to bidiagonal form. Linear Algebra Appl., No 358, pp. 219-238.

(2) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(3) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Vol. 29, No 3, pp. 927-953.

(4) Stewart, G.W., 2007:

Block Gram-Schmidt Orthogonalization. Report TR-4823, Department of Computer Science, College Park, University of Maryland.

(5) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(6) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, No 2, pp. 373-399.

(7) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Vol. 67, No 2, pp. 191-229.

(8) Li, S., Gu, M., and Parlett, B.N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Vol. 36, No 3, C290-C308.

`subroutine select_singval_cmp3 ( mat, rmat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauo, p, gen_p, reortho, scaling, init, dqds, failure_bd )`¶

Purpose¶

SELECT_SINGVAL_CMP3 computes all or some of the largest singular values of a real m-by-n matrix MAT with m>=n.

The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper bidiagonal form by a two-step algorithm :

A QR factorization of the real m-by-n matrix MAT is first computed

MAT = O * R

where O is orthogonal and R is upper triangular.

In a second step, the n-by-n upper triangular matrix R is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * R * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix (see the reference (5) below). The Ralha-Barlow one-sided method is used in this second step (see the references (1) to (3) below).

SELECT_SINGVAL_CMP3 computes O, BD, Q and P.

The singular values SIGMA of the bidiagonal matrix BD, which are also the singular values of MAT, are then computed by a bisection algorithm applied to the Tridiagonal Golub-Kahan (TGK) form of the bidiagonal matrix BD (see the reference (6) below, Sec. 3.3) or by the differential quotient difference with shifts (dqds) algorithm (see the references (7) and (8)).

The routine outputs (parts of) SIGMA, and optionally O, Q, P and BD for a given matrix MAT. The matrix O is stored in factored form in the argument MAT if the optional argument TAUO is present or explicitly computed if this argument is absent. The first n columns of Q are stored in the argument RMAT. Finally, P is stored in the optional argument P, either in factored form or explicitly generated, depending on the value of the optional logical argument GEN_P.

SIGMA, O, Q, P and BD may then be used to obtain selected singular vectors of MAT with subroutines BD_INVITER2 or BD_DEFLATE2.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, the elements on and below the diagonal, with the array TAUO, represent the orthogonal matrix O of the QR factorization of MAT, as a product of elementary reflectors, if the argument TAUO is present. Otherwise, the argument MAT contains the first n columns of O (stored column-wise) on output.

See Further Details.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

RMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, RMAT contains the first n columns of Q (stored column-wise), the orthogonal matrix used to reduce R to bidiagonal form as returned by subroutine BD_CMP2 in its argument MAT.

See Further Details.

The shape of RMAT must verify: size( RMAT, 1 ) = size( RMAT, 2 ) = size( MAT, 2 ) = n .

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible.

If none of the optional arguments LS and THETA are used, NSING is set to size(MAT,2) and all the singular values are computed.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the first NSING singular values of MAT. The other values in S ( S(NSING+1:) ) are flagged by a quiet NAN.

The size of S must be equal to size( MAT, 2 ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection or dqds algorithm converged for all the computed singular values to the desired accuracy ;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= n, otherwise a default value is used. For better performance, at the expense of more workspace, a large value can be used.

The default is min( 32, n ).

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used to compute the singular values SIGMA of the bidiagonal matrix BD.

This argument has no effect if DQDS is equal to true.

The default is VECTOR = false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located in the bisection algorithm if it has been determined to lie in an interval whose width is ABSTOL or less.

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold (i.e., sqrt(safmin) = sqrt(LAMCH(‘S’))), not zero.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | T(GK) | will be used, where | T(GK) | means the 1-norm of the GOLUB-KAHAN tridiagonal form of the bidiagonal matrix BD and ULP is the machine precision (distance from 1 to the next larger floating point number).

This argument has no effect if DQDS is equal to true.

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may be different than LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine will stop with an error message.

LS must be greater than 0 and less than or equal to size( MAT, 2 ) = n .

This argument has no effect if DQDS is equal to true.

The default is LS = size( MAT, 2 ) = n .

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values are greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; if both are present, the subroutine will stop with an error message.

This argument has no effect if DQDS is equal to true.

The default is THETA = 0 .

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate bidiagonal matrix BD.

The size of D must be equal to size( MAT, 2 ) = n .

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate upper bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,n;

E(1) is arbitrary.

The size of E must be equal to size( MAT, 2 ) = n .

TAUO (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix O of the QR decomposition of MAT.

If the optional argument TAUO is present, the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on exit.

If the optional argument TAUO is absent, the first n columns of the orthogonal matrix O are explicitly generated and stored in the argument MAT on exit.

See description of the argument MAT above and Further Details below.

The size of TAUO must be equal to size( MAT, 2 ) = n .

P (OUTPUT, OPTIONAL) real(stnd), dimension(:,:)

On exit, P is overwritten with the n-by-n orthogonal matrix P (stored column-wise or in packed form), the orthogonal matrix used to reduce MAT to bidiagonal form as returned by subroutine BD_CMP2 in its argument P.

The shape of P must verify: size( P, 1 ) = size( P, 2 ) = n .

GEN_P (INPUT, OPTIONAL) logical(lgl)

On entry, this optional argument has an effect only if the optional argument P is also used.

In this case, if the optional argument GEN_P is used and is set to true, the orthogonal matrix P used to reduce MAT to bidiagonal form is generated on output of the subroutine in its argument P.

If GEN_P is set to false, the orthogonal matrix P is stored in factored form as products of elementary reflectors in the lower triangle of the array P.

See the description of BD_CMP2 subroutine for more details.

The default is true.

REORTHO (INPUT, OPTIONAL) logical(lgl)

If the optional argument REORTHO is set to true, reorthogonalization of the matrix Q, used to reduce MAT to bidiagonal form, is performed (when needed). If this optional logical argument is set to false, reorthogonalization of Q is never performed and the columns of the matrix Q can be far from orthogonal to each other for input matrices MAT with a large condition number.

See the description of BD_CMP2 subroutine for more details.

The default is REORTHO = false.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated BD’ * BD tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

This argument has no effect if DQDS is equal to true.

The default is not to use the Pal-Walker-Kahan algorithm.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed.

If DQDS is set to true, all singular values are computed with the dqds algorithm. In many cases, the dqds algorithm is much faster than bisection when all singular values are requested, and it delivers about the same accuracy as bisection.

If DQDS is set to false, the requested singular values are computed by bisection.

The default is to use the dqds algorithm if all the singular values are requested and to use bisection if selected singular values are requested (e.g., if the optional arguments LS or THETA are present).

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT.

FAILURE_BD = true : indicates that MAT is nearly singular and some loss of orthogonality can be expected in the Ralha-Barlow bidiagonalization algorithm.

See description of the BD_CMP2 subroutine for further details.

Further Details¶

The matrix O of the QR factorization of MAT is represented as a product of elementary reflectors:

O = W(1) * W(2) * … * W(n)

Each W(i) has the form

W(i) = I + tauo * ( v * v’ ) ,

where tauo is a real scalar and v is a real m-element vector with v(1:i-1) = 0. v(i:m) is stored on exit in MAT(i:m,i) and tauo in TAUO(i). If the optional argument TAUO is absent, the first n columns of O are generated and stored in the argument MAT.

The matrix O is stored in factored form if the optional argument TAUO is present or explicitly computed if this argument is absent.

A blocked algorithm is used for computing the QR factorization of MAT. Furthermore, the computations are parallelized if OPENMP is used.

After the initial QR factorization of MAT, the upper triangular matrix R is reduced to upper bidiagonal form BD:

Q’ * R * P = BD

The matrices Q, P and BD are computed with the help of the Ralha-Barlow one-sided method. Q is computed by a recurrence relationship and the first n columns of Q are stored in the argument RMAT on exit. P is computed as a product of n-1 elementary reflectors (i.e., Householder transformations):

P = G(1) * G(2) * … * G(n-1)

Each G(i) has the form:

G(i) = I + taup * v * v’

where taup is a real scalar, and v is a real vector. If GEN_P is used and set to false, the n-1 G(i) elementary reflectors are stored in the lower triangle of the array P.

For the G(i) reflector, taup is stored in P(i+1,1) and v is stored in P(i+1:n,i+1). In addition, P(1,1) is set to -1 if GEN_P=false and is equal to 1 if GEN_P=true.

In other words, the value of P(1,1) indicates if the orthogonal matrix P is stored in factored form or not. Note that if n is equal to 1, elementary reflectors are not needed and consequently P(1,1) is set to 1, independently of the value of GEN_P.
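The form G(i) = I + taup * v * v’ can be checked numerically with a short Python sketch. It is independent of statpack and rests on one assumption that the text above does not state: taup is taken as -2 / ( v’ * v ), the value for which a matrix of this form is orthogonal. The helper names `reflector` and `gram` are hypothetical.

```python
def reflector(v, tau):
    """Return G = I + tau * v * v' as a list of rows."""
    n = len(v)
    return [[(1.0 if i == j else 0.0) + tau * v[i] * v[j] for j in range(n)]
            for i in range(n)]


def gram(a):
    """Return a' * a for a square matrix stored as a list of rows."""
    n = len(a)
    return [[sum(a[k][i] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# A vector with a leading zero: the reflector then leaves the first
# row and column of the identity untouched.
v = [0.0, 1.0, 2.0, 2.0]
tau = -2.0 / sum(x * x for x in v)   # assumed normalization (not taken from statpack)
g = reflector(v, tau)
gtg = gram(g)                        # should be the identity up to rounding
```

With this normalization, `gtg` agrees with the identity matrix to rounding error, which is what makes a product of such reflectors an orthogonal matrix P.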

This is the blocked version of the Ralha-Barlow one-sided algorithm. See the references (1), (2) and (3) for further details. Furthermore, the algorithm is parallelized if OPENMP is used.

Since Q is computed by a recurrence relationship, a loss of orthogonality of Q can be observed when the rectangular matrix MAT is singular or nearly singular or has a large condition number; see the reference (2) for details.

To partly correct this deficiency, partial reorthogonalization can be performed to ensure orthogonality at the expense of speed of computation. The reorthogonalization uses the Gram-Schmidt method described in the reference (4) and is activated if the optional logical argument REORTHO is set to true.

The reference (2) also explains how to handle the case of an exactly singular matrix MAT (a very rare event). However, in this subroutine, the partial reorthogonalization described in the reference (4), which is used when REORTHO is set to true, automatically corrects this problem.

Now, let SIGMA(i), i=1,…,n, be the singular values of the intermediate bidiagonal matrix BD in decreasing order of magnitude.

If DQDS is equal to false, the subroutine computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of BD by a bisection method (see the reference (5) below, Sec.8.5 ). The bisection method is applied to an associated 2n-by-2n symmetric tridiagonal matrix T (the so-called Golub-Kahan form of BD) whose eigenvalues are the singular values of BD and their negatives (see the reference (6) below).
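The idea of bisection on the Golub-Kahan form can be illustrated with a minimal Python sketch. It is not statpack's implementation: the function names, the crude upper bound and the zero-pivot safeguard are assumptions made for the illustration. The matrix T has a zero diagonal and off-diagonal entries d(1), e(2), d(2), …, e(n), d(n), and the number of negative pivots of T - x * I gives the number of eigenvalues of T below x.

```python
def golub_kahan_offdiag(d, e):
    """Interleave d(1), e(2), d(2), ..., e(n), d(n); e[0] is ignored (E(1) is arbitrary)."""
    off = [d[0]]
    for i in range(1, len(d)):
        off += [e[i], d[i]]
    return off


def count_below(off, x):
    """Number of eigenvalues of the zero-diagonal tridiagonal matrix T that are < x."""
    count = 0
    q = -x                                # first pivot of T - x*I
    for k in range(len(off) + 1):
        if k > 0:
            q = -x - off[k - 1] ** 2 / q  # next pivot of the LDL' factorization
        if q == 0.0:
            q = -1e-300                   # nudge an exact zero pivot (sketch-level safeguard)
        if q < 0.0:
            count += 1
    return count


def kth_largest_singular_value(d, e, k, tol=1e-14):
    """Bisection for the k-th largest singular value of the upper bidiagonal matrix BD."""
    off = golub_kahan_offdiag(d, e)
    m = len(off) + 1                      # order of T, i.e. 2n
    lo, hi = 0.0, 2.0 * max(abs(b) for b in off) + 1.0   # crude upper bound
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if m - count_below(off, mid) >= k:   # at least k eigenvalues of T are >= mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# BD = [[1, 1], [0, 1]]: D = (1, 1), E = (arbitrary, 1)
sigma1 = kth_largest_singular_value([1.0, 1.0], [0.0, 1.0], 1)
sigma2 = kth_largest_singular_value([1.0, 1.0], [0.0, 1.0], 2)
```

For this 2-by-2 example the exact singular values are (sqrt(5) + 1)/2 and (sqrt(5) - 1)/2, so the two results can be checked against closed forms.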

On the other hand, if DQDS is equal to true, all the singular values of BD are computed by the dqds algorithm (see the references (7) and (8)).
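As a rough illustration of the differential qd recurrence behind dqds, the following Python sketch applies zero-shift sweeps (the dqd variant) to the squared entries of BD. It is a simplification and not the statpack routine: the production algorithm described in the references (7) and (8) adds shifts, deflation and convergence tests. The function name and the fixed sweep count are assumptions, and the sketch assumes that all diagonal entries of BD are nonzero.

```python
import math


def dqd_singular_values(d, e, sweeps=200):
    """Zero-shift differential qd sweeps on an upper bidiagonal matrix (d, e); e[0] is ignored."""
    n = len(d)
    q = [x * x for x in d]                     # squared diagonal entries
    f = [e[i] * e[i] for i in range(1, n)]     # squared superdiagonal entries
    for _ in range(sweeps):
        t = q[0]
        for i in range(n - 1):
            qi = t + f[i]
            ratio = q[i + 1] / qi              # qi > 0 when all d(i) are nonzero
            f[i] = f[i] * ratio                # superdiagonal part shrinks towards zero
            t = t * ratio
            q[i] = qi
        q[n - 1] = t
    # q now approximates the eigenvalues of BD' * BD, i.e. the squared singular values
    return sorted((math.sqrt(x) for x in q), reverse=True)


# BD = [[1, 1], [0, 1]]: D = (1, 1), E = (arbitrary, 1)
sv = dqd_singular_values([1.0, 1.0], [0.0, 1.0])
```

Every sweep only adds, multiplies and divides positive quantities, which is the property that lets qd-type algorithms deliver singular values with high relative accuracy.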

For further details, see:

(1) Ralha, R.M.S., 2003:

One-sided reduction to bidiagonal form. Linear Algebra Appl., No 358, pp. 219-238.

(2) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(3) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

(4) Stewart, G.W., 2007:

Block Gram-Schmidt Orthogonalization. Report TR-4823, Department of Computer Science, College Park, University of Maryland.

(5) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(6) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Volume 20, No 2, pp. 373-399.

(7) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(8) Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

`subroutine select_singval_cmp4 ( mat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, p, gen_p, reortho, scaling, init, dqds, failure_bd )`¶

Purpose¶

SELECT_SINGVAL_CMP4 computes all or some of the largest singular values of a real m-by-n matrix MAT with m>=n.

The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal (see the reference (5) below). The Ralha-Barlow one-sided method is used for this purpose (see the references (1) to (3) below).

The singular values SIGMA of the bidiagonal matrix BD, which are also the singular values of MAT, are then computed by a bisection algorithm (see the reference (5) below, Sec.8.5 ). The bisection method is applied (implicitly) to the associated n-by-n symmetric tridiagonal matrix BD’ * BD whose eigenvalues are the squares of the singular values of BD by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (6) below, Sec.3.1 ). Alternatively, at the user option, all singular values can be computed by the differential quotient difference with shifts (dqds) algorithm (see the references (7) and (8)).

The routine outputs (parts of) SIGMA, Q and optionally P (in packed form) and BD for a given matrix MAT. SIGMA, Q, P and BD may then be used to obtain selected singular vectors of MAT with subroutines BD_INVITER2 or BD_DEFLATE2.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is overwritten with the first n columns of Q (stored column-wise), the orthogonal matrix used to reduce MAT to bidiagonal form as returned by subroutine BD_CMP2 in its argument MAT.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible.

If neither of the optional arguments LS and THETA is used, NSING is set to size(MAT,2) and all the singular values are computed.
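The case NSING > LS can be pictured with a small Python sketch. It shows one plausible reading of the rule above, not the statpack selection logic: the helper `select_largest` and its `tol` parameter are hypothetical.

```python
def select_largest(sigma, ls, tol=0.0):
    """Return the ls largest values, extended to every value tied with the ls-th one."""
    ordered = sorted(sigma, reverse=True)
    cutoff = ordered[ls - 1]                       # the ls-th largest value
    chosen = [s for s in ordered if s >= cutoff - tol]
    return len(chosen), chosen


# Two values are requested, but the 2nd and 3rd largest coincide,
# so a unique selection of exactly two values is impossible.
nsing, s = select_largest([5.0, 3.0, 3.0, 1.0], ls=2)
```

Here `nsing` is 3 although `ls` is 2, because the tied value 3.0 cannot be split.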

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the first NSING singular values of MAT. The other values in S ( S(NSING+1:) ) are flagged by a quiet NAN.

The size of S must be equal to size( MAT, 2 ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection or dqds algorithm converged for all the computed singular values to the desired accuracy ;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= n, otherwise a default value is used. For better performance, at the expense of more workspace, a large value can be used.

The default is min( 32, n ).

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used to compute the singular values SIGMA of the bidiagonal matrix BD.

This argument has no effect if DQDS is equal to true.

The default is VECTOR = false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located in the bisection algorithm if it has been determined to lie in an interval whose width is ABSTOL or less.

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold (e.g., sqrt(safmin) = sqrt(LAMCH(‘S’))), not zero.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | BD’ * BD | will be used, where | BD’ * BD | means the 1-norm of the tridiagonal matrix BD’ * BD ( BD’ means the transpose of BD) and ULP is the machine precision (distance from 1 to the next larger floating point number).
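The default tolerance ULP * | BD’ * BD | can be written out from D and E alone, as in the following Python sketch. It only transcribes the formula stated above and assumes IEEE double precision for ULP; the function name `default_abstol` is hypothetical and the code is not taken from statpack. For an upper bidiagonal BD, the tridiagonal matrix BD’ * BD has diagonal entries d(i)**2 + e(i)**2 (with no e term for i = 1) and off-diagonal entries d(i) * e(i+1).

```python
import sys


def default_abstol(d, e):
    """ULP times the 1-norm of the tridiagonal matrix BD' * BD; e[0] is ignored."""
    n = len(d)
    diag = [d[i] * d[i] + (e[i] * e[i] if i > 0 else 0.0) for i in range(n)]
    off = [d[i] * e[i + 1] for i in range(n - 1)]   # (BD'*BD)(i,i+1) = d(i)*e(i+1)
    norm1 = 0.0
    for j in range(n):                              # maximum absolute column sum
        col = abs(diag[j])
        if j > 0:
            col += abs(off[j - 1])
        if j < n - 1:
            col += abs(off[j])
        norm1 = max(norm1, col)
    return sys.float_info.epsilon * norm1


# BD = [[1, 1], [0, 1]] gives BD' * BD = [[1, 1], [1, 2]], whose 1-norm is 3
abstol = default_abstol([1.0, 1.0], [0.0, 1.0])
```

`sys.float_info.epsilon` is the distance from 1 to the next larger double, which matches the definition of ULP given above.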

This argument has no effect if DQDS is equal to true.

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may be different from LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; otherwise the subroutine will stop with an error message.

LS must be greater than 0 and less than or equal to size( MAT, 2 ) = n .

This argument has no effect if DQDS is equal to true.

The default is LS = size( MAT, 2 ) = n .

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values are greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; otherwise the subroutine will stop with an error message.

This argument has no effect if DQDS is equal to true.

The default is THETA = 0.

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate bidiagonal matrix BD.

The size of D must be equal to size( MAT, 2 ) = n .

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate upper bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,n;

E(1) is arbitrary.

The size of E must be equal to size( MAT, 2 ) = n .

P (OUTPUT, OPTIONAL) real(stnd), dimension(:,:)

On exit, P is overwritten with the n-by-n orthogonal matrix P (stored column-wise or in packed form), the orthogonal matrix used to reduce MAT to bidiagonal form as returned by subroutine BD_CMP2 in its argument P.

The shape of P must verify: size( P, 1 ) = size( P, 2 ) = n .

GEN_P (INPUT, OPTIONAL) logical(lgl)

On entry, this optional argument has an effect only if the optional argument P is also used.

In this case, if the optional argument GEN_P is used and is set to true, the orthogonal matrix P used to reduce MAT to bidiagonal form is generated on output of the subroutine in its argument P.

If GEN_P is set to false, the orthogonal matrix P is stored in factored form as products of elementary reflectors in the lower triangle of the array P.

See the description of BD_CMP2 subroutine for more details.

The default is true.

REORTHO (INPUT, OPTIONAL) logical(lgl)

If the optional argument REORTHO is set to true, reorthogonalization of the matrix Q, used to reduce MAT to bidiagonal form, is performed (when needed). If this optional logical argument is set to false, reorthogonalization of Q is never performed and the columns of the matrix Q can be far from orthogonal to each other for input matrices MAT with a large condition number.

See the description of BD_CMP2 subroutine for more details.

The default is REORTHO = false.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated BD’ * BD tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

This argument has no effect if DQDS is equal to true.

The default is not to use the Pal-Walker-Kahan algorithm.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed.

If DQDS is set to true, all singular values are computed with the dqds algorithm. In many cases, the dqds algorithm is much faster than bisection when all singular values are requested, and it delivers about the same accuracy as bisection.

If DQDS is set to false, the requested singular values are computed by bisection.

The default is to use the dqds algorithm if all the singular values are requested and to use bisection if selected singular values are requested (e.g., if the optional arguments LS or THETA are present).

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT.

FAILURE_BD = true : indicates that MAT is nearly singular and some loss of orthogonality can be expected in the Ralha-Barlow bidiagonalization algorithm.

See description of the BD_CMP2 subroutine for further details.

Further Details¶

The matrices Q, P and BD are computed with the help of the Ralha-Barlow one-sided method. Q is computed by a recurrence relationship and the first n columns of Q are stored in the argument MAT on exit. P is computed as a product of n-1 elementary reflectors (i.e., Householder transformations):

P = G(1) * G(2) * … * G(n-1)

Each G(i) has the form:

G(i) = I + taup * v * v’

where taup is a real scalar, and v is a real vector. If GEN_P is used and set to false, the n-1 G(i) elementary reflectors are stored in the lower triangle of the array P.

For the G(i) reflector, taup is stored in P(i+1,1) and v is stored in P(i+1:n,i+1). In addition, P(1,1) is set to -1 if GEN_P=false and is equal to 1 if GEN_P=true.

In other words, the value of P(1,1) indicates if the orthogonal matrix P is stored in factored form or not. Note that if n is equal to 1, elementary reflectors are not needed and consequently P(1,1) is set to 1, independently of the value of GEN_P.

This is the blocked version of the Ralha-Barlow one-sided algorithm. See the references (1), (2) and (3) for further details. Furthermore, the algorithm is parallelized if OPENMP is used.

Since Q is computed by a recurrence relationship, a loss of orthogonality of Q can be observed when the rectangular matrix MAT is singular or nearly singular or has a large condition number; see the reference (2) for details.

To partly correct this deficiency, partial reorthogonalization can be performed to ensure orthogonality at the expense of speed of computation. The reorthogonalization uses the Gram-Schmidt method described in the reference (4) and is activated if the optional logical argument REORTHO is set to true.

The reference (2) also explains how to handle the case of an exactly singular matrix MAT (a very rare event). However, in this subroutine, the partial reorthogonalization described in the reference (4), which is used when REORTHO is set to true, automatically corrects this problem.

Now, let SIGMA(i), i=1,…,n, be the singular values of the intermediate bidiagonal matrix BD in decreasing order of magnitude.

If DQDS is equal to false, the subroutine computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of BD by a bisection method (see the reference (5) below, Sec.8.5 ). The bisection method is applied (implicitly) to the associated n-by-n symmetric tridiagonal matrix BD’ * BD whose eigenvalues are the squares of the singular values of BD by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (6) below, Sec.3.1 ).
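The implicit use of BD’ * BD can be sketched in Python as follows. This is a minimal illustration of a differential stationary qd count, not the statpack code: the function names, the upper bound and the zero-pivot safeguard are assumptions. The recurrence produces the pivots of the LDL’ factorization of BD’ * BD - x**2 * I directly from D and E, so the product BD’ * BD is never formed, and the number of negative pivots is the number of singular values below x.

```python
def count_singular_values_below(d, e, x):
    """Count singular values of the upper bidiagonal BD that are < x (x > 0); e[0] is ignored."""
    s = x * x
    n = len(d)
    count = 0
    t = -s
    for i in range(n):
        p = d[i] * d[i] + t          # i-th pivot of BD'*BD - s*I, obtained without forming BD'*BD
        if p == 0.0:
            p = -1e-300              # nudge an exact zero pivot (sketch-level safeguard)
        if p < 0.0:
            count += 1
        if i + 1 < n:
            t = t * (e[i + 1] * e[i + 1] / p) - s
    return count


def kth_largest_singular_value(d, e, k, tol=1e-14):
    """Bisection for the k-th largest singular value, driven by the count above."""
    n = len(d)
    hi = max(abs(x) for x in d) + max((abs(x) for x in e[1:]), default=0.0) + 1.0
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if n - count_singular_values_below(d, e, mid) >= k:   # at least k values are >= mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# BD = [[1, 1], [0, 1]]: D = (1, 1), E = (arbitrary, 1)
largest = kth_largest_singular_value([1.0, 1.0], [0.0, 1.0], 1)
```

Compared with bisection on the 2n-by-2n Golub-Kahan form, each count here needs only n steps, at the price of working with squared quantities.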

On the other hand, if DQDS is equal to true, all the singular values of BD are computed by the dqds algorithm (see the references (7) and (8)).

If bisection is used, the SELECT_SINGVAL_CMP4 subroutine is less accurate, but faster, than the SELECT_SINGVAL_CMP3 subroutine, since SELECT_SINGVAL_CMP3 works on the 2n-by-2n symmetric tridiagonal Golub-Kahan form of BD, while SELECT_SINGVAL_CMP4 works implicitly on the associated n-by-n symmetric tridiagonal matrix BD’ * BD whose eigenvalues are the squares of the singular values of BD.

For further details, see:

(1) Ralha, R.M.S., 2003:

One-sided reduction to bidiagonal form. Linear Algebra Appl., No 358, pp. 219-238.

(2) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(3) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

(4) Stewart, G.W., 2007:

Block Gram-Schmidt Orthogonalization. Report TR-4823, Department of Computer Science, College Park, University of Maryland.

(5) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(6) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Volume 20, No 2, pp. 373-399.

(7) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(8) Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

`subroutine select_singval_cmp4 ( mat, rmat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauo, p, gen_p, reortho, scaling, init, dqds, failure_bd )`¶

Purpose¶

SELECT_SINGVAL_CMP4 computes all or some of the largest singular values of a real m-by-n matrix MAT with m>=n.

The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper bidiagonal form by a two-step algorithm :

A QR factorization of the real m-by-n matrix MAT is first computed

MAT = O * R

where O is orthogonal and R is upper triangular.

In a second step, the n-by-n upper triangular matrix R is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * R * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix (see the reference (5) below). The Ralha-Barlow one-sided method is used in this second step (see the references (1) to (3) below).

SELECT_SINGVAL_CMP4 computes O, BD, Q and P.

The singular values SIGMA of the bidiagonal matrix BD, which are also the singular values of MAT, are then computed by a bisection algorithm (see the reference (5) below, Sec.8.5 ). The bisection method is applied (implicitly) to the associated n-by-n symmetric tridiagonal matrix BD’ * BD whose eigenvalues are the squares of the singular values of BD by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (6) below, Sec.3.1 ). Alternatively, at the user option, all singular values can be computed by the differential quotient difference with shifts (dqds) algorithm (see the references (7) and (8)).

The routine outputs (parts of) SIGMA, and optionally O, Q, P and BD for a given matrix MAT. The matrix O is stored in factored form in the argument MAT if the optional argument TAUO is present or explicitly computed if this argument is absent. The first n columns of Q are stored in the argument RMAT. Finally, P is stored in the optional argument P, either in factored form or explicitly generated, depending on the value of the optional logical argument GEN_P.

SIGMA, O, Q, P and BD may then be used to obtain selected singular vectors of MAT with subroutines BD_INVITER2 or BD_DEFLATE2.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, the elements on and below the diagonal, with the array TAUO, represent the orthogonal matrix O of the QR factorization of MAT, as a product of elementary reflectors, if the argument TAUO is present. Otherwise, the argument MAT contains the first n columns of O (stored column-wise) on output.

See Further Details.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

RMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, RMAT contains the first n columns of Q (stored column-wise), the orthogonal matrix used to reduce R to bidiagonal form as returned by subroutine BD_CMP2 in its argument MAT.

See Further Details.

The shape of RMAT must verify: size( RMAT, 1 ) = size( RMAT, 2 ) = size( MAT, 2 ) = n .

NSING (OUTPUT) integer(i4b)

On output, NSING specifies the number of singular values which have been computed. Note that NSING may be greater than the optional argument LS, if multiple singular values at index LS make unique selection impossible.

If neither of the optional arguments LS and THETA is used, NSING is set to size(MAT,2) and all the singular values are computed.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(1:NSING) contains the first NSING singular values of MAT. The other values in S ( S(NSING+1:) ) are flagged by a quiet NAN.

The size of S must be equal to size( MAT, 2 ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit and the bisection or dqds algorithm converged for all the computed singular values to the desired accuracy ;

FAILURE = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. The sign of the incorrect singular values is set to negative.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= n, otherwise a default value is used. For better performance, at the expense of more workspace, a large value can be used.

The default is min( 32, n ).

VECTOR (INPUT, OPTIONAL) logical(lgl)

On entry, if VECTOR is set to TRUE, a vectorized version of the bisection algorithm is used to compute the singular values SIGMA of the bidiagonal matrix BD.

This argument has no effect if DQDS is equal to true.

The default is VECTOR = false.

ABSTOL (INPUT, OPTIONAL) real(stnd)

On entry, the absolute tolerance for the singular values. A singular value (or cluster) is considered to be located in the bisection algorithm if it has been determined to lie in an interval whose width is ABSTOL or less.

Singular values will be computed most accurately when ABSTOL is set to the square root of the underflow threshold (e.g., sqrt(safmin) = sqrt(LAMCH(‘S’))), not zero.

If ABSTOL is less than or equal to zero, or is not specified, then ULP * | BD’ * BD | will be used, where | BD’ * BD | means the 1-norm of the tridiagonal matrix BD’ * BD ( BD’ means the transpose of BD) and ULP is the machine precision (distance from 1 to the next larger floating point number).

This argument has no effect if DQDS is equal to true.

LS (INPUT, OPTIONAL) integer(i4b)

On entry, LS specifies the number of singular values which must be computed by the subroutine. On output, NSING may be different from LS if multiple singular values at index LS make unique selection impossible.

At most one of the optional arguments LS and THETA may be specified; otherwise the subroutine will stop with an error message.

LS must be greater than 0 and less than or equal to size( MAT, 2 ) = n .

This argument has no effect if DQDS is equal to true.

The default is LS = size( MAT, 2 ) = n .

THETA (INPUT, OPTIONAL) real(stnd)

On entry, THETA specifies that the singular values which are greater than or equal to THETA must be computed. If none of the singular values are greater than or equal to THETA, NSING is set to zero and S(:) to a quiet NAN.

At most one of the optional arguments LS and THETA may be specified; otherwise the subroutine will stop with an error message.

This argument has no effect if DQDS is equal to true.

The default is THETA = 0.

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate bidiagonal matrix BD.

The size of D must be equal to size( MAT, 2 ) = n .

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate upper bidiagonal matrix BD:

E(i) = BD(i-1,i) for i = 2,3,…,n;

E(1) is arbitrary.

The size of E must be equal to size( MAT, 2 ) = n .

TAUO (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix O of the QR decomposition of MAT.

If the optional argument TAUO is present, the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on exit.

If the optional argument TAUO is absent, the first n columns of the orthogonal matrix O are explicitly generated and stored in the argument MAT on exit.

See description of the argument MAT above and Further Details below.

The size of TAUO must be equal to size( MAT, 2 ) = n .

P (OUTPUT, OPTIONAL) real(stnd), dimension(:,:)

On exit, P is overwritten with the n-by-n orthogonal matrix P (stored column-wise or in packed form), the orthogonal matrix used to reduce MAT to bidiagonal form as returned by subroutine BD_CMP2 in its argument P.

The shape of P must verify: size( P, 1 ) = size( P, 2 ) = n .

GEN_P (INPUT, OPTIONAL) logical(lgl)

On entry, this optional argument has an effect only if the optional argument P is also used.

In this case, if the optional argument GEN_P is used and is set to true, the orthogonal matrix P used to reduce MAT to bidiagonal form is generated on output of the subroutine in its argument P.

If GEN_P is set to false, the orthogonal matrix P is stored in factored form as products of elementary reflectors in the lower triangle of the array P.

See the description of BD_CMP2 subroutine for more details.

The default is true.

REORTHO (INPUT, OPTIONAL) logical(lgl)

If the optional argument REORTHO is set to true, reorthogonalization of the matrix Q, used to reduce MAT to bidiagonal form, is performed (when needed). If this optional logical argument is set to false, reorthogonalization of Q is never performed and the columns of the matrix Q can be far from orthogonal to each other for input matrices MAT with a large condition number.

See the description of BD_CMP2 subroutine for more details.

The default is REORTHO = false.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the singular values.

The default is to scale the bidiagonal matrix.

INIT (INPUT, OPTIONAL) logical(lgl)

On entry, if INIT=true the initial intervals for the bisection steps are computed from estimates of the eigenvalues of the associated BD’ * BD tridiagonal matrix obtained from the Pal-Walker-Kahan algorithm.

This argument has no effect if DQDS is equal to true.

The default is not to use the Pal-Walker-Kahan algorithm.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed.

If DQDS is set to true, all singular values are computed with the dqds algorithm. In many cases, the dqds algorithm is much faster than bisection when all singular values are requested, and it delivers about the same accuracy as bisection.

If DQDS is set to false, the requested singular values are computed by bisection.

The default is to use the dqds algorithm if all the singular values are requested and to use bisection if selected singular values are requested (e.g., if the optional arguments LS or THETA are present).

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT.

FAILURE_BD = true : indicates that MAT is nearly singular and some loss of orthogonality can be expected in the Ralha-Barlow bidiagonalization algorithm.

See description of the BD_CMP2 subroutine for further details.

Further Details¶

The matrix O of the QR factorization of MAT is represented as a product of elementary reflectors:

O = W(1) * W(2) * … * W(n)

Each W(i) has the form

W(i) = I + tauo * ( v * v’ ) ,

where tauo is a real scalar and v is a real m-element vector with v(1:i-1) = 0. v(i:m) is stored on exit in MAT(i:m,i) and tauo in TAUO(i). If the optional argument TAUO is absent, the first n columns of O are generated and stored in the argument MAT.

The matrix O is stored in factored form if the optional argument TAUO is present or explicitly computed if this argument is absent.

A blocked algorithm is used for computing the QR factorization of MAT. Furthermore, the computations are parallelized if OPENMP is used.

After the initial QR factorization of MAT, the upper triangular matrix R is reduced to upper bidiagonal form BD:

Q’ * R * P = BD

The matrices Q, P and BD are computed with the help of the Ralha-Barlow one-sided method. Q is computed by a recurrence relationship and the first n columns of Q are stored in the argument RMAT on exit. P is computed as a product of n-1 elementary reflectors (i.e., Householder transformations):

P = G(1) * G(2) * … * G(n-1)

Each G(i) has the form:

G(i) = I + taup * v * v’

where taup is a real scalar, and v is a real vector. If GEN_P is used and set to false, the n-1 G(i) elementary reflectors are stored in the lower triangle of the array P.

For the G(i) reflector, taup is stored in P(i+1,1) and v is stored in P(i+1:n,i+1). In addition, P(1,1) is set to -1 if GEN_P=false and is equal to 1 if GEN_P=true.

In other words, the value of P(1,1) indicates if the orthogonal matrix P is stored in factored form or not. Note that if n is equal to 1, elementary reflectors are not needed and consequently P(1,1) is set to 1, independently of the value of GEN_P.

This is the blocked version of the Ralha-Barlow one-sided algorithm. See the references (1), (2) and (3) for further details. Furthermore, the algorithm is parallelized if OPENMP is used.

Since Q is computed by a recurrence relationship, a loss of orthogonality of Q can be observed when the rectangular matrix MAT is singular or nearly singular or has a large condition number, see the reference (2) for details.

To partly correct this deficiency, partial reorthogonalization can be performed to ensure orthogonality at the expense of speed of computation. The reorthogonalization uses the Gram-Schmidt method described in the reference (4) and is activated if the optional logical argument REORTHO is set to true.

The reference (2) also explains how to handle the case of an exactly singular matrix MAT (a very rare event). However, in this subroutine, the partial reorthogonalization described in the reference (4), which is used when REORTHO is set to true, automatically corrects this problem.

Now, let SIGMA(i), i=1,…,n, be the singular values of the intermediate bidiagonal matrix BD in decreasing order of magnitude.

If DQDS is equal to false, the subroutine computes the LS largest singular values (or the singular values which are greater than or equal to THETA) of BD by a bisection method (see the reference (5) below, Sec. 8.5). The bisection method is applied (implicitly) to the associated n-by-n symmetric tridiagonal matrix BD’ * BD, whose eigenvalues are the squares of the singular values of BD, by using the differential stationary form of the qd algorithm of Rutishauser (see the reference (6) below, Sec. 3.1).
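The idea behind bisection on BD’ * BD can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the plain pivot recurrence of the LDL’ factorization of BD’ * BD - theta**2 * I on explicitly formed entries, not the differential stationary qd form cited above (which avoids forming these products for better accuracy). The function names, the 0-based layout (d holds the n diagonal entries, e the n-1 superdiagonal entries) and the fixed iteration count are all choices made for this sketch, and ties with theta are ignored.

```python
def count_singvals_above(d, e, theta):
    """Number of singular values of the upper bidiagonal BD that exceed theta."""
    x = theta * theta
    count_below = 0
    q = 1.0
    for i in range(len(d)):
        # Entries of the tridiagonal matrix T = BD' * BD.
        diag = d[i] * d[i] + (e[i - 1] * e[i - 1] if i > 0 else 0.0)
        off2 = (d[i - 1] * e[i - 1]) ** 2 if i > 0 else 0.0
        q = diag - x - off2 / q          # next pivot of T - x * I
        if q == 0.0:
            q = -1e-300                  # nudge an exact zero pivot to avoid dividing by zero
        if q < 0.0:
            count_below += 1             # one more eigenvalue of T lies below x
    return len(d) - count_below

def kth_largest_singval(d, e, k, iterations=100):
    """k-th largest singular value (k = 1 is the largest) by bisection on the count."""
    lo = 0.0
    # Crude upper bound on the largest singular value of BD.
    hi = max(abs(x) for x in d) + max([abs(x) for x in e] + [0.0])
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if count_singvals_above(d, e, mid) >= k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical 2-by-2 example: BD = [[1, 1], [0, 1]].
d_example = [1.0, 1.0]
e_example = [1.0]
sigma_1 = kth_largest_singval(d_example, e_example, 1)
sigma_2 = kth_largest_singval(d_example, e_example, 2)
```

For this small example the exact singular values are (1 + sqrt(5)) / 2 and (sqrt(5) - 1) / 2, which gives a quick check of the sketch.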

On the other hand, if DQDS is equal to true, all the singular values of BD are computed by the dqds algorithm (see the references (7) and (8)).

If bisection is used, the SELECT_SINGVAL_CMP4 subroutine is less accurate, but faster, than the SELECT_SINGVAL_CMP3 subroutine, since SELECT_SINGVAL_CMP3 works on the 2n-by-2n symmetric tridiagonal Golub-Kahan form of BD, while SELECT_SINGVAL_CMP4 works implicitly on the associated n-by-n symmetric tridiagonal matrix BD’ * BD, whose eigenvalues are the squares of the singular values of BD.

For further details, see:

(1) Ralha, R.M.S., 2003:

One-sided reduction to bidiagonal form. Linear Algebra Appl., No 358, pp. 219-238.

(2) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(3) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

(4) Stewart, G.W., 2007:

Block Gram-Schmidt Orthogonalization. Report TR-4823, Department of Computer Science, College Park, University of Maryland.

(5) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(6) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Volume 20, No 2, pp. 373-399.

(7) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(8) Li, S., Gu, M., and Parlett, B.N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, pp. C290-C308.

`subroutine svd_cmp ( mat, s, failure, v, sort, mul_size, maxiter, max_francis_steps, perfect_shift, bisect, dqds, use_svd2 )`¶

Purpose¶

SVD_CMP computes the Singular Value Decomposition (SVD) of a real m-by-n matrix MAT. The SVD is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative. The columns of U and V are the left and right singular vectors of MAT.

SVD_CMP computes only the first min(m,n) columns of U and V (i.e. the left and right singular vectors of MAT in the thin SVD of MAT).

The routine returns the first min(m,n) singular values and the associated left and right singular vectors.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is overwritten with the first min(m,n) columns of U, the left singular vectors.

S (OUTPUT) real(stnd), dimension(:)

The singular values of MAT.

The size of S must verify: size( S ) = min(m,n) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form B of MAT.

V (OUTPUT) real(stnd), dimension(:,:)

On exit, V contains the first min(m,n) columns of V, the right singular vectors.

The shape of V must verify:

size( V, 1 ) = n,

size( V, 2 ) = min(m,n).

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= max(m,n), otherwise a default value is used. MUL_SIZE can be increased or decreased to improve the performance of the algorithm.

The default value is 32.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form B of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e. QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm.

MAX_FRANCIS_STEPS is a strictly positive integer, otherwise the default value is used.

The default is the minimum of min(m,n) and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

USE_SVD2 (INPUT, OPTIONAL) logical(lgl)

If the optional argument USE_SVD2 is used and is set to true, an alternate SVD algorithm which uses less workspace (but which may be slower) is automatically used if m is much larger than n or if n is much larger than m (e.g. if max(m,n)>=1.5 * min(m,n) ).

Further Details¶

Computing the SVD of a rectangular matrix in subroutine SVD_CMP consists of three steps:

reduction of the rectangular matrix to bidiagonal form via orthogonal transformations (e.g. Householder transformations);

in place accumulation of the orthogonal transformations used in the reduction to bidiagonal form;

computation of the SVD of the bidiagonal matrix.

For further details on the SVD of a rectangular matrix and the algorithm to compute it, see the references (1) or (2).

All three steps of the SVD algorithm (i.e. the reduction to bidiagonal form, accumulation of the Householder transformations used in the reduction to bidiagonal form and computation of the SVD of the bidiagonal matrix) are parallelized if OPENMP is used.

For more information, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

`subroutine svd_cmp2 ( mat, s, failure, u_vt, sort, mul_size, maxiter, max_francis_steps, perfect_shift, bisect, dqds, use_svd2 )`¶

Purpose¶

SVD_CMP2 computes the Singular Value Decomposition (SVD) of a real m-by-n matrix MAT. The SVD is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative. The columns of U and V are the left and right singular vectors of MAT.

SVD_CMP2 computes only the first min(m,n) columns of U and V (i.e. the left and right singular vectors of MAT in the thin SVD of MAT).

The routine returns the first min(m,n) singular values and the associated left and right singular vectors. The right singular vectors are returned row-wise.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit:

if m>=n, MAT is overwritten with the first min(m,n) columns of U (the left singular vectors, stored column-wise);

if m<n, MAT is overwritten with the first min(m,n) rows of V’ (the right singular vectors, stored row-wise).

S (OUTPUT) real(stnd), dimension(:)

The singular values of MAT.

The size of S must verify: size( S ) = min(m,n) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form B of MAT.

U_VT (OUTPUT) real(stnd), dimension(:,:)

On exit:

if m>=n, U_VT contains the n-by-n orthogonal matrix V’;

if m<n, U_VT contains the m-by-m orthogonal matrix U.

The shape of U_VT must verify: size( U_VT, 1 ) = size( U_VT, 2 ) = min(m,n).

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify 1 <= MUL_SIZE <= max(m,n), otherwise a default value is used. MUL_SIZE can be increased or decreased to improve the performance of the algorithm.

The default value is 32.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form B of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e. QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm.

MAX_FRANCIS_STEPS is a strictly positive integer, otherwise the default value is used.

The default is the minimum of min(m,n) and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

USE_SVD2 (INPUT, OPTIONAL) logical(lgl)

If the optional argument USE_SVD2 is used and is set to true, an alternate SVD algorithm which uses less workspace (but which may be slower) is automatically used if m is much larger than n or if n is much larger than m (e.g. if max(m,n)>=1.5 * min(m,n) ).

Further Details¶

Computing the SVD of a rectangular matrix in subroutine SVD_CMP2 consists of three steps:

reduction of the rectangular matrix to bidiagonal form via orthogonal transformations (e.g. Householder transformations);

in place accumulation of the orthogonal transformations used in the reduction to bidiagonal form;

computation of the SVD of the bidiagonal matrix.

For further details on the SVD of a rectangular matrix and the algorithm to compute it, see the references (1) or (2).

All three steps of the SVD algorithm (i.e. the reduction to bidiagonal form, accumulation of the Householder transformations used in the reduction to bidiagonal form and computation of the SVD of the bidiagonal matrix) are parallelized if OPENMP is used.

For more information, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

`subroutine svd_cmp ( mat, s, failure, sort, mul_size, maxiter, bisect, dqds, d, e, tauq, taup )`¶

Purpose¶

SVD_CMP computes the singular values of a real m-by-n matrix MAT.

The Singular Value Decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative.

The original matrix MAT is first reduced to upper or lower bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal. The singular values SIGMA of the bidiagonal matrix BD are then computed by the bidiagonal implicit QR algorithm (if BISECT=false and DQDS=false), the bisection method (if BISECT=true) or the dqds algorithm (if DQDS=true) applied to the bidiagonal matrix BD.

The routine outputs SIGMA and optionally Q and P (in packed form), and BD for a given matrix MAT. SIGMA, Q, P and BD may then be used to obtain selected singular vectors with subroutines BD_INVITER, BD_INVITER2, BD_DEFLATE or BD_DEFLATE2.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is destroyed. If TAUQ or TAUP is present, MAT is overwritten as follows:

if m >= n, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors;

if m < n, the elements below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements on and above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors.

See Further Details.

S (OUTPUT) real(stnd), dimension(:)

On exit, the singular values SIGMA of MAT if FAILURE is equal to false. If FAILURE is equal to true, the sign of the incorrect singular values is set to negative.

The size of S must be min( size(MAT,1) , size(MAT,2) ) = min(m,n).

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the implicit QR, bisection or dqds algorithm used to compute the singular values of the bidiagonal form BD of the input m-by-n matrix MAT did not converge and that full accuracy was not attained in the bidiagonal SVD of this intermediate bidiagonal form BD of MAT. The sign of the incorrect singular values is set to negative.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

By default, the singular values are not sorted.

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= max(m,n), otherwise a default value is used. For better performance, at the expense of more workspace, a large value can be used.

The default is 32.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal implicit QR phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form BD of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

This argument has no effect if BISECT or DQDS is equal to true.

The default is 10.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values of the intermediate min(m,n)-by-min(m,n) bidiagonal matrix BD are computed.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slower execution time.

If BISECT is set to false, singular values are computed with the bidiagonal implicit QR or dqds algorithm applied to the associated min(m,n)-by-min(m,n) bidiagonal matrix BD.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values of the intermediate min(m,n)-by-min(m,n) bidiagonal matrix BD are computed.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition. Moreover, the dqds algorithm is usually also faster than the bisection or bidiagonal implicit QR algorithms and is the default method used in SVD_CMP for computing singular values of the bidiagonal form BD of the input matrix MAT.

If DQDS is set to false, singular values are computed with the bidiagonal implicit QR or bisection algorithm applied to the associated min(m,n)-by-min(m,n) bidiagonal matrix BD.

If both optional arguments BISECT and DQDS are specified with the value true, the dqds algorithm is used.

If both optional arguments BISECT and DQDS are specified with the value false, the bidiagonal implicit QR algorithm is used.

The default is true.

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate bidiagonal matrix BD.

The size of D must be min( size(MAT,1) , size(MAT,2) ) = min(m,n).

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate bidiagonal matrix BD:

if m >= n, E(i) = BD(i-1,i) for i = 2,3,…,n;

if m < n, E(i) = BD(i,i-1) for i = 2,3,…,m.

The size of E must be min( size(MAT,1) , size(MAT,2) ) = min(m,n).
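To make the index convention for D and E concrete, the following Python sketch rebuilds a dense bidiagonal matrix from the two arrays. It is an illustration only: the function name is hypothetical, Python lists are 0-based (so E(i) for i = 2,…,n corresponds to e[i-1]), and since the content of E(1) is not described above, the sketch simply ignores the first entry of e.

```python
def bidiagonal_from_d_e(d, e, upper=True):
    """Dense min(m,n)-by-min(m,n) matrix BD rebuilt from the arrays D and E.

    upper=True  corresponds to m >= n, where E(i) = BD(i-1,i);
    upper=False corresponds to m <  n, where E(i) = BD(i,i-1).
    """
    n = len(d)
    bd = [[0.0] * n for _ in range(n)]
    for i in range(n):
        bd[i][i] = d[i]
    # 0-based index i = 1..n-1 stands for the 1-based E(i+1); e[0] is not used.
    for i in range(1, n):
        if upper:
            bd[i - 1][i] = e[i]
        else:
            bd[i][i - 1] = e[i]
    return bd

# Hypothetical values for a 3-by-3 bidiagonal matrix.
bd_upper = bidiagonal_from_d_e([1.0, 2.0, 3.0], [0.0, 4.0, 5.0], upper=True)
bd_lower = bidiagonal_from_d_e([1.0, 2.0, 3.0], [0.0, 4.0, 5.0], upper=False)
```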

TAUQ (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix Q. See Further Details.

The size of TAUQ must be min( size(MAT,1) , size(MAT,2) ) = min(m,n).

TAUP (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors which represent the orthogonal matrix P. See Further Details.

The size of TAUP must be min( size(MAT,1) , size(MAT,2) ) = min(m,n).

Further Details¶

Computing the singular values of a rectangular matrix in subroutine SVD_CMP consists of two steps:

reduction of the rectangular matrix to bidiagonal form BD, see the references (1) and (2);

computation of the singular values of the min(m,n)-by-min(m,n) bidiagonal matrix BD by a bidiagonal implicit QR algorithm (if BISECT=false and DQDS=false), a dqds algorithm (if DQDS=true) or a bisection algorithm (if BISECT=true). See the references (1), (2), (3) and (4) for a description of these different algorithms.

Note that if max(m,n) is much larger than min(m,n) and the optional arguments TAUQ and TAUP are not used, the rectangular matrix is first reduced to upper or lower triangular form by a QR or LQ factorization and the reduction algorithm is applied to the resulting triangular factor. The singular values of the rectangular matrix are then obtained from those of the triangular factor. This usually gives a large speedup in the computations.

The matrices Q and P in the bidiagonal reduction of the input m-by-n matrix MAT are represented as products of elementary reflectors:

If m >= n,

Q = H(1) * H(2) * … * H(n) and P = G(1) * G(2) * … * G(n-1)

Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors. Moreover, u(1:i-1) = 0 and v(1:i) = 0.

If TAUQ or TAUP are present :

u(i:m) is stored on exit in MAT(i:m,i);

v(i+1:n) is stored on exit in MAT(i,i+1:n).

If TAUQ is present : tauq is stored in TAUQ(i).

If TAUP is present : taup is stored in TAUP(i).

If m < n,

Q = H(1) * H(2) * … * H(m-1) and P = G(1) * G(2) * … * G(m)

Each H(i) and G(i) has the form:

H(i) = I + tauq * u * u’ and G(i) = I + taup * v * v’

where tauq and taup are real scalars, and u and v are real vectors. Moreover, u(1:i) = 0 and v(1:i-1) = 0.

If TAUQ or TAUP are present :

u(i+1:m) is stored on exit in MAT(i+1:m,i);

v(i:n) is stored on exit in MAT(i,i:n).

If TAUQ is present : tauq is stored in TAUQ(i).

If TAUP is present : taup is stored in TAUP(i).

The contents of MAT on exit, if TAUQ or TAUP are present, are illustrated by the following examples:

m = 6 and n = 5 (m >= n):

( u1 v1 v1 v1 v1 )

( u1 u2 v2 v2 v2 )

( u1 u2 u3 v3 v3 )

( u1 u2 u3 u4 v4 )

( u1 u2 u3 u4 u5 )

( u1 u2 u3 u4 u5 )

m = 5 and n = 6 (m < n):

( v1 v1 v1 v1 v1 v1 )

( u1 v2 v2 v2 v2 v2 )

( u1 u2 v3 v3 v3 v3 )

( u1 u2 u3 v4 v4 v4 )

( u1 u2 u3 u4 v5 v5 )

where ui denotes an element of the vector defining H(i), and vi an element of the vector defining G(i).
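The storage pattern shown in the two examples can be generated for any shape with a short Python sketch. The function name is hypothetical; the rule it encodes is read directly from the description above: if m >= n, entries on and below the diagonal belong to the u vectors and entries above it to the v vectors, while if m < n, entries strictly below the diagonal belong to the u vectors and entries on and above it to the v vectors.

```python
def packed_layout(m, n):
    """Labels of the reflector vectors stored in each entry of MAT on exit."""
    rows = []
    for r in range(1, m + 1):
        row = []
        for c in range(1, n + 1):
            if m >= n:
                # u(i:m) in column i, v(i+1:n) in row i
                label = f"u{c}" if r >= c else f"v{r}"
            else:
                # u(i+1:m) in column i, v(i:n) in row i
                label = f"u{c}" if r > c else f"v{r}"
            row.append(label)
        rows.append(row)
    return rows

# Reproduce the m = 6, n = 5 illustration.
for row in packed_layout(6, 5):
    print("( " + " ".join(row) + " )")
```

Calling packed_layout(5, 6) reproduces the second illustration in the same way.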

For further details on the SVD of a rectangular matrix and the algorithms to compute it, see the references (1) or (2). In the SVD_CMP subroutine, the reduction to bidiagonal form by orthogonal transformations is parallelized if OPENMP is used. The computation of the singular values is also parallelized if OPENMP is used and BISECT is set to true. Further details on the dqds algorithm are given in the references (3) and (4).

For more information, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

(3) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(4) Li, S., Gu, M., and Parlett, B.N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, pp. C290-C308.

`subroutine svd_cmp3 ( mat, s, failure, u_v, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds, reortho, failure_bd )`¶

Purpose¶

SVD_CMP3 computes the Singular Value Decomposition (SVD) of a real m-by-n matrix MAT. The SVD is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative. The columns of U and V are, respectively, the left and right singular vectors of MAT.

The routine returns the first min(m,n) singular values and the associated left and right singular vectors. The right singular vectors are returned row-wise if m<n.

MAT (or MAT’ if m<n) is first reduced to bidiagonal form B with the help of the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (1) and (2).

The singular values, left and right singular vectors of B are then computed by the bidiagonal implicit QR algorithm applied to B, see the references (3) and (4).

The singular vectors of MAT are finally computed by a back transformation algorithm.

When MAT has a very large condition number, SVD_CMP3 may compute left (right if m<n) singular vectors of MAT which are not numerically orthogonal (see Further Details). However, the left (right if m<n) singular vectors of MAT associated with the largest singular values are always numerically orthogonal, even if MAT is singular or nearly singular (see Further Details).

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit:

if m>=n, MAT is overwritten with the first n columns of U (the left singular vectors, stored column-wise);

if m<n, MAT is overwritten with the first m rows of V’ (the first m right singular vectors, stored row-wise).

S (OUTPUT) real(stnd), dimension(:)

The singular values of MAT.

The size of S must verify: size( S ) = min(m,n) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the bidiagonal SVD algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form B of MAT.

U_V (OUTPUT) real(stnd), dimension(:,:)

On exit:

if m>=n, U_V contains the n-by-n orthogonal matrix V;

if m<n, U_V contains the m-by-m orthogonal matrix U.

The shape of U_V must verify: size( U_V, 1 ) = size( U_V, 2 ) = min(m,n) .

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly. If this argument is not used, the singular values are not sorted.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form B of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e. QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm.

MAX_FRANCIS_STEPS is a strictly positive integer, otherwise the default value is used.

The default is the minimum of min(m,n) and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B or MAT.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B or MAT.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

REORTHO (INPUT, OPTIONAL) logical(lgl)

If the optional argument REORTHO is set to true, reorthogonalization is performed (when needed) in the Ralha-Barlow one-sided bidiagonalization of MAT. If this optional logical argument is set to false, reorthogonalization is never performed, which can result in a loss of orthogonality for the left (right) singular vectors of MAT if m>=n (m<n) and MAT has a large condition number.

See description of the BD_CMP2 subroutine for further details.

The default is REORTHO = true.

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT.

FAILURE_BD = true : indicates that MAT is nearly singular and some loss of orthogonality can be expected in the Ralha-Barlow bidiagonalization algorithm.

See description of the BD_CMP2 subroutine for further details.

Further Details¶

Computing the SVD of an m-by-n matrix MAT with m>=n in subroutine SVD_CMP3 consists of three steps:

reduction of the rectangular matrix to bidiagonal form B via the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (1) and (2);

in place accumulation of the orthogonal transformations used in the reduction to bidiagonal form B, see the references (3) and (4);

computation of the SVD of the bidiagonal matrix B, see the references (3) and (4).

When MAT has a large condition number, this three-step algorithm may compute left singular vectors of MAT that are not numerically orthogonal. This is because the left (orthogonal) matrix in the bidiagonal decomposition of MAT (estimated by the Ralha-Barlow one-sided bidiagonalization algorithm) is computed by a recurrence relationship and may itself not be numerically orthogonal, see the references (1) and (2) for details. To partly correct this deficiency, partial reorthogonalization can be performed to ensure orthogonality at the expense of speed of computation. The reorthogonalization uses the Gram-Schmidt method described in the reference (5) and is activated if the optional logical argument REORTHO is set to true.

However, the largest left singular vectors of MAT are always numerically orthogonal even if MAT is singular or nearly singular, see the reference (1).
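
The effect of reorthogonalization can be illustrated on a small ill-conditioned example. The following Python sketch is not the Ralha-Barlow algorithm and not the block Gram-Schmidt method of reference (5); it is a minimal classical Gram-Schmidt loop, with an optional second pass, applied to a hypothetical test matrix whose columns are nearly parallel. It only shows how a basis built by a recurrence can lose orthogonality and how a second Gram-Schmidt pass recovers it.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(columns, reortho):
    """Classical Gram-Schmidt on a list of column vectors.
    If reortho is True, each new vector is orthogonalized a second time."""
    q = []
    for a in columns:
        v = list(a)
        for _ in range(2 if reortho else 1):
            coeffs = [dot(qk, v) for qk in q]  # all projections use the same v
            for c, qk in zip(coeffs, q):
                v = [vi - c * qi for vi, qi in zip(v, qk)]
        norm = math.sqrt(dot(v, v))
        q.append([vi / norm for vi in v])
    return q

def orthogonality_error(q):
    """Largest entry of |Q'Q - I|."""
    n = len(q)
    return max(abs(dot(q[i], q[j]) - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

# Three nearly parallel columns (condition number of order 1e8).
eps = 1.0e-8
columns = [[1.0, eps, 0.0, 0.0],
           [1.0, 0.0, eps, 0.0],
           [1.0, 0.0, 0.0, eps]]

err_single = orthogonality_error(gram_schmidt(columns, reortho=False))
err_double = orthogonality_error(gram_schmidt(columns, reortho=True))
print(err_single, err_double)
```

Without the second pass, two of the computed vectors are far from orthogonal; with it, the orthogonality error drops to the level of rounding errors. The extra pass roughly doubles the orthogonalization work, which mirrors the speed trade-off described for REORTHO.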

If m<n, this three-step algorithm is applied to MAT’, instead of MAT, to get the SVD of MAT, and in that case it also computes numerically orthogonal right singular vectors of MAT.

Note that if max(m,n) is much larger than min(m,n), the rectangular matrix is first reduced to upper or lower triangular form by a QR or LQ factorization and the three-step reduction algorithm is applied to the resulting triangular factor. The singular vectors of the rectangular matrix are then obtained from those of the triangular factor by a back-transformation algorithm.

For further details on the SVD of a rectangular matrix and the algorithms to compute it, see the references below.

The three or four steps of the SVD algorithm used here (i.e., preliminary QR or LQ factorization, reduction to bidiagonal form B, in place accumulation of the orthogonal transformations and computation of the SVD of the bidiagonal matrix B) are parallelized if OPENMP is used.

For more details, see:

(1) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(2) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

(3) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(4) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

(5) Stewart, G.W., 2007:

Block Gram-Schmidt Orthogonalization. Report TR-4823, Department of Computer Science, College Park, University of Maryland.

`subroutine svd_cmp4 ( mat, s, failure, v, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds, sing_vec, gen_p, reortho, failure_bd, d, e )`¶

Purpose¶

SVD_CMP4 computes the Singular Value Decomposition (SVD) of a real m-by-n matrix MAT with m>=n. The SVD is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative. The columns of U and V are, respectively, the left and right singular vectors of MAT.

The routine returns the first n singular values and the associated left and right singular vectors.

MAT is first reduced to bidiagonal form B with the help of the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (1) and (2).

The singular values and right singular vectors of B are then computed by the bidiagonal implicit QR algorithm applied to B, see the references (3) and (4).

The singular vectors of MAT are finally computed by a back transformation algorithm and an orthogonalization step for the left singular vectors.

Optionally, if the logical argument SING_VEC is used with the value false, the routine computes only the singular values by the dqds algorithm and the orthogonal matrices Q and P used to reduce MAT to bidiagonal form B. This is useful for computing a partial SVD of MAT with subroutines BD_INVITER2 or BD_DEFLATE2 for example.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit:

if SING_VEC=true, MAT is overwritten with the first n columns of U (the left singular vectors, stored column-wise);

if SING_VEC=false, MAT is overwritten with the first n columns of Q (stored column-wise), the orthogonal matrix used to reduce MAT to bidiagonal form as returned by subroutine BD_CMP2 in its argument MAT.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

S (OUTPUT) real(stnd), dimension(:)

The singular values of MAT.

The size of S must verify: size( S ) = size( MAT, 2 ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the bidiagonal SVD or dqds algorithms did not converge and that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form B of MAT.

V (OUTPUT) real(stnd), dimension(:,:)

On exit:

if SING_VEC=true, V is overwritten with the n-by-n orthogonal matrix V (the right singular vectors, stored column-wise);

if SING_VEC=false, V is overwritten with the n-by-n orthogonal matrix P (stored column-wise or in packed form), the orthogonal matrix used to reduce MAT to bidiagonal form as returned by subroutine BD_CMP2 in its argument P.

The shape of V must verify: size( V, 1 ) = size( V, 2 ) = n .

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or into descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.

The default is to sort the singular values and vectors into descending order.
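
The phrase "rearranged accordingly" means that the same permutation is applied to the singular values and to the columns of U and V. A minimal Python sketch of that bookkeeping follows; it is an illustration only, with a hypothetical helper name and vectors stored as a list of columns, and it does not reproduce statpack's internal sorting code.

```python
def sort_svd(s, u_cols, v_cols, sort="D"):
    """Sort singular values and apply the same permutation to the
    singular vectors (given as lists of columns)."""
    if sort.upper() not in ("A", "D"):
        raise ValueError("sort must be 'A', 'a', 'D' or 'd'")
    descending = sort.upper() == "D"
    order = sorted(range(len(s)), key=lambda j: s[j], reverse=descending)
    return ([s[j] for j in order],
            [u_cols[j] for j in order],
            [v_cols[j] for j in order])

s = [2.0, 5.0, 1.0]
u_cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
v_cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

s_desc, u_desc, v_desc = sort_svd(s, u_cols, v_cols, sort="d")
s_asc, u_asc, v_asc = sort_svd(s, u_cols, v_cols, sort="A")
print(s_desc, s_asc)
```

Because the j-th singular value stays paired with the j-th columns of U and V, the product U * SIGMA * V’ is unchanged by the sort.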

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form B of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

On entry, this optional argument has an effect only if the optional argument SING_VEC has the value true.

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm.

MAX_FRANCIS_STEPS is a strictly positive integer, otherwise the default value is used.

The default is the minimum of n and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

On entry, this optional argument has an effect only if the optional argument SING_VEC has the value true.

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B or MAT.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B or MAT.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

SING_VEC (INPUT, OPTIONAL) logical(lgl)

On entry:

if SING_VEC=true, the routine computes the singular values and vectors of MAT.

If SING_VEC=false the routine computes only the singular values of MAT and the orthogonal matrices Q and P used to reduce MAT to upper bidiagonal form as returned by subroutine BD_CMP2. See the description of BD_CMP2 subroutine for more details.

The default is true.

GEN_P (INPUT, OPTIONAL) logical(lgl)

On entry, this optional argument has an effect only if the optional argument SING_VEC is also used with the value false.

In this case, if the optional argument GEN_P is used and is set to true, the orthogonal matrix P used to reduce MAT to bidiagonal form is generated on output of the subroutine in its argument V.

If this argument is set to false, the orthogonal matrix P is stored in factored form as products of elementary reflectors in the lower triangle of the array V.

See the description of BD_CMP2 subroutine for more details.

The default is true.

REORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, this optional argument has an effect only if the optional argument SING_VEC is also used with the value false.

In this case, if the optional argument REORTHO is set to true, reorthogonalization is performed (when needed) in the Ralha-Barlow one-sided bidiagonalization of MAT. If this optional logical argument is set to false, reorthogonalization is never performed, which can result in a loss of orthogonality if MAT has a large condition number.

See description of the BD_CMP2 subroutine for further details.

The default is REORTHO = true.

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT;

FAILURE_BD = true : indicates that MAT is nearly singular and some loss of orthogonality can be expected in the left orthogonal matrix in the bidiagonal reduction of MAT obtained by the one-sided Ralha-Barlow algorithm if SING_VEC=false is used in the call to SVD_CMP4.

See description of the BD_CMP2 subroutine for further details.

D (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The diagonal elements of the intermediate upper bidiagonal matrix B.

The size of D must be size( D ) = size( MAT, 2 ) = n .

E (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The off-diagonal elements of the intermediate upper bidiagonal matrix B:

E(i) = B(i-1,i) for i = 2,3,…,n;

E(1) is arbitrary.

The size of E must be size( E ) = size( MAT, 2 ) = n .
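
The storage convention for D and E can be made concrete by rebuilding the upper bidiagonal matrix B from them. The following Python sketch is only an illustration with a hypothetical helper name. Python lists are 0-based, so the Fortran elements E(2), …, E(n) correspond to e[1], …, e[n-1] and the arbitrary element E(1) corresponds to e[0].

```python
def bidiagonal_from_d_e(d, e):
    """Rebuild the n-by-n upper bidiagonal matrix B from its diagonal d
    and its superdiagonal stored in e[1:], with e[0] unused."""
    n = len(d)
    if len(e) != n:
        raise ValueError("d and e must have the same length")
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        b[i][i] = d[i]
        if i >= 1:
            b[i - 1][i] = e[i]  # Fortran: E(i) = B(i-1,i) for i = 2, ..., n
    return b

d = [1.0, 2.0, 3.0]
e = [99.0, 4.0, 5.0]  # the first entry is arbitrary and ignored
b = bidiagonal_from_d_e(d, e)
for row in b:
    print(row)
```

Keeping E the same length as D, with an unused first element, lets both arrays be declared with the size n of the second dimension of MAT.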

Further Details¶

Computing the SVD of an m-by-n matrix MAT, with m>=n, in subroutine SVD_CMP4 consists of four steps:

reduction of the rectangular matrix to bidiagonal form B via the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (1) and (2);

in place accumulation of the right orthogonal transformations used in the reduction of MAT to bidiagonal form B, see the references (3) and (4);

computation of the singular values and right singular vectors of MAT by applying the implicit QR algorithm to the bidiagonal matrix B, see the references (3) and (4);

computation and orthogonalization of the left singular vectors in the SVD of MAT to avoid the possible loss of orthogonality of the left orthogonal matrix in the bidiagonal factorization of MAT computed by the one-sided bidiagonalization algorithm, see the references (3) and (4).

This four-step algorithm computes numerically orthogonal left singular vectors of MAT even when MAT has a large condition number, although the left (orthogonal) matrix in the bidiagonal decomposition of MAT computed by the Ralha-Barlow one-sided bidiagonalization algorithm may not be numerically orthogonal if MAT is nearly singular or has a very large condition number (see references (1) and (2) for details).

If singular vectors are requested (i.e., if the optional logical argument SING_VEC is not used or is used and set to true) and max(m,n) is much larger than min(m,n), the rectangular matrix is first reduced to upper triangular form by a QR factorization and the above four-step algorithm is applied to the resulting triangular factor. The left singular vectors of the rectangular matrix are then obtained from those of the triangular factor by a back-transformation algorithm.

For further details on the SVD of a rectangular matrix and the different algorithms to compute it, see the references below.

The four or five steps of the SVD algorithm used here (i.e., preliminary QR or LQ factorization, the reduction to bidiagonal form B, in place accumulation of the orthogonal transformations, computation of the SVD of the bidiagonal matrix B and computation/orthogonalization of the left singular vectors in the SVD of MAT) are parallelized if OPENMP is used.

Optionally, the intermediate bidiagonal decomposition of MAT can be output by the subroutine if the optional logical argument SING_VEC is used with the value false and the optional arguments D and E are also specified. Note, however, that in that case the left (orthogonal) matrix in the bidiagonal decomposition of MAT (computed by the Ralha-Barlow one-sided bidiagonalization algorithm) may not be numerically orthogonal if MAT is nearly singular or has a very large condition number (see references (1) and (2) for details). To partly correct this deficiency, partial reorthogonalization can be performed to ensure orthogonality of this left matrix at the expense of speed of computation. The reorthogonalization uses the Gram-Schmidt method described in the reference (5) and is activated if the optional logical argument REORTHO is set to true.

When the optional logical argument SING_VEC is used with the value false, the singular values of the bidiagonal form B of MAT are computed by the more accurate and faster dqds algorithm. See references (6) and (7) for a description of the dqds algorithm.

For more details, see:

(1) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(2) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

(3) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(4) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

(5) Stewart, G.W., 2007:

Block Gram-Schmidt Orthogonalization. Report TR-4823, Department of Computer Science, College Park, University of Maryland.

(6) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(7) Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

`subroutine svd_cmp3 ( mat, s, failure, sort, maxiter, bisect, dqds, save_mat, reortho, failure_bd )`¶

Purpose¶

SVD_CMP3 computes the singular values of a real m-by-n matrix MAT. The singular value decomposition (SVD) is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative. The columns of U and V are, respectively, the left and right singular vectors of MAT.

The original matrix MAT is first reduced to upper or lower bidiagonal form B by an orthogonal transformation:

Q’ * MAT * P = B

where Q and P are orthogonal. The singular values SIGMA of the bidiagonal matrix B are then computed by the bidiagonal implicit QR algorithm (if BISECT=false and DQDS=false), the bisection method (if BISECT=true) or the dqds algorithm (if DQDS=true) applied to the bidiagonal matrix B.

The routine returns only the first min(m,n) singular values of MAT.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, the m-by-n matrix MAT is destroyed if m>=n and the optional argument SAVE_MAT is not used with the value true.

S (OUTPUT) real(stnd), dimension(:)

The singular values of MAT.

The size of S must verify: size( S ) = min(m,n) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the implicit QR, bisection or dqds algorithm used to compute the singular values of the bidiagonal form B of the input m-by-n matrix MAT did not converge and that full accuracy was not attained in the bidiagonal SVD of this intermediate bidiagonal form B of MAT. The sign of the incorrect singular values is set to negative.
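
When FAILURE is true, the sign convention above lets a caller separate reliable singular values from flagged ones. The following Python sketch shows one way to do that on a copy of the returned values; the helper name and the sample data are hypothetical and only illustrate the convention.

```python
def split_by_sign(s):
    """Separate singular values flagged as inaccurate (negative sign)
    from those that converged (non-negative)."""
    converged = [x for x in s if x >= 0.0]
    flagged = [abs(x) for x in s if x < 0.0]
    return converged, flagged

# Hypothetical output of a call that returned FAILURE = true.
s = [7.5, 3.2, -0.8, 0.1]
converged, flagged = split_by_sign(s)
print(converged, flagged)
```

A singular value that is exactly zero cannot carry a usable sign in this simple test, so checking FAILURE itself remains the primary indicator.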

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or into descending order if SORT = ‘D’ or ‘d’.

The default is to sort the singular values into descending order.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal implicit QR phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form B of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

This argument has no effect if BISECT is equal to true.

The default is 10.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values of the intermediate min(m,n)-by-min(m,n) bidiagonal matrix B are computed.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slower execution time.

If BISECT is set to false, singular values are computed with the bidiagonal implicit QR or dqds algorithm applied to the associated min(m,n)-by-min(m,n) bidiagonal matrix B.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values of the intermediate min(m,n)-by-min(m,n) bidiagonal matrix B are computed.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition. Moreover, the dqds algorithm is usually also faster than the bisection or bidiagonal implicit QR algorithms and is the default method used in SVD_CMP3 for computing singular values of the bidiagonal form B of the input matrix MAT.

If DQDS is set to false, singular values are computed with the bidiagonal implicit QR or bisection algorithm applied to the associated min(m,n)-by-min(m,n) bidiagonal matrix B.

If both optional arguments BISECT and DQDS are specified with the value true, the dqds algorithm is used.

If both optional arguments BISECT and DQDS are specified with the value false, the bidiagonal implicit QR algorithm is used.

The default is true.

SAVE_MAT (INPUT, OPTIONAL) logical(lgl)

On entry, if SAVE_MAT is set to true, the m-by-n matrix MAT is not modified by the routine.

The default is false.

REORTHO (INPUT, OPTIONAL) logical(lgl)

If the optional argument REORTHO is set to true, reorthogonalization is performed (when needed) in the Ralha-Barlow one-sided bidiagonalization of MAT. If this optional logical argument is set to false, reorthogonalization is never performed, which can result in a loss of orthogonality for the left (right) singular vectors of MAT if m>=n (m<n) when MAT has a large condition number. However, this loss of orthogonality is not detrimental to the accuracy of the singular values.

See description of the BD_CMP2 subroutine for further details.

The default is REORTHO = false.

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT.

FAILURE_BD = true : indicates that MAT is nearly singular.

Further Details¶

Computing the singular values of a rectangular matrix in subroutine SVD_CMP3 consists of two steps:

reduction of the rectangular matrix to bidiagonal form B via the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (1) and (2);

computation of the singular values of the min(m,n)-by-min(m,n) bidiagonal matrix B by a bidiagonal implicit QR algorithm (if BISECT=false and DQDS=false), a dqds algorithm (if DQDS=true) or a bisection algorithm (if BISECT=true). See the references (3), (4), (5) and (6) for a description of these different algorithms.

Note that if max(m,n) is much larger than min(m,n), the rectangular matrix is first reduced to upper or lower triangular form by a QR or LQ factorization and the reduction algorithm is applied to the resulting triangular factor. The singular values of the rectangular matrix are then obtained from those of the triangular factor.
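
The reason the triangular factor can replace the rectangular matrix is that MAT = Q * R with orthonormal columns in Q implies MAT’ * MAT = R’ * R, so MAT and R have the same singular values. The following Python sketch checks this identity on a small tall matrix. It is an illustration only: it uses a modified Gram-Schmidt QR factorization as a simple stand-in for whatever QR or LQ factorization statpack actually uses, and all names are hypothetical.

```python
import math

def qr_mgs(a):
    """Thin QR factorization of an m-by-n matrix (m >= n, full column rank)
    by modified Gram-Schmidt. Returns (q, r) with q m-by-n and r n-by-n."""
    m, n = len(a), len(a[0])
    q = [row[:] for row in a]  # columns are orthonormalized in place
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j):
            r[k][j] = sum(q[i][k] * q[i][j] for i in range(m))
            for i in range(m):
                q[i][j] -= r[k][j] * q[i][k]
        r[j][j] = math.sqrt(sum(q[i][j] ** 2 for i in range(m)))
        for i in range(m):
            q[i][j] /= r[j][j]
    return q, r

def gram(x):
    """Return X'X for a matrix stored as a list of rows."""
    rows, cols = len(x), len(x[0])
    return [[sum(x[i][p] * x[i][c] for i in range(rows)) for c in range(cols)]
            for p in range(cols)]

# A tall 6-by-2 matrix: the SVD work can be done on its 2-by-2 factor R.
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0],
     [7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
q, r = qr_mgs(a)

diff = max(abs(x - y)
           for row_a, row_r in zip(gram(a), gram(r))
           for x, y in zip(row_a, row_r))
print(diff)
```

The bidiagonal reduction then operates on a min(m,n)-by-min(m,n) matrix instead of the full rectangular one, which is where the saving comes from when max(m,n) is much larger than min(m,n).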

For further details on the SVD of a rectangular matrix and the algorithms to compute it, see the references below.

The different steps of the SVD algorithm used here (i.e., preliminary QR or LQ factorization, reduction to bidiagonal form, computation of the singular values of the bidiagonal form) are parallelized if OPENMP is used.

For more details, see:

(1) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(2) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

(3) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(4) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

(5) Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., Volume 67, No 2, pp. 191-229.

(6) Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

`subroutine svd_cmp5 ( mat, s, failure, v, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds, failure_bd )`¶

Purpose¶

SVD_CMP5 computes the Singular Value Decomposition (SVD) of a real m-by-n matrix MAT. The SVD is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative. The columns of U and V are, respectively, the left and right singular vectors of MAT.

The routine returns the first min(m,n) singular values and the associated left and right singular vectors.

MAT (or MAT’ if m<n) is first reduced to bidiagonal form B with the help of the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (1) and (2).

The singular values and right (left if m<n) singular vectors of B are then computed by the bidiagonal implicit QR algorithm applied to B, see the references (3) and (4).

The singular vectors of MAT are finally computed by a back transformation algorithm and an orthogonalization step for the left (right if m<n) singular vectors.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is overwritten with the first min(m,n) columns of U, the left singular vectors.

S (OUTPUT) real(stnd), dimension(:)

The singular values of MAT.

The size of S must verify: size( S ) = min(m,n) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the bidiagonal SVD algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form B of MAT.

V (OUTPUT) real(stnd), dimension(:,:)

On exit, V contains the first min(m,n) columns of V, the right singular vectors.

The shape of V must verify:

size( V, 1 ) = n,

size( V, 2 ) = min(m,n).

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or into descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.

The default is to sort the singular values and vectors into descending order.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form B of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm.

MAX_FRANCIS_STEPS is a strictly positive integer, otherwise the default value is used.

The default is the minimum of min(m,n) and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B or MAT.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm applied to the associated symmetric tridiagonal matrix B * B’ whose eigenvalues are the squares of the singular values of B or MAT.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT.

FAILURE_BD = true : indicates that MAT is nearly singular.

Further Details¶

Computing the SVD of an m-by-n matrix MAT, with m>=n, in subroutine SVD_CMP5 consists of four steps:

reduction of the rectangular matrix to bidiagonal form B via the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (1) and (2);

in place accumulation of the right orthogonal transformations used in the reduction of MAT to bidiagonal form B, see the references (3) and (4);

computation of the singular values and right singular vectors of MAT by applying the implicit QR algorithm to the bidiagonal matrix B, see the references (3) and (4);

computation and orthogonalization of the left singular vectors in the SVD of MAT to avoid the possible loss of orthogonality of the left orthogonal matrix in the bidiagonal factorization of MAT computed by the one-sided bidiagonalization algorithm, see the references (3) and (4).

This four-step algorithm computes numerically orthogonal left singular vectors of an m-by-n matrix MAT, with m>=n, even when MAT has a large condition number, although the left (orthogonal) matrix in the bidiagonal decomposition of MAT computed by the Ralha-Barlow one-sided bidiagonalization algorithm may not be numerically orthogonal if MAT is nearly singular or has a very large condition number (see references (1) and (2) for details).

If m<n, this four-step algorithm is applied to MAT’, instead of MAT, to get the SVD of MAT, and in that case it also computes numerically orthogonal right singular vectors of MAT.

Note that if max(m,n) is much larger than min(m,n), the rectangular matrix is first reduced to upper or lower triangular form by a QR or LQ factorization and the four-step reduction algorithm is applied to the resulting triangular factor. The singular vectors of the rectangular matrix are then obtained from those of the triangular factor in the QR or LQ factorization by a back-transformation algorithm (see the references (3) and (4) for details).

For further details on the SVD of a rectangular matrix and the algorithms to compute it, see the references below.

The four or five steps of the SVD algorithm used here (i.e., preliminary QR or LQ factorization, reduction to bidiagonal form B, in place accumulation of the orthogonal transformations, computation of the SVD of the bidiagonal matrix B and reorthogonalization of the left (or right if m<n) singular vectors in the SVD of MAT) are parallelized if OPENMP is used.

For more details, see:

(1) Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

(2) Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

(3) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(4) Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

`subroutine svd_cmp6 ( mat, s, v, failure, sort, nsvd, maxiter, ortho, backward_sweep, scaling, initvec, failure_bd, failure_bisect )`¶

Purpose¶

SVD_CMP6 computes a full or partial Singular Value Decomposition (SVD) of a real m-by-n matrix MAT. The full SVD is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative. The columns of U and V are, respectively, the left and right singular vectors of MAT.

The routine can return the first min(m,n) singular values and the associated left and right singular vectors or a truncated SVD if the optional integer parameter NSVD is used in the call to SVD_CMP6.

MAT (or MAT’ if m<n) is first reduced to bidiagonal form B with the help of the Ralha-Barlow one-sided bidiagonalization algorithm without reorthogonalization, see the references (1) and (2).

The singular values and right (left if m<n) singular vectors of B are then computed by the bisection and inverse iteration methods applied to B and the tridiagonal matrix B’ * B, respectively, see the reference (3).

The singular vectors of MAT are finally computed by a back transformation algorithm and an orthogonalization step for the left (right if m<n) singular vectors.
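The bisection idea can be illustrated with a minimal pure-Python sketch. It is not the statpack implementation: for simplicity it assumes an upper bidiagonal B given by its diagonal d and superdiagonal e, forms the tridiagonal matrix T = B’ * B explicitly, and bisects on T with a Sturm count, whereas the routine applies bisection to B itself. All function names are hypothetical.

```python
import math

def tridiag_from_bidiag(d, e):
    """Diagonal a and off-diagonal b of T = B' * B for upper bidiagonal B."""
    n = len(d)
    a = [d[i] ** 2 + (e[i - 1] ** 2 if i > 0 else 0.0) for i in range(n)]
    b = [d[i] * e[i] for i in range(n - 1)]
    return a, b

def count_below(a, b, x):
    """Sturm count: number of eigenvalues of T strictly below x."""
    count = 0
    q = 1.0
    for i in range(len(a)):
        q = a[i] - x - (b[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = -1e-30  # assumption: nudge an exact zero pivot off zero
        if q < 0.0:
            count += 1
    return count

def bidiag_singular_values(d, e, nsvd, steps=100):
    """The nsvd largest singular values of B, in decreasing order."""
    a, b = tridiag_from_bidiag(d, e)
    n = len(a)
    # Gershgorin bound: every eigenvalue of T lies in [0, upper].
    upper = max(
        a[i]
        + (abs(b[i - 1]) if i > 0 else 0.0)
        + (abs(b[i]) if i < n - 1 else 0.0)
        for i in range(n)
    )
    values = []
    for k in range(nsvd):
        j = n - 1 - k  # 0-based index of the wanted eigenvalue, ascending order
        lo, hi = 0.0, upper
        for _ in range(steps):
            mid = 0.5 * (lo + hi)
            if count_below(a, b, mid) > j:
                hi = mid  # more than j eigenvalues below mid: target is below mid
            else:
                lo = mid
        values.append(math.sqrt(0.5 * (lo + hi)))
    return values
```

Because each singular value is located independently of the others, this approach lends itself to computing only the leading nsvd values, which is the situation described for the optional argument NSVD.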

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is overwritten with the first nsvd columns of U, i.e., the left singular vectors associated with the nsvd largest singular values of MAT.

S (OUTPUT) real(stnd), dimension(:), pointer

On exit, S(:) contains estimates of the first nsvd largest singular values of MAT. The singular values are given in decreasing order and are positive or zero.

The status of the pointer S must not be undefined on entry. If, on entry, the pointer S is already allocated, it is first deallocated and then reallocated with the correct size.

On exit, the size of the pointer S satisfies:

size( S ) = nsvd <= min( size(MAT,1) , size(MAT,2) ) = min(m,n) .

V (OUTPUT) real(stnd), dimension(:,:), pointer

On exit, the computed first nsvd columns of V, i.e., the right singular vectors associated with the nsvd largest singular values of MAT.

The right singular vector associated with the singular value S(j) is stored in the j-th column of V.

The status of the pointer V must not be undefined on entry. If, on entry, the pointer V is already allocated, it is first deallocated and then reallocated with the correct shape.

On exit, the shape of the pointer V satisfies:

size( V, 1 ) = size( MAT, 2 ) = n ,

size( V, 2 ) = size( S ) = nsvd .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false indicates successful exit.

FAILURE = true indicates that some singular vectors failed to converge in MAXITER inverse iterations.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or into descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.

The default is to sort the singular values and vectors into descending order.

NSVD (INPUT/OUTPUT, OPTIONAL) integer(i4b)

On entry, NSVD specifies the number of the top singular triplets which are requested.

On exit, NSVD is the number of singular triplets which have been computed by the subroutine, which can be greater than the requested number if multiple singular values at index NSVD make unique selection impossible.

On entry, NSVD must be greater than 0 and less than or equal to min(m,n).

The default is NSVD = min( size(MAT,1) , size(MAT,2) ) = min(m,n).
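The way NSVD can grow on exit may be pictured with the following pure-Python sketch. The tie rule and the tolerance used here are assumptions made for illustration only; the criterion actually used by SVD_CMP6 is not specified in this description, and the function name is hypothetical.

```python
def effective_nsvd(s_sorted, nsvd, rel_tol=1e-12):
    """Extend nsvd while the next singular value ties with the nsvd-th one.

    s_sorted holds singular values in decreasing order. Two values are treated
    as equal when they differ by at most rel_tol times the largest value
    (an assumed rule, not the library's).
    """
    tol = rel_tol * s_sorted[0]
    k = nsvd
    while k < len(s_sorted) and abs(s_sorted[k] - s_sorted[nsvd - 1]) <= tol:
        k += 1
    return k
```

For the values 5, 3, 3, 3, 1 and a request for 2 triplets, the second value is a triple singular value, so 4 triplets are reported; for 5, 3, 2 the requested number is returned unchanged.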

MAXITER (INPUT, OPTIONAL) integer(i4b)

The number of inverse iterations performed for computing singular vectors.

By default, 2 inverse iterations are performed for all the singular vectors.
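Why a small number of inverse iterations is often sufficient can be seen in a minimal pure-Python sketch. It assumes a small dense symmetric matrix T (standing in for B’ * B), an accurate shift taken as the square of a computed singular value, and a plain Gaussian elimination solver; the function names are hypothetical and the code is not the statpack implementation.

```python
import math

def solve(mat, rhs):
    """Solve mat * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def inverse_iteration(t, shift, start, maxiter=2):
    """Approximate the unit eigenvector of t whose eigenvalue is nearest shift."""
    n = len(t)
    shifted = [[t[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = start[:]
    for _ in range(maxiter):
        v = solve(shifted, v)  # one inverse iteration step
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v
```

With a shift that agrees with the eigenvalue to about ten digits, each solve amplifies the wanted eigenvector component by many orders of magnitude, so the default of 2 iterations is plausible when the shift is accurate and the singular value is well separated.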

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, all the singular vectors are orthogonalized by the Modified Gram-Schmidt or QR algorithm in the inverse iteration algorithm;

ORTHO=false, the singular vectors are not orthogonalized by the Modified Gram-Schmidt or QR algorithm in the inverse iteration algorithm.

The default is to orthogonalize the singular vectors only for singular values that are not well separated.

BACKWARD_SWEEP (INPUT, OPTIONAL) logical(lgl)

On entry, if:

BACKWARD_SWEEP=true and the singular vectors are orthogonalized by the modified Gram-Schmidt algorithm in the inverse iteration algorithm, a backward sweep of the modified Gram-Schmidt algorithm is also performed;

BACKWARD_SWEEP=false, a backward sweep is not performed.

The default is not to perform a backward sweep of the modified Gram-Schmidt algorithm.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix B and tridiagonal matrix B’ * B are scaled before computing the singular values and vectors, respectively;

SCALING=false, the bidiagonal matrix B and tridiagonal matrix B’ * B are not scaled.

The default is to scale the bidiagonal and tridiagonal matrices.

INITVEC (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INITVEC=true, Fernando vectors are used to start the inverse iteration process for computing the singular vectors;

INITVEC=false, random uniform starting vectors are used.

For unreduced tridiagonal matrices B’ * B, the default is to use Fernando starting vectors if the eigenvalues (i.e., the squares of the singular values) are well separated and random uniform starting vectors otherwise.

For reduced tridiagonal matrices B’ * B, the default is to use random uniform starting vectors.

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT;

FAILURE_BD = true : indicates that MAT is nearly singular.

FAILURE_BISECT (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BISECT = false : indicates successful exit and the bisection algorithm converged for all the computed singular values to the desired accuracy;

FAILURE_BISECT = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic.

Further Details¶

Computing the SVD of an m-by-n matrix MAT, with m>=n, in subroutine SVD_CMP6 consists of four steps:

reduction of the rectangular matrix to bidiagonal form B via the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (1) and (2);

computation of the singular values and right singular vectors of B by applying the bisection and inverse iteration algorithms to the matrices B and B’ * B, respectively, see the reference (3);

computation of the right singular vectors of MAT from those of B by a back-transformation algorithm, see the references (3) and (4);

computation and orthogonalization of the left singular vectors in the SVD of MAT to avoid the possible loss of orthogonality of the left orthogonal matrix in the bidiagonal factorization of MAT computed by the one-sided bidiagonalization algorithm, see the references (1) and (2).

This four-step algorithm computes numerically orthogonal left singular vectors of an m-by-n matrix MAT, with m>=n, even when MAT has a large condition number, although the left (orthogonal) matrix in the bidiagonal decomposition of MAT (computed by the Ralha-Barlow one-sided bidiagonalization algorithm) may not be numerically orthogonal when MAT is nearly singular or has a very large condition number (see references (1) and (2) for details).

If m<n, this four-step algorithm is applied to MAT’ instead of MAT to get the SVD of MAT, and in that case it also computes numerically orthogonal right singular vectors of MAT.
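The last step can be sketched as follows in pure Python, under stated assumptions: the left singular vectors are formed as MAT * v_j / s_j from already computed right singular vectors and strictly positive singular values, and are then orthogonalized by Modified Gram-Schmidt. This is only one plausible reading of the step; the function name is hypothetical and the actual statpack procedure may differ.

```python
import math

def left_vectors(mat, v_cols, s):
    """Columns u_j = MAT * v_j / s_j, orthogonalized by Modified Gram-Schmidt.

    mat is a list of rows, v_cols a list of right singular vectors and s the
    matching singular values, all assumed strictly positive.
    """
    m = len(mat)
    u_cols = []
    for v, sigma in zip(v_cols, s):
        u = [sum(mat[i][k] * v[k] for k in range(len(v))) / sigma
             for i in range(m)]
        for q in u_cols:  # remove components along earlier vectors
            proj = sum(a * b for a, b in zip(q, u))
            u = [a - proj * b for a, b in zip(u, q)]
        norm = math.sqrt(sum(a * a for a in u))
        u_cols.append([a / norm for a in u])
    return u_cols
```

The explicit orthogonalization is what protects the result when the vectors MAT * v_j / s_j alone would lose orthogonality, for instance for tiny singular values.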

Note that if max(m,n) is much larger than min(m,n), the rectangular matrix is first reduced to upper or lower triangular form by a preliminary QR or LQ factorization and the four-step reduction algorithm described above is applied to the resulting triangular factor. The singular vectors of the original rectangular matrix are then obtained from those of the triangular factor in the QR or LQ factorization by a back-transformation algorithm (see the references (3) and (4) for details).

For further details on the SVD of a rectangular matrix and the algorithms to compute it, see the references below.

The four or five steps of the SVD algorithm used here (i.e., preliminary QR or LQ factorization, reduction to bidiagonal form B, computation of the singular values and right (or left if m<n) singular vectors of B, computation of the right (or left if m<n) singular vectors of MAT and orthogonalization of the left (or right if m<n) singular vectors in the SVD of MAT) are parallelized if OPENMP is used.

For more details, see:

Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

`subroutine svd_cmp7 ( mat, s, u, v, failure, sort, maxiter, ortho, backward_sweep, scaling, initvec, failure_dqds )`¶

Purpose¶

SVD_CMP7 computes a full or partial Singular Value Decomposition (SVD) of a real m-by-n matrix MAT. The full SVD is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative. The columns of U and V are, respectively, the left and right singular vectors of MAT.

The routine returns the first min(m,n) singular values and the associated left and right singular vectors (i.e., the left and right singular vectors of MAT in the thin SVD of MAT) or a truncated partial SVD at the user's option.

MAT is first reduced to bidiagonal form B with a one-stage or two-stage algorithm, depending on its shape, see Further Details below and the references (1) and (2).

The singular values and vectors of B are then computed by the dqds and inverse iteration methods applied to B, see the references (1), (3) and (4).

The singular vectors of MAT are finally computed by a blocked back-transformation algorithm.
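The core recurrence of the dqds method can be illustrated with a minimal pure-Python sketch. It is deliberately simplified and is not the statpack implementation: it uses the zero-shift variant (often called dqd), has no deflation or convergence test, and assumes an upper bidiagonal B with nonzero diagonal and superdiagonal entries. Function names are hypothetical.

```python
import math

def dqd_sweep(q, e):
    """One zero-shift dqd transform of the arrays q_i = d_i^2 and e_i = e_i^2."""
    n = len(q)
    q_new = [0.0] * n
    e_new = [0.0] * (n - 1)
    d = q[0]
    for i in range(n - 1):
        q_new[i] = d + e[i]
        t = q[i + 1] / q_new[i]
        e_new[i] = e[i] * t
        d = d * t
    q_new[n - 1] = d
    return q_new, e_new

def singular_values_dqd(diag, offdiag, sweeps=100):
    """Singular values of upper bidiagonal B, in decreasing order."""
    q = [x * x for x in diag]
    e = [x * x for x in offdiag]
    for _ in range(sweeps):  # fixed sweep count: an assumption of this sketch
        q, e = dqd_sweep(q, e)
    # As the e_i tend to zero, the q_i tend to the squared singular values.
    return sorted((math.sqrt(x) for x in q), reverse=True)
```

Each sweep uses only additions, multiplications and divisions of positive quantities, which is the reason this family of algorithms is valued for its accuracy on small singular values. Practical dqds codes add shifts to accelerate the slow linear convergence of this unshifted version.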

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is overwritten as it is used as workspace inside the subroutine.

S (OUTPUT) real(stnd), dimension(:)

The singular values of MAT.

The size of S must verify: size( S ) = min(m,n) .

U (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed first nsvd columns of U, i.e., the left singular vectors associated with the nsvd largest singular values of MAT.

The left singular vector associated with the singular value S(j) is stored in the j-th column of U.

The shape of U must verify:

size( U, 1 ) = size( MAT, 1 ) = m ,

size( U, 2 ) = nsvd <= size( S ) = min(m,n) .

V (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed first nsvd columns of V, i.e., the right singular vectors associated with the nsvd largest singular values of MAT.

The right singular vector associated with the singular value S(j) is stored in the j-th column of V.

The shape of V must verify:

size( V, 1 ) = size( MAT, 2 ) = n ,

size( V, 2 ) = nsvd <= size( S ) = min(m,n) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false indicates successful exit.

FAILURE = true indicates that the dqds algorithm failed to converge for some singular values or that some singular vectors failed to converge in MAXITER inverse iterations. If the dqds algorithm failed to compute some singular values, the incorrect singular values in S are given a negative sign and the arguments U and V are filled with quiet NaNs on exit.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or into descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.

The default is to sort the singular values and vectors into descending order.

MAXITER (INPUT, OPTIONAL) integer(i4b)

The number of inverse iterations performed for computing singular vectors.

By default, 2 inverse iterations are performed for all the singular vectors.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, all the singular vectors are orthogonalized by the Modified Gram-Schmidt or QR algorithm in the inverse iteration algorithm;

ORTHO=false, the singular vectors are not orthogonalized by the Modified Gram-Schmidt or QR algorithm in the inverse iteration algorithm.

The default is to orthogonalize the singular vectors only for singular values that are not well separated.

BACKWARD_SWEEP (INPUT, OPTIONAL) logical(lgl)

On entry, if:

BACKWARD_SWEEP=true and the singular vectors are orthogonalized by the modified Gram-Schmidt algorithm in the inverse iteration algorithm, a backward sweep of the modified Gram-Schmidt algorithm is also performed;

BACKWARD_SWEEP=false, a backward sweep is not performed.

The default is not to perform a backward sweep of the modified Gram-Schmidt algorithm.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix B is scaled before computing the singular values and vectors;

SCALING=false, the bidiagonal matrix B is not scaled.

The default is to scale the bidiagonal matrix.

INITVEC (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INITVEC=true, Fernando vectors are used to start the inverse iteration process for computing the singular vectors of the bidiagonal matrix B (i.e., the eigenvectors of the associated Golub-Kahan tridiagonal matrix);

INITVEC=false, random uniform starting vectors are used.

The default is to use Fernando starting vectors if the singular values are well-separated and the Golub-Kahan form of the input bidiagonal matrix is unreduced, and random uniform starting vectors otherwise.

FAILURE_DQDS (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_DQDS = false : indicates successful exit and the dqds algorithm converged for all the computed singular values to the desired accuracy;

FAILURE_DQDS = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. In that case, U and V are filled with quiet NaNs and the incorrect singular values in S are given a negative sign on exit.

Further Details¶

Computing the SVD of an m-by-n matrix MAT in subroutine SVD_CMP7 consists of three steps:

reduction of the rectangular matrix to bidiagonal form B via orthogonal transformations (e.g. Householder transformations);

computation of the SVD of the bidiagonal matrix B with the dqds algorithm for the singular values and inverse iteration for all or only the leading singular vectors, at the user's option;

computation of the singular vectors of MAT from those of its bidiagonal form B by a blocked back-transformation algorithm, see the references (1) and (2).

Note that if max(m,n) is much larger than min(m,n), the rectangular matrix is first reduced to upper or lower triangular form by a preliminary QR or LQ factorization and the three-step reduction algorithm described above is applied to the resulting triangular factor. The singular vectors of the original rectangular matrix are then obtained from those of the triangular factor in the QR or LQ factorization by another blocked back-transformation algorithm.

Concerning the inverse iteration algorithm used here, note that a first estimate of the singular vectors is computed by the Fernando method applied to the tridiagonal Golub-Kahan matrix associated with the bidiagonal matrix B (see the references (5) and (6) for details) for the singular values which are well-separated and if the Golub-Kahan form of the input bidiagonal matrix is unreduced. For the other singular values, a random start is used as a first estimate of the singular vectors as in the standard inverse-iteration algorithm.

The singular vectors of B are then computed or refined using inverse iteration on the tridiagonal Golub-Kahan matrix for all the singular values at one step.

By default, the singular vectors of B are then orthogonalized by the Modified Gram-Schmidt or QR algorithm only if the singular values are not well-separated.
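The Golub-Kahan tridiagonal matrix mentioned above can be made concrete with a short pure-Python sketch, assuming an upper bidiagonal B with diagonal d and superdiagonal e. The associated matrix is 2n-by-2n, has a zero diagonal and off-diagonal entries d(1), e(1), d(2), e(2), ..., d(n), and its eigenvalues are plus and minus the singular values of B. The function names are hypothetical and this is not statpack code.

```python
def golub_kahan_offdiag(d, e):
    """Off-diagonal of the 2n-by-2n Golub-Kahan tridiagonal matrix of B.

    The diagonal of that matrix is identically zero, so the off-diagonal
    d(1), e(1), d(2), e(2), ..., d(n) describes it completely.
    """
    off = []
    for i in range(len(d)):
        off.append(d[i])
        if i < len(e):
            off.append(e[i])
    return off

def char_poly(off, x):
    """det(T_GK - x*I) by the three-term recurrence for tridiagonal matrices."""
    prev, cur = 1.0, -x  # determinants of the leading 0-by-0 and 1-by-1 blocks
    for b in off:
        prev, cur = cur, -x * cur - b * b * prev
    return cur
```

For d = (1, 1) and e = (1), the singular values of B are about 1.618 and 0.618, and char_poly vanishes (up to rounding) at these values and at their negatives. Working with this matrix avoids forming B’ * B, whose entries are squares of those of B.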

Except for the computation of the singular values by the dqds algorithm, all the steps of the SVD algorithm used here (i.e., preliminary QR or LQ factorization, reduction to bidiagonal form B, computation of the singular vectors of B, computation of the singular vectors of MAT by blocked back-transformation algorithms) are parallelized if OPENMP is used.

For more details, see:

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

Parlett, B.N., and Marques, O.A., 2000:

An implementation of the dqds algorithm (positive case). Linear Algebra Appl., Volume 309, pp. 217-259.

Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra Appl., Vol. 267, pp. 247-279.

`subroutine svd_cmp8 ( mat, s, u, v, failure, sort, maxiter, ortho, backward_sweep, scaling, initvec, reortho, failure_bd, failure_dqds )`¶

Purpose¶

SVD_CMP8 computes a full or partial Singular Value Decomposition (SVD) of a real m-by-n matrix MAT with m>=n. The full SVD is written:

MAT = U * SIGMA * V’

where SIGMA is an m-by-n matrix which is zero except for its n diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of SIGMA are the singular values of MAT; they are real and non-negative. The columns of U and V are, respectively, the left and right singular vectors of MAT.

The routine returns the first n singular values and the associated left and right singular vectors (i.e., the left and right singular vectors of MAT in the thin SVD of MAT) or a truncated partial SVD at the user's option.

MAT is first reduced to bidiagonal form B with the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (7) and (8). In addition, if m is much larger than n, a preliminary QR factorization is applied to MAT and the one-sided bidiagonalization algorithm is applied to the triangular factor of this QR factorization in order to speed up the computations.

The singular values and vectors of B are then computed by the dqds and inverse iteration methods applied to B, see the references (1), (3) and (4).

The singular vectors of MAT are finally computed by a blocked back-transformation algorithm.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is overwritten as it is used as workspace inside the subroutine.

S (OUTPUT) real(stnd), dimension(:)

The singular values of MAT.

The size of S must verify: size( S ) = min(m,n) .

U (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed first nsvd columns of U, i.e., the left singular vectors associated with the nsvd largest singular values of MAT.

The left singular vector associated with the singular value S(j) is stored in the j-th column of U.

The shape of U must verify:

size( U, 1 ) = size( MAT, 1 ) = m ,

size( U, 2 ) = nsvd <= size( S ) = min(m,n) .

V (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed first nsvd columns of V, i.e., the right singular vectors associated with the nsvd largest singular values of MAT.

The right singular vector associated with the singular value S(j) is stored in the j-th column of V.

The shape of V must verify:

size( V, 1 ) = size( MAT, 2 ) = n ,

size( V, 2 ) = nsvd <= size( S ) = min(m,n) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false indicates successful exit.

FAILURE = true indicates that the dqds algorithm failed to converge for some singular values or that some singular vectors failed to converge in MAXITER inverse iterations. If the dqds algorithm failed to compute some singular values, the incorrect singular values in S are given a negative sign and the arguments U and V are filled with quiet NaNs on exit.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or into descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.

The default is to sort the singular values and vectors into descending order.

MAXITER (INPUT, OPTIONAL) integer(i4b)

The number of inverse iterations performed for computing singular vectors.

By default, 2 inverse iterations are performed for all the singular vectors.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, all the singular vectors are orthogonalized by the Modified Gram-Schmidt or QR algorithm in the inverse iteration algorithm;

ORTHO=false, the singular vectors are not orthogonalized by the Modified Gram-Schmidt or QR algorithm in the inverse iteration algorithm.

The default is to orthogonalize the singular vectors only for singular values that are not well separated.

BACKWARD_SWEEP (INPUT, OPTIONAL) logical(lgl)

On entry, if:

BACKWARD_SWEEP=true and the singular vectors are orthogonalized by the modified Gram-Schmidt algorithm in the inverse iteration algorithm, a backward sweep of the modified Gram-Schmidt algorithm is also performed;

BACKWARD_SWEEP=false, a backward sweep is not performed.

The default is not to perform a backward sweep of the modified Gram-Schmidt algorithm.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix B is scaled before computing the singular values and vectors;

SCALING=false, the bidiagonal matrix B is not scaled.

The default is to scale the bidiagonal matrix.

INITVEC (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INITVEC=true, Fernando vectors are used to start the inverse iteration process for computing the singular vectors of the bidiagonal matrix B (i.e., the eigenvectors of the associated Golub-Kahan tridiagonal matrix);

INITVEC=false, random uniform starting vectors are used.

The default is to use Fernando starting vectors if the singular values are well-separated and the Golub-Kahan form of the input bidiagonal matrix is unreduced, and random uniform starting vectors otherwise.

REORTHO (INPUT, OPTIONAL) logical(lgl)

If the optional argument REORTHO is set to true, reorthogonalization is performed (when needed) in the Ralha-Barlow one-sided bidiagonalization of MAT. If this optional logical argument is set to false, reorthogonalization is never performed, which can result in a loss of orthogonality for the left singular vectors of MAT if MAT has a large condition number.

See description of the BD_CMP2 subroutine for further details.

The default is REORTHO = true.

FAILURE_BD (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_BD = false : indicates that maximum accuracy was obtained in the Ralha-Barlow one-sided bidiagonalization of MAT;

FAILURE_BD = true : indicates that MAT is nearly singular and some loss of orthogonality can be expected in the Ralha-Barlow bidiagonalization algorithm.

See description of the BD_CMP2 subroutine for further details.

FAILURE_DQDS (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE_DQDS = false : indicates successful exit and the dqds algorithm converged for all the computed singular values to the desired accuracy;

FAILURE_DQDS = true : indicates that some or all of the singular values failed to converge or were not computed. This is generally caused by unexpectedly inaccurate arithmetic. In that case, U and V are filled with quiet NaNs and the incorrect singular values in S are given a negative sign on exit.

Further Details¶

Computing the SVD of an m-by-n matrix MAT, with m>=n, in subroutine SVD_CMP8 consists of three steps:

reduction of the rectangular matrix to bidiagonal form B via the Ralha-Barlow one-sided bidiagonalization algorithm, see the references (7) and (8);

computation of the SVD of the bidiagonal matrix B with the dqds algorithm for the singular values and inverse iteration for all or only the leading singular vectors, at the user's option;

computation of the singular vectors of MAT from those of its bidiagonal form B by a blocked back-transformation algorithm, see the references (1) and (2).

Note that if m is much larger than n, the rectangular matrix is first reduced to upper triangular form by a preliminary QR factorization and the three-step reduction algorithm described above is applied to the resulting triangular factor. The singular vectors of the original rectangular matrix are then obtained from those of the triangular factor in the QR factorization by another blocked back-transformation algorithm.

Concerning the inverse iteration algorithm used here, note that a first estimate of the singular vectors is computed by the Fernando method applied to the tridiagonal Golub-Kahan matrix associated with the bidiagonal matrix B (see the references (5) and (6) for details) for the singular values which are well-separated and if the Golub-Kahan form of the input bidiagonal matrix is unreduced. For the other singular values, a random start is used as a first estimate of the singular vectors as in the standard inverse-iteration algorithm.

The singular vectors of B are then computed or refined using inverse iteration on the tridiagonal Golub-Kahan matrix for all the singular values at one step.

By default, the singular vectors of B are then orthogonalized by the Modified Gram-Schmidt or QR algorithm only if the singular values are not well-separated.

Except for the computation of the singular values by the dqds algorithm, all the steps of the SVD algorithm used here (i.e., preliminary QR factorization, reduction to bidiagonal form B, computation of the singular vectors of B, computation of the singular vectors of MAT by blocked back-transformation algorithms) are parallelized if OPENMP is used.

For more details, see:

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Lawson, C.L., and Hanson, R.J., 1974:

Solving Least Squares Problems. Prentice-Hall.

Parlett, B.N., and Marques, O.A., 2000:

An implementation of the dqds algorithm (positive case). Linear Algebra Appl., Volume 309, pp. 217-259.

Li, S., Gu, M., and Parlett, B. N., 2014:

An Improved DQDS Algorithm. SIAM Journal on Scientific Computing, Volume 36, No 3, C290-C308.

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra Appl., Vol. 267, pp. 247-279.

Barlow, J.L., Bosner, N., and Drmac, Z., 2005:

A new stable bidiagonal reduction algorithm. Linear Algebra Appl., No 397, pp. 35-84.

Bosner, N., and Barlow, J.L., 2007:

Block and Parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., Volume 29, No 3, pp. 927-953.

`subroutine rsvd_cmp ( mat, s, leftvec, rightvec, failure, niter, nover, ortho, extd_samp, rng_alg, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

RSVD_CMP computes approximations of the nsvd largest singular values and associated left and right singular vectors of a full m-by-n real matrix MAT using randomized power, subspace or block Krylov iterations.

nsvd is the target rank of the partial Singular Value Decomposition (SVD), which is sought, and is equal to the size of the output real vector argument S, i.e., nsvd = size( S ).
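The general shape of a randomized subspace iteration can be illustrated with the following pure-Python sketch. It is a simplified stand-in and not the statpack implementation: it returns only singular value estimates, orthonormalizes with Modified Gram-Schmidt, assumes the sampled matrices have full column rank, and replaces the final SVD phase by a small one-sided Jacobi method. All function names, the seed argument and the tolerances are assumptions of this sketch.

```python
import math
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def orthonormalize(cols):
    """Modified Gram-Schmidt on a list of vectors (assumed independent)."""
    basis = []
    for v in cols:
        v = v[:]
        for q in basis:
            proj = sum(x * y for x, y in zip(q, v))
            v = [x - proj * y for x, y in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis

def jacobi_row_norms(g, sweeps=30):
    """One-sided Jacobi: rotate rows of g until orthogonal, return their norms."""
    g = [row[:] for row in g]
    for _ in range(sweeps):
        for i in range(len(g)):
            for j in range(i + 1, len(g)):
                alpha = sum(x * x for x in g[i])
                beta = sum(x * x for x in g[j])
                gamma = sum(x * y for x, y in zip(g[i], g[j]))
                if abs(gamma) <= 1e-15 * math.sqrt(alpha * beta):
                    continue  # rows already orthogonal to working precision
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                g[i], g[j] = ([c * x - s * y for x, y in zip(g[i], g[j])],
                              [s * x + c * y for x, y in zip(g[i], g[j])])
    return sorted((math.sqrt(sum(x * x for x in row)) for row in g), reverse=True)

def randomized_singular_values(mat, nsvd, niter=5, nover=10, seed=0):
    """Estimates of the nsvd largest singular values of mat."""
    m, n = len(mat), len(mat[0])
    width = min(nsvd + nover, m, n)  # sample size: target rank plus oversampling
    rng = random.Random(seed)
    omega = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(n)]
    mat_t = transpose(mat)
    # q is a list of orthonormal m-vectors spanning the sampled range of mat.
    q = orthonormalize(transpose(matmul(mat, omega)))
    for _ in range(niter):  # subspace iterations with re-orthonormalization
        z = orthonormalize(transpose(matmul(mat_t, transpose(q))))
        q = orthonormalize(transpose(matmul(mat, transpose(z))))
    small = matmul(q, mat)  # projection Q' * MAT, a width-by-n matrix
    return jacobi_row_norms(small)[:nsvd]
```

The expensive operations are products with MAT and its transpose, while the dense SVD is only applied to the small projected matrix; this is what makes the approach attractive when nsvd is much smaller than min(m,n).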

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT. MAT is not modified by the routine.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(:) contains the top nsvd singular values of MAT. The singular values are given in decreasing order and are positive or zero.

The size of S must verify: size( S ) = nsvd <= min( size(MAT,1) , size(MAT,2) ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed nsvd top left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) = nsvd .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed nsvd top right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) = nsvd .

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit, if the optional logical argument FAILURE is present, a test of the accuracy of the computed singular triplets is performed and in that case:

FAILURE = false : indicates successful exit;

FAILURE = true : indicates that some singular values and vectors of MAT failed to converge in NITER iterations.

If FAILURE = true on exit, results are still useful, but some of the approximated singular triplets have a poor accuracy.

NITER (INPUT, OPTIONAL) integer(i4b)

The number of randomized power, subspace or block Krylov iterations performed in the subroutine for computing the top nsvd singular triplets. NITER must be non-negative.

By default, 5 randomized power, subspace or block Krylov iterations are performed.

NOVER (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the randomized power, subspace or block Krylov iterations for computing the top nsvd singular triplets.

NOVER must be non-negative and must satisfy the relationship:

NOVER + size( S ) <= min( size(MAT,1) , size(MAT,2) )

It is adjusted if necessary so that this relationship holds in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the randomized power, subspace or block Krylov iterations.

By default, the oversampling size is set to 10.
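The constraint on the oversampling size can be written as a one-line rule. The sketch below assumes that the adjustment is a simple clamp to the largest admissible value; how the routine actually adjusts NOVER is not detailed here, and the function name is hypothetical.

```python
def adjusted_nover(nover, nsvd, m, n):
    """Largest value not exceeding nover with nover + nsvd <= min(m, n)."""
    if nover < 0:
        raise ValueError("nover must be non-negative")
    if nsvd > min(m, n):
        raise ValueError("nsvd must not exceed min(m, n)")
    return min(nover, min(m, n) - nsvd)
```

For a 100-by-12 matrix and nsvd = 5, the default oversampling of 10 would be reduced to 7 under this rule, while for a 100-by-50 matrix it would be kept unchanged.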

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random Gaussian test matrix in the randomized SVD algorithm.

The possible values are:

RNG_ALG=1 : selects Marsaglia’s KISS random number generator;

RNG_ALG=2 : selects the fast Marsaglia’s KISS random number generator;

RNG_ALG=3 : selects L’Ecuyer’s LFSR113 random number generator;

RNG_ALG=4 : selects the Mersenne Twister random number generator;

RNG_ALG=5 : selects the maximally equidistributed Mersenne Twister random number generator;

RNG_ALG=6 : selects the extended precision version of Marsaglia’s KISS random number generator;

RNG_ALG=7 : selects the extended precision version of the fast Marsaglia’s KISS random number generator;

RNG_ALG=8 : selects the extended precision version of L’Ecuyer’s LFSR113 random number generator;

RNG_ALG=9 : selects the extended precision version of the Mersenne Twister random number generator;

RNG_ALG=10 : selects the extended precision version of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that, on exit, the current random number generator is not reset to its previous value before the call to RSVD_CMP.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, orthonormalization is carried out between each step of the power or block Krylov iterations, to avoid loss of accuracy due to rounding errors. This means that subspace iterations are used instead of power iterations;

ORTHO=false, orthonormalization is not performed.

The default is to use orthonormalization, i.e., ORTHO=true.

EXTD_SAMP (INPUT, OPTIONAL) logical(lgl)

The optional argument EXTD_SAMP determines whether extended sampling (i.e., block Krylov iterations) is used for computing the top nsvd singular triplets.

On entry, if:

EXTD_SAMP=true, block Krylov iterations are used;

EXTD_SAMP=false, power or subspace iterations are used.

The default is to use power or subspace iterations, i.e., EXTD_SAMP=false.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm, which is used in the last phase of the randomized algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

The optional argument MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm, which is used in the last phase of the randomized algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is the minimum of nsvd and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

The optional argument PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm, which is used in the last phase of the randomized algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

For a good introduction to randomized linear algebra, see the references (1) and (2).

The randomized power or subspace iteration was proposed in (3; see Algorithm 4.4) to compute an orthonormal matrix whose range approximates the range of MAT. An approximate partial SVD can then be computed using the aforementioned orthonormal matrix, see Algorithm 5.1 in (3).
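The two stages described above (a randomized range finder followed by an SVD of a small projected matrix) can be sketched in NumPy as follows. This is an illustrative sketch only, not the STATPACK implementation: the function name `rsvd_sketch` and its keyword defaults are assumptions, chosen to mirror the roles of the NITER, NOVER and ORTHO arguments.

```python
import numpy as np

def rsvd_sketch(a, nsvd, niter=2, nover=10, ortho=True, seed=0):
    """Approximate the top-nsvd singular triplets of `a` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    k = min(nsvd + nover, m, n)              # target rank plus oversampling
    # Stage A: orthonormal basis q whose range approximates range(a)
    y = a @ rng.standard_normal((n, k))      # Gaussian test matrix
    for _ in range(niter):
        if ortho:                            # subspace iteration: orthonormalize
            y, _ = np.linalg.qr(y)
        z = a.T @ y
        if ortho:
            z, _ = np.linalg.qr(z)
        y = a @ z                            # one application of a * a^T
    q, _ = np.linalg.qr(y)
    # Stage B: SVD of the small k-by-n matrix b = q^T a
    b = q.T @ a
    ub, s, vt = np.linalg.svd(b, full_matrices=False)
    return (q @ ub)[:, :nsvd], s[:nsvd], vt[:nsvd].T

# Demo on a matrix with known, rapidly decaying singular values 2^(-i)
rng = np.random.default_rng(1)
u0, _ = np.linalg.qr(rng.standard_normal((80, 50)))
v0, _ = np.linalg.qr(rng.standard_normal((50, 50)))
s0 = 2.0 ** -np.arange(50)
a = (u0 * s0) @ v0.T
u, s, v = rsvd_sketch(a, nsvd=5)
print(s)
```

With `ortho=False` the loop reduces to plain power iterations, which are cheaper per step but can lose the smaller singular directions to rounding errors; this is the trade-off that the ORTHO argument exposes.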

The randomized block Krylov iterations for computing an approximate partial SVD were proposed in (5; see Algorithm 2). See also the reference (1).

For further details on randomized linear algebra, computing low-rank matrix approximations and partial SVD using randomized power, subspace or block Krylov iterations, see:

(1) Martinsson, P.G., 2019:

Randomized methods for matrix computations. arXiv:1607.01649

(2) Erichson, N.B., Voronin, S., Brunton, S.L., and Kutz, J.N., 2019:

Randomized matrix decompositions using R. arXiv:1608.02148

(3) Halko, N., Martinsson, P.G., and Tropp, J.A., 2011:

Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53, 217-288.

(4) Gu, M., 2015:

Subspace iteration randomization and singular value problems. SIAM J. Sci. Comput., 37, A1139-A1173.

(5) Musco, C., and Musco, C., 2015:

Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS'15, pages 1396-1404, Cambridge, MA, USA, 2015. MIT Press.

(6) Li, H., Linderman, G.C., Szlam, A., Stanton, K.P., Kluger, Y., and Tygert, M., 2017:

Algorithm 971: An implementation of a randomized algorithm for principal component analysis. ACM Trans. Math. Softw. 43, 3, Article 28 (January 2017).

`subroutine rsvd_cmp_fixed_precision ( mat, relerr, s, leftvec, rightvec, failure_relerr, failure, niter, blk_size, maxiter_qb, ortho, reortho, niter_qb, rng_alg, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

RSVD_CMP_FIXED_PRECISION computes approximations of the top nsvd singular values and associated left and right singular vectors of a full m-by-n real matrix MAT using randomized power or subspace iterations.

nsvd is the target rank of the partial Singular Value Decomposition (SVD) that is sought. This partial SVD must have an approximation error that satisfies:

||MAT-rSVD||_F <= ||MAT||_F * relerr

where rSVD is the computed partial SVD approximation, || ||_F is the Frobenius norm and relerr is a prescribed accuracy tolerance for the relative error of the computed partial SVD approximation, specified in the input real argument RELERR.

In other words, nsvd is not known in advance and is determined in the subroutine. This explains why the output real array arguments S, LEFTVEC and RIGHTVEC, which contain the computed singular triplets of the partial SVD on exit, must be declared in the calling program as pointers.

On exit, nsvd is equal to the size of the output real pointer argument S, which contains the computed singular values, i.e., nsvd = size( S ), and the relative error in the Frobenius norm of the computed partial SVD approximation is output in argument RELERR.

RSVD_CMP_FIXED_PRECISION searches incrementally for the best (i.e., smallest) partial SVD approximation that fulfills the prescribed accuracy tolerance for the relative error. More precisely, the rank of the partial SVD approximation is increased in steps of BLK_SIZE until the prescribed accuracy tolerance is satisfied. The approximation is then improved and its rank adjusted precisely by additional subspace iterations (as specified by the optional integer argument NITER_QB) to obtain the smallest partial SVD approximation that satisfies the prescribed tolerance.

Note that the product of the two integer arguments BLK_SIZE and MAXITER_QB (see below for their meaning), BLK_SIZE*MAXITER_QB, determines the maximum allowable rank of the partial SVD approximation. In other words, the subroutine stops the search for the best (i.e., smallest) partial SVD approximation that fulfills the requested tolerance if the rank of this approximation would exceed BLK_SIZE*MAXITER_QB. In that case, the subroutine returns the current partial SVD approximation (with a rank less than or equal to BLK_SIZE*MAXITER_QB).

In all cases the relative error of the computed partial SVD approximation is output in argument RELERR.

Finally, if the optional logical argument FAILURE_RELERR is present, it is set to true if the computed partial SVD does not fulfill the requested relative error specified on entry in the argument RELERR, and to false otherwise.
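To make the target of the search concrete, the sketch below computes the smallest rank that meets a relative Frobenius-norm tolerance when the exact singular values are known. It relies on the standard identity ||MAT - MAT_k||_F^2 = sum of the squared singular values beyond the k-th, valid for the best rank-k approximation MAT_k. This is a deterministic reference for illustration, not the randomized procedure used by the subroutine; the function name `smallest_rank` is hypothetical.

```python
import numpy as np

def smallest_rank(sing_vals, relerr):
    """Return (k, rel) with k the smallest rank such that
    ||A - A_k||_F <= relerr * ||A||_F, and rel the attained relative error."""
    s2 = np.sort(np.asarray(sing_vals, dtype=float))[::-1] ** 2
    total = s2.sum()                                   # ||A||_F^2
    # tail[k] = sum of s_i^2 for i >= k (0-based) = ||A - A_k||_F^2
    tail = np.concatenate((np.cumsum(s2[::-1])[::-1], [0.0]))
    rel = np.sqrt(tail / total)
    k = int(np.argmax(rel <= relerr))                  # first rank meeting the tolerance
    return k, float(rel[k])

k, rel = smallest_rank([4.0, 2.0, 1.0, 0.5], relerr=0.25)
print(k, rel)
```

A randomized fixed-precision routine can only approach this rank from above, since its singular value estimates never exceed the exact ones.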

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

MAT is not modified by the routine.

RELERR (INPUT/OUTPUT) real(stnd)

On entry, the requested accuracy tolerance for the relative error of the computed partial SVD approximation.

The preset RELERR must be greater than 4*epsilon( RELERR ), less than one, and must satisfy:

RELERR >= 2 * sqrt( epsilon( RELERR )/RELERR )

If this is not the case, RELERR is forced to be greater than 2*sqrt( epsilon( RELERR )/RELERR ) to avoid loss of accuracy in the algorithm. See reference (6) for more details.

On exit, RELERR contains the relative error of the computed partial SVD approximation in the Frobenius norm:

RELERR = ||MAT-rSVD||_F / ||MAT||_F

S (OUTPUT) real(stnd), dimension(:), pointer

On exit, S(:) contains estimates of the top nsvd singular values of MAT. The singular values are given in decreasing order and are positive or zero.

The association status of the pointer S must not be undefined on entry. If, on entry, the pointer S is already allocated, it is first deallocated and then reallocated with the correct size.

On exit, the size of the pointer S satisfies:

size( S ) = nsvd <= min( size(MAT,1) , size(MAT,2) ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:), pointer

On exit, the computed nsvd top left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The association status of the pointer LEFTVEC must not be undefined on entry. If, on entry, the pointer LEFTVEC is already allocated, it is first deallocated and then reallocated with the correct shape.

On exit, the shape of the pointer LEFTVEC satisfies:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) = nsvd .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:), pointer

On exit, the computed nsvd top right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The association status of the pointer RIGHTVEC must not be undefined on entry. If, on entry, the pointer RIGHTVEC is already allocated, it is first deallocated and then reallocated with the correct shape.

On exit, the shape of the pointer RIGHTVEC satisfies:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) = nsvd .

FAILURE_RELERR (OUTPUT, OPTIONAL) logical(lgl)

If the optional logical argument FAILURE_RELERR is present, it is set on exit as follows:

FAILURE_RELERR = false : indicates successful exit and the computed partial SVD fulfills the requested relative error specified on entry in the argument RELERR;

FAILURE_RELERR = true : indicates that the computed partial SVD has a relative error larger than the requested relative error. This means either that the requested accuracy tolerance for the relative error is too small (i.e., RELERR < 2 * sqrt( epsilon( RELERR )/RELERR )), or that the input parameters BLK_SIZE and/or MAXITER_QB are too small, given the distribution of the singular values of MAT, and must be increased to fulfill the preset accuracy tolerance for the relative error of the partial SVD approximation.

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit, if the optional logical argument FAILURE is present, a test of the accuracy of the computed singular triplets is performed and in that case:

FAILURE = false : indicates successful exit;

FAILURE = true : indicates that some singular values and vectors of MAT failed to converge in NITER and NITER_QB power and subspace iterations.

If FAILURE = true on exit, results are still useful, but some of the approximated singular triplets have a poor accuracy.

NITER (INPUT, OPTIONAL) integer(i4b)

The number of randomized power or subspace iterations performed in the first phase of the randomized algorithm for computing the preliminary QB factorization.

NITER must be greater than or equal to zero.

By default, 1 randomized power or subspace iteration is performed.

BLK_SIZE (INPUT, OPTIONAL) integer(i4b)

On entry, the block size used in the randomized QB factorization, which is used in the first phase of the randomized SVD algorithm.

BLK_SIZE must be greater than or equal to one and less than min(m,n). It should usually be set to a value much smaller than min(m,n); the best choice also depends on the architecture of the computer.

By default, BLK_SIZE is set to min( 10, min(m,n) ).

MAXITER_QB (INPUT, OPTIONAL) integer(i4b)

MAXITER_QB controls the maximum number of allowed iterations in the randomized QB algorithm, which is used in the first phase of the randomized SVD algorithm.

MAXITER_QB must be greater than or equal to one and less than int( min(m,n)/BLK_SIZE ).

By default, MAXITER_QB is set to max( 1, int( min(m,n)/(4*BLK_SIZE) ) ).
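As a quick check of how the documented defaults interact, the following sketch evaluates the default BLK_SIZE, the default MAXITER_QB and the resulting maximum allowable rank BLK_SIZE*MAXITER_QB for a given matrix shape. The helper name `qb_defaults` is hypothetical; the formulas are the ones stated above, with Fortran integer division written as `//`.

```python
def qb_defaults(m, n, blk_size=None, maxiter_qb=None):
    """Return (blk_size, maxiter_qb, max_rank) using the documented defaults
    for any argument left unspecified."""
    mn = min(m, n)
    if blk_size is None:
        blk_size = min(10, mn)                      # default BLK_SIZE
    if maxiter_qb is None:
        maxiter_qb = max(1, mn // (4 * blk_size))   # default MAXITER_QB
    return blk_size, maxiter_qb, blk_size * maxiter_qb

print(qb_defaults(1000, 300))
print(qb_defaults(1000, 300, blk_size=20, maxiter_qb=10))
```

With the defaults, the search is therefore limited to roughly a quarter of min(m,n); if FAILURE_RELERR is returned as true, increasing MAXITER_QB (or BLK_SIZE) raises this cap.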

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, orthonormalization is carried out between each step of the power iterations, to avoid loss of accuracy due to rounding errors. This means that subspace iterations are used instead of power iterations;

ORTHO=false, orthonormalization is not performed.

The default is to use orthonormalization, i.e., ORTHO=true.

REORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

REORTHO=true, a reorthogonalization step is performed to avoid the loss of orthogonality in the Gram-Schmidt procedure, which is used in the randomized QB factorization;

REORTHO=false, a reorthogonalization step is not performed in the Gram-Schmidt procedure.

The default is to use a reorthogonalization step, i.e., REORTHO=true.

NITER_QB (INPUT, OPTIONAL) integer(i4b)

The number of subspace iterations performed in the last phase of the QB algorithm to improve the QB factorization and compute the top nsvd singular triplets of MAT.

NITER_QB must be greater than or equal to 0.

By default, 2 final subspace iterations are performed.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random Gaussian test matrix in the randomized SVD algorithm.

The possible values are:

ALG=1 : selects the Marsaglia’s KISS random number generator;

ALG=2 : selects the fast Marsaglia’s KISS random number generator;

ALG=3 : selects the L’Ecuyer’s LFSR113 random number generator;

ALG=4 : selects the Mersenne Twister random number generator;

ALG=5 : selects the maximally equidistributed Mersenne Twister random number generator;

ALG=6 : selects the extended precision of the Marsaglia’s KISS random number generator;

ALG=7 : selects the extended precision of the fast Marsaglia’s KISS random number generator;

ALG=8 : selects the extended precision of the L’Ecuyer’s LFSR113 random number generator;

ALG=9 : selects the extended precision of the Mersenne Twister random number generator;

ALG=10 : selects the extended precision of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that the generator selected here remains the current random number generator on exit: it is not reset to the generator that was in use before the call to RSVD_CMP_FIXED_PRECISION.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm, which is used in the last phase of the randomized algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

The optional argument MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm, which is used in the last phase of the randomized algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is the minimum of nsvd and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

The optional argument PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm, which is used in the last phase of the randomized algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

For a good introduction to randomized linear algebra, see the references (1) and (2).

The randomized power or subspace iteration was proposed in (3; see Algorithm 4.4) to compute an orthonormal matrix whose range approximates the range of MAT. An approximate partial SVD can then be computed using the aforementioned orthonormal matrix, see Algorithm 5.1 in (3).

Usually, the problem of low-rank matrix approximation falls into two categories:

the fixed-rank problem, where the rank parameter nsvd is given;

the fixed-precision problem, where we seek a partial SVD factorization, rSVD, of rank as small as possible such that

||MAT-rSVD||_F <= eps

where eps is a given accuracy tolerance.

RSVD_CMP_FIXED_PRECISION is dedicated to solving the fixed-precision problem. The fixed-rank problem can be solved by subroutine RSVD_CMP.

RSVD_CMP_FIXED_PRECISION uses an improved version of the “randQB_FP” algorithm described in the reference (6) to solve the fixed-precision problem.
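The general idea of a blocked, fixed-precision randomized QB factorization can be sketched in NumPy as below. This is a simplified illustration in the spirit of the approach in reference (6), not the improved variant implemented in STATPACK: the function name and defaults are assumptions, and the final NITER_QB refinement pass is omitted. The sketch grows an orthonormal basis Q by blocks of `blk_size` columns, tracks the error through the identity ||MAT - Q*B||_F^2 = ||MAT||_F^2 - ||B||_F^2 (valid when B = Q'*MAT), and finally truncates the SVD of B to the smallest rank meeting the tolerance.

```python
import numpy as np

def rand_qb_fixed_precision(a, relerr, blk_size=10, maxiter_qb=5,
                            niter=1, reortho=True, seed=0):
    """Hypothetical fixed-precision partial SVD via a blocked randomized QB."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    norm2 = np.linalg.norm(a, "fro") ** 2
    err2 = norm2                                  # running ||a - q b||_F^2
    q, b = np.zeros((m, 0)), np.zeros((0, n))
    for _ in range(maxiter_qb):
        if q.shape[1] + blk_size > min(m, n):     # basis cannot grow further
            break
        y = a @ rng.standard_normal((n, blk_size))
        qi, _ = np.linalg.qr(y - q @ (q.T @ y))   # new block, deflated against q
        for _ in range(niter):                    # subspace iterations on the residual
            z, _ = np.linalg.qr(a.T @ qi - b.T @ (q.T @ qi))
            qi, _ = np.linalg.qr(a @ z - q @ (b @ z))
        if reortho:                               # keep qi orthogonal to q
            qi, _ = np.linalg.qr(qi - q @ (q.T @ qi))
        bi = qi.T @ a
        q, b = np.hstack((q, qi)), np.vstack((b, bi))
        err2 -= np.linalg.norm(bi, "fro") ** 2
        if err2 <= relerr ** 2 * norm2:
            break
    ub, s, vt = np.linalg.svd(b, full_matrices=False)
    # Squared error of the rank-k truncation: ||a||_F^2 - sum of first k s_i^2
    resid2 = np.maximum(norm2 - np.cumsum(s ** 2), 0.0)
    ok = np.nonzero(resid2 <= relerr ** 2 * norm2)[0]
    k = int(ok[0]) + 1 if ok.size else s.size
    achieved = float(np.sqrt(resid2[k - 1] / norm2))
    return (q @ ub)[:, :k], s[:k], vt[:k].T, achieved, achieved > relerr

# Demo: singular values 2^(-i), so the exact rank-k relative error is about 2^(-k)
rng = np.random.default_rng(1)
u0, _ = np.linalg.qr(rng.standard_normal((80, 50)))
v0, _ = np.linalg.qr(rng.standard_normal((50, 50)))
a = (u0 * 2.0 ** -np.arange(50)) @ v0.T
u, s, v, achieved, failed = rand_qb_fixed_precision(a, relerr=1e-2)
print(s.size, achieved, failed)
```

Because the error is obtained by subtracting two nearly equal squared norms, it cannot be resolved reliably below roughly the square root of the machine epsilon, which is consistent with the lower bound imposed on RELERR.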

For further details on randomized linear algebra, computing low-rank matrix approximations, partial SVD using randomized power or subspace iterations or solving the fixed-precision problem, see:

(1) Martinsson, P.G., 2019:

Randomized methods for matrix computations. arXiv:1607.01649

(2) Erichson, N.B., Voronin, S., Brunton, S.L., and Kutz, J.N., 2019:

Randomized matrix decompositions using R. arXiv:1608.02148

(3) Halko, N., Martinsson, P.G., and Tropp, J.A., 2011:

Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53, 217-288.

(4) Gu, M., 2015:

Subspace iteration randomization and singular value problems. SIAM J. Sci. Comput., 37, A1139-A1173.

(5) Martinsson, P.G., and Voronin, S., 2016:

A randomized blocked algorithm for efficiently computing rank-revealing factorizations of matrices. SIAM J. Sci. Comput., 38:5, S485-S507.

(6) Yu, W., Gu, Y., and Li, Y., 2018:

Efficient randomized algorithms for the fixed-precision low-rank matrix approximation. SIAM J. Matrix Anal. Appl., 39:3, 1339-1359.

`subroutine reig_pos_cmp ( mat, eigval, eigvec, failure, niter, nover, ortho, extd_samp, use_nystrom, rng_alg, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

REIG_POS_CMP computes approximations of the neig largest eigenvalues and associated eigenvectors of a full n-by-n real symmetric positive semi-definite matrix MAT using randomized power, subspace or block Krylov iterations and, at the user option, the Nystrom method (see below for details).

neig is the target rank of the partial EigenValue Decomposition (EVD), which is sought, and is equal to the size of the output real vector argument EIGVAL, i.e., neig = size( EIGVAL ).

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the n-by-n symmetric positive semi-definite matrix MAT.

MAT is not modified by the routine.

EIGVAL (OUTPUT) real(stnd), dimension(:)

On exit, EIGVAL(:) contains estimates of the top neig eigenvalues of MAT. The eigenvalues are given in decreasing order of magnitude.

The size of EIGVAL must verify:

size( EIGVAL ) = neig <= size( MAT, 1 ) = size( MAT, 2 ) = n.

EIGVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed neig top eigenvectors. The eigenvector associated with the eigenvalue EIGVAL(j) is stored in the j-th column of EIGVEC.

The shape of EIGVEC must verify:

size( EIGVEC, 1 ) = size( MAT, 1 ) = size( MAT, 2 ) = n,

size( EIGVEC, 2 ) = size( EIGVAL ) = neig .

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit, if the optional logical argument FAILURE is present, a test of the accuracy of the computed partial EVD is performed and in that case:

FAILURE = false : indicates successful exit;

FAILURE = true : indicates that some of the computed eigenvalues and eigenvectors of MAT failed to converge in NITER iterations.

If FAILURE = true on exit, results are still useful, but some of the approximated eigenpairs have a poor accuracy.

NITER (INPUT, OPTIONAL) integer(i4b)

The number of randomized power, subspace or block Krylov iterations performed in the subroutine for computing the top neig eigenpairs. NITER must be greater than or equal to zero.

By default, 10 randomized power, subspace or block Krylov iterations are performed.

NOVER (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the randomized power, subspace or block Krylov iterations for computing the top neig eigenpairs.

NOVER must be greater than or equal to zero and must satisfy the relationship:

NOVER + size( EIGVAL ) <= size( MAT, 1 ) = size( MAT, 2 ) = n

NOVER is adjusted if necessary so that this relationship holds in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the randomized power, subspace or block Krylov iterations.

By default, the oversampling size is set to 10.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, orthonormalization is carried out between each step of the power iterations to avoid loss of accuracy due to rounding errors. This means that subspace iterations are used instead of power iterations;

ORTHO=false, orthonormalization is not performed.

The default is to use orthonormalization, i.e., ORTHO=true.

EXTD_SAMP (INPUT, OPTIONAL) logical(lgl)

The optional argument EXTD_SAMP determines whether extended sampling (i.e., block Krylov iterations) is used for computing the top neig eigenpairs.

On entry, if:

EXTD_SAMP=true, block Krylov iterations are used;

EXTD_SAMP=false, power or subspace iterations are used.

The default is to use power or subspace iterations, i.e., EXTD_SAMP=false.

USE_NYSTROM (INPUT, OPTIONAL) logical(lgl)

If the optional argument USE_NYSTROM is used and is set to:

true, the last step of the randomized algorithm is performed with the Nystrom method and an SVD decomposition;

false, an EVD decomposition is used in the final step of the randomized algorithm.

The default is to use the Nystrom method, i.e., USE_NYSTROM=true.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random Gaussian test matrix in the randomized EVD algorithm.

The possible values are:

ALG=1 : selects the Marsaglia’s KISS random number generator;

ALG=2 : selects the fast Marsaglia’s KISS random number generator;

ALG=3 : selects the L’Ecuyer’s LFSR113 random number generator;

ALG=4 : selects the Mersenne Twister random number generator;

ALG=5 : selects the maximally equidistributed Mersenne Twister random number generator;

ALG=6 : selects the extended precision of the Marsaglia’s KISS random number generator;

ALG=7 : selects the extended precision of the fast Marsaglia’s KISS random number generator;

ALG=8 : selects the extended precision of the L’Ecuyer’s LFSR113 random number generator;

ALG=9 : selects the extended precision of the Mersenne Twister random number generator;

ALG=10 : selects the extended precision of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that the generator selected here remains the current random number generator on exit: it is not reset to the generator that was in use before the call to REIG_POS_CMP.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm or in the QR phase of the EVD algorithm, which are used in the last phase of the randomized algorithm.

See the descriptions of subroutines SVD_CMP and EIG_CMP for further details about this optional argument.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

The optional argument MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm, which is used in the last phase of the randomized algorithm if the Nystrom method is used.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is the minimum of neig and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

The optional argument PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm, which is used in the last phase of the randomized EVD algorithm if the Nystrom method is used.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false or if the Nystrom method is not used.

If PERFECT_SHIFT and BISECT are both set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed EVD decomposition at the expense of a slightly slower execution time if the Nystrom method is used.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm if the Nystrom method is used and PERFECT_SHIFT is equal to true.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

For a good introduction to randomized linear algebra, see the references (1) and (2).

The randomized subspace iteration was proposed in (3; see Algorithm 4.4) to compute an orthonormal matrix whose range approximates the range of MAT. An approximate partial spectral decomposition can then be computed using the aforementioned orthonormal matrix, see Algorithm 5.3 in (3). Moreover, if the input matrix is positive semi-definite, an improved randomized algorithm exists, the Nystrom method, see Algorithm 5.5 in (3) and also references (1) and (5).

The Nystrom method will be selected in REIG_POS_CMP if the USE_NYSTROM argument is used with the value true (this is the default), otherwise the standard EVD algorithm will be used in the last step of the randomized algorithm. The Nystrom method provides more accurate results for positive (semi-)definite matrices.
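The Nystrom step can be sketched in NumPy as follows, along the lines of the procedure cited above: form B1 = MAT*Q and B2 = Q'*B1, compute a Cholesky factor of B2, build F from B1 and that factor, and take the SVD of F; the squared singular values of F estimate the eigenvalues. This is an illustrative sketch, not the STATPACK code: the function name and defaults are assumptions, and it presumes that B2 is numerically positive definite so that the Cholesky factorization succeeds.

```python
import numpy as np

def reig_nystrom_sketch(a, neig, niter=4, nover=10, seed=0):
    """Hypothetical Nystrom-based partial EVD of a symmetric PSD matrix `a`."""
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    k = min(neig + nover, n)
    # Range finder: subspace iterations (a is symmetric, so a^T = a)
    q, _ = np.linalg.qr(a @ rng.standard_normal((n, k)))
    for _ in range(niter):
        q, _ = np.linalg.qr(a @ q)
    b1 = a @ q                              # n-by-k
    b2 = q.T @ b1                           # k-by-k, symmetric positive definite
    b2 = 0.5 * (b2 + b2.T)                  # remove rounding asymmetry
    low = np.linalg.cholesky(b2)            # b2 = low @ low.T
    f = np.linalg.solve(low, b1.T).T        # f = b1 @ inv(low.T), so a ~= f @ f.T
    u, sig, _ = np.linalg.svd(f, full_matrices=False)
    return sig[:neig] ** 2, u[:, :neig]     # eigenvalues, eigenvectors

# Demo on a symmetric positive definite matrix with eigenvalues 2^(-i)
rng = np.random.default_rng(2)
v0, _ = np.linalg.qr(rng.standard_normal((60, 60)))
lam = 2.0 ** -np.arange(60)
a = (v0 * lam) @ v0.T
a = 0.5 * (a + a.T)
eigval, eigvec = reig_nystrom_sketch(a, neig=5)
print(eigval)
```

The alternative selected by USE_NYSTROM=false corresponds to replacing the last four steps by an eigenvalue decomposition of the small matrix B2 and multiplying its eigenvectors by Q.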

The randomized block Krylov iterations for computing an approximate partial EVD were proposed in (4; see Algorithm 2). See also the reference (1).

For further details on randomized linear algebra, computing a partial EVD decomposition using randomized power, subspace or block Krylov iterations, or the Nystrom method, see:

(1) Martinsson, P.G., 2019:

Randomized methods for matrix computations. arXiv:1607.01649

(2) Erichson, N.B., Voronin, S., Brunton, S.L., and Kutz, J.N., 2019:

Randomized matrix decompositions using R. arXiv:1608.02148

(3) Halko, N., Martinsson, P.G., and Tropp, J.A., 2011:

Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53, 217-288.

(4) Musco, C., and Musco, C., 2015:

Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS'15, pages 1396-1404, Cambridge, MA, USA, 2015. MIT Press.

(5) Li, H., Linderman, G.C., Szlam, A., Stanton, K.P., Kluger, Y., and Tygert, M., 2017:

Algorithm 971: An implementation of a randomized algorithm for principal component analysis. ACM Trans. Math. Softw. 43, 3, Article 28 (January 2017).

`subroutine rqr_svd_cmp ( mat, s, failure, v, random_qr, truncated_qr, rng_alg, blk_size, nover, nover_svd, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

RQR_SVD_CMP computes approximations of the nsvd largest singular values and associated left and right singular vectors of a full m-by-n real matrix MAT using a three-step procedure, which can be termed a QR-SVD algorithm:

first, a partial (or complete) QR factorization with column pivoting of MAT is computed;

in a second step, a Singular Value Decomposition (SVD) of the (permuted) upper triangular or trapezoidal (e.g., if n>m) factor, R, of this QR decomposition is computed. The singular values and right singular vectors of this SVD of R are also estimates of the singular values and right singular vectors of MAT;

finally, estimates of the associated left singular vectors of MAT are obtained by pre-multiplying the left singular vectors of R by the orthogonal matrix Q in the initial QR decomposition of MAT (or its first k columns if the QR factorization is only partial).

By default, a standard deterministic BLAS2 QR factorization with column pivoting is used in the first phase of the QR-SVD algorithm. However, if the optional logical argument RANDOM_QR is used with the value true, an alternate fast randomized partial QR factorization is used in the first phase of the QR-SVD algorithm.

Furthermore if, in addition, the optional logical argument TRUNCATED_QR is used with the value true, an even faster (but less accurate) randomized partial and truncated QR factorization is used in the first phase of the QR-SVD algorithm.

nsvd is the target rank of the partial SVD, which is sought, and is equal to the size of the output real vector argument S, i.e., nsvd = size( S ). If nsvd = min( size(MAT,1) , size(MAT,2) ), a full SVD of MAT is obtained with the same or higher accuracy than subroutines SVD_CMP or SVD_CMP2, provided that the optional logical argument RANDOM_QR is not used (or is set to false).
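The three steps of the QR-SVD procedure can be illustrated with the NumPy sketch below. It is a didactic sketch under stated assumptions, not the STATPACK implementation: the column-pivoted QR is done here with a simple modified Gram-Schmidt loop (a hypothetical helper `pivoted_qr`) rather than with Householder reflectors or a randomized scheme, and the argument `k` (number of QR steps, at least nsvd) is introduced only for the example.

```python
import numpy as np

def pivoted_qr(a, k):
    """k steps of modified Gram-Schmidt QR with column pivoting,
    so that a[:, perm] ~= q @ r (exactly, if k = number of columns)."""
    w = np.array(a, dtype=float)
    m, n = w.shape
    q, r, perm = np.zeros((m, k)), np.zeros((k, n)), np.arange(n)
    for j in range(k):
        # Pivot: bring the remaining column of largest norm to position j
        p = j + int(np.argmax(np.sum(w[:, j:] ** 2, axis=0)))
        w[:, [j, p]] = w[:, [p, j]]
        r[:, [j, p]] = r[:, [p, j]]
        perm[[j, p]] = perm[[p, j]]
        r[j, j] = np.linalg.norm(w[:, j])
        q[:, j] = w[:, j] / r[j, j]
        r[j, j + 1:] = q[:, j] @ w[:, j + 1:]
        w[:, j + 1:] -= np.outer(q[:, j], r[j, j + 1:])
    return q, r, perm

def qr_svd_sketch(a, nsvd, k=None):
    k = nsvd if k is None else k
    q, r, perm = pivoted_qr(a, k)                          # step 1: pivoted QR
    ur, s, vrt = np.linalg.svd(r, full_matrices=False)     # step 2: SVD of R (k-by-n)
    u = q @ ur                                             # step 3: left vectors of a
    v = np.empty((a.shape[1], k))
    v[perm, :] = vrt.T                                     # undo the column permutation
    return u[:, :nsvd], s[:nsvd], v[:, :nsvd]

rng = np.random.default_rng(3)
a = rng.standard_normal((40, 30))
u, s, v = qr_svd_sketch(a, nsvd=30)        # complete QR: full SVD of a
u10, s10, v10 = qr_svd_sketch(a, nsvd=10)  # partial QR: estimates only
print(s[:3], s10[:3])
```

With a complete factorization the result reproduces the SVD of the input up to rounding; with a partial factorization the singular values of R are lower estimates of those of the input, and their quality depends on how well the pivoting reveals the dominant columns.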

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, the top nsvd left singular vectors are stored in the first nsvd columns of MAT. The left singular vector associated with the singular value S(j) is stored in the j-th column of MAT. The other part of MAT is used as workspace in the algorithm and is destroyed on exit.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(:) contains estimates of the top nsvd singular values of MAT. The singular values are given in decreasing order and are positive or zero.

The size of S must satisfy:

size( S ) = nsvd <= min( size(MAT,1) , size(MAT,2) ) .

FAILURE (OUTPUT) logical(lgl)

On exit, if:

FAILURE = false : indicates successful exit in the SVD of the triangular factor R of the QR decomposition of MAT.

FAILURE = true : indicates that the SVD algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form B of the triangular factor R in the QR decomposition of MAT.

If on entry, RANDOM_QR=true and TRUNCATED_QR=true, a test of the accuracy of the randomized partial and truncated QR factorization used in the first phase is also performed. In that case:

FAILURE = false : indicates also that this randomized partial and truncated QR factorization seems accurate.

FAILURE = true : indicates that this randomized partial and truncated QR factorization is not accurate.

V (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed top nsvd right singular vectors of MAT. The right singular vector associated with the singular value S(j) is stored in the j-th column of V.

The shape of V must satisfy:

size( V, 1 ) = size( MAT, 2 ) = n ,

size( V, 2 ) = size( S ) = nsvd .

RANDOM_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if RANDOM_QR is used with the value true, a fast randomized partial QR factorization is used in the first phase of the QR-SVD algorithm.

By default, RANDOM_QR = false, i.e., a standard deterministic (partial) QR factorization with column pivoting is used in the first phase of the QR-SVD algorithm.

TRUNCATED_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if TRUNCATED_QR is used with the value true in addition to RANDOM_QR also set to true, a very fast (but less accurate) randomized partial and truncated QR factorization is used in the first phase of the QR-SVD algorithm.

By default, TRUNCATED_QR = false, i.e., a “standard” randomized (partial) QR factorization with column pivoting is used in the first phase of the QR-SVD algorithm if RANDOM_QR = true.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random gaussian matrix in the randomized (partial) QR phase of the QR-SVD algorithm if RANDOM_QR = true.

The possible values are:

RNG_ALG=1 : selects Marsaglia’s KISS random number generator;

RNG_ALG=2 : selects the fast version of Marsaglia’s KISS random number generator;

RNG_ALG=3 : selects L’Ecuyer’s LFSR113 random number generator;

RNG_ALG=4 : selects the Mersenne Twister random number generator;

RNG_ALG=5 : selects the maximally equidistributed Mersenne Twister random number generator;

RNG_ALG=6 : selects the extended precision version of Marsaglia’s KISS random number generator;

RNG_ALG=7 : selects the extended precision version of the fast Marsaglia’s KISS random number generator;

RNG_ALG=8 : selects the extended precision version of L’Ecuyer’s LFSR113 random number generator;

RNG_ALG=9 : selects the extended precision version of the Mersenne Twister random number generator;

RNG_ALG=10 : selects the extended precision version of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that, on exit, the current random number generator is not reset to the generator that was selected before the call to RQR_SVD_CMP.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

BLK_SIZE (INPUT, OPTIONAL) integer(i4b)

On entry, the block size used in the randomized partial QR phase of the QR-SVD algorithm if RANDOM_QR = true (and TRUNCATED_QR = false).

BLK_SIZE must be greater than or equal to one and less than min(m,n). It should usually be set to a value much smaller than min(m,n), the best choice depending also on the architecture of the computer.

See Further Details and the cited references for the meaning of the block size in the randomized (partial) QR algorithm.

By default, BLK_SIZE is set to min( BLKSZ_QR, min(m,n) ), where parameter BLKSZ_QR is the default block size for QR related algorithms specified in module Select_Parameters.

NOVER (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the randomized partial QR phase of the QR-SVD algorithm if RANDOM_QR = true.

NOVER must be non-negative and satisfy the relationships:

NOVER + BLK_SIZE <= size( MAT, 1 ) if TRUNCATED_QR = false;

NOVER + NOVER_SVD + size( S ) <= size( MAT, 1 ) if TRUNCATED_QR = true.

and is adjusted if necessary to verify these relationships in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the randomized (partial) QR algorithms.

By default, the oversampling size is set to:

10 if TRUNCATED_QR = false;

max( (NOVER_SVD+size(S))/2_i4b, 10 ) if TRUNCATED_QR = true.
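The default and the constraints on NOVER can be sketched as follows. This is an illustration in Python, not statpack code: the function names are hypothetical, and the text only says that NOVER "is adjusted if necessary", so clamping it to the largest admissible value is an assumption.

```python
def default_nover(nsvd, nover_svd=10, truncated_qr=False):
    """Default oversampling size, as quoted in the text above."""
    if truncated_qr:
        # Integer division mirrors (NOVER_SVD+size(S))/2_i4b for positive values.
        return max((nover_svd + nsvd) // 2, 10)
    return 10


def adjust_nover(nover, m, nsvd, blk_size, nover_svd=10, truncated_qr=False):
    """Clamp NOVER so the documented inequality holds (assumed adjustment rule)."""
    if truncated_qr:
        upper = m - nover_svd - nsvd   # NOVER + NOVER_SVD + size(S) <= m
    else:
        upper = m - blk_size           # NOVER + BLK_SIZE <= m
    return max(0, min(nover, upper))


print(default_nover(nsvd=30, truncated_qr=True))
print(adjust_nover(nover=10, m=35, nsvd=5, blk_size=32))
```

Here m stands for size( MAT, 1 ) and nsvd for size( S ).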

NOVER_SVD (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the SVD phase of the QR-SVD algorithm for computing the top nsvd singular triplets.

NOVER_SVD must be non-negative and satisfy the relationship:

NOVER_SVD + size( S ) <= min( size(MAT,1) , size(MAT,2) )

and is adjusted if necessary to verify this relationship in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the SVD phase of the QR-SVD algorithm.

By default, the oversampling size in the SVD phase is set to 10.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm, which is used in the last step of the QR-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

The optional argument MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm, which is used in the last step of the QR-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is the minimum of nsvd and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

The optional argument PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm, which is used in the last step of the QR-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed QR-SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.
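The interaction between PERFECT_SHIFT, BISECT and DQDS described above can be summarized by a small decision function. This is only a sketch of the documented precedence rules in Python. The function name and the returned labels are hypothetical and are not part of statpack.

```python
def singular_value_method(perfect_shift=True, bisect=False, dqds=False):
    """Return the method used to compute the singular values, or None when
    BISECT and DQDS have no effect (PERFECT_SHIFT equal to false)."""
    if not perfect_shift:
        return None
    if bisect:
        # Bisection takes precedence when both BISECT and DQDS are true.
        return "bisection"
    if dqds:
        return "dqds"
    # Default choice when both flags are false.
    return "Pal-Walker-Kahan"


print(singular_value_method())
print(singular_value_method(bisect=True, dqds=True))
```

With the documented defaults (PERFECT_SHIFT true, BISECT and DQDS false), the fast Pal-Walker-Kahan algorithm is selected.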

Further Details¶

The standard deterministic BLAS2 algorithm for computing a QR factorization with column pivoting is described in the reference (1). The randomized partial QR factorization with column pivoting used if the optional logical argument RANDOM_QR is present with the value true is described in the references (3), (4), (5) and (6). Finally, the randomized partial and truncated QR factorization with column pivoting used if both the optional logical arguments RANDOM_QR and TRUNCATED_QR are present with the value true is described in the reference (7). This algorithm is the fastest, but is less accurate than the randomized partial QR factorization with column pivoting described in the references (3), (4), (5) and (6).

For further details on computing low-rank matrix approximations from QR factorizations with column pivoting, and on the QR-SVD or randomized QR algorithms, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Chan, T.F., and Hansen, P.C., 1992:

Some applications of the rank revealing QR factorization. SIAM J. Sci. Statist. Comput., Volume 13, 727-741.

(3) Duersch, J.A., and Gu, M., 2017:

Randomized QR with column pivoting. SIAM J. Sci. Comput., Volume 39, C263-C291.

(4) Martinsson, P.G., Quintana-Orti, G., Heavner, N., and Van de Geijn, R., 2017:

Householder QR factorization with randomization for column pivoting (HQRRP). SIAM J. Sci. Comput., Volume 39, C96-C115.

(5) Xiao, J., Gu, M., and Langou, J., 2017:

Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations. IEEE 24th International Conference on High Performance Computing (HiPC), IEEE, 2017, 233-242.

(6) Duersch, J.A., and Gu, M., 2020:

Randomized projection for rank-revealing matrix factorizations and low-rank approximations. SIAM Review, Volume 62, Issue 3, 661-682.

(7) Mary, T., Yamazaki, I., Kurzak, J., Luszczek, P., Tomov, S., and Dongarra, J., 2015:

Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs. International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15).

`subroutine rqr_svd_cmp_fixed_precision ( mat, relerr, s, failure, v, random_qr, rng_alg, blk_size, nover, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

RQR_SVD_CMP_FIXED_PRECISION computes approximations of the nsvd largest singular values and associated left and right singular vectors of a full m-by-n real matrix MAT using a three-step procedure, which can be termed a QR-SVD algorithm:

first, a partial QR factorization with column pivoting of MAT is computed;

in a second step, a Singular Value Decomposition (SVD) of the (permuted) upper triangular or trapezoidal (e.g., if n>m) factor, R, of this QR decomposition is computed. The singular values and right singular vectors of this SVD of R are also estimates of the singular values and right singular vectors of MAT.

in a final step, estimates of the associated left singular vectors of MAT are obtained by pre-multiplying the left singular vectors of R by the orthogonal matrix Q of the initial QR decomposition (or its first k columns if the QR factorization is only partial).

By default, a standard deterministic BLAS2 QR factorization with column pivoting is used in the first phase of the QR-SVD algorithm. However, if the optional logical argument RANDOM_QR is used with the value true, an alternate fast randomized partial QR factorization is used in the first phase of the QR-SVD algorithm.

nsvd is the target rank of the partial Singular Value Decomposition (SVD), which is sought, and this partial SVD must have an approximation error which fulfills:

||MAT-rSVD||_F <= ||MAT||_F * relerr

where rSVD is the computed partial SVD approximation, || ||_F is the Frobenius norm and relerr is a prescribed accuracy tolerance for the relative error of the computed partial SVD approximation, which is specified in the input real argument RELERR.

In other words, nsvd is not known in advance and is determined in the subroutine. This explains why the output real array arguments S and V, which contain the computed singular values and associated right singular vectors of MAT on exit, must be declared in the calling program as pointers.

On exit, nsvd is equal to the size of the output real pointer argument S, which contains the computed singular values, i.e., nsvd = size( S ) and the relative error in the Frobenius norm of the computed partial SVD approximation is output in argument RELERR.

RQR_SVD_CMP_FIXED_PRECISION incrementally searches for the best (i.e., smallest) partial SVD approximation that fulfills the prescribed accuracy tolerance for the relative error. More precisely, the rank of the partial SVD approximation is increased progressively until the prescribed accuracy tolerance is satisfied, and is then adjusted in a final step to obtain the smallest partial SVD approximation that satisfies the prescribed tolerance.

In all cases the relative error of the computed partial SVD approximation is output in argument RELERR.
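The stopping criterion can be illustrated with a short Python sketch. It is not the procedure used by the subroutine, which works incrementally on a QR factorization and does not know the singular values in advance. It assumes instead that all exact singular values of MAT are available, and uses the fact that the squared Frobenius error of a rank-k truncated SVD is the sum of the squared discarded singular values. The function name is hypothetical, and the check on RELERR assumes double precision.

```python
import math
import sys


def smallest_rank(singular_values, relerr):
    """Return (k, err): the smallest rank k whose truncation error meets relerr.

    singular_values must hold all singular values in decreasing order.
    err is ||MAT - rSVD||_F / ||MAT||_F for the rank-k truncated SVD.
    """
    # Admissible range quoted for RELERR (double precision assumed here).
    if not 4.0 * sys.float_info.epsilon < relerr < 1.0:
        raise ValueError("relerr must lie in (4*epsilon, 1)")
    total = math.fsum(s * s for s in singular_values)  # ||MAT||_F ** 2
    if total == 0.0:
        return 0, 0.0
    for k in range(len(singular_values) + 1):
        # Squared Frobenius error of the rank-k truncation.
        tail = math.fsum(s * s for s in singular_values[k:])
        err = math.sqrt(tail / total)
        if err <= relerr:
            return k, err


k, err = smallest_rank([4.0, 3.0, 0.1, 0.05], relerr=0.1)
print(k, err)
```

In this example a rank of 2 is enough for a tolerance of 0.1, and the returned err plays the role of the value output in RELERR.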

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, the top nsvd left singular vectors are stored in the first nsvd columns of MAT. The left singular vector associated with the singular value S(j) is stored in the j-th column of MAT. The other part of MAT is used as workspace in the algorithm and is destroyed on exit.

RELERR (INPUT/OUTPUT) real(stnd)

On entry, the requested accuracy tolerance for the relative error of the computed partial SVD approximation.

The preset value for RELERR must be greater than 4*epsilon( RELERR ) and less than one.

On exit, RELERR contains the relative error of the computed partial SVD approximation in the Frobenius norm:

RELERR = ||MAT-rSVD||_F / ||MAT||_F

S (OUTPUT) real(stnd), dimension(:), pointer

On exit, S(:) contains estimates of the top nsvd singular values of MAT. The singular values are given in decreasing order and are positive or zero.

The status of the pointer S must not be undefined on entry. If, on entry, the pointer S is already allocated, it will first be deallocated and then reallocated with the correct size.

On exit, the size of the pointer S will satisfy:

size( S ) = nsvd <= min( size(MAT,1) , size(MAT,2) ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit in the SVD of the triangular factor R of the QR decomposition of MAT.

FAILURE = true : indicates that the SVD algorithm did not converge and that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form B of the triangular factor R of the QR decomposition of MAT.

V (OUTPUT) real(stnd), dimension(:,:), pointer

On exit, the computed nsvd top right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of V.

The status of the pointer V must not be undefined on entry. If, on entry, the pointer V is already allocated, it will first be deallocated and then reallocated with the correct shape.

On exit, the shape of the pointer V will satisfy:

size( V, 1 ) = size( MAT, 2 ) = n ,

size( V, 2 ) = size( S ) = nsvd .

RANDOM_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if RANDOM_QR is used with the value true, a fast randomized partial QR factorization is used in the first phase of the QR-SVD algorithm.

By default, RANDOM_QR = false, i.e., a standard deterministic (partial) QR factorization with column pivoting is used in the first phase of the QR-SVD algorithm.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random gaussian matrix in the randomized (partial) QR phase of the QR-SVD algorithm if RANDOM_QR = true.

The possible values are:

RNG_ALG=1 : selects Marsaglia’s KISS random number generator;

RNG_ALG=2 : selects the fast version of Marsaglia’s KISS random number generator;

RNG_ALG=3 : selects L’Ecuyer’s LFSR113 random number generator;

RNG_ALG=4 : selects the Mersenne Twister random number generator;

RNG_ALG=5 : selects the maximally equidistributed Mersenne Twister random number generator;

RNG_ALG=6 : selects the extended precision version of Marsaglia’s KISS random number generator;

RNG_ALG=7 : selects the extended precision version of the fast Marsaglia’s KISS random number generator;

RNG_ALG=8 : selects the extended precision version of L’Ecuyer’s LFSR113 random number generator;

RNG_ALG=9 : selects the extended precision version of the Mersenne Twister random number generator;

RNG_ALG=10 : selects the extended precision version of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that, on exit, the current random number generator is not reset to the generator that was selected before the call to RQR_SVD_CMP_FIXED_PRECISION.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

BLK_SIZE (INPUT, OPTIONAL) integer(i4b)

On entry, the block size used in the randomized partial QR phase of the QR-SVD algorithm if RANDOM_QR = true.

BLK_SIZE must be greater than or equal to one and less than min(m,n). It should usually be set to a value much smaller than min(m,n), the best choice depending also on the architecture of the computer.

See Further Details and the cited references for the meaning of the block size in the randomized (partial) QR algorithm.

By default, BLK_SIZE is set to min( BLKSZ_QR, min(m,n) ), where parameter BLKSZ_QR is the default block size for QR related algorithms specified in module Select_Parameters.

NOVER (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the randomized partial QR phase of the QR-SVD algorithm if RANDOM_QR = true.

NOVER must be non-negative and satisfy the relationship:

NOVER + BLK_SIZE <= size( MAT, 1 )

and is adjusted if necessary to verify this relationship in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the randomized (partial) QR algorithm.

By default, the oversampling size is set to 10.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm, which is used in the last step of the QR-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

The optional argument MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm, which is used in the last step of the QR-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is the minimum of nsvd and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

The optional argument PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm, which is used in the last step of the QR-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed QR-SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

Usually, the problem of low-rank matrix approximation falls into two categories:

the fixed-rank problem, where the rank parameter nsvd is given;

the fixed-precision problem, where we seek a partial SVD factorization, rSVD, as small as possible such that

||MAT-rSVD||_F <= eps

where eps is a given accuracy tolerance.

RQR_SVD_CMP_FIXED_PRECISION is dedicated to solving the fixed-precision problem. The fixed-rank problem can be solved by subroutine RQR_SVD_CMP.

The standard deterministic BLAS2 algorithm for computing a QR factorization with column pivoting is described in the reference (1). The randomized partial QR algorithm with column pivoting used if the optional logical argument RANDOM_QR is present with the value true is described in the references (3), (4), (5) and (6).

For further details on computing low-rank matrix approximations from QR factorizations with column pivoting, and on the QR-SVD or randomized QR algorithms, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Chan, T.F., and Hansen, P.C., 1992:

Some applications of the rank revealing QR factorization. SIAM J. Sci. Statist. Comput., Volume 13, 727-741.

(3) Duersch, J.A., and Gu, M., 2017:

Randomized QR with column pivoting. SIAM J. Sci. Comput., Volume 39, C263-C291.

(4) Martinsson, P.G., Quintana-Orti, G., Heavner, N., and Van de Geijn, R., 2017:

Householder QR factorization with randomization for column pivoting (HQRRP). SIAM J. Sci. Comput., Volume 39, C96-C115.

(5) Xiao, J., Gu, M., and Langou, J., 2017:

Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations. IEEE 24th International Conference on High Performance Computing (HiPC), IEEE, 2017, 233-242.

(6) Duersch, J.A., and Gu, M., 2020:

Randomized projection for rank-revealing matrix factorizations and low-rank approximations. SIAM Review, Volume 62, Issue 3, 661-682.

`subroutine rqlp_svd_cmp ( mat, s, leftvec, rightvec, failure, niter, random_qr, truncated_qr, rng_alg, blk_size, nover, nover_svd, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

RQLP_SVD_CMP computes approximations of the nsvd largest singular values and associated left and right singular vectors of a full m-by-n real matrix MAT using a four-step procedure, which can be termed a QLP-SVD algorithm:

First, a partial (or complete) QLP factorization of MAT is computed as

MAT = Q * L * P

where Q is an m-by-m (or m-by-k if the factorization is only partial) orthogonal matrix, P is an n-by-n (or k-by-n if the factorization is partial) orthogonal matrix and L is an m-by-n (or k-by-k if the factorization is partial) lower triangular matrix.

In a second step, the matrix product MAT * P’ is computed and, at the user option, a number of QR-QL iterations are performed (this is equivalent to subspace iterations) to improve the estimates of the principal row and column subspaces of MAT.

In a third step, a Singular Value Decomposition (SVD) of the matrix product MAT * P’ is computed. The singular values and left singular vectors of this SVD are also estimates of the singular values and left singular vectors of MAT.

In a final step, estimates of the associated right singular vectors of MAT are obtained by pre-multiplying the right singular vectors in the SVD of this matrix product by P’.
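The four steps can be written compactly in matrix form. The sketch below assumes a partial factorization of rank k, no QR-QL iterations (NITER = 0) and no oversampling. The subscripted names, and the factors B, U, Sigma and W, are notation introduced here for illustration and are not statpack notation.

```latex
\begin{aligned}
\mathrm{MAT} &\approx Q_k L_k P_k
  && \text{(step 1: } P_k \text{ is } k \times n,\ P_k P_k^{T} = I_k\text{)} \\
B &= \mathrm{MAT}\, P_k^{T} = U \, \Sigma \, W^{T}
  && \text{(steps 2 and 3: SVD of the } m \times k \text{ product)} \\
\mathrm{MAT} &\approx \mathrm{MAT}\, P_k^{T} P_k = U \, \Sigma \, (P_k^{T} W)^{T}
  && \text{(step 4: right singular vectors } P_k^{T} W\text{)}
\end{aligned}
```

The columns of U estimate the left singular vectors of MAT, and the n-by-k product of P_k’ with W estimates the right singular vectors.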

If the optional logical argument RANDOM_QR is used with the value true, a fast randomized (partial) QLP factorization is used in the first phase of the QLP-SVD algorithm. Furthermore if, in addition, the optional logical argument TRUNCATED_QR is used with the value true, an even faster (but slightly less accurate) randomized partial and truncated QLP factorization will be used in the first phase of the QLP-SVD algorithm.

nsvd is the target rank of the sought partial SVD and is equal to the size of the output real vector argument S, i.e., nsvd = size( S ). If nsvd = min( size(MAT,1) , size(MAT,2) ), a full SVD of MAT is obtained with the same or higher accuracy than subroutines SVD_CMP or SVD_CMP2.

See Further Details and the cited references for more information.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is not modified by the routine.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(:) contains the top nsvd singular values of MAT. The singular values are given in decreasing order and are positive or zero.

The size of S must satisfy:

size( S ) = nsvd <= min( size(MAT,1) , size(MAT,2) ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed top nsvd left singular vectors of MAT. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must satisfy:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) = nsvd .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed top nsvd right singular vectors of MAT. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must satisfy:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) = nsvd .

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit, if the optional logical argument FAILURE is present, a test of the accuracy of the computed singular triplets is performed and in that case:

FAILURE = false : indicates successful exit;

FAILURE = true : indicates that some singular values and vectors of MAT failed to converge in NITER QR-QL iterations.

If FAILURE = true on exit, results are still useful, but some of the approximated singular triplets have poor accuracy.

NITER (INPUT, OPTIONAL) integer(i4b)

The number of subspace iterations performed in the subroutine after the initial QLP factorization and first subspace projection for computing the top nsvd singular triplets.

NITER must be non-negative.

By default, no subspace iterations are performed after the initial QLP factorization and first subspace projection.

RANDOM_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if RANDOM_QR is used with the value true, a fast randomized partial QLP factorization is used in the first phase of the QLP-SVD algorithm.

By default, RANDOM_QR = false, i.e., a standard (partial) deterministic QLP factorization is used in the first phase of the QLP-SVD algorithm.

TRUNCATED_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if TRUNCATED_QR is used with the value true in addition to RANDOM_QR also set to true, a very fast (but slightly less accurate) randomized partial QLP factorization is used in the first phase of the QLP-SVD algorithm.

By default, TRUNCATED_QR = false, i.e., a “standard” randomized (partial) QLP factorization is used in the first phase of the QLP-SVD algorithm if RANDOM_QR = true.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random gaussian matrix in the randomized (partial) QLP phase of the QLP-SVD algorithm if RANDOM_QR = true.

The possible values are:

RNG_ALG=1 : selects Marsaglia’s KISS random number generator;

RNG_ALG=2 : selects the fast version of Marsaglia’s KISS random number generator;

RNG_ALG=3 : selects L’Ecuyer’s LFSR113 random number generator;

RNG_ALG=4 : selects the Mersenne Twister random number generator;

RNG_ALG=5 : selects the maximally equidistributed Mersenne Twister random number generator;

RNG_ALG=6 : selects the extended precision version of Marsaglia’s KISS random number generator;

RNG_ALG=7 : selects the extended precision version of the fast Marsaglia’s KISS random number generator;

RNG_ALG=8 : selects the extended precision version of L’Ecuyer’s LFSR113 random number generator;

RNG_ALG=9 : selects the extended precision version of the Mersenne Twister random number generator;

RNG_ALG=10 : selects the extended precision version of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that, on exit, the current random number generator is not reset to the generator that was selected before the call to RQLP_SVD_CMP.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

BLK_SIZE (INPUT, OPTIONAL) integer(i4b)

On entry, the block size used in the randomized partial QLP phase of the QLP-SVD algorithm if RANDOM_QR = true (and TRUNCATED_QR = false).

BLK_SIZE must be greater than or equal to one and less than min(m,n). It should usually be set to a value much smaller than min(m,n), the best choice depending also on the architecture of the computer.

See Further Details and the cited references for the meaning of the block size in the randomized (partial) QR or QLP algorithm.

By default, BLK_SIZE is set to min( BLKSZ_QR, min(m,n) ), where parameter BLKSZ_QR is the default block size for QR related algorithms specified in module Select_Parameters.

NOVER (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the randomized partial QLP phase of the QLP-SVD algorithm if RANDOM_QR = true.

NOVER must be non-negative and satisfy the relationships:

NOVER + BLK_SIZE <= size( MAT, 1 ) if TRUNCATED_QR = false;

NOVER + NOVER_SVD + size( S ) <= size( MAT, 1 ) if TRUNCATED_QR = true.

and is adjusted if necessary to verify these relationships in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the partial randomized QR and QLP algorithms.

By default, the oversampling size is set to:

10 if TRUNCATED_QR = false;

max( (NOVER_SVD+size(S))/2_i4b, 10 ) if TRUNCATED_QR = true.

NOVER_SVD (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the SVD phase of the QLP-SVD algorithm for computing the top nsvd singular triplets.

NOVER_SVD must be non-negative and satisfy the relationship:

NOVER_SVD + size( S ) <= min( size(MAT,1) , size(MAT,2) )

and is adjusted if necessary to verify this relationship in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the SVD phase of the QLP-SVD algorithm.

By default, the oversampling size is set to 10.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm, which is used in the last step of the QLP-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

The optional argument MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm, which is used in the last step of the QLP-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is the minimum of nsvd and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

The optional argument PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm, which is used in the last step of the QLP-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (e.g., if PERFECT_SHIFT is equal to TRUE). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed QLP-SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (e.g., if PERFECT_SHIFT is equal to TRUE). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

The QLP-SVD algorithm implemented in the RQLP_SVD_CMP subroutine is a variation of the TXUV algorithm described in references (2), (5) and (6).

For further details on computing low-rank matrix approximations with a QLP factorization, the QLP-SVD (i.e., TXUV) algorithm or randomized (partial) QLP and QR factorizations, see:

(1) Stewart, G.W., 1999:

The QLP approximation to the singular value decomposition. SIAM J. Sci. Comput., Volume 20, 1336-1348.

(2) Duersch, J.A., and Gu, M., 2017:

Randomized QR with column pivoting. SIAM J. Sci. Comput., Volume 39, C263-C291.

(3) Martinsson, P.G., Quintana-Orti, G., Heavner, N., and Van de Geijn, R., 2017:

Householder QR factorization with randomization for column pivoting (HQRRP). SIAM J. Sci. Comput., Volume 39, C96-C115.

(4) Xiao, J., Gu, M., and Langou, J., 2017:

Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations. IEEE 24th International Conference on High Performance Computing (HiPC), IEEE, 2017, 233-242.

(5) Feng, Y., Xiao, J., and Gu, M., 2019:

Flip-flop spectrum-revealing QR factorizations and its applications to singular value decomposition. Electronic Transactions on Numerical Analysis (ETNA), Volume 51, 469-494.

(6) Duersch, J.A., and Gu, M., 2020:

Randomized projection for rank-revealing matrix factorizations and low-rank approximations. SIAM Review, Volume 62, Issue 3, 661-682.

(7) Huckaby, D.A., and Chan, T.F., 2003:

On the convergence of Stewart’s QLP algorithm for approximating the SVD. Numer. Algorithms, Volume 32, 287-316.

(8) Huckaby, D.A., and Chan, T.F., 2005:

Stewart’s pivoted QLP decomposition for low-rank matrices. Numerical Linear Algebra with Applications, Volume 12, 153-159.

`subroutine rqlp_svd_cmp2 ( mat, s, leftvec, rightvec, failure, niter, rng_alg, nover, nover_svd, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

RQLP_SVD_CMP2 computes approximations of the nsvd largest singular values and associated left and right singular vectors of a full m-by-n real matrix MAT using a four-step procedure, which can be termed a randomized QLP-SVD algorithm:

First, an approximate and randomized partial QLP factorization of MAT is computed as

MAT = Q * L * P

, where Q is an m-by-k matrix with orthonormal columns, P is a k-by-n matrix with orthonormal rows and L is a k-by-k lower triangular matrix.

In a second step, the matrix product MAT * P’ is computed and, at the user’s option, a number of QR-QL iterations are performed (this is equivalent to subspace iterations) to improve the estimates of the principal row and column subspaces of MAT.

In a third step, a Singular Value Decomposition (SVD) of the matrix product MAT * P’ is computed. The singular values and left singular vectors of this SVD are also estimates of the singular values and left singular vectors of MAT.

In a final step, estimates of the associated right singular vectors of MAT are obtained by pre-multiplying the right singular vectors in the SVD of this matrix product by P’.

A very fast randomized partial and truncated QLP factorization is used in the first phase of the QLP-SVD algorithm.

nsvd is the target rank of the partial SVD, which is sought, and is equal to the size of the output real vector argument S, i.e., nsvd = size( S ).

See Further Details and the cited references for more information.
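The four steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the statpack implementation: the function name `qlp_svd_sketch` and its arguments are hypothetical, the randomized truncated QLP factorization of the first step is replaced by a plain Gaussian range finder followed by an LQ factorization, and the QR-QL iterations are replaced by a simple alternating QR scheme.

```python
import numpy as np

def qlp_svd_sketch(mat, nsvd, nover=10, niter=0, seed=0):
    """Illustrative QLP-SVD; names and defaults are assumptions, not statpack's API."""
    rng = np.random.default_rng(seed)
    m, n = mat.shape
    k = min(nsvd + nover, m, n)

    # Step 1 (stand-in): randomized partial QLP, MAT ~ Q * L * P.
    q, _ = np.linalg.qr(mat @ rng.standard_normal((n, k)))  # m-by-k, orthonormal columns
    pt, lt = np.linalg.qr((q.T @ mat).T)                     # (Q' * MAT)' = P' * L'
    # pt holds P' (n-by-k); lt.T is the k-by-k lower triangular factor L.

    # Step 2: form MAT * P', with optional subspace iterations.
    proj = mat @ pt
    for _ in range(niter):
        qc, _ = np.linalg.qr(proj)
        pt, _ = np.linalg.qr(mat.T @ qc)
        proj = mat @ pt

    # Step 3: SVD of the m-by-k product MAT * P' = U * diag(s) * W'.
    u, s, wt = np.linalg.svd(proj, full_matrices=False)

    # Step 4: right singular vectors of MAT are estimated by P' * W.
    v = pt @ wt.T
    return s[:nsvd], u[:, :nsvd], v[:, :nsvd]
```

When the rank of the input matrix does not exceed nsvd + nover, the triplets returned by this sketch should agree with those of a full SVD up to rounding errors; otherwise they are approximations whose quality depends on the oversampling and on the number of iterations.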

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is not modified by the routine.

S (OUTPUT) real(stnd), dimension(:)

On exit, S(:) contains the top nsvd singular values of MAT. The singular values are given in decreasing order and are positive or zero.

The size of S must verify:

size( S ) = nsvd <= min( size(MAT,1) , size(MAT,2) ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed top nsvd left singular vectors of MAT. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) = nsvd .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed top nsvd right singular vectors of MAT. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) = nsvd .

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit, if the optional logical argument FAILURE is present, a test of the accuracy of the computed singular triplets is performed and in that case:

FAILURE = false : indicates successful exit;

FAILURE = true : indicates that some singular values and vectors of MAT failed to converge in NITER QR-QL iterations.

If FAILURE = true on exit, results are still useful, but some of the approximated singular triplets may have a poor accuracy.

NITER (INPUT, OPTIONAL) integer(i4b)

The number of subspace iterations performed in the subroutine after the initial QLP factorization and first subspace projection for computing the top nsvd singular triplets.

NITER must be positive or zero.

By default, no subspace iterations are performed after the initial QLP factorization and first subspace projection.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random Gaussian matrix in the randomized (partial) QLP phase of the QLP-SVD algorithm.

The possible values are:

RNG_ALG = 1 : selects Marsaglia’s KISS random number generator;

RNG_ALG = 2 : selects the fast Marsaglia’s KISS random number generator;

RNG_ALG = 3 : selects L’Ecuyer’s LFSR113 random number generator;

RNG_ALG = 4 : selects the Mersenne Twister random number generator;

RNG_ALG = 5 : selects the maximally equidistributed Mersenne Twister random number generator;

RNG_ALG = 6 : selects the extended precision version of Marsaglia’s KISS random number generator;

RNG_ALG = 7 : selects the extended precision version of the fast Marsaglia’s KISS random number generator;

RNG_ALG = 8 : selects the extended precision version of L’Ecuyer’s LFSR113 random number generator;

RNG_ALG = 9 : selects the extended precision version of the Mersenne Twister random number generator;

RNG_ALG = 10 : selects the extended precision version of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that, on exit, the current random number generator is not reset to the generator that was selected before the call to RQLP_SVD_CMP2.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

NOVER (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the randomized partial QLP phase of the QLP-SVD algorithm.

NOVER must be non-negative and satisfy the relationship:

NOVER + NOVER_SVD + size( S ) <= size( MAT, 1 )

and is adjusted if necessary to verify this relationship in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the partial randomized QR and QLP algorithms.

By default, the oversampling size is set to:

max( (NOVER_SVD+size(S))/2_i4b, 10 ).

NOVER_SVD (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the SVD phase of the QLP-SVD algorithm for computing the top nsvd singular triplets.

NOVER_SVD must be non-negative and satisfy the relationship:

NOVER_SVD + size( S ) <= min( size(MAT,1) , size(MAT,2) )

and is adjusted if necessary to verify this relationship in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the SVD phase of the QLP-SVD algorithm.

By default, the oversampling size is set to 10.
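The defaults and bounds documented for NOVER and NOVER_SVD can be summarized by the following plain Python sketch. The helper name `resolve_oversampling` is hypothetical, and two points are assumptions rather than documented behavior: "adjusted if necessary" is interpreted as clamping to the largest admissible value, and the default for NOVER is computed from the already adjusted NOVER_SVD.

```python
def resolve_oversampling(m, n, nsvd, nover=None, nover_svd=None):
    """Hypothetical helper mirroring the documented defaults and bounds (nsvd <= min(m, n))."""
    # NOVER_SVD defaults to 10 and must satisfy NOVER_SVD + nsvd <= min(m, n).
    if nover_svd is None:
        nover_svd = 10
    nover_svd = max(0, min(nover_svd, min(m, n) - nsvd))  # assumed clamping rule

    # NOVER defaults to max((NOVER_SVD + nsvd) / 2, 10), with integer division,
    # and must satisfy NOVER + NOVER_SVD + nsvd <= m.
    if nover is None:
        nover = max((nover_svd + nsvd) // 2, 10)
    nover = max(0, min(nover, m - nover_svd - nsvd))      # assumed clamping rule
    return nover, nover_svd
```

For instance, with m = 1000, n = 500 and nsvd = 50, this sketch keeps NOVER_SVD at its default of 10 and sets NOVER to (10 + 50) / 2 = 30.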

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm, which is used in the last step of the QLP-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

The optional argument MAX_FRANCIS_STEPS controls the maximum number of Francis sets (e.g., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm, which is used in the last step of the QLP-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is the minimum of nsvd and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

The optional argument PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm, which is used in the last step of the QLP-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (e.g., if PERFECT_SHIFT is equal to TRUE). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed QLP-SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (e.g., if PERFECT_SHIFT is equal to TRUE). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.
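The interaction between PERFECT_SHIFT, BISECT and DQDS described above amounts to a small decision rule, sketched here in plain Python. The function name and the returned labels are hypothetical and only restate the documented precedence; they are not part of the library.

```python
def singular_value_method(perfect_shift=True, bisect=False, dqds=False):
    """Return which algorithm computes the singular values, per the documented rules."""
    if not perfect_shift:
        # BISECT and DQDS have no effect without the perfect shift strategy.
        return "implicit QR (no perfect shift)"
    if bisect:
        # Bisection takes precedence when both BISECT and DQDS are true.
        return "bisection"
    if dqds:
        return "dqds"
    # Default when PERFECT_SHIFT is true and neither option is set.
    return "Pal-Walker-Kahan"
```

With the documented defaults (PERFECT_SHIFT true, BISECT and DQDS false), the rule selects the fast Pal-Walker-Kahan algorithm.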

Further Details¶

The QLP-SVD algorithm implemented in the RQLP_SVD_CMP2 subroutine is a variation of the TXUV algorithm described in references (2), (5) and (6), in which the initial randomized partial QR factorization is replaced by the randomized partial and truncated QR algorithm described in reference (9).

With this modification, the RQLP_SVD_CMP2 subroutine is less accurate than the RQLP_SVD_CMP subroutine, but significantly faster and much less memory demanding.

For further details on computing low-rank matrix approximations with a QLP factorization, the QLP-SVD (i.e., TXUV) algorithm or randomized (partial) QLP and QR factorizations, see:

(1) Stewart, G.W., 1999:

The QLP approximation to the singular value decomposition. SIAM J. Sci. Comput., Volume 20, 1336-1348.

(2) Duersch, J.A., and Gu, M., 2017:

Randomized QR with column pivoting. SIAM J. Sci. Comput., Volume 39, C263-C291.

(3) Martinsson, P.G., Quintana-Orti, G., Heavner, N., and Van de Geijn, R., 2017:

Householder QR factorization with randomization for column pivoting (HQRRP). SIAM J. Sci. Comput., Volume 39, C96-C115.

(4) Xiao, J., Gu, M., and Langou, J., 2017:

Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations. IEEE 24th International Conference on High Performance Computing (HiPC), IEEE, 2017, 233-242.

(5) Feng, Y., Xiao, J., and Gu, M., 2019:

Flip-flop spectrum-revealing QR factorizations and its applications to singular value decomposition. Electronic Transactions on Numerical Analysis (ETNA), Volume 51, 469-494.

(6) Duersch, J.A., and Gu, M., 2020:

Randomized projection for rank-revealing matrix factorizations and low-rank approximations. SIAM Review, Volume 62, Issue 3, 661-682.

(7) Huckaby, D.A., and Chan, T.F., 2003:

On the convergence of Stewart’s QLP algorithm for approximating the SVD. Numer. Algorithms, Volume 32, 287-316.

(8) Huckaby, D.A., and Chan, T.F., 2005:

Stewart’s pivoted QLP decomposition for low-rank matrices. Numerical Linear Algebra with Applications, Volume 12, 153-159.

(9) Mary, T., Yamazaki, I., Kurzak, J., Luszczek, P., Tomov, S., and Dongarra, J., 2015:

Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs. International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15).

`subroutine rqlp_svd_cmp_fixed_precision ( mat, relerr, s, leftvec, rightvec, failure, niter, random_qr, rng_alg, blk_size, nover, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

RQLP_SVD_CMP_FIXED_PRECISION computes approximations of the nsvd largest singular values and associated left and right singular vectors of a full m-by-n real matrix MAT using a four-step procedure, which can be termed a QLP-SVD algorithm:

First, a partial QLP factorization of MAT is computed as

MAT = Q * L * P

, where Q is an m-by-k matrix with orthonormal columns, P’ is an n-by-k matrix with orthonormal columns and L is a k-by-k lower triangular matrix.

In a second step, the matrix product MAT * P’ is computed and, at the user’s option, a number of QR-QL iterations are performed (this is equivalent to subspace iterations) to improve the estimates of the principal row and column subspaces of MAT.

In a third step, a Singular Value Decomposition (SVD) of the matrix product MAT * P’ is computed. The singular values and left singular vectors of this SVD are also estimates of the top singular values and left singular vectors of MAT.

In a final step, estimates of the associated right singular vectors of MAT are obtained by pre-multiplying the right singular vectors in the SVD of this matrix product by P’.

By default, a standard deterministic BLAS2 QR factorization with column pivoting is used in the first phase of the QLP step of the QLP-SVD algorithm. However, if the optional logical argument RANDOM_QR is used with the value true, a fast randomized partial QR factorization is used in the QLP step of the QLP-SVD algorithm.

nsvd is the target rank of the partial SVD, which is sought, and this partial SVD must have an approximation error which fulfills:

||MAT-rSVD||_F <= ||MAT||_F * relerr

, where rSVD is the computed partial SVD approximation, || ||_F is the Frobenius norm and relerr is a prescribed accuracy tolerance for the relative error of the computed partial SVD approximation, which is specified in the input real argument RELERR.

In other words, nsvd is not known in advance and is determined in the subroutine. This explains why the output real array arguments S, LEFTVEC and RIGHTVEC, which contain the computed singular values and associated singular vectors of MAT on exit, must be declared in the calling program as pointers.

On exit, nsvd is equal to the size of the output real pointer argument S, which contains the computed singular values, i.e., nsvd = size( S ) and the relative error in the Frobenius norm of the computed partial SVD approximation is output in argument RELERR.

RQLP_SVD_CMP_FIXED_PRECISION first searches incrementally for the best (i.e., smallest) partial QR approximation that fulfills the prescribed accuracy tolerance for the relative error. More precisely, the rank of this partial QR approximation is increased progressively until the prescribed accuracy tolerance is satisfied. This partial QR approximation is then transformed into a partial QLP factorization, which is improved and adjusted precisely in a final step to obtain the smallest partial SVD approximation that satisfies the prescribed tolerance.

In all cases the relative error of the computed partial SVD approximation is output in argument RELERR.

See Further Details and the cited references for more information.
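The incremental search performed in the partial QR phase can be illustrated with the NumPy sketch below. It is an assumption-laden illustration, not the statpack code: the function name `fixed_precision_qr_rank` is hypothetical, an unblocked modified Gram-Schmidt procedure with column pivoting stands in for the library's (possibly randomized and blocked) QR factorization, and the later QLP and SVD refinement steps are not reproduced.

```python
import numpy as np

def fixed_precision_qr_rank(mat, relerr):
    """Smallest pivoted-QR rank k with ||MAT - Q_k R_k||_F <= relerr * ||MAT||_F (sketch)."""
    a = np.array(mat, dtype=float)   # working copy, deflated in place
    m, n = a.shape
    norm_mat = np.linalg.norm(a)     # Frobenius norm of MAT
    k = 0
    # After k steps, the residual of the rank-k approximation is held in columns k:.
    while k < min(m, n) and np.linalg.norm(a[:, k:]) > relerr * norm_mat:
        # Column pivoting: bring the largest remaining column to position k.
        p = k + int(np.argmax(np.linalg.norm(a[:, k:], axis=0)))
        a[:, [k, p]] = a[:, [p, k]]
        # Remove the direction of the pivot column from the trailing columns.
        q = a[:, k] / np.linalg.norm(a[:, k])
        a[:, k + 1:] -= np.outer(q, q @ a[:, k + 1:])
        k += 1
    err = np.linalg.norm(a[:, k:]) / norm_mat if norm_mat > 0 else 0.0
    return k, err
```

The returned rank is an upper bound on the rank needed by the best partial SVD approximation for the same tolerance, which is consistent with the final adjustment step described above.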

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is not modified by the routine.

RELERR (INPUT/OUTPUT) real(stnd)

On entry, the requested accuracy tolerance for the relative error of the computed partial SVD approximation.

The preset value for RELERR must be greater than 4*epsilon( RELERR ) and less than one.

On exit, RELERR contains the relative error of the computed partial SVD approximation in the Frobenius norm:

RELERR = ||MAT-rSVD||_F / ||MAT||_F

S (OUTPUT) real(stnd), dimension(:), pointer

On exit, S(:) contains estimates of the top nsvd singular values of MAT. The singular values are given in decreasing order and are positive or zero.

The status of the pointer S must not be undefined on entry. If, on entry, the pointer S is already allocated, it will be first deallocated and then reallocated with the correct size.

On exit, the size of the pointer S will verify:

size( S ) = nsvd <= min( size(MAT,1) , size(MAT,2) ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:), pointer

On exit, the computed top nsvd left singular vectors of MAT. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The status of the pointer LEFTVEC must not be undefined on entry. If, on entry, the pointer LEFTVEC is already allocated, it will be first deallocated and then reallocated with the correct shape.

On exit, the shape of the pointer LEFTVEC will verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) = nsvd .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:), pointer

On exit, the computed top nsvd right singular vectors of MAT. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The status of the pointer RIGHTVEC must not be undefined on entry. If, on entry, the pointer RIGHTVEC is already allocated, it will be first deallocated and then reallocated with the correct shape.

On exit, the shape of the pointer RIGHTVEC will verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) = nsvd .

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit, if the optional logical argument FAILURE is present, a test of the accuracy of the computed singular triplets is performed and in that case:

FAILURE = false : indicates successful exit;

FAILURE = true : indicates that some singular values and vectors of MAT failed to converge in NITER QR-QL iterations for the requested accuracy tolerance for the relative error of the computed partial SVD approximation.

If FAILURE = true on exit, results are still useful, but some of the approximated singular triplets have a poor accuracy.

NITER (INPUT, OPTIONAL) integer(i4b)

The number of subspace iterations performed in the subroutine after the initial QLP factorization and first subspace projection for computing the top nsvd singular triplets.

NITER must be positive or zero.

By default, no subspace iterations are performed after the initial QLP factorization and first subspace projection.

RANDOM_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if RANDOM_QR is used with the value true, a fast randomized partial QLP factorization is used in the first phase of the QLP-SVD algorithm.

By default, RANDOM_QR = false, i.e., a standard (partial) deterministic QLP factorization is used in the first phase of the QLP-SVD algorithm.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random Gaussian matrix in the randomized (partial) QLP phase of the QLP-SVD algorithm if RANDOM_QR = true.

The possible values are:

RNG_ALG = 1 : selects Marsaglia’s KISS random number generator;

RNG_ALG = 2 : selects the fast Marsaglia’s KISS random number generator;

RNG_ALG = 3 : selects L’Ecuyer’s LFSR113 random number generator;

RNG_ALG = 4 : selects the Mersenne Twister random number generator;

RNG_ALG = 5 : selects the maximally equidistributed Mersenne Twister random number generator;

RNG_ALG = 6 : selects the extended precision version of Marsaglia’s KISS random number generator;

RNG_ALG = 7 : selects the extended precision version of the fast Marsaglia’s KISS random number generator;

RNG_ALG = 8 : selects the extended precision version of L’Ecuyer’s LFSR113 random number generator;

RNG_ALG = 9 : selects the extended precision version of the Mersenne Twister random number generator;

RNG_ALG = 10 : selects the extended precision version of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that, on exit, the current random number generator is not reset to the generator that was selected before the call to RQLP_SVD_CMP_FIXED_PRECISION.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

BLK_SIZE (INPUT, OPTIONAL) integer(i4b)

On entry, the block size used in the randomized partial QLP phase of the QLP-SVD algorithm if RANDOM_QR = true.

BLK_SIZE must be greater than or equal to one and less than min(m,n). It should usually be set to a value much smaller than min(m,n); the best choice also depends on the architecture of the computer.

See Further Details and the cited references for the meaning of the block size in the randomized (partial) QR or QLP algorithm.

By default, BLK_SIZE is set to min( BLKSZ_QR, min(m,n) ), where parameter BLKSZ_QR is the default block size for QR related algorithms specified in module Select_Parameters.

NOVER (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the randomized partial QLP phase of the QLP-SVD algorithm if RANDOM_QR = true.

NOVER must be non-negative and satisfy the relationship:

NOVER + BLK_SIZE <= size( MAT, 1 )

and is adjusted if necessary to verify this relationship in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the partial randomized QR and QLP algorithms.

By default, the oversampling size is set to 10.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm, which is used in the last step of the QLP-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

The optional argument MAX_FRANCIS_STEPS controls the maximum number of Francis sets (e.g., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm, which is used in the last step of the QLP-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is the minimum of nsvd and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

The optional argument PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm, which is used in the last step of the QLP-SVD algorithm.

See the description of subroutine SVD_CMP for further details about this optional argument.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (e.g., if PERFECT_SHIFT is equal to TRUE). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed QLP-SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (e.g., if PERFECT_SHIFT is equal to TRUE). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

Usually, the problem of low-rank matrix approximation falls into two categories:

the fixed-rank problem, where the rank parameter nsvd is given;

the fixed-precision problem, where we seek a partial SVD factorization, rSVD, as small as possible such that

||MAT-rSVD||_F <= eps

, where eps is a given accuracy tolerance.

RQLP_SVD_CMP_FIXED_PRECISION is dedicated to solving the fixed-precision problem. The fixed-rank problem can be solved by subroutines RQLP_SVD_CMP or RQLP_SVD_CMP2.

The QLP-SVD algorithm implemented in the RQLP_SVD_CMP_FIXED_PRECISION subroutine is a variation of the TXUV algorithm described in references (2), (5) and (6).

For further details on computing low-rank matrix approximations with a QLP factorization, the QLP-SVD (i.e., TXUV) algorithm or randomized (partial) QLP and QR factorizations, see:

(1) Stewart, G.W., 1999:

The QLP approximation to the singular value decomposition. SIAM J. Sci. Comput., Volume 20, 1336-1348.

(2) Duersch, J.A., and Gu, M., 2017:

Randomized QR with column pivoting. SIAM J. Sci. Comput., Volume 39, C263-C291.

(3) Martinsson, P.G., Quintana-Orti, G., Heavner, N., and Van de Geijn, R., 2017:

Householder QR factorization with randomization for column pivoting (HQRRP). SIAM J. Sci. Comput., Volume 39, C96-C115.

(4) Xiao, J., Gu, M., and Langou, J., 2017:

Fast parallel randomized QR with column pivoting algorithms for reliable low-rank matrix approximations. IEEE 24th International Conference on High Performance Computing (HiPC), IEEE, 2017, 233-242.

(5) Feng, Y., Xiao, J., and Gu, M., 2019:

Flip-flop spectrum-revealing QR factorizations and its applications to singular value decomposition. Electronic Transactions on Numerical Analysis (ETNA), Volume 51, 469-494.

(6) Duersch, J.A., and Gu, M., 2020:

Randomized projection for rank-revealing matrix factorizations and low-rank approximations. SIAM Review, Volume 62, Issue 3, 661-682.

(7) Huckaby, D.A., and Chan, T.F., 2003:

On the convergence of Stewart’s QLP algorithm for approximating the SVD. Numer. Algorithms, Volume 32, 287-316.

(8) Huckaby, D.A., and Chan, T.F., 2005:

Stewart’s pivoted QLP decomposition for low-rank matrices. Numerical Linear Algebra with Applications, Volume 12, 153-159.

`subroutine qlp_cmp ( mat, beta, tau, lmat, qmat, pmat, random_qr, truncated_qr, rng_alg, blk_size, nover )`¶

Purpose¶

QLP_CMP computes a partial or complete QLP factorization of a m-by-n matrix MAT:

MAT = Q * L * P

, where Q is an m-by-krank matrix with orthonormal columns, P is a krank-by-n matrix with orthonormal rows and L is a krank-by-krank lower triangular matrix. If krank = min(m,n), the QLP factorization is complete; otherwise, it is only partial and Q * L * P is a rank-krank approximation of MAT.

The QLP factorization is obtained by a two-step algorithm:

first, a partial (or complete) QR factorization with column pivoting of MAT is computed;

in a second step, an LQ decomposition of the (permuted) upper triangular or trapezoidal (i.e., if n>m) factor, R, of this QR decomposition is computed.

By default, a standard deterministic QR factorization with column pivoting is used in the first phase of the QLP algorithm. However, if the optional logical argument RANDOM_QR is used with the value true, an alternate fast randomized (partial) QR factorization is used in the first phase of the QLP algorithm. Furthermore if, in addition, the optional logical argument TRUNCATED_QR is used with the value true, an even faster (but less accurate) randomized partial and truncated QR factorization will be used in the first phase of the QLP algorithm. In all cases, a deterministic blocked LQ factorization is used in the second step of the QLP factorization.

At the user’s option, the QLP factorization can also be only partial, i.e., the subroutine stops the computations when the number of columns of Q and the number of rows of P are equal to a predefined value, krank = size( BETA ) = size( TAU ).

The QLP decomposition provides a reasonable and cheap estimate of the Singular Value Decomposition (SVD) of a matrix when this matrix has a low rank or a significant gap in its singular values spectrum.

See Further Details and the cited references for more information.
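The two-step construction described above (pivoted QR, then LQ of the permuted R factor) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the statpack algorithm: the function name `qlp_sketch` is hypothetical, an unblocked modified Gram-Schmidt procedure with column pivoting replaces the library's Householder-based QR, the factors are returned explicitly rather than as elementary reflectors, and krank is assumed not to exceed the numerical rank of the matrix.

```python
import numpy as np

def qlp_sketch(mat, krank):
    """Partial QLP: pivoted QR (modified Gram-Schmidt), then LQ of the R factor."""
    a = np.array(mat, dtype=float)
    m, n = a.shape
    q = np.zeros((m, krank))
    r = np.zeros((krank, n))
    perm = np.arange(n)
    for j in range(krank):
        # Column pivoting on the norms of the remaining (deflated) columns.
        p = j + int(np.argmax(np.linalg.norm(a[:, j:], axis=0)))
        a[:, [j, p]] = a[:, [p, j]]
        r[:, [j, p]] = r[:, [p, j]]
        perm[[j, p]] = perm[[p, j]]
        r[j, j] = np.linalg.norm(a[:, j])
        q[:, j] = a[:, j] / r[j, j]
        r[j, j + 1:] = q[:, j] @ a[:, j + 1:]
        a[:, j + 1:] -= np.outer(q[:, j], r[j, j + 1:])
    # Undo the column permutation so that MAT ~ Q * R_unpermuted.
    r_unperm = np.empty_like(r)
    r_unperm[:, perm] = r
    # LQ factorization of R through a QR factorization of its transpose.
    pt, lt = np.linalg.qr(r_unperm.T)
    return q, lt.T, pt.T   # Q (m-by-krank), L (lower triangular), P (krank-by-n)
```

In this sketch the absolute values of the diagonal of the returned L play the role of the L-values; their signs are not normalized.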

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the real m-by-n matrix to be decomposed.

On exit, MAT has been overwritten by details of its (partial) QLP factorization.

See Further Details.

BETA (OUTPUT) real(stnd), dimension(:)

On exit, the scalar factors of the elementary reflectors defining Q.

See Further Details.

The size of BETA must verify:

size( BETA ) = krank <= min( m , n ) = min( size(MAT,1) , size(MAT,2) ).

TAU (OUTPUT) real(stnd), dimension(:)

On exit, the scalar factors of the elementary reflectors defining P.

See Further Details.

The size of TAU must verify:

size( TAU ) = krank <= min( m , n ) = min( size(MAT,1) , size(MAT,2) ).

LMAT (OUTPUT, OPTIONAL) real(stnd), dimension(:,:)

On exit, LMAT stores the lower triangular matrix L in the (partial) QLP factorization of MAT. The diagonal elements of LMAT (called the L-values) are estimates of the singular values of MAT if there is a significant gap in the singular values spectrum of MAT.

See Further Details.

The shape of LMAT must verify:

size( LMAT, 1 ) = size( LMAT, 2 ) = krank.

QMAT (OUTPUT, OPTIONAL) real(stnd), dimension(:,:)

On exit, QMAT stores the first krank columns of the orthogonal matrix Q in the (partial) QLP factorization of MAT.

See Further Details.

The shape of QMAT must verify:

size( QMAT, 1 ) = m ,

size( QMAT, 2 ) = krank.

PMAT (OUTPUT, OPTIONAL) real(stnd), dimension(:,:)

On exit, PMAT stores the first krank rows of the orthogonal matrix P in the (partial) QLP factorization of MAT.

See Further Details.

The shape of PMAT must verify:

size( PMAT, 1 ) = krank.

size( PMAT, 2 ) = n.

RANDOM_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if RANDOM_QR is used with the value true, a fast randomized (partial) QR factorization with column pivoting is used in the first phase of the QLP algorithm.

By default, RANDOM_QR = false, i.e., a standard deterministic (partial) QR factorization with column pivoting is used in the first phase of the QLP algorithm.

TRUNCATED_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if TRUNCATED_QR is used with the value true in addition to RANDOM_QR also set to true, a very fast (but less accurate) randomized partial and truncated QR factorization is used in the first phase of the QLP algorithm.

By default, TRUNCATED_QR = false, i.e., a “standard” randomized (partial) QR factorization with column pivoting is used in the first phase of the QLP algorithm if RANDOM_QR = true.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random gaussian matrix in the randomized (partial) QR phase of the QLP algorithm, if RANDOM_QR = true.

The possible values are:

ALG=1 : selects the Marsaglia’s KISS random number generator;

ALG=2 : selects the fast Marsaglia’s KISS random number generator;

ALG=3 : selects the L’Ecuyer’s LFSR113 random number generator;

ALG=4 : selects the Mersenne Twister random number generator;

ALG=5 : selects the maximally equidistributed Mersenne Twister random number generator;

ALG=6 : selects the extended precision of the Marsaglia’s KISS random number generator;

ALG=7 : selects the extended precision of the fast Marsaglia’s KISS random number generator;

ALG=8 : selects the extended precision of the L’Ecuyer’s LFSR113 random number generator;

ALG=9 : selects the extended precision of the Mersenne Twister random number generator;

ALG=10 : selects the extended precision of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that, on exit, the current random number generator is not reset to the one that was in use before the call to QLP_CMP.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

BLK_SIZE (INPUT, OPTIONAL) integer(i4b)

On entry, the block size used in the randomized (partial) QR phase of the QLP algorithm, if RANDOM_QR = true (and TRUNCATED_QR = false).

BLK_SIZE must be greater than or equal to one and less than min(m,n). It should usually be set to a value much smaller than min(m,n); the best choice also depends on the architecture of the computer.

See Further Details and the cited references for the meaning of the block size in the randomized (partial) QR algorithm.

By default, BLK_SIZE is set to min( BLKSZ_QR, min(m,n) ), where parameter BLKSZ_QR is the default block size for QR related algorithms specified in module Select_Parameters.

NOVER (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the randomized (partial) QR phase of the QLP algorithm, if RANDOM_QR = true.

NOVER must be non-negative and satisfy the relationships:

NOVER + BLK_SIZE <= size( MAT, 1 ) if TRUNCATED_QR = false;

NOVER + size( BETA ) <= size( MAT, 1 ) if TRUNCATED_QR = true.

NOVER is adjusted if necessary so that these relationships hold in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the randomized (partial) QR algorithms.

By default, the oversampling size is set to:

10 if TRUNCATED_QR = false;

max( size(BETA)/2_i4b, 10 ) if TRUNCATED_QR = true.
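
The default and the bounds on NOVER can be summarized by the following minimal Python sketch. The helper names `default_nover` and `adjust_nover` are hypothetical, and the adjustment rule (clamping NOVER into the admissible range) is an assumption, since the text only states that NOVER is adjusted if necessary.

```python
def default_nover(krank, truncated_qr):
    """Default oversampling size, with krank = size( BETA ).

    The Fortran expression size(BETA)/2_i4b is an integer division.
    """
    return max(krank // 2, 10) if truncated_qr else 10


def adjust_nover(nover, m, krank, blk_size, truncated_qr):
    """Assumed adjustment: clamp nover so that the documented bound holds.

    m is size( MAT, 1 ); the bound uses krank when TRUNCATED_QR = true
    and BLK_SIZE otherwise.
    """
    width = krank if truncated_qr else blk_size
    return max(0, min(nover, m - width))


print(default_nover(50, True))            # oversampling for a truncated QR
print(adjust_nover(10, 20, 15, 4, True))  # 10 is too large when m = 20, krank = 15
```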

Further Details¶

QLP_CMP first computes a (partial or complete) QR factorization with column pivoting of the m-by-n matrix MAT:

MAT * N = Q * R

where N is an n-by-n permutation matrix, R is an upper triangular or trapezoidal (i.e., if n>m) matrix and Q is an m-by-m orthogonal matrix.

If the optional logical argument RANDOM_QR is used with the value true, a fast randomized partial (and truncated, if the optional logical argument TRUNCATED_QR is also used with the value true) QR factorization with column pivoting is used in this first phase of the QLP algorithm.

At the user's option, this QR factorization can also be only partial, i.e., the subroutine stops when the number of columns of Q is equal to a predefined value, krank = size( BETA ) = size( TAU ).

This leads implicitly to the following partition of Q:

[ Q1 Q2 ]

where Q1 is an m-by-krank orthonormal matrix and Q2 is an m-by-(m-krank) orthonormal matrix orthogonal to Q1, and to the following corresponding partition of R:

[ R11 R12 ]

[ R21 R22 ]

where R11 is a krank-by-krank triangular matrix, R21 is zero by construction, R12 is a full krank-by-(n-krank) matrix and R22 is a full (m-krank)-by-(n-krank) matrix.

In a second step, QLP_CMP computes a deterministic LQ factorization of the matrix product:

R * N’ = L * P

if the first QR factorization is complete, or of the matrix product:

[ R11 R12 ] * N’ = L * P

if this first QR factorization is only partial. This leads to the (partial) QLP factorization of MAT:

MAT = Q1 * L * P

where L is a krank-by-krank lower triangular matrix and P is a krank-by-n matrix with orthonormal rows.
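
The two steps above can be illustrated by a small pure-Python sketch of a complete QLP factorization. It is only an illustration under stated assumptions: modified Gram-Schmidt stands in for the Householder-based, blocked or randomized algorithms used by the library, the matrix is assumed to have full rank, and the helper names `qr_mgs` and `qlp` are hypothetical.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def qr_mgs(a, pivot):
    """Modified Gram-Schmidt QR; with pivot=True, a[:, perm] = q * r."""
    m, n = len(a), len(a[0])
    k = min(m, n)
    cols = transpose(a)                       # working copy, stored by columns
    perm = list(range(n))
    q_cols, r = [], [[0.0] * n for _ in range(k)]
    for s in range(k):
        if pivot:
            # bring the remaining column of largest norm to position s
            p = max(range(s, n), key=lambda j: sum(x * x for x in cols[j]))
            cols[s], cols[p] = cols[p], cols[s]
            perm[s], perm[p] = perm[p], perm[s]
            for row in r:
                row[s], row[p] = row[p], row[s]
        r[s][s] = math.sqrt(sum(x * x for x in cols[s]))
        q_s = [x / r[s][s] for x in cols[s]]  # assumes a nonzero pivot column
        q_cols.append(q_s)
        for j in range(s + 1, n):
            r[s][j] = sum(x * y for x, y in zip(q_s, cols[j]))
            cols[j] = [y - r[s][j] * x for x, y in zip(q_s, cols[j])]
    return transpose(q_cols), r, perm


def qlp(a):
    q, r, perm = qr_mgs(a, pivot=True)        # first step:  MAT * N = Q * R
    rn = [[0.0] * len(perm) for _ in r]       # form R * N' by undoing the permutation
    for i, row in enumerate(r):
        for j, value in enumerate(row):
            rn[i][perm[j]] = value
    # second step: LQ of R * N', obtained from the QR of its transpose
    pt, lt, _ = qr_mgs(transpose(rn), pivot=False)
    return q, transpose(lt), transpose(pt)


mat = [[4.0, 1.0, 2.0], [2.0, 3.0, 0.0], [1.0, 0.0, 5.0], [0.0, 2.0, 1.0]]
q, l, p = qlp(mat)
print([abs(l[i][i]) for i in range(3)])       # the L-values
```

The product `q * l * p` reproduces `mat` up to rounding errors, and `l` is lower triangular by construction.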

The properties of the QLP factorization and when it can be used as a good proxy for the (partial or complete) SVD of a matrix are discussed in the references (2), (7), (8) and (9).

The computations are parallelized if OPENMP is used. However, note that QLP_CMP uses a standard “BLAS2” algorithm without any blocking for performing the first QR factorization with column pivoting if the optional logical argument RANDOM_QR is not used or used with the value false. On the other hand, QLP_CMP uses fast randomized and blocked QR algorithms with column pivoting (see the references (3), (4), (5) and (6)) if RANDOM_QR is used with the value true. These randomized algorithms are thus particularly efficient for large matrices.

The standard deterministic BLAS2 algorithm for computing a QR factorization with column pivoting is described in the reference (1). The randomized partial QR algorithm with column pivoting used if the optional logical argument RANDOM_QR is present with the value true is described in the references (3), (4) and (5). Finally, the randomized partial and truncated QR algorithm with column pivoting used if both the optional logical arguments RANDOM_QR and TRUNCATED_QR are present with the value true is described in the reference (6). This algorithm is the fastest, but less accurate than the randomized partial QR algorithm with column pivoting described in the references (3), (4) and (5).

In all cases, QLP_CMP uses an efficient (but deterministic) blocked algorithm for performing the LQ factorization in the second step of the QLP decomposition. The LQ factorization is described in the reference (1).

On exit, the matrix Q is represented as a product of elementary reflectors

Q = H(1) * H(2) * … * H(krank), where krank = size( BETA ) <= min( m , n ).

Each H(i) has the form

H(i) = I + beta * ( v * v’ ) ,

where beta is a real scalar and v is a real m-element vector with v(1:i-1) = 0. v(i:m) is stored on exit in MAT(i:m,i) and beta in BETA(i). Note also that v(i) = 1.
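
As an illustration of this representation, the following pure-Python sketch builds one elementary reflector with v(1) = 1. With the sign convention H = I + beta * ( v * v’ ), H is orthogonal (for a nonzero beta) only when beta = -2 / ( v’ * v ), which is what the sketch uses. The function names are hypothetical and the input vector is assumed to be nonzero; this is not the library routine.

```python
import math


def householder(x):
    """Return (v, beta, alpha) with v[0] = 1 and (I + beta*v*v') x = alpha*e1."""
    norm = math.sqrt(sum(t * t for t in x))
    alpha = -math.copysign(norm, x[0])   # sign chosen to avoid cancellation
    v0 = x[0] - alpha
    v = [1.0] + [t / v0 for t in x[1:]]  # scale so that the first element is 1
    beta = -2.0 / sum(t * t for t in v)
    return v, beta, alpha


def apply_reflector(v, beta, x):
    """Compute (I + beta * v * v') x without forming the matrix."""
    s = beta * sum(a * b for a, b in zip(v, x))
    return [xi + s * vi for xi, vi in zip(x, v)]


x = [3.0, 4.0, 0.0, 12.0]
v, beta, alpha = householder(x)
y = apply_reflector(v, beta, x)          # all entries but the first are annihilated
print(v, beta, y)
```

Applying the same reflector twice returns the original vector, since H is symmetric and orthogonal.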

On exit of QLP_CMP, the orthonormal matrix Q stored in factored form in MAT can be generated by a call to subroutine ORTHO_GEN_QR with arguments MAT and BETA. Alternatively, QLP_CMP computes the first krank columns of Q explicitly if the optional array argument QMAT is present.

The matrix P is represented as a product of elementary reflectors

P = G(krank) * … * G(2) * G(1), where krank = size( TAU ) <= min( m , n ).

Each G(i) has the form

G(i) = I + tau * ( u * u’ ) ,

where tau is a real scalar and u is a real n-element vector with u(1:i-1) = 0. u(i:n) is stored on exit in MAT(i,i:n) and tau in TAU(i). Note also that u(i) = 1.

On exit of QLP_CMP, the orthonormal matrix P stored in factored form in MAT can be generated by a call to subroutine ORTHO_GEN_LQ with arguments MAT and TAU. Alternatively, QLP_CMP computes the first krank rows of P explicitly if the optional array argument PMAT is present.

Finally, QLP_CMP outputs the krank-by-krank lower triangular matrix L in the optional array argument LMAT. If LMAT is not specified in the QLP_CMP call, the L factor of the QLP decomposition is not stored on exit.

For further details on the QLP factorization and its use, or on randomized QR and QLP algorithms, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Stewart, G.W., 1999:

The QLP approximation to the singular value decomposition. SIAM J. Sci. Comput., Volume 20, 1336-1348.

(3) Duersch, J.A., and Gu, M., 2017:

Randomized QR with column pivoting. SIAM J. Sci. Comput., Volume 39, C263-C291.

(4) Martinsson, P.G., Quintana-Orti, G., Heavner, N., and Van de Geijn, R., 2017:

Householder QR factorization with randomization for column pivoting (HQRRP). SIAM J. Sci. Comput., Volume 39, C96-C115.

(5) Duersch, J.A., and Gu, M., 2020:

Randomized projection for rank-revealing matrix factorizations and low-rank approximations. SIAM Review, Volume 62, Issue 3, 661-682.

(6) Mary, T., Yamazaki, I., Kurzak, J., Luszczek, P., Tomov, S., and Dongarra, J., 2015:

Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs. International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15).

(7) Wu, N., and Xiang, H., 2020:

Randomized QLP decomposition. Linear Algebra and its Applications, Volume 599, 18-35.

(8) Huckaby, D.A., and Chan, T.F., 2003:

On the convergence of Stewart’s QLP algorithm for approximating the SVD. Numer. Algorithms, Volume 32, 287-316.

(9) Huckaby, D.A., and Chan, T.F., 2005:

Stewart’s pivoted QLP decomposition for low-rank matrices. Numerical Linear Algebra with Applications, Volume 12, 153-159.

`subroutine qlp_cmp2 ( mat, lmat, qmat, pmat, niter_qrql, random_qr, truncated_qr, rng_alg, blk_size, nover )`¶

Purpose¶

QLP_CMP2 computes a partial or complete QLP factorization of an m-by-n matrix MAT:

MAT = Q * L * P

where Q is an m-by-krank matrix with orthonormal columns, P is a krank-by-n matrix with orthonormal rows and L is a krank-by-krank lower triangular matrix. If krank = min(m,n), the QLP factorization is complete and MAT = Q * L * P.

The QLP factorization is obtained by a three-step algorithm:

first, a partial (or complete) QR factorization with column pivoting of MAT is computed;

in a second step, an LQ decomposition of the (permuted) upper triangular or trapezoidal (i.e., if n>m) factor, R, in this QR decomposition of MAT is computed;

and, in a final step, NITER_QRQL QR-QL iterations can be performed on the L factor in this LQ decomposition to improve the accuracy of the diagonal elements of L (the so-called L-values) as estimates of the singular values of MAT (see references (2), (7), (8) and (9) for details).

By default, a standard deterministic QR factorization with column pivoting is used in the first phase of the QLP algorithm. However, if the optional logical argument RANDOM_QR is used with the value true, an alternate fast randomized (partial) QR factorization is used in the first phase of the QLP algorithm. Furthermore if, in addition, the optional logical argument TRUNCATED_QR is used with the value true, an even faster (but less accurate) randomized partial and truncated QR factorization will be used in the first phase of the QLP algorithm. In all cases, deterministic blocked LQ and QR factorizations are used in the second and third steps of the QLP factorization.

At the user's option, the QLP factorization can also be only partial, i.e., the subroutine stops the computations when the number of columns of Q and the number of rows of P are equal to a predefined value, krank = size( LMAT, 1 ) = size( LMAT, 2 ).

The QLP decomposition provides a reasonable and cheap estimate of the Singular Value Decomposition (SVD) of a matrix when this matrix has a low rank or a significant gap in its singular values spectrum.

See Further Details and the cited references for more information.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the real m-by-n matrix to be decomposed.

On exit, MAT is destroyed as MAT is used as workspace in the routine.

See Further Details.

LMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, LMAT stores the lower triangular matrix L in the (partial) QLP factorization of MAT.

See Further Details.

The shape of LMAT must verify:

size( LMAT, 1 ) = size( LMAT, 2 ) = krank.

QMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, QMAT stores the first krank columns of the orthogonal matrix Q in the (partial) QLP factorization of MAT.

See Further Details.

The shape of QMAT must verify:

size( QMAT, 1 ) = m.

size( QMAT, 2 ) = krank.

PMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, PMAT stores the first krank rows of the orthogonal matrix P in the (partial) QLP factorization of MAT.

See Further Details.

The shape of PMAT must verify:

size( PMAT, 1 ) = krank.

size( PMAT, 2 ) = n.

NITER_QRQL (INPUT, OPTIONAL) integer(i4b)

The number of QR-QL iterations performed on L after the initial QLP factorization for improving the accuracy of the L-values. NITER_QRQL must be non-negative.

By default, no QR-QL iterations are performed after the initial QLP factorization.

RANDOM_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if RANDOM_QR is used with the value true, a fast randomized (partial) QR factorization with column pivoting is used in the first phase of the QLP algorithm.

By default, RANDOM_QR = false, i.e., a standard deterministic (partial) QR factorization with column pivoting is used in the first phase of the QLP algorithm.

TRUNCATED_QR (INPUT, OPTIONAL) logical(lgl)

On entry, if TRUNCATED_QR is used with the value true in addition to RANDOM_QR also set to true, a very fast (but less accurate) randomized partial and truncated QR factorization is used in the first phase of the QLP algorithm.

By default, TRUNCATED_QR = false, i.e., a “standard” randomized (partial) QR factorization with column pivoting is used in the first phase of the QLP algorithm if RANDOM_QR = true.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random gaussian matrix in the randomized (partial) QR phase of the QLP algorithm, if RANDOM_QR = true.

The possible values are:

ALG=1 : selects the Marsaglia’s KISS random number generator;

ALG=2 : selects the fast Marsaglia’s KISS random number generator;

ALG=3 : selects the L’Ecuyer’s LFSR113 random number generator;

ALG=4 : selects the Mersenne Twister random number generator;

ALG=5 : selects the maximally equidistributed Mersenne Twister random number generator;

ALG=6 : selects the extended precision of the Marsaglia’s KISS random number generator;

ALG=7 : selects the extended precision of the fast Marsaglia’s KISS random number generator;

ALG=8 : selects the extended precision of the L’Ecuyer’s LFSR113 random number generator;

ALG=9 : selects the extended precision of the Mersenne Twister random number generator;

ALG=10 : selects the extended precision of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that, on exit, the current random number generator is not reset to the one that was in use before the call to QLP_CMP2.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

BLK_SIZE (INPUT, OPTIONAL) integer(i4b)

On entry, the block size used in the randomized (partial) QR phase of the QLP algorithm, if RANDOM_QR = true (and TRUNCATED_QR = false).

BLK_SIZE must be greater than or equal to one and less than min(m,n). It should usually be set to a value much smaller than min(m,n); the best choice also depends on the architecture of the computer.

See Further Details and the cited references for the meaning of the block size in the randomized (partial) QR algorithm.

By default, BLK_SIZE is set to min( BLKSZ_QR, min(m,n) ), where parameter BLKSZ_QR is the default block size for QR related algorithms specified in module Select_Parameters.

NOVER (INPUT, OPTIONAL) integer(i4b)

The oversampling size used in the randomized (partial) QR phase of the QLP algorithm, if RANDOM_QR = true.

NOVER must be non-negative and satisfy the relationships:

NOVER + BLK_SIZE <= size( MAT, 1 ) if TRUNCATED_QR = false;

NOVER + size( LMAT, 1 ) <= size( MAT, 1 ) if TRUNCATED_QR = true.

NOVER is adjusted if necessary so that these relationships hold in all cases.

See Further Details and the cited references for the meaning and usefulness of the oversampling size in the randomized partial QR algorithm.

By default, the oversampling size is set to:

10 if TRUNCATED_QR = false;

max( size(LMAT,1)/2_i4b, 10 ) if TRUNCATED_QR = true.

Further Details¶

QLP_CMP2 first computes a (partial or complete) QR factorization with column pivoting of the m-by-n matrix MAT:

MAT * N = Q * R

where N is an n-by-n permutation matrix, R is an upper triangular or trapezoidal (i.e., if n>m) matrix and Q is an m-by-m orthogonal matrix.

If the optional logical argument RANDOM_QR is used with the value true, a fast randomized partial (and truncated, if the optional logical argument TRUNCATED_QR is also used with the value true) QR factorization with column pivoting is used in this first phase of the QLP algorithm.

At the user's option, this QR factorization can also be only partial, i.e., the subroutine stops when the number of columns of Q is equal to a predefined value, krank = size( LMAT, 1 ) = size( LMAT, 2 ).

This leads implicitly to the following partition of Q:

[ Q1 Q2 ]

where Q1 is an m-by-krank orthonormal matrix and Q2 is an m-by-(m-krank) orthonormal matrix orthogonal to Q1, and to the following corresponding partition of R:

[ R11 R12 ]

[ R21 R22 ]

where R11 is a krank-by-krank triangular matrix, R21 is zero by construction, R12 is a full krank-by-(n-krank) matrix and R22 is a full (m-krank)-by-(n-krank) matrix.

In a second step, QLP_CMP2 computes a deterministic LQ factorization of the matrix product:

R * N’ = L * P

if the first QR factorization is complete, or of the matrix product:

[ R11 R12 ] * N’ = L * P

if this first QR factorization is only partial. This leads to the (partial) QLP factorization of MAT:

MAT = Q1 * L * P

where L is a krank-by-krank lower triangular matrix and P is a krank-by-n matrix with orthonormal rows.
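
When the factorization is only partial, the equality MAT = Q1 * L * P holds only up to the discarded block R22. The short derivation below, which uses only the symbols defined in this section (with N^T standing for N’ and the norm being the spectral or Frobenius norm), makes this explicit:

```latex
\begin{aligned}
\mathrm{MAT}\,N &= \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}
                   \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix}, \\
\mathrm{MAT} &= Q_1 \begin{bmatrix} R_{11} & R_{12} \end{bmatrix} N^{T}
              + Q_2 \begin{bmatrix} 0 & R_{22} \end{bmatrix} N^{T}
              = Q_1 L P + Q_2 \begin{bmatrix} 0 & R_{22} \end{bmatrix} N^{T}, \\
\lVert \mathrm{MAT} - Q_1 L P \rVert &= \lVert R_{22} \rVert .
\end{aligned}
```

The partial factorization is therefore exact when R22 is zero or empty, for instance when krank = min(m,n) or when the rank of MAT does not exceed krank.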

In a final step, NITER_QRQL QR-QL iterations can be performed on L to improve the accuracy of the diagonal elements of L (the so-called L-values) as estimates of the singular values of MAT (see references (2), (7), (8) and (9) for details), and the orthogonal matrices Q and P are updated accordingly.

The properties of the QLP factorization and when it can be used as a good proxy for the (partial or complete) SVD of a matrix are discussed in the references (2), (7), (8) and (9).

The computations are parallelized if OPENMP is used. However, note that QLP_CMP2 uses a standard “BLAS2” algorithm without any blocking for performing the first QR factorization with column pivoting if the optional logical argument RANDOM_QR is not used or used with the value false. On the other hand, QLP_CMP2 uses fast randomized and blocked QR algorithms with column pivoting (see the references (3), (4), (5) and (6)) if RANDOM_QR is used with the value true. These randomized algorithms are thus particularly efficient for large matrices.

The standard deterministic BLAS2 algorithm for computing a QR factorization with column pivoting is described in the reference (1). The randomized partial QR algorithm with column pivoting used if the optional logical argument RANDOM_QR is present with the value true is described in the references (3), (4) and (5). Finally, the randomized partial and truncated QR algorithm with column pivoting used if both the optional logical arguments RANDOM_QR and TRUNCATED_QR are present with the value true is described in the reference (6). This algorithm is the fastest, but less accurate than the randomized partial QR algorithm with column pivoting described in the references (3), (4) and (5).

In all cases, QLP_CMP2 uses efficient blocked algorithms for performing the LQ factorization in the second step of the QLP algorithm and also in the final QR-QL iterations performed on L. The LQ factorization is described in the reference (1).

On exit, QLP_CMP2 stores:

the krank-by-krank lower triangular matrix L in the array argument LMAT;

the first krank columns of Q in the array argument QMAT;

and the first krank rows of P in the array argument PMAT.

For further details on the QLP factorization and its use, on randomized QR and QLP algorithms, or on QR-QL iterations, see:

(1) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

(2) Stewart, G.W., 1999:

The QLP approximation to the singular value decomposition. SIAM J. Sci. Comput., Volume 20, 1336-1348.

(3) Duersch, J.A., and Gu, M., 2017:

Randomized QR with column pivoting. SIAM J. Sci. Comput., Volume 39, C263-C291.

(4) Martinsson, P.G., Quintana-Orti, G., Heavner, N., and Van de Geijn, R., 2017:

Householder QR factorization with randomization for column pivoting (HQRRP). SIAM J. Sci. Comput., Volume 39, C96-C115.

(5) Duersch, J.A., and Gu, M., 2020:

Randomized projection for rank-revealing matrix factorizations and low-rank approximations. SIAM Review, Volume 62, Issue 3, 661-682.

(6) Mary, T., Yamazaki, I., Kurzak, J., Luszczek, P., Tomov, S., and Dongarra, J., 2015:

Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs. International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15).

(7) Wu, N., and Xiang, H., 2020:

Randomized QLP decomposition. Linear Algebra and its Applications, Volume 599, 18-35.

(8) Huckaby, D.A., and Chan, T.F., 2003:

On the convergence of Stewart’s QLP algorithm for approximating the SVD. Numer. Algorithms, Volume 32, 287-316.

(9) Huckaby, D.A., and Chan, T.F., 2005:

Stewart’s pivoted QLP decomposition for low-rank matrices. Numerical Linear Algebra with Applications, Volume 12, 153-159.

`subroutine rqlp_cmp ( mat, lmat, qmat, pmat, niter, rng_alg, ortho, niter_qrql )`¶

Purpose¶

RQLP_CMP computes a randomized partial QLP factorization of an m-by-n matrix MAT:

MAT = Q * L * P

where Q is an m-by-krank matrix with orthonormal columns, P is a krank-by-n matrix with orthonormal rows and L is a krank-by-krank lower triangular matrix.

The randomized QLP factorization is only partial, i.e., the subroutine stops the computations when the number of columns of Q and the number of rows of P are equal to a predefined value, krank = size( LMAT, 1 ) = size( LMAT, 2 ).

The randomized partial QLP factorization is obtained by a four-step algorithm:

first, the routine computes a partial QB factorization of MAT with the help of a randomized algorithm:

MAT = Q * B

where Q is an m-by-krank orthonormal matrix, B is a krank-by-n matrix and the product Q*B is a good approximation of MAT according to the spectral or Frobenius norm;

second, a QR factorization with column pivoting of B is computed and Q is post-multiplied by the krank-by-krank orthogonal matrix, O, in this QR factorization of B;

in a third step, an LQ decomposition of the (permuted) upper trapezoidal factor, R, in this QR decomposition of B is computed;

and, in a final step, NITER_QRQL QR-QL iterations are performed on the L matrix in this LQ decomposition to improve the accuracy of the diagonal elements of L (the so-called L-values) as estimates of the singular values of MAT (see references (5), (6) and (7) for details).

This randomized QLP decomposition provides a reasonable and cheap estimate of the Singular Value Decomposition (SVD) of a matrix when this matrix has a low rank or a significant gap in its singular values spectrum.

See Further Details and the cited references for more information.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the real m-by-n matrix to be decomposed.

MAT is not modified by the routine.

LMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, LMAT stores the lower triangular matrix L in the (partial) QLP factorization of MAT.

See Further Details.

The shape of LMAT must verify:

size( LMAT, 1 ) = size( LMAT, 2 ) = krank.

QMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, QMAT stores the first krank columns of the orthogonal matrix Q in the (partial) QLP factorization of MAT.

See Further Details.

The shape of QMAT must verify:

size( QMAT, 1 ) = m.

size( QMAT, 2 ) = krank.

PMAT (OUTPUT) real(stnd), dimension(:,:)

On exit, PMAT stores the first krank rows of the orthogonal matrix P in the (partial) QLP factorization of MAT.

See Further Details.

The shape of PMAT must verify:

size( PMAT, 1 ) = krank.

size( PMAT, 2 ) = n.

NITER (INPUT, OPTIONAL) integer(i4b)

The number of randomized power or subspace iterations performed in the first phase of the randomized QLP algorithm for computing the preliminary randomized QB factorization.

NITER must be non-negative.

By default, 5 randomized power or subspace iterations are performed.

RNG_ALG (INPUT, OPTIONAL) integer(i4b)

On entry, a scalar integer to select the random (uniform) number generator used to build the random gaussian test matrix in the initial randomized partial QB factorization.

The possible values are:

ALG=1 : selects the Marsaglia’s KISS random number generator;

ALG=2 : selects the fast Marsaglia’s KISS random number generator;

ALG=3 : selects the L’Ecuyer’s LFSR113 random number generator;

ALG=4 : selects the Mersenne Twister random number generator;

ALG=5 : selects the maximally equidistributed Mersenne Twister random number generator;

ALG=6 : selects the extended precision of the Marsaglia’s KISS random number generator;

ALG=7 : selects the extended precision of the fast Marsaglia’s KISS random number generator;

ALG=8 : selects the extended precision of the L’Ecuyer’s LFSR113 random number generator;

ALG=9 : selects the extended precision of the Mersenne Twister random number generator;

ALG=10 : selects the extended precision of the maximally equidistributed Mersenne Twister random number generator.

For other values, the current random number generator and its current state are not changed. Note also that, on exit, the current random number generator is not reset to the one that was in use before the call to RQLP_CMP.

See the documentation of subroutine RANDOM_SEED_ in module Random for further information.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, orthonormalization is carried out between each step of the power iterations, to avoid loss of accuracy due to rounding errors. This means that subspace iterations are used instead of power iterations in the QB phase of the algorithm;

ORTHO=false, orthonormalization is not performed.

The default is to use orthonormalization, i.e., ORTHO=true.

NITER_QRQL (INPUT, OPTIONAL) integer(i4b)

The number of QR-QL iterations performed on L after the initial QLP factorization for improving the accuracy of the L-values. NITER_QRQL must be non-negative.

By default, no QR-QL iterations are performed after the initial QLP factorization.

Further Details¶

RQLP_CMP first computes a partial QB factorization of MAT with the help of a randomized algorithm:

MAT = Q * B

where Q is an m-by-krank orthonormal matrix, B is a krank-by-n matrix and the product Q*B is a good approximation of MAT according to the spectral or Frobenius norm. Here, krank = size( LMAT, 1 ) = size( LMAT, 2 ).
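
The randomized QB step can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the library implementation: the function names are hypothetical, Python's `random.gauss` replaces the library's random number generators, modified Gram-Schmidt replaces the blocked orthonormalization, and orthonormalization is always applied between iterations (the ORTHO=true case).

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def orthonormalize(a):
    """Orthonormalize the columns of a by modified Gram-Schmidt."""
    basis = []
    for col in zip(*a):
        w = list(col)
        for u in basis:
            c = sum(x * y for x, y in zip(u, w))
            w = [y - c * x for x, y in zip(u, w)]
        norm = math.sqrt(sum(y * y for y in w))   # assumes independent columns
        basis.append([y / norm for y in w])
    return transpose(basis)


def randomized_qb(a, krank, niter, seed):
    """Return q (m-by-krank, orthonormal columns) and b = q' * a."""
    rng = random.Random(seed)
    n = len(a[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(krank)] for _ in range(n)]
    q = orthonormalize(matmul(a, omega))          # sample the range of a
    for _ in range(niter):                        # subspace iterations
        z = orthonormalize(matmul(transpose(a), q))
        q = orthonormalize(matmul(a, z))
    return q, matmul(transpose(q), a)


# A 5-by-4 matrix of rank 2, built as a sum of two outer products.
u1, v1 = [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, 1.0, 0.0]
u2, v2 = [1.0, -1.0, 1.0, -1.0, 1.0], [0.0, 1.0, 0.0, 2.0]
a = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(4)] for i in range(5)]
q, b = randomized_qb(a, krank=2, niter=2, seed=0)
qb = matmul(q, b)
print(max(abs(qb[i][j] - a[i][j]) for i in range(5) for j in range(4)))
```

Because the example matrix has rank 2 and krank = 2, the product `q * b` recovers it up to rounding errors; for a general matrix the product is only an approximation.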

In a second step, RQLP_CMP computes a deterministic QR factorization with column pivoting of the krank-by-n matrix B, to obtain an approximate QR factorization with column pivoting of MAT :

MAT * N = Q * ( B * N ) = Q * ( O * R ) = (Q * O ) * R

where N is an n-by-n permutation matrix, O is a krank-by-krank orthogonal matrix and R is a krank-by-n upper trapezoidal matrix.

In a third step, RQLP_CMP computes a deterministic LQ factorization of the matrix product:

R * N’ = L * P

This leads to the approximate QLP factorization of MAT:

MAT = ( Q * O ) * L * P

where Q is a m-by-krank matrix with orthonormal columns, O is a krank-by-krank orthogonal matrix, L is a krank-by-krank lower triangular matrix and P is a krank-by-n matrix with orthonormal rows.

In a final step, NITER_QRQL QR-QL iterations can be performed on L to improve the accuracy of the diagonal elements of L (the so-called L-values) as estimates of the singular values of MAT (see references (1), (5), (6) and (7) for details), and the orthogonal matrices Q and P are updated accordingly.

The computations are parallelized if OPENMP is used.

In all cases, RQLP_CMP uses efficient blocked algorithms for performing the QB, QR and LQ steps in the randomized QLP algorithm and also in the final QR-QL iterations performed on L.

On exit, RQLP_CMP stores:

the krank-by-krank lower triangular matrix L in the array argument LMAT;

the first krank columns of Q in the array argument QMAT;

and the first krank rows of P in the array argument PMAT.

For further details on the QLP factorization and its use, on randomized QB, QR and QLP algorithms, or on QR-QL iterations, see:

(1) Stewart, G.W., 1999:

The QLP approximation to the singular value decomposition. SIAM J. Sci. Comput., Volume 20, 1336-1348.

(2) Duersch, J.A., and Gu, M., 2017:

Randomized QR with column pivoting. SIAM J. Sci. Comput., Volume 39, C263-C291.

(3) Martinsson, P.G., Quintana-Orti, G., Heavner, N., and Van de Geijn, R., 2017:

Householder QR factorization with randomization for column pivoting (HQRRP). SIAM J. Sci. Comput., Volume 39, C96-C115.

(4) Martinsson, P.G., and Voronin, S., 2016:

A randomized blocked algorithm for efficiently computing rank-revealing factorizations of matrices. SIAM J. Sci. Comput., 38:5, S485-S507.

(5) Wu, N., and Xiang, H., 2020:

Randomized QLP decomposition. Linear Algebra and its Applications, Volume 599, 18-35.

(6) Huckaby, D.A., and Chan, T.F., 2003:

On the convergence of Stewart’s QLP algorithm for approximating the SVD. Numer. Algorithms, Volume 32, 287-316.

(7) Huckaby, D.A., and Chan, T.F., 2005:

Stewart’s pivoted QLP decomposition for low-rank matrices. Numerical Linear Algebra with Applications, Volume 12, 153-159.

`function maxdiag_gkinv_qr ( e, lambda )`¶

Purpose¶

This function computes the index of the element of maximum absolute value in the diagonal entries of

( GK - LAMBDA * I )**(-1)

where GK is a n-by-n symmetric tridiagonal matrix with a zero diagonal, I is the identity matrix and LAMBDA is a scalar.

Arguments¶

E (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the tridiagonal matrix.

LAMBDA (INPUT) real(stnd)

On entry, the eigenvalue or shift used in the QR factorization.

Further Details¶

The diagonal entries of ( GK - LAMBDA * I )**(-1) are computed by means of the QR factorization of ( GK - LAMBDA * I ). For the latter computation, the semiseparable structure of ( GK - LAMBDA * I )**(-1) is used, see the reference (1). Moreover, it is assumed that GK is unreduced, but no check is done in the function to verify this assumption.

This function is adapted from the pseudo-code trace_Tinv given in the reference (1).

For further details, see:

(1) Bini, D.A., Gemignani, L., and Tisseur, F., 2005:

The Ehrlich-Aberth method for the nonsymmetric tridiagonal eigenvalue problem. SIAM J. Matrix Anal. Appl., 27, 153-175.

(2) Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

(3) Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra and its Appl., 267, pp. 247-279.

`function maxdiag_gkinv_ldu ( e, lambda )`¶

Purpose¶

This function computes the index of the element of maximum absolute value in the diagonal entries of

( GK - LAMBDA * I )**(-1)

where GK is a n-by-n symmetric tridiagonal matrix with a zero diagonal, I is the identity matrix and LAMBDA is a scalar.

Arguments¶

E (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the tridiagonal matrix.

LAMBDA (INPUT) real(stnd)

On entry, the eigenvalue or shift used.

Further Details¶

The diagonal entries of ( GK - LAMBDA * I )**(-1) are computed by means of two triangular factorizations of ( GK - LAMBDA * I ) of the forms L(+) * D(+) * U(+) and U(-) * D(-) * L(-) where L(+) and L(-) are unit lower bidiagonal, U(+) and U(-) are unit upper bidiagonal, and D(+) and D(-) are diagonal.

It is assumed that GK is unreduced, but the function does not check this assumption.

This function is adapted from the references (1) and (2).
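
The double factorization can be illustrated with a short sketch. It runs the forward (top to bottom) and backward (bottom to top) pivot recurrences of the two triangular factorizations and combines them with the identity gamma(k) = D(+)(k) + D(-)(k) - a(k), where a(k) = -LAMBDA is the k-th diagonal entry of GK - LAMBDA * I and 1/gamma(k) is the k-th diagonal entry of the inverse. This is a simplified illustration under the stated unreduced assumption, with no safeguard against zero pivots; the function names and the 1-based return value are hypothetical.

```python
def inverse_diagonal_twin(e, lam):
    """Diagonal of (GK - lam*I)**(-1) from forward and backward pivots.

    Assumes GK is unreduced and that no pivot vanishes (no safeguard here).
    """
    n = len(e) + 1
    a = -lam                      # every diagonal entry of GK - lam*I
    d_plus = [a] * n              # pivots D(+), computed top to bottom
    for k in range(1, n):
        d_plus[k] = a - e[k - 1] ** 2 / d_plus[k - 1]
    d_minus = [a] * n             # pivots D(-), computed bottom to top
    for k in range(n - 2, -1, -1):
        d_minus[k] = a - e[k] ** 2 / d_minus[k + 1]
    # gamma(k) = D(+)(k) + D(-)(k) - a; its reciprocal is the inverse diagonal.
    return [1.0 / (d_plus[k] + d_minus[k] - a) for k in range(n)]


def maxdiag_index_twin(e, lam):
    """1-based index of the largest |diagonal entry| of the inverse."""
    diag = [abs(x) for x in inverse_diagonal_twin(e, lam)]
    return diag.index(max(diag)) + 1
```

Both recurrences cost O(n) operations, which is why this approach is attractive compared with forming the inverse explicitly.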

For further details on Fernando’s method for computing eigenvectors of tridiagonal matrices, see:

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra and its Appl., 267, pp. 247-279.

`subroutine gk_qr_cmp ( e, lambda, cs, sn, diag, sup1, sup2, maxdiag_gkinv )`¶

Purpose¶

GK_QR_CMP factorizes the symmetric matrix GK - LAMBDA * I, where GK is an n-by-n symmetric tridiagonal matrix with a zero diagonal, I is the identity matrix and LAMBDA is a scalar, as

GK - LAMBDA * I = Q * R

where Q is an orthogonal matrix represented as the product of n-1 Givens rotations and R is an upper triangular matrix with at most two non-zero super-diagonal elements per column.

The parameter LAMBDA is included in the routine so that GK_QR_CMP may be used to obtain eigenvectors of GK by inverse iteration.

The subroutine also computes the index of the entry of maximum absolute value in the diagonal of ( GK - LAMBDA * I )**(-1), which provides a good initial approximation to start the inverse iteration process for computing the eigenvector associated with the eigenvalue LAMBDA, see the references (1), (2) and (3) for further details.

Arguments¶

E (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the tridiagonal matrix.

LAMBDA (INPUT) real(stnd)

On entry, the eigenvalue or shift used in the QR factorization.

CS (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the cosine coefficients of the chain of n-1 Givens rotations for the QR factorization of GK - LAMBDA * I.

The size of CS must be size( CS ) = size( E ) = n - 1.

SN (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the sine coefficients of the chain of n-1 Givens rotations for the QR factorization of GK - LAMBDA * I.

The size of SN must be size( SN ) = size( E ) = n - 1.

DIAG (OUTPUT) real(stnd), dimension(:)

On exit, DIAG(:) contains the n diagonal elements of the upper triangular matrix R of the QR factorization of GK - LAMBDA * I.

The size of DIAG must verify: size( DIAG ) = size( E ) + 1 = n .

SUP1 (OUTPUT) real(stnd), dimension(:)

On exit, SUP1(:n-1) contains the n-1 superdiagonal elements of the upper triangular matrix R of the QR factorization of GK - LAMBDA * I, SUP1(n) is arbitrary .

The size of SUP1 must verify: size( SUP1 ) = size( E ) + 1 = n .

SUP2 (OUTPUT) real(stnd), dimension(:)

On exit, SUP2(:n-2) contains the n-2 second superdiagonal elements of the upper triangular matrix R of the QR factorization of GK - LAMBDA * I, SUP2(n-1:n) is arbitrary .

The size of SUP2 must verify: size( SUP2 ) = size( E ) + 1 = n .

MAXDIAG_GKINV (OUTPUT) integer(i4b)

On exit, MAXDIAG_GKINV is the index of the entry of maximum modulus in the main diagonal of ( GK - LAMBDA * I )**(-1).

Further Details¶

The QR factorization of ( GK - LAMBDA * I ) is obtained by means of n-1 unitary Givens rotations.

The diagonal entries of ( GK - LAMBDA * I )**(-1) are computed by means of this QR factorization of ( GK - LAMBDA * I ). For the latter computation, the semiseparable structure of ( GK - LAMBDA * I )**(-1) is used, see the reference (1). Moreover, it is assumed that GK is unreduced for computing the index of the entry of maximum absolute value in the diagonal of ( GK - LAMBDA * I )**(-1), but no check is done in the subroutine to verify this assumption.
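
The structure of the factorization can be sketched as follows: n-1 Givens rotations annihilate the sub-diagonal of GK - LAMBDA * I one entry at a time, and each rotation creates at most one entry on the second super-diagonal of R. The sketch returns arrays shaped like CS, SN, DIAG, SUP1 and SUP2, but the sign and storage conventions used here are assumptions and need not match those of GK_QR_CMP; the function name is hypothetical.

```python
import math


def givens_qr_tridiag(e, lam):
    """QR factorization of GK - lam*I (GK: zero diagonal, off-diagonals e).

    Rotation k is [[c, s], [-s, c]] acting on rows k and k+1.
    Returns (cs, sn, diag, sup1, sup2) describing Q and the bands of R.
    """
    n = len(e) + 1
    diag = [-lam] * n             # R(i, i)
    sup1 = list(e) + [0.0]        # R(i, i+1); last entry unused
    sup2 = [0.0] * n              # R(i, i+2); last two entries unused
    cs = [1.0] * (n - 1)
    sn = [0.0] * (n - 1)
    for k in range(n - 1):
        x, y = diag[k], e[k]      # y is the sub-diagonal entry to annihilate
        r = math.hypot(x, y)
        c, s = (x / r, y / r) if r > 0.0 else (1.0, 0.0)
        cs[k], sn[k] = c, s
        a, b = sup1[k], diag[k + 1]
        diag[k] = r
        sup1[k] = c * a + s * b
        diag[k + 1] = -s * a + c * b
        sup2[k] = s * sup1[k + 1]     # fill-in on the second super-diagonal
        sup1[k + 1] = c * sup1[k + 1]
    return cs, sn, diag, sup1, sup2
```

Applying the transposed rotations to R in reverse order recovers GK - LAMBDA * I, which gives a simple consistency check for small inputs.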

For further details, see:

Bini, D.A., Gemignani, L., and Tisseur, F., 2005:

The Ehrlich-Aberth method for the nonsymmetric tridiagonal eigenvalue problem. SIAM J. Matrix Anal. Appl., 27, 153-175.

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra and its Appl., 267, pp. 247-279.

`subroutine bd_inviter ( upper, d, e, s, leftvec, rightvec, failure, maxiter, scaling, initvec )`¶

Purpose¶

BD_INVITER computes the left and right singular vectors of a real n-by-n bidiagonal matrix BD corresponding to a specified singular value, using Fernando’s method and inverse iteration on the tridiagonal Golub-Kahan (TGK) form of the bidiagonal matrix BD.

Arguments¶

UPPER (INPUT) logical(lgl)

On entry, if:

UPPER = true : BD is upper bidiagonal ;

UPPER = false : BD is lower bidiagonal.

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD.

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD. E(1) is arbitrary.

The size of E must verify: size( E ) = size( D ) = n .

S (INPUT) real(stnd)

On entry, the selected singular value of the bidiagonal matrix BD. The singular value must be positive or zero.

LEFTVEC (OUTPUT) real(stnd), dimension(:)

On exit, the computed left singular vector.

The shape of LEFTVEC must verify: size( LEFTVEC ) = size( D ) = n .

RIGHTVEC (OUTPUT) real(stnd), dimension(:)

On exit, the computed right singular vector.

The shape of RIGHTVEC must verify: size( RIGHTVEC ) = size( D ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = FALSE : indicates successful exit,

FAILURE = TRUE : indicates that the singular vectors failed to converge in MAXITER iterations.

MAXITER (INPUT, OPTIONAL) integer(i4b)

The number of inverse iterations performed in the subroutine. By default, 2 inverse iterations are performed.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix BD is scaled before computing the singular vector;

SCALING=false, the bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INITVEC (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INITVEC=true, a Fernando vector is used to start the inverse iteration process for computing the singular vectors of the bidiagonal matrix BD (e.g. the eigenvector of the associated tridiagonal Golub-Kahan matrix);

INITVEC=false, a random uniform starting vector is used.

The default is to use a Fernando starting vector if the Golub-Kahan form of the input bidiagonal matrix is unreduced, and a random uniform starting vector otherwise.

Further Details¶

A first estimate of the singular vectors is computed by the Fernando method applied to the tridiagonal Golub-Kahan matrix associated with the bidiagonal matrix BD (see the reference (1) for details) if this Golub-Kahan form of the input bidiagonal matrix is unreduced. Otherwise, a random start is used as a first estimate of the singular vectors as in the standard inverse-iteration algorithm.

The singular vectors are then computed or refined using inverse iteration on the tridiagonal Golub-Kahan matrix.
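
The link between BD and its Golub-Kahan form can be sketched for the upper bidiagonal case. The 2n-by-2n tridiagonal matrix has a zero diagonal and off-diagonal entries D(1), E(2), D(2), ..., E(n), D(n); an eigenvector for the eigenvalue S interleaves the right and left singular vectors as (v(1), u(1), v(2), u(2), ...). The sketch below applies plain inverse iteration with a dense solver and a random uniform start. It does not compute a Fernando starting vector, it does not scale the matrix, and the near-zero pivot replacement is a simplification; all function names are hypothetical.

```python
import math
import random


def tgk_offdiag(d, e):
    """Off-diagonals of the Golub-Kahan form of an upper bidiagonal matrix.

    d[i] is BD(i,i) and e[i] is BD(i-1,i); e[0] is ignored, as in the text.
    """
    off = []
    for i in range(len(d)):
        if i > 0:
            off.append(e[i])
        off.append(d[i])
    return off


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    tiny = 2.2e-16 * (max(abs(x) for row in a for x in row) or 1.0)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < tiny:
            m[col][col] = tiny    # keep going when the shift is an eigenvalue
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / m[i][i]
    return x


def normalized(v):
    nrm = math.sqrt(sum(x * x for x in v))
    return [x / nrm for x in v]


def bd_singular_pair(d, e, s, maxiter=2, seed=0):
    """Left and right singular vectors of upper bidiagonal BD for the value s."""
    size = 2 * len(d)
    a = [[0.0] * size for _ in range(size)]
    for i in range(size):
        a[i][i] = -s
    for i, val in enumerate(tgk_offdiag(d, e)):
        a[i][i + 1] = a[i + 1][i] = val
    rng = random.Random(seed)
    z = normalized([rng.random() for _ in range(size)])
    for _ in range(maxiter):
        z = normalized(solve_dense(a, z))
    return normalized(z[1::2]), normalized(z[0::2])   # (left, right)
```

With a singular value accurate to working precision, one or two iterations are usually enough, which is consistent with the default of 2 inverse iterations stated above.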

For further details on Fernando’s method for computing eigenvectors of tridiagonal matrices or inverse iteration, see:

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra and its Appl., 267, pp. 247-279.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Bini, D.A., Gemignani, L., and Tisseur, F., 2005:

The Ehrlich-Aberth method for the nonsymmetric tridiagonal eigenvalue problem. SIAM J. Matrix Anal. Appl., 27, 153-175.

`subroutine bd_inviter ( upper, d, e, s, leftvec, rightvec, failure, maxiter, ortho, backward_sweep, scaling, initvec )`¶

Purpose¶

BD_INVITER computes the left and right singular vectors of a real n-by-n bidiagonal matrix BD corresponding to specified singular values, using Fernando’s method and inverse iteration on the tridiagonal Golub-Kahan (TGK) form of the bidiagonal matrix BD.

Arguments¶

UPPER (INPUT) logical(lgl)

On entry, if:

UPPER = true : BD is upper bidiagonal ;

UPPER = false : BD is lower bidiagonal.

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD.

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD. E(1) is arbitrary.

The size of E must verify: size( E ) = size( D ) = n .

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the bidiagonal matrix BD. The singular values must be given in decreasing order and must be positive or zero.

The size of S must verify: size( S ) <= size( D ) = n .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( D ) = n ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( D ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = FALSE : indicates successful exit,

FAILURE = TRUE : indicates that some singular vectors failed to converge in MAXITER iterations.

MAXITER (INPUT, OPTIONAL) integer(i4b)

The number of inverse iterations performed in the subroutine. By default, 2 inverse iterations are performed for all the singular vectors.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, all the singular vectors are orthogonalized by the Modified Gram-Schmidt or QR algorithm;

ORTHO=false, the singular vectors are not orthogonalized by the Modified Gram-Schmidt or QR algorithm.

The default is to orthogonalize only the singular vectors associated with singular values which are not well-separated.

BACKWARD_SWEEP (INPUT, OPTIONAL) logical(lgl)

On entry, if:

BACKWARD_SWEEP=true and the singular vectors are orthogonalized by the modified Gram-Schmidt algorithm, a backward sweep of the modified Gram-Schmidt algorithm is also performed;

BACKWARD_SWEEP=false, a backward sweep is not performed.

The default is not to perform a backward sweep of the modified Gram-Schmidt algorithm.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix BD is scaled before computing the singular vectors;

SCALING=false, the bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INITVEC (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INITVEC=true, Fernando vectors are used to start the inverse iteration process for computing the singular vectors of the bidiagonal matrix BD (e.g. the eigenvectors of the associated Golub-Kahan tridiagonal matrix);

INITVEC=false, random uniform starting vectors are used.

The default is to use Fernando starting vectors if the singular values are well-separated and the Golub-Kahan form of the input bidiagonal matrix is unreduced, and random uniform starting vectors otherwise.

Further Details¶

A first estimate of the singular vectors is computed by the Fernando method applied to the tridiagonal Golub-Kahan matrix associated with the bidiagonal matrix BD (see the reference (1) for details) for the singular values which are well-separated and if the Golub-Kahan form of the input bidiagonal matrix is unreduced. For the other singular values, a random start is used as a first estimate of the singular vectors as in the standard inverse-iteration algorithm.

The singular vectors are then computed or refined using inverse iteration on the tridiagonal Golub-Kahan matrix for all the singular values at one step.

By default, the singular vectors are then orthogonalized by the Modified Gram-Schmidt or QR algorithm only if the singular values are not well-separated.
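
As a reminder of what the orthogonalization step does, here is a minimal sketch of a forward pass of the Modified Gram-Schmidt algorithm on a list of vectors. It is a generic textbook version: it does not implement the backward sweep, the QR alternative, or the clustering logic that decides which vectors are orthogonalized, and the function name is hypothetical.

```python
import math


def modified_gram_schmidt(vectors):
    """Return an orthonormal list spanning the same space as `vectors`.

    Each new vector is orthogonalized against the already accepted ones,
    one projection at a time, using the updated vector at every step.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, w))
            w = [a - proj * b for a, b in zip(w, q)]
        nrm = math.sqrt(sum(a * a for a in w))
        if nrm == 0.0:
            raise ValueError("vectors are linearly dependent")
        basis.append([a / nrm for a in w])
    return basis
```

Subtracting each projection from the updated vector, rather than from the original one, is what distinguishes the modified variant from classical Gram-Schmidt and improves its numerical behaviour for nearly parallel vectors.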

The computation of the singular vectors is parallelized if OPENMP is used.

BD_INVITER may fail if clusters of tiny singular values are present in parameter S.

For further details on Fernando’s method for computing eigenvectors of tridiagonal matrices or inverse iteration, see:

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra and its Appl., 267, pp. 247-279.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

Bini, D.A., Gemignani, L., and Tisseur, F., 2005:

The Ehrlich-Aberth method for the nonsymmetric tridiagonal eigenvalue problem. SIAM J. Matrix Anal. Appl., 27, 153-175.

`subroutine bd_inviter2 ( mat, tauq, taup, d, e, s, leftvec, rightvec, failure, maxiter, ortho, backward_sweep, scaling, initvec )`¶

Purpose¶

BD_INVITER2 computes the left and right singular vectors of a full real m-by-n matrix MAT corresponding to specified singular values, using inverse iteration.

It is required that the original matrix MAT has been reduced to upper or lower bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal. This can be done with a call to BD_CMP with parameters TAUQ and TAUP, before calling BD_SVD, BD_SINGVAL or BD_SINGVAL2 subroutines for computing singular values and, finally, BD_INVITER2 for computing selected singular vectors.

Alternatively and more simply, the user can call SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines, with parameters TAUQ and TAUP present, to compute the bidiagonal reduction and singular values in one step and, finally, call BD_INVITER2 for computing all or selected singular vectors of MAT.

If m >= n, BD is upper bidiagonal and if m < n, BD is lower bidiagonal.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the original m-by-n matrix after reduction by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines. MAT must contain the vectors which define the elementary reflectors H(i) and G(i) whose products determine the matrices Q and P, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines when the arguments TAUQ and TAUP are present in the call to these subroutines. MAT must be specified as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 and is not modified by the routine.

TAUQ (INPUT) real(stnd), dimension(:)

TAUQ(i) must contain the scalar factor of the elementary reflector H(i) which determines Q, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 in their array argument TAUQ.

The size of TAUQ must verify: size( TAUQ ) = min( size(MAT,1) , size(MAT,2) ) .

TAUP (INPUT) real(stnd), dimension(:)

TAUP(i) must contain the scalar factor of the elementary reflector G(i), which determines P, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 in their array argument TAUP.

The size of TAUP must verify: size( TAUP ) = min( size(MAT,1) , size(MAT,2) ) .

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2.

The size of D must verify: size( D ) = min( size(MAT,1) , size(MAT,2) ) .

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2:

if m >= n, E(i) = BD(i-1,i) for i = 2,3,…,n;

if m < n, E(i) = BD(i,i-1) for i = 2,3,…,m.

E(1) is arbitrary.

The size of E must verify: size( E ) = min( size(MAT,1) , size(MAT,2) ) .

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the bidiagonal matrix BD. The singular values must be given in decreasing order and are assumed to be positive or zero.

The size of S must verify: size( S ) <= min( size(MAT,1) , size(MAT,2) ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = FALSE : indicates successful exit,

FAILURE = TRUE : indicates that some singular vectors of BD failed to converge in MAXITER iterations.

MAXITER (INPUT, OPTIONAL) integer(i4b)

The number of inverse iterations performed in the subroutine.

By default, 2 inverse iterations are performed for all the singular vectors.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, all the singular vectors of the bidiagonal matrix BD are orthogonalized by the Modified Gram-Schmidt or QR algorithm;

ORTHO=false, the singular vectors of the bidiagonal matrix BD are not orthogonalized by the Modified Gram-Schmidt or QR algorithm.

The default is to orthogonalize only the singular vectors associated with singular values which are not well-separated.

BACKWARD_SWEEP (INPUT, OPTIONAL) logical(lgl)

On entry, if:

BACKWARD_SWEEP=true and the singular vectors of the bidiagonal matrix BD are orthogonalized by the modified Gram-Schmidt algorithm, a backward sweep of the modified Gram-Schmidt algorithm is also performed;

BACKWARD_SWEEP=false, a backward sweep is not performed.

The default is not to perform a backward sweep of the modified Gram-Schmidt algorithm.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix BD is scaled before computing the singular vectors;

SCALING=false, the bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INITVEC (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INITVEC=true, Fernando vectors are used to start the inverse iteration process for computing the singular vectors of the bidiagonal matrix BD (e.g. the eigenvectors of the associated Golub-Kahan tridiagonal matrix);

INITVEC=false, random uniform starting vectors are used.

The default is to use Fernando starting vectors if the singular values are well-separated and the Golub-Kahan form of the input bidiagonal matrix is unreduced, and random uniform starting vectors otherwise.

Further Details¶

A first estimate of the singular vectors is computed by the Fernando method applied to the tridiagonal Golub-Kahan matrix associated with the bidiagonal matrix BD (see the reference (1) for details) for the singular values which are well-separated and if the Golub-Kahan form of the input bidiagonal matrix is unreduced. For the other singular values, a random start is used as a first estimate of the singular vectors as in the standard inverse-iteration algorithm.

The singular vectors of BD are then computed or refined using inverse iteration on the tridiagonal Golub-Kahan matrix for all the singular values at one step.

By default, the singular vectors of BD are then orthogonalized by the Modified Gram-Schmidt or QR algorithm only if the singular values are not well-separated.

The singular vectors of MAT are finally computed by a blocked back-transformation algorithm.

The computation of the singular vectors of BD and the blocked back-transformation algorithm to find the singular vectors of MAT are parallelized if OPENMP is used.

BD_INVITER2 may fail if some singular values specified in parameter S are nearly identical for some pathological matrices.

For further details on Fernando’s method for computing eigenvectors of tridiagonal matrices, the blocked back-transformation algorithm or inverse iteration, see:

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra and its Appl., 267, pp. 247-279.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

`subroutine bd_inviter2 ( mat, p, d, e, s, leftvec, rightvec, failure, maxiter, ortho, backward_sweep, scaling, initvec, tol_reortho )`¶

Purpose¶

BD_INVITER2 computes the left and right singular vectors of a full real m-by-n matrix MAT with m>=n corresponding to specified singular values, using inverse iteration.

It is required that the original matrix MAT has been reduced to upper bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal. This can be done with a call to BD_CMP2 (or a call to BD_CMP followed by a call to ORTHO_GEN_BD), before calling BD_SVD, BD_SINGVAL or BD_SINGVAL2 subroutines for computing singular values and BD_INVITER2 for computing selected singular vectors.

Alternatively and more simply, the user can call SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines, with parameter P present, to compute the bidiagonal reduction and singular values in one step and, finally, call BD_INVITER2 for computing all or selected singular vectors of MAT.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the m-by-n orthogonal matrix Q after reduction by BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 (or by BD_CMP and ORTHO_GEN_BD). MAT is not modified by the routine.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

P (INPUT) real(stnd), dimension(:,:)

On entry, the n-by-n orthogonal matrix P after reduction by BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 (or by BD_CMP and ORTHO_GEN_BD). If P has been computed by BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4, P can be stored in factored form or not. Both cases are handled by the subroutine. P is not modified by the routine.

The shape of P must verify: size( P, 1 ) = size( P, 2 ) = size( MAT, 2 ) = n .

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD as returned by BD_CMP, BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines.

The size of D must verify: size( D ) = size( MAT, 2 ) = n .

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD as returned by BD_CMP, BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines:

E(i) = BD(i-1,i) for i = 2,3,…,n;

E(1) is arbitrary.

The size of E must verify: size( E ) = size( MAT, 2 ) = n .

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the bidiagonal matrix BD. The singular values must be given in decreasing order and are assumed to be positive or zero.

The size of S must verify: size( S ) <= size( MAT, 2 ) = n .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = FALSE : indicates successful exit,

FAILURE = TRUE : indicates that some singular vectors of BD failed to converge in MAXITER iterations.

MAXITER (INPUT, OPTIONAL) integer(i4b)

The number of inverse iterations performed in the subroutine.

By default, 2 inverse iterations are performed for all the singular vectors.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, all the singular vectors of the bidiagonal matrix BD are orthogonalized by the Modified Gram-Schmidt or QR algorithm;

ORTHO=false, the singular vectors of the bidiagonal matrix BD are not orthogonalized by the Modified Gram-Schmidt or QR algorithm.

The default is to orthogonalize only the singular vectors associated with singular values which are not well-separated.

BACKWARD_SWEEP (INPUT, OPTIONAL) logical(lgl)

On entry, if:

BACKWARD_SWEEP=true and the singular vectors of the bidiagonal matrix BD are orthogonalized by the modified Gram-Schmidt algorithm, a backward sweep of the modified Gram-Schmidt algorithm is also performed;

BACKWARD_SWEEP=false, a backward sweep is not performed.

The default is not to perform a backward sweep of the modified Gram-Schmidt algorithm.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix BD is scaled before computing the singular vectors;

SCALING=false, the bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INITVEC (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INITVEC=true, Fernando vectors are used to start the inverse iteration process for computing the singular vectors of the bidiagonal matrix BD (e.g. the eigenvectors of the associated Golub-Kahan tridiagonal matrix);

INITVEC=false, random uniform starting vectors are used.

The default is to use Fernando starting vectors if the singular values are well-separated and the Golub-Kahan form of the input bidiagonal matrix is unreduced, and random uniform starting vectors otherwise.

TOL_REORTHO (INPUT, OPTIONAL) real(stnd)

On entry, TOL_REORTHO is used to determine if the left singular vectors stored in LEFTVEC must be reorthogonalized on exit in order to correct for the loss of orthogonality in the Ralha-Barlow one-sided bidiagonal reduction algorithm if MAT is nearly rank-deficient. If one of the singular values, S(i), verifies the condition

S(i) <= TOL_REORTHO * S(1)

all the computed left singular vectors are reorthogonalized with a QR factorization. If S(1) is the largest singular value of MAT and TOL_REORTHO is a small positive value of the order of the machine epsilon, this condition means that the rank of MAT is numerically less than size(S) and thus that MAT is a nearly singular matrix.

TOL_REORTHO must be greater than or equal to zero and less than or equal to one. If TOL_REORTHO = 0. is used, the left singular vectors are reorthogonalized only if some singular values are almost zero. On the other hand, if TOL_REORTHO = 1. is used, the left singular vectors are always reorthogonalized. If TOL_REORTHO is specified as less than zero or greater than one, the default value is used.

The default value is the value of the module parameter tol_reortho_def if size( S ) = n and tol_reortho_partial_def otherwise.
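
The decision rule described above can be summarized by a small predicate. This is only an illustration of the documented condition: the values of tol_reortho_def and tol_reortho_partial_def are not given here, so the fallback is passed in as the hypothetical argument `default_tol`, and the function name is also hypothetical.

```python
def needs_reorthogonalization(s, tol_reortho, default_tol):
    """Return True if some s[i] <= tol * s[0].

    s           : singular values in decreasing order, all >= 0
    tol_reortho : requested tolerance, or None if the argument is absent
    default_tol : stand-in for the module default (its value is an assumption)
    """
    tol = tol_reortho
    if tol is None or tol < 0.0 or tol > 1.0:
        tol = default_tol         # out-of-range values fall back to the default
    return any(x <= tol * s[0] for x in s)
```

With a tolerance of 1 the test is always satisfied because S(1) <= S(1), and with a tolerance of 0 it is satisfied only when a singular value vanishes, which matches the two limiting cases described for TOL_REORTHO.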

Further Details¶

A first estimate of the singular vectors is computed by the Fernando method applied to the tridiagonal Golub-Kahan matrix associated with the bidiagonal matrix BD (see the reference (1) for details) for the singular values which are well-separated and if the Golub-Kahan form of the input bidiagonal matrix is unreduced. For the other singular values, a random start is used as a first estimate of the singular vectors as in the standard inverse-iteration algorithm.

The singular vectors of BD are then computed or refined using inverse iteration on the tridiagonal Golub-Kahan matrix for all the singular values at one step.

By default, the singular vectors of BD are then orthogonalized by the Modified Gram-Schmidt or QR algorithm only if the singular values are not well-separated.

The singular vectors of MAT are finally computed by a blocked back-transformation algorithm.
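
Since Q’ * MAT * P = BD, a left singular vector u of BD gives the left singular vector Q * u of the original matrix, and a right singular vector v of BD gives P * v. The sketch below shows this back-transformation as two plain dense products, assuming Q and P are available explicitly. It is not the blocked, parallel algorithm used by the routine, it does not handle P stored in factored form, and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rotation(theta):
    """2-by-2 rotation, used here only to build a small orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def back_transform(q, p, u_bd, v_bd):
    """Singular vectors of the original matrix from those of BD.

    Columns of u_bd and v_bd are singular vectors of BD; the result holds
    Q * u_bd (left vectors) and P * v_bd (right vectors), column by column.
    """
    return matmul(q, u_bd), matmul(p, v_bd)
```

A quick check is to build a matrix as Q * BD * P’ from known factors and verify that it maps each returned right vector to the corresponding singular value times the returned left vector.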

The computation of the singular vectors of BD and the blocked back-transformation algorithm to find the singular vectors of MAT are parallelized if OPENMP is used.

BD_INVITER2 may fail if some singular values specified in parameter S are nearly identical for some pathological matrices.

For further details on Fernando’s method for computing eigenvectors of tridiagonal matrices, the blocked back-transformation algorithm or inverse iteration, see:

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra and its Appl., 267, pp. 247-279.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

`subroutine bd_inviter2 ( mat, tauq, taup, rlmat, d, e, s, leftvec, rightvec, failure, tauo, maxiter, ortho, backward_sweep, scaling, initvec )`¶

Purpose¶

BD_INVITER2 computes the left and right singular vectors of a full real m-by-n matrix MAT corresponding to specified singular values, using inverse iteration.

It is required that the original matrix MAT has been reduced to upper bidiagonal form BD by a two-step algorithm, as performed by the BD_CMP subroutine with parameters TAUQ, TAUP, RLMAT and, optionally, TAUO, or more simply by the SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines with the same arguments:

If m >= n, a QR factorization of the real m-by-n matrix MAT is first computed

MAT = O * R

where O is orthogonal and R is upper triangular. In a second step, the n-by-n upper triangular matrix R is reduced to upper bidiagonal form BD by an orthogonal transformation:

Q’ * R * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

If m < n, an LQ factorization of the real m-by-n matrix MAT is first computed

MAT = L * O

where O is orthogonal and L is lower triangular. In a second step, the m-by-m lower triangular matrix L is reduced to upper bidiagonal form BD by an orthogonal transformation:

Q’ * L * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

After this call to BD_CMP with parameters TAUQ, TAUP, RLMAT and, optionally, TAUO, the user can call the BD_SVD, BD_SINGVAL or BD_SINGVAL2 subroutines to compute the singular values of BD and, finally, BD_INVITER2 to compute all or selected singular vectors of MAT.

Alternatively and more simply, the user can call the SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines, with parameters TAUQ, TAUP, RLMAT and, optionally, TAUO, to perform the two-stage bidiagonal reduction and get the singular values in one step and, finally, call BD_INVITER2 to compute all or selected singular vectors of MAT.
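
The way the factors of the two-step reduction combine into singular vectors of MAT can be written out explicitly. In the derivation below, the SVD of the bidiagonal matrix is written BD = U_BD Σ V_BD'; these symbols are notation introduced here for illustration and are not names used by the library. O denotes the m-by-n factor with orthonormal columns when m >= n, and the m-by-n factor with orthonormal rows when m < n.

```latex
\begin{aligned}
m \ge n:\quad & \mathrm{MAT} = O\,R = O\,Q\,\mathrm{BD}\,P^{T}
  = (O\,Q\,U_{BD})\,\Sigma\,(P\,V_{BD})^{T},\\
m < n:\quad & \mathrm{MAT} = L\,O = Q\,\mathrm{BD}\,P^{T}\,O
  = (Q\,U_{BD})\,\Sigma\,(O^{T}\,P\,V_{BD})^{T}.
\end{aligned}
```

In both cases the singular values of MAT are those of BD, and the singular vectors of MAT are obtained from those of BD by applying Q, P and O, which is the role of the back-transformation step.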

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the original m-by-n matrix after a two-stage bidiagonal reduction by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines (e.g. when the matrix argument RLMAT is present in the call to these subroutines). The precise content of MAT is determined by the presence or absence of the optional argument TAUO:

if the optional argument TAUO is absent, it is assumed that the first n columns (if m>=n) or the first m rows (if m<n) of the orthogonal matrix O are stored explicitly in the argument MAT on entry;

if the optional argument TAUO is present, it is assumed that the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on entry. In this case, MAT must contain the vectors which define the elementary reflectors W(i) whose product determines the matrix O, as returned by the BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines when the matrix argument RLMAT is present in the call to these subroutines.

In both cases, MAT must be specified as returned by BD_CMP and is not modified by the routine.

TAUQ (INPUT) real(stnd), dimension(:)

TAUQ(i) must contain the scalar factor of the elementary reflector H(i) which determines Q, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 in their array argument TAUQ.

The size of TAUQ must verify: size( TAUQ ) = min( size(MAT,1) , size(MAT,2) ) .

TAUP (INPUT) real(stnd), dimension(:)

TAUP(i) must contain the scalar factor of the elementary reflector G(i), which determines P, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 in their array argument TAUP.

The size of TAUP must verify: size( TAUP ) = min( size(MAT,1) , size(MAT,2) ) .

RLMAT (INPUT) real(stnd), dimension(:,:)

On entry, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors. RLMAT must be specified as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 and is not modified by the routine.

The shape of RLMAT must verify: size( RLMAT, 1 ) = size( RLMAT, 2 ) = min( size(MAT,1) , size(MAT,2) ) .

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the upper bidiagonal matrix BD as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2.

The size of D must verify: size( D ) = min( size(MAT,1) , size(MAT,2) ) .

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the upper bidiagonal matrix BD as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2:

E(i) = BD(i-1,i) for i = 2,3,…,min(m,n);

E(1) is arbitrary.

The size of E must verify: size( E ) = min( size(MAT,1) , size(MAT,2) ) .
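The storage convention for D and E can be illustrated with a short sketch. The Python function below is hypothetical (it is not part of statpack) and uses 0-based indices, so the Fortran relation E(i) = BD(i-1,i) for i = 2,…,k becomes e[i] = bd[i-1][i] for i = 1,…,k-1, and e[0] plays the role of the arbitrary E(1).

```python
def bidiagonal_from_d_e(d, e):
    """Build a dense upper bidiagonal matrix from D and E arrays.

    Hypothetical helper for illustration only. Indices are 0-based:
    d[i] = bd[i][i], and e[i] = bd[i-1][i] for i >= 1.
    e[0] corresponds to the arbitrary E(1) and is ignored.
    """
    k = len(d)
    if len(e) != k:
        raise ValueError("D and E must have the same size")
    bd = [[0.0] * k for _ in range(k)]
    for i in range(k):
        bd[i][i] = d[i]
        if i > 0:
            bd[i - 1][i] = e[i]
    return bd


# The first entry of e (99.0) is never used.
bd = bidiagonal_from_d_e([4.0, 3.0, 2.0], [99.0, 0.5, 0.25])
```

Keeping E the same size as D, with an unused first entry, lets both arrays be indexed by the same column number.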

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the upper bidiagonal matrix BD. The singular values must be given in decreasing order and are assumed to be positive or zero.

The size of S must verify: size( S ) <= min( size(MAT,1) , size(MAT,2) ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = FALSE : indicates successful exit ,

FAILURE = TRUE : indicates that some singular vectors of BD failed to converge in MAXITER iterations .

TAUO (INPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors W(i), which represent the orthogonal matrix O of the QR or LQ decomposition of MAT.

If the optional argument TAUO is present, it is assumed that the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on entry.

If the optional argument TAUO is absent, it is assumed that the orthogonal matrix O is stored explicitly in the argument MAT on entry.

If the optional argument TAUO has been specified in the initial call to the BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines, this optional argument TAUO must also be specified in the call to BD_INVITER2 for computing singular vectors, otherwise the results will be incorrect.

See description of the argument MAT in the description of the BD_CMP subroutine, when the argument RLMAT is also present, for further details.

The size of TAUO must be min( size(MAT,1) , size(MAT,2) ).

MAXITER (INPUT, OPTIONAL) integer(i4b)

The number of inverse iterations performed in the subroutine.

By default, 2 inverse iterations are performed for all the singular vectors.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, all the singular vectors of the bidiagonal matrix BD are orthogonalized by the Modified Gram-Schmidt or QR algorithm;

ORTHO=false, the singular vectors of the bidiagonal matrix BD are not orthogonalized by the Modified Gram-Schmidt or QR algorithm.

The default is to orthogonalize the singular vectors only for the singular values that are not well-separated.

BACKWARD_SWEEP (INPUT, OPTIONAL) logical(lgl)

On entry, if:

BACKWARD_SWEEP=true and the singular vectors of the bidiagonal matrix BD are orthogonalized by the modified Gram-Schmidt algorithm, a backward sweep of the modified Gram-Schmidt algorithm is also performed;

BACKWARD_SWEEP=false, a backward sweep is not performed.

The default is not to perform a backward sweep of the modified Gram-Schmidt algorithm.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix BD is scaled before computing the singular vectors;

SCALING=false, the bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INITVEC (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INITVEC=true, Fernando vectors are used to start the inverse iteration process for computing the singular vectors of the bidiagonal matrix BD (i.e. the eigenvectors of the associated Golub-Kahan tridiagonal matrix);

INITVEC=false, random uniform starting vectors are used.

The default is to use Fernando starting vectors if the singular values are well-separated and the Golub-Kahan form of the input bidiagonal matrix is unreduced, and random uniform starting vectors otherwise.

Further Details¶

For the singular values that are well-separated, and if the Golub-Kahan form of the input bidiagonal matrix is unreduced, a first estimate of the singular vectors is computed by the Fernando method applied to the tridiagonal Golub-Kahan matrix associated with the bidiagonal matrix BD (see reference (1) below for details). For the other singular values, a random start is used as a first estimate of the singular vectors, as in the standard inverse-iteration algorithm.

The singular vectors of BD are then computed or refined using inverse iteration on the tridiagonal Golub-Kahan matrix for all the singular values at one step.
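To make the role of the Golub-Kahan matrix concrete, here is a minimal pure-Python sketch of inverse iteration on it. It is not the statpack implementation: all function names are hypothetical, a dense solver replaces the tridiagonal one, a fixed starting vector replaces the Fernando or random start, and no orthogonalization is done. It assumes a nonzero singular value and the D/E storage convention described for the arguments above (0-based here).

```python
import math


def golub_kahan_tridiagonal(d, e):
    """Dense 2k-by-2k Golub-Kahan matrix of an upper bidiagonal BD.

    d[i] = BD[i][i]; e[i] = BD[i-1][i] for i >= 1 (e[0] is ignored).
    The diagonal is zero; the off-diagonal is d[0], e[1], d[1], e[2], ...
    """
    k = len(d)
    off = []
    for i in range(k):
        off.append(d[i])
        if i + 1 < k:
            off.append(e[i + 1])
    t = [[0.0] * (2 * k) for _ in range(2 * k)]
    for j, x in enumerate(off):
        t[j][j + 1] = t[j + 1][j] = x
    return t


def solve(a, b, tiny=1e-15):
    """Solve a*x = b by Gaussian elimination with partial pivoting.

    Pivots smaller than `tiny` are replaced by `tiny`, so a (nearly)
    singular shifted matrix still yields a usable direction.
    """
    n = len(b)
    a = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        if abs(a[c][c]) < tiny:
            a[c][c] = tiny
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n + 1):
                a[r][j] -= f * a[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        acc = a[r][n] - sum(a[r][j] * x[j] for j in range(r + 1, n))
        x[r] = acc / a[r][r]
    return x


def inverse_iteration(d, e, sigma, maxiter=2):
    """Return (u, v): unit left and right singular vectors of BD for sigma."""
    t = golub_kahan_tridiagonal(d, e)
    n2 = len(t)
    for i in range(n2):
        t[i][i] -= sigma
    z = [1.0] * n2  # fixed start, chosen only to keep the sketch deterministic
    for _ in range(maxiter):
        z = solve(t, z)
        norm = math.sqrt(sum(x * x for x in z))
        z = [x / norm for x in z]
    # Even entries of z hold the right vector, odd entries the left vector.
    v, u = z[0::2], z[1::2]
    nv = math.sqrt(sum(x * x for x in v))
    nu = math.sqrt(sum(x * x for x in u))
    return [x / nu for x in u], [x / nv for x in v]


# BD = [[1, 1], [0, 1]]; its largest singular value is the golden ratio.
sigma = (1.0 + math.sqrt(5.0)) / 2.0
u, v = inverse_iteration([1.0, 1.0], [0.0, 1.0], sigma)
```

Because the eigenvector of the Golub-Kahan matrix interleaves the entries of the right and left singular vectors, a single inverse iteration delivers both vectors at once.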

By default, the singular vectors of BD are then orthogonalized by the Modified Gram-Schmidt or QR algorithm only if the singular values are not well-separated.

The singular vectors of MAT are finally computed by a blocked back-transformation algorithm.

The computation of the singular vectors of BD and the blocked back-transformation algorithm to find the singular vectors of MAT are parallelized if OPENMP is used.

For some pathological matrices, BD_INVITER2 may fail if some of the singular values specified in the argument S are nearly identical.

For further details on the Fernando method for computing eigenvectors of tridiagonal matrices, the blocked back-transformation algorithm or inverse iteration, see:

(1) Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

(2) Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra and its Appl., Vol. 267, pp. 247-279.

(3) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

`subroutine bd_inviter2 ( mat, rmat, p, d, e, s, leftvec, rightvec, failure, tauo, maxiter, ortho, backward_sweep, scaling, initvec, tol_reortho )`¶

Purpose¶

BD_INVITER2 computes all or selected left and right singular vectors of a full real m-by-n matrix MAT with m>=n corresponding to specified singular values, using inverse iteration.

It is required that the original matrix MAT has been reduced to upper bidiagonal form BD by a two-step algorithm, as performed by the SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines with parameters P, RMAT and, optionally, TAUO:

A QR factorization of the real m-by-n matrix MAT is first computed

MAT = O * R

where O is orthogonal and R is upper triangular. In a second step, the n-by-n upper triangular matrix R is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * R * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix. The SELECT_SINGVAL_CMP3 and SELECT_SINGVAL_CMP4 subroutines compute O, Q, P, BD and also all or some of the singular values of R, which are also the singular values of MAT. Using this two-step factorization, BD_INVITER2 computes all or selected left and right singular vectors of R and applies a back-transformation algorithm to them to obtain the corresponding left and right singular vectors of MAT.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the original m-by-n matrix after the two-step bidiagonal reduction by the SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines with arguments P, RMAT and, optionally, TAUO. The precise content of MAT is determined by the presence or absence of the optional argument TAUO:

if the optional argument TAUO is absent, it is assumed that the first n columns of the orthogonal matrix O are stored explicitly in the argument MAT on entry;

if the optional argument TAUO is present, it is assumed that the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on entry. In this case, MAT must contain the vectors which define the elementary reflectors W(i) whose product determines the matrix O, as returned by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines when the matrix argument RLMAT is present in the call to these subroutines.

In both cases, MAT must be specified as returned by these subroutines and is not modified by the routine. See the description of the argument TAUO below for further details.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

RMAT (INPUT) real(stnd), dimension(:,:)

On entry, the n-by-n orthogonal matrix Q after reduction by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines. RMAT must be specified as returned by these subroutines and is not modified by the routine.

The shape of RMAT must verify: size( RMAT, 1 ) = size( RMAT, 2 ) = size( MAT, 2 ) = n.

P (INPUT) real(stnd), dimension(:,:)

On entry, the n-by-n orthogonal matrix P after reduction by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines. P can be stored in factored form or not. Both cases are handled by the subroutine and P is not modified by the routine.

The shape of P must verify: size( P, 1 ) = size( P, 2 ) = size( MAT, 2 ) = n .

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD as returned by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4.

The size of D must verify: size( D ) = size( MAT, 2 ) = n .

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD as returned by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4:

E(i) = BD(i-1,i) for i = 2,3,…,n;

E(1) is arbitrary.

The size of E must verify: size( E ) = size( MAT, 2 ) = n .

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the bidiagonal matrix BD. The singular values must be given in decreasing order and are assumed to be positive or zero.

The size of S must verify: size( S ) <= size( MAT, 2 ) = n .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = FALSE : indicates successful exit,

FAILURE = TRUE : indicates that some singular vectors of BD failed to converge in MAXITER iterations.

TAUO (INPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors W(i), which represent the orthogonal matrix O of the QR decomposition of MAT as returned by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4.

If the optional argument TAUO is present, it is assumed that the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on entry.

If the optional argument TAUO is absent, it is assumed that the orthogonal matrix O is stored explicitly in the argument MAT on entry.

If the optional argument TAUO has been specified in the initial call to the SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines, this optional argument TAUO must also be specified in the call to BD_INVITER2, otherwise the results will be incorrect.

See description of the argument MAT in the description of the SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines, when the argument RMAT is also present, for further details.

The size of TAUO must be size(MAT,2) = n .

MAXITER (INPUT, OPTIONAL) integer(i4b)

The number of inverse iterations performed in the subroutine.

By default, 2 inverse iterations are performed for all the singular vectors.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, all the singular vectors of the bidiagonal matrix BD are orthogonalized by the Modified Gram-Schmidt or QR algorithm;

ORTHO=false, the singular vectors of the bidiagonal matrix BD are not orthogonalized by the Modified Gram-Schmidt or QR algorithm.

The default is to orthogonalize the singular vectors only for the singular values that are not well-separated.

BACKWARD_SWEEP (INPUT, OPTIONAL) logical(lgl)

On entry, if:

BACKWARD_SWEEP=true and the singular vectors of the bidiagonal matrix BD are orthogonalized by the modified Gram-Schmidt algorithm, a backward sweep of the modified Gram-Schmidt algorithm is also performed;

BACKWARD_SWEEP=false, a backward sweep is not performed.

The default is not to perform a backward sweep of the modified Gram-Schmidt algorithm.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix BD is scaled before computing the singular vectors;

SCALING=false, the bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INITVEC (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INITVEC=true, Fernando vectors are used to start the inverse iteration process for computing the singular vectors of the bidiagonal matrix BD (i.e. the eigenvectors of the associated Golub-Kahan tridiagonal matrix);

INITVEC=false, random uniform starting vectors are used.

The default is to use Fernando starting vectors if the singular values are well-separated and the Golub-Kahan form of the input bidiagonal matrix is unreduced, and random uniform starting vectors otherwise.

TOL_REORTHO (INPUT, OPTIONAL) real(stnd)

On entry, TOL_REORTHO is used to determine whether the left singular vectors stored in LEFTVEC must be reorthogonalized on exit, in order to correct for the loss of orthogonality in the Ralha-Barlow one-sided bidiagonal reduction algorithm when MAT is nearly rank-deficient. If one of the singular values, S(i), verifies the condition

S(i) <= TOL_REORTHO * S(1)

all the computed left singular vectors are reorthogonalized with a QR factorization. If S(1) is the largest singular value of MAT and TOL_REORTHO is a small positive value of the order of the machine epsilon, this condition indicates that the numerical rank of MAT is less than size(S), i.e. that MAT is nearly singular.

TOL_REORTHO must be greater than or equal to zero and less than or equal to one. If TOL_REORTHO = 0. is used, the left singular vectors are reorthogonalized only if some singular values are almost zero. On the other hand, if TOL_REORTHO = 1. is used, the left singular vectors are always reorthogonalized. If TOL_REORTHO is specified as less than zero or greater than one, the default value is used.

The default value is the value of the module parameter tol_reortho_def if size( S ) = n and tol_reortho_partial_def otherwise.
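The reorthogonalization test described above can be sketched as follows. The function name is hypothetical and the code only mirrors the stated condition S(i) <= TOL_REORTHO * S(1). Unlike the library, which falls back to its default value, this sketch rejects an out-of-range tolerance, and it treats "almost zero" as exactly zero when the tolerance is 0.

```python
def needs_reorthogonalization(s, tol_reortho):
    """Return True if any S(i) <= TOL_REORTHO * S(1).

    `s` holds the selected singular values in decreasing order, as
    required for the argument S. Hypothetical helper for illustration.
    """
    if not 0.0 <= tol_reortho <= 1.0:
        raise ValueError("tol_reortho must lie in [0, 1]")
    return any(x <= tol_reortho * s[0] for x in s)


nearly_singular = needs_reorthogonalization([3.0, 1.0, 1e-18], 1e-16)
well_conditioned = needs_reorthogonalization([3.0, 1.0, 0.5], 1e-16)
always = needs_reorthogonalization([3.0, 1.0, 0.5], 1.0)
```

With a tolerance of one the condition already holds for S(1) itself, which is why the vectors are then always reorthogonalized.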

Further Details¶

For the singular values that are well-separated, and if the Golub-Kahan form of the input bidiagonal matrix is unreduced, a first estimate of the singular vectors is computed by the Fernando method applied to the tridiagonal Golub-Kahan matrix associated with the bidiagonal matrix BD (see reference (1) below for details). For the other singular values, a random start is used as a first estimate of the singular vectors, as in the standard inverse-iteration algorithm.

The singular vectors of BD are then computed or refined using inverse iteration on the tridiagonal Golub-Kahan matrix for all the singular values at one step.

By default, the singular vectors of BD are then orthogonalized by the Modified Gram-Schmidt or QR algorithm only if the singular values are not well-separated.

The singular vectors of MAT are finally computed by a blocked back-transformation algorithm.

The computation of the singular vectors of BD and the blocked back-transformation algorithm to find the singular vectors of MAT are parallelized if OPENMP is used.

For some pathological matrices, BD_INVITER2 may fail if some of the singular values specified in the argument S are nearly identical.

For further details on the Fernando method for computing eigenvectors of tridiagonal matrices, the blocked back-transformation algorithm or inverse iteration, see:

(1) Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

(2) Parlett, B.N., and Dhillon, I.S., 1997:

Fernando’s solution to Wilkinson’s problem: An application of double factorization. Linear Algebra and its Appl., Vol. 267, pp. 247-279.

(3) Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

`subroutine upper_bd_dsqd2 ( q2, e2, shift, flip, d )`¶

Purpose¶

UPPER_BD_DSQD2 computes:

the L * D * L’ factorization of the matrix BD’ * BD - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD * BD’ - shift * I , if FLIP=true;

for an n-by-n upper bidiagonal matrix BD and a given shift. L and U are unit lower and unit upper bidiagonal matrices, respectively, and D is a diagonal matrix.

The differential form of the stationary QD algorithm of Rutishauser is used to compute the factorization from the squared elements of the bidiagonal matrix BD (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization.
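As an illustration of the factorization described above, the following pure-Python sketch gives one common formulation of the differential stationary qd recurrence on squared elements. It is not the statpack source: the function name is hypothetical, the FLIP=true case is obtained here by running the same recurrence on the reversed arrays (an assumption that follows from the structure of BD * BD’), and no safeguard against a zero pivot is included.

```python
def dstqds_sq(q2, e2, shift, flip=False):
    """Pivots D of the shifted factorization, from squared entries of BD.

    q2[i] = BD[i][i]**2 and e2[i] = BD[i][i+1]**2 (0-based indices).
    flip=False: pivots of L*D*L' = BD'*BD - shift*I (top-down recurrence).
    flip=True : pivots of U*D*U' = BD*BD' - shift*I, computed here by
                applying the same recurrence to the reversed arrays.
    """
    if flip:
        return dstqds_sq(q2[::-1], e2[::-1], shift)[::-1]
    n = len(q2)
    d = [0.0] * n
    t = -shift  # auxiliary variable of the differential form
    for i in range(n):
        d[i] = q2[i] + t
        if i + 1 < n:
            t = e2[i] * (t / d[i]) - shift
    return d


# BD = [[2, 1], [0, 3]]: BD'*BD - I = [[3, 2], [2, 9]]
d_lower = dstqds_sq([4.0, 9.0], [1.0], 1.0)
# BD*BD' - I = [[4, 3], [3, 8]], factored from the bottom up
d_upper = dstqds_sq([4.0, 9.0], [1.0], 1.0, flip=True)
```

For this 2-by-2 example the pivots can be checked by hand: 3 and 9 - 4/3 for the L * D * L’ case, 4 - 9/8 and 8 for the U * D * U’ case.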

Arguments¶

Q2 (INPUT) real(stnd), dimension(:)

On entry, Q2 contains the squared diagonal elements of the bidiagonal matrix BD.

E2 (INPUT) real(stnd), dimension(:)

On entry, the n-1 squared off-diagonal elements of the bidiagonal matrix BD.

The size of E2 must be size( E2 ) = size( Q2 ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD’ * BD - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD * BD’ - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( Q2 ).

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, No. 2, pp. 373-399.

`subroutine upper_bd_dpqd2 ( q2, e2, shift, flip, d )`¶

Purpose¶

UPPER_BD_DPQD2 computes:

the L * D * L’ factorization of the matrix BD * BD’ - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD’ * BD - shift * I , if FLIP=true;

for an n-by-n upper bidiagonal matrix BD and a given shift. L and U are unit lower and unit upper bidiagonal matrices, respectively, and D is a diagonal matrix.

The differential form of the progressive QD algorithm of Rutishauser is used to compute the factorization from the squared elements of the bidiagonal matrix BD (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization.
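The progressive variant can be sketched in the same spirit. The code below is one common formulation of the differential progressive qd recurrence on squared elements, not the statpack source: the function name and the variable `aux` are hypothetical, the FLIP=true case is again obtained by reversing the arrays (an assumption), and a zero pivot is not guarded against.

```python
def dpqds_sq(q2, e2, shift, flip=False):
    """Pivots D of the shifted factorization, from squared entries of BD.

    q2[i] = BD[i][i]**2 and e2[i] = BD[i][i+1]**2 (0-based indices).
    flip=False: pivots of L*D*L' = BD*BD' - shift*I (top-down recurrence).
    flip=True : pivots of U*D*U' = BD'*BD - shift*I, computed here by
                applying the same recurrence to the reversed arrays.
    """
    if flip:
        return dpqds_sq(q2[::-1], e2[::-1], shift)[::-1]
    n = len(q2)
    d = [0.0] * n
    aux = q2[0] - shift  # auxiliary variable of the differential form
    for i in range(n - 1):
        d[i] = aux + e2[i]
        aux = q2[i + 1] * (aux / d[i]) - shift
    d[n - 1] = aux
    return d


# BD = [[2, 1], [0, 3]]: BD*BD' - I = [[4, 3], [3, 8]]
d_lower = dpqds_sq([4.0, 9.0], [1.0], 1.0)
# BD'*BD - I = [[3, 2], [2, 9]], factored from the bottom up
d_upper = dpqds_sq([4.0, 9.0], [1.0], 1.0, flip=True)
```

By hand, the pivots are 4 and 8 - 9/4 for the L * D * L’ case, and 3 - 4/9 and 9 for the U * D * U’ case.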

Arguments¶

Q2 (INPUT) real(stnd), dimension(:)

On entry, Q2 contains the squared diagonal elements of the bidiagonal matrix BD.

E2 (INPUT) real(stnd), dimension(:)

On entry, the n-1 squared off-diagonal elements of the bidiagonal matrix BD.

The size of E2 must be size( E2 ) = size( Q2 ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD * BD’ - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD’ * BD - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( Q2 ).

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, No. 2, pp. 373-399.

`subroutine upper_bd_dsqd2 ( q2, e2, shift, flip, d, t )`¶

Purpose¶

UPPER_BD_DSQD2 computes:

the L * D * L’ factorization of the matrix BD’ * BD - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD * BD’ - shift * I , if FLIP=true;

for an n-by-n upper bidiagonal matrix BD and a given shift. L and U are unit lower and unit upper bidiagonal matrices, respectively, and D is a diagonal matrix.

The differential form of the stationary QD algorithm of Rutishauser is used to compute the factorization from the squared elements of the bidiagonal matrix BD (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization and the auxiliary variable T in the differential form of the stationary QD algorithm.

Arguments¶

Q2 (INPUT) real(stnd), dimension(:)

On entry, Q2 contains the squared diagonal elements of the bidiagonal matrix BD.

E2 (INPUT) real(stnd), dimension(:)

On entry, the n-1 squared off-diagonal elements of the bidiagonal matrix BD.

The size of E2 must be size( E2 ) = size( Q2 ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD’ * BD - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD * BD’ - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( Q2 ).

T (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the auxiliary values T(i) in the differential form of the stationary QD algorithm.

The size of T must be size( T ) = size( D ) = size( Q2 ).

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, No. 2, pp. 373-399.

`subroutine upper_bd_dpqd2 ( q2, e2, shift, flip, d, s )`¶

Purpose¶

UPPER_BD_DPQD2 computes:

the L * D * L’ factorization of the matrix BD * BD’ - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD’ * BD - shift * I , if FLIP=true;

for an n-by-n upper bidiagonal matrix BD and a given shift. L and U are unit lower and unit upper bidiagonal matrices, respectively, and D is a diagonal matrix.

The differential form of the progressive QD algorithm of Rutishauser is used to compute the factorization from the squared elements of the bidiagonal matrix BD (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization and the auxiliary variable S in the differential form of the progressive QD algorithm.

Arguments¶

Q2 (INPUT) real(stnd), dimension(:)

On entry, Q2 contains the squared diagonal elements of the bidiagonal matrix BD.

E2 (INPUT) real(stnd), dimension(:)

On entry, the n-1 squared off-diagonal elements of the bidiagonal matrix BD.

The size of E2 must be size( E2 ) = size( Q2 ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD * BD’ - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD’ * BD - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( Q2 ).

S (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the auxiliary values S(i) in the differential form of the progressive QD algorithm.

The size of S must be size( S ) = size( D ) = size( Q2 ).

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, No. 2, pp. 373-399.

`subroutine upper_bd_dsqd ( a, b, shift, flip, d )`¶

Purpose¶

UPPER_BD_DSQD computes:

the L * D * L’ factorization of the matrix BD’ * BD - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD * BD’ - shift * I , if FLIP=true;

for an n-by-n upper bidiagonal matrix BD and a given shift. L and U are unit lower and unit upper bidiagonal matrices, respectively, and D is a diagonal matrix.

The differential form of the stationary QD algorithm of Rutishauser is used to compute the factorization (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization.
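One typical use of the diagonal D is to count singular values. By Sylvester's law of inertia, the number of negative pivots of BD’ * BD - shift * I equals the number of singular values of BD whose square is below the shift. The sketch below assumes this use and is not taken from statpack: the function name is hypothetical, the recurrence is one common formulation of the differential stationary qd algorithm on the unsquared elements, and a shift that produces an exactly zero pivot is not handled.

```python
def count_singular_values_below(a, b, x):
    """Number of singular values of BD that are smaller than x (x > 0).

    a[i] = BD[i][i] and b[i] = BD[i][i+1] (0-based, len(b) = len(a) - 1).
    Counts the negative pivots of L*D*L' = BD'*BD - x**2 * I.
    No safeguard against a zero pivot is included.
    """
    shift = x * x
    t = -shift
    count = 0
    for i in range(len(a)):
        d = a[i] * a[i] + t
        if d < 0.0:
            count += 1
        if i + 1 < len(a):
            t = b[i] * b[i] * (t / d) - shift
    return count


# BD = [[1, 1], [0, 1]] has singular values of about 0.618 and 1.618.
counts = [count_singular_values_below([1.0, 1.0], [1.0], x)
          for x in (0.5, 1.2, 2.0)]
```

Repeating such counts for different values of x is the basis of bisection methods for locating individual singular values.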

Arguments¶

A (INPUT) real(stnd), dimension(:)

On entry, A contains the diagonal elements of the bidiagonal matrix BD.

B (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the bidiagonal matrix BD.

The size of B must be size( B ) = size( A ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD’ * BD - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD * BD’ - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( A ).

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).
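The scaling requirement can be illustrated with a small sketch. The scaling scheme used inside statpack is not described in this section, so dividing by the largest absolute entry is only an assumption, and both function names below are hypothetical. If BD is divided by c, the shift must be divided by c**2, and the pivots of the scaled problem are the original pivots divided by c**2, so their signs are unchanged.

```python
def scale_bidiagonal(a, b, shift):
    """Return (a/c, b/c, shift/c**2, c) with c = largest |entry| of BD.

    Hypothetical helper: max-abs scaling is an assumption, not the
    scheme documented for statpack.
    """
    c = max(abs(x) for x in list(a) + list(b))
    if c == 0.0:
        return list(a), list(b), shift, 1.0
    # Divide twice instead of forming c*c, which could itself overflow.
    return [x / c for x in a], [x / c for x in b], (shift / c) / c, c


def ldl_pivots(a, b, shift):
    """Pivots of L*D*L' = BD'*BD - shift*I (differential stationary qd)."""
    d, t = [], -shift
    for i in range(len(a)):
        d.append(a[i] * a[i] + t)
        if i + 1 < len(a):
            t = b[i] * b[i] * (t / d[i]) - shift
    return d


a, b, shift = [2.0, 3.0], [1.0], 1.0
a_s, b_s, shift_s, c = scale_bidiagonal(a, b, shift)
d_scaled = ldl_pivots(a_s, b_s, shift_s)
d_rescaled = [x * c * c for x in d_scaled]  # back to the original scale
```

After scaling, every entry has magnitude at most one, so squaring the elements cannot overflow even when the original entries are very large.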

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, No. 2, pp. 373-399.

`subroutine upper_bd_dpqd ( a, b, shift, flip, d )`¶

Purpose¶

UPPER_BD_DPQD computes:

the L * D * L’ factorization of the matrix BD * BD’ - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD’ * BD - shift * I , if FLIP=true;

for an n-by-n upper bidiagonal matrix BD and a given shift. L and U are unit lower and unit upper bidiagonal matrices, respectively, and D is a diagonal matrix.

The differential form of the progressive QD algorithm of Rutishauser is used to compute the factorization (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization.

Arguments¶

A (INPUT) real(stnd), dimension(:)

On entry, A contains the diagonal elements of the bidiagonal matrix BD.

B (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the bidiagonal matrix BD.

The size of B must be size( B ) = size( A ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD * BD’ - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD’ * BD - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( A ).

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, No. 2, pp. 373-399.

`subroutine upper_bd_dsqd ( a, b, shift, flip, d, t )`¶

Purpose¶

UPPER_BD_DSQD computes:

the L * D * L’ factorization of the matrix BD’ * BD - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD * BD’ - shift * I , if FLIP=true;

for an n-by-n upper bidiagonal matrix BD and a given shift. L and U are unit lower and unit upper bidiagonal matrices, respectively, and D is a diagonal matrix.

The differential form of the stationary QD algorithm of Rutishauser is used to compute the factorization (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization and the auxiliary variable T in the differential form of the stationary QD algorithm.

Arguments¶

A (INPUT) real(stnd), dimension(:)

On entry, A contains the diagonal elements of the bidiagonal matrix BD.

B (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the bidiagonal matrix BD.

The size of B must be size( B ) = size( A ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD’ * BD - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD * BD’ - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( A ).

T (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the auxiliary values T(i) in the differential form of the stationary QD algorithm.

The size of T must be size( T ) = size( D ) = size( A ).

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

`subroutine upper_bd_dpqd ( a, b, shift, flip, d, s )`¶

Purpose¶

UPPER_BD_DPQD computes:

the L * D * L’ factorization of the matrix BD * BD’ - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD’ * BD - shift * I , if FLIP=true;

for an n-by-n (upper) bidiagonal matrix BD and a given shift. L and U are, respectively, unit lower and unit upper bidiagonal matrices and D is a diagonal matrix.

The differential form of the progressive QD algorithm of Rutishauser is used to compute the factorization (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization and the auxiliary variable S in the differential form of the progressive QD algorithm.
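For the FLIP=false case, the progressive recurrence can be sketched as follows. This is a minimal Python sketch, not the statpack code: the name `dpqd_sketch` is hypothetical, the auxiliary variable is assumed to be s(i) = d(i) - b(i)^2 (with b(n) taken as zero), and no scaling or zero-pivot protection is attempted.

```python
def dpqd_sketch(a, b, shift):
    """L*D*L' factorization of B*B' - shift*I for an upper bidiagonal B.

    a: diagonal of B (length n); b: superdiagonal of B (length n-1).
    Returns the pivots d, the auxiliary values s and the multipliers l.
    Hypothetical helper: no scaling, no protection against zero pivots.
    """
    n = len(a)
    d, s, l = [0.0] * n, [0.0] * n, [0.0] * (n - 1)
    s[0] = a[0] * a[0] - shift
    for i in range(n - 1):
        d[i] = s[i] + b[i] * b[i]                 # pivot: d_i = s_i + b_i^2
        l[i] = a[i + 1] * b[i] / d[i]             # subdiagonal entry of L
        s[i + 1] = s[i] * (a[i + 1] * a[i + 1] / d[i]) - shift
    d[n - 1] = s[n - 1]                           # last pivot has no b term
    return d, s, l


a, b, shift = [4.0, 3.0, 2.0], [1.0, 0.5], 0.25
d, s, l = dpqd_sketch(a, b, shift)

# Rebuild L*D*L': the diagonal should match a_i^2 + b_i^2 - shift
# (with b_n = 0) and the subdiagonal should match a_{i+1} * b_i.
diag = [d[i] + (l[i - 1] ** 2 * d[i - 1] if i > 0 else 0.0) for i in range(3)]
off = [l[i] * d[i] for i in range(2)]
print(diag, off)
```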

Arguments¶

A (INPUT) real(stnd), dimension(:)

On entry, A contains the diagonal elements of the bidiagonal matrix BD.

B (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the bidiagonal matrix BD.

The size of B must be size( B ) = size( A ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD * BD’ - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD’ * BD - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( A ).

S (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the auxiliary values S(i) in the differential form of the progressive QD algorithm.

The size of S must be size( S ) = size( D ) = size( A ).

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

`subroutine upper_bd_dsqd ( a, b, shift, flip, d, t, l )`¶

Purpose¶

UPPER_BD_DSQD computes:

the L * D * L’ factorization of the matrix BD’ * BD - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD * BD’ - shift * I , if FLIP=true;

for an n-by-n (upper) bidiagonal matrix BD and a given shift. L and U are, respectively, unit lower and unit upper bidiagonal matrices and D is a diagonal matrix.

The differential form of the stationary QD algorithm of Rutishauser is used to compute the factorization (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization, the off-diagonal entries of L (or of U if FLIP=true) and the auxiliary variable T in the differential form of the stationary QD algorithm.
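The FLIP=true case can be illustrated by running the same kind of recurrence from the bottom of the matrix upwards. The Python sketch below is an assumption-laden illustration, not the statpack code: `dstqds_flipped_sketch` is a hypothetical name, the auxiliary variable is assumed to be t(i) = d(i) - a(i)^2, and no scaling or zero-pivot protection is done.

```python
def dstqds_flipped_sketch(a, b, shift):
    """U*D*U' factorization of B*B' - shift*I for an upper bidiagonal B.

    a: diagonal of B (length n); b: superdiagonal of B (length n-1).
    Returns the pivots d, the auxiliary values t and the superdiagonal u of U.
    Hypothetical helper: no scaling, no protection against zero pivots.
    """
    n = len(a)
    d, t, u = [0.0] * n, [0.0] * n, [0.0] * (n - 1)
    t[n - 1] = -shift
    for i in range(n - 1, 0, -1):                 # sweep from the last row upwards
        d[i] = a[i] * a[i] + t[i]
        u[i - 1] = a[i] * b[i - 1] / d[i]         # entry U(i-1, i)
        t[i - 1] = t[i] * (b[i - 1] * b[i - 1] / d[i]) - shift
    d[0] = a[0] * a[0] + t[0]
    return d, t, u


a, b, shift = [4.0, 3.0, 2.0], [1.0, 0.5], 0.25
d, t, u = dstqds_flipped_sketch(a, b, shift)

# Rebuild U*D*U': the diagonal should match a_i^2 + b_i^2 - shift
# (with b_n = 0) and the superdiagonal should match a_{i+1} * b_i.
diag = [d[i] + (u[i] ** 2 * d[i + 1] if i < 2 else 0.0) for i in range(3)]
off = [u[i] * d[i + 1] for i in range(2)]
print(diag, off)
```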

Arguments¶

A (INPUT) real(stnd), dimension(:)

On entry, A contains the diagonal elements of the bidiagonal matrix BD.

B (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the bidiagonal matrix BD.

The size of B must be size( B ) = size( A ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD’ * BD - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD * BD’ - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( A ).

T (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the auxiliary values T(i) in the differential form of the stationary QD algorithm.

The size of T must be size( T ) = size( D ) = size( A ).

L (OUTPUT) real(stnd), dimension(:)

On exit, the off-diagonal entries of L if FLIP=false or the off-diagonal entries of U if FLIP=true.

The size of L must be size( L ) = size( B ) = size( A ) - 1.

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

`subroutine upper_bd_dpqd ( a, b, shift, flip, d, s, l )`¶

Purpose¶

UPPER_BD_DPQD computes:

the L * D * L’ factorization of the matrix BD * BD’ - shift * I , if FLIP=false;

the U * D * U’ factorization of the matrix BD’ * BD - shift * I , if FLIP=true;

for an n-by-n (upper) bidiagonal matrix BD and a given shift. L and U are, respectively, unit lower and unit upper bidiagonal matrices and D is a diagonal matrix.

The differential form of the progressive QD algorithm of Rutishauser is used to compute the factorization (see the reference (1) below for further details).

The subroutine outputs the diagonal matrix D of the factorization, the off-diagonal entries of L (or of U if FLIP=true) and the auxiliary variable S in the differential form of the progressive QD algorithm.
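One common use of such a factorization is counting singular values: by Sylvester's law of inertia, the number of negative entries of D equals the number of eigenvalues of BD * BD’ smaller than the shift, that is, the number of squared singular values of BD below the shift. The Python sketch below illustrates this with a progressive recurrence written from the definition above; `count_below_sketch` is a hypothetical name, and the sketch assumes that no pivot is exactly zero.

```python
def count_below_sketch(a, b, shift):
    """Count the squared singular values of bidiag(a, b) smaller than shift.

    Uses the pivots of the L*D*L' factorization of B*B' - shift*I.
    Hypothetical helper: assumes that no pivot is exactly zero.
    """
    count = 0
    s = a[0] * a[0] - shift
    for i in range(len(a) - 1):
        d = s + b[i] * b[i]                       # current pivot
        if d < 0.0:
            count += 1
        s = s * (a[i + 1] * a[i + 1] / d) - shift
    if s < 0.0:                                   # last pivot equals s
        count += 1
    return count


# B = [[3, 1], [0, 1]] has squared singular values (11 -/+ sqrt(85)) / 2,
# i.e. roughly 0.89 and 10.11.
for shift in (0.5, 1.0, 11.0):
    print(shift, count_below_sketch([3.0, 1.0], [1.0], shift))
```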

Arguments¶

A (INPUT) real(stnd), dimension(:)

On entry, A contains the diagonal elements of the bidiagonal matrix BD.

B (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the bidiagonal matrix BD.

The size of B must be size( B ) = size( A ) - 1.

SHIFT (INPUT) real(stnd)

On entry, the shift.

FLIP (INPUT) logical(lgl)

On entry, if FLIP=false the L * D * L’ factorization of the matrix BD * BD’ - shift * I is computed. Otherwise, if FLIP=true the U * D * U’ factorization of the matrix BD’ * BD - shift * I is computed.

D (OUTPUT) real(stnd), dimension(:)

On exit, the elements of the diagonal matrix D.

The size of D must be size( D ) = size( A ).

S (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the auxiliary values S(i) in the differential form of the progressive QD algorithm.

The size of S must be size( S ) = size( D ) = size( A ).

L (OUTPUT) real(stnd), dimension(:)

On exit, the off-diagonal entries of L if FLIP=false or the off-diagonal entries of U if FLIP=true.

The size of L must be size( L ) = size( B ) = size( A ) - 1.

Further Details¶

The bidiagonal matrix BD must be scaled appropriately before using this subroutine in order to avoid overflows (see the reference (1) below for further details).

This subroutine is adapted from the algorithms given in reference (1). See:

(1) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

`subroutine dflgen_bd ( d, e, lambda, cs_left, sn_left, cs_right, sn_right, scaling )`¶

Purpose¶

DFLGEN_BD computes deflation parameters (i.e. two chains of Givens rotations) for an n-by-n (upper) bidiagonal matrix BD and a given singular value of BD.

On output, the arguments CS_LEFT, SN_LEFT, CS_RIGHT and SN_RIGHT contain the vectors of the cosine and sine coefficients of the two chains of n-1 planar rotations that deflate the real n-by-n bidiagonal matrix BD for the singular value LAMBDA. One chain is applied to the left of BD (CS_LEFT, SN_LEFT) and the other is applied to the right of BD (CS_RIGHT, SN_RIGHT).

Arguments¶

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD.

E (INPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the bidiagonal matrix BD.

The size of E must be size( E ) = size( D ) - 1.

LAMBDA (INPUT) real(stnd)

On entry, a singular value of the bidiagonal matrix BD.

CS_LEFT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the cosines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the left.

The size of CS_LEFT must be size( CS_LEFT ) = size( E ) = size( D ) - 1.

SN_LEFT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the sines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the left.

The size of SN_LEFT must be size( SN_LEFT ) = size( E ) = size( D ) - 1.

CS_RIGHT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the cosines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the right.

The size of CS_RIGHT must be size( CS_RIGHT ) = size( E ) = size( D ) - 1.

SN_RIGHT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the sines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the right.

The size of SN_RIGHT must be size( SN_RIGHT ) = size( E ) = size( D ) - 1.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the deflation parameters in order to avoid overflows.

The default is to scale the bidiagonal matrix.

Further Details¶

This subroutine is adapted from the MATLAB routine DFLGEN in reference (1) and from the algorithms given in reference (2).

For further details, see:

(1) Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

(2) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

`subroutine dflgen2_bd ( d, e, lambda, cs_left, sn_left, cs_right, sn_right, deflate, scaling )`¶

Purpose¶

DFLGEN2_BD computes and applies deflation parameters (i.e. two chains of Givens rotations) for an n-by-n (upper) bidiagonal matrix BD and a given singular value of BD.

On input:

The arguments D and E contain, respectively, the main diagonal and off-diagonal of the bidiagonal matrix, and the argument LAMBDA contains an estimate of the singular value.

On output:

The arguments D and E contain, respectively, the new main diagonal and off-diagonal of the deflated bidiagonal matrix if DEFLATE is set to true, otherwise D and E are not changed.

The arguments CS_LEFT, SN_LEFT, CS_RIGHT and SN_RIGHT contain, respectively, the vectors of the cosines and sines coefficients of the chain of n-1 planar rotations that deflates the real n-by-n bidiagonal matrix BD corresponding to the singular value LAMBDA. One chain is applied to the left of BD (CS_LEFT, SN_LEFT) and the other is applied to the right of BD (CS_RIGHT, SN_RIGHT).

Arguments¶

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD.

On exit, the new main diagonal of the bidiagonal matrix if DEFLATE=true. Otherwise, D is not changed.

E (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the bidiagonal matrix BD.

On exit, the new off-diagonal of the bidiagonal matrix if DEFLATE=true. Otherwise, E is not changed.

The size of E must be size( E ) = size( D ) - 1.

LAMBDA (INPUT) real(stnd)

On entry, a singular value of the bidiagonal matrix BD.

CS_LEFT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the cosines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the left.

The size of CS_LEFT must be size( CS_LEFT ) = size( E ) = size( D ) - 1.

SN_LEFT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the sines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the left.

The size of SN_LEFT must be size( SN_LEFT ) = size( E ) = size( D ) - 1.

CS_RIGHT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the cosines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the right.

The size of CS_RIGHT must be size( CS_RIGHT ) = size( E ) = size( D ) - 1.

SN_RIGHT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the sines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the right.

The size of SN_RIGHT must be size( SN_RIGHT ) = size( E ) = size( D ) - 1.

DEFLATE (OUTPUT) logical(lgl)

On exit:

DEFLATE = true : indicates successful exit.

DEFLATE = false: indicates that full accuracy was not attained in the deflation of the bidiagonal matrix.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the deflation parameters in order to avoid overflows.

The default is to scale the bidiagonal matrix.

Further Details¶

This subroutine is adapted from the MATLAB routine DFLGEN in reference (1) and from the algorithms given in reference (2).

For further details, see:

(1) Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

(2) Fernando, K.V., 1998:

Accurately counting singular values of bidiagonal matrices and eigenvalues of skew-symmetric tridiagonal matrices. SIAM J. Matrix Anal. Appl., Vol. 20, no 2, pp.373-399.

`subroutine dflapp_bd ( d, e, cs_left, sn_left, cs_right, sn_right, deflate )`¶

Purpose¶

DFLAPP_BD deflates a real n-by-n (upper) bidiagonal matrix BD by two chains of planar rotations produced by DFLGEN_BD or DFLGEN2_BD.

On entry, the arguments D and E contain, respectively, the main diagonal and off-diagonal of the bidiagonal matrix.

On output, the arguments D and E contain, respectively, the new main diagonal and off-diagonal of the deflated bidiagonal matrix if DEFLATE is set to true; otherwise D and E are not changed.

Arguments¶

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD.

On exit, the new main diagonal of the bidiagonal matrix if DEFLATE=true. Otherwise, D is not changed.

E (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, the n-1 off-diagonal elements of the bidiagonal matrix BD.

On exit, the new off-diagonal of the bidiagonal matrix if DEFLATE=true. Otherwise, E is not changed.

The size of E must be size( E ) = size( D ) - 1.

CS_LEFT (INPUT) real(stnd), dimension(:)

On entry, the vector of the cosines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the left.

The size of CS_LEFT must be size( CS_LEFT ) = size( E ) = size( D ) - 1.

SN_LEFT (INPUT) real(stnd), dimension(:)

On entry, the vector of the sines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the left.

The size of SN_LEFT must be size( SN_LEFT ) = size( E ) = size( D ) - 1.

CS_RIGHT (INPUT) real(stnd), dimension(:)

On entry, the vector of the cosines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the right.

The size of CS_RIGHT must be size( CS_RIGHT ) = size( E ) = size( D ) - 1.

SN_RIGHT (INPUT) real(stnd), dimension(:)

On entry, the vector of the sines coefficients of the chain of n-1 Givens rotations that deflates the bidiagonal matrix BD on the right.

The size of SN_RIGHT must be size( SN_RIGHT ) = size( E ) = size( D ) - 1.

DEFLATE (OUTPUT) logical(lgl)

On exit:

DEFLATE = true : indicates successful exit.

DEFLATE = false: indicates that full accuracy was not attained in the deflation of the bidiagonal matrix.

Further Details¶

For further details, see:

Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

Dhillon, I.S., 1998:

Reliable computation of the condition number of a tridiagonal matrix in O(n) time. SIAM J. Matrix Anal. Appl., Vol. 19, pp. 776-796.

`subroutine qrstep_bd ( d, e, lambda, cs_left, sn_left, cs_right, sn_right, deflate, update_bd )`¶

Purpose¶

QRSTEP_BD performs one QR step with a given shift LAMBDA on an n-by-n real (upper) bidiagonal matrix BD.

On entry, the arguments D and E contain, respectively, the main diagonal and superdiagonal of the bidiagonal matrix.

On output, if DEFLATE is set to true or if the optional logical argument UPDATE_BD is present and set to true, the arguments D and E contain, respectively, the new main diagonal and superdiagonal of the updated (e.g. deflated) bidiagonal matrix. Otherwise, D and E are not changed.

The two chains of n-1 planar rotations produced during the QR step with shift LAMBDA are saved in the arguments CS_LEFT, SN_LEFT, CS_RIGHT, SN_RIGHT.
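To make the two chains of rotations concrete, here is a Python sketch of a textbook shifted Golub-Kahan QR step that chases the bulge from top to bottom. It is not the statpack code, which is adapted from a different routine and may differ in details. The names `rot` and `qr_step_sketch` are hypothetical, the shift is assumed to be expressed in singular-value units, and the first diagonal element is assumed to be nonzero.

```python
import math


def rot(f, g):
    """Return (c, s, r) such that c*f + s*g = r and -s*f + c*g = 0."""
    r = math.hypot(f, g)
    if r == 0.0:
        return 1.0, 0.0, 0.0
    return f / r, g / r, r


def qr_step_sketch(d, e, shift):
    """One shifted QR step on bidiag(d, e); d[0] is assumed nonzero.

    Returns the new d and e and the two chains as lists of (cos, sin) pairs.
    """
    d, e = list(d), list(e)
    n = len(d)
    left, right = [], []
    f = (d[0] * d[0] - shift * shift) / d[0]      # first column of B'B - shift^2*I, scaled
    g = e[0]
    for i in range(n - 1):
        cr, sr, r = rot(f, g)                     # right rotation on columns i, i+1
        if i > 0:
            e[i - 1] = r
        f = cr * d[i] + sr * e[i]
        e[i] = cr * e[i] - sr * d[i]
        g = sr * d[i + 1]                         # bulge below the diagonal
        d[i + 1] = cr * d[i + 1]
        cl, sl, r = rot(f, g)                     # left rotation on rows i, i+1
        d[i] = r
        f = cl * e[i] + sl * d[i + 1]
        d[i + 1] = cl * d[i + 1] - sl * e[i]
        if i < n - 2:
            g = sl * e[i + 1]                     # bulge above the superdiagonal
            e[i + 1] = cl * e[i + 1]
        right.append((cr, sr))
        left.append((cl, sl))
    e[n - 2] = f
    return d, e, left, right


d0, e0 = [4.0, 3.0, 2.0], [1.0, 0.5]
d1, e1, left, right = qr_step_sketch(d0, e0, 1.0)
# Orthogonal transformations preserve the Frobenius norm of the matrix.
print(sum(x * x for x in d0 + e0), sum(x * x for x in d1 + e1))
```

When the shift is an accurate singular value, the last superdiagonal element produced by such a step becomes negligible, which is the deflation condition reported through DEFLATE.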

Arguments¶

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD.

On exit, the new main diagonal of the bidiagonal matrix if DEFLATE=true or if UPDATE_BD=true. Otherwise, D is not changed.

E (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, the n-1 superdiagonal elements of the bidiagonal matrix BD.

On exit, the new superdiagonal of the bidiagonal matrix if DEFLATE=true or if UPDATE_BD=true. Otherwise, E is not changed.

The size of E must be size( E ) = size( D ) - 1.

LAMBDA (INPUT) real(stnd)

On entry, the shift used in the current QR step.

CS_LEFT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the cosines coefficients of the chain of n-1 Givens rotations applied to the bidiagonal matrix BD on the left in the current QR step.

The size of CS_LEFT must be size( CS_LEFT ) = size( E ) = size( D ) - 1.

SN_LEFT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the sines coefficients of the chain of n-1 Givens rotations applied to the bidiagonal matrix BD on the left in the current QR step.

The size of SN_LEFT must be size( SN_LEFT ) = size( E ) = size( D ) - 1.

CS_RIGHT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the cosines coefficients of the chain of n-1 Givens rotations applied to the bidiagonal matrix BD on the right in the current QR step.

The size of CS_RIGHT must be size( CS_RIGHT ) = size( E ) = size( D ) - 1.

SN_RIGHT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the sines coefficients of the chain of n-1 Givens rotations applied to the bidiagonal matrix BD on the right in the current QR step.

The size of SN_RIGHT must be size( SN_RIGHT ) = size( E ) = size( D ) - 1.

DEFLATE (OUTPUT) logical(lgl)

On exit:

DEFLATE = true : indicates that deflation occurred at the end of the step.

DEFLATE = false: indicates that the last superdiagonal element of the bidiagonal matrix is not small.

UPDATE_BD (INPUT, OPTIONAL) logical(lgl)

On entry:

UPDATE_BD = true : indicates that the bidiagonal matrix will be updated on exit.

UPDATE_BD = false: indicates that the bidiagonal matrix will be updated on exit only if DEFLATE = true.

The default value for UPDATE_BD is false.

Further Details¶

This subroutine is adapted from the MATLAB routine QRSTEP given in reference (1). The bidiagonal matrix BD is assumed to be unreduced, but the subroutine does not check this assumption.

For further details, see:

(1) Mastronardi, N., Van Barel, M., Van Camp, E., and Vandebril, R., 2006:

On computing the eigenvectors of a class of structured matrices. Journal of Computational and Applied Mathematics, 189, 580-591.

(2) Demmel, J.W., and Kahan, W., 1990:

Accurate singular values of bidiagonal matrices. SIAM J. Sci. Statist. Comput., 11:5, 873-912.

`subroutine qrstep_zero_bd ( d, e, cs_left, sn_left, cs_right, sn_right, deflate, update_bd )`¶

Purpose¶

QRSTEP_ZERO_BD performs one implicit QR step with a zero shift on an n-by-n real (upper) bidiagonal matrix BD.

On entry, the arguments D and E contain, respectively, the main diagonal and superdiagonal of the bidiagonal matrix.

On output, if DEFLATE is set to true or if the optional logical argument UPDATE_BD is present and set to true, the arguments D and E contain, respectively, the new main diagonal and superdiagonal of the updated (e.g. deflated) bidiagonal matrix. Otherwise, D and E are not changed.

The two chains of n-1 planar rotations produced during the QR step with zero shift are saved in the arguments CS_LEFT, SN_LEFT, CS_RIGHT, SN_RIGHT.
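The implicit zero-shift QR sweep of Demmel and Kahan can be sketched in Python as follows. This is an illustrative sketch, not the statpack code: the names `rot` and `qr_step_zero_sketch` are hypothetical, and the assignment of the two rotation families to the "right" and "left" chains follows the usual convention for this algorithm, which is assumed here rather than taken from the statpack source.

```python
import math


def rot(f, g):
    """Return (c, s, r) such that c*f + s*g = r and -s*f + c*g = 0."""
    r = math.hypot(f, g)
    if r == 0.0:
        return 1.0, 0.0, 0.0
    return f / r, g / r, r


def qr_step_zero_sketch(d, e):
    """One implicit zero-shift QR sweep on the upper bidiagonal bidiag(d, e).

    Returns the new d and e and the two chains as lists of (cos, sin) pairs.
    """
    d, e = list(d), list(e)
    n = len(d)
    left, right = [], []
    cs, oldcs, oldsn = 1.0, 1.0, 0.0
    for i in range(n - 1):
        cs, sn, r = rot(d[i] * cs, e[i])          # right rotation
        if i > 0:
            e[i - 1] = oldsn * r
        oldcs, oldsn, d[i] = rot(oldcs * r, d[i + 1] * sn)   # left rotation
        right.append((cs, sn))
        left.append((oldcs, oldsn))
    h = d[n - 1] * cs
    e[n - 2] = h * oldsn
    d[n - 1] = h * oldcs
    return d, e, left, right


d0, e0 = [4.0, 3.0, 2.0], [1.0, 0.5]
d1, e1, left, right = qr_step_zero_sketch(d0, e0)
# The sweep is an orthogonal equivalence, so the Frobenius norm is unchanged.
print(sum(x * x for x in d0 + e0), sum(x * x for x in d1 + e1))
```

The sweep uses only multiplications and rotation generation, with no subtraction of nearby quantities, which is what makes the zero-shift variant attractive for small singular values.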

Arguments¶

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD.

On exit, the new main diagonal of the bidiagonal matrix if DEFLATE=true or if UPDATE_BD=true. Otherwise, D is not changed.

E (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, the n-1 superdiagonal elements of the bidiagonal matrix BD.

On exit, the new superdiagonal of the bidiagonal matrix if DEFLATE=true or if UPDATE_BD=true. Otherwise, E is not changed.

The size of E must be size( E ) = size( D ) - 1.

CS_LEFT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the cosines coefficients of the chain of n-1 Givens rotations applied to the bidiagonal matrix BD on the left in the current QR step.

The size of CS_LEFT must be size( CS_LEFT ) = size( E ) = size( D ) - 1.

SN_LEFT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the sines coefficients of the chain of n-1 Givens rotations applied to the bidiagonal matrix BD on the left in the current QR step.

The size of SN_LEFT must be size( SN_LEFT ) = size( E ) = size( D ) - 1.

CS_RIGHT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the cosines coefficients of the chain of n-1 Givens rotations applied to the bidiagonal matrix BD on the right in the current QR step.

The size of CS_RIGHT must be size( CS_RIGHT ) = size( E ) = size( D ) - 1.

SN_RIGHT (OUTPUT) real(stnd), dimension(:)

On exit, the vector of the sines coefficients of the chain of n-1 Givens rotations applied to the bidiagonal matrix BD on the right in the current QR step.

The size of SN_RIGHT must be size( SN_RIGHT ) = size( E ) = size( D ) - 1.

DEFLATE (OUTPUT) logical(lgl)

On exit:

DEFLATE = true : indicates that deflation occurred at the end of the step.

DEFLATE = false: indicates that the last superdiagonal element of the bidiagonal matrix is not small.

UPDATE_BD (INPUT, OPTIONAL) logical(lgl)

On entry:

UPDATE_BD = true : indicates that the bidiagonal matrix will be updated on exit.

UPDATE_BD = false: indicates that the bidiagonal matrix will be updated on exit only if DEFLATE = true.

The default value for UPDATE_BD is false.

Further Details¶

This subroutine is adapted from the implicit zero-shift QR algorithm given in the reference (1).

For further details, see:

(1) Demmel, J.W., and Kahan, W., 1990:

Accurate singular values of bidiagonal matrices. SIAM J. Sci. Statist. Comput., 11:5, 873-912.

`subroutine upper_bd_deflate ( d, e, singval, leftvec, rightvec, failure, max_qr_steps, scaling )`¶

Purpose¶

UPPER_BD_DEFLATE computes the left and right singular vectors of a real (upper) bidiagonal matrix BD corresponding to a specified singular value, using a deflation technique.

Arguments¶

D (INPUT) real(stnd), dimension(:)

On entry, the diagonal elements of the bidiagonal matrix BD.

E (INPUT) real(stnd), dimension(:)

On entry, the n-1 superdiagonal elements of the bidiagonal matrix BD.

The size of E must be size( E ) = size( D ) - 1 = n - 1.

SINGVAL (INPUT) real(stnd)

On entry, a singular value of the bidiagonal matrix. SINGVAL is assumed to be positive or zero.

LEFTVEC (OUTPUT) real(stnd), dimension(:)

On exit, the computed left singular vector associated with the singular value SINGVAL.

The shape of LEFTVEC must verify: size( LEFTVEC ) = size( D ) = n .

RIGHTVEC (OUTPUT) real(stnd), dimension(:)

On exit, the computed right singular vector associated with the singular value SINGVAL.

The shape of RIGHTVEC must verify: size( RIGHTVEC ) = size( D ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the deflation procedure of the bidiagonal matrix.

MAX_QR_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_QR_STEPS controls the maximum number of QR sweeps for deflating the bidiagonal matrix for a given singular value.

The algorithm fails to converge if the total number of QR sweeps exceeds MAX_QR_STEPS.

The default is 4.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the deflation parameters in order to avoid overflows.

The default is to scale the bidiagonal matrix.

Further Details¶

UPPER_BD_DEFLATE is a low-level subroutine used by BD_DEFLATE subroutines. Its use as a stand-alone method for computing singular vectors of a bidiagonal matrix is not recommended.

Note also that the sign of the singular vectors computed by this subroutine is arbitrary and not necessarily consistent between the left and right singular vectors. In order to compute consistent singular triplets, subroutine BD_DEFLATE must be used instead.
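To see what sign consistency means, recall that a singular triplet (s, u, v) of BD with s > 0 satisfies BD * v = s * u, so u’ * BD * v must be positive. The Python sketch below shows one possible way to restore consistency after the fact, by flipping u when that product is negative. It is only an illustration with a hypothetical helper name `align_signs_sketch`; it is not claimed to be what BD_DEFLATE does internally.

```python
def align_signs_sketch(d, e, u, v):
    """Flip u if needed so that u' * B * v >= 0 for B = bidiag(d, e) (upper).

    d: diagonal (length n); e: superdiagonal (length n-1).
    Hypothetical helper for illustration only.
    """
    n = len(d)
    bv = [d[i] * v[i] + (e[i] * v[i + 1] if i < n - 1 else 0.0) for i in range(n)]
    if sum(ui * bi for ui, bi in zip(u, bv)) < 0.0:
        u = [-ui for ui in u]
    return u, v


# B = diag(2, 1): the triplet for the singular value 2 is u = v = (1, 0).
# A left vector returned with the opposite sign is flipped back.
u, v = align_signs_sketch([2.0, 1.0], [0.0], [-1.0, 0.0], [1.0, 0.0])
print(u, v)
```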

For further details, see:

Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

Mastronardi, N., Van Barel, M., Van Camp, E., and Vandebril, R., 2006:

On computing the eigenvectors of a class of structured matrices. Journal of Computational and Applied Mathematics, 189, 580-591.

Demmel, J.W., and Kahan, W., 1990:

Accurate singular values of bidiagonal matrices. SIAM J. Sci. Statist. Comput., 11:5, 873-912.

`subroutine upper_bd_deflate ( d, e, singval, leftvec, rightvec, failure, max_qr_steps, scaling )`¶

Purpose¶

UPPER_BD_DEFLATE computes the left and right singular vectors of a real (upper) bidiagonal matrix BD corresponding to specified singular values, using a deflation technique.

Arguments¶

D (INPUT) real(stnd), dimension(:)

On entry, the diagonal elements of the bidiagonal matrix BD.

E (INPUT) real(stnd), dimension(:)

On entry, the n-1 superdiagonal elements of the bidiagonal matrix BD.

The size of E must be size( E ) = size( D ) - 1 = n - 1.

SINGVAL (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the bidiagonal matrix. The singular values can be given in any order, but are assumed to be positive or zero.

The size of SINGVAL must verify: size( SINGVAL ) <= size( D ) = n .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value SINGVAL(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( D ) = n ,

size( LEFTVEC, 2 ) = size( SINGVAL ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value SINGVAL(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( D ) = n ,

size( RIGHTVEC, 2 ) = size( SINGVAL ) .

FAILURE (OUTPUT) logical(lgl), dimension(:)

On exit:

FAILURE(j) = FALSE : indicates successful exit for the jth singular triplet.

FAILURE(j) = TRUE : indicates that the algorithm did not converge and full accuracy was not attained in the deflation procedure of the bidiagonal matrix for the jth singular triplet.

The size of FAILURE must verify: size( FAILURE ) = size( SINGVAL ) .

MAX_QR_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_QR_STEPS controls the maximum number of QR sweeps for deflating the bidiagonal matrix for a given singular value. The algorithm fails to converge if the total number of QR sweeps for all singular values exceeds MAX_QR_STEPS * size(SINGVAL).

The default is 4.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if SCALING=true the bidiagonal matrix BD is scaled before computing the deflation parameters in order to avoid overflows.

The default is to scale the bidiagonal matrix.

Further Details¶

UPPER_BD_DEFLATE is a low-level subroutine used by BD_DEFLATE subroutines. Its use as a stand-alone method for computing singular vectors of a bidiagonal matrix is not recommended.

Note also that the sign of the singular vectors computed by this subroutine is arbitrary and not necessarily consistent between the left and right singular vectors. In order to compute consistent singular triplets, subroutine BD_DEFLATE must be used instead.

For further details, see:

Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

Mastronardi, N., Van Barel, M., Van Camp, E., and Vandebril, R., 2006:

On computing the eigenvectors of a class of structured matrices. Journal of Computational and Applied Mathematics, 189, 580-591.

Demmel, J.W., and Kahan, W., 1990:

Accurate singular values of bidiagonal matrices. SIAM J. Sci. Statist. Comput., 11:5, 873-912.

`subroutine bd_deflate ( upper, d, e, s, leftvec, rightvec, failure, max_qr_steps, ortho, scaling, inviter )`¶

Purpose¶

BD_DEFLATE computes the left and right singular vectors of a real n-by-n bidiagonal matrix BD corresponding to specified singular values, using deflation techniques on the bidiagonal matrix BD.

Arguments¶

UPPER (INPUT) logical(lgl)

On entry, if:

UPPER = true : BD is upper bidiagonal ;

UPPER = false : BD is lower bidiagonal.

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD.

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD. E(1) is arbitrary.

The size of E must verify: size( E ) = size( D ) = n .

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the bidiagonal matrix BD. The singular values must be given in decreasing order and are assumed to be positive or zero.

The size of S must verify: size( S ) <= size( D ) = n .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( D ) = n ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( D ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the deflation procedure of the bidiagonal matrix.

MAX_QR_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_QR_STEPS controls the maximum number of QR sweeps for deflating the bidiagonal matrix for a given singular value. The algorithm fails to converge if the total number of QR sweeps for all singular values exceeds MAX_QR_STEPS * size(S).

The default is 4.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, the bidiagonal matrix BD is deflated sequentially for all the specified singular values; this implies that the singular vectors of the bidiagonal matrix BD will be automatically orthogonal on exit.

ORTHO=false, the bidiagonal matrix BD is deflated in parallel for the different clusters of singular values or isolated singular values; this implies that orthogonality of the singular vectors of bidiagonal matrix BD is preserved inside each cluster, but not automatically between clusters.

The default is ORTHO=false.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the bidiagonal matrix BD is scaled before computing the deflation parameters in order to avoid overflows;

SCALING=false, the bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INVITER (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INVITER=true, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by inverse iteration instead of deflation.

INVITER=false, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by deflation.

The default is INVITER=true.

Further Details¶

The singular vectors are computed using deflation techniques applied to the bidiagonal matrix BD. The first deflation technique used in BD_DEFLATE combines an extension to bidiagonal matrices of Fernando’s approach for computing eigenvectors of tridiagonal matrices with a deflation procedure by Givens rotations originally developed by Godunov and his collaborators (see references (1) and (2) for more details). If this deflation technique fails, QR iterations are used instead, as described in (3) and (4).

Optionally, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros may also be computed by inverse iteration on the Golub-Kahan tridiagonal form of the bidiagonal matrix BD. This is the default since, in these cases, inverse iteration is safer and faster than the deflation algorithms.
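As an illustration of the Golub-Kahan tridiagonal form, the following self-contained sketch builds the off-diagonal of the 2n-by-2n symmetric tridiagonal matrix with zero main diagonal associated with an upper bidiagonal matrix BD. The eigenvalues of that matrix are the singular values of BD and their negatives. The sketch does not call statpack: the program name, the `real64` kind (standing in for `stnd`) and the sample values are assumptions made for illustration, and the code is not taken from the library.

```fortran
program golub_kahan_form
    use, intrinsic :: iso_fortran_env, only : wp => real64
    implicit none
    integer, parameter :: n = 3
    real(wp) :: d(n), e(n), off(2*n-1)
    integer  :: i

    ! Upper bidiagonal BD: D(i) = BD(i,i), E(i) = BD(i-1,i) for i >= 2, E(1) unused.
    d = [ 4.0_wp, 3.0_wp, 2.0_wp ]
    e = [ 0.0_wp, 1.0_wp, 0.5_wp ]

    ! Off-diagonal of the Golub-Kahan tridiagonal matrix (its main diagonal is zero):
    ! D(1), E(2), D(2), E(3), ..., E(n), D(n).
    do i = 1, n
        off(2*i-1) = d(i)
        if ( i < n ) off(2*i) = e(i+1)
    end do

    print '(a,5f6.2)', 'off-diagonal: ', off
end program golub_kahan_form
```

For the sample values, the off-diagonal holds 4, 1, 3, 0.5 and 2 in that order.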

The computation of the singular vectors is parallelized if OPENMP is used.

It is essential that singular values given on entry of BD_DEFLATE are computed to high relative accuracy. Subroutines BD_SINGVAL or BD_SINGVAL2 may be used for this purpose.

BD_DEFLATE may fail if some of the singular values specified in parameter S are nearly identical or for clusters of small singular values.

For further details on the deflation techniques used in BD_DEFLATE, see:

(1) Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

(2) Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

(3) Mastronardi, N., Van Barel, M., Van Camp, E., and Vandebril, R., 2006:

On computing the eigenvectors of a class of structured matrices. Journal of Computational and Applied Mathematics, 189, 580-591.

(4) Demmel, J.W., and Kahan, W., 1990:

Accurate singular values of bidiagonal matrices. SIAM J. Sci. Statist. Comput., 11:5, 873-912.

`subroutine bd_deflate2 ( mat, tauq, taup, d, e, s, leftvec, rightvec, failure, max_qr_steps, ortho, scaling, inviter )`¶

Purpose¶

BD_DEFLATE2 computes the left and right singular vectors of a full real m-by-n matrix MAT corresponding to specified singular values, using deflation techniques.

It is required that the original matrix MAT has been reduced to upper or lower bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal. This can be done with a call to BD_CMP with parameters TAUQ and TAUP, before calling BD_SINGVAL (or BD_SINGVAL2) for computing singular values and a call to BD_DEFLATE2 for computing selected singular vectors.

Alternatively and more simply, the user can call SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines, with parameters TAUQ and TAUP present, to compute the bidiagonal reduction and singular values in one step and, finally, call BD_DEFLATE2 for computing all or selected singular vectors of MAT.

If m >= n, BD is upper bidiagonal and if m < n, BD is lower bidiagonal.
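The three-step workflow described above (BD_CMP, then BD_SINGVAL, then BD_DEFLATE2) might look as follows. This is a minimal sketch under stated assumptions, not a definitive calling sequence: it assumes that a module named `statpack` exports the kinds and routines listed in the `use` statement, and the argument list used for BD_SINGVAL (including the `sort='d'` keyword requesting decreasing order) is an assumption that should be checked against that routine's own entry. The BD_CMP and BD_DEFLATE2 calls follow the signatures documented in this module.

```fortran
program sketch_bd_deflate2
    ! Assumption: the statpack module exports these kinds and routines under these names.
    use statpack, only : i4b, stnd, lgl, bd_cmp, bd_singval, bd_deflate2
    implicit none
    integer(i4b), parameter :: m = 200, n = 100, nsing = 10
    real(stnd), allocatable :: mat(:,:), d(:), e(:), tauq(:), taup(:), s(:)
    real(stnd), allocatable :: leftvec(:,:), rightvec(:,:)
    logical(lgl) :: failure
    integer(i4b) :: nfound

    ! Here m >= n, so min(m,n) = n for D, E, TAUQ and TAUP.
    allocate( mat(m,n), d(n), e(n), tauq(n), taup(n), s(n) )
    allocate( leftvec(m,nsing), rightvec(n,nsing) )
    call random_number( mat )

    ! Step 1: bidiagonal reduction Q' * MAT * P = BD; MAT now holds the reflectors.
    call bd_cmp( mat, d, e, tauq, taup )

    ! Step 2: singular values of BD (assumed argument list, see the BD_SINGVAL entry).
    call bd_singval( d, e, nfound, s, failure, sort='d' )

    ! Step 3: singular vectors of the original matrix for the nsing largest values.
    if ( .not. failure ) then
        call bd_deflate2( mat, tauq, taup, d, e, s(:nsing), leftvec, rightvec, failure )
    end if
    if ( failure ) print *, 'singular triplets may be inaccurate'
end program sketch_bd_deflate2
```

Since BD_CMP overwrites MAT, a copy of the original matrix has to be kept if it is needed afterwards, for instance to check residuals.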

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the original m-by-n matrix after reduction by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines. MAT must contain the vectors which define the elementary reflectors H(i) and G(i) whose products determine the matrices Q and P, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines when the arguments TAUQ and TAUP are present in the call to these subroutines. MAT must be specified as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 and is not modified by the routine.

TAUQ (INPUT) real(stnd), dimension(:)

TAUQ(i) must contain the scalar factor of the elementary reflector H(i) which determines Q, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 in their array argument TAUQ.

The size of TAUQ must verify:

size( TAUQ ) = min( size(MAT,1) , size(MAT,2) ) .

TAUP (INPUT) real(stnd), dimension(:)

TAUP(i) must contain the scalar factor of the elementary reflector G(i), which determines P, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 in their array argument TAUP.

The size of TAUP must verify:

size( TAUP ) = min( size(MAT,1) , size(MAT,2) ) .

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2.

The size of D must verify:

size( D ) = min( size(MAT,1) , size(MAT,2) ) .

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2:

if m >= n, E(i) = BD(i-1,i) for i = 2,3,…,n;

if m < n, E(i) = BD(i,i-1) for i = 2,3,…,m.

E(1) is arbitrary.

The size of E must verify:

size( E ) = min( size(MAT,1) , size(MAT,2) ) .
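The storage convention for D and E can be made concrete by expanding them into a dense bidiagonal matrix. The sketch below is self-contained and does not call statpack: the program name, the `upper` flag, the `real64` kind (standing in for `stnd`) and the sample values are all assumptions made for illustration.

```fortran
program bd_from_d_e
    use, intrinsic :: iso_fortran_env, only : wp => real64
    implicit none
    integer, parameter :: k = 3          ! k plays the role of min(m,n)
    real(wp) :: d(k), e(k), bd(k,k)
    logical  :: upper
    integer  :: i

    d = [ 4.0_wp, 3.0_wp, 2.0_wp ]
    e = [ 0.0_wp, 1.0_wp, 0.5_wp ]       ! e(1) is never referenced
    upper = .true.                       ! .true. mimics m >= n, .false. mimics m < n

    bd = 0.0_wp
    do i = 1, k
        bd(i,i) = d(i)                   ! diagonal
    end do
    do i = 2, k
        if ( upper ) then
            bd(i-1,i) = e(i)             ! superdiagonal: E(i) = BD(i-1,i)
        else
            bd(i,i-1) = e(i)             ! subdiagonal:   E(i) = BD(i,i-1)
        end if
    end do

    do i = 1, k
        print '(3f6.2)', bd(i,:)
    end do
end program bd_from_d_e
```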

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the bidiagonal matrix BD. The singular values must be given in decreasing order and are assumed to be positive or zero.

The size of S must verify: size( S ) <= min( size(MAT,1) , size(MAT,2) ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the deflation procedure of the bidiagonal matrix BD.

MAX_QR_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_QR_STEPS controls the maximum number of QR sweeps for deflating the bidiagonal matrix BD for a given singular value. The algorithm fails to converge if the total number of QR sweeps for all singular values exceeds MAX_QR_STEPS * size(S).

The default is 4.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, the bidiagonal matrix BD is deflated sequentially for all the specified singular values; this implies that the singular vectors of the bidiagonal matrix BD will be automatically orthogonal on exit.

ORTHO=false, the bidiagonal matrix BD is deflated in parallel for the different clusters of singular values or isolated singular values; this implies that orthogonality of the singular vectors of bidiagonal matrix BD is preserved inside each cluster, but not automatically between clusters.

The default is ORTHO=false.
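Because orthogonality between clusters is not guaranteed when ORTHO=false, a caller may want to measure it on the returned vectors. The following self-contained sketch computes the largest entry, in absolute value, of V' * V - I for the columns of a matrix V such as LEFTVEC or RIGHTVEC. The module and function names are hypothetical, `real64` stands in for `stnd`, and nothing here is part of statpack.

```fortran
module ortho_check
    use, intrinsic :: iso_fortran_env, only : wp => real64
    implicit none
contains
    ! Largest entry, in absolute value, of V'V - I for the columns of V.
    function ortho_defect( v ) result( defect )
        real(wp), intent(in) :: v(:,:)
        real(wp) :: defect
        real(wp) :: g(size(v,2),size(v,2))
        integer  :: j

        g = matmul( transpose(v), v )
        do j = 1, size(v,2)
            g(j,j) = g(j,j) - 1.0_wp
        end do
        defect = maxval( abs(g) )
    end function ortho_defect
end module ortho_check

program demo_ortho
    use ortho_check
    implicit none
    real(wp) :: v(3,2)

    v(:,1) = [ 1.0_wp, 0.0_wp, 0.0_wp ]
    v(:,2) = [ 0.0_wp, 1.0_wp, 0.0_wp ]
    print *, ortho_defect( v )   ! orthonormal columns: the defect is zero
end program demo_ortho
```

A defect of a modest multiple of the machine epsilon indicates that the vectors are orthogonal to working precision; a much larger value suggests retrying with ORTHO=true.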

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the intermediate bidiagonal matrix BD is scaled before computing the deflation parameters in order to avoid overflows;

SCALING=false, the intermediate bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INVITER (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INVITER=true, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by inverse iteration instead of deflation.

INVITER=false, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by deflation.

The default is INVITER=true.

Further Details¶

The singular vectors are computed using deflation techniques applied implicitly to the associated tridiagonal forms BD’ * BD and BD * BD’ of the bidiagonal matrix BD. See description of the BD_DEFLATE subroutine for more details.

The computation of the singular vectors is parallelized if OPENMP is used.

It is essential that singular values given on entry of BD_DEFLATE2 are computed to high (relative) accuracy. Subroutines BD_SINGVAL or BD_SINGVAL2 may be used for this purpose.

BD_DEFLATE2 may fail if some of the singular values specified in parameter S are nearly identical or for clusters of small singular values for some pathological matrices.

The deflation algorithms used in BD_DEFLATE2 are competitive with the inverse iteration procedure implemented in BD_INVITER2.

For further details on the deflation techniques or the blocked back-transformation algorithm used in BD_DEFLATE2, see:

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

Mastronardi, N., Van Barel, M., Van Camp, E., and Vandebril, R., 2006:

On computing the eigenvectors of a class of structured matrices. Journal of Computational and Applied Mathematics, 189, 580-591.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

`subroutine bd_deflate2 ( mat, p, d, e, s, leftvec, rightvec, failure, max_qr_steps, ortho, scaling, inviter, tol_reortho )`¶

Purpose¶

BD_DEFLATE2 computes the left and right singular vectors of a full real m-by-n matrix MAT with m>=n corresponding to specified singular values, using deflation techniques.

It is required that the original matrix MAT has been reduced to upper bidiagonal form BD by an orthogonal transformation:

Q’ * MAT * P = BD

where Q and P are orthogonal. This can be done with a call to BD_CMP2 (or a call to BD_CMP followed by a call to ORTHO_GEN_BD), before calling BD_SVD, BD_SINGVAL (or BD_SINGVAL2) subroutines for computing singular values and BD_DEFLATE2 for computing selected singular vectors.

Alternatively and more simply, the user can call SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines, with parameter P present, to compute the bidiagonal reduction and singular values in one step and, finally, call BD_DEFLATE2 for computing all or selected singular vectors of MAT.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the m-by-n orthogonal matrix Q after reduction by BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 (or by BD_CMP and ORTHO_GEN_BD). MAT is not modified by the routine.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

P (INPUT) real(stnd), dimension(:,:)

On entry, the n-by-n orthogonal matrix P after reduction by BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 (or by BD_CMP and ORTHO_GEN_BD). If P has been computed by BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4, P can be stored in factored form or not. Both cases are handled by the subroutine. P is not modified by the routine.

The shape of P must verify: size( P, 1 ) = size( P, 2 ) = size( MAT, 2 ) = n .

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD as returned by BD_CMP, BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines.

The size of D must verify: size( D ) = size( MAT, 2 ) = n .

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD as returned by BD_CMP, BD_CMP2, SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines:

E(i) = BD(i-1,i) for i = 2,3,…,n;

E(1) is arbitrary.

The size of E must verify: size( E ) = size( MAT, 2 ) = n .

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the bidiagonal matrix BD. The singular values must be given in decreasing order and are assumed to be positive or zero.

The size of S must verify: size( S ) <= size( MAT, 2 ) = n .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the deflation procedure of the bidiagonal matrix BD.
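When FAILURE = true, or simply as a sanity check, the accuracy of the returned triplets can be quantified with the residuals MAT * v(j) - S(j) * u(j). The sketch below is self-contained and does not call statpack; the module and function names are hypothetical and `real64` stands in for `stnd`. Note that the matrix passed as `a` must be a copy of the original matrix saved before the bidiagonal reduction, since the MAT argument of BD_DEFLATE2 holds Q here.

```fortran
module svd_residual
    use, intrinsic :: iso_fortran_env, only : wp => real64
    implicit none
contains
    ! max over j of || A*v_j - s_j*u_j ||_2 for the triplets (s_j, u_j, v_j).
    function max_residual( a, s, u, v ) result( res )
        real(wp), intent(in) :: a(:,:), s(:), u(:,:), v(:,:)
        real(wp) :: res
        integer  :: j

        res = 0.0_wp
        do j = 1, size(s)
            res = max( res, norm2( matmul( a, v(:,j) ) - s(j)*u(:,j) ) )
        end do
    end function max_residual
end module svd_residual

program demo_residual
    use svd_residual
    implicit none
    real(wp) :: a(2,2), eye(2,2), s(2)

    ! Diagonal test matrix: its singular triplets are known exactly.
    a   = reshape( [ 3.0_wp, 0.0_wp, 0.0_wp, 2.0_wp ], [ 2, 2 ] )
    eye = reshape( [ 1.0_wp, 0.0_wp, 0.0_wp, 1.0_wp ], [ 2, 2 ] )
    s   = [ 3.0_wp, 2.0_wp ]
    print *, max_residual( a, s, eye, eye )   ! exact triplets: the residual is zero
end program demo_residual
```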

MAX_QR_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_QR_STEPS controls the maximum number of QR sweeps for deflating the bidiagonal matrix BD for a given singular value. The algorithm fails to converge if the total number of QR sweeps for all singular values exceeds MAX_QR_STEPS * size(S).

The default is 4.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, the bidiagonal matrix BD is deflated sequentially for all the specified singular values; this implies that the singular vectors of the bidiagonal matrix BD will be automatically orthogonal on exit.

ORTHO=false, the bidiagonal matrix BD is deflated in parallel for the different clusters of singular values or isolated singular values; this implies that orthogonality of the singular vectors of bidiagonal matrix BD is preserved inside each cluster, but not automatically between clusters.

The default is ORTHO=false.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the intermediate bidiagonal matrix BD is scaled before computing the deflation parameters in order to avoid overflows;

SCALING=false, the intermediate bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INVITER (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INVITER=true, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by inverse iteration instead of deflation.

INVITER=false, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by deflation.

The default is INVITER=true.

TOL_REORTHO (INPUT, OPTIONAL) real(stnd)

On entry, TOL_REORTHO is used to determine if the left singular vectors stored in LEFTVEC must be reorthogonalized on exit in order to correct for the loss of orthogonality in the Ralha-Barlow one-sided bidiagonal reduction algorithm if MAT is nearly rank-deficient. If one of the singular values, S(i), verifies the condition

S(i) <= TOL_REORTHO * S(1)

all the computed left singular vectors are reorthogonalized with a QR factorization. If S(1) is the largest singular value of MAT and TOL_REORTHO is a small positive value of the order of the machine epsilon, this condition means that the numerical rank of MAT is less than size(S), and thus that MAT is nearly singular.

TOL_REORTHO must be greater than or equal to zero and less than or equal to one. If TOL_REORTHO = 0. is used, the left singular vectors are reorthogonalized only if some singular values are almost zero. On the other hand, if TOL_REORTHO = 1. is used, the left singular vectors are always reorthogonalized. If TOL_REORTHO is specified as less than zero or greater than one, the default value is used.

The default value is the value of the module parameter tol_reortho_def if size( S ) = n and tol_reortho_partial_def otherwise.
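The reorthogonalization test S(i) <= TOL_REORTHO * S(1) can be sketched as a small predicate. This is an illustration of the documented condition only, not the library's code: the module and function names are hypothetical, `real64` stands in for `stnd`, and the handling of out-of-range tolerances and of the default values is left out.

```fortran
module reortho_rule
    use, intrinsic :: iso_fortran_env, only : wp => real64
    implicit none
contains
    ! .true. if at least one S(i) satisfies S(i) <= tol * S(1).
    ! S is assumed to be nonempty and sorted in decreasing order.
    pure function needs_reortho( s, tol ) result( flag )
        real(wp), intent(in) :: s(:), tol
        logical :: flag

        flag = any( s <= tol * s(1) )
    end function needs_reortho
end module reortho_rule

program demo_reortho
    use reortho_rule
    implicit none
    real(wp) :: s(3)

    s = [ 10.0_wp, 1.0_wp, 1.0e-18_wp ]
    print *, needs_reortho( s, epsilon(1.0_wp) )       ! T: 1e-18 <= eps * 10
    print *, needs_reortho( s(:2), epsilon(1.0_wp) )   ! F: no tiny singular value
    print *, needs_reortho( s, 1.0_wp )                ! T: S(1) <= S(1) always holds
end program demo_reortho
```

The last call shows why TOL_REORTHO = 1. always triggers reorthogonalization: the condition is satisfied by S(1) itself.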

Further Details¶

The singular vectors are computed using deflation techniques applied implicitly to the associated tridiagonal forms BD’ * BD and BD * BD’ of the bidiagonal matrix BD. See description of the BD_DEFLATE subroutine for more details.

The computation of the singular vectors is parallelized if OPENMP is used.

It is essential that singular values given on entry of BD_DEFLATE2 are computed to high (relative) accuracy. Subroutines BD_SINGVAL or BD_SINGVAL2 may be used for this purpose.

BD_DEFLATE2 may fail if some of the singular values specified in parameter S are nearly identical or for clusters of small singular values for some pathological matrices.

The deflation algorithms used in BD_DEFLATE2 are competitive with the inverse iteration procedure implemented in BD_INVITER2.

For further details on the deflation techniques or the blocked back-transformation algorithm used in BD_DEFLATE2, see:

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

Mastronardi, N., Van Barel, M., Van Camp, E., and Vandebril, R., 2006:

On computing the eigenvectors of a class of structured matrices. Journal of Computational and Applied Mathematics, 189, 580-591.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

`subroutine bd_deflate2 ( mat, tauq, taup, rlmat, d, e, s, leftvec, rightvec, failure, tauo, max_qr_steps, ortho, scaling, inviter )`¶

Purpose¶

BD_DEFLATE2 computes the left and right singular vectors of a full real m-by-n matrix MAT corresponding to specified singular values, using deflation techniques.

It is required that the original matrix MAT has been reduced to upper bidiagonal form BD by a two-step algorithm as performed by BD_CMP subroutine with parameters TAUQ, TAUP, RLMAT, and optionally TAUO, or more simply by SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines with the same arguments:

If m >= n, a QR factorization of the real m-by-n matrix MAT is first computed

MAT = O * R

where O is orthogonal and R is upper triangular. In a second step, the n-by-n upper triangular matrix R is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * R * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

If m < n, an LQ factorization of the real m-by-n matrix MAT is first computed

MAT = L * O

where O is orthogonal and L is lower triangular. In a second step, the m-by-m lower triangular matrix L is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * L * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix.

After this call to BD_CMP with parameters TAUQ, TAUP, RLMAT, and optionally TAUO, the user can call BD_SINGVAL or BD_SINGVAL2 subroutines for computing singular values of BD and, finally, BD_DEFLATE2 for computing all or selected singular vectors of MAT.

Alternatively and more simply, the user can call SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines, with parameters TAUQ, TAUP, RLMAT, and optionally TAUO, to perform the two-stage bidiagonal reduction and get singular values in one step and, finally, call BD_DEFLATE2 for computing all or selected singular vectors of MAT.
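The way singular vectors of BD map back to singular vectors of MAT in the two-step reduction can be written out explicitly. In this sketch, U_BD, Σ and V_BD denote an SVD of BD; this notation is introduced here for illustration and is not used elsewhere in this documentation.

```latex
\begin{aligned}
BD &= U_{BD}\,\Sigma\,V_{BD}^{T} \\
m \ge n:\quad MAT &= O\,R = O\,Q\,BD\,P^{T}
      = \left(O\,Q\,U_{BD}\right)\,\Sigma\,\left(P\,V_{BD}\right)^{T} \\
m < n:\quad MAT &= L\,O = Q\,BD\,P^{T}\,O
      = \left(Q\,U_{BD}\right)\,\Sigma\,\left(O^{T}\,P\,V_{BD}\right)^{T}
\end{aligned}
```

So for m >= n the left singular vectors of MAT are obtained by applying O * Q to those of BD and the right ones by applying P, while for m < n the left ones are obtained by applying Q and the right ones by applying O' * P.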

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the original m-by-n matrix after a two-stage bidiagonal reduction by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines (i.e., when the matrix argument RLMAT is present in the call to these subroutines). The precise content of MAT is determined by the presence or absence of the optional argument TAUO:

if the optional argument TAUO is absent, it is assumed that the first n columns (if m>=n) or the first m rows (if m<n) of the orthogonal matrix O are stored explicitly in the argument MAT on entry;

if the optional argument TAUO is present, it is assumed that the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on entry. In this case, MAT must contain the vectors which define the elementary reflectors W(i) whose products determine the matrix O, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 subroutines when the matrix argument RLMAT is present in the call to these subroutines.

In both cases, MAT must be specified as returned by BD_CMP and is not modified by the routine.

TAUQ (INPUT) real(stnd), dimension(:)

TAUQ(i) must contain the scalar factor of the elementary reflector H(i) which determines Q, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 in their array argument TAUQ.

The size of TAUQ must verify: size( TAUQ ) = min( size(MAT,1) , size(MAT,2) ) .

TAUP (INPUT) real(stnd), dimension(:)

TAUP(i) must contain the scalar factor of the elementary reflector G(i), which determines P, as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 in their array argument TAUP.

The size of TAUP must verify: size( TAUP ) = min( size(MAT,1) , size(MAT,2) ) .

RLMAT (INPUT) real(stnd), dimension(:,:)

On entry, the elements on and below the diagonal, with the array TAUQ, represent the orthogonal matrix Q as a product of elementary reflectors, and the elements above the diagonal, with the array TAUP, represent the orthogonal matrix P as a product of elementary reflectors. RLMAT must be specified as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2 and is not modified by the routine.

The shape of RLMAT must verify: size( RLMAT, 1 ) = size( RLMAT, 2 ) = min( size(MAT,1) , size(MAT,2) ).

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the upper bidiagonal matrix BD as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2.

The size of D must verify: size( D ) = min( size(MAT,1) , size(MAT,2) ) .

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the upper bidiagonal matrix BD as returned by BD_CMP, SELECT_SINGVAL_CMP or SELECT_SINGVAL_CMP2:

E(i) = BD(i-1,i) for i = 2,3,…,min(m,n);

E(1) is arbitrary.

The size of E must verify: size( E ) = min( size(MAT,1) , size(MAT,2) ) .

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the upper bidiagonal matrix BD. The singular values must be given in decreasing order and are assumed to be positive or zero.

The size of S must verify: size( S ) <= min( size(MAT,1) , size(MAT,2) ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the deflation procedure of the bidiagonal matrix BD.

TAUO (INPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors W(i), which represent the orthogonal matrix O of the QR or LQ decomposition of MAT.

If the optional argument TAUO is present, it is assumed that the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on entry.

If the optional argument TAUO is absent, it is assumed that the orthogonal matrix O is stored explicitly in the argument MAT on entry.

If the optional argument TAUO has been specified in the initial call to the BD_CMP subroutine, this optional argument TAUO must also be specified in the call to BD_DEFLATE2, otherwise the results will be incorrect.

See description of the argument MAT in the description of the BD_CMP subroutine, when the argument RLMAT is also present, for further details.

The size of TAUO must be min( size(MAT,1) , size(MAT,2) ).
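The requirement that TAUO be passed consistently to both routines can be illustrated with the following sketch. It rests on several assumptions: that a module named `statpack` exports the listed names, that BD_CMP accepts `rlmat` and `tauo` as keyword arguments named as in this documentation, and that BD_SINGVAL has the argument list shown (including `sort='d'` for decreasing order). Only the BD_DEFLATE2 call is taken directly from the signature documented above.

```fortran
program sketch_two_step
    ! Assumption: the statpack module exports these kinds and routines under these names.
    use statpack, only : i4b, stnd, lgl, bd_cmp, bd_singval, bd_deflate2
    implicit none
    integer(i4b), parameter :: m = 500, n = 50, nsing = 5
    real(stnd), allocatable :: mat(:,:), rlmat(:,:), d(:), e(:), s(:)
    real(stnd), allocatable :: tauq(:), taup(:), tauo(:)
    real(stnd), allocatable :: leftvec(:,:), rightvec(:,:)
    logical(lgl) :: failure
    integer(i4b) :: nfound

    ! Here m >= n, so min(m,n) = n.
    allocate( mat(m,n), rlmat(n,n), d(n), e(n), s(n) )
    allocate( tauq(n), taup(n), tauo(n) )
    allocate( leftvec(m,nsing), rightvec(n,nsing) )
    call random_number( mat )

    ! Two-step reduction; TAUO is present, so O is kept in factored form in MAT.
    call bd_cmp( mat, d, e, tauq, taup, rlmat=rlmat, tauo=tauo )

    ! Singular values of BD (assumed argument list, see the BD_SINGVAL entry).
    call bd_singval( d, e, nfound, s, failure, sort='d' )

    ! TAUO was passed to BD_CMP, so it must be passed to BD_DEFLATE2 as well.
    if ( .not. failure ) then
        call bd_deflate2( mat, tauq, taup, rlmat, d, e, s(:nsing), &
                          leftvec, rightvec, failure, tauo=tauo )
    end if
    if ( failure ) print *, 'singular triplets may be inaccurate'
end program sketch_two_step
```

Dropping `tauo=tauo` from both calls would instead make MAT hold O explicitly; mixing the two conventions is what leads to incorrect results.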

MAX_QR_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_QR_STEPS controls the maximum number of QR sweeps for deflating the bidiagonal matrix BD for a given singular value. The algorithm fails to converge if the total number of QR sweeps for all singular values exceeds MAX_QR_STEPS * size(S).

The default is 4.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry, if:

ORTHO=true, the bidiagonal matrix BD is deflated sequentially for all the specified singular values; this implies that the singular vectors of the bidiagonal matrix BD will be automatically orthogonal on exit.

ORTHO=false, the bidiagonal matrix BD is deflated in parallel for the different clusters of singular values or isolated singular values; this implies that orthogonality of the singular vectors of bidiagonal matrix BD is preserved inside each cluster, but not automatically between clusters.

The default is ORTHO=false.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the intermediate bidiagonal matrix BD is scaled before computing the deflation parameters in order to avoid overflows;

SCALING=false, the intermediate bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INVITER (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INVITER=true, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by inverse iteration instead of deflation.

INVITER=false, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by deflation.

The default is INVITER=true.

Further Details¶

The singular vectors of BD are computed using deflation techniques applied implicitly to the associated tridiagonal forms BD’ * BD and BD * BD’ of the bidiagonal matrix BD. See description of the BD_DEFLATE subroutine for more details.

The singular vectors of MAT are finally computed by a blocked back-transformation algorithm.

The computation of the singular vectors of BD and the blocked back-transformation algorithm to find the singular vectors of MAT are parallelized if OPENMP is used.

It is essential that singular values given on entry of BD_DEFLATE2 are computed to high (relative) accuracy. Subroutines BD_SINGVAL or BD_SINGVAL2 may be used for this purpose.

BD_DEFLATE2 may fail if some of the singular values specified in parameter S are nearly identical or for clusters of small singular values for some pathological matrices.

The deflation algorithms used in BD_DEFLATE2 are competitive with the inverse iteration procedure implemented in BD_INVITER2.

For further details on the deflation techniques or the blocked back-transformation algorithm used in BD_DEFLATE2, see:

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

Mastronardi, N., Van Barel, M., Van Camp, E., and Vandebril, R., 2006:

On computing the eigenvectors of a class of structured matrices. Journal of Computational and Applied Mathematics, 189, 580-591.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

`subroutine bd_deflate2 ( mat, rmat, p, d, e, s, leftvec, rightvec, failure, tauo, max_qr_steps, ortho, scaling, inviter, tol_reortho )`¶

Purpose¶

BD_DEFLATE2 computes all or selected left and right singular vectors of a full real m-by-n matrix MAT with m>=n corresponding to specified singular values, using deflation techniques.

It is required that the original matrix MAT has been reduced to upper bidiagonal form BD by a two-step algorithm as performed by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines with parameters P, RMAT, and optionally TAUO:

A QR factorization of the real m-by-n matrix MAT is first computed

MAT = O * R

where O is orthogonal and R is upper triangular. In a second step, the n-by-n upper triangular matrix R is reduced to upper bidiagonal form BD by an orthogonal transformation :

Q’ * R * P = BD

where Q and P are orthogonal and BD is an upper bidiagonal matrix. Subroutines SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 compute O, Q, P, BD and also all or some of the singular values of R, which are also the singular values of MAT. Using this two-step factorization, BD_DEFLATE2 computes all or selected left and right singular vectors of R and applies a back-transformation algorithm to them to obtain the corresponding left and right singular vectors of MAT.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the original m-by-n matrix after the two-step bidiagonal reduction by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines with arguments P, RMAT, and optionally TAUO. The precise content of MAT is determined by the presence or absence of the optional argument TAUO:

if the optional argument TAUO is absent, it is assumed that the first n columns of the orthogonal matrix O are stored explicitly in the argument MAT on entry;

if the optional argument TAUO is present, it is assumed that the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on entry. In this case, MAT must contain the vectors which define the elementary reflectors W(i) whose products determine the matrix O, as returned by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines when the matrix argument RMAT is present in the call to these subroutines.

In both cases, MAT must be specified as returned by these subroutines and is not modified by the routine. See the description of the argument TAUO below for further details.

The shape of MAT must verify: size( MAT, 1 ) >= size( MAT, 2 ) = n .

RMAT (INPUT) real(stnd), dimension(:,:)

On entry, the n-by-n orthogonal matrix Q after reduction by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines. RMAT must be specified as returned by these subroutines and is not modified by the routine.

The shape of RMAT must verify: size( RMAT, 1 ) = size( RMAT, 2 ) = size( MAT, 2 ) = n.

P (INPUT) real(stnd), dimension(:,:)

On entry, the n-by-n orthogonal matrix P after reduction by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines. P can be stored in factored form or not. Both cases are handled by the subroutine and P is not modified by the routine.

The shape of P must verify: size( P, 1 ) = size( P, 2 ) = size( MAT, 2 ) = n .

D (INPUT) real(stnd), dimension(:)

On entry, D contains the diagonal elements of the bidiagonal matrix BD as returned by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4.

The size of D must verify: size( D ) = size( MAT, 2 ) = n .

E (INPUT) real(stnd), dimension(:)

On entry, E contains the off-diagonal elements of the bidiagonal matrix BD as returned by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4:

E(i) = BD(i-1,i) for i = 2,3,…,n;

E(1) is arbitrary.

The size of E must verify: size( E ) = size( MAT, 2 ) = n .

S (INPUT) real(stnd), dimension(:)

On entry, selected singular values of the bidiagonal matrix BD. The singular values must be given in decreasing order and are assumed to be positive or zero.

The size of S must verify: size( S ) <= size( MAT, 2 ) .

LEFTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed left singular vectors. The left singular vector associated with the singular value S(j) is stored in the j-th column of LEFTVEC.

The shape of LEFTVEC must verify:

size( LEFTVEC, 1 ) = size( MAT, 1 ) = m ,

size( LEFTVEC, 2 ) = size( S ) .

RIGHTVEC (OUTPUT) real(stnd), dimension(:,:)

On exit, the computed right singular vectors. The right singular vector associated with the singular value S(j) is stored in the j-th column of RIGHTVEC.

The shape of RIGHTVEC must verify:

size( RIGHTVEC, 1 ) = size( MAT, 2 ) = n ,

size( RIGHTVEC, 2 ) = size( S ) .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit,

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the deflation procedure of the bidiagonal matrix BD.

TAUO (INPUT, OPTIONAL) real(stnd), dimension(:)

The scalar factors of the elementary reflectors W(i), which represent the orthogonal matrix O of the QR decomposition of MAT as returned by SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4.

If the optional argument TAUO is present, it is assumed that the orthogonal matrix O is stored in factored form, as a product of elementary reflectors, in the argument MAT on entry.

If the optional argument TAUO is absent, it is assumed that the orthogonal matrix O is stored explicitly in the argument MAT on entry.

If the optional argument TAUO has been specified in the initial call to the SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines, this optional argument TAUO must also be specified in the call to BD_DEFLATE2, otherwise the results will be incorrect.

See description of the argument MAT in the description of the SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 subroutines, when the argument RMAT is also present, for further details.

The size of TAUO must be size(MAT,2) = n .

MAX_QR_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_QR_STEPS controls the maximum number of QR sweeps for deflating the bidiagonal matrix BD for a given singular value. The algorithm fails to converge if the total number of QR sweeps for all singular values exceeds MAX_QR_STEPS * size(S).

The default is 4.

ORTHO (INPUT, OPTIONAL) logical(lgl)

On entry:

ORTHO=true, the bidiagonal matrix BD is deflated sequentially for all the specified singular values; this implies that the singular vectors of the bidiagonal matrix BD will be automatically orthogonal on exit;

ORTHO=false, the bidiagonal matrix BD is deflated in parallel for the different clusters of singular values or isolated singular values; this implies that orthogonality of the singular vectors of bidiagonal matrix BD is preserved inside each cluster, but not automatically between clusters.

The default is ORTHO=false.

SCALING (INPUT, OPTIONAL) logical(lgl)

On entry, if:

SCALING=true, the intermediate bidiagonal matrix BD is scaled before computing the deflation parameters in order to avoid overflows;

SCALING=false, the intermediate bidiagonal matrix BD is not scaled.

The default is to scale the bidiagonal matrix.

INVITER (INPUT, OPTIONAL) logical(lgl)

On entry, if:

INVITER=true, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by inverse iteration instead of deflation.

INVITER=false, singular vectors corresponding to isolated singular values or singular vectors of bidiagonal matrices with zeros are computed by deflation.

The default is INVITER=true.

TOL_REORTHO (INPUT, OPTIONAL) real(stnd)

On entry, TOL_REORTHO is used to determine whether the left singular vectors stored in LEFTVEC must be reorthogonalized on exit, in order to correct for the loss of orthogonality in the Ralha-Barlow one-sided bidiagonal reduction algorithm when MAT is nearly rank-deficient. If one of the singular values, S(i), verifies the condition

S(i) <= TOL_REORTHO * S(1)

all the computed left singular vectors are reorthogonalized with a QR factorization. If S(1) is the largest singular value of MAT and TOL_REORTHO is a small positive value of the order of the machine epsilon, this condition indicates that the numerical rank of MAT is less than size(S), i.e., that MAT is nearly singular.

TOL_REORTHO must be greater than or equal to zero and less than or equal to one. If TOL_REORTHO = 0. is used, the left singular vectors are reorthogonalized only if some singular values are almost zero. On the other hand, if TOL_REORTHO = 1. is used, the left singular vectors are always reorthogonalized. If TOL_REORTHO is specified as less than zero or greater than one, the default value is used.

The default value is the value of the module parameter tol_reortho_def if size( S ) = n and tol_reortho_partial_def otherwise.
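
The reorthogonalization test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used inside BD_DEFLATE2: the function name `needs_reortho` is hypothetical, `stnd` is taken to be double precision, and the fallback to the module defaults for out-of-range values of TOL_REORTHO is not reproduced because those default values are not given here.

```fortran
module reortho_sketch
    implicit none
    ! Assumption: double precision stands in for statpack's real kind stnd.
    integer, parameter :: stnd = kind(1.0d0)
contains
    ! Return true if at least one singular value satisfies S(i) <= TOL_REORTHO * S(1).
    ! S is assumed to be sorted in decreasing order, as required for the argument S.
    pure function needs_reortho( s, tol_reortho ) result( flag )
        real(stnd), intent(in) :: s(:), tol_reortho
        logical :: flag
        flag = .false.
        if ( size(s) > 0 ) flag = any( s <= tol_reortho*s(1) )
    end function needs_reortho
end module reortho_sketch
```

In this sketch TOL_REORTHO = 1 always triggers the reorthogonalization, since S(1) <= S(1), while TOL_REORTHO = 0 triggers it only when some singular value is exactly zero (the library text speaks of "almost zero" values, so the actual test may be slightly more permissive).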

Further Details¶

The singular vectors are computed using deflation techniques applied implicitly to the associated tridiagonal forms BD’ * BD and BD * BD’ of the bidiagonal matrix BD. See description of the BD_DEFLATE subroutine for more details.

The computation of the singular vectors is parallelized if OPENMP is used.

It is essential that singular values given on entry of BD_DEFLATE2 are computed to high (relative) accuracy. Subroutines SELECT_SINGVAL_CMP3 or SELECT_SINGVAL_CMP4 may be used for this purpose.

BD_DEFLATE2 may fail if some of the singular values specified in parameter S are nearly identical, or for clusters of small singular values in some pathological matrices.

The deflation algorithms used in BD_DEFLATE2 are competitive with the inverse iteration procedure implemented in BD_INVITER2.

For further details on the deflation techniques or the blocked back-transformation algorithm used in BD_DEFLATE2, see:

Fernando, K.V., 1997:

On computing an eigenvector of a tridiagonal matrix. Part I: Basic results. SIAM J. Matrix Anal. Appl., Vol. 18, pp. 1013-1034.

Malyshev, A.N., 2000:

On deflation for symmetric tridiagonal matrices. Report 182 of the Department of Informatics, University of Bergen, Norway.

Mastronardi, N., Van Barel, M., Van Camp, E., and Vandebril, R., 2006:

On computing the eigenvectors of a class of structured matrices. Journal of Computational and Applied Mathematics, 189, 580-591.

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore.

`subroutine svd_sort ( sort, d, u, v )`¶

Purpose¶

Given the singular values D and singular vectors U and V as output from BD_SVD, SVD_CMP or SVD_CMP3, this subroutine sorts the singular values into ascending or descending order and rearranges the columns of U and V correspondingly.

Arguments¶

SORT (INPUT) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

The singular vectors are rearranged accordingly.

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, the singular values.

On exit, the singular values in ascending or descending order.

U (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the columns of U are the (left) singular vectors.

On exit, U contains the rearranged (left) singular vectors.

The shape of U must verify: size(U,2) = size( D ) .

V (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the columns of V are the (right) singular vectors.

On exit, V contains the rearranged (right) singular vectors.

The shape of V must verify: size(V,2) = size( D ) .

Further Details¶

The method is straight insertion.
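
A straight insertion sort that carries the columns of U and V along with the singular values can be sketched as follows. This is an illustrative sketch and not the statpack source: the module and subroutine names are hypothetical, `stnd` is assumed to be double precision, and returning without action for an unrecognized value of SORT is an assumption.

```fortran
module svd_sort_sketch
    implicit none
    ! Assumption: double precision stands in for statpack's real kind stnd.
    integer, parameter :: stnd = kind(1.0d0)
contains
    ! Sort D by straight insertion and move the columns of U and V accordingly.
    subroutine insertion_sort_svd( sort, d, u, v )
        character, intent(in)     :: sort
        real(stnd), intent(inout) :: d(:), u(:,:), v(:,:)
        real(stnd) :: dj, uj(size(u,1)), vj(size(v,1))
        logical    :: ascending
        integer    :: i, j
        ! Assumption: any value other than 'A', 'a', 'D' or 'd' leaves the data untouched.
        if ( scan( sort, 'AaDd' ) == 0 ) return
        ascending = ( sort == 'A' .or. sort == 'a' )
        do j = 2, size(d)
            ! Save the j-th singular triplet, then shift larger (or smaller) ones right.
            dj = d(j);  uj = u(:,j);  vj = v(:,j)
            i = j - 1
            do
                if ( i < 1 ) exit
                if ( ascending ) then
                    if ( d(i) <= dj ) exit
                else
                    if ( d(i) >= dj ) exit
                end if
                d(i+1) = d(i);  u(:,i+1) = u(:,i);  v(:,i+1) = v(:,i)
                i = i - 1
            end do
            ! Insert the saved triplet at its final position.
            d(i+1) = dj;  u(:,i+1) = uj;  v(:,i+1) = vj
        end do
    end subroutine insertion_sort_svd
end module svd_sort_sketch
```

With D = (1, 3, 2) and SORT = 'D', the sketch returns D = (3, 2, 1), and the original second column of U and of V becomes the first column on exit. Straight insertion costs on the order of size(D)**2 operations in the worst case, which is usually negligible next to the cost of the SVD itself.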

`subroutine svd_sort2 ( sort, d, u, vt )`¶

Purpose¶

Given the singular values D and singular vectors U and VT as output from BD_SVD2 or SVD_CMP2, this subroutine sorts the singular values into ascending or descending order and rearranges the columns of U and the rows of VT correspondingly.

Arguments¶

SORT (INPUT) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

The singular vectors are rearranged accordingly.

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, the singular values.

On exit, the singular values in ascending or descending order.

U (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the columns of U are the (left) singular vectors.

On exit, U contains the rearranged (left) singular vectors.

The shape of U must verify: size(U,2) = size( D ) .

VT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the rows of VT are the (right) singular vectors.

On exit, VT contains the rearranged (right) singular vectors.

The shape of VT must verify: size(VT,1) = size( D ) .

Further Details¶

The method is straight insertion.
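
The difference from the column-wise case is that the right singular vectors are stored as rows of VT, so the same permutation must be applied to the columns of U and to the rows of VT. The sketch below makes this explicit by first building the sorting permutation with straight insertion and then applying it with vector subscripts. It is an illustration only: the names are hypothetical, `stnd` is assumed to be double precision, and the library routine may well move the vectors in place rather than through an index vector.

```fortran
module svd_sort2_sketch
    implicit none
    ! Assumption: double precision stands in for statpack's real kind stnd.
    integer, parameter :: stnd = kind(1.0d0)
contains
    ! Sort D, the columns of U and the rows of VT with one common permutation.
    subroutine sort_svd_rows( sort, d, u, vt )
        character, intent(in)     :: sort
        real(stnd), intent(inout) :: d(:), u(:,:), vt(:,:)
        integer :: idx(size(d)), i, j, k
        logical :: ascending
        ! Assumption: unrecognized values of SORT leave the data untouched.
        if ( scan( sort, 'AaDd' ) == 0 ) return
        ascending = ( sort == 'A' .or. sort == 'a' )
        idx = [ ( i, i = 1, size(d) ) ]
        ! Straight insertion on the index vector only.
        do j = 2, size(d)
            k = idx(j)
            i = j - 1
            do
                if ( i < 1 ) exit
                if ( ascending ) then
                    if ( d(idx(i)) <= d(k) ) exit
                else
                    if ( d(idx(i)) >= d(k) ) exit
                end if
                idx(i+1) = idx(i)
                i = i - 1
            end do
            idx(i+1) = k
        end do
        d  = d(idx)        ! reorder the singular values
        u  = u(:,idx)      ! left vectors are columns of U
        vt = vt(idx,:)     ! right vectors are rows of VT
    end subroutine sort_svd_rows
end module svd_sort2_sketch
```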

`subroutine singvec_sort ( sort, d, u )`¶

Purpose¶

Given the singular values D and singular vectors U, stored columnwise, as output from BD_SVD, SVD_CMP, BD_SVD2, SVD_CMP2 or SVD_CMP3, this subroutine sorts the singular values into ascending or descending order and rearranges the columns of U correspondingly.

Arguments¶

SORT (INPUT) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

The singular vectors are reordered accordingly.

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, the singular values.

On exit, the singular values in ascending or descending order.

U (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the columns of U are the singular vectors.

On exit, U contains the reordered singular vectors.

The shape of U must verify: size(U,2) = size( D ) = n .

Further Details¶

The method is straight insertion.

`subroutine singval_sort ( sort, d )`¶

Purpose¶

Given the singular values D as output from BD_SVD, BD_SVD2, SVD_CMP, SVD_CMP2 or SVD_CMP3, this routine sorts the singular values into ascending or descending order.

Arguments¶

SORT (INPUT) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’.

D (INPUT/OUTPUT) real(stnd), dimension(:)

On entry, the singular values.

On exit, the singular values in ascending or descending order.

Further Details¶

The method is quick sort.

`subroutine product_svd_cmp ( a, b, s, failure, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

This subroutine computes the singular value decomposition of the product of an m-by-n matrix A by the transpose of a p-by-n matrix B:

A * B’ = U * SIGMA * V’

where A and B have at least as many rows as columns ( n <= min(m,p) ), SIGMA is an n-by-n matrix which is zero except for its diagonal elements, U is an m-by-n matrix with orthonormal columns, and V is a p-by-n matrix with orthonormal columns. The diagonal elements of SIGMA are the singular values of A * B’; they are real and non-negative. The columns of U and V are the left and right singular vectors of A * B’, respectively.
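
One way to check a computed factorization is to measure how well U * SIGMA * V’ reproduces A * B’. The sketch below does this with Fortran intrinsics only. It is an illustration under stated assumptions: the function name `product_svd_residual` is hypothetical, `stnd` is assumed to be double precision, and since A and B are overwritten by U and V on exit, the caller is assumed to have kept copies of the original matrices.

```fortran
module product_svd_check_sketch
    implicit none
    ! Assumption: double precision stands in for statpack's real kind stnd.
    integer, parameter :: stnd = kind(1.0d0)
contains
    ! Largest absolute entry of  A * B' - U * diag(S) * V' .
    ! a is m-by-n, b is p-by-n (original matrices); u is m-by-n, v is p-by-n, s has n entries.
    function product_svd_residual( a, b, u, s, v ) result( res )
        real(stnd), intent(in) :: a(:,:), b(:,:), u(:,:), s(:), v(:,:)
        real(stnd) :: res
        real(stnd) :: us(size(u,1),size(u,2))
        integer    :: j
        ! Scale the j-th column of U by the j-th singular value: US = U * diag(S).
        do j = 1, size(s)
            us(:,j) = u(:,j)*s(j)
        end do
        res = maxval( abs( matmul( a, transpose(b) ) - matmul( us, transpose(v) ) ) )
    end function product_svd_residual
end module product_svd_check_sketch
```

For a successful decomposition the residual is expected to be a small multiple of the machine epsilon times the largest singular value. Forming A * B’ explicitly is acceptable for a check like this one, but it is exactly what a product SVD routine avoids doing internally.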

Arguments¶

A (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix A.

On exit, the m-by-n left-singular matrix U.

B (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the p-by-n matrix B.

On exit, the p-by-n right-singular matrix V.

S (OUTPUT) real(stnd), dimension(:)

The singular values of A * B’.

The size of S must verify: size( S ) = n .

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit

FAILURE = true : indicates that the algorithm did not converge and that full accuracy was not attained in the SVD.

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’. The singular vectors are rearranged accordingly.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form of A * B’ fails to converge if the number of QR sweeps exceeds MAXITER * n. Convergence usually occurs in about 2 * n QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm.

MAX_FRANCIS_STEPS is a strictly positive integer, otherwise the default value is used.

The default is the minimum of n and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

The size of S must match: size( S ) = size( A, 2 ) = size( B, 2 ) .

`function ginv ( mat, tol, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

GINV returns the generalized inverse of an m-by-n real matrix, MAT. The generalized inverse of MAT is an n-by-m matrix.

Arguments¶

MAT (INPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

TOL (INPUT, OPTIONAL) real(stnd)

On entry:

If TOL is less than or equal to zero or is absent, the function computes the generalized inverse of MAT.

If TOL is greater than zero, the function computes the generalized inverse of a matrix close to MAT, but having condition number in the 2-norm less than 1/TOL.
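
The role of TOL can be illustrated with the classical construction of a generalized inverse from an SVD, in which singular values that are too small relative to the largest one are treated as zero. The sketch below assumes that the factors U, S and V of MAT = U * diag(S) * V’ are already available, with S in decreasing order and at least one element. It is not the statpack implementation: the function name `pinv_from_svd`, the use of double precision for `stnd` and the threshold TOL * S(1) (chosen to match the effective rank definition used by COMP_GINV) are assumptions.

```fortran
module ginv_sketch
    implicit none
    ! Assumption: double precision stands in for statpack's real kind stnd.
    integer, parameter :: stnd = kind(1.0d0)
contains
    ! Generalized inverse  G = V * diag(1/S(j), kept only if S(j) > TOL*S(1)) * U' .
    ! u is m-by-k, v is n-by-k, s has k >= 1 entries in decreasing order; g is n-by-m.
    function pinv_from_svd( u, s, v, tol ) result( g )
        real(stnd), intent(in) :: u(:,:), s(:), v(:,:), tol
        real(stnd) :: g(size(v,1),size(u,1))
        real(stnd) :: vs(size(v,1),size(s)), thresh
        integer    :: j
        ! A non-positive TOL means that only exactly zero singular values are dropped.
        thresh = max( tol, 0.0_stnd )*s(1)
        vs = 0.0_stnd
        do j = 1, size(s)
            if ( s(j) > thresh ) vs(:,j) = v(:,j)/s(j)
        end do
        g = matmul( vs, transpose(u) )
    end function pinv_from_svd
end module ginv_sketch
```

Dropping the singular values below TOL * S(1) amounts to inverting a nearby matrix whose ratio of largest to smallest retained singular value is below 1/TOL, which is the behaviour described for a positive TOL.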

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD phase of an intermediate bidiagonal form B of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm.

MAX_FRANCIS_STEPS is a strictly positive integer, otherwise the default value is used.

The default is the minimum of min(m,n) and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed generalized inverse at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

If MAT is the null matrix, or if the SVD algorithm used to compute the generalized inverse of MAT did not converge (so that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form of MAT), the function GINV returns an n-by-m matrix filled with NaN values.
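
Because GINV is a function and has no FAILURE argument, a caller may want to test the returned matrix for NaN values before using it. A minimal sketch of such a test, based on the standard intrinsic module IEEE_ARITHMETIC, is given below. The function name `ginv_failed` is hypothetical and `stnd` is assumed to be double precision.

```fortran
module ginv_nan_check_sketch
    use, intrinsic :: ieee_arithmetic, only : ieee_is_nan
    implicit none
    ! Assumption: double precision stands in for statpack's real kind stnd.
    integer, parameter :: stnd = kind(1.0d0)
contains
    ! Return true if the matrix g (e.g. the result of GINV) contains at least one NaN.
    pure function ginv_failed( g ) result( failed )
        real(stnd), intent(in) :: g(:,:)
        logical :: failed
        failed = any( ieee_is_nan( g ) )
    end function ginv_failed
end module ginv_nan_check_sketch
```

Note that aggressive floating-point optimization options of some compilers can make NaN tests unreliable, so such a check is best compiled with IEEE-conforming settings.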

The computation of the generalized inverse is parallelized if OPENMP is used.

For further details on the generalized inverse of a rectangular matrix and the algorithm to compute it, see:

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore, Maryland.

Lawson, C.L., and Hanson, R.J., 1974:

Solving least squares problems. Prentice-Hall.

`subroutine comp_ginv ( mat, failure, matginv, tol, singvalues, krank, mul_size, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

COMP_GINV computes the generalized inverse of an m-by-n real matrix, MAT.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT is destroyed.

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that MAT is the null matrix or that the SVD algorithm which is used to compute the generalized inverse of MAT did not converge and that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form B of MAT.

MATGINV (OUTPUT) real(stnd), dimension(:,:)

On exit, MATGINV contains the generalized inverse of MAT or the generalized inverse of a matrix close to MAT.

The shape of MATGINV must verify:

size(MATGINV,1) = size(MAT,2) = n ,

size(MATGINV,2) = size(MAT,1) = m .

TOL (INPUT, OPTIONAL) real(stnd)

On entry, if:

TOL is less than or equal to zero or is absent, the subroutine computes the generalized inverse of MAT.

TOL is greater than zero, the subroutine computes the generalized inverse of a matrix close to MAT, but having condition number in the 2-norm less than 1/TOL.

SINGVALUES (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The singular values of MAT in decreasing order. The condition number of MAT in the 2-norm is

SINGVALUES(1)/SINGVALUES(min(m,n)).

The size of SINGVALUES must verify: size( SINGVALUES ) = min(m,n) .

KRANK (OUTPUT, OPTIONAL) integer(i4b)

On exit, the effective rank of MAT, i.e., the number of singular values which are greater than TOL * SINGVALUES(1).
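
The two quantities described above, the 2-norm condition number and the effective rank, can be recomputed from the singular values as in the following sketch. The function names are hypothetical, `stnd` is assumed to be double precision, the singular values are assumed to be in decreasing order with at least one element, and the treatment of a non-positive TOL as zero and of a zero smallest singular value (condition number reported as `huge`) are assumptions made for the illustration.

```fortran
module krank_sketch
    implicit none
    ! Assumption: double precision stands in for statpack's real kind stnd.
    integer, parameter :: stnd = kind(1.0d0)
contains
    ! Number of singular values strictly greater than TOL * SINGVALUES(1).
    pure function effective_rank( singvalues, tol ) result( krank )
        real(stnd), intent(in) :: singvalues(:), tol
        integer :: krank
        krank = count( singvalues > max( tol, 0.0_stnd )*singvalues(1) )
    end function effective_rank

    ! Condition number in the 2-norm: largest over smallest singular value.
    pure function cond2( singvalues ) result( c )
        real(stnd), intent(in) :: singvalues(:)
        real(stnd) :: c
        if ( singvalues(size(singvalues)) > 0.0_stnd ) then
            c = singvalues(1)/singvalues(size(singvalues))
        else
            c = huge( c )     ! assumption: report a singular matrix with the largest real
        end if
    end function cond2
end module krank_sketch
```

For singular values (4, 2, 1e-10) and TOL = 1e-6, the effective rank in this sketch is 2, because only the first two values exceed 4e-6.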

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= max(m,n), otherwise a default value is used. MUL_SIZE can be increased or decreased to improve the performance of the algorithm.

The default value is 32.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form B of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm.

MAX_FRANCIS_STEPS is a strictly positive integer, otherwise the default value is used.

The default is the minimum of min(m,n) and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed generalized inverse at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

If all the elements of MAT are equal to zero, subroutine COMP_GINV returns an n-by-m matrix filled with NaN values in argument MATGINV and the logical argument FAILURE is set to true.

The computation of the generalized inverse is parallelized if OPENMP is used.

For further details on the generalized inverse of a rectangular matrix and the algorithm to compute it, see:

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore, Maryland.

Lawson, C.L., and Hanson, R.J., 1974:

Solving least squares problems. Prentice-Hall.

`subroutine comp_ginv ( mat, failure, tol, singvalues, krank, mul_size, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

Purpose¶

COMP_GINV computes the generalized inverse of an m-by-n real matrix, MAT.

Arguments¶

MAT (INPUT/OUTPUT) real(stnd), dimension(:,:)

On entry, the m-by-n matrix MAT.

On exit, MAT contains the transpose of the generalized inverse of MAT or of the generalized inverse of a matrix close to MAT.

FAILURE (OUTPUT) logical(lgl)

On exit:

FAILURE = false : indicates successful exit.

FAILURE = true : indicates that MAT is the null matrix or that the SVD algorithm which is used to compute the generalized inverse of MAT did not converge and that full accuracy was not attained in the bidiagonal SVD of an intermediate bidiagonal form B of MAT.

TOL (INPUT, OPTIONAL) real(stnd)

On entry, if:

TOL is less than or equal to zero or is absent, the subroutine computes the generalized inverse of MAT.

TOL is greater than zero, the subroutine computes the generalized inverse of a matrix close to MAT, but having condition number in the 2-norm less than 1/TOL.

SINGVALUES (OUTPUT, OPTIONAL) real(stnd), dimension(:)

The singular values of MAT in decreasing order. The condition number of MAT in the 2-norm is

SINGVALUES(1)/SINGVALUES(min(m,n)).

The size of SINGVALUES must verify: size( SINGVALUES ) = min(m,n) .

KRANK (OUTPUT, OPTIONAL) integer(i4b)

On exit, the effective rank of MAT, i.e., the number of singular values which are greater than TOL * SINGVALUES(1).

MUL_SIZE (INPUT, OPTIONAL) integer(i4b)

Internal parameter. MUL_SIZE must verify: 1 <= MUL_SIZE <= max(m,n), otherwise a default value is used. MUL_SIZE can be increased or decreased to improve the performance of the algorithm.

The default value is 32.

MAXITER (INPUT, OPTIONAL) integer(i4b)

MAXITER controls the maximum number of QR sweeps in the bidiagonal SVD phase of the SVD algorithm.

The bidiagonal SVD algorithm of an intermediate bidiagonal form B of MAT fails to converge if the number of QR sweeps exceeds MAXITER * min(m,n). Convergence usually occurs in about 2 * min(m,n) QR sweeps.

The default is 10.

MAX_FRANCIS_STEPS (INPUT, OPTIONAL) integer(i4b)

MAX_FRANCIS_STEPS controls the maximum number of Francis sets (i.e., QR sweeps) of Givens rotations which must be saved before applying them with a wavefront algorithm to accumulate the singular vectors in the bidiagonal SVD algorithm.

MAX_FRANCIS_STEPS is a strictly positive integer, otherwise the default value is used.

The default is the minimum of n and the integer parameter MAX_FRANCIS_STEPS_SVD specified in the module Select_Parameters.

PERFECT_SHIFT (INPUT, OPTIONAL) logical(lgl)

PERFECT_SHIFT determines if a perfect shift strategy is used in the implicit QR algorithm in order to minimize the number of QR sweeps in the bidiagonal SVD algorithm.

The default is true.

BISECT (INPUT, OPTIONAL) logical(lgl)

BISECT determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If BISECT is set to true, singular values are computed with a more accurate bisection algorithm delivering improved accuracy in the final computed generalized inverse at the expense of a slightly slower execution time.

If BISECT is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

DQDS (INPUT, OPTIONAL) logical(lgl)

DQDS determines how the singular values are computed if a perfect shift strategy is used in the bidiagonal SVD algorithm (i.e., if PERFECT_SHIFT is equal to true). This argument has no effect if PERFECT_SHIFT is equal to false.

If DQDS is set to true, singular values are computed with a more accurate dqds algorithm delivering improved accuracy in the final computed SVD decomposition at the expense of a slightly slower execution time.

If DQDS is set to false, singular values are computed with the fast Pal-Walker-Kahan algorithm.

If both optional arguments BISECT and DQDS are specified with the value true, the bisection algorithm is used.

The default is false.

Further Details¶

If all the elements of MAT are equal to zero, subroutine COMP_GINV returns an m-by-n matrix filled with NaN values in argument MAT and the logical argument FAILURE is set to true.

The computation of the generalized inverse is parallelized if OPENMP is used.

For further details on the generalized inverse of a rectangular matrix and the algorithm to compute it, see:

Golub, G.H., and Van Loan, C.F., 1996:

Matrix Computations. 3rd ed. The Johns Hopkins University Press, Baltimore, Maryland.

Lawson, C.L., and Hanson, R.J., 1974:

Solving least squares problems. Prentice-Hall.

`subroutine gen_bd_mat ( type, d, e, failure, known_singval, from_tridiag, singval, sort, val1, val2, l0, glu0 )`¶

Purpose¶

GEN_BD_MAT generates different types of bidiagonal matrices with known singular values or specific numerical properties, such as clustered singular values, for testing bidiagonal singular value decomposition solvers.

Optionally, the singular values of the selected bidiagonal matrix can be computed analytically, if possible, or by a bisection algorithm with high absolute and relative accuracies.

Arguments¶

TYPE (INPUT) integer(i4b)

Select the type of bidiagonal matrix BD to be generated by the subroutine.

If TYPE is between 1 and 56, the subroutine generates a specific bidiagonal matrix as described in the comments inside the code of the subroutine. For other values of TYPE, all diagonal and off-diagonal elements of the bidiagonal matrix are generated from a uniform random number distribution between 0 and 1.

For TYPE between 1 and 17, the singular values of the bidiagonal matrix are known analytically. For other values of TYPE, the singular values may be estimated by a bisection algorithm with high accuracy. In all cases, the singular values may be output in the optional parameter SINGVAL.

For TYPE between 1 and 11 or 52 and 56, the bidiagonal matrix BD is computed as the Cholesky factor of symmetric positive-definite tridiagonal matrices.
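
The link between a symmetric positive-definite tridiagonal matrix T and a bidiagonal Cholesky factor BD, with T = BD’ * BD, can be sketched as follows. This is an illustration under stated assumptions and not the code of GEN_BD_MAT: the names are hypothetical, `stnd` is assumed to be double precision, BD is assumed to be upper bidiagonal and stored with the same (D, E) convention as the output arguments, and a failure flag is returned where the library routine is documented to stop with an error message.

```fortran
module tridiag_chol_sketch
    implicit none
    ! Assumption: double precision stands in for statpack's real kind stnd.
    integer, parameter :: stnd = kind(1.0d0)
contains
    ! T is symmetric tridiagonal with T(i,i) = a(i) and T(i-1,i) = b(i) for i >= 2
    ! (b(1) is not used). On success, BD is upper bidiagonal with BD(i,i) = d(i),
    ! BD(i-1,i) = e(i) and  T = BD' * BD ;  e(1) is set to zero.
    subroutine tridiag_to_bd( a, b, d, e, failure )
        real(stnd), intent(in)  :: a(:), b(:)
        real(stnd), intent(out) :: d(:), e(:)
        logical, intent(out)    :: failure
        real(stnd) :: piv
        integer    :: i
        d = 0.0_stnd
        e = 0.0_stnd
        failure = .true.
        do i = 1, size(a)
            piv = a(i)
            if ( i > 1 ) then
                e(i) = b(i)/d(i-1)        ! from  T(i-1,i) = d(i-1) * e(i)
                piv  = piv - e(i)**2      ! from  T(i,i)   = d(i)**2 + e(i)**2
            end if
            if ( piv <= 0.0_stnd ) return ! T is not positive-definite
            d(i) = sqrt( piv )
        end do
        failure = .false.
    end subroutine tridiag_to_bd
end module tridiag_chol_sketch
```

For the 2-by-2 matrix with diagonal (4, 5) and off-diagonal element 2, the sketch gives d = (2, 2) and e(2) = 1. The singular values of BD are then the square roots of the eigenvalues of T, which is why tridiagonal matrices with known eigenvalues yield bidiagonal test matrices with known singular values.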

D (OUTPUT) real(stnd), dimension(:)

On exit, D contains the diagonal elements of the bidiagonal matrix BD.

The size of D must verify: size( D )>=2 .

E (OUTPUT) real(stnd), dimension(:)

On exit, E(2:) contains the off-diagonal elements of the bidiagonal matrix BD. E(1) is arbitrary, but is set to zero.

The size of E must verify: size( E ) = size( D ) .

FAILURE (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FAILURE = false : indicates that the singular values of BD are known analytically or have been computed with high accuracy;

FAILURE = true : indicates that the singular values of BD are not known analytically and have not been computed with maximum accuracy with the bisection algorithm.

KNOWN_SINGVAL (OUTPUT, OPTIONAL) logical(lgl)

On exit:

KNOWN_SINGVAL = true : indicates that the singular values of BD are known analytically for the selected TYPE.

KNOWN_SINGVAL = false : indicates that the singular values of BD are not known analytically for the selected TYPE.

FROM_TRIDIAG (OUTPUT, OPTIONAL) logical(lgl)

On exit:

FROM_TRIDIAG = true : indicates that the bidiagonal matrix BD has been computed as the Cholesky factor of a positive-definite tridiagonal matrix for the selected TYPE.

FROM_TRIDIAG = false : indicates that the bidiagonal matrix BD has not been computed as the Cholesky factor of a positive-definite tridiagonal matrix for the selected TYPE.

SINGVAL (OUTPUT, OPTIONAL) real(stnd), dimension(:)

On exit, the singular values of BD computed analytically or estimated to high accuracy with a bisection algorithm.

The size of SINGVAL must verify: size( SINGVAL ) = size( D ) .

SORT (INPUT, OPTIONAL) character

Sort the singular values into ascending order if SORT = ‘A’ or ‘a’, or in descending order if SORT = ‘D’ or ‘d’, if the optional argument SINGVAL is present. For other values of SORT nothing is done and SINGVAL(:) may not be sorted.

VAL1 (INPUT, OPTIONAL) real(stnd)

On entry, specifies the parameter d0 for parametrized bidiagonal matrices (e.g. TYPE= 2-8, 10, 15, 32-35).

If this parameter is changed for TYPE between 2 and 8, care must be taken to ensure that the initial symmetric tridiagonal matrix, which is used to derive the bidiagonal matrix BD, is positive-definite. If this is not the case, the subroutine will issue an error message and stop the program.

Also, if this parameter is changed for TYPE between 32 and 35, which correspond to graded (or reversely graded) matrices with an arithmetic or geometric progression, care must be taken to ensure that no element of the arithmetic or geometric progression underflows or overflows, as no checks are done in the subroutine for such errors.

The default for VAL1 is:

for TYPE between 2 and 7;

for TYPE equal to 8;

for TYPE equal to 10;

for TYPE equal to 15;

for TYPE between 32 and 35.

VAL2 (INPUT, OPTIONAL) real(stnd)

On entry, specifies the parameter e0 for parametrized bidiagonal matrices (e.g. TYPE= 2-8, 10, 32-35).

If this parameter is changed for TYPE between 2 and 8, care must be taken to ensure that the initial symmetric tridiagonal matrix, which is used to derive the bidiagonal matrix BD, is positive-definite. If this is not the case, the subroutine will issue an error message and stop the program.

Also, if this parameter is changed for TYPE between 32 and 35, which correspond to graded (or reversely graded) matrices with an arithmetic or geometric progression, care must be taken to ensure that no element of the arithmetic or geometric progression underflows or overflows, as no checks are done in the subroutine for such errors.

The default for VAL2 is:

for TYPE between 2 and 7;

for TYPE equal to 8;

for TYPE equal to 10;

for TYPE between 32 and 35.
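
Since no underflow or overflow checks are done for TYPE between 32 and 35, the caller may want to test the range of a progression beforehand. The sketch below does this for a positive geometric progression, working with logarithms so that the test itself cannot overflow. It is in Python, the name `geometric_progression_is_safe` and its arguments `first`, `ratio` and `n` are hypothetical, and how VAL1 and VAL2 map onto the terms of the progression for each TYPE is not assumed here. The limits are those of IEEE double precision, which may differ from the real(stnd) kind in use.

```python
import math
import sys


def geometric_progression_is_safe(first, ratio, n):
    """Return True if every term first * ratio**k, k = 0..n-1,
    lies in the normalized range of a double precision number."""
    if first <= 0.0 or ratio <= 0.0 or n < 1:
        raise ValueError("first and ratio must be positive and n >= 1")
    log_min = math.log(sys.float_info.min)  # smallest normalized double
    log_max = math.log(sys.float_info.max)  # largest finite double
    log_first = math.log(first)
    log_last = log_first + (n - 1) * math.log(ratio)
    # The progression is monotone, so only its two ends need testing.
    smallest = min(log_first, log_last)
    largest = max(log_first, log_last)
    return log_min <= smallest and largest <= log_max


ok = geometric_progression_is_safe(1.0, 10.0, 100)         # last term is 1e99
too_large = geometric_progression_is_safe(1.0, 10.0, 400)  # last term would be 1e399
```

The same idea applies to an arithmetic progression, where only the first and last terms need to be compared with the representable range.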

L0 (INPUT, OPTIONAL) integer(i4b)

On entry, specifies the radius of the initial matrix for the parametrized form of glued bidiagonal matrices (e.g. for TYPE equal to 44, 46, 48, 53, 55).

L0 must be greater than 0 and preferably less than or equal to size( D )/2 . The default is 5.

GLU0 (INPUT, OPTIONAL) real(stnd)

On entry, specifies the glue parameter for the parametrized form of glued bidiagonal matrices (e.g. for TYPE equal to 44, 46, 48, 53, 55).

The default is sqrt( epsilon(GLU0) ).
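
The documented defaults and constraints for L0 and GLU0 can be summarized in a few lines. The sketch below is in Python, the helper name `glued_defaults` is hypothetical, and the machine epsilon is that of IEEE double precision, so the numerical value of the default glue will differ if real(stnd) is another kind.

```python
import math
import sys


def glued_defaults(n, l0=None, glu0=None):
    """Apply the documented defaults for L0 and GLU0.

    n is size( D ). Returns (l0, glu0, preferred), where preferred
    tells whether l0 <= n/2, the recommended (not mandatory) bound.
    """
    if l0 is None:
        l0 = 5  # documented default radius
    if glu0 is None:
        # Analogue of sqrt( epsilon(GLU0) ) in double precision.
        glu0 = math.sqrt(sys.float_info.epsilon)
    if l0 <= 0:
        raise ValueError("L0 must be greater than 0")
    preferred = 2 * l0 <= n
    return l0, glu0, preferred


l0, glu0, preferred = glued_defaults(20)
```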

Further Details¶

This subroutine tries to compensate for inaccuracies in intrinsic functions (e.g. the cos function in the gfortran compiler) when computing singular values by analytic formulae.

For further details on the bidiagonal matrices used for testing in the GEN_BD_MAT subroutine, see:
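
To show why such care matters, consider the 1-2-1 Toeplitz tridiagonal matrix of order n, whose eigenvalues are 2 - 2*cos( k*pi/(n+1) ) for k = 1, ..., n. The singular values of its bidiagonal Cholesky factor are the square roots of these eigenvalues. The sketch below, in Python with hypothetical function names, compares the direct formula with the algebraically equivalent form 2*sin( k*pi/(2*(n+1)) ), which avoids subtracting two nearly equal numbers. It illustrates the kind of issue involved and is not necessarily the remedy used in the subroutine.

```python
import math


def singval_naive(k, n):
    """sqrt( 2 - 2*cos(theta) ): suffers from cancellation for small theta."""
    theta = k * math.pi / (n + 1)
    return math.sqrt(max(0.0, 2.0 - 2.0 * math.cos(theta)))


def singval_stable(k, n):
    """2*sin(theta/2): same quantity, without the subtraction."""
    theta = k * math.pi / (n + 1)
    return 2.0 * math.sin(0.5 * theta)


# For moderate n the two formulae agree to working precision.
small_n = (singval_naive(3, 10), singval_stable(3, 10))

# For very large n the smallest singular value is close to pi/(n+1);
# the naive formula loses its significant digits there.
large_n = (singval_naive(1, 10**9), singval_stable(1, 10**9))
```

Any small error in cos is amplified by the subtraction in the first formula, which is why analytically known singular values need to be evaluated with some care.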

Gladwell, G.M.L., Jones, T.H., Willms N.B., 2014:

A test matrix for an inverse eigenvalue problem. Journal of Applied Mathematics, 14, 6 pages, Article ID 515082, DOI 10.1155/2014/515082.

Clement, P.A., 1959:

A class of triple-diagonal matrices for test purposes. SIAM Review, 1(1):50-52, DOI 10.1137/1001006.

Gregory, R.T., Karney, D.L., 1969:

A collection of matrices for testing computational algorithms. New York: Wiley. Reprinted with corrections by Robert E. Krieger, Huntington, New York, 1978.

Higham, N.J., 1991:

Algorithm 694: A collection of test matrices in MATLAB. ACM Transactions on Mathematical Software, 17(3):289-305, DOI 10.1145/114697.116805.

Godunov, S.K., Antonov, A.G., Kirillyuk, O.P., and Kostin, V.I., 1993:

Guaranteed Accuracy in numerical linear algebra. Kluwer Academic Publishers.

Parlett, B.N., and Vomel, C., 2005:

How the MRRR algorithm can fail on tight eigenvalue clusters. LAPACK Working Note 163.

Nakatsukasa, Y., Aishima, K., and Yamazaki, I., 2012:

dqds with aggressive early deflation. SIAM J. Matrix Anal. Appl., 33(1): 22-51.

Fernando, K.V., and Parlett, B.N., 1994:

Accurate singular values and differential qd algorithms. Numer. Math., 67: 191-229.

`subroutine bd_cmp ( mat, d, e, tauq, taup )`¶

`subroutine bd_cmp ( mat, d, e, tauq )`¶

`subroutine bd_cmp ( mat, d, e, tauq, taup, rlmat, tauo )`¶

`subroutine bd_cmp ( mat, d, e )`¶

`subroutine bd_cmp2 ( mat, d, e, p, failure, gen_p, reortho )`¶

`subroutine bd_cmp2 ( mat, d, e, failure, reortho )`¶

`subroutine bd_cmp3 ( mat, d, e, gen_p, failure )`¶

`subroutine ortho_gen_bd ( mat, tauq, taup, p )`¶

`subroutine ortho_gen_bd2 ( mat, tauq, taup, q_pt )`¶

`subroutine ortho_gen_q_bd ( mat, tauq )`¶

`subroutine ortho_gen_p_bd ( mat, taup, p )`¶

`subroutine apply_q_bd ( mat, tauq, c, left, trans )`¶

`subroutine apply_p_bd ( mat, taup, c, left, trans )`¶

`subroutine bd_svd ( upper, d, e, failure, u, v, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

`subroutine bd_svd2 ( upper, d, e, failure, u, vt, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

`subroutine bd_svd ( upper, d, e, failure, u, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds )`¶

`subroutine bd_svd ( upper, d, e, failure, sort, maxiter )`¶

`subroutine bd_singval ( d, e, nsing, s, failure, sort, vector, abstol, ls, theta, scaling, init )`¶

`subroutine bd_singval2 ( d, e, nsing, s, failure, sort, vector, abstol, ls, theta, scaling, init )`¶

`subroutine bd_max_singval ( d, e, nsing, s, failure, abstol, scaling )`¶

`subroutine bd_lasq1 ( d, e, failure, maxiter, sort, scaling, ieee, aggdef2, max_win, freq, info )`¶

`subroutine bd_dqds ( d, e, failure, maxiter, sort, scaling )`¶

`subroutine las2 ( f, g, h, ssmin, ssmax )`¶

`function singvalues ( mat, sort, mul_size, maxiter, dqds )`¶

`subroutine select_singval_cmp ( mat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauq, taup, scaling, init, dqds )`¶

`subroutine select_singval_cmp ( mat, rlmat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauo, tauq, taup, scaling, init, dqds )`¶

`subroutine select_singval_cmp2 ( mat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauq, taup, scaling, init, dqds )`¶

`subroutine select_singval_cmp2 ( mat, rlmat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauo, tauq, taup, scaling, init, dqds )`¶

`subroutine select_singval_cmp3 ( mat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, p, gen_p, reortho, scaling, init, dqds, failure_bd )`¶

`subroutine select_singval_cmp3 ( mat, rmat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauo, p, gen_p, reortho, scaling, init, dqds, failure_bd )`¶

`subroutine select_singval_cmp4 ( mat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, p, gen_p, reortho, scaling, init, dqds, failure_bd )`¶

`subroutine select_singval_cmp4 ( mat, rmat, nsing, s, failure, sort, mul_size, vector, abstol, ls, theta, d, e, tauo, p, gen_p, reortho, scaling, init, dqds, failure_bd )`¶

`subroutine svd_cmp ( mat, s, failure, v, sort, mul_size, maxiter, max_francis_steps, perfect_shift, bisect, dqds, use_svd2 )`¶

`subroutine svd_cmp2 ( mat, s, failure, u_vt, sort, mul_size, maxiter, max_francis_steps, perfect_shift, bisect, dqds, use_svd2 )`¶

`subroutine svd_cmp ( mat, s, failure, sort, mul_size, maxiter, bisect, dqds, d, e, tauq, taup )`¶

`subroutine svd_cmp3 ( mat, s, failure, u_v, sort, maxiter, max_francis_steps, perfect_shift, bisect, dqds, reortho, failure_bd )`¶