comp_svd_3d¶

Authors¶

Pascal Terray (LOCEAN/IPSL) and Eric Maisonnave (CERFACS)

Latest revision¶

05/05/2026

Purpose¶

Compute a Singular Values Decomposition (SVD) analysis, also known as Maximum Covariance Analysis (MCA), of a covariance (or correlation) matrix between (selected) time series associated with two input NetCDF variables (specified with the -v= and -v2= arguments) extracted from one or two NetCDF datasets (specified with the -f= and -f2= arguments).

SVD analysis or MCA can be considered as a generalization of Empirical Orthogonal Function (EOF) analysis [Bjornsson_Venegas] [Bretherton_etal] [vonStorch_Zwiers]. It aims at estimating the covariance matrix between two fields and at computing the SVD of this covariance matrix for defining pairs of spatial patterns, which describe (maximize) a fraction of the total Square Covariance (SCF) between the two fields.

The procedure first repacks the first (or left) input tridimensional NetCDF variable (specified with the -v= argument) and the second (or right) input tri- or fourdimensional NetCDF variable (specified with the -v2= argument) as a ntime by nv1 rectangular matrix, X, and a ntime by nv2 rectangular matrix, Y respectively. The procedure then computes the covariance (or correlation or sum of squares and cross-products, at the user option) matrix between X and Y , e.g., the rectangular matrix product (transpose(X).Y)/ntime after, eventually centering or standardizing the columns of the X and Y matrices.

In the following discussion, the X and Y matrices will be called the left and right fields, respectively.

Optionally, the covariance (or correlation or sum of squares and cross-products) matrix between the left and right fields can be estimated using (diagonal) metrics for the left and right fields such that the covariance matrix is weighted by the surface (or volume) associated with each cell in the grid associated with the input NetCDF variables. This implies that equal areas (or volumes) carry equal weights in the results of the MCA analysis (see the -d= and -d2= arguments description for more details).

The second step of the SVD analysis is to compute the leading k terms of the SVD decomposition of this covariance matrix between the left and right fields, which is given by

(transpose(X).Y)/ntime = USV

where

U is a nv1 by k matrix with orthonormal columns (the left singular vectors stored columnwise)

S is a square k by k matrix with nonnegative elements on its principal diagonal and zeros elsewhere (the diagonal elements of S are the singular values of the covariance matrix)

V is a k by nv2 matrix with orthonormal rows (the right singular vectors stored rowwise)

This partial SVD decomposition of the covariance matrix between the left and right fields can be computed efficiently by inverse iteration or deflation algorithms without computing the full SVD decomposition of the covariance matrix. At the user option, very fast randomized partial SVD algorithms [Halko_etal] [Li_etal] [Martinsson] can also be used, which will be highly efficient on huge datasets. See description of the -alg= argument for more details on the numerical algorithms available for performing the (partial) SVD analysis.

In a third step, the first k standardized left and right “Singular Variables” (SV) time series are computed by projecting the left and right fields onto the first k left and right singular vectors, respectively. These SV time series play a similar role as principal component time series in an Empirical Orthogonal Function (EOF) analysis. Refer to comp_eof_3d or comp_eof_4d for more details on EOF analysis.

Finally, from the k left and right SV time series, two types of regression maps are generated for each field: the kth homogeneous vector, which is the regression map between a given data field and its kth standardized SV time series, and the kth heterogeneous vector, which is the regression map between a given data field and the kth standardized SV time series of the other field. The kth heterogeneous vector indicates how well the time series of one field can be predicted from the kth SV time series of the other field.

Simple statistics associated with each singular triplet (e.g., a singular value and the associated left and right singular vectors) of the (partial) SVD of the covariance matrix are also computed. These include the Square Covariance Fraction (SCF) coefficient, which is a simple measure of the relative importance of each singular triplet in the linear relationship between the two fields, the correlation coefficients between the kth left and right SV time series of the two fields and the Normalized root-mean-square Covariance (NC) coefficient. See [Bretherton_etal] [Zhang_etal] for a discussion of the relative merits of these coefficients for determining how strongly related the coupled patterns described by the singular triplets are.

At the user option, a moving block bootstrap approach [Efron] [Politis] can also be used to obtain confidence intervals for the SCF, correlation and NC coefficients associated to the singular triplets of the SVD analysis and to test the stability of the computed leading singular triplets. More precisely, statistics (e.g., confidence intervals, means, standard-deviations, skewness, …) of SCF, correlation and NC coefficients, cosines of the angles between the pairwise observed and bootstrapped singular vectors, as well as spectral and (scaled) Frobenius distances between the pairwise observed and bootstrapped leading left and right singular subspaces are computed and can be used to select the number of singular triplets for further analysis. Furthermore, the -nb=, -bl=, -bp= and -bm= and -nobias arguments allow the user to determine the exact form of the blockwise bootstrap algorithm which is used. One of the -nb=, -bl=, -bp= and -bm=arguments must be present if you want to test the stability of the computed singular triplets of the SVD analysis by this bootstrap approach and store its results in the output NetCDF datasets.

The notions of distance between two singular subspaces of the same dimension used in the moving block bootstrap approach are based on the unique orthogonal projectors onto the ranges of these two singular subspaces and the distances between the two square matrices representing these two orthogonal projectors as measured in some matrix norms [Golub_VanLoan]. Here, we use both the spectral (e.g., the 2-norm) and Frobenius matrix norms to quantify the distances between two singular subspaces of the same dimension [Golub_VanLoan].

The spectral distance, S_dist, of two subspaces of the same dimension is simply equal to the sine of the largest principal angle between these two subspaces [Golub_VanLoan] and is always greater or equal to 0 and less or equal to 1. It is further possible to show that the spectral distance between two subspaces is equal to 0 if and only if the two subspaces are the same. On the other hand, if the spectral distance between two subspaces is equal to 1 than it exists at least one vector belonging to the first subspace, which is orthogonal to all vectors belonging to the second subspace and vice-versa.

On the other hand, it can be shown that the Frobenius distance, F_dist, between two subspaces of dimension n is equal to

F_dist = sqrt( 2. sum( [ sin( PA(:n) ) ]**2 ) )

where the elements of the n-vector PA(:) are the n principal angles between the two subspaces of dimension n. The Frobenius distance is equal to 0 if the two subspaces are the same and is always less than sqrt( 2 . n) and this maximum is attained only when the two subspaces are orthogonal to each other. Since this upper bound is a function of the dimension n of the two subspaces, it is further convenient to define the “scaled” Frobenius distance of two subspaces of dimension n as the ratio between the Frobenius distance and its upper bound. In this way, the “scaled” Frobenius distance is always greater or equal to 0 and less or equal to 1, as the spectral distance, and can also be used to compare “scaled” Frobenius distances between pairs of singular subspaces of different dimensions in order to determine the most stable left and right singular subspaces with the help of the moving block bootstrap method.

Note, finally, that comp_svd2_3d performs exactly the same task as comp_svd_3d, but differs in the form of the moving block bootstrap approach used to assess the significance of the singular triplets of the SVD analysis. In comp_svd2_3d, confidence levels for the SCF, NC and correlation coefficients are estimated by a moving block bootstrap algorithm in which these statistics are recomputed many times after replacing the right field by a permuted field constructed by resampling randomly blocks of observations from the original right field. This approach destroys completely the covarying signals between the two fields, excepted the one resulting from the rank deficiency of the covariance matrix (e.g., if the number of time observations is less than the number of grid cells in the left and right fields). This approach is useful if you want to assess that the analyzed covariance matrix is above the “noise”. However, if you already know that some covarying signals are present in the analyzed covariance matrix, the bootstrap approach used in comp_svd_3d is more appropriate and meaningful as this approach allows to assess the sampling distributions and stability of the SVD results without destroying the covarying signals in the bootstrapped left and right fields.

Two output NetCDF datasets containing the singular values, the left and right singular vectors, the corresponding left and right standardized SV time series and, the homogeneous and heterogeneous vectors for each field are created. The left and right singular vectors, and the homogeneous and heterogeneous vectors for each field, are repacked onto the original grids of the two input NetCDF variables in the output NetCDF datasets. In addition, if the bootstrap approach is used, the two output NetCDF datasets will also contain confidence intervals for the SCF, NC and correlation coefficients as well as statistics for SCF, NC and correlation coefficients, cosines of the angles between the pairwise observed and bootstrapped singular vectors and distances between the pairwise observed and bootstrapped leading singular subspaces.

This procedure is parallelized if OpenMP is used and will be also much faster if an optimized BLAS library is specified at compilation with the _BLAS CPP macro. Moreover, this procedure may use (randomized) partial SVD algorithms [Martinsson], which are highly efficient on huge covariance matrices if you are interested only in the few leading terms of the SVD of the covariance matrix between the X and Y fields.

Further Details¶

Usage¶

$ comp_svd_3d \
  -f=input_netcdf_file \
  -v=netcdf_variable \
  -m=input_mesh_mask_netcdf_file \
  -a=type_of_analysis                             (optional : scp, cov, cor) \
  -n=number_of_sing_triplets                      (optional) \
  -g=grid_type                                    (optional : n, t, u, v, w, f) \
  -r=resolution                                   (optional : r2, r4) \
  -b=nlon_orca, nlat_orca                         (optional) \
  -x=lon1,lon2                                    (optional) \
  -y=lat1,lat2                                    (optional) \
  -t=time1,time2                                  (optional) \
  -c=input_climatology_netcdf_file                (optional) \
  -d=distance                                     (optional : dist2, ident) \
  -o=output_svd_netcdf_file_left_field            (optional) \
  -f2=input_netcdf_file_right_field               (optional) \
  -v2=netcdf_variable_right_field                 (optional) \
  -m2=input_mesh_mask_netcdf_file_right_field     (optional) \
  -g2=grid_type_right_field                       (optional : n, t, u, v, w, f) \
  -r2=resolution_right_field                      (optional : r2, r4) \
  -b2=nlon_orca, nlat_orca                        (optional) \
  -x2=lon1_right_field,lon2_right_field           (optional) \
  -y2=lat1_right_field,lat2_right_field           (optional) \
  -z2=level1_right_field,level2_right_field       (optional) \
  -t2=time1_right_field,time2_right_field         (optional) \
  -c2=input_climatology_netcdf_file_right_field   (optional) \
  -d2=distance_right_field                        (optional : dist2, dist3, ident) \
  -o2=output_svd_netcdf_file_right_field          (optional) \
  -alg=algorithm                                  (optional : svd, inviter, deflate, rsvd) \
  -bm=bootstrap_method                            (optional : normal, percentile) \
  -nb=number_of_shuffles                          (optional) \
  -bp=bootstrap_periodicity                       (optional) \
  -bl=bootstrap_block_length                      (optional) \
  -pro=critical_probability                       (optional : 0.0 > 1.) \
  -mi=missing_value                               (optional) \
  -ortho                                          (optional) \
  -nobias                                         (optional) \
  -double                                         (optional) \
  -bigfile                                        (optional) \
  -hdf5                                           (optional) \
  -cdf5                                           (optional) \
  -tlimited                                       (optional)

By default¶

-a=

the type_of_analysis is set to scp. This means that the singular vectors and singular values are computed from the sums of squares and cross-products matrix between the left and right fields

-n=

number_of_sing_triplets is set to 4. This means that the first 4 singular triplets of the covariance matrix between the left and right fields are computed and stored in the output NetCDF files output_svd_netcdf_file_left_field and output_svd_netcdf_file_right_field

-g=

the grid_type is set to n which means that the 2-D grid-mesh associated with the left field extracted from the input NetCDF variable input_netcdf_file is assumed to be regular or Gaussian

-r=

if the input NetCDF variable netcdf_variable is from the NEMO model (e.g., if -g= argument is not set to n) the resolution is assumed to be r2

-b=

if -g= is not set to n, the dimensions of the 2-D grid-mesh, nlon_orca and nlat_orca, of the left input NetCDF variable netcdf_variable are determined from the -r= argument. However, you may override this choice by default with the -b= argument

-x=

the whole longitude domain associated with the netcdf_variable

-y=

the whole latitude domain associated with the netcdf_variable

-t=

the whole time period associated with the netcdf_variable

-c=

this argument is not used. This argument is required only if the type_of_analysis is set to cov or cor and is used to specify the climatology NetCDF file for computing anomalies or standardized anomalies for the left field

-d=

the distance is set to dist2. This means that distances and scalar products for the left field in the SVD analysis are computed with the diagonal metric associated with the 2-D grid-mesh associated with the input NetCDF variable netcdf_variable

-o=

the output_svd_netcdf_file_left_field is named svd_left_netcdf_variable.nc

-f2=

this argument is not used. It is required only if the right field is not stored in the same file as the left NetCDF variable

-v2=

this argument is not used. It is required only if the right field is not extracted from the same input NetCDF variable than the left field

-m2=

this argument will take the same value as the -m= argument. It is required only if the right field is not extracted from the same input NetCDF variable as the left field

-g2=

same as the -g= argument if the -v2= argument is omitted and -g2=n otherwise

-r2=

same as the -r= argument if -v2= is omitted and -r2=r2 otherwise

-b2=

if -g2= is not set to n, the dimensions of the 2-D grid-mesh, nlon_orca and nlat_orca, of the right field extracted from the input NetCDF variable netcdf_variable_right_field are determined from the -r2= argument. However, you may override this choice by default with the -b2= argument

-x2=

the whole longitude domain associated with the netcdf_variable_right_field

-y2=

the whole latitude domain associated with the netcdf_variable_right_field

-z2=

the whole level associated with the netcdf_variable_right_field

-t2=

the whole time period associated with the netcdf_variable_right_field

-c2=

this argument is not used. This argument is required only if the type_of_analysis is set to cov or cor and is used to specify the climatology NetCDF file for computing anomalies or standardized anomalies for the right field if this field is not extracted from the same input NetCDF variable as the left field

-d2=

same as -d= if -v2= is omitted, -d2=dist2 if the netcdf_variable_right_field is a 3D variable and -d2=dist3 if the netcdf_variable_right_field is a 4D variable

-o2=

the output_svd_netcdf_file_right_field is named svd_right_netcdf_variable_right_field.nc

-alg=

the algorithm option is set to inviter. This means that the SVD analysis is computed by a partial SVD of the covariance matrix using an inverse iteration algorithm

-bm=

bootstrap_method is set to normal. This means that bootstrap confidence intervals of SCF, NC and correlation coefficients are based on asymptotic normality

-nb=

number_of_shuffles is set to 99. This means that 99 bootstrap samples are generated in the moving block bootstrap algorithm for testing the significance and stability of the singular triplets

-bp=

the time series are assumed to be stationary and bootstrap_periodicity is set to 1 in the moving block bootstrap procedure for testing the significance and stability of the singular triplets. This means that the blocks in the bootstrap algorithm are not forced to begin at specific observations. Use this parameter if the time series are cyclostationary, see the remarks below for further details

-bl=

bootstrap_block_length is set to bootstrap_periodicity.2

-pro=

the critical_probability is set to 0.05, which means that the bootstrap confidence intervals for the SCF, NC and correlation coefficients are 95% confidence intervals

-mi=

the missing_value attribute in the output NetCDF files is set to 1.e+20

-ortho

the computed singular vectors and associated SV time series are not reorthogonalized if a (partial) SVD is computed by the deflation or inverse iteration methods, e.g., if -alg=inviter or -alg=deflate

-nobias

biased bootstrap confidence intervals for SCF, NC and correlation coefficients are computed. However, if -nobias is activated, unbiased bootstrap confidence intervals for SCF, NC and correlation coefficients are computed

-double

the results of the SVD analysis are stored as single-precision floating point numbers in the output NetCDF files. If -double is activated, the results are stored as double-precision floating point numbers

-bigfile

NetCDF classical format files are created. If -bigfile is activated, the output NetCDF files are 64-bit offset format files

-hdf5

NetCDF classical format files are created. If -hdf5 is activated, the output NetCDF files are NetCDF-4/HDF5 format files

-cdf5

NetCDF classical format files are created. If -cdf5 is activated, the output NetCDF files are CDF5 format files

-tlimited

the time dimension is defined as unlimited in the output NetCDF files. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF files

Remarks¶

The -v=netcdf_variable argument specifies the NetCDF variable from which the left field for the SVD analysis must be extracted and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.

The -v2=netcdf_variable_right_field argument specifies the NetCDF variable from which the right field of the SVD analysis must be extracted and the -f2=input_netcdf_file_right_field argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file_right_field.

If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the geographical domain used for defining the left field in the SVD analysis is determined from the attributes of the input mesh mask NetCDF variable named grid_typemask (e.g., lon1_Eastern_limit, lon2_Western_limit, lat1_Southern_limit and lat2_Northern_limit ) which is read from the input NetCDF file input_mesh_mask_netcdf_file. If these attributes are missing and the -x= and -y= arguments are also not specified, the whole geographical domain associated with the netcdf_variable is used for defining the left field in the SVD analysis.

The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

This remark applies also for the -x2=, -y2= and -z2= arguments used for defining the right field in the SVD analysis.

Refer to comp_mask_3d for transforming geographical coordinates as indices or generating an appropriate mesh-mask before using comp_svd_3d.

If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used to estimate the covariance matrix between the left and right fields.

The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the SV time series in the output NetCDF files will have ntime = time2 - time1 + 1 time observations.

This remark applies also for -t2= argument used to define the time dimension of the right field.

If -g= is set to t, u, v, w or f it is assumed that the NetCDF variable netcdf_variable is from an experiment with the NEMO model (ORCA configuration and R2, R4 or R05 resolutions). In this case, the duplicate points from the ORCA grid are removed when extracting the left field of the SVD analysis, as far as possible, and, in particular, if the 2-D grid-mesh of the input NetCDF variable covers the whole globe. On output, the duplicate points are restored when writing the SVD results (e.g., the singular, homogeneous and heterogeneous vectors), if the geographical domain of the input NetCDF variable netcdf_variable is the whole globe.

If -g= is set to n, it is assumed that the 2-D grid-mesh associated with netcdf_variable is regular or Gaussian and as such has no duplicate points.

The -g= argument is also used to determine the name of the NetCDF variables which contain the 2-D mesh-mask and the scale factors in the input_mesh_mask_netcdf_file (e.g., these variables are named grid_typemask, e1grid_type and e2grid_type, respectively). This input_mesh_mask_netcdf_file may be created by comp_clim_3d if the 2-D grid-mesh is regular or gaussian.

This remark applies also for -g2= argument used to define the grid type of the right field.

If -g= is set to t, u, v, w or f (e.g., if the NetCDF variable is from an experiment with the NEMO model), the -r= argument gives the resolution used. If:

-r=r2 the NetCDF variable is from an experiment with the ORCA R2 configuration

-r=r4 the NetCDF variable is from an experiment with the ORCA R4 configuration.

This remark applies also for -g2= argument used to define the grid type of the right field.

If the NetCDF variable netcdf_variable is from an experiment with the NEMO ocean model, but the resolution is not r2 or r4, the dimensions of the ORCA grid must be specified explicitly with the -b= argument.

This remark applies also for -b2= argument used to define the grid type of the right field.

The -a= argument specifies if the left and right fields are centered or standardized with an input climatology (specified with the -c= and -c2= arguments) before computing the covariance matrix between the two fields. If:

-a=scp, the SVD analysis is done on the raw data of the two fields

-a=cov, the SVD analysis is done on the anomalies of the two fields

-a=cor, the SVD analysis is done on the standardized anomalies of the two fields

By default, the -a= argument is set to scp.

The input_climatology_netcdf_file and input_climatology_netcdf_file_right_field specified, respectively, with the -c= and -c2= arguments are needed only if -a=cov or -a=cor.

If -a=cov or -a=cor, the selected time periods for the left and right fields specified, respectively, with the -t= and -t2= arguments, must agree with the periods used to estimate the climatologies.

This means that the first selected time observation (time1 if the -t= argument is present) must correspond to the first day, month, season of the climatology specified with the -c= argument for the left field. This remark also applies for the right field and the -t2= and -c2= arguments.

The geographical shapes of the netcdf_variable (in the input_netcdf_file), the mask (in the input_mesh_mask_netcdf_file), the scale factors (in the input_mesh_mask_netcdf_file), and the climatology (in the input_climatology_netcdf_file) must agree.

Similarly, for the right field, the geographical shapes of the netcdf_variable_right_field (in the input_netcdf_file_right_field), the mask (in the input_mesh_mask_netcdf_file_right_field), the scale factors (in the input_mesh_mask_netcdf_file_right_field), and the climatology (in the input_climatology_netcdf_file_right_field) must agree.

The -n=number_of_sing_triplets argument specifies the number of singular triplets of the SVD of the covariance matrix between the left and right fields, which must be stored (and also computed if -alg=inviter , -alg=deflate or -alg=rsvd is specified) in the output NetCDF files given in the -o= and -o2= arguments. The default value is 4.

The -d=distance argument specifies the metric and scalar product used for the left field in the SVD analysis. If:

-d=dist2, the SVD analysis is done with the diagonal distance associated with the horizontal 2-D grid-mesh of the left field (e.g., each grid point is weighted accordingly to the surface associated with it)

-d=ident, the SVD analysis is done with the identity metric : the Euclidean distance and the usual scalar product is used in the SVD analysis.

By default, the -d= argument is set to dist2.

This remark applies also for the -d2= argument used to define the distance and scalar product of the right field in the SVD analysis.

If the second NetCDF variable netcdf_variable_right_field, used to define the right field in the SVD analysis, has 4 dimensions, the following value is also allowed for the -d2= argument:

-d=dist3 meaning that the SVD analysis is done with the diagonal distance associated with the whole 3D grid of the right field (e.g., each grid point is weighted accordingly to the volume or weight associated with it).

By default, the -d2= argument is set to the same value as the -d= argument if -v2= is omitted, and, otherwise, -d2=dist2 if the netcdf_variable_right_field is a 3D variable and -d2=dist3 if the netcdf_variable_right_field is a 4D variable.

The -alg= argument determines how the singular values and vectors of the covariance matrix between the left and right fields are computed. If:

-alg=svd, a full SVD of the rectangular covariance matrix is computed.

-alg=inviter, a partial SVD of the rectangular covariance matrix is computed by inverse iteration.

-alg=deflate, a partial SVD of the rectangular covariance matrix is computed by a deflation technique.

-alg=rsvd, a partial SVD of the rectangular covariance matrix is computed by a very fast randomized algorithm.

All algorithms are parallelized if OpenMP is used. The default is -alg=inviter since computing a partial SVD is generally much faster than computing a full SVD. But, -alg=deflate is generally as fast as -alg=inviter. -alg=rsvd is generally much faster than all other options, but the computed singular vectors and SV variables may be less accurate depending on the shape of the distribution of the singular values of the covariance matrix.

If the -ortho argument is used, the computed singular vectors and associated SV time series are always reorthogonalized if a (partial) SVD is computed by the deflation or inverse iteration methods. By default, they are only partially orthogonalized if the computed singular values are not well separated. Note that this argument has no effect if -alg=svd or -alg=rsvd.

If any of the -nb=, -bl=, -bp= and -bm= arguments is specified, a moving block bootstrap algorithm is used to test the significance and stability of the computed singular triplets and help to select the number of significant singular triplets for further analysis.

The -bm=bootstrap_method argument specifies how bootstrap confidence intervals for SCF, NC and correlation coefficients are computed. If:

-bm=normal, bootstrap confidence intervals are based on asymptotic normality [Efron] [Politis] . These standard intervals will be bias-corrected [Politis] if the -nobias argument is also specified.

-bm=percentile, bootstrap confidence intervals are based on the percentile or bias-corrected percentile methods [Efron] . The bias-corrected percentile method will be used if the -nobias argument is also specified.

Note that the percentile method requires a larger number of shuffles (specified with the -nb= argument), especially if the critical_probability (specified with the -pro= argument) is small. More precisely, number_of_shuffles.critical_probability.0.5 must be greater or equal to one in the case of the percentile method. If it is not the case, the procedure will exit with an error message. As an illustration, with the default critical_probability (e.g., 0.05), number_of_shuffles must be at least 40 for using the percentile method for estimating confidence intervals for SCFs, NCs and correlations between singular variables.

By default, the -bm= argument is set to normal.

The -nb=number_of_shuffles argument specifies the number of shuffles used in the bootstrap algorithm (by default 99).

The -bp=bootstrap_periodicity argument specifies that the index, i, of the first observation of each selected block in the moving block bootstrap algorithm verifies the condition i = 1 + bootstrap_periodicity.j where j is a random positive integer. bootstrap_periodicity must be greater than zero and less than the length of the time series. This parameter is useful if the time series are cyclostationary instead of stationary. By default, bootstrap_periodicity is set to 1.

The -bl=bootstrap_block_length argument specifies the size of the blocks in the moving block bootstrap algorithm. bootstrap_block_length must be greater or equal to the bootstrap_periodicity and less than the length of the time series. If -a=cov or -a=cor is specified, it is highly recommended to set bootstrap_block_length as a multiple of the periodicity in the data as this will take properly into account the cyclostationary of the analyzed time series.

By default, bootstrap_block_length is set to bootstrap_periodicity.2. If you perform a MCA on huge datasets and you require bootstrap testing of the MCA results, the elapsed time will be largely reduced if you use a large bootstrap_block_length, especially if the NCSTAT software has been built with the _BLAS macro (e.g., if BLAS support has been activated). In such cases, bootstrap_block_length values between 24 and 122 (depending on the computer) speed significantly the bootstrap computations.

The -pro=critical_probability argument specifies the critical probability which is used to determine the lower and upper bounds of the confidence intervals for SCF, correlation and NC coefficients if the bootstrap method is used. More precisely, Lower and upper (1-critical_probability)*100% bootstrap confidence bounds for SCF, correlation and NC coefficients are computed based on asymptotic normality or the percentile method depeding on the value of the -bm=bootstrap_method argument. critical_probability must be greater than 0. and less than 1.. The default value is 0.05, meaning that 95% confidence intervals are computed by default.

The -mi=missing_value argument specifies the missing value indicator associated with the NetCDF variables in the output_netcdf_file and output_netcdf_file_right_field.

If the -mi= argument is not specified missing_value is set to 1.e+20.

If -nobias is specified, unbiased confidence intervals for SCF, correlation and NC coefficients are computed.

If the -nobias argument is absent, biased standard confidence intervals are estimated by default.

The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF files.

By default, the results are stored as single-precision floating point numbers in the output NetCDF files.

The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36, _USE_NETCDF4 or _USE_NETCDF44 CPP macros (e.g., -D_USE_NETCDF36 or -D_USE_NETCDF4 or -D_USE_NETCDF44) and linked to the NetCDF 3.6 library or higher.

If this argument is specified, the output netcdf files will be 64-bit offset format files instead of NetCDF classic format files. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36, _USE_NETCDF4 or _USE_NETCDF44 CPP macros.

The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 or _USE_NETCDF44 CPP macros (e.g., -D_USE_NETCDF4 or -D_USE_NETCDF44) and linked to the NetCDF 4 library or higher.

If this argument is specified, the output netcdf files will be NetCDF-4/HDF5 format files instead of NetCDF classic format files. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 or _USE_NETCDF44 CPP macros.

The -cdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF44 CPP macro (e.g., -D_USE_NETCDF44) and linked to the NetCDF 4.4 library or higher.

If this argument is specified, the output netcdf files will be CDF5 format files instead of NetCDF classic format files. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF44 CPP macro.

It is assumed that the data has no missing values.

Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

For more details on SVD (e.g., MCA) analysis in the climate literature, the bootstrap approach, distances between subspaces and randomized SVD algorithms, see

“A manual for EOF and SVD analyses of climate data”, by Bjornsson, H., and Venegas, S.A., McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52pp, 1997. https://www.jsg.utexas.edu/fu/files/EOFSVD.pdf

“An intercomparison of methods for finding coupled patterns in climate data”, by Bretherton, Smith, C., and Wallace, J.M., Journal of Climate, Vol. 5, 541-560 pp, 1992. doi: 10.1175/1520-0442(1992)005<0541:AIOMFF>2.0.CO;2

“Seasonality of large scale atmosphere-ocean interaction over the North Pacific”, by Zhang, Y., Norris, J.R., and Wallace, J.M., Journal of Climate, Vol. 11, 2473-2481 pp, 1998. doi: 10.1175/1520-0442(1998)011<2473:SOLSAO>2.0.CO;2

“Matrix Computations”, by Golub, G.H., and Van Loan, C., The Johns Hopkins University Press, Baltimore, MD., 4rd Edition, 756 pp., 2013. ISBN: `978-1-4214-0794-4`_

“Statistical Analysis in Climate Research”, by von Storch, H., and Zwiers, F.W., Cambridge University press, Cambridge, UK, Chapter 13, 484 pp., 2002. ISBN: 9780521012300

“Nonparametric standard-errors and confidence intervals” By Efron, B., The Canadian Journal of Statistics, 9, 2, 139-172, (1981). https://doi.org/10.2307/3314608

“Computer-Intensive Methods in Statistical Analysis” By Politis, D.N., IEEE Signal Processing Magazine, 15, 1, 39-55, (January 1998). https://ieeexplore.ieee.org/document/647042

“Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions”, By Halko, N., Martinsson, P.G., and Tropp, J.A., SIAM Rev., 53, 217-288, 2011. https://epubs.siam.org/doi/abs/10.1137/090771806

“Algorithm 971: An implementation of a randomized algorithm for principal component analysis”, by Li, H.,Linderman, G.C., Szlam, A., Stanton, K.P., Kluger, Y., and Tygert, M., ACM Trans. Math. Softw. 43, 3, Article 28 (January 2017). https://pubmed.ncbi.nlm.nih.gov/28983138

“Randomized methods for matrix computations”, by Martinsson, P.G., arXiv.1607.01649, 2019. https://arxiv.org/abs/1607.01649

Outputs¶

comp_svd_3d creates two output NetCDF files. The first output file (specified in the -o=output_svd_netcdf_file_left_field argument) contains all the SVD statistics associated with the left field (specified in the -v=netcdf_variable argument) and the second output file (specified in the -o2=output_svd_netcdf_file_right_field argument) the SVD statistics associated with the right field (specified in the -v2=netcdf_variable_right_field argument).

The output_svd_netcdf_file_left_field contains the singular values and left singular vectors of the covariance matrix, the left homogeneous and heterogeneous vectors, and the left SV time series of the SVD analysis. The number of SV time series, regression vectors, singular vectors and singular values stored in the output NetCDF dataset is determined by the -n=number_of_sing_triplets argument. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat and nlon are the lengths of the spatial dimensions of the first input NetCDF variable netcdf_variable):

netcdf_variable_svd(number_of_sing_triplets,nlat,nlon) : the number_of_sing_triplets leading left singular vectors of the sums of squares and cross-products (-a=scp), covariance (-a=cov) or correlation (-a=cor) matrix between the left and right fields. The singular vectors are sorted by descending order of the associated singular values.

The left singular vectors are packed in a tridimensional variable whose first and second dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variable is filled with missing values. If this is a problem, you can use comp_norm_3d for restricting the geographical domain in the input dataset before using comp_svd_3d.

netcdf_variable_hom(number_of_sing_triplets,nlat,nlon) : the selected left homogeneous vectors of the sums of squares and cross-products (-a=scp), covariance (-a=cov) or correlation (-a=cor) matrix. The left homogeneous vectors are sorted by descending order of the associated singular values. These vectors are scaled such that they give the scalar products (-a=scp), covariances (-a=cov) or correlations (-a=cor) between the original observed time series in the left field and the left SV time series.

The left homogeneous vectors are packed in a tridimensional variable whose first and second dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variable is filled with missing values.

netcdf_variable_het(number_of_sing_triplets,nlat,nlon) : the selected left heterogeneous vectors of the sums of squares and cross-products (-a=scp), covariance (-a=cov) or correlation (-a=cor) matrix. The left heterogeneous vectors are sorted by descending order of the associated singular values. These vectors are scaled such that they give the scalar products (-a=scp), covariances (-a=cov) or correlations (-a=cor) between the original observed time series in the left field and the right SV time series stored in the other output NetCDF file.

The left heterogeneous vectors are packed in a tridimensional variable whose first and second dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variable is filled with missing values.

netcdf_variable_sv(ntime,number_of_sing_triplets) : the left SV time series sorted by descending order of the singular values.

The SV time series are always standardized to unit variance.

netcdf_variable_SV_STDs(number_of_sing_triplets) : the standard-deviations of the left SV time series sorted by descending order of the singular values.

netcdf_variable_SV_EXPVARs(number_of_sing_triplets) : the proportion of variance of the left field explained by the left SV time series sorted by descending order of the singular values.

netcdf_variable_CORSV(number_of_sing_triplets) : the symmetric correlation matrix between the left SV time series, only the upper triangle of this symmetric matrix is written in the output file.

netcdf_variable_RAW_VAR(1) : the raw variance (or inertia if -a=scp) of the left field averaged over the selected domain.

Sing_vals(number_of_sing_triplets) : the singular values of the sums of squares and cross-products (if -a=scp) or covariance (if -a=cov) or correlation (if -a=cor) matrix between the left and right fields sorted in decreasing order.

SCFs(number_of_sing_triplets) : the Squared Covariance Fractions (SCF) described by each of the leading singular triplets of the squares and cross-products (if -a=scp) or covariance (if -a=cov) or correlation (if -a=cor) matrix between the left and right fields.

NCs(number_of_sing_triplets) : the Normalized root-mean-square Covariance (NC) coefficients associated with each of the leading singular triplets of the squares and cross-products (if -a=scp) or covariance (if -a=cov) or correlation (if -a=cor) matrix between the left and right fields.

Corrs(number_of_sing_triplets) : the correlation coefficients between the corresponding left and right SV time series.

Furthermore, if one of the -nb=, -bl=, -bp= or -bm= arguments is specified when calling comp_svd_3d, the following NetCDF variables are computed and also stored in the output_svd_netcdf_file_left_field NetCDF file:

SCFs_boot_lower_bound(number_of_sing_triplets) : lower (1-critical_probability)*100% bootstrap confidence bounds for SCF statistics. By default, critical_probability is set to 0.05, such that 95% confidence intervals are computed, but the width of the confidence intervals can be changed at the user option with the -pro=critical_probability) argument. Similarly, by default, biased confidence intervals are computed, but unbiased confidence intervals are estimated if the -nobias= option is used.

SCFs_boot_upper_bound(number_of_sing_triplets) : upper (1-critical_probability)*100% bootstrap confidence bounds for SCF statistics. By default, critical_probability is set to 0.05, such that 95% confidence intervals are computed, but the width of the confidence intervals can be changed at the user option with the -pro=critical_probability) argument. Similarly, by default, biased confidence intervals are computed, but unbiased confidence intervals are estimated if the -nobias= option is used.

SCFs_boot_mean(number_of_sing_triplets) : the mean of bootstrapped SCF statistics.

SCFs_boot_var(number_of_sing_triplets) : the variance of bootstrapped SCF statistics.

SCFs_boot_std(number_of_sing_triplets) : the standard-deviation of bootstrapped SCF statistics.

SCFs_boot_skew(number_of_sing_triplets) : the skewness of bootstrapped SCF statistics.

SCFs_boot_kurt(number_of_sing_triplets) : the kurtosis of bootstrapped SCF statistics.

SCFs_boot_min(number_of_sing_triplets) : the minimum of bootstrapped SCF statistics.

SCFs_boot_max(number_of_sing_triplets) : the maximum of bootstrapped SCF statistics.

NCs_boot_lower_bound(number_of_sing_triplets) : lower (1-critical_probability)*100% bootstrap confidence bounds for NC statistics. By default, critical_probability is set to 0.05, such that 95% confidence intervals are computed, but the width of the confidence intervals can be changed at the user option with the -pro=critical_probability) argument. Similarly, by default, biased confidence intervals are computed, but unbiased confidence intervals are estimated if the -nobias= option is used.

NCs_boot_upper_bound(number_of_sing_triplets) : upper (1-critical_probability)*100% bootstrap confidence bounds for NC statistics. By default, critical_probability is set to 0.05, such that 95% confidence intervals are computed, but the width of the confidence intervals can be changed at the user option with the -pro=critical_probability) argument. Similarly, by default, biased confidence intervals are computed, but unbiased confidence intervals are estimated if the -nobias= option is used.

NCs_boot_mean(number_of_sing_triplets) : the mean of bootstrapped NC statistics.

NCs_boot_var(number_of_sing_triplets) : the variance of bootstrapped NC statistics.

NCs_boot_std(number_of_sing_triplets) : the standard-deviation of bootstrapped NC statistics.

NCs_boot_skew(number_of_sing_triplets) : the skewness of bootstrapped NC statistics.

NCs_boot_kurt(number_of_sing_triplets) : the kurtosis of bootstrapped NC statistics.

NCs_boot_min(number_of_sing_triplets) : the minimum of bootstrapped NC statistics.

NCs_boot_max(number_of_sing_triplets) : the maximum of bootstrapped NC statistics.

Corrs_boot_lower_bound(number_of_sing_triplets) : lower (1-critical_probability)*100% bootstrap confidence bounds for correlation coefficients between left and right SV variables. By default, critical_probability is set to 0.05, such that 95% confidence intervals are computed, but the width of the confidence intervals can be changed at the user option with the -pro=critical_probability) argument. Similarly, by default, biased confidence intervals are computed, but unbiased confidence intervals are estimated if the -nobias= option is used.

Corrs_boot_upper_bound(number_of_sing_triplets) : upper (1-critical_probability)*100% bootstrap confidence bounds for correlation coefficients between left and right SV variables. By default, critical_probability is set to 0.05, such that 95% confidence intervals are computed, but the width of the confidence intervals can be changed at the user option with the -pro=critical_probability) argument. Similarly, by default, biased confidence intervals are computed, but unbiased confidence intervals are estimated if the -nobias= option is used.

Corrs_boot_mean(number_of_sing_triplets) : the mean of bootstrapped correlation coefficients between SV variables.

Corrs_boot_var(number_of_sing_triplets) : the variance of bootstrapped correlation coefficients between SV variables.

Corrs_boot_std(number_of_sing_triplets) : the standard-deviation of bootstrapped correlation coefficients between SV variables.

Corrs_boot_skew(number_of_sing_triplets) : the skewness of bootstrapped correlation coefficients between SV variables.

Corrs_boot_kurt(number_of_sing_triplets) : the kurtosis of bootstrapped correlation coefficients between SV variables.

Corrs_boot_min(number_of_sing_triplets) : the minimum of bootstrapped correlation coefficients between SV variables.

Corrs_boot_max(number_of_sing_triplets) : the maximum of bootstrapped correlation coefficients between SV variables.

netcdf_variable_cos_vec_boot_mean(number_of_sing_triplets) : the mean of the cosines of the angles between pairwise observed and bootstrapped left singular vectors.

netcdf_variable_cos_vec_boot_var(number_of_sing_triplets) : the variance of the cosines of the angles between pairwise observed and bootstrapped left singular vectors.

netcdf_variable_cos_vec_boot_std(number_of_sing_triplets) : the standard-deviation of the cosines of the angles between pairwise observed and bootstrapped left singular vectors.

netcdf_variable_cos_vec_boot_skew(number_of_sing_triplets) : the skewness of the cosines of the angles between pairwise observed and bootstrapped left singular vectors.

netcdf_variable_cos_vec_boot_kurt(number_of_sing_triplets) : the kurtosis of the cosines of the angles between pairwise observed and bootstrapped left singular vectors.

netcdf_variable_cos_vec_boot_min(number_of_sing_triplets) : the minimum of the cosines of the angles between pairwise observed and bootstrapped left singular vectors.

netcdf_variable_cos_vec_boot_max(number_of_sing_triplets) : the maximum of the cosines of the angles between pairwise observed and bootstrapped left singular vectors.

netcdf_variable_spect_dist_subspace_boot_mean(number_of_sing_triplets) : the mean of the the spectral distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_spect_dist_subspace_boot_var(number_of_sing_triplets) : the variance of the spectral distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_spect_dist_subspace_boot_std(number_of_sing_triplets) : the standard-deviation of the spectral distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_spect_dist_subspace_boot_skew(number_of_sing_triplets) : the skewness of the spectral distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_spect_dist_subspace_boot_kurt(number_of_sing_triplets) : the kurtosis of the spectral distances between the pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_spect_dist_subspace_boot_min(number_of_sing_triplets) : the minimum of the spectral distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_spect_dist_subspace_boot_max(number_of_sing_triplets) : the maximum of the spectral distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_frob_dist_subspace_boot_mean(number_of_sing_triplets) : the mean of the the (scaled) Frobenius distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_frob_dist_subspace_boot_var(number_of_sing_triplets) : the variance of the (scaled) Frobenius distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_frob_dist_subspace_boot_std(number_of_sing_triplets) : the standard-deviation of the (scaled) Frobenius distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_frob_dist_subspace_boot_skew(number_of_sing_triplets) : the skewness of the (scaled) Frobenius distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_frob_dist_subspace_boot_kurt(number_of_sing_triplets) : the kurtosis of the (scaled) Frobenius distances between the pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_frob_dist_subspace_boot_min(number_of_sing_triplets) : the minimum of the (scaled) Frobenius distances between pairwise observed and bootstrapped left singular subspaces.

netcdf_variable_frob_dist_subspace_boot_max(number_of_sing_triplets) : the maximum of the (scaled) Frobenius distances between pairwise observed and bootstrapped left singular subspaces.

The output_svd_netcdf_file_right_field contains the singular values and right singular vectors of the covariance matrix, the right homogeneous and heterogeneous vectors, and the right SV time series of the SVD analysis. The number of SV time series, regression vectors and singular vectors and singular values stored in this output NetCDF dataset is also determined by the -n=number_of_sing_triplets argument. This output NetCDF dataset contains exactly the same NetCDF variables than the first NetCDF output file, but for the statistics of the right field instead of the left field. Refer to the description above for the content and definition of the NetCDF variables in the file output_svd_netcdf_file_right_field.

Examples¶

For computing an SVD analysis from two NetCDF variables sst and slp stored, respectively, in the NetCDF files HadISST1_1m_1979_2005_sst.nc and hadslp_1m_1979_2005_slp.nc use the following command (note that the analysis is done on the anomalies after removing the annual cycle for each variable since -a=cov is specified) :
$ comp_svd_3d \
  -a=cov \
  -n=5 \
  -f=HadISST1_1m_1979_2005_sst.nc  \
  -v=sst \
  -c=clim_HadISST1_1m_1979_2005_sst.nc  \
  -x=111,330 \
  -y=101,140 \
  -m=mask_HadISST1_sst.nc \
  -o=svd_HadISST1_1m_1979_2005_sst_oiatl.nc \
  -f2=hadslp_1m_1979_2005_slp.nc \
  -v2=slp \
  -c2=clim_hadslp_1m_1979_2005_slp.nc \
  -x2=-14,31 \
  -y2=21,33 \
  -m=mask_hadslp_slp.nc \
  -o2=svd_hadslp_1m_1979_2005_slp_oiatl.nc

Table of Contents

Previous topic

Next topic

This Page