comp_svd_3d
Authors
Pascal Terray (LOCEAN/IPSL) and Eric Maisonnave (CERFACS)
Latest revision
29/05/2024
Purpose
Compute a Singular Value Decomposition (SVD) analysis, also known as Maximum Covariance Analysis (MCA), of a covariance (or correlation) matrix between (selected) time series associated with two input NetCDF variables (specified with the -v= and -v2= arguments) extracted from one or two NetCDF datasets (specified with the -f= and -f2= arguments).
SVD analysis or MCA can be considered as a generalization of Empirical Orthogonal Function (EOF) analysis [Bjornsson_Venegas] [Bretherton_etal] [vonStorch_Zwiers]. It estimates the covariance matrix between two fields and computes the SVD of this covariance matrix to define pairs of spatial patterns, each pair explaining (maximizing) a fraction of the total squared covariance between the two fields; this fraction is called the Square Covariance Fraction (SCF).
The procedure first repacks the first (or left) input three-dimensional NetCDF variable (specified with the -v= argument) and the second (or right) input three- or four-dimensional NetCDF variable (specified with the -v2= argument) as an ntime by nv1 rectangular matrix, X, and an ntime by nv2 rectangular matrix, Y, respectively. The procedure then computes the covariance (or correlation, or sum of squares and cross-products, at the user's option) matrix between X and Y, i.e., the rectangular matrix product (transpose(X).Y)/ntime, after optionally centering or standardizing the columns of the X and Y matrices. In the following discussion, the X and Y matrices will be called the left and right fields, respectively.
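As a rough illustration of this first step, the sketch below builds the nv1 by nv2 matrix (transpose(X).Y)/ntime from two fields stored as lists of rows (one row per time observation). It is a minimal sketch, not the NCSTAT implementation: for cov and cor it centers (and scales) each column with its own sample mean (and standard deviation) over the selected period, whereas comp_svd_3d uses the climatologies given with the -c= and -c2= arguments. The helper names prepare_columns and cross_matrix are hypothetical.

```python
from statistics import mean, pstdev


def prepare_columns(M, analysis):
    """Return a copy of the ntime x nv matrix M (list of rows) whose columns
    are left raw (scp), centered (cov) or standardized (cor).

    Simplification: sample statistics over the selected period are used
    instead of an external climatology."""
    ntime, nv = len(M), len(M[0])
    out = []
    for j in range(nv):
        col = [M[t][j] for t in range(ntime)]
        if analysis in ("cov", "cor"):
            m = mean(col)
            col = [x - m for x in col]
        if analysis == "cor":
            s = pstdev(col)
            col = [x / s for x in col]
        out.append(col)
    # transpose back to ntime x nv
    return [[out[j][t] for j in range(nv)] for t in range(ntime)]


def cross_matrix(X, Y, analysis="scp"):
    """Return the nv1 x nv2 matrix (transpose(X).Y)/ntime."""
    X = prepare_columns(X, analysis)
    Y = prepare_columns(Y, analysis)
    ntime = len(X)
    return [[sum(X[t][i] * Y[t][j] for t in range(ntime)) / ntime
             for j in range(len(Y[0]))]
            for i in range(len(X[0]))]
```

For example, cross_matrix(X, Y, "cor") returns the matrix of correlations between every left grid point and every right grid point.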
Optionally, the covariance (or correlation, or sum of squares and cross-products) matrix between the left and right fields can be estimated using (diagonal) metrics for the left and right fields such that the covariance matrix is weighted by the area (or volume) of each cell of the grid associated with the input NetCDF variables. This implies that equal areas (or volumes) carry equal weights in the results of the MCA (see the description of the -d= and -d2= arguments for more details).
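A common way to apply such diagonal metrics, assumed here rather than taken from the NCSTAT sources, is to multiply each column of X and Y by the square root of its weight before forming the cross-product matrix. The hypothetical helpers below compute cell areas from the e1 and e2 scale factors read from the mesh-mask file (see the remarks on the -g= argument) and apply this square-root weighting; setting all weights to 1 corresponds to the identity metric (-d=ident).

```python
import math


def cell_areas(e1, e2):
    """Horizontal cell areas e1*e2, one value per grid point
    (scale factors assumed here to be flat lists)."""
    return [a * b for a, b in zip(e1, e2)]


def weight_columns(M, weights):
    """Multiply column j of the ntime x nv matrix M by sqrt(weights[j]).

    With Xw = weight_columns(X, w1) and Yw = weight_columns(Y, w2),
    transpose(Xw).Yw/ntime equals D1^(1/2).(transpose(X).Y/ntime).D2^(1/2),
    where D1 and D2 are the diagonal metrics of the left and right fields."""
    roots = [math.sqrt(w) for w in weights]
    return [[x * r for x, r in zip(row, roots)] for row in M]
```

The SVD is then computed on the weighted matrix; under this convention, singular vectors on the original grid would be recovered by dividing by the same square roots.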
The second step of the SVD analysis is to compute the leading k terms of the SVD of this covariance matrix between the left and right fields, which gives the approximation
(transpose(X).Y)/ntime ≈ U.S.V
(exact when k equals the rank of the covariance matrix), where
- U is a nv1 by k matrix with orthonormal columns (the left singular vectors stored columnwise)
- S is a square k by k matrix with nonnegative elements on its principal diagonal and zeros elsewhere (the diagonal elements of S are the singular values of the covariance matrix)
- V is a k by nv2 matrix with orthonormal rows (the right singular vectors stored rowwise)
This partial SVD of the covariance matrix between the left and right fields can be computed efficiently by inverse iteration or deflation algorithms, without computing the full SVD of the covariance matrix. At the user's option, very fast randomized partial SVD algorithms [Halko_etal] [Li_etal] [Martinsson] can also be used; these are highly efficient on huge datasets. See the description of the -alg= argument for more details.
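To make the idea of a partial SVD concrete, the sketch below extracts the leading k singular triplets of a small matrix by plain power iteration on transpose(C).C, followed by deflation of each rank-one term. This is a deliberately simple stand-in: it is neither the inverse-iteration nor the randomized algorithm used by comp_svd_3d, and it converges slowly when successive singular values are close. The function names are hypothetical, and U and V are returned as lists of vectors rather than as matrices.

```python
import math
import random


def matvec(A, x):
    """A.x for A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """transpose(A).y for A stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v], n


def partial_svd(C, k, n_iter=500, seed=0):
    """Leading k singular triplets (U, S, V) of C by power iteration with deflation."""
    rng = random.Random(seed)
    A = [row[:] for row in C]
    U, S, V = [], [], []
    for _ in range(k):
        v, _ = normalize([rng.gauss(0.0, 1.0) for _ in range(len(A[0]))])
        for _ in range(n_iter):
            v, _ = normalize(rmatvec(A, matvec(A, v)))
        u, s = normalize(matvec(A, v))  # s = ||A.v|| is the singular value
        U.append(u)
        S.append(s)
        V.append(v)
        # deflation: subtract the rank-one term s.u.transpose(v)
        A = [[A[i][j] - s * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return U, S, V
```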
In a third step, the first k standardized left and right “Singular Variables” (SV) time series are computed by projecting the left and right fields onto the first k left and right singular vectors, respectively. These SV time series play a role similar to that of the principal component time series in an Empirical Orthogonal Function (EOF) analysis. Refer to comp_eof_3d or comp_eof_4d for more details on EOF analysis.
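Assuming a singular vector u has already been computed, the corresponding SV time series is the projection X.u, rescaled to unit variance (and likewise Y.v for the right field). The hypothetical helper below divides the projection by its root-mean-square, which equals its standard deviation when the columns of the field have been centered (-a=cov or -a=cor); for -a=scp this scaling is an assumption of the sketch.

```python
import math


def sv_time_series(F, w):
    """Project the ntime x nv field F (list of rows) onto the singular
    vector w and rescale the series to unit mean square."""
    a = [sum(x * c for x, c in zip(row, w)) for row in F]
    rms = math.sqrt(sum(v * v for v in a) / len(a))
    return [v / rms for v in a]
```

For the left field one would call sv_time_series(X, u) and for the right field sv_time_series(Y, v).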
Finally, from the k left and right SV time series, two types of regression maps are generated for each field: the kth homogeneous vector, which is the regression map between a given data field and its kth standardized SV time series, and the kth heterogeneous vector, which is the regression map between a given data field and the kth standardized SV time series of the other field. The kth heterogeneous vector indicates how well the time series of one field can be predicted from the kth SV time series of the other field.
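Given the fields and the standardized SV time series, both kinds of regression maps reduce to the same product transpose(F).a/ntime; only the pairing of field and series changes. The sketch below assumes that X and Y have already been centered or standardized as required by -a=, so that the maps hold scalar products, covariances or correlations; the function names are hypothetical.

```python
def regression_map(F, a):
    """transpose(F).a/ntime: one value per column (grid point) of the
    ntime x nv field F, for a standardized time series a."""
    ntime = len(F)
    return [sum(F[t][j] * a[t] for t in range(ntime)) / ntime
            for j in range(len(F[0]))]


def homogeneous_heterogeneous(X, Y, a_left, b_right):
    """kth homogeneous and heterogeneous vectors of both fields, where
    a_left and b_right are the kth left and right standardized SV series."""
    hom_left = regression_map(X, a_left)    # left field vs its own SV series
    het_left = regression_map(X, b_right)   # left field vs right SV series
    hom_right = regression_map(Y, b_right)  # right field vs its own SV series
    het_right = regression_map(Y, a_left)   # right field vs left SV series
    return hom_left, het_left, hom_right, het_right
```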
Simple statistics associated with each singular triplet (i.e., a singular value and the associated left and right singular vectors) of the (partial) SVD of the covariance matrix are also computed. These include the Square Covariance Fraction (SCF) coefficient, a simple measure of the relative importance of each singular triplet in the linear relationship between the two fields, the correlation coefficient between the kth left and right SV time series, and the Normalized root-mean-square Covariance (NC) coefficient. See [Bretherton_etal] [Zhang_etal] for a discussion of the relative merits of these coefficients for determining how strongly related the coupled patterns described by the singular triplets are.
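The SCF of the kth triplet is its squared singular value divided by the sum of all squared singular values, which equals the sum of the squares of all elements of the covariance matrix, so it can be computed from a partial SVD. The sketch below computes the SCF and the correlation between a pair of left and right SV time series; the NC coefficient is left out because its normalization is not spelled out in this section. Function names are hypothetical.

```python
import math


def square_covariance_fraction(singular_values, C):
    """SCF of each triplet: s_k**2 divided by the squared Frobenius norm of C."""
    total = sum(c * c for row in C for c in row)
    return [s * s / total for s in singular_values]


def pearson(a, b):
    """Correlation between two time series of equal length."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```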
At the user's option, confidence levels for the SCF, NC and correlation coefficients can be estimated by a moving block bootstrap algorithm, in which these statistics are recomputed many times after replacing the right field by a permuted field constructed by randomly resampling blocks of observations from the original right field. These bootstrap confidence levels are estimated only if any of the optional bootstrap arguments (i.e., -nb=, -bl=, -bp=, -ba= and -cb=) are specified in the call to the procedure.
Furthermore, the -nb=, -bl=, -bp=, -ba= and -cb= arguments allow the user to determine and adapt the exact form of the moving block bootstrap algorithm to the analyzed dataset. This moving block bootstrap algorithm is formally similar to the one described in comp_cor_3d or comp_cor_4d for testing the significance of a correlation coefficient when -a=bootstrap. Refer to comp_cor_3d for further details on this moving block bootstrap algorithm.
This approach is useful if you want to assess whether the analyzed covariance matrix is significantly above the “noise”.
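The resampling step of such a moving block bootstrap can be sketched as follows, using the block length (-bl=) and periodicity (-bp=) described in the remarks. In this hypothetical sketch, each block starts at a multiple of the periodicity, with the multiplier drawn from 0 upward so that the first observation can start a block, and blocks are assumed not to wrap around the end of the record; the p-value convention in bootstrap_pvalue is also an assumption, not necessarily the one used by comp_svd_3d.

```python
import random


def block_bootstrap_indices(ntime, block_length, periodicity=1, rng=None):
    """Time indices (0-based) of one moving block bootstrap sample.

    Blocks of block_length consecutive observations are drawn at random and
    concatenated until ntime indices are obtained. Each block starts at
    periodicity * j (0-based form of i = 1 + periodicity * j)."""
    if not 0 < periodicity <= block_length < ntime:
        raise ValueError("need 0 < periodicity <= block_length < ntime")
    rng = rng or random.Random()
    max_j = (ntime - block_length) // periodicity  # last start that fits
    idx = []
    while len(idx) < ntime:
        start = periodicity * rng.randint(0, max_j)
        idx.extend(range(start, start + block_length))
    return idx[:ntime]  # the last block may be truncated


def bootstrap_pvalue(observed, bootstrap_values):
    """(1 + number of bootstrap statistics >= observed) / (1 + nb)."""
    count = sum(1 for b in bootstrap_values if b >= observed)
    return (1 + count) / (1 + len(bootstrap_values))
```

A bootstrap right field is then Yb = [Y[t] for t in idx], and the statistic of interest (for example, the SCF of the leading triplet) is recomputed from the left field and Yb for each of the nb samples.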
Two output NetCDF datasets are created, containing the singular values, the left and right singular vectors, the corresponding left and right standardized SV time series, and the homogeneous and heterogeneous vectors for each field. The left and right singular vectors, and the homogeneous and heterogeneous vectors for each field, are repacked onto the original grids of the two input NetCDF variables in the output NetCDF datasets. In addition, if confidence levels for the SCF, NC and correlation coefficients are estimated, these probabilities are also included in the output NetCDF datasets.
This procedure is parallelized if OpenMP is used and will also be much faster if an optimized BLAS library is specified at compilation with the _BLAS CPP macro. Moreover, this procedure may use (randomized) partial SVD algorithms [Martinsson], which are highly efficient on huge covariance matrices if you are interested only in the few leading terms of the SVD of the covariance matrix between the X and Y fields.
Further Details
Usage
$ comp_svd_3d \
-f=input_netcdf_file \
-v=netcdf_variable \
-m=input_mesh_mask_netcdf_file \
-a=type_of_analysis (optional : scp, cov, cor) \
-n=number_of_sing_triplets (optional) \
-g=grid_type (optional : n, t, u, v, w, f) \
-r=resolution (optional : r2, r4) \
-b=nlon_orca,nlat_orca (optional) \
-x=lon1,lon2 (optional) \
-y=lat1,lat2 (optional) \
-t=time1,time2 (optional) \
-c=input_climatology_netcdf_file (optional) \
-d=type_of_distance (optional : dist2, ident) \
-o=output_svd_netcdf_file_left_field (optional) \
-f2=input_netcdf_file_right_field (optional) \
-v2=netcdf_variable_right_field (optional) \
-m2=input_mesh_mask_netcdf_file_right_field (optional) \
-g2=grid_type_right_field (optional : n, t, u, v, w, f) \
-r2=resolution_right_field (optional : r2, r4) \
-b2=nlon_orca,nlat_orca (optional) \
-x2=lon1_right_field,lon2_right_field (optional) \
-y2=lat1_right_field,lat2_right_field (optional) \
-z2=level1_right_field,level2_right_field (optional) \
-t2=time1_right_field,time2_right_field (optional) \
-c2=input_climatology_netcdf_file_right_field (optional) \
-d2=type_of_distance_right_field (optional : dist2, dist3, ident) \
-o2=output_svd_netcdf_file_right_field (optional) \
-alg=algorithm (optional : svd, inviter, deflate, rsvd) \
-cb=bootstrap_statistic_significativity_type (optional : values, vector) \
-nb=number_of_shuffles (optional) \
-ba=bootstrap_algorithm (optional : svd, inviter, deflate, rsvd) \
-bp=bootstrap_periodicity (optional) \
-bl=bootstrap_block_length (optional) \
-mi=missing_value (optional) \
-ortho (optional) \
-double (optional) \
-bigfile (optional) \
-hdf5 (optional) \
-tlimited (optional)
By default
- -a= : the type_of_analysis is set to scp. This means that the singular vectors and singular values are computed from the sums of squares and cross-products matrix between the left and right fields
- -n= : number_of_sing_triplets is set to 4. This means that the first 4 singular triplets of the covariance matrix between the left and right fields are computed and stored in the output NetCDF files output_svd_netcdf_file_left_field and output_svd_netcdf_file_right_field
- -g= : the grid_type is set to n, which means that the 2-D grid-mesh associated with the left field extracted from the input NetCDF variable netcdf_variable is assumed to be regular or Gaussian
- -r= : if the input NetCDF variable netcdf_variable is from the NEMO model (i.e., if the -g= argument is not set to n), the resolution is assumed to be r2
- -b= : if -g= is not set to n, the dimensions of the 2-D grid-mesh, nlon_orca and nlat_orca, of the left input NetCDF variable netcdf_variable are determined from the -r= argument. However, you may override this default choice with the -b= argument
- -x= : the whole longitude domain associated with the netcdf_variable
- -y= : the whole latitude domain associated with the netcdf_variable
- -t= : the whole time period associated with the netcdf_variable
- -c= : this argument is not used. This argument is required only if the type_of_analysis is set to cov or cor, and is used to specify the climatology NetCDF file for computing anomalies or standardized anomalies for the left field
- -d= : the type_of_distance is set to dist2. This means that distances and scalar products for the left field in the SVD analysis are computed with the diagonal metric associated with the 2-D grid-mesh of the input NetCDF variable netcdf_variable
- -o= : the output_svd_netcdf_file_left_field is named svd_left_netcdf_variable.nc
- -f2= : this argument is not used. It is required only if the right field is not stored in the same file as the left NetCDF variable
- -v2= : this argument is not used. It is required only if the right field is not extracted from the same input NetCDF variable as the left field
- -m2= : this argument takes the same value as the -m= argument. It is required only if the right field is not extracted from the same input NetCDF variable as the left field
- -g2= : same as the -g= argument if the -v2= argument is omitted, and -g2=n otherwise
- -r2= : same as the -r= argument if -v2= is omitted, and -r2=r2 otherwise
- -b2= : if -g2= is not set to n, the dimensions of the 2-D grid-mesh, nlon_orca and nlat_orca, of the right field extracted from the input NetCDF variable netcdf_variable_right_field are determined from the -r2= argument. However, you may override this default choice with the -b2= argument
- -x2= : the whole longitude domain associated with the netcdf_variable_right_field
- -y2= : the whole latitude domain associated with the netcdf_variable_right_field
- -z2= : all the levels associated with the netcdf_variable_right_field
- -t2= : the whole time period associated with the netcdf_variable_right_field
- -c2= : this argument is not used. This argument is required only if the type_of_analysis is set to cov or cor, and is used to specify the climatology NetCDF file for computing anomalies or standardized anomalies for the right field if this field is not extracted from the same input NetCDF variable as the left field
- -d2= : same as -d= if -v2= is omitted; otherwise, -d2=dist2 if the netcdf_variable_right_field is a 3D variable and -d2=dist3 if the netcdf_variable_right_field is a 4D variable
- -o2= : the output_svd_netcdf_file_right_field is named svd_right_netcdf_variable_right_field.nc
- -alg= : the algorithm option is set to inviter. This means that the SVD analysis is computed by a partial SVD of the covariance matrix using an inverse iteration algorithm
- -cb= : bootstrap_statistic_significativity_type is set to values. This means that only the SCF and NC coefficients are tested by the moving block bootstrap algorithm. This saves computing time because only the singular values are needed in the moving block bootstrap algorithm
- -nb= : number_of_shuffles is set to 99. This means that 99 bootstrap samples are generated in the moving block bootstrap algorithm for testing the significance of the singular triplets
- -ba= : the bootstrap_algorithm option is set to rsvd. This means that, for efficiency reasons, the (partial) SVDs of the bootstrap covariance matrices are computed by a randomized SVD algorithm during the bootstrap phase of the procedure
- -bp= : the time series are assumed to be stationary and bootstrap_periodicity is set to 1 in the moving block bootstrap procedure for testing the significance of the singular triplets. This means that the blocks in the bootstrap algorithm are not forced to begin at specific observations. Use this parameter if the time series are cyclostationary; see the remarks below for further details
- -bl= : bootstrap_block_length is set to 2 * bootstrap_periodicity
- -mi= : the missing_value attribute in the output NetCDF files is set to 1.e+20
- -ortho : the computed singular vectors and associated SV time series are not systematically reorthogonalized if a (partial) SVD is computed by the deflation or inverse iteration methods, i.e., if -alg=inviter or -alg=deflate
- -double : the results of the SVD analysis are stored as single-precision floating point numbers in the output NetCDF files. If -double is activated, the results are stored as double-precision floating point numbers
- -bigfile : NetCDF classic format files are created. If -bigfile is activated, the output NetCDF files are 64-bit offset format files
- -hdf5 : NetCDF classic format files are created. If -hdf5 is activated, the output NetCDF files are NetCDF-4/HDF5 format files
- -tlimited : the time dimension is defined as unlimited in the output NetCDF files. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF files
Remarks
The -v=netcdf_variable argument specifies the NetCDF variable from which the left field for the SVD analysis must be extracted and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.
The -v2=netcdf_variable_right_field argument specifies the NetCDF variable from which the right field of the SVD analysis must be extracted and the -f2=input_netcdf_file_right_field argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file_right_field.
If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the geographical domain used for defining the left field in the SVD analysis is determined from the attributes of the input mesh mask NetCDF variable named grid_typemask (i.e., lon1_Eastern_limit, lon2_Western_limit, lat1_Southern_limit and lat2_Northern_limit), which is read from the input NetCDF file input_mesh_mask_netcdf_file. If these attributes are missing and the -x= and -y= arguments are also not specified, the whole geographical domain associated with the netcdf_variable is used for defining the left field in the SVD analysis.
The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case, the longitude domain is from nlon+lon1+1 to lon2, where nlon is the number of longitude points in the grid associated with the NetCDF variable, and the grid is assumed to be periodic. This remark also applies to the -x2=, -y2= and -z2= arguments used for defining the right field in the SVD analysis.
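The wrap-around rule for a negative lon1 can be spelled out as follows. This hypothetical helper only enumerates the selected 1-based indices on a periodic grid; it does not reproduce how comp_svd_3d reads or reorders the data.

```python
def longitude_indices(lon1, lon2, nlon):
    """1-based longitude indices selected by -x=lon1,lon2 on a periodic grid
    with nlon points. A negative lon1 selects nlon+lon1+1, ..., nlon,
    followed by 1, ..., lon2."""
    if lon1 < 0:
        return list(range(nlon + lon1 + 1, nlon + 1)) + list(range(1, lon2 + 1))
    return list(range(lon1, lon2 + 1))
```

For instance, -x=-2,3 on a grid with 10 longitudes selects the indices 9, 10, 1, 2 and 3.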
Refer to comp_mask_3d for transforming geographical coordinates into indices or for generating an appropriate mesh-mask before using comp_svd_3d.
If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used to estimate the covariance matrix between the left and right fields.
The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the SV time series in the output NetCDF files will have ntime = time2 - time1 + 1 time observations. This remark also applies to the -t2= argument used to define the time dimension of the right field.
If -g= is set to t, u, v, w or f, it is assumed that the NetCDF variable netcdf_variable is from an experiment with the NEMO model (ORCA configuration and R2, R4 or R05 resolutions). In this case, the duplicate points from the ORCA grid are removed, as far as possible, when extracting the left field of the SVD analysis, in particular if the 2-D grid-mesh of the input NetCDF variable covers the whole globe. On output, the duplicate points are restored when writing the SVD results (e.g., the singular, homogeneous and heterogeneous vectors) if the geographical domain of the input NetCDF variable netcdf_variable is the whole globe.
If -g= is set to n, it is assumed that the 2-D grid-mesh associated with netcdf_variable is regular or Gaussian and, as such, has no duplicate points.
The -g= argument is also used to determine the names of the NetCDF variables that contain the 2-D mesh-mask and the scale factors in the input_mesh_mask_netcdf_file (i.e., these variables are named grid_typemask, e1grid_type and e2grid_type, respectively). This input_mesh_mask_netcdf_file may be created by comp_clim_3d if the 2-D grid-mesh is regular or Gaussian.
This remark also applies to the -g2= argument used to define the grid type of the right field.
If -g= is set to t, u, v, w or f (i.e., if the NetCDF variable is from an experiment with the NEMO model), the -r= argument gives the resolution used. If:
- -r=r2, the NetCDF variable is from an experiment with the ORCA R2 configuration
- -r=r4, the NetCDF variable is from an experiment with the ORCA R4 configuration.
This remark also applies to the -r2= argument used to define the resolution of the right field.
If the NetCDF variable netcdf_variable is from an experiment with the NEMO model, but the resolution is not r2 or r4, the dimensions of the ORCA grid must be specified explicitly with the -b= argument. This remark also applies to the -b2= argument used to define the dimensions of the ORCA grid of the right field.
The -a= argument specifies whether the left and right fields are centered or standardized with an input climatology (specified with the -c= and -c2= arguments) before computing the covariance matrix between the two fields. If:
- -a=scp, the SVD analysis is done on the raw data of the two fields
- -a=cov, the SVD analysis is done on the anomalies of the two fields
- -a=cor, the SVD analysis is done on the standardized anomalies of the two fields
By default, the -a= argument is set to scp.
The input_climatology_netcdf_file and input_climatology_netcdf_file_right_field, specified with the -c= and -c2= arguments respectively, are needed only if -a=cov or -a=cor.
If -a=cov or -a=cor, the selected time periods for the left and right fields, specified with the -t= and -t2= arguments respectively, must agree with the periods used to estimate the climatologies. This means that the first selected time observation (time1 if the -t= argument is present) must correspond to the first day, month or season of the climatology specified with the -c= argument for the left field. This remark also applies to the right field and the -t2= and -c2= arguments.
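The alignment rule above means that observation number t of the selected period (counted from 0) uses climatological slot t modulo the length of the climatology. The sketch below applies this rule to a single grid-point series; clim_mean and clim_std are hypothetical stand-ins for the contents of the climatology file, and the standardization step (-a=cor) assumes that a climatological standard deviation is available for each slot.

```python
def anomalies(series, clim_mean, clim_std=None):
    """Anomalies (cov) or standardized anomalies (cor) of one grid-point
    series with respect to a climatology of period len(clim_mean).

    The first value of series is assumed to match the first day, month or
    season of the climatology, so observation t uses slot t % period."""
    period = len(clim_mean)
    out = []
    for t, x in enumerate(series):
        a = x - clim_mean[t % period]
        if clim_std is not None:
            a /= clim_std[t % period]
        out.append(a)
    return out
```

With a monthly climatology (period 12), starting the selected period in a month other than the first month of the climatology would shift every anomaly by the wrong mean, which is why time1 must be aligned with the climatology.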
The geographical shapes of the netcdf_variable (in the input_netcdf_file), the mask (in the input_mesh_mask_netcdf_file), the scale factors (in the input_mesh_mask_netcdf_file), and the climatology (in the input_climatology_netcdf_file) must agree.
Similarly, for the right field, the geographical shapes of the netcdf_variable_right_field (in the input_netcdf_file_right_field), the mask (in the input_mesh_mask_netcdf_file_right_field), the scale factors (in the input_mesh_mask_netcdf_file_right_field), and the climatology (in the input_climatology_netcdf_file_right_field) must agree.
The -n=number_of_sing_triplets argument specifies the number of singular triplets of the SVD of the covariance matrix between the left and right fields that must be stored (and also computed if -alg=inviter, -alg=deflate or -alg=rsvd is specified) in the output NetCDF files given in the -o= and -o2= arguments. The default value is 4.
The -d= argument specifies the metric and scalar product used for the left field in the SVD analysis. If:
- -d=dist2, the SVD analysis is done with the diagonal distance associated with the horizontal 2-D grid-mesh of the left field (i.e., each grid point is weighted according to the surface associated with it)
- -d=ident, the SVD analysis is done with the identity metric: the Euclidean distance and the usual scalar product are used in the SVD analysis.
By default, the -d= argument is set to dist2.
This remark also applies to the -d2= argument used to define the distance and scalar product of the right field in the SVD analysis. If the second NetCDF variable netcdf_variable_right_field, used to define the right field in the SVD analysis, has 4 dimensions, the following value is also allowed for the -d2= argument:
- -d2=dist3, meaning that the SVD analysis is done with the diagonal distance associated with the whole 3D grid of the right field (i.e., each grid point is weighted according to the volume or weight associated with it).
By default, the -d2= argument is set to the same value as the -d= argument if -v2= is omitted; otherwise, -d2=dist2 if the netcdf_variable_right_field is a 3D variable and -d2=dist3 if the netcdf_variable_right_field is a 4D variable.
The -alg= argument determines how the singular values and vectors of the covariance matrix between the left and right fields are computed. If:
- -alg=svd, a full SVD of the rectangular covariance matrix is computed.
- -alg=inviter, a partial SVD of the rectangular covariance matrix is computed by inverse iteration.
- -alg=deflate, a partial SVD of the rectangular covariance matrix is computed by a deflation technique.
- -alg=rsvd, a partial SVD of the rectangular covariance matrix is computed by a very fast randomized algorithm.
All algorithms are parallelized if OpenMP is used. The default is -alg=inviter, since computing a partial SVD is generally much faster than computing a full SVD. However, -alg=deflate is generally as fast as -alg=inviter. -alg=rsvd is generally much faster than all other options, but the computed singular vectors and SV variables may be less accurate, depending on the shape of the distribution of the singular values of the covariance matrix.
If the -ortho argument is used, the computed singular vectors and associated SV time series are always reorthogonalized if a (partial) SVD is computed by the deflation or inverse iteration methods. By default, they are only partially orthogonalized if the computed singular values are not well separated. Note that this argument has no effect if -alg=svd or -alg=rsvd.
If any of the -nb=, -bl=, -bp=, -ba= and -cb= arguments is specified, a moving block bootstrap algorithm is used to test the significance of the SCF and NC coefficients and, if -cb=vector is specified, of the correlation coefficients associated with each singular triplet of the SVD analysis.
The -cb=bootstrap_statistic_significativity_type argument specifies which statistics are bootstrapped. If -cb=values (the default), the Square Covariance Fraction (SCF) and the Normalized root-mean-square Covariance (NC) coefficients are tested with the moving block bootstrap algorithm. If -cb=vector, the correlations between the Singular Variable time series of the left and right fields are also tested, in addition to the SCF and NC coefficients. This may require much more computer time, since the singular vectors of the bootstrap versions of the covariance matrix are needed to estimate the Singular Variable time series. By default, bootstrap tests of these coefficients are not performed. These bootstrap tests are computed only if any of the optional bootstrap arguments (i.e., -nb=, -bl=, -bp=, -ba= and -cb=) are specified in the call to the procedure.
The -nb=number_of_shuffles argument specifies the number of shuffles for the bootstrap tests of the SCF, NC and correlation coefficients (by default, 99).
The -ba= argument determines how the singular values and vectors of the bootstrap covariance matrices are computed in the bootstrap phase of the program. If:
- -ba=svd, a full SVD of the bootstrap covariance matrix is computed.
- -ba=inviter, a partial SVD of the bootstrap covariance matrix is computed by inverse iteration.
- -ba=deflate, a partial SVD of the bootstrap covariance matrix is computed by a deflation technique.
- -ba=rsvd, a partial SVD of the bootstrap covariance matrix is computed by a randomized algorithm.
All algorithms are parallelized if OpenMP is used. The default is -ba=rsvd for efficiency reasons, as a randomized SVD algorithm is generally much faster than all other options, but the computed singular vectors and SV variables may be less accurate, depending on the shape of the distribution of the singular values of the bootstrap covariance matrix.
The -bp=bootstrap_periodicity argument specifies that the index, i, of the first observation of each selected block in the moving block bootstrap algorithm verifies the condition i = 1 + bootstrap_periodicity * j, where j is a random positive integer. bootstrap_periodicity must be greater than zero and less than the length of the time series. This parameter is useful if the time series are cyclostationary instead of stationary. By default, bootstrap_periodicity is set to 1.
The -bl=bootstrap_block_length argument specifies the size of the blocks in the moving block bootstrap algorithm. bootstrap_block_length must be greater than or equal to bootstrap_periodicity and less than the length of the time series. If -a=cov or -a=cor is specified, it is highly recommended to set bootstrap_block_length to a multiple of the periodicity in the data, as this properly takes into account the cyclostationarity of the analyzed time series. By default, bootstrap_block_length is set to 2 * bootstrap_periodicity. If you perform an MCA on huge datasets and you require bootstrap testing of the MCA results, the elapsed time will be greatly reduced if you use a large bootstrap_block_length, especially if the NCSTAT software has been built with the _BLAS macro (i.e., if BLAS support has been activated). In such cases, bootstrap_block_length values between 24 and 122 (depending on the computer) significantly speed up the bootstrap computations.
The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g., -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher. If this argument is specified, the output NetCDF files will be 64-bit offset format files instead of NetCDF classic format files. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.
The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g., -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output NetCDF files will be NetCDF-4/HDF5 format files instead of NetCDF classic format files. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.
The -mi=missing_value argument specifies the missing value indicator associated with the NetCDF variables in the output_svd_netcdf_file_left_field and output_svd_netcdf_file_right_field. If the -mi= argument is not specified, missing_value is set to 1.e+20.
The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF files. By default, the results are stored as single-precision floating point numbers in the output NetCDF files.
It is assumed that the data has no missing values.
Duplicate parameters are allowed, but the last occurrence of a parameter is always the one used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.
For more details on SVD (i.e., MCA) analysis in the climate literature and on randomized SVD algorithms, see
- “A manual for EOF and SVD analyses of climate data”, by Bjornsson, H., and Venegas, S.A., McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52 pp., 1997. https://www.jsg.utexas.edu/fu/files/EOFSVD.pdf
- “An intercomparison of methods for finding coupled patterns in climate data”, by Bretherton, C.S., Smith, C., and Wallace, J.M., Journal of Climate, Vol. 5, pp. 541-560, 1992. doi: 10.1175/1520-0442(1992)005<0541:AIOMFF>2.0.CO;2
- “Seasonality of large scale atmosphere-ocean interaction over the North Pacific”, by Zhang, Y., Norris, J.R., and Wallace, J.M., Journal of Climate, Vol. 11, pp. 2473-2481, 1998. doi: 10.1175/1520-0442(1998)011<2473:SOLSAO>2.0.CO;2
- “Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions”, by Halko, N., Martinsson, P.G., and Tropp, J.A., SIAM Review, Vol. 53, pp. 217-288, 2011. https://epubs.siam.org/doi/abs/10.1137/090771806
- “Algorithm 971: An implementation of a randomized algorithm for principal component analysis”, by Li, H., Linderman, G.C., Szlam, A., Stanton, K.P., Kluger, Y., and Tygert, M., ACM Transactions on Mathematical Software, Vol. 43, No. 3, Article 28, January 2017. https://pubmed.ncbi.nlm.nih.gov/28983138
- “Randomized methods for matrix computations”, by Martinsson, P.G., arXiv:1607.01649, 2019. https://arxiv.org/abs/1607.01649
Outputs
comp_svd_3d creates two output NetCDF files. The first output file (specified in the -o=output_svd_netcdf_file_left_field argument) contains all the SVD statistics associated with the left field (specified in the -v=netcdf_variable argument) and the second output file (specified in the -o2=output_svd_netcdf_file_right_field argument) the SVD statistics associated with the right field (specified in the -v2=netcdf_variable_right_field argument).
The output_svd_netcdf_file_left_field contains the singular values and left singular vectors of the covariance matrix, the left homogeneous and heterogeneous vectors, and the left SV time series of the SVD analysis. The number of SV time series, regression vectors, singular vectors and singular values stored in the output NetCDF dataset is determined by the -n=number_of_sing_triplets argument. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat and nlon are the lengths of the spatial dimensions of the first input NetCDF variable netcdf_variable):
netcdf_variable_svd(number_of_sing_triplets,nlat,nlon)
The number_of_sing_triplets leading left singular vectors of the sums of squares and cross-products (-a=scp), covariance (-a=cov) or correlation (-a=cor) matrix between the left and right fields. The singular vectors are sorted in descending order of the associated singular values.
The left singular vectors are packed in a three-dimensional variable whose spatial dimensions are exactly the same as those of the input NetCDF variable netcdf_variable, even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variable is filled with missing values. If this is a problem, you can use comp_norm_3d to restrict the geographical domain in the input dataset before using comp_svd_3d.
netcdf_variable_hom
(number_of_sing_triplets,nlat,nlon)
: the selected left homogeneous vectors of the sums of squares and cross-products (-a=scp), covariance (-a=cov) or correlation (-a=cor) matrix. The left homogeneous vectors are sorted in descending order of the associated singular values. These vectors are scaled such that they give the scalar products (-a=scp), covariances (-a=cov) or correlations (-a=cor) between the original observed time series in the left field and the left SV time series. The left homogeneous vectors are packed in a tridimensional variable whose spatial dimensions (nlat, nlon) are exactly the same as those of the input NetCDF variable netcdf_variable, even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variable is filled with missing values.
netcdf_variable_het
(number_of_sing_triplets,nlat,nlon)
: the selected left heterogeneous vectors of the sums of squares and cross-products (-a=scp), covariance (-a=cov) or correlation (-a=cor) matrix. The left heterogeneous vectors are sorted in descending order of the associated singular values. These vectors are scaled such that they give the scalar products (-a=scp), covariances (-a=cov) or correlations (-a=cor) between the original observed time series in the left field and the right SV time series stored in the other output NetCDF file. The left heterogeneous vectors are packed in a tridimensional variable whose spatial dimensions (nlat, nlon) are exactly the same as those of the input NetCDF variable netcdf_variable, even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variable is filled with missing values.
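The homogeneous and heterogeneous vectors can be illustrated with a similar NumPy sketch, here for the covariance option (-a=cov) only and without surface weighting. The function name svd_expansions is hypothetical, and defining the SV time series as projections of the centered fields on their singular vectors follows the textbook MCA definition [Bretherton_etal]; it is an assumption, not a transcription of the comp_svd_3d source.

```python
import numpy as np


def svd_expansions(x, y, k):
    """Hypothetical sketch (-a=cov, unweighted) returning the k leading
    singular values, the standardized left/right SV time series and the
    left homogeneous and heterogeneous vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ntime = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    u, s, vt = np.linalg.svd(xc.T @ yc / ntime, full_matrices=False)
    u, v = u[:, :k], vt[:k].T
    # SV time series: projection of each field on its singular vectors,
    # then standardized to unit variance (population std, as for /ntime).
    a = xc @ u
    b = yc @ v
    a = a / a.std(axis=0)
    b = b / b.std(axis=0)
    # Covariances between each left grid-point series and the SV series.
    hom = xc.T @ a / ntime      # left field vs left SV time series
    het = xc.T @ b / ntime      # left field vs right SV time series
    return s[:k], a, b, hom, het
```

In this formulation, each heterogeneous vector equals C v_k / std(b_k) = s_k u_k / std(b_k), so it is proportional to the corresponding left singular vector, whereas a homogeneous vector is proportional to the left-field covariance matrix applied to u_k and generally differs in shape from u_k.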
netcdf_variable_sv
(ntime,number_of_sing_triplets)
: the left SV time series sorted in descending order of the singular values. The SV time series are always standardized to unit variance.
netcdf_variable_SV_STDs
(number_of_sing_triplets)
: the standard deviations of the left SV time series sorted in descending order of the singular values.
netcdf_variable_SV_EXPVARs
(number_of_sing_triplets)
: the proportion of variance of the left field explained by the left SV time series, sorted in descending order of the singular values.
netcdf_variable_CORSV
(number_of_sing_triplets)
: the symmetric correlation matrix between the left SV time series; only the upper triangle of this symmetric matrix is written in the output file.
netcdf_variable_RAW_VAR
(1)
: the raw variance (or inertia if -a=scp) of the left field averaged over the selected domain.
Sing_vals
(number_of_sing_triplets)
: the singular values of the sums of squares and cross-products (if -a=scp), covariance (if -a=cov) or correlation (if -a=cor) matrix between the left and right fields, sorted in decreasing order.
SCFs
(number_of_sing_triplets)
: the Squared Covariance Fractions (SCF) described by each of the leading singular triplets of the sums of squares and cross-products (if -a=scp), covariance (if -a=cov) or correlation (if -a=cor) matrix between the left and right fields.
NCs
(number_of_sing_triplets)
: the Normalized root-mean-square Covariance (NC) coefficients associated with each of the leading singular triplets of the sums of squares and cross-products (if -a=scp), covariance (if -a=cov) or correlation (if -a=cor) matrix between the left and right fields.
Corrs
(number_of_sing_triplets)
: the correlation coefficients between the corresponding left and right SV time series.
SCF_stat_sign
(number_of_sing_triplets)
: the critical probabilities of the Squared Covariance Fractions (SCF) of each of the leading singular triplets, estimated by a moving block bootstrap procedure. Large values indicate significant SCF coefficients. This NetCDF variable is computed and stored only if one of the -nb=, -bl=, -bp= or -cb= arguments is specified when calling the procedure.
NC_stat_sign
(number_of_sing_triplets)
: the critical probabilities of the Normalized root-mean-square Covariance (NC) coefficients of each of the leading singular triplets, estimated by a moving block bootstrap procedure. Large values indicate significant NC coefficients. This NetCDF variable is computed and stored only if one of the -nb=, -bl=, -bp= or -cb= arguments is specified when calling the procedure.
Corr_stat_sign
(number_of_sing_triplets)
: the critical probabilities of the correlation coefficients between the corresponding left and right SV time series, estimated by a moving block bootstrap procedure. Large values indicate significant correlations between the corresponding left and right SV time series. This NetCDF variable is computed and stored only if -cb=vector is specified when calling the procedure.
The output_svd_netcdf_file_right_field contains the singular values and right singular vectors of the covariance matrix, the right homogeneous and heterogeneous vectors, and the right SV time series of the SVD analysis. The number of SV time series, regression vectors, singular vectors and singular values stored in this output NetCDF dataset is also determined by the -n=number_of_sing_triplets argument. This output NetCDF dataset contains exactly the same NetCDF variables as the first output NetCDF file, but for the statistics of the right field instead of the left field. Refer to the description above for the content and definition of the NetCDF variables in the file output_svd_netcdf_file_right_field.
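The SCFs, Corrs and netcdf_variable_CORSV statistics described above can likewise be sketched with NumPy, again for -a=cov without surface weighting. The function name svd_summary is hypothetical. The SCF formula is the standard one of [Bretherton_etal]: each squared singular value divided by the sum of all squared singular values (the squared Frobenius norm of the covariance matrix). Keeping the diagonal in the packed upper triangle of CORSV is an assumption, since the exact packing used by comp_svd_3d is not documented here. The NC coefficients and the bootstrap significance levels are not reproduced.

```python
import numpy as np


def svd_summary(x, y, k):
    """Hypothetical sketch (-a=cov, unweighted): SCFs of the k leading
    triplets, correlations between paired SV time series, and the packed
    upper triangle of the correlation matrix of the left SV time series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ntime = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    u, s, vt = np.linalg.svd(xc.T @ yc / ntime, full_matrices=False)
    # Squared covariance fraction over ALL singular values, not just k.
    scf = s[:k] ** 2 / np.sum(s ** 2)
    a = xc @ u[:, :k]           # left SV time series (unstandardized)
    b = yc @ vt[:k].T           # right SV time series (unstandardized)
    # Correlation between each left SV time series and its right partner.
    corrs = np.array([np.corrcoef(a[:, j], b[:, j])[0, 1] for j in range(k)])
    # Upper triangle (diagonal included, by assumption) of the k x k
    # correlation matrix between the left SV time series.
    corsv = np.corrcoef(a, rowvar=False)[np.triu_indices(k)]
    return scf, corrs, corsv
```

Because cov(a_k, b_k) = u_k^T C v_k = s_k, the correlation between paired SV time series is non-negative for every mode. The off-diagonal CORSV entries are generally non-zero because, unlike the principal components of an EOF analysis, the left SV time series need not be mutually uncorrelated.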
Examples
For computing an SVD analysis from the two NetCDF variables sst and slp, stored respectively in the NetCDF files HadISST1_1m_1979_2005_sst.nc and hadslp_1m_1979_2005_slp.nc, use the following command (note that the analysis is done on the anomalies after removing the annual cycle for each variable since -a=cov is specified):
$ comp_svd_3d \
      -a=cov \
      -n=5 \
      -f=HadISST1_1m_1979_2005_sst.nc \
      -v=sst \
      -c=clim_HadISST1_1m_1979_2005_sst.nc \
      -x=111,330 \
      -y=101,140 \
      -m=mask_HadISST1_sst.nc \
      -o=svd_HadISST1_1m_1979_2005_sst_oiatl.nc \
      -f2=hadslp_1m_1979_2005_slp.nc \
      -v2=slp \
      -c=clim_hadslp_1m_1979_2005_slp.nc \
      -x2=-14,31 \
      -y2=21,33 \
      -m=mask_hadslp_slp.nc \
      -o2=svd_hadslp_1m_1979_2005_slp_oiatl.nc