comp_invert_eof_3d

Authors

Eric Maisonnave (CERFACS) and Pascal Terray (LOCEAN/IPSL)

Latest revision

13/09/2018

Purpose

Approximate a tridimensional NetCDF variable (or parts of it) from its Empirical Orthogonal Function (EOF) decomposition or from results of a Singular Value Decomposition (SVD) analysis.

Using as input a NetCDF file produced by comp_eof_3d, comp_eof_miss_3d or comp_svd_3d, this procedure computes an approximation of a tridimensional NetCDF variable, packed as an ntime by nv rectangular matrix, X, of observed variables (e.g. the selected cells of the 2-D grid-mesh associated with the tridimensional NetCDF variable), of the form:

X = AB + E

where

  • A is an ntime by k matrix of k selected principal component time series
  • B is the k by nv matrix of the k associated eigenvectors (stored rowwise)
  • E is an ntime by nv matrix of residuals.

If the selected principal components are the first k when the principal components are sorted in descending order of the eigenvalues, the matrix product AB is a least-squares solution to the problem of minimizing the sum of all the squared elements of E. In other words, the first k principal components of X are the best linear predictors of the observed variables among all possible sets of k variables.
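
The decomposition X = AB + E can be illustrated with a small NumPy sketch. It is not NCSTAT code: the data are synthetic, and the scaling A = US, B = V^T taken from the SVD of X is an assumption made for illustration (the normalization used in the files written by comp_eof_3d is not described here). It checks that keeping the first k components leaves a residual sum of squares equal to the sum of the discarded eigenvalues, and that another choice of k components does no better.

```python
import numpy as np

rng = np.random.default_rng(0)
ntime, nv, k = 50, 8, 3

# Synthetic ntime by nv data matrix (raw data, as in an "scp" analysis).
X = rng.standard_normal((ntime, nv))

# X = U diag(s) Vt, with the singular values s sorted in descending order.
# The eigenvalues of the cross-products matrix X^T X are s**2.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

A = U[:, :k] * s[:k]   # ntime by k principal component time series
B = Vt[:k, :]          # k by nv eigenvectors, stored rowwise
E = X - A @ B          # ntime by nv residuals

sse = float(np.sum(E ** 2))            # sum of squared residuals
discarded = float(np.sum(s[k:] ** 2))  # sum of the discarded eigenvalues

# Same construction with the last k components instead of the first k.
A_last = U[:, -k:] * s[-k:]
B_last = Vt[-k:, :]
sse_last = float(np.sum((X - A_last @ B_last) ** 2))

print(A.shape, B.shape, E.shape)
print(np.isclose(sse, discarded), sse <= sse_last)
```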

This type of approximation can also be computed from the results of a previous SVD analysis if the argument -svd is activated. In this case, however, the computed approximation is not necessarily optimal in the least-squares sense.

If the NetCDF variable is four-dimensional, use comp_invert_eof_4d instead of comp_invert_eof_3d.

An output NetCDF dataset containing the matrix product AB repacked as a tridimensional variable is created.

Further Details

Usage

$ comp_invert_eof_3d \
  -f=input_eof_netcdf_file \
  -v=netcdf_variable \
  -se=selected_eofs                 (optional) \
  -x=lon1,lon2                      (optional) \
  -y=lat1,lat2                      (optional) \
  -t=time1,time2                    (optional) \
  -l=selected_time_period           (optional) \
  -a=type_of_analysis               (optional : scp, cov, cor) \
  -c=input_climatology_netcdf_file  (optional) \
  -o=output_netcdf_file             (optional) \
  -mi=missing_value                 (optional) \
  -svd                              (optional) \
  -double                           (optional) \
  -bigfile                          (optional) \
  -hdf5                             (optional) \
  -tlimited                         (optional)

By default

-se=
the selected principal components are those stored in the input_eof_netcdf_file. In other words, the NetCDF variable netcdf_variable is approximated with the number of EOFs (or SVDs) stored in the NetCDF file input_eof_netcdf_file
-x=
the whole longitude domain associated with the netcdf_variable
-y=
the whole latitude domain associated with the netcdf_variable
-t=
the whole time period associated with the netcdf_variable
-l=
the whole time period as determined by the -t= argument
-a=
the type_of_analysis is set to scp. This means that the eigenvectors (or singular vectors) and eigenvalues (or singular values) have been computed from the sums of squares and cross-products matrix between the observed variables if an EOF (or SVD) model is used
-c=
this argument is not used if the type_of_analysis is set to scp
-o=
the output_netcdf_file is named approx_netcdf_variable.nc
-mi=
the missing_value attribute in the output NetCDF file is set to 1.e+20
-svd
the input_eof_netcdf_file is assumed to be produced by comp_eof_3d or comp_eof_miss_3d. However, if -svd is activated, a file produced by comp_svd_3d is assumed. This means that the approximation is computed from a set of singular vectors and singular variables of a previous SVD analysis
-double
the results of the EOF analysis are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-bigfile
a NetCDF classic format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classic format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable which must be approximated from its EOF decomposition or a previous SVD analysis. The EOF (or SVD) model is extracted from the NetCDF file input_eof_netcdf_file specified by the -f= argument. This NetCDF file must have exactly the same format as the files produced by comp_eof_3d or comp_svd_3d.

  2. The -se= argument allows the user to select the EOFs (or SVDs) which must be included in the approximation model. The EOFs (or SVDs) list may be given in two formats:

    • -se=1,3,...,nn includes eof1, eof3, … and eofnn in the EOF (or SVD) model
    • -se=1:4 includes eof1 to eof4 in the EOF (or SVD) model.

    The two forms of the -se= argument may be combined and repeated any number of times. Duplicate EOF (or SVD) numbers are not allowed. If the -se= argument is not specified, the NetCDF variable is approximated with the number of EOFs (or SVDs) stored in the input_eof_netcdf_file.

  3. If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the whole geographical domain associated with the netcdf_variable is approximated by the selected EOF (or SVD) model (as specified by the -se= argument).

    The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

    Refer to comp_mask_3d for transforming geographical coordinates into indices before using comp_invert_eof_3d.

  4. If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is approximated by the selected EOF (or SVD) model.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the output NetCDF file will have ntime = time2 - time1 + 1 observations if the -l= argument is missing.

  5. The -l= argument lists the indices of the time steps which must be included in the output file. The indices of the time steps are counted from the start of the (selected) time period (e.g. time1 in the -t=time1,time2 argument or 1 if this argument is not used). The list may be specified in two formats:

    • -l=n1,n2,…,nn selects the time steps n1, n2, … and nn
    • -l=n1:n2 selects the time steps from n1 to n2.

    Be careful with the time period limits when specifying the -l= argument list.

    The two forms of the -l= argument may be combined and repeated any number of times. Duplicate time steps are not allowed.

  6. The -a= argument specifies if the observed variables have been centered or standardized with an input climatology (specified with the -c= argument) before the EOF (or SVD) analysis:

    • -a=scp means that the EOF (or SVD) analysis was done on the raw data
    • -a=cov means that the EOF (or SVD) analysis was done on the anomalies
    • -a=cor means that the EOF (or SVD) analysis was done on the standardized anomalies.

    In all cases, the raw data are approximated if the -a= argument is used.

  7. The input_climatology_netcdf_file is needed only if -a=cov or -a=cor.

  8. If -a=cov or -a=cor, the selected time period must agree with the climatology. This means that the first selected time observation (time1 if the -t= argument is present) must correspond to the first day, month, season of the climatology specified with the -c= argument.

  9. The -svd argument specifies that the input_eof_netcdf_file is produced by comp_svd_3d instead of comp_eof_3d. This means that the approximation will be computed from the singular vectors and singular variables of a previous SVD analysis stored in a file produced by comp_svd_3d.

  10. The geographical shapes of the netcdf_variable (in the input_eof_netcdf_file) and the climatology (in the input_climatology_netcdf_file) must agree.

  11. The -mi=missing_value argument specifies the missing value indicator associated with the NetCDF variable netcdf_variable in the NetCDF file output_netcdf_file. If the -mi= argument is not specified, missing_value is set to 1.e+20.

  12. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file.

    By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  13. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g. -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher; otherwise the argument is not recognized by the procedure. If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file.

  14. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher; otherwise the argument is not recognized by the procedure. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file.

  15. Duplicate parameters are allowed, but it is always the last occurrence of a parameter that is used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  16. For more details on EOF or SVD analysis in the climate literature, see

    • “A manual for EOF and SVD analyses of climate data”, by Bjornsson, H., and Venegas, S.A., McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52pp, 1997. https://www.jsg.utexas.edu/fu/files/EOFSVD.pdf
    • “Statistical Analysis in Climate Research”, by von Storch, H., and Zwiers, F.W., Cambridge University Press, Cambridge, UK, Chapter 13, 484 pp., 2002. ISBN: 9780521012300
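
The list syntax shared by the -se= and -l= arguments (Remarks 2 and 5) can be sketched in Python as follows. The helpers parse_index_list and absolute_time_steps are hypothetical names, not part of NCSTAT. The sketch assumes that the two forms are combined with commas (e.g. 1,3,5:7), and it chooses to reject descending ranges and time steps beyond time2; the actual procedure may behave differently in those cases.

```python
def parse_index_list(spec):
    """Expand a list such as "1,3,5:7" into [1, 3, 5, 6, 7].

    Hypothetical helper: it mimics the two documented forms (single values
    and n1:n2 ranges) and rejects duplicates. It is not NCSTAT code.
    """
    indices = []
    for token in spec.split(","):
        token = token.strip()
        if ":" in token:
            first, last = (int(part) for part in token.split(":"))
            if first > last:
                raise ValueError(f"descending range: {token}")
            values = range(first, last + 1)
        else:
            values = [int(token)]
        for value in values:
            if value < 1:
                raise ValueError(f"indices are relative to 1: {value}")
            if value in indices:
                raise ValueError(f"duplicate index: {value}")
            indices.append(value)
    return indices


def absolute_time_steps(spec, time1, time2):
    """Map -l= indices, counted from time1, to indices in the input file."""
    steps = [time1 + n - 1 for n in parse_index_list(spec)]
    if any(step > time2 for step in steps):
        raise ValueError("time step outside the period selected with -t=")
    return steps


print(parse_index_list("1,3,5:7"))            # [1, 3, 5, 6, 7]
print(absolute_time_steps("1:3,12", 25, 60))  # [25, 26, 27, 36]
```

With -t=25,60 and -l=1:3,12, the selected time steps are thus the 25th, 26th, 27th and 36th observations of the input variable.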
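
The rule for a negative lon1 on a periodic grid (Remark 3) can be illustrated with a short Python sketch. The function longitude_indices is a hypothetical helper, not part of NCSTAT. The case lon1 = 0 is not covered by the remark and is rejected here, as is a wrapped domain that would overlap itself; both choices are assumptions of the sketch.

```python
def longitude_indices(lon1, lon2, nlon):
    """Return the 1-based longitude indices selected by -x=lon1,lon2.

    Hypothetical helper illustrating Remark 3. It is not NCSTAT code.
    """
    if lon1 == 0 or not 1 <= lon2 <= nlon:
        raise ValueError("indices are relative to 1")
    if lon1 > 0:
        if lon1 > lon2:
            raise ValueError("lon1 must not exceed lon2")
        return list(range(lon1, lon2 + 1))
    start = nlon + lon1 + 1  # first selected index when lon1 is negative
    if start < 1 or start <= lon2:
        raise ValueError("wrapped domain is out of range or overlaps itself")
    # Periodic grid: run from start to nlon, then continue from 1 to lon2.
    return list(range(start, nlon + 1)) + list(range(1, lon2 + 1))


idx = longitude_indices(-9, 20, 360)
print(idx[0], idx[-1], len(idx))  # 352 20 29
```

For a 360-point grid, -x=-9,20 therefore selects indices 352 to 360 followed by 1 to 20, that is lon2 - lon1 = 29 points.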
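
Remarks 6 to 8 imply that, for -a=cov or -a=cor, the product AB is expressed in anomaly units and must be converted back to raw data with the climatology. The Python sketch below shows one plausible way to do this; it is an assumption, not the NCSTAT implementation. It supposes that the climatology file provides a cyclic set of means (and standard deviations for cor), that the first selected time step matches the first climatology entry as required by Remark 8, and that no -l= selection is applied. The names to_raw, clim_mean and clim_std are hypothetical.

```python
def to_raw(approx, analysis, clim_mean=None, clim_std=None):
    """Convert an approximation AB into the units of the raw data.

    approx    : ntime rows of nv values (the product AB)
    analysis  : "scp", "cov" or "cor"
    clim_mean : nclim rows of nv climatological means (cov and cor)
    clim_std  : nclim rows of nv standard deviations (cor only)
    """
    if analysis == "scp":
        # The analysis was done on the raw data: nothing to add back.
        return [list(row) for row in approx]
    if analysis not in ("cov", "cor"):
        raise ValueError(f"unknown type of analysis: {analysis}")
    nclim = len(clim_mean)
    raw = []
    for t, row in enumerate(approx):
        c = t % nclim  # cyclic climatology, aligned on the first time step
        if analysis == "cov":
            raw.append([m + a for m, a in zip(clim_mean[c], row)])
        else:
            raw.append([m + s * a
                        for m, s, a in zip(clim_mean[c], clim_std[c], row)])
    return raw


ab = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]
mean = [[10.0, 20.0], [30.0, 40.0]]
std = [[2.0, 2.0], [4.0, 4.0]]
print(to_raw(ab, "cov", mean))
print(to_raw(ab, "cor", mean, std))
```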

Outputs

comp_invert_eof_3d creates an output NetCDF file that contains a least squares approximation of the input NetCDF variable computed from selected eigenvectors and principal components of an EOF analysis or from singular vectors and variables of an SVD analysis.

This least squares approximation is repacked as a tridimensional NetCDF variable with the same dimensions as the input NetCDF variable specified in the -v= argument.

This output NetCDF dataset contains the following NetCDF variable (in the description below, nlat and nlon are the lengths of the dimensions of the input NetCDF variable in the initial EOF or SVD analysis; ntime is the number of time steps selected with the -l= and -t= arguments):

  1. netcdf_variable(ntime,nlat,nlon) : a least squares approximation of the input NetCDF variable computed with the help of the selected EOF (or SVD) model.
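
The repacking of the ntime by nv product AB into a (ntime,nlat,nlon) variable can be pictured with the Python sketch below. It is only an illustration under stated assumptions: the function repack and the boolean mask are hypothetical stand-ins for the cell selection of the original analysis, the cells are assumed to be packed row by row, and the cells that were not part of the analysis are assumed to receive the missing value (1.e+20 by default, see the -mi= argument).

```python
MISSING = 1.0e20  # default value of the -mi= argument


def repack(ab, mask, missing=MISSING):
    """Repack ab (ntime rows of nv values) onto an nlat by nlon grid.

    mask is an nlat by nlon list of booleans, True for the nv cells used in
    the analysis. Hypothetical helper, not NCSTAT code.
    """
    nv = sum(cell for row in mask for cell in row)
    out = []
    for values in ab:
        if len(values) != nv:
            raise ValueError("each row of ab must hold nv values")
        it = iter(values)
        # Walk the grid row by row, consuming one packed value per used cell.
        out.append([[next(it) if cell else missing for cell in row]
                    for row in mask])
    return out


mask = [[True, False], [True, True]]  # nlat = 2, nlon = 2, nv = 3
ab = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # ntime = 2
print(repack(ab, mask))
```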

Examples

  1. To compute a 10-EOF approximation of a NetCDF variable named sst from the NetCDF file eof_HadISST1_2m_197902_200501_sst_oi.nc produced by comp_eof_3d, and to store the results in a NetCDF file named HadISST1_2m_197902_200501_sst_oi_10pc.nc, use the following command:

    $ comp_invert_eof_3d \
      -f=eof_HadISST1_2m_197902_200501_sst_oi.nc  \
      -v=sst \
      -a=scp \
      -se=1:10 \
      -o=HadISST1_2m_197902_200501_sst_oi_10pc.nc