comp_project_eof_3d

Authors

Pascal Terray (LOCEAN/IPSL) and Eric Maisonnave (CERFACS)

Latest revision

29/05/2024

Purpose

Project a tridimensional NetCDF variable (or parts of it) extracted from a NetCDF dataset onto eigenvectors or singular vectors computed from a previous Empirical Orthogonal Function (EOF) or Singular Value Decomposition (SVD) analysis.

Using as input an EOF (or SVD) NetCDF file produced by comp_eof_3d, comp_eof_miss_3d or comp_svd_3d, this procedure computes the projection of a given tridimensional variable extracted from another NetCDF dataset onto the orthonormal basis formed by the eigenvectors or singular vectors of the EOF (or SVD) analysis.

The procedure first reshapes the selected time steps of the input tridimensional NetCDF variable into an ntime-by-nv rectangular matrix, X, of observed variables (i.e., the selected cells of the 2-D grid-mesh associated with the tridimensional NetCDF variable). It applies the same repacking to the selected eigenvectors or singular vectors, which must have been computed on exactly the same selected cells of the 2-D grid-mesh. It then computes the projections of the selected time steps onto the selected eigenvectors (or singular vectors) with the following matrix product:

A = X.transpose(B)

where

  • A is the ntime-by-k matrix of the k selected principal component (or singular variable) time series to be computed
  • B is the k-by-nv matrix of the k eigenvectors or singular vectors (stored rowwise) read from an input NetCDF file produced by comp_eof_3d, comp_eof_miss_3d or comp_svd_3d.
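As an illustration only, the repacking and projection steps described above can be sketched in plain Python. This is not NCSTAT's implementation: the nested-list layout field[t][j][i] (time, latitude, longitude), the helper names pack and project, and the optional per-cell weights used to mimic the -d=dist2 metric are all assumptions. The removal of duplicate ORCA points is not modelled; the one requirement the sketch does reflect is that X and B are packed with the same mask and in the same cell order.

```python
# Illustrative sketch only (not NCSTAT code): repack a 3-D field into an
# ntime x nv matrix and project it onto k patterns, A = X . transpose(B).


def pack(field, mask):
    """Repack field[t][j][i] into X[t][v], keeping cells where mask[j][i] is True.

    Cells are visited latitude by latitude, then longitude by longitude; any
    fixed order works as long as the data and the EOFs use the same one.
    """
    cells = [(j, i) for j in range(len(mask)) for i in range(len(mask[j])) if mask[j][i]]
    return [[field_t[j][i] for (j, i) in cells] for field_t in field]


def project(X, B, weights=None):
    """Return A[t][k] = sum_v X[t][v] * w[v] * B[k][v] (ntime x k).

    weights=None gives the identity metric (-d=ident); a list of per-cell
    weights is an assumed stand-in for the diagonal -d=dist2 metric.
    """
    nv = len(B[0])
    w = [1.0] * nv if weights is None else weights
    for row in X:
        if len(row) != nv:
            raise ValueError("X and B must be packed on the same cells")
    return [[sum(x[v] * w[v] * b[v] for v in range(nv)) for b in B] for x in X]


# Tiny example: 2 time steps on a 2 x 2 grid with one masked (land) cell.
mask = [[True, False],
        [True, True]]
field = [[[1.0, 9.0], [2.0, 3.0]],      # t = 1
         [[0.0, 9.0], [1.0, -1.0]]]     # t = 2
eofs = [[[1.0, 0.0], [0.0, 0.0]],       # pattern 1
        [[0.0, 0.0], [0.6, 0.8]]]       # pattern 2 (unit norm on the 3 cells)

X = pack(field, mask)    # ntime x nv = 2 x 3
B = pack(eofs, mask)     # k x nv = 2 x 3, one pattern per row
A = project(X, B)        # ntime x k = 2 x 2
print(A)
```

Passing weights=None corresponds to the identity metric of -d=ident; how NCSTAT actually builds and normalizes the dist2 weights from the scale factors in the mesh-mask file is not specified here.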

If the NetCDF variable is four-dimensional, use comp_project_eof_4d instead of comp_project_eof_3d.

An output NetCDF dataset containing the expansion coefficients for the selected time steps of the Principal Component (PC) or Singular Variable (SV) time series is created.

This procedure is parallelized if OpenMP is used.

Further Details

Usage

$ comp_project_eof_3d \
  -f=input_netcdf_file \
  -v=netcdf_variable \
  -fe=eof_netcdf_file \
  -ve=eof_netcdf_variable \
  -m=input_mesh_mask_netcdf_file \
  -se=selected_eofs                 (optional) \
  -g=grid_type                      (optional : n, t, u, v, w, f) \
  -r=resolution                     (optional : r2, r4) \
  -b=nlon_orca,nlat_orca            (optional) \
  -x=lon1,lon2                      (optional) \
  -y=lat1,lat2                      (optional) \
  -t=time1,time2                    (optional) \
  -l=selected_time_period           (optional) \
  -a=type_of_analysis               (optional : scp, cov, cor) \
  -c=input_climatology_netcdf_file  (optional) \
  -d=distance                       (optional : dist2, ident) \
  -o=output_pc_netcdf_file          (optional) \
  -normpc                           (optional) \
  -svd                              (optional) \
  -double                           (optional) \
  -bigfile                          (optional) \
  -hdf5                             (optional) \
  -tlimited                         (optional)

By default

-se=
all the eigenvectors or singular vectors stored in the eof_netcdf_file
-g=
the grid_type is set to n which means that the 2-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian
-r=
if the input netcdf_variable is from the NEMO model (i.e., if the -g= argument is not set to n), the resolution is assumed to be r2
-b=
if -g= is not set to n, the dimensions of the 2-D grid-mesh, nlon_orca and nlat_orca, are determined from the -r= argument. However, you may override this default choice with the -b= argument
-x=
the whole longitude domain associated with the netcdf_variable
-y=
the whole latitude domain associated with the netcdf_variable
-t=
the whole time period associated with the netcdf_variable
-l=
all the time steps from time1 to time2 as specified in the -t= argument
-a=
the type_of_analysis is set to scp. This means that the projection onto the eigenvectors is done on the raw data without centering or standardizing the input time series
-c=
an input_climatology_netcdf_file is not needed if the type_of_analysis is set to scp
-d=
the distance is set to dist2. This means that the scalar products used to compute the projections are computed with the diagonal metric defined by the 2-D grid-mesh of the input NetCDF variable
-o=
the output_pc_netcdf_file is named proj_netcdf_variable.nc
-normpc
the computed PC (or SV) time series are not normalized in the output NetCDF file. If -normpc is activated, the computed PC (or SV) time series are normalized in the output NetCDF file
-svd
the eof_netcdf_file is assumed to have been produced by comp_eof_3d or comp_eof_miss_3d. If -svd is activated, a file produced by comp_svd_3d is assumed, which means that the projection is done onto the singular vectors of a previous SVD analysis
-double
the results of the projection analysis are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-bigfile
a NetCDF classic format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classic format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file
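The effect of the -a= and -c= arguments can be sketched as follows. This is a hypothetical helper, not part of NCSTAT: it assumes the climatology is available as per-period means and standard deviations (clim_mean, clim_std, each with nclim rows of nv values) that repeat cyclically, and that row 0 of X corresponds to the first period of the climatology, as Remark 13 requires. X is assumed to hold all the time steps from time1 to time2, before any -l= selection.

```python
# Hypothetical sketch (not NCSTAT code) of the -a= preprocessing step.


def preprocess(X, analysis="scp", clim_mean=None, clim_std=None):
    """Return the matrix that is actually projected for -a=scp, cov or cor.

    X is ntime x nv; clim_mean and clim_std are nclim x nv and are assumed
    to repeat cyclically, with X[0] matching the first climatology period.
    """
    if analysis == "scp":
        return [list(row) for row in X]        # raw data, left unchanged
    if analysis not in ("cov", "cor"):
        raise ValueError("analysis must be 'scp', 'cov' or 'cor'")
    if clim_mean is None or (analysis == "cor" and clim_std is None):
        raise ValueError("-a=cov and -a=cor need a climatology (-c=)")
    nclim = len(clim_mean)
    out = []
    for t, row in enumerate(X):
        p = t % nclim                           # position in the climatological cycle
        anom = [x - m for x, m in zip(row, clim_mean[p])]
        if analysis == "cor":
            anom = [a / s for a, s in zip(anom, clim_std[p])]
        out.append(anom)
    return out


# Example: 3 time steps, 2 cells, a climatology with 2 periods.
X = [[10.0, 4.0], [12.0, 6.0], [11.0, 8.0]]
clim_mean = [[9.0, 5.0], [10.0, 4.0]]
clim_std = [[1.0, 2.0], [2.0, 2.0]]

anomalies = preprocess(X, "cov", clim_mean)
standardized = preprocess(X, "cor", clim_mean, clim_std)
print(anomalies)      # [[1.0, -1.0], [2.0, 2.0], [2.0, 3.0]]
print(standardized)   # [[1.0, -0.5], [1.0, 1.0], [2.0, 1.5]]
```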

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable for which a projection analysis must be computed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.

  2. The -ve=eof_netcdf_variable argument specifies the NetCDF variable for which an EOF (or SVD) analysis was originally computed by comp_eof_3d (or comp_svd_3d), and the -fe=eof_netcdf_file argument specifies that the resulting EOF (or SVD) patterns must be extracted from the NetCDF file, eof_netcdf_file. This NetCDF file must have exactly the same format as the files produced by comp_eof_3d or comp_svd_3d. These EOF (or SVD) patterns will be used to compute the projections of the netcdf_variable specified by the -v= argument.

  3. The -m=input_mesh_mask_netcdf_file argument specifies the land-sea mask to apply to the netcdf_variable when transforming this tridimensional NetCDF variable into a rectangular matrix before computing the EOF (or SVD) projection. The same land-sea mask is assumed to apply to the eof_netcdf_variable, which contains the eigenvectors (or singular vectors) of the EOF (or SVD) analysis.

    The scale factors associated with the 2-D grid-mesh of these NetCDF variables (needed if -d=dist2 is specified when calling the procedure) are also read from the input_mesh_mask_netcdf_file.

  4. The -se=selected_eofs argument allows the user to select the eigenvectors (or singular vectors) which must be included in the projection analysis. The list of selected vectors may be given in two formats:

    • -se=1,3,…,nn includes eof1, eof3, … and eofnn in the EOF (or SVD) projection
    • -se=1:4 includes eof1 through eof4 in the EOF (or SVD) projection.

    The two forms of the -se= argument may be combined and repeated any number of times. Duplicate EOF or SVD numbers are not allowed. If the -se= argument is not specified, all the eigenvectors (or singular vectors) stored in the eof_netcdf_file are used in the projection analysis.

  5. If -g= is set to t, u, v, w or f it is assumed that the NetCDF variables are from an experiment with the NEMO model (ORCA configuration and R2, R4 or R05 resolutions). In this case, the duplicate points from the ORCA grid are removed before the EOF (or SVD) projection, as far as possible, and, in particular, if the 2-D grid-mesh of the input NetCDF variables covers the whole globe.

    If -g= is set to n, it is assumed that the 2-D grid-mesh is regular or Gaussian and as such has no duplicate points.

    The -g= argument is also used to determine the names of the NetCDF variables which contain the 2-D mesh-mask and the scale factors in the input_mesh_mask_netcdf_file (i.e., these variables are named grid_typemask, e1grid_type and e2grid_type, respectively). This input_mesh_mask_netcdf_file may be created by comp_clim_3d if the 2-D grid-mesh is regular or Gaussian.

  6. If -g= is set to t, u, v, w or f (i.e., if the input NetCDF variables are from an experiment with the NEMO model), the -r= argument gives the resolution used. If:

    • -r=r2 the NetCDF variables are from an experiment with the ORCA R2 configuration
    • -r=r4 the NetCDF variables are from an experiment with the ORCA R4 configuration.

  7. If the NetCDF variables are from an experiment with the NEMO model, but the resolution is not R2 or R4, the dimensions of the ORCA grid must be specified explicitly with the -b= argument.

  8. If the -x=lon1,lon2 and -y=lat1,lat2 arguments are not specified, the geographical domain used in the EOF projection is determined from the attributes of the input mesh-mask NetCDF variable named grid_typemask (i.e., lon1_Eastern_limit, lon2_Western_limit, lat1_Southern_limit and lat2_Northern_limit), which is read from the input NetCDF file input_mesh_mask_netcdf_file. If these attributes are missing, the whole geographical domain associated with the netcdf_variable is used in the EOF projection. These arguments should normally be set to the same values as those used in the original EOF analysis, whose results are stored in the input NetCDF file, eof_netcdf_file.

    The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

    Refer to comp_mask_3d to transform geographical coordinates into indices or to generate an appropriate mesh-mask before using comp_project_eof_3d.

  9. If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used in the projection analysis.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the output NetCDF file will have ntime = time2 - time1 + 1 observations if the -l= argument is missing.

  10. The -l= argument lists the indices of the time steps which must be included in the output file. The indices of the time steps are counted from the start of the (selected) time period (i.e., time1 in the -t=time1,time2 argument, or 1 if this argument is not used). The list may be specified in two formats:

    • -l=n1,n2,…,nn selects time steps n1, n2, … and nn
    • -l=n1:n2 selects time steps n1 through n2.

    The two forms of the -l= argument may be combined and repeated any number of times, but duplicate time steps are not allowed. Be careful with time period limits when specifying the -l= argument.

  11. The -a= argument specifies whether the observed variables have to be centered or standardized with an input climatology (specified with the -c= argument) before the projection analysis:

    • -a=scp means that the projection analysis must be done on the raw data
    • -a=cov means that the projection analysis must be done on the anomalies
    • -a=cor means that the projection analysis must be done on the standardized anomalies.

  12. The input_climatology_netcdf_file is needed only if -a=cov or -a=cor.

  13. If -a=cov or -a=cor, the selected time period must agree with the climatology. This means that the first selected time observation (time1 if the -t= argument is present) must correspond to the first day, month or season of the climatology specified with the -c= argument.

  14. The geographical shapes of the netcdf_variable (in the input_netcdf_file), the eof_netcdf_variable (in the eof_netcdf_file), the climatology (in the input_climatology_netcdf_file), the land-sea mask and the scale factors (in the input_mesh_mask_netcdf_file) must agree.

  15. The -d= argument specifies the metric and scalar product used in the EOF (or SVD) projection:

    • -d=dist2 means that the projection is done with the diagonal distance associated with the horizontal 2-D grid-mesh (i.e., each grid point is weighted according to the area associated with it)
    • -d=ident means that the projection is done with the identity metric: the usual Euclidean distance and scalar product are used in the EOF (or SVD) projection.

  16. The -normpc argument specifies that the computed PC (or SV) time series must be normalized, i.e., multiplied by the reciprocal of the singular value of the associated EOF pattern stored in the eof_netcdf_file (or by the reciprocal of the previously computed standard deviation of the SV time series stored in the eof_netcdf_file if the -svd argument is used).

  17. The -svd argument specifies that the eof_netcdf_file is produced by comp_svd_3d instead of comp_eof_3d. This means that the projection will be done onto the singular vectors of a previous SVD analysis computed by comp_svd_3d.

  18. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file.

    By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  19. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g., -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher.

    If this argument is specified, the output_pc_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized by the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

  20. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g., -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher.

    If this argument is specified, the output_pc_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized by the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  21. It is assumed that the data have no missing values except those associated with a constant land-sea mask.

  22. Duplicate parameters are allowed, but it is always the last occurrence of a parameter that is used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  23. For more details on EOF or SVD analysis in the climate literature, see

    • “A manual for EOF and SVD analyses of climate data”, by Bjornsson, H., and Venegas, S.A., McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52pp, 1997. https://www.jsg.utexas.edu/fu/files/EOFSVD.pdf
    • “Statistical Analysis in Climate Research”, by von Storch, H., and Zwiers, F.W., Cambridge University Press, Cambridge, UK, Chapter 13, 484 pp., 2002. ISBN: 9780521012300
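The index syntax of the -se=, -l= and -x= arguments (Remarks 4, 8 and 10) can be sketched with the hypothetical helpers below. They are not NCSTAT's parser and their names are illustrative; this documentation does not say whether NCSTAT keeps or sorts the order in which -se= indices are given, so the sketch simply keeps the input order. All indices are 1-based, as in the documentation.

```python
# Hypothetical helpers (not NCSTAT's parser) for the index syntax of
# -se=, -l= and -x=. All indices are 1-based, as in the documentation.


def parse_index_list(spec):
    """Parse lists such as '1,3,5:8' into [1, 3, 5, 6, 7, 8]; no duplicates."""
    indices = []
    for item in spec.split(","):
        if ":" in item:
            first, last = (int(v) for v in item.split(":"))
            if first > last:
                raise ValueError(f"empty range {item!r}")
            indices.extend(range(first, last + 1))
        else:
            indices.append(int(item))
    if len(indices) != len(set(indices)):
        raise ValueError(f"duplicate indices in {spec!r}")
    return indices


def absolute_time_steps(l_spec, time1, time2):
    """Map -l= indices, counted from time1, to absolute time steps."""
    ntime = time2 - time1 + 1
    steps = parse_index_list(l_spec)
    if any(n < 1 or n > ntime for n in steps):
        raise ValueError("-l= index outside the -t=time1,time2 period")
    return [time1 + n - 1 for n in steps]


def longitude_indices(lon1, lon2, nlon):
    """Longitude indices for -x=lon1,lon2; a negative lon1 wraps around."""
    if lon1 < 0:
        # periodic grid: nlon+lon1+1, ..., nlon, then 1, ..., lon2
        return list(range(nlon + lon1 + 1, nlon + 1)) + list(range(1, lon2 + 1))
    return list(range(lon1, lon2 + 1))


print(parse_index_list("1,3,5:8"))           # -se=1,3,5:8
print(absolute_time_steps("1,3:4", 10, 20))  # -t=10,20 -l=1,3:4
print(longitude_indices(-2, 3, 10))          # -x=-2,3 on a 10-point grid
```

The last call illustrates Remark 8: with nlon = 10 and lon1 = -2, the selected longitudes start at nlon+lon1+1 = 9 and wrap around to lon2 = 3.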

Outputs

comp_project_eof_3d creates an output NetCDF file that contains the new principal component (or singular variable) time series computed from the eigenvectors (or singular vectors) of a previous EOF (or SVD) analysis. The number of principal components stored in the output NetCDF dataset is determined by the -se=selected_eofs argument. The number of observations in the output NetCDF dataset is determined by the -t=time1,time2 and -l=selected_time_period arguments. The output NetCDF dataset contains the following NetCDF variable:

  1. netcdf_variable_pc(ntime,number_of_eofs) : the new principal component time series corresponding to the projection onto the selected eigenvectors or singular vectors.

    The new principal component time series are standardized with the standard-deviations estimated from the previous EOF (or SVD) analysis if the -normpc argument is specified.
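The -normpc scaling (Remark 16) amounts to dividing each projected time series by a per-EOF scale factor. The hypothetical helper below assumes those factors (singular values, or SV standard deviations with -svd) have already been read from the eof_netcdf_file into a plain list; the NetCDF variable that stores them is not named in this documentation, so reading it is not shown.

```python
# Hypothetical sketch (not NCSTAT code) of the -normpc scaling.


def normalize_pcs(A, scale):
    """Divide column k of the ntime x k matrix A by scale[k].

    scale holds one positive value per selected EOF: its singular value,
    or the SV time-series standard deviation when -svd is used.
    """
    if any(s <= 0 for s in scale):
        raise ValueError("scale factors must be positive")
    return [[a / s for a, s in zip(row, scale)] for row in A]


A = [[2.0, -1.0],
     [4.0, 0.5]]
print(normalize_pcs(A, [2.0, 0.5]))   # [[1.0, -2.0], [2.0, 1.0]]
```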

Examples

  1. To compute a 10-EOF projection of a NetCDF variable named sst in the NetCDF file ersst_2m_197902_200501_sst_oi.nc onto the first 10 EOF patterns of a previous EOF analysis stored in the file named eof_HadISST1_2m_197902_200501_sst_oi.nc, and to store the results in a NetCDF file named ersst_2m_197902_200501_sst_oi_10pc.nc, use the following command:

    $ comp_project_eof_3d \
      -f=ersst_2m_197902_200501_sst_oi.nc  \
      -v=sst \
      -fe=eof_HadISST1_2m_197902_200501_sst_oi.nc  \
      -ve=sst \
      -se=1:10 \
      -m=ersst_mask.nc \
      -o=ersst_2m_197902_200501_sst_oi_10pc.nc
    