comp_project_eof_4d¶
Authors¶
Pascal Terray (LOCEAN/IPSL) and Eric Maisonnave (CERFACS)
Latest revision¶
13/09/2018
Purpose¶
Project a four-dimensional NetCDF variable (or parts of it) extracted from a NetCDF dataset onto eigenvectors or singular vectors computed from a previous Empirical Orthogonal Function (EOF) or Singular Value Decomposition (SVD) analysis.
Using as input an EOF (or SVD) NetCDF file produced by comp_eof_4d or comp_svd_3d, this procedure computes the projection of a given four-dimensional variable extracted from another NetCDF dataset onto the orthonormal basis formed by the eigenvectors or singular vectors of the EOF (or SVD) analysis.
The procedure first transforms the selected time steps of the input four-dimensional NetCDF variable into an ntime by nv rectangular matrix, X, of observed variables (e.g. the selected cells of the 3-D grid-mesh associated with the variable). It then repacks the selected eigenvectors or singular vectors in the same way (they must have been computed on the same selected cells of the 3-D grid-mesh) and computes the projections of the selected time steps onto the selected eigenvectors (or singular vectors) by performing the following matrix product:
A = X.transpose(B)
where
- A is the ntime by k matrix of k selected principal components (or singular variables) time series to be computed
- B is the k by nv matrix of the k eigenvectors or singular vectors (stored rowwise) read from an input NetCDF file produced by comp_eof_4d or comp_svd_3d.
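The repacking and the matrix product described above can be sketched in plain Python. This is only an illustration under simple assumptions: the helper names `pack` and `project` and the toy data are hypothetical, no metric weighting is applied, and the actual procedure is not implemented this way.

```python
def pack(field, mask):
    """Flatten one time step (nested level/lat/lon lists) into a vector,
    keeping only the cells where the mask is true."""
    return [
        value
        for level, mask_level in zip(field, mask)
        for row, mask_row in zip(level, mask_level)
        for value, keep in zip(row, mask_row)
        if keep
    ]


def project(X, B):
    """Return A = X . transpose(B) for X (ntime x nv) and B (k x nv)."""
    return [
        [sum(x * b for x, b in zip(x_row, b_row)) for b_row in B]
        for x_row in X
    ]


# Hypothetical toy data: 2 time steps on a 1 x 2 x 2 grid with one masked cell.
mask = [[[True, True], [False, True]]]
steps = [
    [[[1.0, 2.0], [9.9, 3.0]]],
    [[[0.0, 1.0], [9.9, 0.0]]],
]
X = [pack(step, mask) for step in steps]   # ntime x nv = 2 x 3
B = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]     # k x nv = 2 x 3, orthonormal rows
A = project(X, B)                          # ntime x k = 2 x 2
print(A)
```

Each row of A holds the expansion coefficients of one selected time step on the k selected vectors.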
If the NetCDF variable is three-dimensional, use comp_project_eof_3d instead of comp_project_eof_4d.
An output NetCDF dataset containing the expansion coefficients for the selected time steps of the principal components (or singular variables) time series is created.
This procedure is parallelized if OpenMP is used.
Further Details¶
Usage¶
$ comp_project_eof_4d \
-f=input_netcdf_file \
-v=netcdf_variable \
-fe=eof_netcdf_file \
-ve=eof_netcdf_variable \
-m=input_mesh_mask_netcdf_file \
-se=selected_eof (optional) \
-g=grid_type (optional : n, t, u, v, w, f) \
-r=resolution (optional : r2, r4) \
-b=nlon_orca, nlat_orca (optional) \
-x=lon1,lon2 (optional) \
-y=lat1,lat2 (optional) \
-z=level1,level2 (optional) \
-t=time1,time2 (optional) \
-l=selected_time_period (optional) \
-a=type_of_analysis (optional : scp, cov, cor) \
-c=input_climatology_netcdf_file (optional) \
-d=type_of_distance (optional : dist2, dist3, ident) \
-o=output_pc_netcdf_file (optional) \
-normpc (optional) \
-svd (optional) \
-double (optional) \
-bigfile (optional) \
-hdf5 (optional) \
-tlimited (optional)
By default¶
- -se=
- all the eigenvectors or singular vectors stored in the eof_netcdf_file
- -g=
- the grid_type is set to n, which means that the 3-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian
- -r=
- if the input netcdf_variable is from the NEMO or ORCA model (e.g. if the -g= argument is not set to n), the resolution is assumed to be r2
- -b=
- if -g= is not set to n, the dimensions of the 3-D grid-mesh, nlon_orca, nlat_orca and nlevel_orca, are determined from the -r= argument. However, you may override this default choice with the -b= argument
- -x=
- the whole longitude domain associated with the netcdf_variable
- -y=
- the whole latitude domain associated with the netcdf_variable
- -z=
- the whole vertical resolution associated with the netcdf_variable
- -t=
- the whole time period associated with the netcdf_variable
- -l=
- all the time steps from time1 to time2 as specified in the -t= argument
- -a=
- the type_of_analysis is set to scp. This means that the projection onto the eigenvectors is done on the raw data without centering or standardizing the input time series
- -c=
- an input_climatology_netcdf_file is not needed if the type_of_analysis is set to scp
- -d=
- the type_of_distance is set to dist3. This means that the scalar products for computing the projections are computed with the diagonal metric associated with the 3-D grid-mesh of the input NetCDF variable
- -o=
- the output_pc_netcdf_file is named proj_netcdf_variable.nc
- -normpc
- the computed PC (or SV) time series are not normalized in the output NetCDF file. If -normpc is activated, the computed PC (or SV) time series are normalized in the output NetCDF file
- -svd
- the eof_netcdf_file is assumed to be produced by comp_eof_4d or comp_eof_miss_3d. If -svd is activated, a file produced by comp_svd_3d is assumed; this means that the projection is done onto the singular vectors of a previous SVD analysis.
- -double
- the results of the projection analysis are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
- -bigfile
- a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
- -hdf5
- a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
- -tlimited
- the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file
Remarks¶
The -v=netcdf_variable argument specifies the NetCDF variable for which a projection analysis must be computed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.
The -ve=eof_netcdf_variable argument specifies the NetCDF variable for which an EOF (or SVD) analysis was originally computed by comp_eof_4d (or comp_svd_3d) and the -fe=eof_netcdf_file argument specifies that the resulting EOF (or SVD) patterns must be extracted from the NetCDF file, eof_netcdf_file. This NetCDF file must have exactly the same format as the files produced by comp_eof_4d or comp_svd_3d. These EOF (or SVD) patterns will be used to compute the projections of the netcdf_variable specified by the -v= argument.
The argument -m=input_mesh_mask_netcdf_file specifies the land-sea mask to apply to the netcdf_variable for transforming this four-dimensional NetCDF variable into a rectangular matrix before computing the EOF (or SVD) projection. The same land-sea mask is assumed for, and applied to, the eof_netcdf_variable which contains the eigenvectors of the EOF (or SVD) analysis.
The scale factors associated with the 2-D grid-mesh of these NetCDF variables (needed if -d=dist2 is specified when calling the procedure) are also read from the input_mesh_mask_netcdf_file.
The -se= argument allows the user to select the eigenvectors (or singular vectors) which must be included in the projection analysis. The list of selected vectors may be given in two formats:
- -se=1,3,…,nn includes eof1, eof3, … and eofnn in the EOF (or SVD) projection
- -se=1:4 includes eof1 to eof4 in the EOF (or SVD) projection.
The two forms of the -se= argument may be combined and repeated any number of times. Duplicate EOF or SVD numbers are not allowed. If the -se= argument is not specified, all the eigenvectors (or singular vectors) stored in the eof_netcdf_file are used in the projection analysis.
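As an illustration of the two list formats and the no-duplicates rule, here is a minimal Python sketch. The function `parse_selection` is hypothetical, and it assumes that both forms can be mixed in a single comma-separated list; the real parser may accept or reject other spellings.

```python
def parse_selection(spec):
    """Expand a list such as "1,3,5:7" into [1, 3, 5, 6, 7].

    Raises ValueError on duplicates or on indices below 1.
    """
    selected = []
    for item in spec.split(","):
        if ":" in item:
            first, last = (int(part) for part in item.split(":"))
            indices = range(first, last + 1)
        else:
            indices = [int(item)]
        for index in indices:
            if index < 1:
                raise ValueError(f"index {index} is not relative to 1")
            if index in selected:
                raise ValueError(f"duplicate index {index} in {spec!r}")
            selected.append(index)
    return selected


print(parse_selection("1,3,5:7"))  # value given to a hypothetical -se=1,3,5:7
```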
If -g= is set to t, u, v, w or f, it is assumed that the NetCDF variables are from an experiment with the NEMO or ORCA model. In this case, the duplicate points from the ORCA grid are removed before the EOF (or SVD) projection, as far as possible, and, in particular, if the 3-D grid-mesh of the input NetCDF variables covers the whole globe.
If -g= is set to n, it is assumed that the 3-D grid-mesh is regular or Gaussian and as such has no duplicate points.
The -g= argument is also used to determine the names of the NetCDF variables which contain the 2-D mesh-mask and the scale factors in the input_mesh_mask_netcdf_file (e.g. these variables are named grid_typemask, e1grid_type and e2grid_type, respectively). This input_mesh_mask_netcdf_file may be created by comp_clim_4d if the 3-D grid-mesh is regular or Gaussian.
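The naming pattern for the mesh-mask variables can be spelled out with a short Python sketch. The helper `mesh_mask_names` is hypothetical; it only applies the grid_typemask, e1grid_type and e2grid_type pattern quoted above.

```python
GRID_TYPES = ("n", "t", "u", "v", "w", "f")


def mesh_mask_names(grid_type):
    """Return the mask and scale-factor variable names for a grid type."""
    if grid_type not in GRID_TYPES:
        raise ValueError(f"unknown grid type {grid_type!r}")
    return {
        "mask": f"{grid_type}mask",
        "e1": f"e1{grid_type}",
        "e2": f"e2{grid_type}",
    }


print(mesh_mask_names("t"))
```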
If -g= is set to t, u, v, w or f (e.g. if the input NetCDF variables are from an experiment with the NEMO or ORCA model), the -r= argument gives the resolution used. If:
- -r=r2, the NetCDF variables are from an experiment with the ORCA R2 model.
- -r=r4, the NetCDF variables are from an experiment with the ORCA R4 model.
If the NetCDF variables are from an experiment with the NEMO or ORCA model, but the resolution is not R2 or R4, the dimensions of the ORCA grid must be specified explicitly with the -b= argument.
If the -x=lon1,lon2, -y=lat1,lat2 and -z=level1,level2 arguments are missing, the geographical domain used in the EOF projection is determined from the attributes of the input mesh mask NetCDF variable named grid_typemask (e.g. lon1_Eastern_limit, lon2_Western_limit, lat1_Southern_limit , lat2_Northern_limit, level1_First_level and level2_Last_level ) which is read from the input NetCDF file input_mesh_mask_netcdf_file. If these attributes are missing, the whole geographical domain associated with the netcdf_variable is used in the EOF projection.
The longitude, latitude or level range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2, where nlon is the number of longitude points in the grid associated with the NetCDF variable, and it is assumed that the grid is periodic.
Refer to comp_mask_4d for transforming geographical coordinates into indices or generating an appropriate mesh-mask before using comp_project_eof_4d.
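The handling of a negative lon1 on a periodic grid can be illustrated as follows. This is a sketch of one plausible reading of the rule above (the domain wraps from nlon+lon1+1 through nlon and then continues from 1 to lon2); the function `longitude_indices` is hypothetical.

```python
def longitude_indices(lon1, lon2, nlon):
    """Return the 1-based longitude indices selected by lon1,lon2."""
    if lon1 < 0:
        # Periodic grid assumed: start near the eastern edge and wrap around.
        start = nlon + lon1 + 1
        return list(range(start, nlon + 1)) + list(range(1, lon2 + 1))
    return list(range(lon1, lon2 + 1))


print(longitude_indices(-2, 3, nlon=8))
print(longitude_indices(2, 5, nlon=8))
```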
If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used in the projection analysis.
The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the output NetCDF file will have ntime = time2 - time1 + 1 observations if the -l= argument is missing.
The -l= argument lists the indices of the time steps which must be included in the output file. The indices of the time steps are counted from the start of the (selected) time period (e.g. time1 in the -t=time1,time2 argument or 1 if this argument is not used). The list may be specified in two formats:
- -l=n1,n2,…,nn selects the time steps n1, n2, … and nn
- -l=n1:n2 selects the time steps from n1 to n2.
The two forms of the -l= argument may be combined and repeated any number of times. Duplicate time steps are not allowed. Be careful with time period limits when specifying the -l= argument.
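To make the interaction between -t= and -l= concrete, the sketch below converts relative time steps into absolute indices in the input file. The function `selected_time_steps` is hypothetical, and it assumes that a step falling beyond time2 is an error, which is one way to read the warning about time period limits.

```python
def selected_time_steps(ntime_in_file, period=None, steps=None):
    """Return absolute 1-based time indices selected by -t= and -l=.

    period: (time1, time2) or None for the whole file.
    steps:  indices counted from the start of the selected period, or None.
    """
    time1, time2 = period if period is not None else (1, ntime_in_file)
    if not 1 <= time1 <= time2 <= ntime_in_file:
        raise ValueError("invalid time period")
    if steps is None:
        return list(range(time1, time2 + 1))  # ntime = time2 - time1 + 1
    absolute = []
    for step in steps:
        index = time1 + step - 1
        if step < 1 or index > time2:
            raise ValueError(f"time step {step} is outside the selected period")
        if index in absolute:
            raise ValueError(f"duplicate time step {step}")
        absolute.append(index)
    return absolute


print(selected_time_steps(120, period=(13, 48), steps=[1, 2, 12]))
```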
The -a= argument specifies if the observed variables have to be centered or standardized with an input climatology (specified with the -c= argument) before the projection analysis:
- -a=scp means that the projection analysis must be done on the raw data.
- -a=cov means that the projection analysis must be done on the anomalies.
- -a=cor means that the projection analysis must be done on the standardized anomalies.
The input_climatology_netcdf_file is needed only if -a=cov or -a=cor.
If -a=cov or -a=cor, the selected time period must agree with the climatology. This means that the first selected time observation (time1 if the -t= argument is present) must correspond to the first day, month or season of the climatology specified with the -c= argument.
The geographical shapes of the netcdf_variable (in the input_netcdf_file), the eof_netcdf_variable (in the eof_netcdf_file), the climatology (in the input_climatology_netcdf_file) and the scale factors (in the input_mesh_mask_netcdf_file) must agree.
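The three analysis types can be sketched as a preprocessing step applied to the packed data matrix before the projection. This is a simplified illustration: the function `prepare` is hypothetical, the climatology is assumed to be cyclic with one mean (and one standard deviation) vector per period, and the rows of X are assumed to be consecutive time steps starting at the first period of the climatology.

```python
def prepare(X, analysis, clim_mean=None, clim_std=None):
    """Return the raw data (scp), anomalies (cov) or standardized anomalies (cor)."""
    if analysis == "scp":
        return [row[:] for row in X]
    if analysis not in ("cov", "cor"):
        raise ValueError(f"unknown type of analysis {analysis!r}")
    nperiod = len(clim_mean)
    prepared = []
    for t, row in enumerate(X):
        p = t % nperiod  # climatological period matching time step t
        values = [x - m for x, m in zip(row, clim_mean[p])]
        if analysis == "cor":
            values = [v / s for v, s in zip(values, clim_std[p])]
        prepared.append(values)
    return prepared


# Hypothetical data: 3 time steps, 2 grid cells, a 2-period climatology.
X = [[10.0, 20.0], [12.0, 18.0], [11.0, 25.0]]
clim_mean = [[9.0, 19.0], [11.0, 21.0]]
clim_std = [[2.0, 4.0], [1.0, 3.0]]
print(prepare(X, "cov", clim_mean))
print(prepare(X, "cor", clim_mean, clim_std))
```

The modulo indexing is why the first selected time step has to line up with the first period of the climatology in this sketch.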
The -d= argument specifies the metric and scalar product used in the EOF (or SVD) projection:
- -d=dist2 means that the projection is done with the diagonal distance associated with the horizontal 2-D grid-mesh (e.g. each grid point is weighted according to the surface associated with it)
- -d=dist3 means that the projection is done with the diagonal distance associated with the whole 3-D grid-mesh (e.g. each grid point is weighted according to the volume or weight associated with it)
- -d=ident means that the projection is done with the identity metric: the usual Euclidean distance and scalar product are used in the EOF (or SVD) projection.
By default, the -d= argument is set to dist3.
The -normpc argument specifies that the computed PC (or SV) time series must be normalized with the reciprocal of the singular value of the associated EOF pattern stored in the eof_netcdf_file (or with the reciprocal of the previously computed standard-deviations of the SV time series stored in the eof_netcdf_file if the -svd argument is used).
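The effect of the metric and of the -normpc option can be illustrated with a weighted scalar product. The sketch below makes several assumptions that are not taken from the procedure itself: the helper names are hypothetical, the surface weight is taken as e1 times e2, the vertical scale factor `e3` is an assumed name, and the weights are applied directly inside the scalar product (the procedure may apply its metric differently).

```python
def cell_weights(distance, e1, e2, e3=None):
    """Return one weight per packed grid cell for the chosen metric."""
    if distance == "ident":
        return [1.0] * len(e1)
    if distance == "dist2":
        return [a * b for a, b in zip(e1, e2)]                 # cell surface
    if distance == "dist3":
        return [a * b * c for a, b, c in zip(e1, e2, e3)]      # cell volume
    raise ValueError(f"unknown type of distance {distance!r}")


def weighted_projection(X, B, weights, singular_values=None):
    """Project the rows of X onto the rows of B with a diagonal metric.

    If singular_values is given, each time series is multiplied by the
    reciprocal of its singular value, mimicking -normpc.
    """
    A = [
        [sum(w * x * b for w, x, b in zip(weights, x_row, b_row)) for b_row in B]
        for x_row in X
    ]
    if singular_values is not None:
        A = [[a / s for a, s in zip(row, singular_values)] for row in A]
    return A


# Hypothetical two-cell example.
e1, e2 = [2.0, 2.0], [1.0, 3.0]
X = [[1.0, 2.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
weights = cell_weights("dist2", e1, e2)
print(weighted_projection(X, B, weights))
print(weighted_projection(X, B, weights, singular_values=[2.0, 4.0]))
```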
The -svd argument specifies that the eof_netcdf_file is produced by comp_svd_3d instead of comp_eof_4d. This means that the projection will be done onto the singular vectors of a previous SVD analysis stored in the eof_netcdf_file.
The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file.
By default, the results are stored as single-precision floating point numbers in the output NetCDF file.
The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g. -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher. If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.
The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.
It is assumed that the data has no missing values except those associated with a constant land-sea mask.
Duplicate parameters are allowed, but it is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.
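The "last occurrence wins" rule and the limit on the number of parameters can be mimicked with a small Python sketch. The function `parse_arguments` is hypothetical and is not the parser used by the procedure; the list of allowed names is copied from the Usage section.

```python
ALLOWED = (
    "-f", "-v", "-fe", "-ve", "-m", "-se", "-g", "-r", "-b", "-x", "-y", "-z",
    "-t", "-l", "-a", "-c", "-d", "-o", "-normpc", "-svd", "-double",
    "-bigfile", "-hdf5", "-tlimited",
)


def parse_arguments(argv):
    """Return a dict of options; a later duplicate overrides an earlier one."""
    if len(argv) > len(ALLOWED):
        raise ValueError("too many parameters")
    options = {}
    for arg in argv:
        name, _, value = arg.partition("=")
        if name not in ALLOWED:
            raise ValueError(f"unknown parameter {name!r}")
        options[name] = value if value else True  # bare flags become True
    return options


options = parse_arguments(["-a=cov", "-t=1,12", "-normpc", "-a=cor"])
print(options)
```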
For more details on EOF or SVD analysis in the climate literature, see
- “A manual for EOF and SVD analyses of climate data”, by Bjornsson, H., and Venegas, S.A., McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52pp, 1997. https://www.jsg.utexas.edu/fu/files/EOFSVD.pdf
- “Statistical Analysis in Climate Research”, by von Storch, H., and Zwiers, F.W., Cambridge University press, Cambridge, UK, Chapter 13, 484 pp., 2002. ISBN: 9780521012300
Outputs¶
comp_project_eof_4d creates an output NetCDF file that contains the new principal component (or singular variable) time series computed from the eigenvectors (or singular vectors) of a previous EOF (or SVD) analysis. The number of principal components stored in the output NetCDF dataset is determined by the -se=selected_eof argument. The number of observations in the output NetCDF dataset is determined from the -t=time1,time2 and -l=selected_time_period arguments. The output NetCDF dataset contains the following NetCDF variable:
- netcdf_variable_pc(ntime,number_of_eofs): the new principal component time series corresponding to the projection onto the selected eigenvectors or singular vectors.
The new principal component time series are standardized with the standard-deviations estimated from the previous EOF (or SVD) analysis if the -normpc argument is specified.
Examples¶
To compute a 5-EOF projection of a NetCDF variable named votemper in the NetCDF file ST7_1m_20101_30012_grid_T_votemper.nc onto the first 5 EOF patterns of a previous EOF analysis stored in the file named eof_ST7_1m_0101_20012_grid_T_votemper.nc, and to store the results in a NetCDF file named ST7_1m_20101_30012_grid_T_votemper_10pc.nc, use the following command:
$ comp_project_eof_4d \
-f=ST7_1m_20101_30012_grid_T_votemper.nc \
-v=votemper \
-fe=eof_ST7_1m_0101_20012_grid_T_votemper.nc \
-ve=votemper \
-se=1:5 \
-m=ST7_grid_T_votemper_mask.nc \
-o=ST7_1m_20101_30012_grid_T_votemper_10pc.nc