comp_ortho_rot_eof_4d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

29/04/2024

Purpose

Performs an orthogonal rotation of a (partial) Empirical Orthogonal Function (EOF) or Principal Component Analysis (PCA) model (e.g., a factor loading matrix and the associated standardized amplitude time series) computed from a previous EOF or PCA analysis of a four-dimensional NetCDF variable. The factor loading matrix (e.g., the scaled EOF patterns corresponding to the selected rotated PC time series) is also rotated and the new rotated EOF or PCA model is stored in a NetCDF file.

Spatial EOF patterns (e.g., scaled EOFs) and Principal Component (PC) time series derived from EOF analysis or PCA are constrained to be both spatially and temporally orthogonal to one another so that they maximize the variance they describe over the entire analysis domain and selected time period [Jolliffe] [vonStorch_Zwiers]. However, the spatial orthogonality of the EOFs is a strong and often undesirable constraint and, as a result, EOFs are subject to problems such as domain dependence and inaccurate representation of the physical relationships embedded in the data [Jolliffe] [Richman].

Orthogonal rotation of selected standardized PC time series can be used to obtain spatial and time patterns (e.g., rotated scaled EOFs and PC time series) that are hopefully more physically meaningful by relaxing the spatial orthogonality of the rotated scaled EOFs, while preserving the temporal orthogonality of the rotated standardized PC time series. Preserving the orthogonality of the (rotated) PC time series implies that these rotated PCs describe the same amount of variance as the original (unrotated) PC time series, which is a very useful statistical property, but this variance is more equally (or differently) distributed among the rotated PC time series [Jolliffe] [Richman] .
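The redistribution of variance described above can be illustrated with a minimal Python sketch. It is not taken from comp_ortho_rot_eof_4d: the loading values, the 30 degree angle and the helper names (rotate_loadings, column_sums_of_squares, column_dot) are hypothetical and chosen only for illustration. The two loading columns start out spatially orthogonal; after the rotation they no longer are, while the total variance they describe is unchanged.

```python
import math

def rotate_loadings(loadings, angle):
    """Apply a planar (orthogonal) rotation to a two-column loading matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[x * c + y * s, -x * s + y * c] for x, y in loadings]

def column_sums_of_squares(loadings):
    """Sum of squared loadings per column (variance tied to each PC)."""
    return [sum(row[j] ** 2 for row in loadings) for j in range(2)]

def column_dot(loadings):
    """Scalar product of the two loading columns (zero if spatially orthogonal)."""
    return sum(x * y for x, y in loadings)

# Hypothetical scaled EOFs: 4 grid points (rows) by 2 modes (columns).
# The two columns are orthogonal, as unrotated EOFs are.
loadings = [[0.8, 0.2], [0.6, -0.1], [0.4, -0.4], [0.2, 0.3]]
rotated = rotate_loadings(loadings, math.radians(30.0))

before = column_sums_of_squares(loadings)
after = column_sums_of_squares(rotated)
print("variance per mode before:", before, "total:", sum(before))
print("variance per mode after: ", after, "total:", sum(after))
print("column scalar product before/after:", column_dot(loadings), column_dot(rotated))
```

Any orthogonal rotation behaves this way; the rotation criterion only decides which angle is retained.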

More precisely, the orthogonal rotation in comp_ortho_rot_eof_4d has the objective to make the rotated standardized PC time series as simple as possible to interpret by attempting to constrain the spatial coefficients (or loadings) associated with each of them to be large or very small in absolute magnitude, but avoiding as much as possible spatial loadings of moderate size [Jolliffe] [Jackson] .

The orthogonal rotation matrix that performs this linear transformation is obtained by maximizing a generalized orthomax criterion, which includes the quartimax, varimax and equamax rotation methods [Jackson], at the user's option (see the description of the -w= argument below for more details about the selection of the rotation method). This generalized orthomax criterion is described in detail in [Clarkson_Jennrich] [Jackson] and amounts to obtaining rotated standardized PC time series that are more spatially localized by relaxing the spatial orthogonality of the original EOF (or PCA) modes, while preserving the orthogonality of the (rotated) standardized PC time series as described above.

The -w=rotation_factor argument determines the kind of orthogonal rotation to be computed [Clarkson_Jennrich] [Jackson] and may be set as follows :

  • -w=0. is the quartimax method [Jackson] , which attempts to get each variable (e.g., grid-point) to load highly on only one (or a few) EOFs or factors.
  • -w=1. is the varimax method [Jackson] , which attempts to load highly a relatively low number of variables on each EOF or factor. Varimax is the most widely used method of orthogonal rotation.
  • -w=nrot/2., where nrot is the number of rotated PCs, is the equamax method [Jackson] , which is a compromise of the above two.

More generally, the value of the -w= argument can be any positive real number, but best values lie in the closed interval [0., 5. * nrot]. Generally, the larger the value is, the more equal is the dispersion of the variance accounted for across the rotated EOFs or factors.
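As a rough illustration of how the rotation factor enters the computation, the following Python sketch maximizes one common textbook form of the orthomax criterion (for each column, the sum of the fourth powers of the loadings minus w/n times the squared sum of the squared loadings) by sweeping over pairs of columns with planar rotations. Whether comp_ortho_rot_eof_4d uses exactly this criterion scaling or this pairwise algorithm is an assumption; the function names and the loading values are hypothetical.

```python
import math

def orthomax_criterion(loadings, w):
    """Sum over columns of sum(l**4) - (w / n) * (sum(l**2))**2."""
    n, k = len(loadings), len(loadings[0])
    total = 0.0
    for j in range(k):
        squares = [row[j] ** 2 for row in loadings]
        total += sum(s * s for s in squares) - w * sum(squares) ** 2 / n
    return total

def orthomax_rotate(loadings, w=1.0, sweeps=50, tol=1e-10):
    """Rotate an n-by-k loading matrix with successive planar rotations."""
    a = [list(row) for row in loadings]
    n, k = len(a), len(a[0])
    for _ in range(sweeps):
        largest = 0.0
        for p in range(k - 1):
            for q in range(p + 1, k):
                u = [r[p] ** 2 - r[q] ** 2 for r in a]
                v = [2.0 * r[p] * r[q] for r in a]
                su, sv = sum(u), sum(v)
                c4 = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d4 = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                num = d4 - 2.0 * w * su * sv / n
                den = c4 - w * (su * su - sv * sv) / n
                # Angle maximizing the criterion for this pair of columns.
                phi = 0.25 * math.atan2(num, den)
                largest = max(largest, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for r in a:
                    r[p], r[q] = r[p] * c + r[q] * s, -r[p] * s + r[q] * c
        if largest < tol:  # no pair needed a noticeable rotation
            break
    return a

# Hypothetical loadings: 5 grid points by 3 selected modes (nrot = 3).
loadings = [[0.8, 0.3, 0.1], [0.7, 0.4, -0.2], [0.2, 0.9, 0.1],
            [0.1, 0.6, 0.5], [-0.3, 0.2, 0.7]]
for w in (0.0, 1.0, 3 / 2.0):  # quartimax, varimax, equamax
    rotated = orthomax_rotate(loadings, w)
    print(w, orthomax_criterion(loadings, w), orthomax_criterion(rotated, w))
```

Each planar rotation leaves the sum of squared loadings of every grid point unchanged, so only the distribution of the loadings across the rotated modes is altered.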

As an illustration, by default, comp_ortho_rot_eof_4d uses the varimax rotation method, which simplifies the spatial structure of the rotated (scaled) EOFs by maximizing the sum of the variances of the squared column loadings of the selected rotated (scaled) EOFs [Jolliffe] [Jackson] [vonStorch_Zwiers]. As a result, the rotated EOFs tend to have coefficients biased towards either large or zero values. Hence these rotated EOFs should be easier to interpret than their unrotated counterparts.

Note that other methods of rotation exist to obtain more physically based EOFs and PC time series. For example, the orthogonal rotation matrix can be designed to maximize the energy in a selected frequency band for the rotated PC time series; see comp_filt_rot_eof_4d for more details.

Using as input an EOF NetCDF file produced by comp_eof_4d, comp_ortho_rot_eof_4d performs an orthogonal rotation of selected standardized PC time series read from this input EOF NetCDF file and computes the regression spatial patterns associated with these rotated standardized PC time series (e.g., the orthogonal projection onto the orthonormal set formed by the new rotated standardized PC time series). The raw variances and percentages of variance described by the new rotated standardized PC time series are also computed.
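The regression step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identity metric is used, scalar products are normalized by the number of time steps, and the data, PC and EOF values as well as the function name regression_patterns are hypothetical.

```python
import math

def regression_patterns(data, pcs):
    """Regress each variable (column of data) on each standardized PC."""
    ntime, nvar, npc = len(data), len(data[0]), len(pcs[0])
    return [[sum(data[t][i] * pcs[t][j] for t in range(ntime)) / ntime
             for j in range(npc)] for i in range(nvar)]

# Two hypothetical standardized, mutually orthogonal PCs over 4 time steps.
pcs = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
# Hypothetical scaled EOFs for 3 grid points and the centred data they generate.
eofs = [[0.8, 0.2], [0.6, -0.1], [0.1, 0.5]]
data = [[sum(e[j] * p[j] for j in range(2)) for e in eofs] for p in pcs]

# Rotate the PCs by 30 degrees, then regress the data on the rotated PCs.
c, s = math.cos(math.radians(30.0)), math.sin(math.radians(30.0))
rot_pcs = [[p[0] * c + p[1] * s, -p[0] * s + p[1] * c] for p in pcs]
rot_eofs = regression_patterns(data, rot_pcs)

# Raw variance described by each rotated PC: sum of its squared loadings.
variances = [sum(row[j] ** 2 for row in rot_eofs) for j in range(2)]
print(rot_eofs)
print(variances, sum(variances))
```

Because the rotated PCs stay orthonormal, the regression patterns obtained this way coincide with the original scaled EOFs multiplied by the same rotation.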

The standardized PC time series to be rotated are selected on entry with the -se= argument, which is a list of PC (or EOF) numbers.
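For readers who prepare such lists in scripts, the two list forms accepted by -se= (comma-separated numbers and first:last ranges, without duplicates) can be expanded with a small Python helper. The function parse_selection is hypothetical and is not part of NCSTAT; it also assumes that both forms are combined in a single comma-separated list.

```python
def parse_selection(spec):
    """Expand a list such as '1,3,5:8' into PC numbers, rejecting duplicates."""
    selected = []
    for item in spec.split(","):
        item = item.strip()
        if ":" in item:
            first, last = (int(part) for part in item.split(":"))
            if first > last:
                raise ValueError("empty range: " + item)
            numbers = range(first, last + 1)
        else:
            numbers = [int(item)]
        for number in numbers:
            if number < 1:
                raise ValueError("PC numbers start at 1: " + item)
            if number in selected:
                raise ValueError("duplicate PC number: " + str(number))
            selected.append(number)
    return selected

print(parse_selection("1:4"))
print(parse_selection("1,3,5:7"))
```

The length of the returned list is the number of rotated PCs, nrot, which is also the quantity that enters the equamax setting -w=nrot/2.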

Optionally, the rotated PC time series can be estimated with the help of a metric such that the results are weighted by the surface (or volume) associated with each cell in the grid associated with the input four-dimensional NetCDF variable (which contains the original scaled EOFs), so that equal areas (or volumes) carry equal weights in the results of the rotated EOF analysis (see the -d= argument description).
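The idea behind such a diagonal metric can be sketched in Python as follows. The sketch assumes that the cell surface is the product of the two horizontal scale factors and the cell volume the product of the three scale factors; how comp_ortho_rot_eof_4d actually builds and normalizes its weights is not documented here, and the helper names and numerical values are hypothetical.

```python
def cell_weights(e1, e2, e3=None):
    """Cell surfaces e1*e2, or cell volumes e1*e2*e3 when e3 is given."""
    if e3 is None:
        return [a * b for a, b in zip(e1, e2)]
    return [a * b * c for a, b, c in zip(e1, e2, e3)]

def weighted_scalar_product(x, y, weights):
    """Scalar product under a diagonal metric defined by the weights."""
    return sum(w * a * b for w, a, b in zip(weights, x, y))

# Hypothetical scale factors for three grid cells (flattened grid).
e1, e2, e3 = [2.0, 2.0, 1.0], [3.0, 1.0, 1.0], [10.0, 10.0, 5.0]
x, y = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]

surface = cell_weights(e1, e2)      # analogous to -d=dist2
volume = cell_weights(e1, e2, e3)   # analogous to -d=dist3
identity = [1.0] * len(x)           # analogous to -d=ident

print(weighted_scalar_product(x, y, surface))
print(weighted_scalar_product(x, y, volume))
print(weighted_scalar_product(x, y, identity))
```

With the identity weights the usual Euclidean scalar product is recovered, while larger cells contribute proportionally more under the surface or volume weights.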

An output NetCDF dataset containing these rotated PC standardized time series, their associated regression spatial patterns and the explained variance statistics is created.

If the NetCDF variable used in the original EOF analysis is tridimensional, use comp_ortho_rot_eof_3d instead of comp_ortho_rot_eof_4d.

Further Details

Usage

$ comp_ortho_rot_eof_4d \
  -f=eof_netcdf_file \
  -v=eof_netcdf_variable \
  -m=input_mesh_mask_netcdf_file \
  -se=selected_eofs                    (optional) \
  -g=grid_type                         (optional : n, t, u, v, w, f) \
  -r=resolution                        (optional : r2, r4) \
  -b=nlon_orca, nlat_orca, nlevel_orca (optional) \
  -x=lon1,lon2                         (optional) \
  -y=lat1,lat2                         (optional) \
  -z=level1,level2                     (optional) \
  -t=time1,time2                       (optional) \
  -d=distance                          (optional : dist2, dist3, ident) \
  -w=rotation_factor                   (optional :  0. > +inf) \
  -o=output_rot_eof_netcdf_file        (optional) \
  -mi=missing_value                    (optional) \
  -knorm                               (optional) \
  -double                              (optional) \
  -bigfile                             (optional) \
  -hdf5                                (optional) \
  -tlimited                            (optional)

By default

-se=
all the PC time series stored in the eof_netcdf_file
-g=
the grid_type is set to n which means that the 3-D grid-mesh associated with the input NetCDF EOF variable, eof_netcdf_variable, is assumed to be regular or Gaussian
-r=
if the input eof_netcdf_variable is from the NEMO ocean model (e.g., if -g= argument is not set to n) the resolution is assumed to be r2
-b=
if -g= is not set to n, the dimensions of the 3-D grid-mesh, nlon_orca, nlat_orca and nlevel_orca, are determined from the -r= argument. However, you may override this default choice with the -b= argument
-x=
the whole longitude domain associated with the eof_netcdf_variable
-y=
the whole latitude domain associated with the eof_netcdf_variable
-z=
the whole vertical domain associated with the eof_netcdf_variable
-t=
the whole time period associated with the input PC time series
-d=
the distance is set to the distance used in the original EOF analysis.
-w=
the rotation_factor is set to 1., which means that a varimax orthogonal rotation is performed.
-o=
the output_rot_eof_netcdf_file is named ortho_rot_eof_netcdf_variable.nc
-mi=
the missing_value attribute for the rotated EOFs in the output NetCDF file is set to 1.e+20
-knorm
the orthogonal rotation of the standardized PC time series is performed without Kaiser’s row normalization. If -knorm is activated, Kaiser’s row normalization is performed before the orthogonal rotation of the PC time series and the spatial loadings (e.g., the rotated EOFs) associated with the rotated PC time series are rescaled after the orthogonal rotation
-double
the results of the rotated EOF analysis are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-bigfile
a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file
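The Kaiser row normalization toggled by -knorm can be sketched in Python as a wrapper around any rotation step. This is a minimal illustration, not the tool's implementation: with_kaiser_normalization and planar_rotation are hypothetical names, and the fixed 30 degree rotation merely stands in for the optimized orthomax rotation.

```python
import math

def with_kaiser_normalization(loadings, rotate):
    """Scale rows to unit length, apply rotate, then restore the row lengths."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in loadings]
    unit = [[x / h if h > 0.0 else 0.0 for x in row]
            for row, h in zip(loadings, norms)]
    rotated = rotate(unit)
    return [[x * h for x in row] for row, h in zip(rotated, norms)]

def planar_rotation(angle):
    """Return a function applying a fixed planar rotation to two columns."""
    c, s = math.cos(angle), math.sin(angle)
    return lambda m: [[x * c + y * s, -x * s + y * c] for x, y in m]

# Hypothetical loadings: 3 grid points by 2 modes (last row is all zeros).
loadings = [[0.8, 0.2], [0.1, 0.5], [0.0, 0.0]]
rotate_30 = planar_rotation(math.radians(30.0))

result = with_kaiser_normalization(loadings, rotate_30)
direct = rotate_30(loadings)
print(result)
print(direct)
```

For a fixed rotation the two results are identical, since the rescaling undoes the normalization. The normalization matters only when the rotation is chosen by maximizing a criterion on the normalized rows, because every grid point then carries the same weight in that choice.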

Remarks

  1. The -v=eof_netcdf_variable argument specifies the NetCDF variable for which an EOF analysis was originally computed by comp_eof_4d and the -f=eof_netcdf_file argument specifies that the resulting EOF patterns and PC time series must be extracted from the NetCDF file, eof_netcdf_file. This NetCDF file must have exactly the same format as the files produced by comp_eof_4d.

    These EOF patterns and PC time series are used to compute the orthogonal rotation matrix which will be applied to the selected EOF model.

  2. The -m=input_mesh_mask_netcdf_file argument specifies the land-sea mask to apply to the eof_netcdf_variable (which contains the scaled eigenvectors of the EOF analysis) for transforming this four-dimensional NetCDF variable into a rectangular matrix before performing the rotated EOF analysis. It is assumed that this land-sea mask is exactly the same as the one used in the original EOF analysis, whose results are stored in the input NetCDF file eof_netcdf_file.

    The scale factors associated with the 3-D grid-mesh of the input NetCDF variable (needed if -d=dist2 or -d=dist3 is specified when calling the procedure) are also read from the input_mesh_mask_netcdf_file.

  3. The -se=selected_eofs argument allows the user to select the standardized PC time series which must be included in the orthogonal rotation. The list of selected PCs may be given in two formats:

    • -se=1,3,…,nn includes pc1, pc3, … and pcnn in the orthogonal rotation
    • -se=1:4 includes pc1 to pc4 in the orthogonal rotation.

    The two forms of the -se= argument may be combined and repeated any number of times. Duplicate PC numbers are not allowed. If the -se= argument is not specified, all the PC time series stored in the eof_netcdf_file are used in the orthogonal rotation.

  4. If -g= is set to t, u, v, w or f it is assumed that the NetCDF variable, eof_netcdf_variable, is from an experiment with the NEMO ocean model. In this case, the duplicate points from the ORCA grid are removed before the rotated EOF analysis, as far as possible, and, in particular, if the 3-D grid-mesh of the input NetCDF variable covers the whole globe.

    If -g= is set to n, it is assumed that the 3-D grid-mesh is regular or Gaussian and as such has no duplicate points.

    The -g= argument is also used to determine the name of the NetCDF variables which contain the 3-D mesh-mask and the scale factors in the input_mesh_mask_netcdf_file (e.g., these variables are named grid_typemask, e1grid_type, e2grid_type and e3grid_type, respectively). This input_mesh_mask_netcdf_file may be created by comp_clim_4d if the 3-D grid-mesh is regular or Gaussian.

  5. If -g= is set to t, u, v, w or f (e.g., if the input NetCDF variable, eof_netcdf_variable, is from an experiment with the NEMO ocean model), the -r= argument gives the resolution used. If:

    • -r=r2 the NetCDF variable is from an experiment with the ORCA R2 configuration of the NEMO ocean model
    • -r=r4 the NetCDF variable is from an experiment with the ORCA R4 configuration of the NEMO ocean model.

  6. If the NetCDF variable, eof_netcdf_variable, is from an experiment with the NEMO ocean model, but the resolution is not R2 or R4, the dimensions of the ORCA grid must be specified explicitly with the -b= argument.

  7. If the -x=lon1,lon2, -y=lat1,lat2 and -z=level1,level2 arguments are not specified, the geographical domain used in the rotated EOF analysis is determined from the attributes of the input mesh-mask NetCDF variable named grid_typemask (e.g., lon1_Eastern_limit, lon2_Western_limit, lat1_Southern_limit, lat2_Northern_limit, level1_First_level and level2_Last_level), which is read from the input NetCDF file input_mesh_mask_netcdf_file. If these attributes are missing, the whole geographical domain associated with the eof_netcdf_variable is used in the rotated EOF analysis. These arguments should normally be set to the same values as those used in the original EOF analysis, whose results are stored in the input NetCDF file, eof_netcdf_file.

    The longitude, latitude or level range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

    Refer to comp_mask_4d for transforming geographical coordinates into indices or generating an appropriate mesh-mask before using comp_ortho_rot_eof_4d.

  8. If the -t=time1,time2 argument is missing, the whole time period associated with the original PC time series is used in the orthogonal rotation.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the output NetCDF file will have ntime = time2 - time1 + 1 observations.

  9. The geographical shapes of the eof_netcdf_variable (in the eof_netcdf_file), the land-sea mask and scale factors (in the input_mesh_mask_netcdf_file) must agree.

  10. The -d= argument specifies the metric and scalar product used in the rotated EOF analysis:

    • -d=dist2 means that the orthogonal rotation is done with the diagonal distance associated with the horizontal 2-D grid-mesh (e.g., each grid-point is weighted according to the surface associated with it)
    • -d=dist3 means that the orthogonal rotation is done with the diagonal distance associated with the whole 3-D grid-mesh (e.g., each grid-point is weighted according to the volume or weight associated with it)
    • -d=ident means that the orthogonal rotation is done with the identity metric, e.g., the usual Euclidean distance and scalar product are used in the rotated EOF analysis.

    By default, the -d= argument is set to the distance used in the original EOF analysis.

  11. The -w=rotation_factor argument specifies the exact form of the criterion function, which is maximized over the set of nrot by nrot orthogonal rotation matrices (where nrot is the number of selected PCs as specified with the -se= argument). rotation_factor is a positive real scalar and may be set as follows:

    • -w=0. is the quartimax method, which attempts to get each variable (e.g., grid-point) to load highly on only one (or a few) EOFs or factors.
    • -w=1. is the varimax method, which attempts to load highly a relatively low number of variables on each EOF or factor. Varimax is the most widely used method of orthogonal rotation.
    • -w=nrot/2., where nrot is the number of rotated PCs, is the equamax method, which is a compromise of the above two.

    More generally, rotation_factor can be any positive real number, but best values lie in the closed interval [0., 5. * nrot]. Generally, the larger the value of rotation_factor is, the more equal is the dispersion of the variance accounted for across the rotated EOFs or factors.

  12. The -mi=missing_value argument specifies the missing value indicator associated with the NetCDF variables in the output_rot_eof_netcdf_file.

    If the -mi= argument is not specified, missing_value is set to 1.e+20.

  13. The -knorm argument specifies that orthogonal rotation of the original PC time series must be performed using Kaiser’s row normalization of the associated spatial loadings (e.g., the original scaled EOFs) and the computed spatial loadings (e.g., the rotated EOFs) associated with the rotated PC time series are rescaled after the orthogonal rotation.

  14. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file, output_rot_eof_netcdf_file.

    By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  15. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g., -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher.

    If this argument is specified, the output_rot_eof_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

  16. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g., -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher.

    If this argument is specified, the output_rot_eof_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  17. It is assumed that the data have no missing values except those associated with a constant land-sea mask.

  18. Duplicate parameters are allowed, but it is always the last occurrence of a parameter that is used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  19. For more details on EOF analysis, PCA and orthogonal rotation methods, see

Outputs

comp_ortho_rot_eof_4d creates an output NetCDF file that contains the rotated standardized PC time series, their associated rotated EOF patterns and the variances described by the computed rotated PC time series.

The number of rotated PCs and EOFs, nrot, stored in the output NetCDF dataset is determined by the -se=selected_eofs argument. The number of observations in the output NetCDF dataset is determined from the -t=time1,time2 argument. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat, nlon and nlev are the lengths of the geographical dimensions of the input NetCDF variable eof_netcdf_variable and nrot is the number of selected PC time series, which have been rotated as determined by the list specified with the -se=selected_eofs argument) :

  1. pc_number(nrot) : the list of rotated PC time series as specified with the -se=selected_eofs argument.

  2. eof_netcdf_variable_ortho_rot_eof(nrot,nlev,nlat,nlon) : the rotated EOF patterns. These rotated EOFs are scaled such that they give the scalar products (if -a=scp in the original EOF analysis), covariances (-a=cov in the original EOF analysis) or correlations (-a=cor in the original EOF analysis) between the original observed variables and the rotated PC time series.

    The rotated EOF patterns are packed in a four-dimensional variable whose first, second and third dimensions are exactly the same as those associated with the input NetCDF variable eof_netcdf_variable, even if you restrict the geographical domain with the -x=, -y= and -z= arguments. However, outside the selected domain, this output NetCDF variable is filled with missing values.

  3. eof_netcdf_variable_ortho_rot_pc(ntime,nrot) : the rotated standardized PC time series.

    The rotated PC time series are always standardized to unit variance.

  4. eof_netcdf_variable_ortho_rot_std(nrot) : the standard deviations of the rotated PC time series.

    The squares of these statistics are equal to the raw variance described by the rotated PC time series.

  5. eof_netcdf_variable_ortho_rot_var(nrot) : the proportion of variance explained by each rotated PC time series (given between 0. and 1. with 1. corresponding to 100%).

Examples

  1. To compute a varimax rotated 10-EOF analysis from a previous EOF analysis (stored in the file eof_1m_197902_200501_votemper_oi.nc) of a four-dimensional NetCDF variable named votemper and store the results of the rotated EOF analysis in a NetCDF file named rot_eof_1m_197902_200501_votemper_oi.nc, use the following command :

    $ comp_ortho_rot_eof_4d \
      -f=eof_1m_197902_200501_votemper_oi.nc  \
      -v=votemper \
      -se=1:10 \
      -m=oi_mask.nc \
      -o=rot_eof_1m_197902_200501_votemper_oi.nc
    