comp_filt_rot_eof_4d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

29/05/2024

Purpose

Performs an orthogonal rotation of selected standardized Principal Component (PC) time series extracted from a previous Empirical Orthogonal Function (EOF) analysis or Principal Component Analysis (PCA) of a four-dimensional NetCDF variable towards low-frequency, high-frequency or band-pass components, at the user's option [Wills_etal]. The associated factor loading matrix (i.e., the scaled EOF patterns corresponding to the selected rotated PC time series) is also rotated and the new rotated EOF or PCA model is stored in a NetCDF file.

Spatial EOF patterns (i.e., scaled EOFs) and PC time series derived from EOF analysis or PCA are constrained to be both spatially and temporally orthogonal to one another so that they maximize the variance they describe over the entire analysis domain and selected time period [Jolliffe] [vonStorch_Zwiers]. However, the spatial orthogonality of the EOFs is a strong and undesirable constraint and, as a result, EOFs are subject to problems such as domain dependence and inaccurate representation of the physical relationships embedded in the data [Jolliffe] [Richman].

Orthogonal rotation of selected standardized PC time series can be used to obtain spatial or temporal patterns (i.e., rotated scaled EOFs and PC time series) that are hopefully more physically meaningful by relaxing the spatial orthogonality of the rotated scaled EOFs, while preserving the temporal orthogonality of the rotated standardized PC time series. Preserving the orthogonality of the (rotated) PC time series implies that these rotated PCs describe the same amount of variance as the original (unrotated) PC time series, which is a very useful property, but this variance is more equally (or differently) distributed among the rotated PC time series [Jolliffe] [Richman].

In classical orthogonal rotation PCA methods (e.g., varimax rotation; see comp_ortho_rot_eof_4d for details), the objective is to make the rotated standardized PC time series as simple as possible to interpret by constraining the spatial coefficients (i.e., the scaled EOFs) associated with each of them to be either large or very small in absolute magnitude, avoiding as much as possible spatial loadings of moderate size [Jolliffe] [Jackson]. In comp_filt_rot_eof_4d, by contrast, the focus of the orthogonal rotation is to obtain a cleaner representation of the different time scales embedded in the selected PC time series, so that these time scales are not mixed in the same PC time series after the rotation [Wills_etal].

It can be shown that a rotation matrix that maximizes this clean representation of the time scales of interest can be estimated as the eigenvectors of the covariance matrix between the selected standardized PC time series filtered in the frequency band of interest. This type of method is called Low-Frequency Component Analysis (LFCA) [Wills_etal].

Here, LFCA is considered as an orthogonal rotation method of the original standardized PC time series of a (partial) EOF or PCA model. The nrot by nrot orthogonal matrix (where nrot is the number of rotated PCs) used to rotate the original PCs is computed as the eigenvectors of the covariance (i.e., symmetric positive-definite) matrix between the filtered selected standardized PC time series [Wills_etal].
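The computation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the NCSTAT implementation: the function name lfca_rotation and the (ntime, nrot) array layout are hypothetical, the covariance is normalized by ntime, and a crude spectral truncation stands in for the windowed FFT filter.

```python
import numpy as np

def lfca_rotation(pcs, filtered_pcs):
    """Rotate standardized PCs towards the filtered band (illustrative only).

    pcs          : (ntime, nrot) standardized, mutually uncorrelated PC series
    filtered_pcs : (ntime, nrot) the same series after time filtering
    """
    ntime = pcs.shape[0]
    # Covariance matrix (nrot x nrot) between the filtered PC time series.
    anom = filtered_pcs - filtered_pcs.mean(axis=0)
    cov = anom.T @ anom / ntime
    # Eigen-decomposition of a symmetric matrix; eigh returns ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # decreasing eigenvalues
    eigvals, rotation = eigvals[order], eigvecs[:, order]
    # Orthogonal rotation of the original (unfiltered) PCs.
    rotated_pcs = pcs @ rotation
    return rotation, rotated_pcs, eigvals

# Synthetic demonstration: 4 uncorrelated unit-variance series, 240 time steps.
rng = np.random.default_rng(0)
ntime, nrot = 240, 4
raw = rng.standard_normal((ntime, nrot))
raw -= raw.mean(axis=0)
q, _ = np.linalg.qr(raw)                       # orthonormal, zero-mean columns
pcs = q * np.sqrt(ntime)                       # unit variance, uncorrelated

# Crude low-pass stand-in for the real filter: keep periods >= 24 time steps.
spec = np.fft.rfft(pcs, axis=0)
spec[ntime // 24 + 1:] = 0.0
filtered = np.fft.irfft(spec, n=ntime, axis=0)

rotation, rotated_pcs, eigvals = lfca_rotation(pcs, filtered)
```

Because the rotated PCs keep unit variance and the filter is linear, each eigenvalue in this sketch equals the ratio of filtered variance to total variance of the corresponding rotated PC.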

In comp_filt_rot_eof_4d, the filtering of the original PC time series is performed with the help of a windowed FFT filter specified by the values of the -pl=, -ph=, -tr= and -win= arguments. See [Iacobucci_Noullez] for more details on the windowed FFT filter used here. This windowed filter is applied to the selected PC time series in the frequency domain [Iacobucci_Noullez]. The filter is obtained by convolving a raised-cosine window with the selected ideal rectangular filter response function (which is specified with the help of the -pl= and -ph= arguments). The specific form of the raised-cosine window can be controlled by the user with the help of the -win= argument described below. This filter is stationary and symmetric; it therefore induces no phase shift and is a good candidate for extracting frequency-defined series components from short-length time series. This windowed filter is also implemented in comp_fftfilter_4d, where more details on its properties are provided.
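A simplified sketch of such a filter is given below. It assumes that the raised-cosine window acts as a three-point kernel in the frequency domain (the discrete Fourier transform of a raised-cosine window), and the function names ideal_response and windowed_fft_filter are hypothetical. The actual implementation may differ, for instance in trend handling and in the treatment of the ends of the series.

```python
import numpy as np

def ideal_response(n, pl=0, ph=0):
    """Rectangular response on the FFT grid for periods between pl and ph."""
    if pl == 0 and ph == 0:
        raise ValueError("pl and ph cannot both be 0")
    reject = 0 < ph < pl                 # PH < PL requests band rejection
    if reject:
        pl, ph = ph, pl
    freqs = np.abs(np.fft.fftfreq(n))    # cycles per time step, 0 .. 0.5
    f_hi = 0.5 if pl == 0 else 1.0 / pl  # shortest period kept
    f_lo = 0.0 if ph == 0 else 1.0 / ph  # longest period kept
    eps = 1e-12                          # tolerance at the band edges
    band = ((freqs >= f_lo - eps) & (freqs <= f_hi + eps)).astype(float)
    return 1.0 - band if reject else band

def windowed_fft_filter(series, pl=0, ph=0, win=0.54):
    """Zero-phase filter of a 1-D series with a windowed rectangular response."""
    series = np.asarray(series, dtype=float)
    h = ideal_response(series.size, pl, ph)
    # Convolve the ideal response with the 3-point spectrum of the
    # raised-cosine window: weights (1-win)/2, win, (1-win)/2.
    side = 0.5 * (1.0 - win)
    hw = win * h + side * (np.roll(h, 1) + np.roll(h, -1))
    # hw is real and even in frequency, so the output is real with no phase shift.
    return np.fft.ifft(np.fft.fft(series) * hw).real

# Two sinusoids with periods of 60 and 6 time steps.
t = np.arange(240)
slow = np.sin(2 * np.pi * t / 60)
fast = np.sin(2 * np.pi * t / 6)
band = windowed_fft_filter(slow + fast, pl=30, ph=120)   # keeps only `slow`
```

With win=1. the response reduces to the ideal rectangular filter; with win=0.54 or win=0.5 the edges of the pass band are smoothed over the neighbouring Fourier frequencies.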

Using as input an EOF NetCDF file produced by comp_eof_4d, comp_filt_rot_eof_4d extracts selected PC time series from this input file (see the description of the -se= argument below for details) and performs the orthogonal rotation of the selected standardized PC time series towards modes that have more energy in the frequency bands of interest selected with the help of the -pl= and -ph= arguments. comp_filt_rot_eof_4d also reads the associated scaled EOFs from the input EOF NetCDF file and computes the regression spatial patterns associated with the rotated standardized PC time series (i.e., the orthogonal projection onto the orthonormal set formed by the new rotated standardized PC time series). The percentages of variance described by the new rotated standardized PC time series are also computed and stored in the output NetCDF file, as are the ratios of filtered variance to total variance for the rotated standardized PC time series.
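The relation between the rotation matrix, the rotated patterns and the variance statistics can be illustrated as follows. This is a sketch under assumptions, not the NCSTAT code: the function name rotate_patterns is hypothetical, the scaled EOFs are stored as an (nrot, npoints) array, the rotated PCs are taken to be pcs @ rotation, the variance is an unweighted sum of squares, and total_variance is assumed to be known from the original EOF analysis.

```python
import numpy as np

def rotate_patterns(scaled_eofs, rotation, total_variance):
    """Rotate scaled EOFs and derive variance statistics (illustrative only).

    scaled_eofs    : (nrot, npoints) selected scaled EOFs (loadings)
    rotation       : (nrot, nrot) orthogonal rotation matrix
    total_variance : total variance of the analysed field
    """
    # Regression patterns of the data onto the rotated standardized PCs.
    rotated_eofs = rotation.T @ scaled_eofs
    # Raw variance described by each rotated PC (sum of squared loadings).
    raw_variance = np.sum(rotated_eofs ** 2, axis=1)
    std = np.sqrt(raw_variance)
    proportion = raw_variance / total_variance
    return rotated_eofs, std, proportion

# Synthetic demonstration with 3 patterns on 50 grid points.
rng = np.random.default_rng(1)
scaled_eofs = rng.standard_normal((3, 50))
rotation, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix
# Pretend that the 3 selected EOFs hold half of the total variance.
total_variance = 2.0 * np.sum(scaled_eofs ** 2)
rotated_eofs, std, proportion = rotate_patterns(scaled_eofs, rotation, total_variance)
```

Since the rotation is orthogonal, the sum of the raw variances is unchanged by the rotation; only its distribution among the rotated PCs differs.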

The selected standardized PC time series, which will be rotated, are determined on entry by the value of the -se= argument, which is a list of PC (or EOF) numbers.

An output NetCDF dataset containing these rotated standardized PC time series, their associated regression spatial patterns, and the explained variance and ratio statistics is created.

If the NetCDF variable used in the original EOF analysis is three-dimensional, use comp_filt_rot_eof_3d instead of comp_filt_rot_eof_4d.

Further Details

Usage

$ comp_filt_rot_eof_4d \
  -f=eof_netcdf_file \
  -v=eof_netcdf_variable \
  -m=input_mesh_mask_netcdf_file \
  -pl=minimum_period \
  -ph=maximum_period \
  -se=selected_eofs                    (optional) \
  -g=grid_type                         (optional : n, t, u, v, w, f) \
  -r=resolution                        (optional : r2, r4) \
  -b=nlon_orca, nlat_orca, nlevel_orca (optional) \
  -x=lon1,lon2                         (optional) \
  -y=lat1,lat2                         (optional) \
  -z=level1,level2                     (optional) \
  -t=time1,time2                       (optional) \
  -tr=trend_removal                    (optional : 0, 1, 2, 3, -1, -2, -3) \
  -win=window_choice                   (optional : 0.5 to 1.) \
  -o=output_rot_eof_netcdf_file        (optional) \
  -mi=missing_value                    (optional) \
  -double                              (optional) \
  -bigfile                             (optional) \
  -hdf5                                (optional) \
  -tlimited                            (optional)

By default

-se=
all the PC time series stored in the eof_netcdf_file
-g=
the grid_type is set to n which means that the 2-D grid-mesh associated with the input NetCDF EOF variable, eof_netcdf_variable, is assumed to be regular or Gaussian
-r=
if the input eof_netcdf_variable is from the NEMO ocean model (e.g., if -g= argument is not set to n) the resolution is assumed to be r2
-b=
if -g= is not set to n, the dimensions of the 3-D grid-mesh, nlon_orca, nlat_orca and nlevel_orca are determined from the -r= argument. However, you may override this default choice with the -b= argument
-x=
the whole longitude domain associated with the eof_netcdf_variable
-y=
the whole latitude domain associated with the eof_netcdf_variable
-z=
the whole vertical domain associated with the eof_netcdf_variable
-t=
the whole time period associated with the input PC time series
-tr=0
trend_removal is set to 0, which means that no detrending is done before filtering the PC time series
-win=0.54
the Hamming window is convolved with the ideal filter response
-o=
the output_rot_eof_netcdf_file is named filt_rot_eof_netcdf_variable.nc
-mi=
the missing_value attribute for the rotated EOFs in the output NetCDF file is set to 1.e+20
-double
the results of the rotated EOF analysis are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-bigfile
a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=eof_netcdf_variable argument specifies the NetCDF variable for which an EOF analysis was originally computed (by comp_eof_4d) and the -f=eof_netcdf_file argument specifies that the resulting EOF patterns and PC time series must be extracted from the NetCDF file, eof_netcdf_file. This NetCDF file must have exactly the same format as the files produced by comp_eof_4d.

    These EOF patterns and PC time series are used to compute the orthogonal rotation matrix which will be applied to the selected EOF model.

  2. The -m=input_mesh_mask_netcdf_file argument specifies the land-sea mask to apply to the eof_netcdf_variable (which contains the scaled eigenvectors of the EOF analysis) for transforming this four-dimensional NetCDF variable into a rectangular matrix before performing the rotated EOF analysis. It is assumed that this land-sea mask is exactly the same as the one used in the original EOF analysis, whose results are stored in the input NetCDF file eof_netcdf_file.

  3. The -se=selected_eofs argument allows the user to select the standardized PC time series which must be included in the orthogonal rotation. The list of selected PCs may be given in two formats:

    • -se=1,3,…,nn includes pc1, pc3, … and pcnn in the orthogonal rotation
    • -se=1:4 includes pc1 to pc4 in the orthogonal rotation.

    The two forms of the -se= argument may be combined and repeated any number of times, but duplicate PC numbers are not allowed. If the -se= argument is not specified, all the PC time series stored in the eof_netcdf_file are used in the orthogonal rotation.

  4. If -g= is set to t, u, v, w or f it is assumed that the NetCDF variable, eof_netcdf_variable, is from an experiment with the NEMO model (ORCA configuration and R2, R4 or R05 resolutions).

    In this case, the duplicate points from the ORCA grid are removed before the rotated EOF analysis, as far as possible, and, in particular, if the 3-D grid-mesh of the input NetCDF variable covers the whole globe.

    If -g= is set to n, it is assumed that the 2-D grid-mesh is regular or Gaussian and as such has no duplicate points.

    The -g= argument is also used to determine the name of the NetCDF variable which contains the 3-D mesh-mask in the input_mesh_mask_netcdf_file (i.e., the variable named grid_typemask). This input_mesh_mask_netcdf_file may be created by comp_clim_4d if the 3-D grid-mesh is regular or Gaussian.

  5. If -g= is set to t, u, v, w or f (e.g., if the input NetCDF variable, eof_netcdf_variable, is from an experiment with the NEMO ocean model), the -r= argument gives the resolution used. If:

    • -r=r2 the NetCDF variable is from an experiment with the ORCA R2 configuration of the NEMO ocean model
    • -r=r4 the NetCDF variable is from an experiment with the ORCA R4 configuration of the NEMO ocean model.

  6. If the NetCDF variable, eof_netcdf_variable, is from an experiment with the NEMO model, but the resolution is not R2 or R4, the dimensions of the ORCA grid must be specified explicitly with the -b= argument.

  7. If the -x=lon1,lon2, -y=lat1,lat2 and -z=level1,level2 arguments are not specified, the geographical domain used in the rotated EOF analysis is determined from the attributes of the input mesh-mask NetCDF variable named grid_typemask (i.e., lon1_Eastern_limit, lon2_Western_limit, lat1_Southern_limit, lat2_Northern_limit, level1_First_level and level2_Last_level), which is read from the input NetCDF file input_mesh_mask_netcdf_file. If these attributes are missing, the whole geographical domain associated with the eof_netcdf_variable is used in the rotated EOF analysis. These arguments should normally be set to the same values as those used in the original EOF analysis, whose results are stored in the input NetCDF file, eof_netcdf_file.

    The longitude, latitude or level range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

    Refer to comp_mask_4d for transforming geographical coordinates as indices or generating an appropriate mesh-mask before using comp_filt_rot_eof_4d.

  8. If the -t=time1,time2 argument is missing, the whole time period associated with the original PC time series is used to compute the orthogonal rotation matrix.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the output NetCDF file will have ntime = time2 - time1 + 1 observations.

  9. The geographical shapes of the eof_netcdf_variable (in the eof_netcdf_file) and the land-sea mask (in the input_mesh_mask_netcdf_file) must agree.

  10. The -pl= argument specifies the minimum period of oscillation of the filtered PC time series. The minimum_period is expressed in number of time observations.

    Do not use the -pl= argument, or use -pl=0, for high-pass filtering, i.e., to retain frequencies corresponding to periods shorter than -ph=PH.

    The -pl= argument is a positive integer equal to 0 or greater than 2.

  11. The -ph= argument specifies the maximum period of oscillation of the filtered PC time series. The maximum_period is expressed in number of time observations. Do not use the -ph= argument, or use -ph=0, for low-pass filtering, i.e., to retain frequencies corresponding to periods longer than -pl=PL. For example, -pl=6 (or 18) and -ph=32 (or 96) select periods between 1.5 and 8 years for quarterly (monthly) time series.

    The -ph= argument is a positive integer equal to 0 or greater than 2 and less than the length of the time series or the periodicity if the -p= argument is used.

    The -ph= argument must also be greater than or equal to the -pl= argument, unless band rejection is requested (see remark 13).

  12. Setting -pl= and -ph= to the same value P is allowed. In this case, an -ideal- band-pass filter with peak response near one at the single period P is computed and applied to the PC time series.

  13. Setting -pl=PL and -ph=PH with PH < PL is also allowed and performs band rejection of periods between PH and PL. In that case, the meaning of the -pl= and -ph= arguments is reversed.

  14. Setting both -pl=0 and -ph=0 is not allowed.

  15. The -tr=trend_removal argument specifies pre- and post-filtering processing of the multichannel PC time series. If:

    • -tr=+/-1, the mean of the PC time series is removed before time filtering
    • -tr=+/-2, the drift from the PC time series is removed before time filtering. The drift for the PC time series is estimated using the formula : drift = ( tseries(ntime) - tseries(1) )/( ntime - 1 )
    • -tr=+/-3, the least-squares line from the PC time series is removed before time filtering.

    If -tr=-1, -2 or -3, the mean, drift or least-squares line, respectively, is reintroduced after filtering. The covariance matrix between the (detrended and) filtered PC time series is then computed, and the eigenvectors of this symmetric positive-definite matrix, ranked in decreasing order of the associated positive eigenvalues, give the orthogonal rotation matrix to apply to the selected EOFs and PC time series.

    For other values of the -tr= argument, nothing is done before or after filtering the PC time series.

    The -tr= argument must be an integer and the default value for the -tr= argument is 0.

  16. The -win= argument controls the form of the window, which will be convolved with the ideal filter response. By default, a Hamming window is used (e.g., -win=0.54).

    Set -win=0.5 for using a Hanning window or -win=1. for a rectangular window (e.g., the “ideal” filter).

    The -win= argument is a real number greater than or equal to 0.5 and less than or equal to 1.

  17. The -mi=missing_value argument specifies the missing value indicator associated with the rotated EOF NetCDF variable in the output_rot_eof_netcdf_file. If the -mi= argument is not specified, missing_value is set to 1.e+20.

  18. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file, output_rot_eof_netcdf_file.

    By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  19. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g. -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher.

    If this argument is specified, the output_rot_eof_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

  20. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher.

    If this argument is specified, the output_rot_eof_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  21. It is assumed that the original EOF variable has no missing values except those associated with a constant land-sea mask.

  22. Duplicate parameters are allowed, but it is always the last occurrence of a parameter that is used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  23. For more details on EOF analysis, PCA and orthogonal rotation of PC time series to selected-frequency components and windowed filtering of time series, see

    • “A manual for EOF and SVD analyses of climate data”, by Bjornsson, H., and Venegas, S.A., McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52pp, 1997. https://www.jsg.utexas.edu/fu/files/EOFSVD.pdf
    • “Statistical Analysis in Climate Research”, by von Storch, H., and Zwiers, F.W., Cambridge University Press, Cambridge, UK, Chapter 13, 484 pp., 2002. ISBN: 9780521012300
    • “Principal Component Analysis”, by Jolliffe, I.T., Springer-Verlag, New York, USA, 487 pp., 2nd Ed, 2002. ISBN: 978-0-387-22440-4. https://www.springer.com/gp/book/9780387954424
    • “A user’s guide to principal components”, by Jackson, J.E., John Wiley and Sons, New York, USA, 592 pp., 2003. ISBN: 978-0-471-47134-9
    • “Rotation of principal components” by Richman, M.B., International Journal of Climatology, Vol. 6, 293-335, 1986. https://doi.org/10.1002/joc.3370060305
    • “Disentangling Global Warming, Multidecadal Variability, and El Niño in Pacific Temperatures”, by Wills, R.C., Schneider, T., Wallace, J.M., Battisti, D.S., and Hartmann, D.L., Geophysical Research Letters, Vol. 45, 2487-2496, 2018. https://doi.org/10.1002/2017GL076327
    • “A Frequency Selective Filter for Short-Length Time Series”, by Iacobucci, A., and Noullez, A., Computational Economics, Vol. 25, 75-102, 2005. https://link.springer.com/article/10.1007/s10614-005-6276-7
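As an illustration of the -se= syntax described in remark 3, the following sketch expands a selection list into explicit PC numbers. The function name parse_selection is hypothetical and the error handling is only an assumption about reasonable behaviour; it is not taken from the NCSTAT source code.

```python
def parse_selection(spec):
    """Expand a -se= style list such as '1,3,5:8' into [1, 3, 5, 6, 7, 8]."""
    selected = []
    for token in spec.split(","):
        token = token.strip()
        if ":" in token:
            # Range form, e.g. '1:4' selects pc1 to pc4 inclusive.
            first, last = (int(part) for part in token.split(":"))
            if first > last:
                raise ValueError(f"empty range in selection: {token!r}")
            numbers = range(first, last + 1)
        else:
            # Single PC number, e.g. '3'.
            numbers = [int(token)]
        for number in numbers:
            if number < 1:
                raise ValueError(f"PC numbers start at 1: {number}")
            if number in selected:
                raise ValueError(f"duplicate PC number: {number}")
            selected.append(number)
    return selected

selected = parse_selection("1,3,5:8")
```

The length of the returned list corresponds to nrot, the number of rotated PCs.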

Outputs

comp_filt_rot_eof_4d creates an output NetCDF file that contains the rotated standardized PC time series, their associated rotated EOF patterns and the variances described by the computed rotated PC time series. The ratios of filtered variance to total variance for the rotated standardized PC time series are also stored in the output NetCDF file.

The number of rotated PCs and EOFs, nrot, stored in the output NetCDF dataset is determined by the -se=selected_eofs argument. The number of observations in the output NetCDF dataset is determined from the -t=time1,time2 argument. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat, nlon and nlev are the length of the spatial and vertical dimensions of the input NetCDF variable eof_netcdf_variable and nrot is the number of selected PC time series, which have been rotated as determined by the list specified with the -se=selected_eofs argument) :

  1. pc_number(nrot) : the list of rotated PC time series as specified with the -se=selected_eofs argument.

  2. eof_netcdf_variable_filt_rot_eof(nrot,nlev,nlat,nlon) : the rotated EOF patterns. These rotated EOFs are scaled such that they give the scalar products (if -a=scp in the original EOF analysis), covariances (-a=cov in the original EOF analysis) or correlations (-a=cor in the original EOF analysis) between the original observed variables and the rotated PC time series.

    The rotated EOF patterns are packed in a four-dimensional variable whose first, second and third dimensions are exactly the same as those associated with the input NetCDF variable eof_netcdf_variable even if you restrict the geographical domain with the -x=, -y= and -z= arguments. However, outside the selected domain, this output NetCDF variable is filled with missing values.

  3. eof_netcdf_variable_filt_rot_pc(ntime,nrot) : the rotated standardized PC time series.

    The rotated PC time series are always standardized to unit variance.

  4. eof_netcdf_variable_filt_rot_std(nrot) : the standard-deviations of the rotated PC time series.

    The squares of these statistics are equal to the raw variance described by the rotated PC time series.

  5. eof_netcdf_variable_filt_rot_var(nrot) : the proportion of variance explained by each rotated PC time series (given between 0. and 1. with 1. corresponding to 100%).

  6. eof_netcdf_variable_filt_rot_ratio(nrot) : the ratio of filtered variance to total variance for each rotated PC time series (given between 0. and 1. with 1. corresponding to 100%).

Examples

  1. For computing a rotated 10-EOF analysis towards biennial modes (i.e., time scales between 18 and 30 months) from a previous EOF analysis (stored in the file eof_1m_197902_200501_votemper_oi.nc) of a four-dimensional NetCDF variable named votemper and storing the results of the rotated EOF analysis in a NetCDF file named filt_rot_eof_1m_197902_200501_votemper_oi.nc, use the following command :

    $ comp_filt_rot_eof_4d \
      -f=eof_1m_197902_200501_votemper_oi.nc  \
      -v=votemper \
      -se=1:10 \
      -pl=18 \
      -ph=30 \
      -m=oi_mask.nc \
      -o=filt_rot_eof_1m_197902_200501_votemper_oi.nc
    