comp_loess_rot_eof_3d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

29/04/2024

Purpose

Performs an orthogonal rotation of selected standardized Principal Component (PC) time series extracted from a previous Empirical Orthogonal Function (EOF) analysis or Principal Component Analysis (PCA) of a tridimensional NetCDF variable towards low-frequency or high-frequency components using a LOESS smoother [Wills_etal] [Cleveland] [Cleveland_Devlin]. The associated factor loading matrix (i.e., the scaled EOF patterns corresponding to the selected rotated PC time series) is also rotated, and the new rotated EOF or PCA model is stored in a NetCDF file.

Spatial EOF patterns (i.e., scaled EOFs) and PC time series derived from EOF analysis or PCA are constrained to be both spatially and temporally orthogonal to one another so that they maximize the variance they describe over the entire analysis domain and selected time period [Jolliffe] [vonStorch_Zwiers]. However, the spatial orthogonality of the EOFs is a strong and undesirable constraint and, as a result, EOFs are subject to problems such as domain dependence and inaccurate representation of the physical relationships embedded in the data [Jolliffe] [Richman].

Orthogonal rotation of selected standardized PC time series can be used to obtain spatial or temporal patterns (e.g., rotated scaled EOFs and PC time series) that are hopefully more physically meaningful by relaxing the spatial orthogonality of the rotated scaled EOFs, while preserving the temporal orthogonality of the rotated standardized PC time series. Preserving the orthogonality of the (rotated) PC time series implies that these rotated PCs describe the same amount of variance as the original (unrotated) PC time series, which is a very useful property, but this variance is more equally (or differently) distributed among the rotated PC time series [Jolliffe] [Richman] .

In classical orthogonal rotation PCA methods (e.g., varimax rotation, see comp_ortho_rot_eof_3d for details), the objective is to make the rotated standardized PC time series as simple as possible to interpret by constraining the spatial coefficients (i.e., the scaled EOFs) associated with each of them to be either large or very small in absolute magnitude, while avoiding as much as possible spatial loadings of moderate size [Jolliffe] [Jackson]. In comp_loess_rot_eof_3d, by contrast, the focus of the orthogonal rotation is to obtain a cleaner representation of the different time scales embedded in the selected PC time series, so that these time scales are no longer mixed in the same PC time series after the orthogonal rotation [Wills_etal].

It can be shown that a rotation matrix which maximizes such a clean representation of the time scales of interest can be estimated as the eigenvectors of the covariance matrix between the selected standardized PC time series filtered in the frequency bands of interest. This type of generic method is called Low-Frequency Component Analysis (LFCA) [Wills_etal].

Here, LFCA is considered as an orthogonal rotation method of the original standardized PC time series of a (partial) EOF or PCA model. The nrot by nrot orthogonal matrix (where nrot is the number of rotated PCs) used to rotate the original PCs is computed as the eigenvectors of the covariance (i.e., symmetric positive-definite) matrix between the filtered selected standardized PC time series [Wills_etal].
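As an illustration of this idea, the following minimal Python sketch rotates two synthetic standardized and uncorrelated PC time series. It is not the NCSTAT implementation: a centered moving average stands in for the LOESS smoother, the closed-form angle for a 2 by 2 symmetric matrix replaces a general eigenvector solver, and all names (moving_average, orthonormalize, filtered_ratio, lfca_rotate_2) and the synthetic series are hypothetical.

```python
import math

def moving_average(x, width):
    """Centered moving average (stand-in for the LOESS trend); the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def cov(a, b):
    """Covariance (divisor n) of two series, after removing their means."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n

def orthonormalize(a, b):
    """Return two centered, unit-variance, mutually uncorrelated series (Gram-Schmidt)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    a = [u - ma for u in a]
    b = [v - mb for v in b]
    sa = math.sqrt(cov(a, a))
    a = [u / sa for u in a]
    proj = cov(a, b)
    b = [v - proj * u for u, v in zip(a, b)]
    sb = math.sqrt(cov(b, b))
    return a, [v / sb for v in b]

def filtered_ratio(x, width):
    """Ratio of filtered variance to total variance for one series."""
    f = moving_average(x, width)
    return cov(f, f) / cov(x, x)

def lfca_rotate_2(pc1, pc2, width):
    """Rotate two standardized PCs with the eigenvectors of the covariance of their filtered versions."""
    f1, f2 = moving_average(pc1, width), moving_average(pc2, width)
    c11, c22, c12 = cov(f1, f1), cov(f2, f2), cov(f1, f2)
    # Principal-axis angle of the 2 by 2 symmetric matrix [[c11, c12], [c12, c22]].
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    c, s = math.cos(theta), math.sin(theta)
    rot1 = [c * u + s * v for u, v in zip(pc1, pc2)]   # leading eigenvector
    rot2 = [-s * u + c * v for u, v in zip(pc1, pc2)]  # second eigenvector
    return rot1, rot2

# Synthetic example: both input PCs mix a slow and a fast oscillation.
n = 240
slow = [math.sin(2.0 * math.pi * t / 120.0) for t in range(n)]
fast = [math.sin(2.0 * math.pi * t / 7.0) for t in range(n)]
pc1, pc2 = orthonormalize([s + f for s, f in zip(slow, fast)],
                          [s - 0.5 * f for s, f in zip(slow, fast)])
rot1, rot2 = lfca_rotate_2(pc1, pc2, width=25)
for name, series in (("pc1", pc1), ("pc2", pc2), ("rot1", rot1), ("rot2", rot2)):
    print(name, round(filtered_ratio(series, 25), 3))
```

Because the rotation is orthogonal, rot1 and rot2 remain uncorrelated with unit variance, and the filtered-variance ratio of rot1 is at least as large as that of either input PC.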

In comp_loess_rot_eof_3d, the filtering of the original PC time series is performed with the help of a LOESS smoother specified by the values of the -nt=, -itdeg=, -ntjump= and -a= arguments. See [Cleveland] [Cleveland_Devlin] for more details on the LOESS smoother used here. In the LOESS procedure, the analyzed PC time series are decomposed into two terms:

PC(t) = T(t) + R(t)

where t refers to a time index, the T term quantifies the trend and low-frequency variations in the PC time series, and the R term contains the residual (high-frequency) component of the PC time series. The trend is estimated through a sequence of applications of locally weighted low-order polynomial regression (i.e., LOESS) to data windows whose length is chosen by the user (see the description of the -nt= and -itdeg= arguments below for details), and the residual component is estimated as the difference between the original PC time series and this trend. This LOESS smoother is a good candidate for extracting frequency-defined components from time series. This filter is also implemented in comp_trend_3d, where more details are provided.
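The decomposition can be sketched in a few lines of Python. This is a simplified illustration and not the smoother implemented in NCSTAT: it fits a tricube-weighted local polynomial of degree 0 or 1 over the nt points nearest to each time step, evaluates the fit at every point (as if ntjump were 1) and omits degree 2. The function names loess_trend and decompose are hypothetical.

```python
def loess_trend(y, nt, degree=1):
    """Simplified LOESS: tricube-weighted local fit of degree 0 or 1 at every time step."""
    n = len(y)
    q = min(nt, n)
    trend = []
    for t in range(n):
        # Window of the q points nearest to t (shifted inwards near the series ends).
        left = min(max(t - q // 2, 0), n - q)
        idx = range(left, left + q)
        # Tricube weights from the distance to t, scaled so that all weights stay positive.
        h = max(t - left, left + q - 1 - t) + 1
        w = [(1.0 - (abs(i - t) / h) ** 3) ** 3 for i in idx]
        sw = sum(w)
        ybar = sum(wi * y[i] for wi, i in zip(w, idx)) / sw
        if degree == 0:
            trend.append(ybar)  # locally weighted mean
            continue
        # Weighted least-squares straight line, evaluated at t.
        xbar = sum(wi * i for wi, i in zip(w, idx)) / sw
        sxx = sum(wi * (i - xbar) ** 2 for wi, i in zip(w, idx))
        sxy = sum(wi * (i - xbar) * (y[i] - ybar) for wi, i in zip(w, idx))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        trend.append(ybar + slope * (t - xbar))
    return trend

def decompose(pc, nt, degree=1):
    """Split a series into trend T(t) and residual R(t) so that PC(t) = T(t) + R(t)."""
    trend = loess_trend(pc, nt, degree)
    residual = [p - tr for p, tr in zip(pc, trend)]
    return trend, residual

# A purely linear series is reproduced by the local linear fit, so its residual vanishes.
line = [2.0 + 0.5 * t for t in range(50)]
trend, residual = decompose(line, nt=11, degree=1)
```

Larger values of nt widen the local window and therefore give a smoother trend, leaving more of the variability in the residual.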

This LOESS smoother is applied to the selected PC time series before their covariance matrix and its associated eigenvectors are computed. At the user's option, the covariance matrix can be computed from the trend or residual components of the PC time series (see the description of the -a= argument below for details). In the first case, the rotated standardized PC time series will be ordered from low- to high-frequency modes and, in the second case, they will be ordered from high- to low-frequency modes.

The purpose here is exactly the same as in comp_filt_rot_eof_3d, except that comp_loess_rot_eof_3d uses a LOESS smoother [Cleveland] [Cleveland_Devlin] instead of a windowed filter [Iacobucci_Noullez] to filter the original PC time series.

In summary, using as input an EOF NetCDF file produced by comp_eof_3d or comp_eof_miss_3d, comp_loess_rot_eof_3d extracts selected PC time series from this input file (see the description of the -se= argument below for details) and performs the orthogonal rotation of the selected standardized PC time series towards modes that have more energy in the frequency bands of interest selected with the help of the -nt=, -itdeg= and -a= arguments. comp_loess_rot_eof_3d also reads the associated scaled EOFs from the input EOF NetCDF file and computes the regression spatial patterns associated with the rotated standardized PC time series (i.e., the orthogonal projection onto the orthonormal set formed by the new rotated standardized PC time series). The percentages of variance described by the new rotated standardized PC time series are also computed and stored in the output NetCDF file, as are the ratios of filtered variance to total variance for the rotated standardized PC time series.
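In matrix terms, if the scaled EOFs are stored as the columns of a matrix A and V denotes the orthogonal rotation matrix, regressing the data onto the rotated standardized PCs gives the patterns A V, and the total variance carried by the selected modes is unchanged by the rotation. The Python sketch below illustrates this with hypothetical numbers. It assumes unweighted grid points, and the fractions it computes are relative to the variance retained by the selected PCs only, which may differ from the normalization used by comp_loess_rot_eof_3d.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def column_variances(loadings):
    """Sum of squared loadings in each column (variance described, unweighted case)."""
    ncol = len(loadings[0])
    return [sum(row[j] ** 2 for row in loadings) for j in range(ncol)]

# Hypothetical scaled EOFs: 4 grid points (rows) by 2 selected PCs (columns).
scaled_eofs = [[2.0, 0.5], [1.5, -0.5], [1.0, 1.0], [0.5, -1.0]]

# Hypothetical orthogonal rotation matrix V (a plane rotation of 30 degrees).
theta = math.radians(30.0)
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

# Rotated patterns A V and the variance attached to each rotated PC.
rotated_eofs = matmul(scaled_eofs, rotation)
var_before = column_variances(scaled_eofs)
var_after = column_variances(rotated_eofs)

# Share of the retained variance described by each rotated PC.
total = sum(var_after)
fractions = [v / total for v in var_after]
print(var_before, var_after, fractions)
```

The individual variances change with the rotation, but their sum does not, which is the redistribution of variance among the rotated PCs described above.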

The selected standardized PC time series, which will be rotated, are determined on entry by the value of the -se= argument, which is a list of PC (or EOF) numbers.
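The list accepted by -se= combines comma-separated PC numbers and first:last ranges, and duplicate numbers are not allowed (see the Remarks). The hypothetical Python helper below shows how such a list expands into individual PC numbers. It is only an illustration of the syntax, not the parser used by NCSTAT, and the rejection of reversed ranges and of numbers below 1 is an assumption.

```python
def parse_selected_eofs(spec):
    """Expand a string such as '1,3,5:7' into [1, 3, 5, 6, 7]; reject duplicates."""
    selected = []
    for item in spec.split(","):
        item = item.strip()
        if ":" in item:
            # Range form first:last (both ends included).
            first, last = (int(part) for part in item.split(":"))
            if first > last:
                raise ValueError(f"reversed range {item!r}")  # assumed behaviour
            numbers = list(range(first, last + 1))
        else:
            numbers = [int(item)]
        for number in numbers:
            if number < 1:
                raise ValueError(f"PC numbers start at 1, got {number}")  # assumed behaviour
            if number in selected:
                raise ValueError(f"duplicate PC number {number}")
            selected.append(number)
    return selected

print(parse_selected_eofs("1:4"))
print(parse_selected_eofs("1,3,5:7"))
```

The length of the returned list corresponds to nrot, the number of rotated PCs.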

An output NetCDF dataset containing these rotated standardized PC time series, their associated regression spatial patterns and the explained variance and ratio statistics is created.

If the NetCDF variable used in the original EOF analysis is four-dimensional, use comp_loess_rot_eof_4d instead of comp_loess_rot_eof_3d.

Further Details

Usage

$ comp_loess_rot_eof_3d \
  -f=eof_netcdf_file \
  -v=eof_netcdf_variable \
  -m=input_mesh_mask_netcdf_file \
  -nt=trend_smoother_length        (nt) \
  -se=selected_eofs                          (optional) \
  -g=grid_type                               (optional : n, t, u, v, w, f) \
  -r=resolution                              (optional : r2, r4) \
  -b=nlon_orca, nlat_orca                    (optional) \
  -x=lon1,lon2                               (optional) \
  -y=lat1,lat2                               (optional) \
  -t=time1,time2                             (optional) \
  -itdeg=trend_smoother_degree     (itdeg)   (optional : 0, 1, 2 ) \
  -ntjump=trend_skipping_value     (ntjump)  (optional) \
  -a=type_of_analysis                        (optional : trend, residual)  \
  -o=output_rot_eof_netcdf_file              (optional) \
  -mi=missing_value                          (optional) \
  -double                                    (optional) \
  -bigfile                                   (optional) \
  -hdf5                                      (optional) \
  -tlimited                                  (optional)

By default

-se=
all the PC time series stored in the eof_netcdf_file
-g=
the grid_type is set to n which means that the 2-D grid-mesh associated with the input NetCDF EOF variable, eof_netcdf_variable, is assumed to be regular or Gaussian
-r=
if the input eof_netcdf_variable is from the NEMO ocean model (e.g., if -g= argument is not set to n) the resolution is assumed to be r2
-b=
if -g= is not set to n, the dimensions of the 2-D grid-mesh, nlon_orca and nlat_orca, are determined from the -r= argument. However, you may override this default choice with the -b= argument
-x=
the whole longitude domain associated with the eof_netcdf_variable
-y=
the whole latitude domain associated with the eof_netcdf_variable
-t=
the whole time period associated with the input PC time series
-itdeg=
the trend_smoother_degree is set to 1
-ntjump=
the trend_skipping_value is set to nt/10 where nt is the value of the trend_smoother_length argument
-a=
the type_of_analysis is set to trend. This means that the trend components of the PC time series are used to compute the orthogonal rotation matrix
-o=
the output_rot_eof_netcdf_file is named loess_rot_eof_netcdf_variable.nc
-mi=
the missing_value attribute for the rotated EOFs in the output NetCDF file is set to 1.e+20
-double
the results of the rotated EOF analysis are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-bigfile
a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=eof_netcdf_variable argument specifies the NetCDF variable for which an EOF analysis was originally computed by comp_eof_3d (or comp_eof_miss_3d) and the -f=eof_netcdf_file argument specifies that the resulting EOF patterns and PC time series must be extracted from the NetCDF file, eof_netcdf_file. This NetCDF file must have exactly the same format as the files produced by comp_eof_3d or comp_eof_miss_3d.

    These EOF patterns and PC time series are used to compute the orthogonal rotation matrix which will be applied to the selected EOF model.

  2. The -m=input_mesh_mask_netcdf_file argument specifies the land-sea mask to apply to the eof_netcdf_variable (which contains the scaled eigenvectors of the EOF analysis) for transforming this tridimensional NetCDF variable into a rectangular matrix before performing the rotated EOF analysis. It is assumed that this land-sea mask is exactly the same as the one used in the original EOF analysis, whose results are stored in the input NetCDF file eof_netcdf_file.

  3. The -se=selected_eofs argument allows the user to select the standardized PC time series which must be included in the orthogonal rotation. The list of selected PCs may be given in two formats:

    • -se=1,3,…,nn includes pc1, pc3, … and pcnn in the orthogonal rotation
    • -se=1:4 includes pc1 to pc4 in the orthogonal rotation.

    The two forms of the -se= argument may be combined and repeated any number of times, but duplicate PC numbers are not allowed. If the -se= argument is not specified, all the PC time series stored in the eof_netcdf_file are used in the orthogonal rotation.

  4. If -g= is set to t, u, v, w or f it is assumed that the NetCDF variable, eof_netcdf_variable, is from an experiment with the NEMO ocean model. In this case, the duplicate points from the ORCA grid are removed before the rotated EOF analysis, as far as possible, and, in particular, if the 2-D grid-mesh of the input NetCDF variable covers the whole globe.

    If -g= is set to n, it is assumed that the 2-D grid-mesh is regular or Gaussian and as such has no duplicate points.

    The -g= argument is also used to determine the name of the NetCDF variables which contain the 2-D mesh-mask in the input_mesh_mask_netcdf_file (e.g., the variable named grid_typemask). This input_mesh_mask_netcdf_file may be created by comp_clim_3d if the 2-D grid-mesh is regular or Gaussian.

  5. If -g= is set to t, u, v, w or f (e.g., if the input NetCDF variable, eof_netcdf_variable, is from an experiment with the NEMO ocean model), the -r= argument gives the resolution used. If:

    • -r=r2 the NetCDF variable is from an experiment with the ORCA R2 configuration of the NEMO ocean model
    • -r=r4 the NetCDF variable is from an experiment with the ORCA R4 configuration of the NEMO ocean model.

  6. If the NetCDF variable, eof_netcdf_variable, is from an experiment with the NEMO model, but the resolution is not R2 or R4, the dimensions of the ORCA grid must be specified explicitly with the -b= argument.

  7. If the -x=lon1,lon2 and -y=lat1,lat2 arguments are not specified, the geographical domain used in the rotated EOF analysis is determined from the attributes of the input mesh mask NetCDF variable named grid_typemask (i.e., lon1_Eastern_limit, lon2_Western_limit, lat1_Southern_limit and lat2_Northern_limit), which is read from the input NetCDF file input_mesh_mask_netcdf_file. If these attributes are missing, the whole geographical domain associated with the eof_netcdf_variable is used in the rotated EOF analysis. These arguments should normally be set to the same values as those used in the original EOF analysis, whose results are stored in the input NetCDF file, eof_netcdf_file.

    The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

    Refer to comp_mask_3d for transforming geographical coordinates into indices or generating an appropriate mesh-mask before using comp_loess_rot_eof_3d.

  8. If the -t=time1,time2 argument is missing, the whole time period associated with the original PC time series is used to compute the orthogonal rotation matrix.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the output NetCDF file will have ntime = time2 - time1 + 1 observations.

  9. The geographical shapes of the eof_netcdf_variable (in the eof_netcdf_file) and the land-sea mask (in the input_mesh_mask_netcdf_file) must agree.

  10. The -nt= argument specifies the length of the trend smoother, nt. The value of nt should be an odd integer greater than or equal to 3. As nt increases, the trend components of the PC time series, which are used to compute the orthogonal rotation matrix if -a=trend, become smoother.

  11. The -itdeg= argument specifies the degree of the locally-fitted polynomial in trend smoothing of the PC time series. The value is 0, 1 or 2.

  12. The -ntjump= argument specifies the skipping value for trend smoothing of the PC time series. The trend smoother skips ahead ntjump points and then linearly interpolates in between. The value of ntjump should be a positive integer; if ntjump is set to 1, a trend smooth is calculated at all points in the PC time series. To make the procedure run faster, a reasonable choice for ntjump is 10% or 20% of nt.

  13. The -a= argument specifies if the residuals from the trend components or the trend components are used to compute the orthogonal rotation matrix. If:

    • -a=trend, the trend components are used.
    • -a=residual, the residuals from the trends are used.

    The default is -a=trend, i.e., the trend components are used to estimate the orthogonal rotation matrix. More precisely, the covariance matrix between the trend components of the PC time series is computed, and the eigenvectors of this symmetric positive-definite matrix, ranked in decreasing order of the associated (positive) eigenvalues, give the orthogonal rotation matrix to apply to the selected EOFs and PC time series.

  14. The -mi=missing_value argument specifies the missing value indicator associated with the rotated EOF NetCDF variable in the output_rot_eof_netcdf_file. If the -mi= argument is not specified missing_value is set to 1.e+20.

  15. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file, output_rot_eof_netcdf_file.

    By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  16. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g., -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher.

    If this argument is specified, the output_rot_eof_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

  17. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g., -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher.

    If this argument is specified, the output_rot_eof_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  18. It is assumed that the original EOF variable has no missing values except those associated with a constant land-sea mask.

  19. Duplicate parameters are allowed, but only the last occurrence of a parameter is used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  20. For more details on EOF analysis, PCA and orthogonal rotation of PC time series to selected-frequency components and LOESS smoothing of time series, see

    • “A manual for EOF and SVD analyses of climate data”, by Bjornsson, H., and Venegas, S.A., McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52pp, 1997. https://www.jsg.utexas.edu/fu/files/EOFSVD.pdf
    • “Statistical Analysis in Climate Research”, by von Storch, H., and Zwiers, F.W., Cambridge University Press, Cambridge, UK, Chapter 13, 484 pp., 2002. ISBN: 9780521012300
    • “Principal Component Analysis”, by Jolliffe, I.T., Springer-Verlag, New York, USA, 487 pp., 2nd Ed, 2002. ISBN: 978-0-387-22440-4. https://www.springer.com/gp/book/9780387954424
    • “A user’s guide to principal components”, by Jackson, J.E., John Wiley and Sons, New York, USA, 592 pp., 2003. ISBN: 978-0-471-47134-9
    • “Rotation of principal components” by Richman, M.B., International Journal of Climatology, Vol. 6, 293-335, 1986. https://doi.org/10.1002/joc.3370060305
    • “Disentangling Global Warming, Multidecadal Variability, and El Niño in Pacific Temperatures”, by Wills, R.C., Schneider, T., Wallace, J.M., Battisti, D.S., and Hartmann, D.L., Geophysical Research Letters, Vol. 45, 2487-2496, 2018. https://doi.org/10.1002/2017GL076327
    • “Robust Locally Weighted Regression and Smoothing Scatterplots”, by Cleveland, W.S., Journal of the American Statistical Association, Vol. 74, 829-836, 1979. doi: 10.1080/01621459.1979.10481038
    • “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting”, by Cleveland, W.S., and Devlin, S.J., Journal of the American Statistical Association, Vol. 83, 596-610, 1988. doi: 10.1080/01621459.1988.10478639
    • “A Seasonal-Trend Decomposition Procedure Based on Loess”, by Cleveland, R.B., Cleveland, W.S., McRae, J.E., and Terpenning, I., Journal of Official Statistics, 6, 3-73, 1990. http://www.jos.nu/Articles/abstract.asp?article=613
    • “A Frequency Selective Filter for Short-Length Time Series”, by Iacobucci, A., and Noullez, A., Computational Economics, Vol. 25, 75-102, 2005. https://link.springer.com/article/10.1007/s10614-005-6276-7
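As a complement to Remark 7, the index arithmetic used when lon1 is negative can be sketched as follows. The Python helper below is hypothetical and only illustrates the rule stated there (the domain runs from nlon+lon1+1 to lon2 on a periodic grid, with indices relative to 1). Treating lon1 = 0 as an error is an assumption.

```python
def longitude_indices(lon1, lon2, nlon):
    """Return the 1-based longitude indices selected by -x=lon1,lon2 on a periodic grid."""
    if lon1 == 0:
        raise ValueError("indices are relative to 1")  # assumed behaviour
    if lon1 >= 1:
        # Ordinary case: a contiguous block of indices.
        return list(range(lon1, lon2 + 1))
    # Negative lon1: start at nlon + lon1 + 1 and wrap around the periodic boundary.
    start = nlon + lon1 + 1
    return list(range(start, nlon + 1)) + list(range(1, lon2 + 1))

# With 360 longitude points, -x=-10,20 selects indices 351..360 followed by 1..20.
indices = longitude_indices(-10, 20, 360)
print(indices[0], indices[9], indices[10], indices[-1], len(indices))
```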

Outputs

comp_loess_rot_eof_3d creates an output NetCDF file that contains the rotated standardized PC time series, their associated rotated EOF patterns and the variances described by the computed rotated PC time series. The ratios of filtered variance to total variance for the rotated standardized PC time series are also stored in the output NetCDF file.

The number of rotated PCs and EOFs, nrot, stored in the output NetCDF dataset is determined by the -se=selected_eofs argument. The number of observations in the output NetCDF dataset is determined from the -t=time1,time2 argument. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat and nlon are the lengths of the spatial dimensions of the input NetCDF variable eof_netcdf_variable and nrot is the number of selected PC time series, which have been rotated, as determined by the list specified with the -se=selected_eofs argument) :

  1. pc_number(nrot) : the list of rotated PC time series as specified with the -se=selected_eofs argument.

  2. eof_netcdf_variable_loess_rot_eof(nrot,nlat,nlon) : the rotated EOF patterns. These rotated EOFs are scaled such that they give the scalar products (if -a=scp in the original EOF analysis), covariances (-a=cov in the original EOF analysis) or correlations (-a=cor in the original EOF analysis) between the original observed variables and the rotated PC time series.

    The rotated EOF patterns are packed in a tridimensional variable whose first and second dimensions are exactly the same as those associated with the input NetCDF variable eof_netcdf_variable even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, this output NetCDF variable is filled with missing values.

  3. eof_netcdf_variable_loess_rot_pc(ntime,nrot) : the rotated standardized PC time series.

    The rotated PC time series are always standardized to unit variance.

  4. eof_netcdf_variable_loess_rot_std(nrot) : the standard-deviations of the rotated PC time series.

    The squares of these statistics are equal to the raw variance described by the rotated PC time series.

  5. eof_netcdf_variable_loess_rot_var(nrot) : the proportion of variance explained by each rotated PC time series (given between 0. and 1. with 1. corresponding to 100%).

  6. eof_netcdf_variable_loess_rot_ratio(nrot) : the ratio of filtered variance to total variance for each rotated PC time series (given between 0. and 1. with 1. corresponding to 100%).

Examples

  1. For computing a rotated 10-EOF analysis towards low-frequency modes from a previous EOF analysis (stored in the file eof_HadISST1_1m_197902_200501_sst_oi.nc) of a tridimensional NetCDF variable named sst, which contains monthly SST data, and storing the results of the rotated EOF analysis in a NetCDF file named loess_rot_eof_HadISST1_1m_197902_200501_sst_oi.nc, use the following command:

    $ comp_loess_rot_eof_3d \
      -f=eof_HadISST1_1m_197902_200501_sst_oi.nc  \
      -v=sst \
      -se=1:10 \
      -nt=48 \
      -m=HadISST1_mask.nc \
      -o=loess_rot_eof_HadISST1_1m_197902_200501_sst_oi.nc
    