comp_trend_3d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

05/05/2021

Purpose

Extract a trend component from a tridimensional variable in a NetCDF dataset by using a LOESS smoother [Cleveland] [Cleveland_Devlin]. The LOESS procedure is a powerful statistical technique for describing a discrete time series (see the references in the Remarks Section below). In the LOESS procedure, the analyzed multi-channel time series is decomposed into two terms:

X(t) = T(t) + R(t)

where t refers to a time index, the T term is used to quantify the trend and low-frequency variations in the time series, and, finally, the R term contains the residual component.

The trend is estimated through a sequence of applications of locally weighted regression or low-order polynomial (e.g. Loess) to data windows whose length is chosen by the user. Other important features of the LOESS method are the specification of the amount of trend smoothing according to the choice of the user, the ability to produce robust estimates of the trend component that are not distorted by aberrant behavior in the data and the stationarity of the R time series.

This procedure returns the trend component as estimated by the LOESS procedure and, optionally, the residuals in a NetCDF dataset, for the multi-channel time series associated with a NetCDF variable.

If the NetCDF variable is unidimensional or fourdimensional use comp_trend_1d or comp_trend_4d, respectively instead of comp_trend_3d. It the time series have a seasonal (or diurnal) cycle, use comp_stl_3d instead of comp_trend_3d in order to estimate the trend component by the Loess procedure.

The exact meaning and default values for most of the optional parameters of comp_trend_3d are exactly the same as in the original (STL) procedure described by [Cleveland_etal] and the user is referred to this publication for further details on the LOESS procedure as implemented here.

This procedure is parallelized if OpenMP is used.

Further Details

Usage

$ comp_trend_3d \
  -f=input_netcdf_file \
  -v=input_netcdf_variable \
  -nt=trend_smoother_length        (nt) \
  -m=input_mesh_mask_netcdf_file             (optional) \
  -g=grid_type                               (optional : n, t, u, v, w, f) \
  -x=lon1,lon2                               (optional) \
  -y=lat1,lat2                               (optional) \
  -t=time1,time2                             (optional) \
  -a=type_of_analysis                        (optional : trend, residual)  \
  -o=output_netcdf_file                      (optional) \
  -itdeg=trend_smoother_degree     (itdeg)   (optional : 0, 1, 2 ) \
  -ntjump=trend_skipping_value     (ntjump)  (optional) \
  -maxit=max_robustness_iterations (maxit)   (optional) \
  -smt=trend_smoothing_factor                (optional) \
  -robust                                    (optional) \
  -mi=missing_value                          (optional) \
  -double                                    (optional) \
  -hdf5                                      (optional) \
  -bigfile                                   (optional) \
  -tlimited                                  (optional)

By default

-m=
an input_mesh_mask_netcdf_file is not used
-g=
the grid_type is set to n which means that the 2-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian
-x=
the whole longitude domain associated with the netcdf_variable
-y=
the whole latitude domain associated with the netcdf_variable
-t=
the whole time period associated with the netcdf_variable
-a=
the type_of_analysis is set to trend. This means that the residual times series are not computed and not stored in the output_netcdf_file
-o=
the output_netcdf_file is named trend_netcdf_variable.nc
-itdeg=
the trend_smoother_degree is set to 1
-ntjump=
the trend_skipping_value is set to nt/10 where nt is the value of the trend_smoother_length argument
-maxit=
the max_robustness_iterations is set to 15
-smt=
no smoothing is applied to the trend component
-robust
robustness iterations are not used, by default
-mi=
the missing_value for the output variables is equal to 1.e+20
-double
the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-bigfile
a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable from which the multi-channel time series must be decomposed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.

  2. If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the whole geographical domain associated with the netcdf_variable is used to select the multi-channel time series.

    The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

    Refer to comp_mask_3d for transforming geographical coordinates as indices before using comp_trend_3d.

  3. If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used to decompose the time series.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the output NetCDF file will have ntime = time2 - time1 + 1 time observations.

  4. The geographical shapes of the netcdf_variable (in the input_netcdf_file) and the mask (in the input_mesh_mask_netcdf_file) must agree if the -m= argument is used.

  5. If -g= is set to t, u, v, w or f it is assumed that the NetCDF variable is from an experiment with the NEMO or ORCA model.

    If -g= is set to n, it is assumed that the 2-D grid-mesh is regular or Gaussian and as such has no duplicate points.

    This argument is also used to determine the name of the NetCDF mesh_mask variable if an input_mesh_mask_netcdf_file is used as specified with the -m= argument.

  6. The -nt= argument specifies the length of the trend smoother, nt. The value of nt should be an odd integer greater than or equal to 3. As nt increases the values of the trend component become smoother.

  7. The -itdeg= argument specifies the degree of the locally-fitted polynomial in trend smoothing. The value is 0, 1 or 2.

  8. The -ntjump= argument specifies the skipping value for trend smoothing. The trend smoother skips ahead ntjump points and then linearly interpolates in between. The value of ntjump should be a positive integer; if ntjump is set to 1, a trend smooth is calculated at all points in the time series. To make the procedure run faster, a reasonable choice for ntjump is 10% or 20% of nt.

  9. The -a= argument specifies if the residuals from the trend component are stored in the output NetCDF file. If:

    • -a=trend, the residuals are not computed
    • -a=residual, the residuals are computed and stored.

    The default is -a=trend , e.g. the residuals are not stored. Note that in all cases, the trend component is computed and stored in the output NetCDF file.

  10. If -robust is specified, robustness iterations are carried out until convergence of the trend component or with a maximum of maxit iterations as specified by the -maxit= argument. Convergence occurs if the maximum changes in individual trend fits are less than 1% of the component’s range after the previous iteration.

  11. -smt=trend_smoothing_factor means that the trend component extracted from the netcdf_variable (e.g. the -v= argument) must be smoothed with a moving average of approximately 2*trend_smoothing_factor+1 terms. The smoothing is applied before estimating the residuals (e.g. if -a= argument is set to residual ). trend_smoothing_factor must be a strictly positive integer (e.g. greater than 0). Note that this additional smoothing is not implemented in the original LOESS procedure.

  12. The -mi=missing_value argument specifies the missing value attribute for the output variables in the output_netcdf_file. If the -mi= argument is not specified, the missing_value is set to 1.e+20.

  13. The -double argument specifies that, the results are stored as double-precision floating point numbers in the output NetCDF file. By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  14. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g. -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher. If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

  15. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  16. It is assumed that the data has no missing values excepted those associated with a constant land-sea mask.

  17. It the time series have a seasonal or diurnal cycle, use comp_stl_3d instead of comp_trend_3d.

  18. Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  19. For more details on the LOESS procedure, see

    • “Robust Locally Weighted Regression and Smoothing Scatterplots”, by Cleveland, W.S., Journal of the American Statistical Association, Vol. 74, 829-836, 1979. doi: 10.1080/01621459.1979.10481038
    • “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting”, by Cleveland, W.S., and Devlin, S.J., Journal of the American Statistical Association, Vol. 83, 596-610, 1988. doi: 10.1080/01621459.1988.10478639
    • “A Seasonal-Trend Decomposition Procedure Based on Loess”, by Cleveland, R.B., Cleveland, W.S., McRae, J.E., and Terpennings, I., Journal of Official Statistics, 6, 3-73, 1990. http://www.jos.nu/Articles/abstract.asp?article=613

Outputs

comp_trend_3d creates an output NetCDF file that contains the trend and residual components extracted from the time series associated with the input NetCDF variable. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat and nlon are the length of the spatial dimensions of the input NetCDF variable and ntime is the selected number of time observations) :

  1. netcdf_variable_trend(ntime,nlat,nlon) : the trend component for each of the time series of the 2-D grid-mesh associated with the input NetCDF variable.

  2. netcdf_variable_residual(ntime,nlat,nlon) : the residual component for each of the time series of the 2-D grid-mesh associated with the input NetCDF variable.

    This variable is stored only if the -a=residual argument has been specified when calling comp_trend_3d.

The trend and residual components are packed in tridimensional variables whose first and second dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variables are filled with missing values.

Examples

  1. For estimating a trend from the tridimensional NetCDF variable called sosstsst extracted from the file ST7_1m_00101_20012_sosstsst_ano_grid_T.nc, which includes monthly anomalies time series, and store the results in the NetCDF file stl_ST7_1m_00101_20012_sosstsst_ano_grid_T.nc, use the following command :

    $ comp_trend_3d \
      -f=ST7_1m_00101_20012_sosstsst_ano_grid_T.nc \
      -v=sosstsst \
      -nt=127 \
      -a=residual \
      -robust \
      -o=stl_ST7_1m_00101_20012_sosstsst_ano_grid_T.nc