comp_trend_1d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

29/05/2024

Purpose

Extract a trend component from an unidimensional variable in a NetCDF dataset by using a LOESS smoother [Cleveland] [Cleveland_Devlin]. The LOESS procedure is a powerful statistical technique for describing a discrete time series (see the references in the Remarks Section below). In the LOESS procedure, the analyzed time series is decomposed into two terms:

X(t) = T(t) + R(t)

where t refers to a time index, the T term is used to quantify the trend and low-frequency variations in the time series, and, finally, the R term contains the residual component.

The trend is estimated through a sequence of applications of locally weighted regression or low-order polynomial (e.g., Loess) to data windows whose length is chosen by the user. Other important features of the LOESS method are the specification of the amount of trend smoothing according to the choice of the user, the ability to produce a robust estimate of the trend component that is not distorted by aberrant behavior in the data and the stationarity of the R residual time series.

This procedure returns the trend component as estimated by the LOESS procedure and, optionally, the residuals in a NetCDF dataset, for the raw time series associated with a NetCDF variable.

If the NetCDF variable is trid- or fourdimensional use comp_trend_3d or comp_trend_4d, respectively instead of comp_trend_1d. It the time series has a seasonal (or diurnal) cycle, use comp_stl_1d instead of comp_trend_1d in order to estimate the trend component by the Loess procedure.

The exact meaning and default values for most of the optional parameters of comp_trend_1d are exactly the same as in the original (STL) procedure described by [Cleveland_etal] and the user is referred to this publication for further details on the LOESS procedure as implemented here.

Further Details

Usage

$ comp_trend_1d \
  -f=input_netcdf_file \
  -v=input_netcdf_variable \
  -nt=trend_smoother_length        (nt) \
  -t=time1,time2                             (optional) \
  -a=type_of_analysis                        (optional : trend, residual)  \
  -o=output_netcdf_file                      (optional) \
  -itdeg=trend_smoother_degree     (itdeg)   (optional : 0, 1, 2 ) \
  -ntjump=trend_skipping_value     (ntjump)  (optional) \
  -maxit=max_robustness_iterations (maxit)   (optional) \
  -smt=trend_smoothing_factor                (optional) \
  -robust                                    (optional) \
  -double                                    (optional) \
  -hdf5                                      (optional) \
  -tlimited                                  (optional)

By default

-t=
the whole time period associated with the netcdf_variable
-a=
the type_of_analysis is set to trend. This means that the residual times series is not computed and not stored in the output_netcdf_file
-o=
the output_netcdf_file is named trend_netcdf_variable.nc
-itdeg=
the trend_smoother_degree is set to 1
-ntjump=
the trend_skipping_value is set to nt/10 where nt is the value of the trend_smoother_length argument
-maxit=
the max_robustness_iterations is set to 15
-smt=
no smoothing is applied to the trend component
-robust
robustness iterations are not used, by default
-double
the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable from which the time series must be decomposed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.

  2. If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used to decompose the time series.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the output NetCDF file will have ntime = time2 - time1 + 1 time observations.

  3. The -nt= argument specifies the length of the trend smoother, nt. The value of nt should be an odd integer greater than or equal to 3. As nt increases the values of the trend component become smoother.

  4. The -itdeg= argument specifies the degree of the locally-fitted polynomial in trend smoothing. The value is 0, 1 or 2.

  5. The -ntjump= argument specifies the skipping value for trend smoothing. The trend smoother skips ahead ntjump points and then linearly interpolates in between. The value of ntjump should be a positive integer; if ntjump is set to 1, a trend smooth is calculated at all points in the time series. To make the procedure run faster, a reasonable choice for ntjump is 10% or 20% of nt.

  6. The -a= argument specifies if the residuals from the trend component are stored in the output NetCDF file. If:

    • -a=trend, the residuals are not computed
    • -a=residual, the residuals are computed and stored.

    The default is -a=trend , e.g., the residuals are not stored. Note that in all cases, the trend component is computed and stored in the output NetCDF file.

  7. If -robust is specified, robustness iterations are carried out until convergence of the trend component or with a maximum of maxit iterations as specified by the -maxit= argument. Convergence occurs if the maximum changes in individual trend fits are less than 1% of the component’s range after the previous iteration.

  8. -smt=trend_smoothing_factor means that the trend component extracted from the netcdf_variable (e.g., the -v= argument) must be smoothed with a moving average of approximately 2*trend_smoothing_factor+1 terms. The smoothing is applied before estimating the residuals (e.g., if -a= argument is set to residual ). trend_smoothing_factor must be a strictly positive integer (e.g., greater than 0). Note that this additional smoothing is not implemented in the original LOESS procedure.

  9. The -double argument specifies that, the results are stored as double-precision floating point numbers in the output NetCDF file.

    By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  10. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g., -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher.

    If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  11. It is assumed that the data has no missing values.

  12. It the time series have a seasonal or diurnal cycle, use comp_stl_1d instead of comp_trend_1d.

  13. Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  14. For more details on the LOESS procedure, see

    • “Robust Locally Weighted Regression and Smoothing Scatterplots”, by Cleveland, W.S., Journal of the American Statistical Association, Vol. 74, 829-836, 1979. doi: 10.1080/01621459.1979.10481038
    • “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting”, by Cleveland, W.S., and Devlin, S.J., Journal of the American Statistical Association, Vol. 83, 596-610, 1988. doi: 10.1080/01621459.1988.10478639
    • “A Seasonal-Trend Decomposition Procedure Based on Loess”, by Cleveland, R.B., Cleveland, W.S., McRae, J.E., and Terpennings, I., Journal of Official Statistics, 6, 3-73, 1990. http://www.jos.nu/Articles/abstract.asp?article=613

Outputs

comp_trend_1d creates an output NetCDF file that contains the trend and residual components extracted from the time series associated with the input NetCDF variable. The output NetCDF dataset contains the following NetCDF variables (in the description below, ntime is the selected number of time observations) :

  1. netcdf_variable_trend(ntime) : the trend component for the time series associated with the input NetCDF variable.

  2. netcdf_variable_residual(ntime) : the residual component for the time series associated with the input NetCDF variable.

    This variable is stored only if the -a=residual argument has been specified when calling comp_trend_1d.

Examples

  1. For estimating a trend from the unidimensional NetCDF variable called ts extracted from the file ts_gfdl_cm2_0.picntrl_monthly_glob.nc, which includes a monthly anomalies time series, and store the results in the NetCDF file ts_gfdl_cm2_0.picntrl_monthly_glob_trend.nc, use the following command :

    $ comp_trend_1d \
      -f=ts_gfdl_cm2_0.picntrl_monthly_glob.nc \
      -v=ts \
      -nt=111 \
      -a=trend \
      -robust \
      -o=ts_gfdl_cm2_0.picntrl_monthly_glob_trend.nc
    
Flag Counter