comp_stl_3d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

07/05/2021

Purpose

Decompose time series of a tridimensional variable extracted from a NetCDF data set into seasonal and trend components by the Seasonal-Trend decomposition procedure based on Loess (STL) [Cleveland_etal]. The STL procedure is a powerful statistical technique for describing a discrete time series [Cleveland_etal] [Terrayc]. In the STL procedure, the analyzed multi-channel time series is decomposed into three terms:

X(t) = T(t) + A(t) + R(t)

where t refers to a time index, the T term is used to quantify the trend and low-frequency variations in the time series, the A term describes the harmonic component (e.g. diurnal or seasonal cycle) and its modulation through time and, finally, the R term contains the residual component.

All the terms are estimated through a sequence of applications of locally weighted regression or low-order polynomial (e.g. Loess) to data windows whose length is chosen by the user. The STL procedure is an iterative process, which may be interpreted as a frequency filter directly applicable to non-stationary (unidimensional) time series including harmonic components [Cleveland_etal]. Other important features of STL are the specification of the amounts of seasonal and trend smoothing according to the choice of the user, the ability to produce robust estimates of the trend and harmonic components that are not distorted by aberrant behavior in the data and the stationarity of the R time series.

This procedure returns the seasonal and trend components as estimated by the STL procedure and, optionally, the residuals or the sum of the residual and trend components in a NetCDF dataset, for the multi-channel time series associated with a NetCDF variable.

If the NetCDF variable is unidimensional or fourdimensional use comp_stl_1d or comp_stl_4d, respectively instead of comp_stl_3d. It the time series have no seasonal (or diurnal) cycle, use comp_trend_3d instead of comp_stl_3d in order to estimate the trend component by the Loess procedure.

The exact meaning and default values for most of the optional parameters of comp_stl_3d are exactly the same as in the original procedure described by [Cleveland_etal] and the user is referred to this publication for further details on the STL procedure.

This procedure is parallelized if OpenMP is used.

Further Details

Usage

$ comp_stl_3d \
  -f=input_netcdf_file \
  -v=input_netcdf_variable \
  -p=periodicity \
  -ns=seasonal_smoother_length     (ns) \
  -m=input_mesh_mask_netcdf_file             (optional) \
  -g=grid_type                               (optional : n, t, u, v, w, f) \
  -x=lon1,lon2                               (optional) \
  -y=lat1,lat2                               (optional) \
  -t=time1,time2                             (optional) \
  -a=type_of_analysis                        (optional : stl, residual,  \
                                                         residual_trend) \
  -o=output_netcdf_file                      (optional) \
  -nt=trend_smoother_length        (nt)      (optional) \
  -nl=low_pass_smoother_length     (nl)      (optional) \
  -isdeg=seasonal_smoother_degree  (isdeg)   (optional : 0, 1 ) \
  -itdeg=trend_smoother_degree     (itdeg)   (optional : 0, 1, 2 ) \
  -ildeg=low_pass_smoother_degree  (ildeg)   (optional : 0, 1, 2 ) \
  -nsjump=seasonal_skipping_value  (nsjump)  (optional) \
  -ntjump=trend_skipping_value     (ntjump)  (optional) \
  -nljump=low_pass_skipping_value  (nljump)  (optional) \
  -maxit=max_robustness_iterations (maxit)   (optional) \
  -ni=number_of_loops              (ni)      (optional) \
  -sms=seasonal_smoothing_factor             (optional) \
  -smt=trend_smoothing_factor                (optional) \
  -robust                                    (optional) \
  -mi=missing_value                          (optional) \
  -double                                    (optional) \
  -hdf5                                      (optional) \
  -bigfile                                   (optional) \
  -tlimited                                  (optional)

By default

-m=
an input_mesh_mask_netcdf_file is not used
-g=
the grid_type is set to n which means that the 2-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian
-x=
the whole longitude domain associated with the netcdf_variable
-y=
the whole latitude domain associated with the netcdf_variable
-t=
the whole time period associated with the netcdf_variable
-a=
the type_of_analysis is set to stl. This means that the residual times series are not computed and not stored in the output_netcdf_file
-o=
the output_netcdf_file is named stl_netcdf_variable.nc
-nt=
the trend_smoother_length is set to the smallest odd integer greater than or equal to (1.5*periodicity) / (1-(1.5/ns))
-nl=
the low_pass_smoother_length is set to the smallest odd integer greater than or equal to periodicity
-isdeg=
the seasonal_smoother_degree is set to 1
-itdeg=
the trend_smoother_degree is set to 1
-ildeg=
the low_pass_smoother_degree is set to the trend_smoother_degree
-nsjump=
the seasonal_skipping_value is set to ns/10
-ntjump=
the trend_skipping_value is set to nt/10
-nljump=
the low_pass_skipping_value is set to nl/10
-maxit=
the max_robustness_iterations is set to 15
-ni=
the number_of_loops is set to 1 if -robust is specified and 2 otherwise
-sms=
no smoothing is applied to the seasonal component
-smt=
no smoothing is applied to the trend component
-robust
robustness iterations are not used, by default
-mi=
the missing_value for the output variables is equal to 1.e+20
-double
the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-bigfile
a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable from which the multi-channel time series must be decomposed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.

  2. If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the whole geographical domain associated with the netcdf_variable is used to select the multi-channel time series.

    The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

    Refer to comp_mask_3d for transforming geographical coordinates as indices before using comp_stl_3d.

  3. If the -t=time1,time2 argument is missing the whole time period associated with the netcdf_variable is used to decompose the time series.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1. Note that the output NetCDF file will have ntime = time2 - time1 + 1 time observations.

  4. The geographical shapes of the netcdf_variable (in the input_netcdf_file), the mask (in the input_mesh_mask_netcdf_file) must agree if the -m= argument is used.

  5. If -g= is set to t, u, v, w or f it is assumed that the NetCDF variable is from an experiment with the NEMO or ORCA model.

    If -g= is set to n, it is assumed that the 2-D grid-mesh is regular or Gaussian and as such has no duplicate points.

    This argument is also used to determine the name of the NetCDF mesh_mask variable if an input_mesh_mask_netcdf_file is used as specified with the -m= argument

  6. The -p=periodicity argument gives the periodicity of the input data. For example, with monthly data -p=12 should be specified. The periodicity must be greater than 2. If your time series do not have a periodic component, e.g. if periodicity is equal to 1 , use comp_trend_3d instead of comp_stl_3d.

  7. The -ns= argument specifies the length of the seasonal smoother, ns. The value of ns should be an odd integer greater than or equal to 3; a value greater than 6 is recommended. As the value of the -ns= argument increases the values of the seasonal component at a given point in the seasonal cycle (e.g., January values of a monthly series with a yearly cycle) become smoother. However, the value of ns has no direct effect on the smoothness of successive values of the seasonal component.

  8. The -nt= argument specifies the length of the trend smoother, nt. The value of nt should be an odd integer greater than or equal to 3; a value of nt between 1.5*periodicity and 2*periodicity is recommended. As nt increases the values of the trend component become smoother.

  9. The -nl= argument specifies the length of the low-pass smoother, nl. The value of nl should be an odd integer greater than or equal to 3; the smallest odd integer greater than or equal to periodicity is recommended.

  10. The -isdeg= argument specifies the degree of locally-fitted polynomial in seasonal smoothing. The value is 0 or 1.

  11. The -itdeg= argument specifies the degree of locally-fitted polynomial in trend smoothing. The value is 0, 1 or 2.

  12. The -ildeg= argument specifies the degree of locally-fitted polynomial in low-pass smoothing. The value is 0, 1 or 2.

  13. The -nsjump= argument specifies the skipping value for seasonal smoothing. The seasonal smoother skips ahead nsjump points and then linearly interpolates in between. The value of nsjump should be a positive integer; if nsjump is set to 1, a seasonal smooth is calculated at all points in the time series. To make the procedure run faster, a reasonable choice for nsjump is 10% or 20% of ns. The default value is ns/10.

  14. The -ntjump= argument specifies the skipping value for trend smoothing. The default value is nt/10.

  15. The -nljump= argument specifies the skipping value for the low-pass filter. The default value is nl/10.

  16. The -ni= argument specifies the number of loops for updating the seasonal and trend components. The value of ni should be a positive integer. If the data are well behaved without outliers, then robustness iterations are not needed. In this case do not use the -robust argument, and set ni between 2 and 5 depending on how much security you want that the seasonal-trend looping converges. If outliers are present then use the -robust argument and set ni to 1 or 2.

  17. The -a= argument specifies if the residuals from the trend and seasonal components are stored in the output NetCDF file. If:

    • -a=stl means that the residuals are not computed
    • -a=residual means that the residuals are computed and stored
    • -a=residual_trend means that the residual and trend components are summed and the result is stored.

    The default is -a=stl , e.g. the residuals are not stored. Note that in all cases, the seasonal and trend components are computed and stored in the output NetCDF file.

  18. If -robust is specified, robustness iterations are carried out until convergence of both seasonal and trend components or with a maximum of max_robustness_iterations iterations as specified by the -maxit= argument. Convergence occurs if the maximum changes in individual seasonal and trend fits are less than 1% of the component’s range after the previous iteration.

  19. -sms=seasonal_smoothing_factor means that the seasonal component extracted from the netcdf_variable (e.g. the -v= argument) must be smoothed with a moving average of approximately 2*seasonal_smoothing_factor+1 terms. The smoothing is applied before estimating the residuals (e.g. if -a= argument is set to residual or residual_trend ). seasonal_smoothing_factor must be a strictly positive integer (e.g. greater than 0). Note that this additional smoothing is not implemented in the original STL procedure.

  20. -smt=trend_smoothing_factor means that the trend component extracted from the netcdf_variable (e.g. the -v= argument) must be smoothed with a moving average of approximately 2*trend_smoothing_factor+1 terms. The smoothing is applied before estimating the residuals (e.g. if -a= argument is set to residual or residual_trend ). trend_smoothing_factor must be a strictly positive integer (e.g. greater than 0). Note that this additional smoothing is not implemented in the original STL procedure.

  21. The -mi=missing_value argument specifies the missing value indicator for the output variables in the output_netcdf_file. If the -mi= argument is not specified, the missing_value is set to 1.e+20.

  22. The -double argument specifies that, the results are stored as double-precision floating point numbers in the output NetCDF file. By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  23. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g. -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher. If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

  24. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  25. It is assumed that the data has no missing values excepted those associated with a constant land-sea mask.

  26. It the time series have no seasonal cycle, use comp_trend_3d instead of comp_stl_3d.

  27. Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  28. For more details on the STL procedure and examples of use in the climate literature, see

    • “A Seasonal-Trend Decomposition Procedure Based on Loess”, by Cleveland, R.B., Cleveland, W.S., McRae, J.E., and Terpennings, I., Journal of Official Statistics, 6, 3-73, 1990. http://www.jos.nu/Articles/abstract.asp?article=613
    • “Southern Hemisphere extra-tropical forcing: A new paradigm for El Nino-Southern Oscillation” by Terray, P., Climate Dynamics, Vol. 36, 2171-2199, 2011. doi: 10.1007/s00382-010-0825-z

Outputs

comp_stl_3d creates an output NetCDF file that contains the trend and harmonic components extracted from the time series associated with the input NetCDF variable. The output NetCDF data set contains the following NetCDF variables (in the description below, nlat and nlon are the length of the spatial dimensions of the input NetCDF variable and ntime is the selected number of time observations) :

  1. netcdf_variable_trend(ntime,nlat,nlon) : the trend component for each of the time series of the 2-D grid-mesh associated with the input NetCDF variable.

  2. netcdf_variable_seasonal(ntime,nlat,nlon) : the harmonic component for each of the time series of the 2-D grid-mesh associated with the input NetCDF variable.

  3. netcdf_variable_residual(ntime,nlat,nlon) : the residual component for each of the time series of the 2-D grid-mesh associated with the input NetCDF variable.

    This variable is stored only if the -a=residual argument has been specified when calling comp_stl_3d.

  4. netcdf_variable_residual_trend(ntime,nlat,nlon) : the sum of the residual and trend components for each of the time series of the 2-D grid-mesh associated with the input NetCDF variable.

    This variable is stored only if the -a=residual_trend argument has been specified when calling comp_stl_3d.

The trend, harmonic and residual components are packed in tridimensional variables whose first and second dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variables are filled with missing values.

Examples

  1. For computing a STL decomposition from the tridimensional NetCDF variable called sosstsst extracted from the file ST7_1m_00101_20012_sosstsst_grid_T.nc, which includes monthly time series, and store the results in the NetCDF file stl_ST7_1m_00101_20012_sosstsst_grid_T.nc, use the following command :

    $ comp_stl_3d \
      -f=ST7_1m_00101_20012_sosstsst_grid_T.nc \
      -v=sosstsst \
      -p=12 \
      -ns=35 \
      -nt=127 \
      -a=residual \
      -robust \
      -o=stl_ST7_1m_00101_20012_sosstsst_grid_T.nc