comp_stl_4d¶
Authors¶
Pascal Terray (LOCEAN/IPSL)
Latest revision¶
07/05/2021
Purpose¶
Decompose time series of a fourdimensional variable extracted from a NetCDF dataset into seasonal and trend components by the Seasonal-Trend decomposition procedure based on Loess (STL). The STL procedure is a powerful statistical technique for describing a discrete time series [Cleveland_etal] [Terrayc]. In the STL procedure, the analyzed multi-channel time series is decomposed into three terms:
X(t) = T(t) + A(t) + R(t)
where t refers to a time index, the T term is used to quantify the trend and low-frequency variations in the time series, the A term describes the harmonic component (e.g. diurnal or seasonal cycle) and its modulation through time and, finally, the R term contains the residual component.
All the terms are estimated through a sequence of applications of locally weighted regression or low-order polynomial (e.g. Loess) to data windows whose length is chosen by the user. The STL procedure is an iterative process, which may be interpreted as a frequency filter directly applicable to non-stationary (unidimensional) time series including harmonic components [Cleveland_etal]. Other important features of STL are the specification of the amounts of seasonal and trend smoothing according to the choice of the user, the ability to produce robust estimates of the trend and harmonic components that are not distorted by aberrant behavior in the data and the stationarity of the R time series.
This procedure returns the seasonal and trend components as estimated by the STL procedure and, optionally, the residuals or the sum of the residual and trend components in a NetCDF dataset, for the multi-channel time series associated with a NetCDF variable.
If the NetCDF variable is unidimensional or tridimensional use comp_stl_1d or comp_stl_3d, respectively instead of comp_stl_4d. It the time series have no seasonal (or diurnal) cycle, use comp_trend_4d instead of comp_stl_4d in order to estimate the trend component by the Loess procedure.
The exact meaning and default values for most of the optional parameters of comp_stl_4d are exactly the same as in the original procedure described by [Cleveland_etal] and the user is referred to this publication for further details on the STL procedure.
This procedure is parallelized if OpenMP is used.
Further Details¶
Usage¶
$ comp_stl_4d \
-f=input_netcdf_file \
-v=input_netcdf_variable \
-p=periodicity \
-ns=seasonal_smoother_length (ns) \
-m=input_mesh_mask_netcdf_file (optional) \
-g=grid_type (optional : n, t, u, v, w, f) \
-x=lon1,lon2 (optional) \
-y=lat1,lat2 (optional) \
-z=level1,level2 (optional) \
-t=time1,time2 (optional) \
-a=type_of_analysis (optional : stl, residual, \
residual_trend) \
-o=output_netcdf_file (optional) \
-nt=trend_smoother_length (nt) (optional) \
-nl=low_pass_smoother_length (nl) (optional) \
-isdeg=seasonal_smoother_degree (isdeg) (optional : 0, 1 ) \
-itdeg=trend_smoother_degree (itdeg) (optional : 0, 1, 2 ) \
-ildeg=low_pass_smoother_degree (ildeg) (optional : 0, 1, 2 ) \
-nsjump=seasonal_skipping_value (nsjump) (optional) \
-ntjump=trend_skipping_value (ntjump) (optional) \
-nljump=low_pass_skipping_value (nljump) (optional) \
-maxit=max_robustness_iterations (maxit) (optional) \
-ni=number_of_loops (ni) (optional) \
-sms=seasonal_smoothing_factor (optional) \
-smt=trend_smoothing_factor (optional) \
-robust (optional) \
-mi=missing_value (optional) \
-double (optional) \
-hdf5 (optional) \
-bigfile (optional) \
-tlimited (optional)
By default¶
- -m=
- an input_mesh_mask_netcdf_file is not used
- -g=
- the grid_type is set to
n
which means that the 2-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian- -x=
- the whole longitude domain associated with the netcdf_variable
- -y=
- the whole latitude domain associated with the netcdf_variable
- -z=
- the whole vertical resolution associated with the netcdf_variable
- -t=
- the whole time period associated with the netcdf_variable
- -a=
- the type_of_analysis is set to
stl
. This means that the residual times series are not computed and not stored in the output_netcdf_file- -o=
- the output_netcdf_file is named
stl_
netcdf_variable.nc
- -nt=
- the trend_smoother_length is set to the smallest odd integer greater than or equal to
(1.5*periodicity) / (1-(1.5/ns))
- -nl=
- the low_pass_smoother_length is set to the smallest odd integer greater than or equal to periodicity
- -isdeg=
- the seasonal_smoother_degree is set to
1
- -itdeg=
- the trend_smoother_degree is set to
1
- -ildeg=
- the low_pass_smoother_degree is set to the trend_smoother_degree
- -nsjump=
- the seasonal_skipping_value is set to
ns/10
- -ntjump=
- the trend_skipping_value is set to
nt/10
- -nljump=
- the low_pass_skipping_value is set to
nl/10
- -maxit=
- the max_robustness_iterations is set to
15
- -ni=
- the number_of_loops is set to
1
if -robust is specified and2
otherwise- -sms=
- no smoothing is applied to the seasonal component
- -smt=
- no smoothing is applied to the trend component
- -robust
- robustness iterations are not used, by default.
- -mi=
- the missing_value for the output variables is equal to
1.e+20
- -double
- the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
- -bigfile
- a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
- -hdf5
- a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
- -tlimited
- the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file
Remarks¶
The -v=netcdf_variable argument specifies the NetCDF variable from which the multi-channel time series must be decomposed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.
If the -x=lon1,lon2, -y=lat1,lat2 and -z=level1,level2 arguments are missing, the whole geographical domain and vertical resolution associated with the netcdf_variable is used to select the multi-channel time series.
The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to
1
. Negative values are allowed for lon1. In this case the longitude domain is fromnlon
+lon1+1
to lon2 wherenlon
is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.Refer to comp_mask_4d for transforming geographical coordinates as indices before using comp_stl_4d.
If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used to decompose the time series.
The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to
1
. Note that the output NetCDF file will haventime
= time2 - time1 +1
time observations.If -g= is set to
t
,u
,v
,w
orf
it is assumed that the NetCDF variable is from an experiment with the NEMO or ORCA model.If -g= is set to
n
, it is assumed that the 3-D grid-mesh is regular or Gaussian and as such has no duplicate points.This argument is also used to determine the name of the NetCDF mesh_mask variable if an input_mesh_mask_netcdf_file is used as specified with the -m= argument
The geographical shapes of the netcdf_variable (in the input_netcdf_file), the mask (in the input_mesh_mask_netcdf_file) must agree if the -m= argument is used.
The -p=periodicity argument gives the periodicity of the input data. For example, with monthly data -p=
12
should be specified. The periodicity must be greater than2
. If your multi-channel time series do not have a periodic component, e.g. if periodicity is equal to1
, use comp_trend_4d instead of comp_stl_4d.The -ns= argument specifies the length of the seasonal smoother,
ns
. The value ofns
should be an odd integer greater than or equal to3
; a value greater than6
is recommended. As the value of the -ns= argument increases the values of the seasonal component at a given point in the seasonal cycle (e.g., January values of a monthly series with a yearly cycle) become smoother. However, the value ofns
has no direct effect on the smoothness of successive values of the seasonal component.The -nt= argument specifies the length of the trend smoother,
nt
. The value ofnt
should be an odd integer greater than or equal to3
; a value ofnt
between1.5
*periodicity and2
*periodicity is recommended. Asnt
increases the values of the trend component become smoother.The -nl= argument specifies the length of the low-pass smoother,
nl
. The value ofnl
should be an odd integer greater than or equal to3
; the smallest odd integer greater than or equal to periodicity is recommended.The -isdeg= argument specifies the degree of locally-fitted polynomial in seasonal smoothing. The value is
0
or1
.The -itdeg= argument specifies the degree of locally-fitted polynomial in trend smoothing. The value is
0
,1
or2
.The -ildeg= argument specifies the degree of locally-fitted polynomial in low-pass smoothing. The value is
0
,1
or2
.The -nsjump= argument specifies the skipping value for seasonal smoothing. The seasonal smoother skips ahead
nsjump
points and then linearly interpolates in between. The value ofnsjump
should be a positive integer; ifnsjump
is set to1
, a seasonal smooth is calculated at all points in the time series. To make the procedure run faster, a reasonable choice fornsjump
is10
% or20
% ofns
. The default value isns/10
.The -ntjump= argument specifies the skipping value for trend smoothing. The default value is
nt/10
.The -nljump= argument specifies the skipping value for the low-pass filter. The default value is
nl/10
.The -ni= argument specifies the number of loops for updating the seasonal and trend components. The value of
ni
should be a positive integer. If the data are well behaved without outliers, then robustness iterations are not needed. In this case do not use the -robust argument, and setni
between2
and5
depending on how much security you want that the seasonal-trend looping converges. If outliers are present then use the -robust argument and setni
to1
or2
.The -a= argument specifies if the residuals from the trend and seasonal components are stored in the output NetCDF file. If:
- -a=
stl
means that the residuals are not computed- -a=
residual
means that the residuals are computed and stored- -a=
residual_trend
means that the residual and trend components are summed and the result is stored.The default is -a=
stl
, e.g. the residuals are not stored. Note that in all cases, the seasonal and trend components are computed and stored in the output NetCDF file.If -robust is specified, robustness iterations are carried out until convergence of both seasonal and trend components or with a maximum of max_robustness_iterations iterations as specified by the -maxit= argument. Convergence occurs if the maximum changes in individual seasonal and trend fits are less than
1
% of the component’s range after the previous iteration.-sms=seasonal_smoothing_factor means that the seasonal component extracted from the netcdf_variable (e.g. the -v= argument) must be smoothed with a moving average of approximately
2
*seasonal_smoothing_factor+1
terms. The smoothing is applied before estimating the residuals (e.g. if -a= argument is set toresidual
orresidual_trend
). seasonal_smoothing_factor must be a strictly positive integer (e.g. greater than0
). Note that this additional smoothing is not implemented in the original STL procedure.-smt=trend_smoothing_factor means that the trend component extracted from the netcdf_variable (e.g. the -v= argument) must be smoothed with a moving average of approximately
2
*trend_smoothing_factor+1
terms. The smoothing is applied before estimating the residuals (e.g. if -a= argument is set toresidual
orresidual_trend
). trend_smoothing_factor must be a strictly positive integer (e.g. greater than0
). Note that this additional smoothing is not implemented in the original STL procedure.The -mi=missing_value argument specifies the missing value indicator for the output variables in the output_netcdf_file. If the -mi= argument is not specified, the missing_value is set to
1.e+20
.The -double argument specifies that, the results are stored as double-precision floating point numbers in the output NetCDF file. By default, the results are stored as single-precision floating point numbers in the output NetCDF file.
The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g.
-D_USE_NETCDF36
or-D_USE_NETCDF4
) and linked to the NetCDF 3.6 library or higher. If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g.
-D_USE_NETCDF4
) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.It is assumed that the data has no missing values excepted those associated with a constant land-sea mask.
It the time series have no seasonal cycle, use comp_trend_4d instead of comp_stl_4d.
Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.
For more details on the STL procedure and examples of use in the climate literature, see
- “A Seasonal-Trend Decomposition Procedure Based on Loess”, by Cleveland, R.B., Cleveland, W.S., McRae, J.E., and Terpennings, I., Journal of Official Statistics, 6, 3-73, 1990. http://www.jos.nu/Articles/abstract.asp?article=613
- “Southern Hemisphere extra-tropical forcing: A new paradigm for El Nino-Southern Oscillation” by Terray, P., Climate Dynamics, Vol. 36, 2171-2199, 2011. doi: 10.1007/s00382-010-0825-z
Outputs¶
comp_stl_4d creates an output NetCDF file that contains the trend and harmonic components extracted from the time series associated with the input NetCDF variable. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlev, nlat and nlon are the length of the vertical and spatial dimensions of the input NetCDF variable and ntime is the selected number of time observations) :
netcdf_variable_trend
(ntime,nlev,nlat,nlon)
: the trend component for each of the time series of the 3-D grid-mesh associated with the input NetCDF variable.netcdf_variable_seasonal
(ntime,nlev,nlat,nlon)
: the harmonic component for each of the time series of the 3-D grid-mesh associated with the input NetCDF variable.netcdf_variable_residual
(ntime,nlev,nlat,nlon)
: the residual component for each of the time series of the 3-D grid-mesh associated with the input NetCDF variable.This variable is stored only if the -a=
residual
argument has been specified when calling comp_stl_4d.netcdf_variable_residual_trend
(ntime,nlev,nlat,nlon)
: the sum of the residual and trend components for each of the time series of the 3-D grid-mesh associated with the input NetCDF variable.This variable is stored only if the -a=
residual_trend
argument has been specified when calling comp_stl_4d.The trend, harmonic and residual components are packed in fourdimensional variables whose first, second and third dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x=, -y= and -z= arguments. However, outside the selected domain, the output NetCDF variables are filled with missing values.
Examples¶
For computing a STL decomposition from the fourdimensional NetCDF variable called
votemper
extracted from the fileST7_1m_00101_20012_votemper_grid_T.nc
, which includes monthly time series, and store the results in the NetCDF filestl_ST7_1m_00101_20012_votemper_grid_T.nc
, use the following commands :$ comp_stl_4d \ -f=ST7_1m_00101_20012_votemper_grid_T.nc \ -v=votemper \ -p=12 \ -ns=35 \ -nt=30 \ -o=stl_ST7_1m_00101_20012_votemper_grid_T.nc