comp_trend_3d¶
Authors¶
Pascal Terray (LOCEAN/IPSL)
Latest revision¶
05/05/2021
Purpose¶
Extract a trend component from a tridimensional variable in a NetCDF dataset by using a LOESS smoother [Cleveland] [Cleveland_Devlin]. The LOESS procedure is a powerful statistical technique for describing a discrete time series (see the references in the Remarks Section below). In the LOESS procedure, the analyzed multi-channel time series is decomposed into two terms:
X(t) = T(t) + R(t)
where t refers to a time index, the T term is used to quantify the trend and low-frequency variations in the time series, and, finally, the R term contains the residual component.
The trend is estimated through a sequence of applications of locally weighted regression or low-order polynomial (e.g. Loess) to data windows whose length is chosen by the user. Other important features of the LOESS method are the specification of the amount of trend smoothing according to the choice of the user, the ability to produce robust estimates of the trend component that are not distorted by aberrant behavior in the data and the stationarity of the R time series.
This procedure returns the trend component as estimated by the LOESS procedure and, optionally, the residuals in a NetCDF dataset, for the multi-channel time series associated with a NetCDF variable.
If the NetCDF variable is unidimensional or fourdimensional use comp_trend_1d or comp_trend_4d, respectively instead of comp_trend_3d. It the time series have a seasonal (or diurnal) cycle, use comp_stl_3d instead of comp_trend_3d in order to estimate the trend component by the Loess procedure.
The exact meaning and default values for most of the optional parameters of comp_trend_3d are exactly the same as in the original (STL) procedure described by [Cleveland_etal] and the user is referred to this publication for further details on the LOESS procedure as implemented here.
This procedure is parallelized if OpenMP is used.
Further Details¶
Usage¶
$ comp_trend_3d \
-f=input_netcdf_file \
-v=input_netcdf_variable \
-nt=trend_smoother_length (nt) \
-m=input_mesh_mask_netcdf_file (optional) \
-g=grid_type (optional : n, t, u, v, w, f) \
-x=lon1,lon2 (optional) \
-y=lat1,lat2 (optional) \
-t=time1,time2 (optional) \
-a=type_of_analysis (optional : trend, residual) \
-o=output_netcdf_file (optional) \
-itdeg=trend_smoother_degree (itdeg) (optional : 0, 1, 2 ) \
-ntjump=trend_skipping_value (ntjump) (optional) \
-maxit=max_robustness_iterations (maxit) (optional) \
-smt=trend_smoothing_factor (optional) \
-robust (optional) \
-mi=missing_value (optional) \
-double (optional) \
-hdf5 (optional) \
-bigfile (optional) \
-tlimited (optional)
By default¶
- -m=
- an input_mesh_mask_netcdf_file is not used
- -g=
- the grid_type is set to
n
which means that the 2-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian- -x=
- the whole longitude domain associated with the netcdf_variable
- -y=
- the whole latitude domain associated with the netcdf_variable
- -t=
- the whole time period associated with the netcdf_variable
- -a=
- the type_of_analysis is set to
trend
. This means that the residual times series are not computed and not stored in the output_netcdf_file- -o=
- the output_netcdf_file is named
trend_
netcdf_variable.nc
- -itdeg=
- the trend_smoother_degree is set to
1
- -ntjump=
- the trend_skipping_value is set to
nt/10
wherent
is the value of the trend_smoother_length argument- -maxit=
- the max_robustness_iterations is set to
15
- -smt=
- no smoothing is applied to the trend component
- -robust
- robustness iterations are not used, by default
- -mi=
- the missing_value for the output variables is equal to
1.e+20
- -double
- the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
- -bigfile
- a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
- -hdf5
- a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
- -tlimited
- the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file
Remarks¶
The -v=netcdf_variable argument specifies the NetCDF variable from which the multi-channel time series must be decomposed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.
If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the whole geographical domain associated with the netcdf_variable is used to select the multi-channel time series.
The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to
1
. Negative values are allowed for lon1. In this case the longitude domain is fromnlon
+lon1+1
to lon2 wherenlon
is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.Refer to comp_mask_3d for transforming geographical coordinates as indices before using comp_trend_3d.
If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used to decompose the time series.
The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to
1
. Note that the output NetCDF file will haventime
= time2 - time1 +1
time observations.The geographical shapes of the netcdf_variable (in the input_netcdf_file) and the mask (in the input_mesh_mask_netcdf_file) must agree if the -m= argument is used.
If -g= is set to
t
,u
,v
,w
orf
it is assumed that the NetCDF variable is from an experiment with the NEMO or ORCA model.If -g= is set to
n
, it is assumed that the 2-D grid-mesh is regular or Gaussian and as such has no duplicate points.This argument is also used to determine the name of the NetCDF mesh_mask variable if an input_mesh_mask_netcdf_file is used as specified with the -m= argument.
The -nt= argument specifies the length of the trend smoother,
nt
. The value ofnt
should be an odd integer greater than or equal to3
. Asnt
increases the values of the trend component become smoother.The -itdeg= argument specifies the degree of the locally-fitted polynomial in trend smoothing. The value is
0
,1
or2
.The -ntjump= argument specifies the skipping value for trend smoothing. The trend smoother skips ahead
ntjump
points and then linearly interpolates in between. The value ofntjump
should be a positive integer; ifntjump
is set to1
, a trend smooth is calculated at all points in the time series. To make the procedure run faster, a reasonable choice forntjump
is10
% or20
% ofnt
.The -a= argument specifies if the residuals from the trend component are stored in the output NetCDF file. If:
- -a=
trend
, the residuals are not computed- -a=
residual
, the residuals are computed and stored.The default is -a=
trend
, e.g. the residuals are not stored. Note that in all cases, the trend component is computed and stored in the output NetCDF file.If -robust is specified, robustness iterations are carried out until convergence of the trend component or with a maximum of
maxit
iterations as specified by the -maxit= argument. Convergence occurs if the maximum changes in individual trend fits are less than1
% of the component’s range after the previous iteration.-smt=trend_smoothing_factor means that the trend component extracted from the netcdf_variable (e.g. the -v= argument) must be smoothed with a moving average of approximately
2
*trend_smoothing_factor+1
terms. The smoothing is applied before estimating the residuals (e.g. if -a= argument is set toresidual
). trend_smoothing_factor must be a strictly positive integer (e.g. greater than0
). Note that this additional smoothing is not implemented in the original LOESS procedure.The -mi=missing_value argument specifies the missing value attribute for the output variables in the output_netcdf_file. If the -mi= argument is not specified, the missing_value is set to
1.e+20
.The -double argument specifies that, the results are stored as double-precision floating point numbers in the output NetCDF file. By default, the results are stored as single-precision floating point numbers in the output NetCDF file.
The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g.
-D_USE_NETCDF36
or-D_USE_NETCDF4
) and linked to the NetCDF 3.6 library or higher. If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g.
-D_USE_NETCDF4
) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.It is assumed that the data has no missing values excepted those associated with a constant land-sea mask.
It the time series have a seasonal or diurnal cycle, use comp_stl_3d instead of comp_trend_3d.
Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.
For more details on the LOESS procedure, see
- “Robust Locally Weighted Regression and Smoothing Scatterplots”, by Cleveland, W.S., Journal of the American Statistical Association, Vol. 74, 829-836, 1979. doi: 10.1080/01621459.1979.10481038
- “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting”, by Cleveland, W.S., and Devlin, S.J., Journal of the American Statistical Association, Vol. 83, 596-610, 1988. doi: 10.1080/01621459.1988.10478639
- “A Seasonal-Trend Decomposition Procedure Based on Loess”, by Cleveland, R.B., Cleveland, W.S., McRae, J.E., and Terpennings, I., Journal of Official Statistics, 6, 3-73, 1990. http://www.jos.nu/Articles/abstract.asp?article=613
Outputs¶
comp_trend_3d creates an output NetCDF file that contains the trend and residual components extracted from the time series associated with the input NetCDF variable. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat and nlon are the length of the spatial dimensions of the input NetCDF variable and ntime is the selected number of time observations) :
netcdf_variable_trend
(ntime,nlat,nlon)
: the trend component for each of the time series of the 2-D grid-mesh associated with the input NetCDF variable.netcdf_variable_residual
(ntime,nlat,nlon)
: the residual component for each of the time series of the 2-D grid-mesh associated with the input NetCDF variable.This variable is stored only if the -a=
residual
argument has been specified when calling comp_trend_3d.The trend and residual components are packed in tridimensional variables whose first and second dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variables are filled with missing values.
Examples¶
For estimating a trend from the tridimensional NetCDF variable called
sosstsst
extracted from the fileST7_1m_00101_20012_sosstsst_ano_grid_T.nc
, which includes monthly anomalies time series, and store the results in the NetCDF filestl_ST7_1m_00101_20012_sosstsst_ano_grid_T.nc
, use the following command :$ comp_trend_3d \ -f=ST7_1m_00101_20012_sosstsst_ano_grid_T.nc \ -v=sosstsst \ -nt=127 \ -a=residual \ -robust \ -o=stl_ST7_1m_00101_20012_sosstsst_ano_grid_T.nc