comp_spectvar_1d¶
Authors¶
Pascal Terray (LOCEAN/IPSL)
Latest revision¶
28/05/2024
Purpose¶
Compute variance estimates in a selected frequency band for a real time series by windowed filtering in the frequency domain. These variance estimates correspond exactly to the variance of the filtered time series output by comp_fftfilter_1d when similar parameters are used in both procedures. Alternatively, comp_spectvar_1d can also compute spectral variance estimates in a given frequency band from a spectrum analysis of a real time series as output by comp_spectrum_1d or comp_cross_spectrum_1d. The input time series is extracted from a uni- or bidimensional variable readed from a NetCDF dataset.
At the user option, the time series can be detrended before computing the power spectrum and the variance estimate in the selected frequency band (see the -tr= and -tr2= arguments). The variance estimate and power spectrum are computed with the help of the Fast Fourier Transform (FFT) of the real time series and windowed filtering in the frequency domain.
The user can select different data windows to apply to the time series for the computations of the Power Spectral Density (PSD) estimates (see the -win2= argument) and also different frequency windows to convolve with the selected filter response in the frequency domain (see the -win= argument) in order to get more accurate variance estimates.
Furthermore, in order to obtain stable PSD and variance estimates, the real time series can be segmented and PSD estimates computed on, eventually overlapping, segments can be averaged at the user option (see the -sl= and -nooverlap= arguments). The stability of the PSD and final variance estimates is increased by this averaging process. That is, the greater the number of segments, the more stable, the resulting PSD and final variance estimates. See the documentation of comp_spectrum_1d or comp_cross_spectrum_1d for more details on this method known as the Welch’s method [Welch] .
Optionally, theses PSD estimates may also been smoothed in the frequency domain by Daniell weights (e.g., a simple moving average; see the -nsm= argument) as in a conventional spectrum analysis.
By default, the variance estimate is returned in units, which are the square of the data, and, at the user option, as percentage of the total variance in the selected frequency band if the -normvar argument is specified in the call to the procedure.
Finally, the variances estimates can be computed separately for different times intervals (see the -p= argument).
The variance estimates are stored in an output NetCDF dataset.
For more information on spectral analysis and filtering in the frequency domain, see the references below [Welch] [Bloomfield] [Diggle] [Iacobucci_Noullez] .
Further Details¶
Usage¶
$ comp_spectvar_1d \
-f=input_netcdf_file \
-v=netcdf_variable \
-t=time1,time2 (optional) \
-o=output_netcdf_file (optional) \
-ni=index_for_2d_netcdf_variable (optional) \
-p=period_length (optional) \
-sl=segment_length (optional : 16,32,64,128,...) \
-pl=minimum_period (optional) \
-ph=maximum_period (optional) \
-tr=trend_removal (optional : 0,1,2,3) \
-tr2=trend_removal_on_segment (optional : 0,1,2,3) \
-win=window_parameter (optional : 0.5 > 1.) \
-win2=window_choice (optional : 1,2,3,4,5,6) \
-tap=tapered_percentage (optional : 0.0 > 1.) \
-nsm=Daniell_filter_length (optional) \
-mi=missing_value (optional) \
-nooverlap (optional) \
-normvar (optional) \
-double (optional) \
-hdf5 (optional) \
-tlimited (optional)
By default¶
- -t=
- the whole time period associated with the netcdf_variable
- -o=
- the output_netcdf_file is named
spectvar_
netcdf_variable.nc
- -ni=
- if the netcdf_variable is bidimensional, the first time series is used
- -p=
- the period_length is set to
time2 - time1 + 1
, which means that the time series is considered as continuous with only one time period- -sl=
- the segment_length is set to
0
, which means that the time series (or each time period if the -p= argument is used) is not segmented to stabilize the the PSD and final variance estimates- -pl=
- the minimum_period is set to
0
, which means that no filtering is done for the shorter periods- -ph=
- the maximum_period is set to
0
, which means that no filtering is done for the longer periods- -tr=
- the trend_removal is set to
1
, which means that the mean of the whole time series is removed before the spectral computations- -tr2=
- the trend_removal_on_segment is set to
0
, which means that no trend processing is done on each time segment before the spectral computations- -win=
- the window_parameter is set to
1.
, which means that the rectangular window is convolved with the ideal filter response in the frequency domain in order to compute the final variance estimates- -win2=
- the window_choice is set to
2
, which means that the square window is applied to the time series and used in the spectral computations- -tap=
- the tapered_percentage is set to
0.2
, which means that if a split-cosine-bell window is used for the spectral computations (e.g., if the window_choice is set to6
with the -win2= argument),20
% of the data are tapered- -nsm=
- the Daniell_filter_length is set to
0
, which means that the PSD estimates are not smoothed in the frequency domain by a moving average before computing the variance estimates in the selected frequency band- -mi=
- the missing_value for the output NetCDF variable is set to
1.e+20
- -nooverlap
- normally, if the -sl= argument is used, the time segments are overlapped. However, if -nooverlap is specified, the time segments are not overlapped
- -normvar
- normally, the raw variance in the selected frequency band is output. However, if -normvar is specified, the percentage of total variance in the selected frequency band for the real multichannel time series is also computed and stored in the output NetCDF file.
- -double
- the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
- -hdf5
- a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
- -tlimited
- the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file
Remarks¶
The -v=netcdf_variable argument specifies the NetCDF variable for which a frequency variance analysis must be performed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.
If the -t=time1,time2 argument is missing the whole time period associated with the netcdf_variable is used to compute the spectral variance estimates in the selected frequency band.
The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to
1
.time2 - time1 +
1
must be an even integer if the -sl= and -p= arguments are not used and of any length if one of these arguments is used.The -ni=index_for_2d_netcdf_variable argument specifies the index for selecting the time series if the netcdf_variable is a 2D NetCDF variable. By default, the first time series is used, which is equivalent to set index_for_2d_netcdf_variable to
1
.If the -p= argument is specified, the frequency variance estimates are computed separately for each time period of length period_length (as determined by the value of the -p= argument).
The whole selected time period (e.g., time2 - time1 +
1
) must be a multiple of the period_length. The period_length must also be of even length if the -sl= argument is not used.The -sl=segment_length argument specifies an integer used to segment the time series. The segment_length must be a positive even integer less or equal to the length of the time series (e.g., time2 - time1 +
1
) or the period_length if the -p= argument is used.Spectral computations are at ( segment_length/
2
) +1
Fourier frequencies and suggested values for segment_length are16
,32
,64
,128
, … (e.g., the spectral estimates are taken at frequencies(i-1)
/ segment_length for i=1,2, … , (segment_length/2
) +1
.By default, segment_length is set to
0
, e.g., the time series is not segmented and the segment length is equal to time2 - time1 +1
or period_length if the -p= argument is used.The -pl= argument specifies the minimum period of oscillation of the filtered time series. The minimum_period is expressed in number of time observations.
Do not use the -pl= argument or use -pl=
0
for high-pass filtering frequencies corresponding to periods shorter than -ph=PH
.The -pl= argument is a positive integer equal to
0
or greater than2
.The -ph= argument specifies the maximum period of oscillation of the filtered time series. The maximum_period is expressed in number of time observations. Do not use the -ph= argument or use -ph=
0
for low-pass filtering frequencies corresponding to periods longer than -pl=PL
. For example, -pl=6
(or18
) and -ph=32
(or96
) select periods between1.5
and8
years for quarterly (monthly) time series.The -ph= argument is a positive integer equal to
0
or greater than2
and less than the length of the time series or the period_length if the -p= argument is used.The -ph= argument must also be greater or equal to the -pl= argument if both are specified.
Setting -pl=
PL
, -ph=PH
andPH``<``PL
is also allowed and performs band rejection of periods betweenPH
andPL
. In that case, the meaning of the -pl= and -ph= arguments reversed.Setting -pl= and -ph= to the same value
P
is allowed. In this case, an -ideal- band-pass filter with peak response near one at the single periodP
is applied to the time series to get the variance estimates at this particular fequency.Setting both -pl=
0
and -ph=0
is not allowed since in that case, no frequency filtering is done.The -tr= argument specifies pre-filtering processing of the real time series before estimating the power spectrum of the time series. If:
- -tr=
+1
, the mean of the time series is removed before the frequency analysis- -tr=
+2
, the drift from the time series is removed before the frequency analysis. The drift for the time series is estimated using the formula :drift = ( tseries(ntime) - tseries(1) )/( ntime - 1 )
- -tr=
+3
, the least-squares line from the time series is removed before the frequency analysis.For other values of the -tr= argument, nothing is done before estimating the spectrum.
If the -p= argument is present, the pre-filtering processing is applied to each time period, separately.
The -tr= argument must be an integer and the default value for the -tr= argument is
0
, e.g., nothing is done before the spectrum computations.The -tr2= argument specifies pre-filtering processing of the real time series before estimating the power spectrum on each time segment (as specified by the -sl= argument). If:
- -tr2=
+1
, the mean of the time segment is removed before computing the power spectrum on the segment- -tr2=
+2
, the drift from the time segment is removed before computing the power spectrum on the segment- -tr2=
+3
, the least-squares line from the time segment is removed before computing the power spectrum on the segment.For other values of the -tr2= argument, nothing is done before estimating the spectrum on each time segment.
The -tr2= argument must be an integer and the default value for the -tr2= argument is
0
, e.g., nothing is done before the spectral computations on each time segment.The -win=window_parameter argument specifies the form of the window which will be convolved with the ideal filter response as in the comp_fftfilter_1d procedure.
Set -win=
0.5
for using a Hanning window, -win=0.54
for using a Hamming window or -win=1.
for rectangular window.The -win=window_parameter argument is a real number greater or equal to
O.5
and less or equal to1.
.By default, rectangular window filtering is used (i.e., -win=
1.
).The -win2=window_choice argument specifies the form of the data window used in the computations of the power spectrum as the -win=window_choice argument in the comp_spectrum_1d procedure. If:
- -win2=
1
The Bartlett window is used- -win2=
2
The square window is used- -win2=
3
The Welch window is used- -win2=
4
The Hann window is used- -win2=
5
The Hamming window is used- -win2=
6
A split-cosine-bell window is used.The window_choice must be an integer between
1
and6
.The default value for the window_choice is
2
, e.g., a square window window is used for the spectral computations.The -tap=tapered_percentage argument specifies the total percentage of the data to be tapered if the window_choice is set to
6
with the -win2= argument. The tapered_percentage is a real number which must be greater than0.
and less or equal to1.
.The default value for the tapered_percentage is
0.2
(e.g.,20
% of the data are tapered,10
% at the start of the series and10
% at the end).The -nsm=Daniell_filter_length argument specifies that the PSD estimates must be computed by smoothing the periodogram with Daniell weights (e.g., a simple moving average) as in the comp_spectrum_1d procedure.
On entry, the -nsm= argument gives the length of the Daniell filter to be applied. The Daniell_filter_length must be an odd integer, greater than
2
and less or equal to ( segment_length/2
) +1
if the -sl= argument is specified.By default, the PSD estimates are not smoothed in the frequency domain.
If the -nooverlap argument is specified and the -sl= argument is also specified, comp_spectvar_1d segments the time series without any overlapping (in each time period if the -p= argument is used).
By default, comp_spectvar_1d overlaps the segments by one half of their length (which is equal to the value of the -sl= argument).
In both cases, each segment will be FFT’d, and the resulting periodograms will be averaged together to obtain PSD estimates at the ( segment_length/
2
) +1
frequencies.If the -sl= argument is not specified, the -nooverlap argument has no effect.
- The -normvar argument specifies that the percentage of variance in the selected frequency
band must be also computed and output in addition of the raw variance estimates.
The -mi=missing_value argument specifies the missing value indicator for the output variables in the output_netcdf_file.
If the -mi= argument is not specified, the missing_value is set to
1.e+20
.The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file.
By default, the results are stored as single-precision floating point numbers in the output NetCDF file.
The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g.,
-D_USE_NETCDF4
) and linked to the NetCDF 4 library or higher.If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.
It is assumed that the data has no missing values.
Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.
For more details on power spectrum analysis and on windowed filtering in the frequency domain, see
- “The use of Fast Fourier Transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms”, by P.D. Welch, IEEE trans. on audio and electroacoustics, 15:2, 70-73, 1967. DOI: https://doi.org/10.1109/TAU.1967.1161901
- “Fourier analysis of time series- An introduction”, by Bloomfield, P., John Wiley and Sons, New York, Chapter 6, 1976. http://as.wiley.com/WileyCDA/WileyTitle/productCd-0471889482.html
- “Time series: a biostatistical introduction”, by P.J. Diggle, Clarendon Press, Oxford, Chap. 4, 1990. http://www.oupcanada.com/catalog/9780198522263.html
- “A Frequency Selective Filter for Short-Length Time Series”, by Iacobucci, A., and Noullez, A., Computational Economics, Vol. 25, 75-102, 2005. https://link.springer.com/article/10.1007/s10614-005-6276-7
Outputs¶
comp_spectvar_1d creates an output NetCDF file that contains the variance estimates in the selected frequency band for the time series associated with the input NetCDF variable. The output NetCDF dataset contains the following NetCDF variables (in the description below, nperiod is
1
if the -p= argument is not used or is equal tontime
/ period_length if the -p= argument is used, wherentime
= time2 - time1 +1
) :
netcdf_variable_spectvar
(nperiod)
: the corresponding variance estimates for each time period associated with the input NetCDF variable.netcdf_variable_spectvar_percent
(nperiod)
: the corresponding percentages of total variance for each time period associated with the input NetCDF variable.This variable is stored only if the -normvar argument has been specified when calling comp_spectvar_1d.
Examples¶
For estimating variance in the biennial time scale (e.g., between
18
and30``months) for a real monthly time series readed from a unidimensional NetCDF variable called ``sst
extracted from the filesst.monthly.nino34.nc
, which includes a monthly time series, and store the results in the NetCDF filevar_tbo_sst_nino34.nc
, use the following command :$ comp_spectvar_1d \ -f=sst.monthly.nino34.nc \ -v=sst \ -pl=18 \ -ph=30 \ -o=var_tbo_sst_nino34.nc \ -normvar