comp_cross_spectrum2_3d¶
Authors¶
Pascal Terray (LOCEAN/IPSL)
Latest revision¶
29/05/2024
Purpose¶
Compute Fast Fourier Transform (FFT) estimates of the power spectra and cross-power spectra between an index time series
and a tridimensional variable readed from a NetCDF dataset. From the Power Spectral Density (PSD) and Cross-Spectral
Density (CSD) estimates, this program also computes the amplitude and phase relationships between the signals, e.g.,
the phases of the multi-channel cross-spectrum and the squared coherency estimates at each frequency. In output of
comp_cross_spectrum_3d, both are given on the closed interval [0.
, 1.
], meaning that both are unitless.
In general terms, the phase gives the lag (negative values) or lead (positive values) of the index time series with
respect to the variable multi-channel time series at each frequency in radians.
Here, the phase is given in fractions of a circle (e.g., on the closed interval [0.
, 1.
] ) instead of being in radians.
In other words, for phase values less than 0.5
the index time series leads the variable multi-channel time series and
vice-versa for phase values greater than 0.5
.
The magnitude-squared-coherence can be used to identify significant frequency-domain correlation (at the Fourier frequencies) between the index time series and the multi-channel time series. Phase estimates in the cross-spectrum are only useful where significant frequency domain correlation exists and you can neglect the cross spectrum when the squared coherency is small.
Spectral analysis amounts to estimate the power spectrum of a signal (e.g., a time series) from its time representation. The PSD estimates, which constitute the power spectrum, characterize the frequency content of the signal and decompose the variance of the signal into the different frequencies and allow to identify the periodicities present in the signal. In a similar way, cross-spectral analysis amounts to decompose the correlation between two signals into the different frequencies.
comp_cross_spectrum2_3d is based on a non-parametric method, which divides the time series into segments (see the -sl= argument),
applies FFT on each segment, computes the PSD and CSD estimates from the FFTs and, finally, averages these estimates from each segment.
Such non-parametric method is based on the observed time series and does not require prior knowledge about the data or a model generating
the data. This method is known as the Welch’s averaged, modified periodogram method for estimating the power and cross-power spectra of
different time series [Welch] . This Welch’s Overlapped Segment Averaging (WOSA) method also provides a consistent average coherence estimator,
which prevents obtaining a squared coherency estimate that is identically 1.
for all frequencies.
At the user option, the different segments of each time series can be detrended before computing the PSD and CSD estimates (see the -tr2= argument) in order to ensure that the analyzed time series remain stationary on each segment.
In order to obtain stable PSD and CSD estimates (e.g., reduce the variance of the estimates) and minimize the loss of information, the real time series can also be segmented, and PSD and CSD estimates on, eventually overlapping, segments can be averaged at the user option (this is known as the Welch’s method as already noted above). The stability of the PSD and CSD estimates is increased by this averaging process because the Welch’s method uses PSD and CSD estimates from different segments of the time series, which represent approximately uncorrelated estimates of the true PSD and CSD estimates. That is, the greater the number of segments, the more stable, the resulting PSD and CSD estimates as their variance is reduced by the averaging.
The user can also select different data windows for the computations of the PSD and CSD estimates (see the -win= argument) on each segment. The segments are multiplied by a data window (selected with the -win= argument) before performing the FFTs, so that the Welch’s method used in comp_cross_spectrum2_3d computes a modified periodogram on each segment and then averaged these estimates to produce the final PSD and CSD estimates. Because the segments usually overlap (see the -nooverlap argument), data values at the beginning and end of each segment tapered by the data window on one segment, occur away from the ends of adjacent segments. In this way, the overlap guards against the loss of information caused by windowing, but allows to increase the number of segments and, finally, PSD and CSD estimates on each segment can still be considered as uncorrelated.
Optionally, the PSD and CSD estimates may also been smoothed in the frequency domain by Daniell weights (e.g., a simple moving average; see the -nsm= argument).
The PSD estimates are returned in units, which are the square of the data if the -nonormpsd argument is used (meaning that the sum of the PSD estimates is equal to the variance of the time series) or in PSD units otherwise (meaning that the total area under the power spectrum is equal to the variance of the time series).
Finally, again at the user option, statistical confidence interval and equivalent number of degrees of freedom of the PSD estimates and critical values for testing the null hypothesis that the squared coherency between the time series is zero can be computed (see the -conflim= argument).
All the final PSD and CSD estimates of the different time series and the associated statistics are stored in an output NetCDF dataset.
If the NetCDF variable is unidimensional use comp_cross_spectrum_1d instead of comp_cross_spectrum2_3d. If the multi-channel time series has a seasonal (or diurnal) cycle, use comp_stl_3d in order to estimate and remove the harmonic components of the time series before using comp_cross_spectrum2_3d.
comp_cross_spectrum2_3d performs the same task as comp_cross_spectrum_3d, but requires much less memory and will be faster than comp_cross_spectrum_3d if OpenMP parallelism is used. On the other hand, comp_cross_spectrum2_3d is slightly less flexible (e.g., the -tr= and -p= arguments are not allowed in comp_cross_spectrum2_3d).
Finally, use comp_spectvar_3d to compute variance or PSD estimates in a selected frequency band for the real multi-channel time series after computing the power spectra of the real multi-channel time series.
For more information on spectral and cross-spectral analysis, see the references below [Welch] [Bloomfield] [Diggle].
Further Details¶
Usage¶
$ comp_cross_spectrum2_3d \
-f=input_netcdf_file \
-v=netcdf_variable \
-m=input_mesh_mask_netcdf_file (optional) \
-g=grid_type (optional : n, t, u, v, w, f) \
-x=lon1,lon2 (optional) \
-y=lat1,lat2 (optional) \
-t=time1,time2 (optional) \
-o=output_netcdf_file (optional) \
-vi=index_netcdf_variable (optional) \
-ni=indice_for_2d_index_netcdf_variable (optional) \
-fi=input_index_netcdf_file (optional) \
-ti=itime1,itime2 (optional) \
-pi=iperiodicity,istep (optional) \
-sl=segment_length (optional : 16,32,64,128,...) \
-tr2=trend_removal_on_segment (optional : 0,1,2,3) \
-win=window_choice (optional : 1,2,3,4,5,6) \
-tap=tapered_percentage (optional : 0.0 > 1.) \
-nsm=Daniell_filter_length (optional) \
-pro=critical_probability (optional : 0.0 > 1.) \
-mi=missing_value (optional) \
-nonormpsd (optional) \
-nooverlap (optional) \
-conflim (optional) \
-nospect (optional) \
-double (optional) \
-bigfile (optional) \
-hdf5 (optional) \
-tlimited (optional)
By default¶
- -m=
- an input_mesh_mask_netcdf_file is not used
- -g=
- the grid_type is set to
n
which means that the 2-D grid-mesh associated with the input netcdf_variable is assumed to be regular or Gaussian- -x=
- the whole longitude domain associated with the netcdf_variable
- -y=
- the whole latitude domain associated with the netcdf_variable
- -t=
- the whole time period associated with the netcdf_variable
- -o=
- output_netcdf_file name is set to
cross_spectrum_
netcdf_variable.
index_netcdf_variable.nc
if index_netcdf_variable is specified with the -vi= argument and tospectrum_
netcdf_variable.nc
if the -vi= argument is not used- -vi=
- a power spectrum analysis is performed
- -ni=
- if the index_netcdf_variable is bidimensional, the first time series is used
- -fi=
- the same as the -f= argument
- -ti=
- the whole time period associated with the index_netcdf_variable
- -pi=
- this parameter is not used, which is equivalent to use
1,1
for the -pi= argument- -sl=
- the segment_length is set to
0
, which means that the time series is not segmented to stabilize the PSD and CSD estimates- -tr2=
- the trend_removal_on_segment is set to
0
, which means that no trend processing is done on each time segment before the spectral computations- -win=
- the window_choice is set to
1
, which means that the Bartlett window is used in the spectral computations- -tap=
- the tapered_percentage is set to
0.2
, which means that if a split-cosine-bell window is used for the spectral computations (e.g., if the window_choice is set to6
with the -win= argument),20
% of the data are tapered- -nsm=
- the Daniell_filter_length is set to
0
, which means that the PSD and CSD estimates are not smoothed in the frequency domain by a moving average- -pro=
- the critical_probability is set to
0.05
, which means that if a confidence interval for the power spectrum is requested with the -conflim argument, the lower and upper95
% confidence limit factors are computed. In addition, if the -vi= argument is used, the lower and upper(1-0.05)*100
% confidence limit factors and the critical value for testing the null hypothesis that the squared coherency is zero at the0.05*100
% significance level are computed- -mi=
- the missing_value for the output NetCDF variables is set to
1.e+20
- -nonormpsd
- the PSD estimates are normalized in such a way that the total area under the power spectrum is equal to the variance of the time series. If -nonormpsd is specified, the PSD estimates are returned in units which are the square of the data
- -nooverlap
- normally, if the -sl= argument is used, the time segments are overlapped. However, if -nooverlap is specified, the time segments are not overlapped
- -conflim
- normally, the equivalent number of degrees of freedom, the confidence limit factors of the PSD estimates and the critical value for testing the null hypothesis that the squared coherency is zero are not computed. However, if -conflim is specified, these statistics are computed and stored in the output NetCDF dataset
- -nospect
- the PSD estimates are stored in the output NetCDF dataset
- -double
- the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
- -bigfile
- a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
- -hdf5
- a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
- -tlimited
- the time dimension (e.g., the frequency axis) is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file
Remarks¶
The -v=netcdf_variable argument specifies the NetCDF variable (e.g., the multi-channel time series) for which a power or cross-power spectrum analysis must be performed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.
The optional argument -m=input_mesh_mask_netcdf_file specifies the land-sea mask to apply to netcdf_variable for transforming this tridimensional NetCDF variable as a rectangular matrix of observed variables before computing the correlation analysis. By default, it is assumed that each cell in the 2-D grid-mesh associated with the input tridimensional NetCDF variable is a valid time series (e.g. missing values are not present).
The geographical shapes of the netcdf_variable (in the input_netcdf_file) and the mask (in the input_mesh_mask_netcdf_file) must agree if an input_mesh_mask_netcdf_file is used.
Refer to comp_clim_3d or comp_mask_3d for creating a valid input_mesh_mask_netcdf_file NetCDF file for regular or gaussian grids before using comp_cross_spectrum2_3d.
If -g= is set to
t
,u
,v
,w
orf
it is assumed that the input NetCDF variable is from an experiment with the NEMO model (ORCA configuration and R2, R4 or R05 resolutions). This argument is also used to determine the name of the mesh_mask variable if an input_mesh_mask_netcdf_file is used.If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing the whole geographical domain associated with the netcdf_variable is used.
The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to
1
. Negative values are allowed for lon1. In this case the longitude domain is fromnlon
+lon1+1
to lon2 wherenlon
is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.Refer to comp_mask_3d for transforming geographical coordinates as indices or generating an appropriate mesh-mask before using comp_cross_spectrum2_3d.
If the -t=time1,time2 argument is missing the whole time period associated with the netcdf_variable is used to estimate the power and cross-power spectrum.
The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to
1
.time2 - time1 +
1
must be an even integer if the -sl= argument is not used and of any length if this argument is used.The -vi=index_netcdf_variable specifies a time series for the cross-power spectrum analysis. If the -vi=index_netcdf_variable is present, the -fi= argument must also be present and this argument specifies the NetCDF dataset which contains the index_netcdf_variable. However, if the NetCDF dataset which contains the index_netcdf_variable is the same as the NetCDF dataset specified by the -f= argument, it is not necessary to specify the -fi= argument.
If the -vi=index_netcdf_variable is absent, a power spectrum analysis of the NetCDF variable netcdf_variable specified by the -v= argument is performed.
The -ni= argument specifies the indice (e.g., an integer) for selecting the index time series if the index_netcdf_variable specified with the -vi= argument is a 2D NetCDF variable. By default, the first time series is used, which is equivalent to set indice_for_2d_netcdf_variable to
1
.If the -ti=itime1,itime2 argument is missing, data in the whole time period associated with the index_netcdf_variable is taken into account. The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to
1
.The -pi= argument gives the periodicity and selects the time step for the index_netcdf_variable. For example, to compute a cross-power analysis with the January monthly time series extracted from the index_netcdf_variable which is assumed to be sampled every month, -pi=
12,1
should be specified, with yearly data -pi=1,1
may be used, etc.The selected time periods for the netcdf_variable and index_netcdf_variable must agree. This means that the following equality must be verified
time2 - time1 + 1 = ceiling((itime2 - itime1 - istep + 2)/iperiodicity)
,otherwise, an error message will be issued and the program will stop.
The -sl=segment_length argument specifies an integer used to segment the time series. The segment_length must be a positive even integer less or equal to the length of the time series (e.g., time2 - time1 +
1
). Spectral computations are at ( segment_length/2
) +1
Fourier frequencies and suggested values for segment_length are16
,32
,64
,128
, … (e.g., the spectral estimates are taken at frequencies(i-1)
/ segment_length for i=1,2, … , (segment_length/2
) +1
. By default, segment_length is set to0
, e.g., the time series is not segmented and segment_length is equal to time2 - time1 +1
.The -tr2= argument specifies pre-filtering processing of the real time series before estimating the PSD and CSD estimates on each time segment. If:
- -tr2=
+1
, the mean of the time segment is removed before computing the PSD and CSD estimates on the segment- -tr2=
+2
, the drift from the time segment is removed before computing the PSD and CSD estimates on the segment- -tr2=
+3
, the least-squares line from the time segment is removed before computing the PSD and CSD estimates on the segment.For other values of the -tr2= argument, nothing is done before estimating the PSD and CSD estimates on each time segment.
The -tr2= argument must be an integer and the default value for the -tr2= argument is
0
, e.g., nothing is done before the spectral computations on each time segment.The -win=window_choice argument specifies the form of the data window used in the computations of the modified periodogram on each segment. If:
- -win=
1
The Bartlett window is used- -win=
2
The square window is used- -win=
3
The Welch window is used- -win=
4
The Hann window is used- -win=
5
The Hamming window is used- -win=
6
A split-cosine-bell window is used.The window_choice must be an integer between
1
and6
. The default value for the window_choice is1
, e.g., the Bartlett window is used for the spectral computations on each segment.The -tap=tapered_percentage argument specifies the total percentage of the data to be tapered if the window_choice is set to
6
with the -win= argument. The tapered_percentage is a real number which must be greater than0.
and less or equal to1.
.The default value for the tapered_percentage is
0.2
(e.g.,20
% of the data are tapered,10
% at the start of the series and10
% at the end).The -nsm=Daniell_filter_length argument specified that the PSD and CSD estimates must be computed by smoothing the periodogram with Daniell weights (e.g., a simple moving average). On entry, the -nsm= argument gives the length of the Daniell filter to be applied. The Daniell_filter_length must be an odd integer, greater than
2
and less or equal to ( segment_length/2
) +1
if the -sl= argument is specified.By default, the PSD and CSD estimates are not smoothed in the frequency domain.
If the -nooverlap argument is specified and the -sl= argument is also specified, comp_cross_spectrum2_3d segments the time series without any overlapping.
By default, comp_cross_spectrum2_3d overlaps the segments by one half of their length (which is equal to the value of the -sl= argument).
In both cases, each segment will be FFT’d, and the resulting (modified if the -win= argument is used) periodograms and cross-periodograms will be averaged together to obtain PSD and CSD estimates at the ( segment_length/
2
) +1
frequencies.If the -sl= argument is not specified, the -nooverlap argument has no effect.
If the -nonormpsd argument is specified, the sum of the PSD estimates is equal to the variance of the time series. By default, the PSD estimates are normalized in such a way that the total area under the power spectrum is equal to the variance of the time series.
If the -conflim argument is specified, the equivalent number of degrees of freedom, the lower and upper (
1
- critical_probability ) *100
% confidence limit factors of the PSD estimates and the critical value for testing the null hypothesis that the squared coherency is zero at the critical_probability*100
% significance level are computed.Multiply the PSD estimates by the lower and upper confidence limit factors to get the lower and upper limits of a (
1
- critical_probability ) *100
% confidence interval for the PSD estimates.Squared coherency less than the computed critical value should be regarded as not significantly different from zero at the critical_probability*
100
% significance level.By default, these statistics are not computed.
The -pro=critical_probability argument specifies the critical probability which is used to determine the lower and upper confidence limit factors and the critical value for testing the null hypothesis that the squared coherency is zero.
critical_probability must be greater than
0.
and less than1.
. The default value is0.05
.The -mi=missing_value argument specifies the missing value indicator for the output NetCDF variables in the output_netcdf_file.
If the -mi= argument is not specified, the missing_value is set to
1.e+20
.If the -nospect argument is specified, the PSD estimates of the time series are not stored in the output NetCDF file. By default, the PSD estimates are stored.
The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file.
By default, the results are stored as single-precision floating point numbers in the output NetCDF file.
The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g.
-D_USE_NETCDF36
or-D_USE_NETCDF4
) and linked to the NetCDF 3.6 library or higher.If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.
The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g.
-D_USE_NETCDF4
) and linked to the NetCDF 4 library or higher.If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.
It is assumed that the different real time series have no missing values.
Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.
For more details on power and cross-power spectrum analysis and examples of use in the climate literature, see
- “The use of Fast Fourier Transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms”, by P.D. Welch, IEEE trans. on audio and electroacoustics, 15:2, 70-73, 1967. DOI: https://doi.org/10.1109/TAU.1967.1161901
- “Fourier analysis of time series- An introduction”, by P. Bloomfield, John Wiley and Sons, New York, Chapter 6, 1976. http://as.wiley.com/WileyCDA/WileyTitle/productCd-0471889482.html
- “Time series: a biostatistical introduction”, by P.J. Diggle, Clarendon Press, Oxford, Chap. 4, 1990. http://www.oupcanada.com/catalog/9780198522263.html
- “Impact of intra-daily SST variability on ENSO characteristics in a coupled model” by S. Masson et al., 2012, Climate Dynamics, Vol. 39:681-707, doi: 10.1007/s00382-011-1247-2
- “The role of the intra-daily SST variability in the Indian monsoon variability in a global coupled model” by P. Terray et al., 2012, Climate Dynamics, Vol. 39:729-754, doi: 10.1007/s00382-011-1240-9
- “Southern Hemisphere extra-tropical forcing: A new paradigm for El Nino-Southern Oscillation” by P. Terray, 2011, Climate Dynamics, Vol. 36:2171-2199, doi: 10.1007/s00382-010-0825-z
Outputs¶
comp_cross_spectrum2_3d creates an output NetCDF file that contains the PSD and CSD estimates of the time series associated with the input NetCDF variables. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat and nlon are the lengths of the spatial dimensions of the input NetCDF variable netcdf_variable, nfreq is the number of frequencies where the PSD and CSD estimates are computed (as determined by the -t= and -sl= arguments):
frequency
(nfreq)
: the frequencies at which the PSD estimates are computed.periodicity
(nfreq)
: the corresponding periodicities in number of time observations.netcdf_variable_spectrum
(nfreq,nlat,nlon)
: the corresponding PSD estimates for the multi-channel time series associated with the input NetCDF variable.This variable is stored only if the -nospect argument is absent when calling comp_cross_spectrum2_3d.
index_netcdf_variable_spectrum
(nfreq)
: the corresponding PSD estimates for the time series associated with the input index NetCDF variable.This variable is stored only if the -vi= argument has been specified when calling comp_cross_spectrum2_3d.
index_netcdf_variable_netcdf_variable_squaredcoherency
(nfreq,nlat,nlon)
: the corresponding squared coherency estimates for the multi-channel time series associated with the input NetCDF variable and the time series associated with the input index NetCDF variable. The square coherency is given on the closed interval [0.
,1.
].This variable is stored only if the -vi= argument has been specified when calling comp_cross_spectrum2_3d.
index_netcdf_variable_netcdf_variable_phase
(nfreq,nlat,nlon)
: the corresponding phase estimates of the cross-spectrum for the multi-channel time series associated with the input NetCDF variable and the time series associated with the input index NetCDF variable. In general terms, the phase gives the lag (negative values) or lead (positive values) of the index_netcdf_variable time series with respect to the netcdf_variable multi-channel time series at each frequency in radians.Here, the phase is given in fractions of a circle (e.g., on the closed interval [
0.
,1.
] ) instead of being in radians. In other words, for phase values less than0.5
the index_netcdf_variable leads the netcdf_variable and vice-versa for phase values greater than0.5
.This variable is stored only if the -vi= argument has been specified when calling comp_cross_spectrum2_3d.
edof
(nfreq)
: the equivalent number of degrees of freedom of the PSD and CSD estimates of the time series associated with the input NetCDF variable.This variable is stored only if the -conflim argument has been specified when calling comp_cross_spectrum2_3d.
lower_factor
(nfreq)
: the lower confidence limit factors of the PSD estimates. Multiply the PSD estimates by the corresponding lower confidence limit factors to get the lower limits of the (1
- critical_probability ) *100
% confidence intervals for the PSD estimates.This variable is stored only if the -conflim argument has been specified when calling comp_cross_spectrum2_3d.
upper_factor
(nfreq)
: the upper confidence limit factors of the PSD estimates. Multiply the PSD estimates by the corresponding upper confidence limit factors to get the upper limits of the (1
- critical_probability ) *100
% confidence intervals for the PSD estimates.This variable is stored only if the -conflim argument has been specified when calling comp_cross_spectrum2_3d.
coher_test
(nfreq)
: the critical values for the statistical test of the squared coherency estimates. Squared coherency less than these computed critical values for a given frequency should be regarded as not significantly different from zero at the critical_probability*100
% significance level.This variable is stored only if the -conflim and -vi= arguments have been specified when calling comp_cross_spectrum2_3d.
All the PSD and CSD estimates involving the input NetCDF variable netcdf_variable are packed in tridimensional variables whose first and second dimensions are exactly the same as those associated with this input NetCDF variable even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variables are filled with missing values.
Examples¶
For computing the power and cross-power spectra of a real multi-channel monthly time series from a tridimensional NetCDF variable called
sst
extracted from the filesst.monthly.nc
with a monthly Nino34 SST index in a NetCDF variable calledsst_nino34
extracted from the filesst.monthly.nino34.nc
and store the results in a NetCDF file namedcross_spectrum_sst_nino34_sst.nc
, use the following command:$ comp_cross_spectrum2_3d \ -f=sst.monthly.nc \ -v=sst \ -fi=sst.monthly.nino34.nc \ -vi=sst_nino34 \ -sl=128 \ -o=cross_spectrum_sst_nino34_sst.nc \ -nonormpsd \ -nooverlap