comp_cross_spectrum_1d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

28/05/2024

Purpose

Compute Fast Fourier Transform (FFT) estimates of the power spectra and cross-power spectra between an index time series and an unidimensional variable readed from a NetCDF dataset. From the Power Spectral Density (PSD) and Cross-Spectral Density (CSD) estimates, this program also computes the phase and amplitude relationships between the signals, e.g., the phase of the cross-spectrum and the squared coherency estimate at each frequency. In output of comp_cross_spectrum_1d, both are given on the closed interval [0., 1.], meaning that both are unitless.

In general terms, the phase gives the lag (negative values) or lead (positive values) of the index time series with respect to the variable time series at each frequency in radians. Here, the phase is given in fractions of a circle (e.g., on the closed interval [0., 1.] ) instead of being in radians. In other words, for phase values less than 0.5 the index time series leads the variable time series and vice-versa for phase values greater than 0.5.

The magnitude-squared-coherence can be used to identify significant frequency-domain correlation (at the Fourier frequencies) between the two time series. Phase estimates in the cross-spectrum are only useful where significant frequency domain correlation exists and you can neglect the cross spectrum when the squared coherency is small.

Spectral analysis amounts to estimate the power spectrum of a signal (e.g., a time series) from its time representation. The PSD estimates, which constitute the power spectrum, characterize the frequency content of the signal and decompose the variance of the signal into the different frequencies and allow to identify the periodicities present in the signal. In a similar way, cross-spectral analysis amounts to decompose the correlation between two signals into the different frequencies.

comp_cross_spectrum_1d is based on a non-parametric method, which divides the time series into segments (see the -sl= argument), applies FFT on each segment, computes the PSD and CSD estimates from the FFTs and, finally, averages these estimates from each segment. Such non-parametric method is based on the observed time series and does not require prior knowledge about the data or a model generating the data. This method is known as the Welch’s averaged, modified periodogram method for estimating the power and cross-power spectra of different time series [Welch] . This Welch’s Overlapped Segment Averaging (WOSA) method also provides a consistent average coherence estimator, which prevents obtaining a squared coherency estimate that is identically 1. for all frequencies.

At the user option, the different time series can be detrended before computing the PSD and CSD estimates (see the -tr= and -tr2= arguments) in order to ensure that the analyzed time series remain stationary.

In order to obtain stable PSD and CSD estimates (e.g., reduce the variance of the estimates) and minimize the loss of information, the real time series can also be segmented, and PSD and CSD estimates on, eventually overlapping, segments can be averaged at the user option (this is known as the Welch’s method as already noted above). The stability of the PSD and CSD estimates is increased by this averaging process because the Welch’s method uses PSD and CSD estimates from different segments of the time series, which represent approximately uncorrelated estimates of the true PSD and CSD estimates. That is, the greater the number of segments, the more stable, the resulting PSD and CSD estimates as their variance is reduced by the averaging.

The user can also select different data windows for the computations of the PSD and CSD estimates (see the -win= argument) on each segment. The segments are multiplied by a data window (selected with the -win= argument) before performing the FFTs, so that the Welch’s method used in comp_cross_spectrum_1d computes a modified periodogram on each segment and then averaged these estimates to produce the final PSD and CSD estimates. Because the segments usually overlap (see the -nooverlap argument), data values at the beginning and end of each segment tapered by the data window on one segment, occur away from the ends of adjacent segments. In this way, the overlap guards against the loss of information caused by windowing, but allows to increase the number of segments and, finally, PSD and CSD estimates on each segment can still be considered as uncorrelated.

Optionally, the PSD and CSD estimates may also been smoothed in the frequency domain by Daniell weights (e.g., a simple moving average; see the -nsm= argument).

Additionally, the spectral and cross-spectral analysis can be done separately for different segments of equal length of the selected time series if these time series are not continuous in time (see the -p= argument).

The PSD estimates are returned in units, which are the square of the data if the -nonormpsd argument is used (meaning that the sum of the PSD estimates is equal to the variance of the time series) or in PSD units otherwise (meaning that the total area under the power spectrum is equal to the variance of the time series).

Finally, again at the user option, statistical confidence interval and equivalent number of degrees of freedom of the PSD estimates and critical values for testing the null hypothesis that the squared coherency between the time series is zero can be computed (see the -conflim= argument).

All the final PSD and CSD estimates of the different time series and the associated statistics are stored in an output NetCDF dataset.

If the NetCDF variable is tridimensional use comp_cross_spectrum_3d or comp_cross_spectrum2_3d instead of comp_cross_spectrum_1d. If the time series has a seasonal (or diurnal) cycle, use comp_stl_1d in order to estimate and remove the harmonic components of the time series before using comp_cross_spectrum_1d.

Finally, use comp_spectvar_1d to compute variance or PSD estimates in a selected frequency band for the real time series after computing the power spectrum of the real time series.

For more information on spectral and cross-spectral analysis, see the references below [Welch] [Bloomfield] [Diggle].

Further Details

Usage

$ comp_cross_spectrum_1d \
       -f=input_netcdf_file \
       -v=netcdf_variable \
       -t=time1,time2                           (optional) \
       -p=period_length                         (optional) \
       -o=output_netcdf_file                    (optional) \
       -vi=index_netcdf_variable                (optional) \
       -ni=indice_for_2d_index_netcdf_variable  (optional) \
       -fi=input_index_netcdf_file              (optional) \
       -ti=itime1,itime2                        (optional) \
       -pi=iperiodicity,istep                   (optional) \
       -sl=segment_length                       (optional : 16,32,64,128,...) \
       -tr=trend_removal                        (optional : 0,1,2,3) \
       -tr2=trend_removal_on_segment            (optional : 0,1,2,3) \
       -win=window_choice                       (optional : 1,2,3,4,5,6) \
       -tap=tapered_percentage                  (optional : 0.0 > 1.) \
       -nsm=Daniell_filter_length               (optional) \
       -pro=critical_probability                (optional : 0.0 > 1.) \
       -mi=missing_value                        (optional) \
       -nonormpsd                               (optional) \
       -nooverlap                               (optional) \
       -conflim                                 (optional) \
       -double                                  (optional) \
       -hdf5                                    (optional) \
       -tlimited                                (optional)

By default

-t=
the whole time period associated with the netcdf_variable
-p=
the period_length is set to time2 - time1 + 1, which means that the time series is considered as continuous with only one time interval
-o=
output_netcdf_file name is set to cross_spectrum_netcdf_variable.index_netcdf_variable.nc if index_netcdf_variable is specified with the -vi= argument and to spectrum_netcdf_variable.nc if the -vi= argument is not used
-vi=
a power spectrum analysis is performed
-ni=
if the index_netcdf_variable is bidimensional, the first time series is used
-fi=
the same as the -f= argument
-ti=
the whole time period associated with the index_netcdf_variable
-pi=
this parameter is not used, which is equivalent to use 1,1 for the -pi= argument
-sl=
the segment_length is set to 0, which means that the time series (or each time interval if the -p= argument is used) is not segmented to stabilize the the PSD and CSD estimates
-tr=
the trend_removal is set to 1, which means that the mean of the whole time series is removed before the spectral computations
-tr2=
the trend_removal_on_segment is set to 0, which means that no trend processing is done on each time segment before the spectral computations
-win=
the window_choice is set to 1, which means that the Bartlett window is used in the spectral computations
-tap=
the tapered_percentage is set to 0.2, which means that if a split-cosine-bell window is used for the spectral computations (e.g., if the window_choice is set to 6 with the -win= argument), 20% of the data are tapered
-nsm=
the Daniell_filter_length is set to 0, which means that the PSD and CSD estimates are not smoothed in the frequency domain by a moving average
-pro=
the critical_probability is set to 0.05, which means that if a confidence interval for the power spectrum is requested with the -conflim argument, the lower and upper 95% confidence limit factors are computed. In addition, if the -vi= argument is used, the lower and upper (1-0.05)*100% confidence limit factors and the critical value for testing the null hypothesis that the squared coherency is zero at the 0.05*100% significance level are computed
-mi=
the missing_value for the output NetCDF variables is set to 1.e+20
-nonormpsd
the PSD estimates are normalized in such a way that the total area under the power spectrum is equal to the variance of the time series. If -nonormpsd is specified, the PSD estimates are returned in units which are the square of the data
-nooverlap
normally, if the -sl= argument is used, the time segments are overlapped. However, if -nooverlap is specified, the time segments are not overlapped
-conflim
normally, the equivalent number of degrees of freedom, the confidence limit factors of the PSD estimates and the critical value for testing the null hypothesis that the squared coherency is zero are not computed. However, if -conflim is specified, these statistics are computed and stored in the output NetCDF dataset
-double
the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension (e.g., the frequency axis) is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable (e.g., the real time series) for which a power or cross-power spectrum analysis must be performed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.

  2. If the -t=time1,time2 argument is missing the whole time period associated with the netcdf_variable is used to estimate the power and cross-power spectrum.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.

    time2 - time1 + 1 must be an even integer if the -sl= argument is not used and of any length if this argument is used.

  3. If the -p= argument is specified, the PSD and CSD estimates are computed separately for each time period of length period_length (as determined by the value of the -p= argument).

    The whole selected time period (e.g., time2 - time1 + 1 ) must be a multiple of the period_length. The period_length must also be of even length if the -sl= argument is not used.

  4. The -vi=index_netcdf_variable specifies a time series for the cross-power spectrum analysis. If the -vi=index_netcdf_variable is present, the -fi= argument must also be present and this argument specifies the NetCDF dataset which contains the index_netcdf_variable. However, if the NetCDF dataset which contains the index_netcdf_variable is the same as the NetCDF dataset specified by the -f= argument, it is not necessary to specify the -fi= argument.

    If the -vi=index_netcdf_variable is absent, a power spectrum analysis of the NetCDF variable netcdf_variable specified by the -v= argument is performed.

  5. The -ni= argument specifies the indice (e.g., an integer) for selecting the index time series if the index_netcdf_variable specified with the -vi= argument is a 2D NetCDF variable. By default, the first time series is used, which is equivalent to set indice_for_2d_netcdf_variable to 1.

  6. If the -ti=itime1,itime2 argument is missing, data in the whole time period associated with the index_netcdf_variable is taken into account. The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.

  7. The -pi= argument gives the periodicity and selects the time step for the index_netcdf_variable. For example, to compute a cross-spectral analysis with the January monthly time series extracted from the index_netcdf_variable which is assumed to be sampled every month, -pi=12,1 should be specified, with yearly data -pi=1,1 may be used, etc.

  8. The selected time periods for the netcdf_variable and index_netcdf_variable must agree. This means that the following equality must be verified

    time2 - time1 + 1  = ceiling((itime2 - itime1 - istep + 2)/iperiodicity),

    otherwise, an error message will be issued and the program will stop.

  9. The -sl=segment_length argument specifies an integer used to segment the time series. The segment_length must be a positive even integer less or equal to the length of the time series (e.g., time2 - time1 + 1 ) or the period_length if the -p= argument is used. Spectral computations are at ( segment_length/2 ) + 1 Fourier frequencies and suggested values for segment_length are 16, 32, 64, 128, … (e.g., the spectral estimates are taken at frequencies (i-1) / segment_length for i=1,2, … , (segment_length/2) + 1. By default, segment_length is set to 0, e.g., the time series is not segmented and segment_length is equal to time2 - time1 + 1 or period_length if the -p= argument is used.

  10. The -tr= argument specifies pre-filtering processing of the real time series before estimating the PSD and CSD estimates of the time series. If:

    • -tr=+1, the mean of the time series is removed before computing the PSD and CSD estimates
    • -tr=+2, the drift from the time series is removed before computing the PSD and CSD estimates. The drift for each time series (say tseries) is estimated using the formula : drift = ( tseries(ntime) - tseries(1) )/( ntime - 1 )
    • -tr=+3, the least-squares line from the time series is removed before computing the PSD and CSD estimates.

    For other values of the -tr= argument, nothing is done before estimating the PSD and CSD estimates.

    If the -p= argument is present, the pre-filtering processing is applied to each time interval of length period_length, separately.

    The -tr= argument must be an integer and the default value for the -tr= argument is 1, e.g., the mean of the time series is removed before the spectrum computations.

  11. The -tr2= argument specifies pre-filtering processing of the real time series before estimating the PSD and CSD estimates on each time segment. If:

    • -tr2=+1, the mean of the time segment is removed before computing the PSD and CSD estimates on the segment
    • -tr2=+2, the drift from the time segment is removed before computing the PSD and CSD estimates on the segment
    • -tr2=+3, the least-squares line from the time segment is removed before computing the PSD and CSD estimates on the segment.

    For other values of the -tr2= argument, nothing is done before estimating the PSD and CSD estimates on each time segment.

    The -tr2= argument must be an integer and the default value for the -tr2= argument is 0, e.g., nothing is done before the spectral computations on each time segment.

  12. The -win=window_choice argument specifies the form of the data window used in the computations of the modified periodogram on each segment. If:

    • -win=1 The Bartlett window is used
    • -win=2 The square window is used
    • -win=3 The Welch window is used
    • -win=4 The Hann window is used
    • -win=5 The Hamming window is used
    • -win=6 A split-cosine-bell window is used.

    The window_choice must be an integer between 1 and 6. The default value for the window_choice is 1, e.g., the Bartlett window is used for the spectral computations on each segment.

  13. The -tap=tapered_percentage argument specifies the total percentage of the data to be tapered if the window_choice is set to 6 with the -win= argument. The tapered_percentage is a real number which must be greater than 0. and less or equal to 1..

    The default value for the tapered_percentage is 0.2 (e.g., 20% of the data are tapered, 10% at the start of the series and 10% at the end).

  14. The -nsm=Daniell_filter_length argument specified that the PSD and CSD estimates must be computed by smoothing the periodogram with Daniell weights (e.g., a simple moving average). On entry, the -nsm= argument gives the length of the Daniell filter to be applied. The Daniell_filter_length must be an odd integer, greater than 2 and less or equal to ( segment_length/2 ) + 1 if the -sl= argument is specified.

    By default, the PSD and CSD estimates are not smoothed in the frequency domain.

  15. If the -nooverlap argument is specified and the -sl= argument is also specified, comp_cross_spectrum_1d segments the time series without any overlapping (in each time period if the -p= argument is used).

    By default, comp_cross_spectrum_1d overlaps the segments by one half of their length (which is equal to the value of the -sl= argument).

    In both cases, each segment will be FFT’d, and the resulting (modified if the -win= argument is used) periodograms will be averaged together to obtain PSD and CSD estimates at the ( segment_length/2 ) + 1 frequencies.

    If the -sl= argument is not specified, the -nooverlap argument has no effect.

  16. If the -nonormpsd argument is specified, the sum of the PSD estimates is equal to the variance of the time series. By default, the PSD estimates are normalized in such a way that the total area under the power spectrum is equal to the variance of the time series.

  17. If the -conflim argument is specified, the equivalent number of degrees of freedom, the lower and upper ( 1 - critical_probability ) * 100 % confidence limit factors of the PSD estimates and the critical value for testing the null hypothesis that the squared coherency is zero at the critical_probability* 100 % significance level are computed.

    Multiply the PSD estimates by the lower and upper confidence limit factors to get the lower and upper limits of a ( 1 - critical_probability ) * 100 % confidence interval for the PSD estimates.

    Squared coherency less than the computed critical value should be regarded as not significantly different from zero at the critical_probability* 100 % significance level.

    By default, these statistics are not computed.

  18. The -pro=critical_probability argument specifies the critical probability which is used to determine the lower and upper confidence limit factors and the critical value for testing the null hypothesis that the squared coherency is zero.

    critical_probability must be greater than 0. and less than 1.. The default value is 0.05.

  19. The -mi=missing_value argument specifies the missing value indicator for the output NetCDF variables in the output_netcdf_file.

    If the -mi= argument is not specified, the missing_value is set to 1.e+20.

  20. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file. By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  21. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g., -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  22. It is assumed that the different real time series have no missing values.

  23. Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  24. For more details on power and cross-power spectrum analysis and examples of use in the climate literature, see

    • “The use of Fast Fourier Transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms”, by P.D. Welch, IEEE trans. on audio and electroacoustics, 15:2, 70-73, 1967. DOI: https://doi.org/10.1109/TAU.1967.1161901
    • “Fourier analysis of time series- An introduction”, by P. Bloomfield, John Wiley and Sons, New York, Chapter 6, 1976. http://as.wiley.com/WileyCDA/WileyTitle/productCd-0471889482.html
    • “Time series: a biostatistical introduction”, by P.J. Diggle, Clarendon Press, Oxford, Chap. 4, 1990. http://www.oupcanada.com/catalog/9780198522263.html
    • “Impact of intra-daily SST variability on ENSO characteristics in a coupled model” by S. Masson et al., 2012, Climate Dynamics, Vol. 39:681-707, doi: 10.1007/s00382-011-1247-2
    • “The role of the intra-daily SST variability in the Indian monsoon variability in a global coupled model” by P. Terray et al., 2012, Climate Dynamics, Vol. 39:729-754, doi: 10.1007/s00382-011-1240-9
    • “Southern Hemisphere extra-tropical forcing: A new paradigm for El Nino-Southern Oscillation” by P. Terray, 2011, Climate Dynamics, Vol. 36:2171-2199, doi: 10.1007/s00382-010-0825-z

Outputs

comp_cross_spectrum_1d creates an output NetCDF file that contains the PSD and CSD estimates of the time series associated with the input NetCDF variables. The output NetCDF dataset contains the following NetCDF variables (in the description below, nfreq is the number of frequencies where the PSD and CSD estimates are computed (as determined by the -t=, -p= and -sl= arguments) and nperiod is 1 if the -p= argument is not used or equal to ntime / period_length if it is used, where ntime = time2 - time1 + 1 ) :

  1. frequency(nfreq) : the frequencies at which the PSD estimates are computed.

  2. periodicity(nfreq) : the corresponding periodicities in number of time observations.

  3. netcdf_variable_spectrum(nperiod,nfreq) : the corresponding PSD estimates for each time interval of the time series associated with the input NetCDF variable.

  4. index_netcdf_variable_spectrum(nperiod,nfreq) : the corresponding PSD estimates for each time interval of the time series associated with the input index NetCDF variable.

    This variable is stored only if the -vi= argument has been specified when calling comp_cross_spectrum_1d.

  5. index_netcdf_variable_netcdf_variable_squaredcoherency(nperiod,nfreq) : the corresponding squared coherency estimates for each time interval of the time series associated with the input NetCDF variable and the time series associated with the input index NetCDF variable. The square coherency is given on the closed interval [0., 1.].

    This variable is stored only if the -vi= argument has been specified when calling comp_cross_spectrum_1d.

  6. index_netcdf_variable_netcdf_variable_phase(nperiod,nfreq) : the corresponding phase estimates of the cross-spectrum for each time interval of the time series associated with the input NetCDF variable and the time series associated with the input index NetCDF variable. In general terms, the phase gives the lag (negative values) or lead (positive values) of the index_netcdf_variable time series with respect to the netcdf_variable time series at each frequency in radians.

    Here, the phase is given in fractions of a circle (e.g., on the closed interval [0., 1.] ) instead of being in radians. In other words, for phase values less than 0.5 the index_netcdf_variable leads the netcdf_variable and vice-versa for phase values greater than 0.5.

    This variable is stored only if the -vi= argument has been specified when calling comp_cross_spectrum_1d.

  7. edof(nfreq) : the equivalent number of degrees of freedom of the PSD and CSD estimates of the time series associated with the input NetCDF variable.

    This variable is stored only if the -conflim argument has been specified when calling comp_cross_spectrum_1d.

  8. lower_factor(nfreq) : the lower confidence limit factors of the PSD estimates. Multiply the PSD estimates by the corresponding lower confidence limit factors to get the lower limits of the ( 1 - critical_probability ) * 100 % confidence intervals for the PSD estimates.

    This variable is stored only if the -conflim argument has been specified when calling comp_cross_spectrum_1d.

  9. upper_factor(nfreq) : the upper confidence limit factors of the PSD estimates. Multiply the PSD estimates by the corresponding upper confidence limit factors to get the upper limits of the ( 1 - critical_probability ) * 100 % confidence intervals for the PSD estimates.

    This variable is stored only if the -conflim argument has been specified when calling comp_cross_spectrum_1d.

  10. coher_test(nfreq) : the critical values for the statistical test of the squared coherency estimates. Squared coherency less than these computed critical values for a given frequency should be regarded as not significantly different from zero at the critical_probability* 100 % significance level.

    This variable is stored only if the -conflim and -vi= arguments have been specified when calling comp_cross_spectrum_1d.

Examples

  1. For computing the power and cross-power spectra of a real monthly time series from a unidimensional NetCDF variable called sst_iod extracted from the file sst.monthly.iod.nc with a monthly Nino34 SST index in a NetCDF variable called sst_nino34 extracted from the file sst.monthly.nino34.nc and store the results in a NetCDF file named cross_spectrum_iod_nino34.nc, use the following command:

    $ comp_cross_spectrum_1d \
      -f=sst.monthly.iod.nc \
      -v=sst_iod \
      -fi=sst.monthly.nino34.nc \
      -vi=sst_nino34 \
      -sl=128 \
      -o=cross_spectrum_iod_nino34.nc \
      -nonormpsd \
      -nooverlap
    
Flag Counter