comp_spectrum_1d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

24/05/2024

Purpose

Compute Fast Fourier Transform (FFT) estimates of the power spectrum of a real time series extracted from a uni- or bidimensional variable readed from a NetCDF dataset.

Spectral analysis amounts to estimate the power spectrum of a signal (e.g., a time series) from its time representation. The Power Spectral Density (PSD) estimates, which constitute the power spectrum, characterize the frequency content of the signal and decompose the variance of the signal into the different frequencies and allow to identify the periodicities present in the signal.

comp_spectrum_1d is based on a non-parametric method, which divides the time series into segments (see the -sl= argument), applies FFT on each segment, computes the PSD estimates from the FFTs and, finally, averages these estimates from each segment. Such non-parametric method is based on the observed time series and does not require prior knowledge about the data or a model generating the data. This method is known as the Welch’s averaged, modified periodogram method for estimating the power spectrum of a time series [Welch] .

At the user option, the time series can be detrended before computing the PSD estimates (see the -tr= and -tr2= arguments) in order to ensure that the analyzed time series remain stationary.

In order to obtain stable PSD estimates (e.g., reduce the variance of the estimates) and minimize the loss of information, the real time series can also be segmented, and PSD estimates on, eventually overlapping, segments can be averaged at the user option. This is known as the Welch’s method. The stability of the PSD estimates is increased by this averaging process because the Welch’s method uses PSD estimates from different segments of the time series, which represent approximately uncorrelated estimates of the true PSD estimates. That is, the greater the number of segments, the more stable, the resulting PSD estimates as their variance is reduced by the averaging.

The user can also select different data windows for the computations of the PSD estimates (see the -win= argument) on each segment. The segments are multiplied by a data window (selected with the -win= argument) before performing the FFTs, so that the Welch’s method used in comp_spectrum_1d computes a modified periodogram on each segment and then averaged these estimates to produce the final PSD estimates. Because the segments usually overlap (see the -nooverlap argument), data values at the beginning and end of each segment tapered by the data window on one segment, occur away from the ends of adjacent segments. In this way, the overlap guards against the loss of information caused by windowing, but allows to increase the number of segments and, finally, PSD estimates on each segment can still be considered as uncorrelated.

Optionally, the PSD estimates may also been smoothed in the frequency domain by Daniell weights (e.g., a simple moving average; see the -nsm= argument).

Additionally, the spectral analysis can be done separately for different segments of equal length of the selected time series if these time series are not continuous in time (see the -p= argument).

The PSD estimates are returned in units, which are the square of the data if the -nonormpsd argument is used (meaning that the sum of the PSD estimates is equal to the variance of the time series) or in PSD units otherwise (meaning that the total area under the power spectrum is equal to the variance of the time series).

Finally, again at the user option, statistical confidence interval and equivalent number of degrees of freedom of the PSD estimates can be computed (see the -conflim= argument).

All these power spectrum statistics are stored in an output NetCDF dataset.

If the NetCDF variable is tridimensional use comp_cross_spectrum_3d or comp_cross_spectrum2_3d instead of comp_spectrum_1d to estimate the pwer spectra of the multichannel time series. If the time series has a seasonal (or diurnal) cycle, use comp_stl_1d in order to estimate and remove the harmonic components of the time series before using comp_spectrum_1d.

If you want to compute a cross-power spectrum analysis between two time series use comp_cross_spectrum_1d instead of comp_spectrum_1d.

Finally, use comp_spectvar_1d to compute variance or PSD estimates in a selected frequency band for the real time series before or after computing the power spectrum of the real time series.

For more information on spectral analysis, see the references below [Welch] [Bloomfield] [Diggle].

Further Details

Usage

$ comp_spectrum_1d \
       -f=input_netcdf_file \
       -v=netcdf_variable \
       -t=time1,time2                    (optional) \
       -p=period_length                  (optional) \
       -o=output_netcdf_file             (optional) \
       -ni=index_for_2d_netcdf_variable  (optional) \
       -sl=segment_length                (optional : 16,32,64,128,...) \
       -tr=trend_removal                 (optional : 0,1,2,3) \
       -tr2=trend_removal_on_segment     (optional : 0,1,2,3) \
       -win=window_choice                (optional : 1,2,3,4,5,6) \
       -tap=tapered_percentage           (optional : 0.0 > 1.) \
       -nsm=Daniell_filter_length        (optional) \
       -pro=critical_probability         (optional : 0.0 > 1.) \
       -mi=missing_value                 (optional) \
       -nonormpsd                        (optional) \
       -nooverlap                        (optional) \
       -conflim                          (optional) \
       -double                           (optional) \
       -hdf5                             (optional) \
       -tlimited                         (optional)

By default

-t=
the whole time period associated with the netcdf_variable
-p=
the period_length is set to time2 - time1 + 1, which means that the time series is considered as continuous with only one time interval
-o=
the output_netcdf_file is named spectrum_netcdf_variable.nc
-ni=
if the netcdf_variable is bidimensional, the first time series is used
-sl=
the segment_length is set to 0, which means that the time series (or each time interval if the -p= argument is used) is not segmented to stabilize the the PSD estimates
-tr=
the trend_removal is set to 1, which means that the mean of the whole time series is removed before the spectral computations
-tr2=
the trend_removal_on_segment is set to 0, which means that no trend processing is done on each time segment before the spectral computations
-win=
the window_choice is set to 1, which means that the Bartlett window is used in the spectral computations
-tap=
the tapered_percentage is set to 0.2, which means that if a split-cosine-bell window is used for the spectral computations (e.g., if the window_choice is set to 6 with the -win= argument), 20% of the data are tapered
-nsm=
the Daniell_filter_length is set to 0, which means that the PSD estimates are not smoothed in the frequency domain by a moving average
-pro=
the critical_probability is set to 0.05, which means that if a confidence interval for the spectrum is requested with the -conflim argument, the lower and upper 95% confidence limit factors are computed
-mi=
the missing_value for the output NetCDF variables is set to 1.e+20
-nonormpsd
the PSD estimates are normalized in such a way that the total area under the power spectrum is equal to the variance of the time series. If -nonormpsd is specified, the PSD estimates are returned in units which are the square of the data
-nooverlap
normally, if the -sl= argument is used, the time segments are overlapped. However, if -nooverlap is specified, the time segments are not overlapped
-conflim
normally, the equivalent number of degrees of freedom and the confidence limit factors of the power spectrum estimates are not computed. However, if -conflim is specified, these statistics are computed and stored in the output NetCDF dataset
-double
the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension (e.g., the frequency axis) is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable (e.g., the time series) for which a power spectrum must be computed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.

  2. If the -t=time1,time2 argument is missing the whole time period associated with the netcdf_variable is used to estimate the power spectrum.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.

    time2 - time1 + 1 must be an even integer if the -sl= argument is not used and of any length if this argument is used.

  3. If the -p= argument is specified, the power spectrum estimates are computed separately for each time period of length period_length (as determined by the value of the -p= argument).

    The whole selected time period (e.g., time2 - time1 + 1 ) must be a multiple of the period_length. The period_length must also be of even length if the -sl= argument is not used.

  4. The -ni=index_for_2d_netcdf_variable argument specifies the index for selecting the time series if the netcdf_variable is a 2D NetCDF variable. By default, the first time series is used, which is equivalent to set index_for_2d_netcdf_variable to 1.

  5. The -sl=segment_length argument specifies an integer used to segment the time series. The segment_length must be a positive even integer less or equal to the length of the time series (e.g., time2 - time1 + 1 ) or the period_length if the -p= argument is used. Spectral computations are at ( segment_length/2 ) + 1 Fourier frequencies and suggested values for segment_length are 16, 32, 64, 128, … (e.g., the spectral estimates are taken at frequencies (i-1) / segment_length for i=1,2, … , (segment_length/2) + 1. By default, segment_length is set to 0, e.g., the time series is not segmented and the segment length is equal to time2 - time1 + 1 or period_length if the -p= argument is used.

  6. The -tr= argument specifies pre-filtering processing of the real time series before estimating the power spectrum of the time series. If:

    • -tr=+1, the mean of the time series is removed before computing the power spectrum
    • -tr=+2, the drift from the time series is removed before computing the power spectrum. The drift for the time series (say tseries) is estimated using the formula : drift = ( tseries(ntime) - tseries(1) )/( ntime - 1 )
    • -tr=+3, the least-squares line from the time series is removed before computing the power spectrum.

    For other values of the -tr= argument, nothing is done before estimating the spectrum.

    If the -p= argument is present, the pre-filtering processing is applied to each time interval of length period_length, separately.

    The -tr= argument must be an integer and the default value for the -tr= argument is 1, e.g., the mean of the time series is removed before the spectrum computations.

  7. The -tr2= argument specifies pre-filtering processing of the real time series before estimating the power spectrum on each time segment. If:

    • -tr2=+1, the mean of the time segment is removed before computing the power spectrum on the segment
    • -tr2=+2, the drift from the time segment is removed before computing the power spectrum on the segment
    • -tr2=+3, the least-squares line from the time segment is removed before computing the power spectrum on the segment.

    For other values of the -tr2= argument, nothing is done before estimating the spectrum on each time segment.

    The -tr2= argument must be an integer and the default value for the -tr2= argument is 0, e.g., nothing is done before the spectral computations on each time segment.

  8. The -win=window_choice argument specifies the form of the data window used in the computations of the modified periodogram on each segment. If:

    • -win=1 The Bartlett window is used
    • -win=2 The square window is used
    • -win=3 The Welch window is used
    • -win=4 The Hann window is used
    • -win=5 The Hamming window is used
    • -win=6 A split-cosine-bell window is used.

    The window_choice must be an integer between 1 and 6. The default value for the window_choice is 1, e.g., the Bartlett window is used for the spectral computations on each segment.

  9. The -tap=tapered_percentage argument specifies the total percentage of the data to be tapered if the window_choice is set to 6 with the -win= argument. The tapered_percentage is a real number which must be greater than 0. and less or equal to 1..

    The default value for the tapered_percentage is 0.2 (e.g., 20% of the data are tapered, 10% at the start of the series and 10% at the end).

  10. The -nsm=Daniell_filter_length argument specified that the PSD estimates must be computed by smoothing the periodogram with Daniell weights (e.g., a simple moving average). On entry, the -nsm= argument gives the length of the Daniell filter to be applied. The Daniell_filter_length must be an odd integer, greater than 2 and less or equal to ( segment_length/2 ) + 1 if the -sl= argument is specified.

    By default, the PSD estimates are not smoothed in the frequency domain.

  11. If the -nooverlap argument is specified and the -sl= argument is also specified, comp_spectrum_1d segments the time series without any overlapping (in each time period if the -p= argument is used).

    By default, comp_spectrum_1d overlaps the segments by one half of their length (which is equal to the value of the -sl= argument).

    In both cases, each segment will be FFT’d, and the resulting (modified if the -win= argument is used) periodograms will be averaged together to obtain PSD estimates at the ( segment_length/2 ) + 1 frequencies.

    If the -sl= argument is not specified, the -nooverlap argument has no effect.

  12. If the -nonormpsd argument is specified, the sum of the PSD estimates is equal to the variance of the time series. By default, the PSD estimates are normalized in such a way that the total area under the power spectrum is equal to the variance of the time series.

  13. If the -conflim argument is specified, the equivalent number of degrees of freedom, the lower and upper ( 1 - critical_probability ) * 100 % confidence limit factors of the PSD estimates are computed. Multiply the PSD estimates by the lower and upper confidence limit factors to get the lower and upper limits of a ( 1 - critical_probability ) * 100 % confidence interval for the PSD estimates.

    By default, these statistics are not computed.

  14. The -pro=critical_probability argument specifies the critical probability which is used to determine the lower and upper confidence limit factors. critical_probability must be greater than 0. and less than 1.. The default value is 0.05.

  15. The -mi=missing_value argument specifies the missing value indicator for the output NetCDF variables in the output_netcdf_file.

    If the -mi= argument is not specified, the missing_value is set to 1.e+20.

  16. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file. By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  17. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g., -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  18. It is assumed that the real time series has no missing values.

  19. Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  20. For more details on power spectrum analysis and examples of use in the climate literature, see

    • “The use of Fast Fourier Transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms”, by P.D. Welch, IEEE trans. on audio and electroacoustics, 15:2, 70-73, 1967. DOI: https://doi.org/10.1109/TAU.1967.1161901
    • “Fourier analysis of time series- An introduction”, by P. Bloomfield, John Wiley and Sons, New York, Chapter 6, 1976. http://as.wiley.com/WileyCDA/WileyTitle/productCd-0471889482.html
    • “Time series: a biostatistical introduction”, by P.J. Diggle, Clarendon Press, Oxford, Chap. 4, 1990. http://www.oupcanada.com/catalog/9780198522263.html
    • “Impact of intra-daily SST variability on ENSO characteristics in a coupled model” by S. Masson et al., 2012, Climate Dynamics, Vol. 39:681-707, doi: 10.1007/s00382-011-1247-2
    • “The role of the intra-daily SST variability in the Indian monsoon variability in a global coupled model” by P. Terray et al., 2012, Climate Dynamics, Vol. 39:729-754, doi: 10.1007/s00382-011-1240-9
    • “Southern Hemisphere extra-tropical forcing: A new paradigm for El Nino-Southern Oscillation” by P. Terray, 2011, Climate Dynamics, Vol. 36:2171-2199, doi: 10.1007/s00382-010-0825-z

Outputs

comp_spectrum_1d creates an output NetCDF file that contains the power spectrum of the time series associated with the input NetCDF variable. The output NetCDF dataset contains the following NetCDF variables (in the description below, nfreq is the number of frequencies where the PSD estimates are computed as determined by the -t=, -p= and -sl= arguments and nperiod is 1 if the -p= argument is not used or equal to ntime / period_length if it is used, where ntime = time2 - time1 + 1 ) :

  1. frequency(nfreq) : the frequencies at which the PSD estimates are computed.

  2. periodicity(nfreq) : the corresponding periodicities in number of time observations.

  3. netcdf_variable_spectrum(nperiod,nfreq) : the corresponding PSD estimates for each time interval of the time series associated with the input NetCDF variable.

  4. edof(nfreq) : the equivalent number of degrees of freedom of the PSD estimates of the time series associated with the input NetCDF variable.

    This variable is stored only if the -conflim argument has been specified when calling comp_spectrum_1d.

  5. lower_factor(nfreq) : the lower confidence limit factors of the PSD estimates. Multiply the PSD estimates by the corresponding lower confidence limit factors to get the lower limits of the ( 1 - critical_probability ) * 100 % confidence intervals for the PSD estimates.

    This variable is stored only if the -conflim argument has been specified when calling comp_spectrum_1d.

  6. upper_factor(nfreq) : the upper confidence limit factors of the PSD estimates. Multiply the PSD estimates by the corresponding upper confidence limit factors to get the upper limits of the ( 1 - critical_probability ) * 100 % confidence intervals for the PSD estimates.

    This variable is stored only if the -conflim argument has been specified when calling comp_spectrum_1d.

Examples

  1. For computing the power spectrum of a real monthly time series from a unidimemsional NetCDF variable called sst extracted from the file sst.monthly.nino34.nc and store the results in the NetCDF file spectrum_sst_nino34.nc, use the following command :

    $ comp_spectrum_1d \
      -f=sst.monthly.nino34.nc \
      -v=sst \
      -sl=128 \
      -nonormpsd \
      -nooverlap \
      -o=spectrum_sst_nino34.nc
    
Flag Counter