comp_spectrum_1d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

02/01/2018

Purpose

Compute Fast Fourier Transform (FFT) estimates of the power spectrum of a real time series extracted from a uni or bidimensional variable readed from a NetCDF dataset.

At the user option, the time series can be detrended before computing the power spectrum (see the -tr= and -tr2= arguments).

The user can select different data windows for the computations of the Power Spectral Density (PSD) estimates (see the -win= argument).

In order to obtain stable PSD estimates, the real time series can be segmented and PSD estimates on, eventually overlapping, segments can be averaged at the user option. The stability of the PSD estimates is increased by this averaging process. That is, the greater the number of segments, the more stable, the resulting PSD estimates.

Optionally, theses PSD estimates may also been smoothed in the frequency domain by Daniell weights (e.g. a simple moving average; see the -nsm= argument).

The PSD estimates are returned in units, which are the square of the data if the -nonormpsd argument is used (meaning that the sum of the PSD estimates is equal to the variance of the time series) or in PSD units otherwise (meaning that the total area under the power spectrum is equal to the variance of the time series).

Finally, again at the user option, statistical confidence interval and equivalent number of degrees of freedom of the power spectrum estimates can be estimated.

All these power spectrum statistics are stored in an output NetCDF dataset.

For more information on spectral analysis, see the references below [Bloomfield] [Diggle].

Further Details

Usage

$ comp_spectrum_1d \
       -f=input_netcdf_file \
       -v=netcdf_variable \
       -t=time1,time2                    (optional) \
       -o=output_netcdf_file             (optional) \
       -ni=index_for_2d_netcdf_variable  (optional) \
       -p=period_length                  (optional) \
       -sl=segment_length                (optional : 16,32,64,128,...) \
       -tr=trend_removal                 (optional : 0,1,2,3) \
       -tr2=trend_removal_on_segment     (optional : 0,1,2,3) \
       -win=window_choice                (optional : 1,2,3,4,5,6) \
       -tap=tapered_percentage           (optional : 0.0 > 1.) \
       -nsm=Daniell_filter_length        (optional) \
       -pro=critical_probability         (optional : 0.0 > 1.) \
       -mi=missing_value                 (optional) \
       -nonormpsd                        (optional) \
       -nooverlap                        (optional) \
       -conflim                          (optional) \
       -double                           (optional) \
       -hdf5                             (optional) \
       -tlimited                         (optional)

By default

-t=
the whole time period associated with the netcdf_variable
-o=
the output_netcdf_file is named spectrum_netcdf_variable.nc
-ni=
if the netcdf_variable is bidimensional, the first time series is used
-p=
the period_length is set to time2 - time1 + 1, which means that the time series is considered as continuous with only one time interval
-sl=
the segment_length is set to 0, which means that the time series (or each time interval if the -p= argument is used) is not segmented to stabilize the the PSD estimates
-tr=
the trend_removal is set to 1, which means that the mean of the whole time series is removed before the spectral computations
-tr2=
the trend_removal_on_segment is set to 0, which means that no trend processing is done on each time segment before the spectral computations
-win=
the window_choice is set to 1, which means that the Bartlett window is used in the spectral computations
-tap=
the tapered_percentage is set to 0.2, which means that if a split-cosine-bell window is used for the spectral computations (e.g. if the window_choice is set to 6 with the -win= argument), 20% of the data are tapered
-nsm=
the Daniell_filter_length is set to 0, which means that the PSD estimates are not smoothed in the frequency domain by a moving average
-pro=
the critical_probability is set to 0.05, which means that if a confidence interval for the spectrum is requested with the -conflim argument, the lower and upper 95% confidence limit factors are computed
-mi=
the missing_value for the output NetCDF variables is set to 1.e+20
-nonormpsd
the PSD estimates are normalized in such a way that the total area under the power spectrum is equal to the variance of the time series. If -nonormpsd is specified, the PSD estimates are returned in units which are the square of the data
-nooverlap
Normally, if the -sl= argument is used, the time segments are overlapped. However, if -nooverlap is specified, the time segments are not overlapped
-conflim
Normally, the equivalent number of degrees of freedom and the confidence limit factors of the power spectrum estimates are not computed. However, if -conflim is specified, these statistics are computed and stored in the output NetCDF dataset
-double
the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable (e.g. the time series) for which a power spectrum must be computed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.

  2. If the -t=time1,time2 argument is missing the whole time period associated with the netcdf_variable is used to estimate the power spectrum.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.

    time2 - time1 + 1 must be an even integer if the -sl= argument is not used and of any length if this argument is used.

  3. The -ni=index_for_2d_netcdf_variable argument specifies the index for selecting the time series if the netcdf_variable is a 2D NetCDF variable. By default, the first time series is used, which is equivalent to set index_for_2d_netcdf_variable to 1.

  4. If the -p= argument is specified, the power spectrum estimates are computed separately for each time period of length period_length (as determined by the value of the -p= argument).

    The whole selected time period (e.g. time2 - time1 + 1 ) must be a multiple of the period_length. The period_length must also be of even length if the -sl= argument is not used.

  5. The -sl=segment_length argument specifies an integer used to segment the time series. The segment_length must be a positive even integer less or equal to the length of the time series (e.g. time2 - time1 + 1 ) or the period_length if the -p= argument is used. Spectral computations are at ( segment_length/2 ) + 1 Fourier frequencies and suggested values for segment_length are 16, 32, 64, 128, … (e.g. the spectral estimates are taken at frequencies (i-1) / segment_length for i=1,2, … , (segment_length/2) + 1. By default, segment_length is set to 0, e.g. the time series is not segmented and the segment length is equal to time2 - time1 + 1 or period_length if the -p= argument is used.

  6. The -tr= argument specifies pre-filtering processing of the real time series before estimating the power spectrum of the time series. If:

    • -tr=+1, the mean of the time series is removed before computing the power spectrum
    • -tr=+2, the drift from the time series is removed before computing the power spectrum. The drift for the time series is estimated using the formula : drift = ( tseries(ntime) - tseries(1) )/( ntime - 1 )
    • -tr=+3, the least-squares line from the time series is removed before computing the power spectrum.

    For other values of the -tr= argument, nothing is done before estimating the spectrum.

    If the -p= argument is present, the pre-filtering processing is applied to each time interval, separately.

    The -tr= argument must be an integer and the default value for the -tr= argument is 1, e.g. the mean of the time series is removed before the spectrum computations.

  7. The -tr2= argument specifies pre-filtering processing of the real time series before estimating the power spectrum on each time segment. If:

    • -tr2=+1, the mean of the time segment is removed before computing the power spectrum on the segment
    • -tr2=+2, the drift from the time segment is removed before computing the power spectrum on the segment
    • -tr2=+3, the least-squares line from the time segment is removed before computing the power spectrum on the segment.

    For other values of the -tr2= argument, nothing is done before estimating the spectrum on each time segment.

    The -tr2= argument must be an integer and the default value for the -tr2= argument is 0, e.g. nothing is done before the spectral computations on each time segment.

  8. The -win=window_choice argument specifies the form of the data window used in the computations of the power spectrum. If:

    • -win=1 The Bartlett window is used
    • -win=2 The square window is used
    • -win=3 The Welch window is used
    • -win=4 The Hann window is used
    • -win=5 The Hamming window is used
    • -win=6 A split-cosine-bell window is used.

    The window_choice must be an integer between 1 and 6. The default value for the window_choice is 1, e.g. the Bartlett window is used for the spectral computations.

  9. The -tap=tapered_percentage argument specifies the total percentage of the data to be tapered if the window_choice is set to 6 with the -win= argument. The tapered_percentage must be greater than 0 and less or equal to 1. The default value for the window_choice is 0.2 (e.g. 20% of the data are tapered, 10% at the start of the series and 10% at the end).

  10. The -nsm=Daniell_filter_length argument specified that the PSD estimates must be computed by smoothing the periodogram with Daniell weights (e.g. a simple moving average). On entry, the -nsm= argument gives the length of the Daniell filter to be applied. The Daniell_filter_length must be odd, greater than 2 and less or equal to ( segment_length/2 ) + 1. By default, the PSD estimates are not smoothed in the frequency domain.

  11. If the -nooverlap argument is specified and the -sl= argument is also specified, comp_spectrum_1d segments the time series without any overlapping (in each period if the -p= argument is used). By default, comp_spectrum_1d overlaps the segments by one half of their length (which is equal to the value of the -sl= argument).

    In both cases, each segment will be FFT’d, and the resulting periodograms will be averaged together to obtain PSD estimates at the ( segment_length/2 ) + 1 frequencies.

    If the -sl= argument is not specified, the -nooverlap argument has no effect.

  12. If the -nonormpsd argument is specified, the sum of the PSD estimates is equal to the variance of the time series. By default, the PSD estimates are normalized in such a way that the total area under the power spectrum is equal to the variance of the time series.

  13. If the -conflim argument is specified, the equivalent number of degrees of freedom and the lower and upper ( 1 - critical_probability ) * 100 % confidence limit factors of the power spectrum estimates are computed. Multiply the PSD estimates by the lower and upper confidence limit factors to get the lower and upper limits of a ( 1 - critical_probability ) * 100 % confidence interval for the PSD estimates.

    By default, these statistics are not computed.

  14. The -pro=critical_probability argument specifies the critical probability which is used to determine the lower and upper confidence limit factors. critical_probability must be greater than 0 and less than 1. The default value is 0.05.

  15. The -mi=missing_value argument specifies the missing value indicator for the output NetCDF variables in the output_netcdf_file.

    If the -mi= argument is not specified, the missing_value is set to 1.e+20.

  16. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file. By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  17. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  18. It is assumed that the real time series has no missing values.

  19. Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  20. For more details on power spectrum analysis and examples of use in the climate literature, see

Outputs

comp_spectrum_1d creates an output NetCDF file that contains the power spectrum of the time series associated with the input NetCDF variable. The output NetCDF dataset contains the following NetCDF variables (in the description below, nfreq is the number of frequencies where the PSD estimates are computed as determined by the -t=, -p= and -sl= arguments and nperiod is 1 if the -p= argument is not used or equal to ntime / period_length if it is used, where ntime = time2 - time1 + 1 ) :

  1. frequency(nfreq) : the frequencies at which the PSD estimates are computed.

  2. periodicity(nfreq) : the corresponding periods in number of time observations.

  3. netcdf_variable_spectrum(nfreq,nperiod) : the corresponding PSD estimates for each time interval of the time series associated with the input NetCDF variable.

  4. edof(nfreq) : the equivalent number of degrees of freedom of the PSD estimates the time series associated with the input NetCDF variable.

    This variable is stored only if the -conflim argument has been specified when calling comp_spectrum_1d.

  5. lower_factor(nfreq) : the lower confidence limit factors of the PSD estimates. Multiply the PSD estimates by the corresponding lower confidence limit factors to get the lower limits of the ( 1 - critical_probability ) * 100 % confidence intervals for the PSD estimates.

    This variable is stored only if the -conflim argument has been specified when calling comp_spectrum_1d.

  6. upper_factor(nfreq) : the upper confidence limit factors of the PSD estimates. Multiply the PSD estimates by the corresponding upper confidence limit factors to get the upper limits of the ( 1 - critical_probability ) * 100 % confidence intervals for the PSD estimates.

    This variable is stored only if the -conflim argument has been specified when calling comp_spectrum_1d.

Examples

  1. For computing the power spectrum of a real monthly time series from a NetCDF variable called sst extracted from the file sst.monthly.nino34.nc, which includes a monthly time series, and store the results in the NetCDF file spectrum_sst_nino34.nc, use the following command :

    $ comp_spectrum_1d \
      -f=sst.monthly.nino34.nc \
      -v=sst \
      -sl=128 \
      -nonormpsd \
      -nooverlap \
      -o=spectrum_sst_nino34.nc