comp_spectrum_diff_3d¶

Authors¶

Pascal Terray (LOCEAN/IPSL)

Latest revision¶

30/05/2024

Purpose¶

Test if two multi-channel power spectra, computed from two real multi-channel time series with the help of comp_cross_spectrum_3d or comp_cross_spectrum2_3d, are significantly different or have a different shape using various nonparametric statistical tests based on the average of the squares of the (Log) spectral ratios of the two estimated multi-channel power spectra (e.g., a CHI2 statistic), the minimum and maximum values of these spectral ratios or the differences between the two (normalized) cumulative periodograms derived from the two estimated multi-channel power spectra [Diggle] [Priestley] [Jenkins_Watts] [Diggle_Fisher] [Coates_Diggle] [Potscher_Reschenhofera] [Potscher_Reschenhoferb] .

The critical probabilities associated with these different spectral statistics are estimated or approximated under the assumptions (e.g., the null hpothesis) that the two “true” underlying multi-channel power spectra are the same or have the same shape depending on the selected test statistic.

See below for more details on the different test statistics available in comp_spectrum_diff_3d and their associated null hypothesis, e.g., same underlying “true” power spectrum or same shape of the two “true” power spectra.

The two input multi-channel power spectra are read from two tridimensional variables (as specified by the -nv= and -dv= arguments) readed from one or two NetCDF datasets (as specified by the -nf= and -df= arguments).

It is assumed that the two estimated multi-channel power spectra have been computed by the Welch’s method [Welch] by calls to comp_cross_spectrum_3d or comp_cross_spectrum2_3d with the -conflim argument. This optional argument is required for computing the DOFs of the Power Spectral Density (PSD) estimates, which are used by comp_spectrum_diff_3d, excepted when -a=ks_perm or chi2_perm is specified in the call to comp_spectrum_diff_3d. Note also that the methods used to estimate the two multi-channel power spectra must be identical and that the two multi-channel power spectra must not have been smoothed in the frequency domain.

The multi-channel test statistics and their associated critical probabilities are stored in an output NetCDF dataset.

If you want to get a pointwise tolerance interval of the given (Log) spectral ratios for two estimated multi-channel power spectra, use comp_spectrum_ratio_3d instead of comp_spectrum_diff_3d.

For additional details on these spectral computations, see [Welch] [Diggle] [Priestley] [Jenkins_Watts] [Coates_Diggle] [Diggle_Fisher] [Potscher_Reschenhoferb] .

Further Details¶

Usage¶

$ comp_spectrum_diff_3d \
       -nf=numerator_netcdf_file \
       -nv=numerator_netcdf_variable \
       -dv=denominator_netcdf_variable \
       -df=denominator_netcdf_file       (optional) \
       -m=input_mesh_mask_netcdf_file    (optional) \
       -g=grid_type                      (optional : n, t, u, v, w, f) \
       -x=lon1,lon2                      (optional) \
       -y=lat1,lat2                      (optional) \
       -np=numerator_period              (optional) \
       -dp=denominator_period            (optional) \
       -t=freq1,freq2                    (optional) \
       -a=type_of_statistic              (optional : minmax, range, chi2, chi2_perm, ks_perm) \
       -nb=number_of_shuffles            (optional) \
       -o=output_netcdf_file             (optional) \
       -mi=missing_value                 (optional) \
       -ksnonorm                         (optional) \
       -double                           (optional) \
       -bigfile                          (optional) \
       -hdf5                             (optional) \
       -tlimited                         (optional)

By default¶

-df=

the denominator_netcdf_file is the same as the numerator_netcdf_file

-m=

an input_mesh_mask_netcdf_file is not used

-g=

the grid_type is set to n, which means that the 2-D grid-mesh associated with the input numerator_netcdf_variable and denominator_netcdf_variable is assumed to be regular or Gaussian

-x=

the whole longitude domain associated with the numerator_netcdf_variable and denominator_netcdf_variable

-y=

the whole latitude domain associated with the numerator_netcdf_variable and denominator_netcdf_variable

-np=

the numerator_period argument is not used

-dp=

the denominator_period argument is not used

-t=

the whole set of frequencies included in the two input NetCDF files, but excluding the zero and Nyquist frequencies

-a=

the type_of_statistic is set to chi2, e.g., a CHI2 statistic is used as the spectral test statistic

-nb=

the number_of_shuffles is set to 999, meaning that 999 random interchanges are performed if a randomization test is requested (e.g., if -a=chi2_perm or -a=ks_perm). For other values of the -a= argument, the -nb= argument has no effect

-o=

output_netcdf_file name is set to spectrum_diff_numerator_netcdf_variable.denominator_netcdf_variable.nc where numerator_netcdf_variable and denominator_netcdf_variable are specified with the -nv= and -dv= arguments.

-mi=

the missing_value for the output NetCDF variables is set to 1.e+20

-ksnonorm

the Kolmogorov-Smirnov statistic is computed from the normalized cumulative periodograms if -a=ks_perm is specified. However, if -ksnonorm is also specified, the cumulative periodograms are not normalized before computing the Kolmogorov-Smirnov statistic. For other values of the -a= argument, the -ksnonorm argument has no effect

-double

the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers

-bigfile

a NetCDF classical format file is created. However, if -bigfile is activated, the output NetCDF file is a 64-bit offset format file

-hdf5

a NetCDF classical format file is created . However, if -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file

-tlimited

the time dimension (e.g., the frequency axis) is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the frequency dimension is defined as limited in the output NetCDF file

Remarks¶

The -nf=numerator_netcdf_file argument specifies the NetCDF file containing the multi-channel PSD estimates and their associated degrees of freedom for the numerator of the spectral ratio or for the first real multi-channel time series if a Kolmogorov-Smirnov statistic is used.

It is assumed that this NetCDF file has been created by NCSTAT procedures performing spectral computations like comp_cross_spectrum_3d or comp_cross_spectrum2_3d with the -conflim argument activated (e.g., this NetCDF file must contain two NetCDF variables named edof and freq).

The -df=denominator_netcdf_file argument specifies the NetCDF file containing the multi-channel PSD estimates and their associated degrees of freedom for the denominator of the spectral ratio or for the second real multi-channel time series if a Kolmogorov-Smirnov statistic is used.

If this argument is not specified, it is assumed that the denominator_netcdf_file is the same as the numerator_netcdf_file.

It is also assumed that this NetCDF file has been created by NCSTAT procedures performing spectral computations like comp_cross_spectrum_3d or comp_cross_spectrum2_3d with the -conflim argument activated (e.g., this NetCDF file must contain two NetCDF variables named edof and freq).

The -nv=numerator_netcdf_variable and -dv=denominator_netcdf_variable arguments specify the NetCDF variables containing the multi-channel PSD estimates for the numerator and denominator of the spectral ratio, respectively (or the first and second PSD estimates when a Komogorov-Smirnov statistic is used).

The geographical shapes of the numerator_netcdf_variable and denominator_netcdf_variable must agree.

In addition, the multi-channel PSD estimates stored in numerator_netcdf_variable and denominator_netcdf_variable must have been normalized in the same way and not been smoothed in the frequency domain (for details, see the arguments -nsm= and -nonormpsd in the NCSTAT procedures comp_cross_spectrum_3d or comp_cross_spectrum2_3d in order to verify the conditions of use of the statistical tests available in comp_spectrum_diff_3d.

Finally, if -a=chi2_perm or -a=ks_perm, the two estimated multi-channel power spectra must have been obtained by averaging exactly the same number of (modified) periodograms.

The optional argument -m=input_mesh_mask_netcdf_file specifies the land-sea mask to apply to the numerator_netcdf_variable and denominator_netcdf_variable for transforming these tridimensional NetCDF variables as rectangular matrices before computing the spectral ratio of the multi-channel PSD estimates or the differences between their cumulative periodograms.

By default, it is assumed that each cell in the 2-D grid-mesh associated with the input tridimensional NetCDF variables is a valid power spectrum (e.g., missing values are not present in the PSD estimates).

The geographical shapes of the numerator_netcdf_variable (in the numerator_netcdf_file), the denominator_netcdf_variable (in the denominator_netcdf_file) and the mask (in the input_mesh_mask_netcdf_file) must agree if an input_mesh_mask_netcdf_file is used.

Refer to comp_clim_3d or comp_mask_3d for creating a valid input_mesh_mask_netcdf_file NetCDF file for regular or gaussian grids before using comp_spectrum_diff_3d.

If -g= is set to t, u, v, w or f it is assumed that the input NetCDF variables are from an experiment with the NEMO model (ORCA configuration and R2, R4 or R05 resolutions). This argument is also used to determine the name of the mesh_mask variable if an input_mesh_mask_netcdf_file is used.

If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing the whole geographical domain associated with the the numerator_netcdf_variable and denominator_netcdf_variable is used.

The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

Refer to comp_mask_3d for transforming geographical coordinates as indices or generating an appropriate mesh-mask before using comp_spectrum_diff_3d.

If the -np=numerator_period argument is specified, the multi-channel PSD estimates in the numerator_netcdf_variable have been computed for different periods separately (as determined by the value of the -p=period_length argument in comp_cross_spectrum_3d). In that case, the -np= argument selects the period in the numerator_netcdf_variable which must be used for computing the spectral ratio or the difference between the cumulative periodograms of the two multi-channel PSD estimates.

If the -dp=denominator_period argument is specified, the multi-channel PSD estimates in the denominator_netcdf_variable have been computed for different periods separately (as determined by the value of the -p=period_length argument in comp_cross_spectrum_3d. In that case, the -dp= argument selects the period in the denominator_netcdf_variable which must be used for computing the spectral ratio or the difference between the cumulative periodograms of the two multi-channel PSD estimates.

If the -t=freq1,freq2 argument is missing, the spectral ratios or differences between the cumulative periodograms of the two multi-channel PSD estimates are computed for the whole set of frequencies stored in the numerator_netcdf_variable and denominator_netcdf_variable (e.g., the multi-channel power spectra must have been estimated for the same set of frequencies).

The selected frequency interval is a vector of two integers specifying the first and last frequencies for which the spectral ratios or differences between the cumulative periodograms must be computed. The indices are relative to 1.

However, in all cases, the zero and Nyquist frequencies are excluded from the computations of the statistics.

The -a= argument selects the type of test statistic and/or the method for computing critical probabilities associated with the test statistic:

If -a=minmax, the minimum and maximum values of the spectral ratios of the two estimated multi-channel spectra are used as a test statistic. The critical probabilities are derived from the associated probabilities of obtaining, respectively, a value less (for the minimum ratio) and higher (for the maximum ratio) than attained under the null hypothesis of a common underlying population spectrum for the two multi-channel time series. The computed critical probabilities are only approximate and are conservative. The null hypothesis is that the two “true” underlying multi-channel spectra are the same. For more details, see Chapter 4 of [Diggle] .

If -a=range, the range of the Log spectral ratios of the two estimated multi-channel spectra is used as a test statistic. The null hypothesis is that of a common shape of the two multi-channel power spectra of the two multi-channel time series. For more details and theory, see [Coates_Diggle] [Potscher_Reschenhofera] [Potscher_Reschenhoferb] .

If -a=chi2, a standard CHI2 statistic computed from the Log spectral ratios of the two multi-channel power spectra is used as a test statistic. The null hypothesis is that of a common spectrum of the two multi-channel series and asymptotic theory is used to derive the associated critical probabilities. For more details and theory, see [Priestley] [Jenkins_Watts] .

If -a=chi2_perm, a CHI2 statistic computed from the Log spectral ratios is also used, but the distribution of this statistic under the null hypothesis of a common power spectrum for the two multi-channel series is approximated by recalculating the CHI2 statistic for some large number (e.g., see the desciption of the -nb= argument below) of random interchanges of periodogram or PSD ordinates at each frequency for the two estimated multi-channel power spectra. This statistical randomization test relies on the assumptions that the different spectral ordinates have the same sampling distribution and are independent of each other [Priestley] . This means, in particular, that the PSD ordinates corresponding to the zero and Nyquist frequencies must be excluded and that the two estimated multi-channel power spectra have not been obtained by smoothing the periodograms in the frequency domain. For more details and theory, see [Priestley] [Diggle_Fisher] .

If -a=ks_perm, a Kolmogorov-Smirnov statistic is used, but the distribution of this statistic under the null hypothesis of a common shape for the two “true” power spectra of the two multi-channel series is approximated by calculating the Kolmogorov-Smirnov statistic for some large number (e.g., as determined by the -nb= argument) of random interchanges of periodogram or PSD ordinates at each frequency for the two estimated multi-channel spectra. For more details and theory, see [Priestley] [Diggle_Fisher] .

If -a=chi2_perm or ks_perm, the number of frequencies used (e.g., as determined by the -t= argument)

must be greater or equal to 10.

The default is -a=chi2.

The -nb=number_of_shuffles argument specifies the number of random interchanges of periodogram or PSD ordinates at each frequency in order to approximate the randomization distribution of the Kolmogorov-Smirnov or CHI2 statistics under the null hypothesis if -a=chi2_perm or ks_perm.

The default value is 999.

The -ksnonorm argument specifies that the Kolmogorov-Smirnov statistics must be computed from non-normalized cumulative periodograms if -a=ks_perm. In that case the null hypothesis which is tested is that the “true” underlying population power spectra of the two multi-channel time series are the same instead of having the same shape.

By default, the Kolmogorov-Smirnov statistics are calculated from normalized cumulative periodograms and the null hypothesis which is tested is that of a common shape for the two “true” underlying multi-channel spectra (see the description of the -a= argument for more details).

The -mi=missing_value argument specifies the missing value indicator for the output NetCDF variables in the output_netcdf_file.

If the -mi= argument is not specified, the missing_value in the output NetCDF variables is set to 1.e+20.

The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file.

By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g., -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher.

If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g., -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher.

If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

For more details on power spectrum analysis, theory for the statistical tests used in comp_spectrum_diff_3d and examples of use in the climate literature, see

“The use of Fast Fourier Transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms”, by P.D. Welch, IEEE trans. on audio and electroacoustics, 15:2, 70-73, 1967. DOI: https://doi.org/10.1109/TAU.1967.1161901

“Time series: a biostatistical introduction”, by P.J. Diggle, Clarendon Press, Oxford, Chap. 4, 1990. http://www.oupcanada.com/catalog/9780198522263.html

“Tests for comparing two estimated spectral densities”, by D.S. Coates and P.J. Diggle, Journal of Time series Analysis, 7:1, 7-20, 1986. DOI: https://doi.org/10.1111/j.1467-9892.1986.tb00482.x

“Nonparametric comparison of cumulative periodograms”, by P.J. Diggle and N.I. Fisher, Applied Statistics, 40:3, 423-434, 1991. DOI: https://doi.org/10.2307/2347522

“Spectral Analysis and its Applications”, by G.M.Jenkins and D.G., San Francisco: Holden-Day. ISBN-10: 0816244642, 1968.

“Spectral Analysis and Time Series”, by M.B. Priestley, London: Academic Press, 1981. ISBN-10: 0125649223

“Discriminating between two spectral densities in case of replicated observations”, by B.M. Potscher and E. Reschenhofer, Journal of Time series Analysis, 9:3, 221-224, 1988. DOI: https://doi.org/10.1111/j.1467-9892.1988.tb00466.x

“Distribution of the Coates-Diggle test statistic in case of replicated observations”, by B.M. Potscher and E. Reschenhofer, Statistics, 20:3, 417-421, 1989. DOI: https://doi.org/10.1080/02331888908802190

“Impact of intra-daily SST variability on ENSO characteristics in a coupled model” by S. Masson et al., 2012, Climate Dynamics, Vol. 39:681-707, doi: 10.1007/s00382-011-1247-2

“The role of the intra-daily SST variability in the Indian monsoon variability in a global coupled model” by P. Terray et al., 2012, Climate Dynamics, Vol. 39:729-754, doi: 10.1007/s00382-011-1240-9

“Southern Hemisphere extra-tropical forcing: A new paradigm for El Nino-Southern Oscillation” by P. Terray, 2011, Climate Dynamics, Vol. 36:2171-2199, doi: 10.1007/s00382-010-0825-z

Outputs¶

comp_spectrum_diff_3d creates an output NetCDF file that contains the selected test statistics for the two input multi-channel power spectra and the associated critical probabilities under the assumption that the two “true” underlying multi-channel power spectra are the same or have the same shape depending on the selected test statistic (see the description of the -a= argument above for more details on the available test statistics and the null hypothesis used to derive the critical probabilities for each type_of_statistic.

The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat and nlon are the lengths of the spatial dimensions of the two input NetCDF variables numerator_netcdf_variable and denominator_netcdf_variable, and nfreq is the number of frequencies at which the two input multi-channel power spectra have been estimated) :

frequency(nfreq) : the frequencies at which the input PSD estimates have been computed.

periodicity(nfreq) : the corresponding periods in number of time observations.

numerator_netcdf_variable_denominator_netcdf_variable_stat(nlat,nlon) : the multi-channel test statistics computed from the two input multi-channel power spectra stored in the two input NetCDF variables numerator_netcdf_variable and denominator_netcdf_variable. The type of the computed statistic, as determined by the -a= argument, is stored as a a global attribute in the output NetCDF dataset.

numerator_netcdf_variable_denominator_netcdf_variable_prob(nlat,nlon) : the multi-channel critical probabilities associated with the computed multi-channel test statistics. These critical probabilities are computed or estimated (depending on the type of statistic) under the null hypothesis that the two “true” multi-channel power spectra are identical or have the same shape.

If the -a=minmax argument is used, these critical probabilities are conservative and are simply computed as two times the minimum of the critical probabilities associated with the minimum and maximum values of the spectral ratios computed from the two input multi-channel power spectra. The minimum and maximum values of the spectral ratios and their associated critical probabilities are also stored in the output NetCDF dataset.

numerator_netcdf_variable_denominator_netcdf_variable_minratio _stat(nlat,nlon) : the multi-channel minimum values of the spectral ratios computed from the two input multi-channel power spectra stored in the two input NetCDF variables numerator_netcdf_variable and denominator_netcdf_variable.

This variable is stored only if the -a=minmax argument has been specified when calling comp_spectrum_diff_3d.

numerator_netcdf_variable_denominator_netcdf_variable_minratio_prob(nlat,nlon) : the multi-channel critical probabilities associated with the minimum values of the multi-channel spectral ratios. These critical probabilities give the probabilities of obtaining minimum values less than attained under the null hypothesis of a common power spectra for the two (multi-channel) time series.

This variable is stored only if the -a=minmax argument has been specified when calling comp_spectrum_diff_3d.

numerator_netcdf_variable_denominator_netcdf_variable_maxratio _stat(nlat,nlon) : the multi-channel maximum values of the spectral ratios computed from the two input multi-channel power spectra stored in the two input NetCDF variables numerator_netcdf_variable and denominator_netcdf_variable.

This variable is stored only if the -a=minmax argument has been specified when calling comp_spectrum_diff_3d and is used to derive the conservative critical probabilities stored in the NetCDF variable numerator_netcdf_variable_denominator_netcdf_variable_prob .

numerator_netcdf_variable_denominator_netcdf_variable_maxratio_prob(nlat,nlon) : the multi-channel critical probabilities associated with the maximum values of the multi-channel spectral ratios. These critical probabilities give the probabilities of obtaining maximum values higher than attained under the null hypothesis of a common power spectra for the two (multi-channel) time series.

This variable is stored only if the -a=minmax argument has been specified when calling comp_spectrum_diff_3d and is used to derive the conservative critical probabilities stored in the NetCDF variable numerator_netcdf_variable_denominator_netcdf_variable_prob .

All the multi-channel spectral statistics and their associated critical probabilities involving the two input NetCDF variables numerator_netcdf_variable and denominator_netcdf_variable are packed in tridimensional variables whose first and second dimensions are exactly the same as those associated with these input NetCDF variables even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, the output NetCDF variables are filled with missing values.

Examples¶

For testing that the two multi-channel power spectra stored in a NetCDF variable called sst extracted, respectively, from the files spectrum_sst.monthly.1979_2005.nc and spectrum_sst.monthly.1950_1978.nc are the same (more precisely that the two “true” underlying power spectra are identical) using a standard CHI2 statistic, use the following command :
$ comp_spectrum_diff_3d  -fn=spectrum_sst.monthly.1979_2005.nc \
                         -fd=spectrum_sst.monthly.1950_1978.nc \
                         -nv=sst \
                         -dv=sst \
                         -o=spectrum_diff_sst_1979_2005.1950_1978.nc

Table of Contents

Previous topic

Next topic

This Page