comp_spectvar_4d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

28/05/2024

Purpose

Compute variance estimates in a selected frequency band for a real multi-channel time series by windowed filtering in the frequency domain. These variance estimates correspond exactly to the variance of the filtered multi-channel time series output by comp_fftfilter_4d when similar parameters are used in both procedures. Alternatively, comp_spectvar_4d can also compute spectral variance estimates in a given frequency band from a spectrum analysis of a real multi-channel time series as output by comp_spectrum_1d or comp_cross_spectrum_3d. The input multi-channel time series is extracted from a fourdimensional variable readed from a NetCDF dataset.

At the user option, the multi-channel time series can be detrended before computing the power spectra and the variance estimates in the selected frequency band (see the -tr= and -tr2= arguments). The variance estimates and power spectra are computed with the help of the Fast Fourier Transform (FFT) of the real multi-channel time series and windowed filtering in the frequency domain.

The user can select different data windows to apply to the time series for the computations of the Power Spectral Density (PSD) estimates (see the -win2= argument) and also different frequency windows to convolve with the selected filter response in the frequency domain (see the -win= argument) in order to get more accurate variance estimates.

Furthermore, in order to obtain stable PSD and variance estimates, the real time series can be segmented and PSD estimates computed on, eventually overlapping, segments can be averaged at the user option (see the -sl= and -nooverlap= arguments). The stability of the PSD and final variance estimates is increased by this averaging process. That is, the greater the number of segments, the more stable, the resulting PSD and final variance estimates. See the documentation of comp_spectrum_1d or comp_cross_spectrum_3d for more details on this method known as the Welch’s method [Welch] .

Optionally, theses PSD estimates may also been smoothed in the frequency domain by Daniell weights (e.g., a simple moving average; see the -nsm= argument) as in a conventional spectrum analysis.

By default, the variance estimates are returned in units, which are the square of the data, and, at the user option, as percentage of the total variance in the selected frequency band if the -normvar argument is specified in the call to the procedure.

Finally, the variances estimates can be computed separately for different times intervals (see the -p= argument).

The variance estimates are stored in an output NetCDF dataset.

For more information on spectral analysis and filtering in the frequency domain, see the references below [Welch] [Bloomfield] [Diggle] [Iacobucci_Noullez] .

Further Details

Usage

$ comp_spectvar_4d \
       -f=input_netcdf_file \
       -v=netcdf_variable \
       -m=input_mesh_mask_netcdf_file           (optional) \
       -g=grid_type                             (optional : n, t, u, v, w, f) \
       -x=lon1,lon2                             (optional) \
       -y=lat1,lat2                             (optional) \
       -z=level1,level2                         (optional) \
       -t=time1,time2                           (optional) \
       -o=output_netcdf_file                    (optional) \
       -p=period_length                         (optional) \
       -sl=segment_length                       (optional : 16,32,64,128,...) \
       -pl=minimum_period                       (optional) \
       -ph=maximum_period                       (optional) \
       -tr=trend_removal                        (optional : 0,1,2,3) \
       -tr2=trend_removal_on_segment            (optional : 0,1,2,3) \
       -win=window_parameter                    (optional : 0.5 > 1.) \
       -win2=window_choice                      (optional : 1,2,3,4,5,6) \
       -tap=tapered_percentage                  (optional : 0.0 > 1.) \
       -nsm=Daniell_filter_length               (optional) \
       -ngp=number_of_grid_points               (optional) \
       -mi=missing_value                        (optional) \
       -nooverlap                               (optional) \
       -normvar                                 (optional) \
       -double                                  (optional) \
       -bigfile                                 (optional) \
       -hdf5                                    (optional) \
       -tlimited                                (optional)

By default

-m=
an input_mesh_mask_netcdf_file is not used
-g=
the grid_type is set to n which means that the 2-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian
-x=
the whole longitude domain associated with the netcdf_variable
-y=
the whole latitude domain associated with the netcdf_variable
-z=
the whole vertical resolution associated with the netcdf_variable
-t=
the whole time period associated with the netcdf_variable
-o=
the output_netcdf_file is named spectvar_netcdf_variable.nc
-p=
the period_length is set to time2 - time1 + 1 , which means that the time series is considered as continuous with only one time period
-sl=
the segment_length is set to 0, which means that the time series (or each time period if the -p= argument is used) is not segmented to stabilize the the PSD and final variance estimates
-pl=
the minimum_period is set to 0, which means that no filtering is done for the shorter periods
-ph=
the maximum_period is set to 0, which means that no filtering is done for the longer periods
-tr=
the trend_removal is set to 1, which means that the mean of the whole time series is removed before the spectral computations
-tr2=
the trend_removal_on_segment is set to 0, which means that no trend processing is done on each time segment before the spectral computations
-win=
the window_parameter is set to 1., which means that the rectangular window is convolved with the ideal filter response in the frequency domain in order to compute the final variance estimates
-win2=
the window_choice is set to 2, which means that the square window is applied to the time series and used in the spectral computations
-tap=
the tapered_percentage is set to 0.2, which means that if a split-cosine-bell window is used for the spectral computations (e.g., if the window_choice is set to 6 with the -win2= argument), 20% of the data are tapered
-nsm=
the Daniell_filter_length is set to 0, which means that the PSD estimates are not smoothed in the frequency domain by a moving average before computing the variance estimates in the selected frequency band
-ngp=
the number_of_grid_points is set to the number of points in the selected domain
-mi=
the missing_value for the output NetCDF variable is set to 1.e+20
-nooverlap
normally, if the -sl= argument is used, the time segments are overlapped. However, if -nooverlap is specified, the time segments are not overlapped
-normvar
normally, the raw variance in the selected frequency band is output. However, if -normvar is specified, the percentage of total variance in the selected frequency band for the real multi-channel time series is also computed and stored in the output NetCDF file.
-double
the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-bigfile
a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable for which a frequency variance analysis must be performed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.

  2. The geographical shapes of the netcdf_variable (in the input_netcdf_file) and the NetCDF mesh_mask variable (in the input_mesh_mask_netcdf_file) must agree if the -m= argument is used.

  3. If -g= is set to t, u, v, w or f it is assumed that the NetCDF variable is from an experiment with the NEMO model (ORCA configuration and R2, R4 or R05 resolutions).

    If -g= is set to n, it is assumed that the 3-D grid-mesh is regular or Gaussian.

    This argument is also used to determine the name of the NetCDF mesh_mask variable if an input_mesh_mask_netcdf_file is used as specified with the -m= argument

  4. If the -x=lon1,lon2, -y=lat1,lat2 and -z=level1,level2 arguments are missing, the whole geographical domain and vertical resolution associated with the netcdf_variable is used to select the multi-channel time series.

    The longitude, latitude or level range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

    Refer to comp_mask_4d for transforming geographical coordinates as indices before using comp_spectvar_4d.

  5. If the -t=time1,time2 argument is missing the whole time period associated with the netcdf_variable is used to decompose the time series.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.

    time2 - time1 + 1 must be an even integer if the -sl= and -p= arguments are not used and of any length if one of these arguments is used.

  6. If the -p= argument is specified, the frequency variance estimates are computed separately for each time period of length period_length (as determined by the value of the -p= argument).

    The whole selected time period (e.g., time2 - time1 + 1 ) must be a multiple of the period_length. The period_length must also be of even length if the -sl= argument is not used.

  7. The -sl=segment_length argument specifies an integer used to segment the time series. The segment_length must be a positive even integer less or equal to the length of the time series (e.g., time2 - time1 + 1 ) or the period_length if the -p= argument is used.

    Spectral computations are at ( segment_length/2 ) + 1 Fourier frequencies and suggested values for segment_length are 16, 32, 64, 128, … (e.g., the spectral estimates are taken at frequencies (i-1) / segment_length for i=1,2, … , (segment_length/2) + 1.

    By default, segment_length is set to 0, e.g., the time series is not segmented and the segment length is equal to time2 - time1 + 1 or period_length if the -p= argument is used.

  8. The -pl= argument specifies the minimum period of oscillation of the filtered time series. The minimum_period is expressed in number of time observations.

    Do not use the -pl= argument or use -pl=0 for high-pass filtering frequencies corresponding to periods shorter than -ph=PH .

    The -pl= argument is a positive integer equal to 0 or greater than 2.

  9. The -ph= argument specifies the maximum period of oscillation of the filtered time series. The maximum_period is expressed in number of time observations. Do not use the -ph= argument or use -ph=0 for low-pass filtering frequencies corresponding to periods longer than -pl=PL . For example, -pl=6 (or 18) and -ph=32 (or 96) select periods between 1.5 and 8 years for quarterly (monthly) time series.

    The -ph= argument is a positive integer equal to 0 or greater than 2 and less than the length of the time series or the period_length if the -p= argument is used.

    The -ph= argument must also be greater or equal to the -pl= argument if both are specified.

  10. Setting -pl=PL, -ph=PH and PH``<``PL is also allowed and performs band rejection of periods between PH and PL. In that case, the meaning of the -pl= and -ph= arguments reversed.

  11. Setting -pl= and -ph= to the same value P is allowed. In this case, an -ideal- band-pass filter with peak response near one at the single period P is applied to the time series to get the variance estimates at this particular fequency.

  12. Setting both -pl=0 and -ph=0 is not allowed since in that case, no frequency filtering is done.

  13. The -tr= argument specifies pre-filtering processing of the real multi-channel time series before estimating the power spectrum of the time series. If:

    • -tr=+1, the mean of the time series is removed before the frequency analysis
    • -tr=+2, the drift from the time series is removed before the frequency analysis. The drift for the time series is estimated using the formula : drift = ( tseries(ntime) - tseries(1) )/( ntime - 1 )
    • -tr=+3, the least-squares line from the time series is removed before the frequency analysis.

    For other values of the -tr= argument, nothing is done before estimating the spectrum.

    If the -p= argument is present, the pre-filtering processing is applied to each time period, separately.

    The -tr= argument must be an integer and the default value for the -tr= argument is 0, e.g., nothing is done before the spectrum computations.

  14. The -tr2= argument specifies pre-filtering processing of the real multi-channel time series before estimating the power spectrum on each time segment (as specified by the -sl= argument). If:

    • -tr2=+1, the mean of the time segment is removed before computing the power spectrum on the segment
    • -tr2=+2, the drift from the time segment is removed before computing the power spectrum on the segment
    • -tr2=+3, the least-squares line from the time segment is removed before computing the power spectrum on the segment.

    For other values of the -tr2= argument, nothing is done before estimating the spectrum on each time segment.

    The -tr2= argument must be an integer and the default value for the -tr2= argument is 0, e.g., nothing is done before the spectral computations on each time segment.

  15. The -win=window_parameter argument specifies the form of the window which will be convolved with the ideal filter response as in the comp_fftfilter_4d procedure.

    Set -win=0.5 for using a Hanning window, -win=0.54 for using a Hamming window or -win=1. for rectangular window.

    The -win=window_parameter argument is a real number greater or equal to O.5 and less or equal to 1..

    By default, rectangular window filtering is used (i.e., -win=1.).

  16. The -win2=window_choice argument specifies the form of the data window used in the computations of the power spectrum as the -win=window_choice argument in the comp_spectrum_1d procedure. If:

    • -win2=1 The Bartlett window is used
    • -win2=2 The square window is used
    • -win2=3 The Welch window is used
    • -win2=4 The Hann window is used
    • -win2=5 The Hamming window is used
    • -win2=6 A split-cosine-bell window is used.

    The window_choice must be an integer between 1 and 6.

    The default value for the window_choice is 2, e.g., a square window window is used for the spectral computations.

  17. The -tap=tapered_percentage argument specifies the total percentage of the data to be tapered if the window_choice is set to 6 with the -win2= argument. The tapered_percentage is a real number which must be greater than 0. and less or equal to 1..

    The default value for the tapered_percentage is 0.2 (e.g., 20% of the data are tapered, 10% at the start of the series and 10% at the end).

  18. The -nsm=Daniell_filter_length argument specifies that the PSD estimates must be computed by smoothing the periodogram with Daniell weights (e.g., a simple moving average) as in the comp_spectrum_1d procedure.

    On entry, the -nsm= argument gives the length of the Daniell filter to be applied. The Daniell_filter_length must be an odd integer, greater than 2 and less or equal to ( segment_length/2 ) + 1 if the -sl= argument is specified.

    By default, the PSD estimates are not smoothed in the frequency domain.

  19. If the -nooverlap argument is specified and the -sl= argument is also specified, comp_spectvar_4d segments the multi-channel time series without any overlapping (in each time period if the -p= argument is used).

    By default, comp_spectvar_4d overlaps the segments by one half of their length (which is equal to the value of the -sl= argument).

    In both cases, each segment will be FFT’d, and the resulting periodograms will be averaged together to obtain PSD estimates at the ( segment_length/2 ) + 1 frequencies.

    If the -sl= argument is not specified, the -nooverlap argument has no effect.

  20. The -normvar argument specifies that the percentage of variance in the selected frequency

    band must be also computed and output in addition of the raw variance estimates.

  21. The -ngp= argument can be used if you have memory problems when running comp_spectvar_4d on very large datasets. By default, the number_of_grid_points is set to the number of cells in the selected domain. In case of memory problems, you can use the -ngp= argument with a lower value. This will reduce the memory used by the operator.

  22. The -mi=missing_value argument specifies the missing value indicator for the output variables in the output_netcdf_file.

    If the -mi= argument is not specified, the missing_value is set to 1.e+20.

  23. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file.

    By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  24. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g., -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher.

    If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

  25. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher.

    If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  26. It is assumed that the data has no missing values excepted those associated with a constant land-sea-depth mask.

  27. Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  28. For more details on power spectrum analysis and on windowed filtering in the frequency domain, see

Outputs

comp_spectvar_4d creates an output NetCDF file that contains the variance estimates in the selected frequency band for the multi-channel time series associated with the input NetCDF variable. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlev, nlat and nlon are the lengths of the geographical dimensions of the input NetCDF variable and nperiod is 1 if the -p= argument is not used or is equal to ntime / period_length if the -p= argument is used, where ntime = time2 - time1 + 1 ) :

  1. netcdf_variable_spectvar(nperiod,nlev,nlat,nlon) : the corresponding variance estimates for each time period and each of the time series of the 3-D grid-mesh associated with the input NetCDF variable.

  2. netcdf_variable_spectvar_percent(nperiod,nlev,nlat,nlon) : the corresponding percentages of total variance for each time period and each of the time series of the 3-D grid-mesh associated with the input NetCDF variable.

    This variable is stored only if the -normvar argument has been specified when calling comp_spectvar_4d.

The variance estimates are packed in fourdimensional variables whose first, second and third dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x=, -y= and -z= arguments. However, outside the selected domain, the output NetCDF variable is filled with missing values.

Examples

  1. For estimating variance in the biennial time scale (e.g., between 18 and 30``months) for a real multi-channel monthly time series readed from a fourdimensional NetCDF variable called ``uwnd extracted from the file uwnd.monthly.mean_ncep2.nc, which includes monthly time series, and store the results in the NetCDF file var_qbo_uwnd_ncep2.nc, use the following command :

    $ comp_spectvar_4d \
      -f=uwnd.monthly.mean_ncep2.nc \
      -v=uwnd \
      -m=mesh_mask_uwnd_ncep2.nc \
      -pl=18 \
      -ph=30 \
      -o=var_qbo_uwnd_ncep2.nc
    
Flag Counter