comp_cor_4d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

13/09/2018

Purpose

Compute correlation and regression coefficients between an index time series and a fourdimensional variable extracted from a NetCDF dataset and perform statistical tests on these correlation and regression coefficients.

The procedure first transforms the input fourdimensional NetCDF variable as a ntime by nv rectangular matrix of observed variables stored columnwise (e.g. the selected cells of the 3-D grid-mesh associated with the fourdimensional NetCDF variable) and then computes measures of association between each of these variables, say X, and the input index time series, say Y.

If X(:) and Y(:) are vectors of ntime observations, the (Pearson) correlation coefficient of the two variables X and Y is defined by

COR = COV( X, Y )/[ STD(X). STD(X) ]

where

  • COV( X, Y ) is the covariance between X and Y (e.g. dot_product( X(:) - MEAN(X) , Y(:) - MEAN(Y) )/ntime )
  • MEAN(X) and MEAN(Y) are the means of X and Y, respectively, as computed by comp_clim_4d.
  • STD(X) and STD(Y) are the standard deviations of X and Y, respectively, as computed by comp_clim_4d.

From this definition, it follows that the correlation is a pure number invariant under changes of scale and origin of the variables X and Y. Furthermore, it can be shown that COR cannot be less than -1 or greater than 1.

The Pearson correlation coefficient can be used to assess if the relationship between two variables is more or less linear (e.g. if the the two variables are “proportional” to each other). More precisely, abs( COR ) = 1 if and only if there exists a linear relationship between X and Y, e.g., if

X(i) = a. Y(i) + b , for i= 1 to ntime , with a> 0 if COR = 1and a< 0 if COR = -1.

If the correlation coefficient is high in absolute value, but not equal to 1 or -1, it can also be “summarized” by a straight line (sloped upwards or downwards), which is called the regression line between X and Y. This line is also a least squares line, because it can be determined such that the sum of the squared distances of all the data points from the line is the lowest possible in the scatter plot of X and Y.

The coefficients a and b of the regression line are called, respectively, the regression and intercept coefficients for predicting the dependent variable X by the independent variable Y. These coefficients are computed by comp_cor_4d (intercept coefficients are computed only if the argument -intercept is specified). Furthermore, the regression lines for predicting the index time series Y from each of the X variables from the 3-D grid-mesh associated with the fourdimensional NetCDF variable can also be estimated if the optional argument -rg=reg2 is specified.

As mentioned before, the correlation coefficient COR represents the linear relationship between two variables. If this correlation coefficient is squared, then the resulting value (the coefficient of determination) will represent the proportion of common variation (or shared variance) between the two variables (i.e., the “strength” of the relationship). More precisely, this value gives you the percentage of variance of the dependent variable X explained by the independent variable Y. In order to evaluate the correlation between variables, it is important to know this “strength” as well as the significance of the correlation which can be assess with the help of statistical tests.

If X and Y are independently distributed, their covariance, and hence their correlation, is zero, but the converse is not generally true. However, in the case of X and Y follow a bivariate normal distribution, the nullity of the correlation coefficient implies the independence of the variables X and Y.

Furthermore, in the case of a random sample of ntime observation pairs from such bivariate normal population with a zero correlation coefficient, the distribution of the variate

T = COR.sqrt( [ntime - 2] / [1 - COR.COR] )

has a Student-Fisher t distribution with ntime - 2 degrees of freedom (call it t[ntime-2] in what follows) [vonStorch_Zwiers].

Given the sample correlation COR, we can thus test the hypothesis of no correlation in the bivariate normal parent population with the help of the Student-Fisher t distribution [vonStorch_Zwiers]. Monte Carlo simulations suggest that this test remains valid if the couple X and Y does not follow a bivariate normal distribution and the number of observations is big enough (e.g. ntime > 30); but in this case it is not a test of the independence of the two variables X and Y.

If the correlation coefficient in the parent distribution is not assumed to be zero, the distribution of COR has a complicated form. In that case, for example for computing confidence intervals for the correlation coefficient or testing if the parent correlation is some number different of zero, it is better to use the monotonic transformation

Z = (1/2).log( [ 1 + COR ]/[1 - COR ] )

, which is called the Fisher z transformation and is the inverse of the hyperbolic tangent function. The Fisher z transform produces an asymptotically normal variate with variance equal to 1/(ntime - 3) [vonStorch_Zwiers].

The critical probabilities associated with the correlation coefficients are estimated by

PROB = P( abs( t[ ntime-2 ] )>abs( T ) )

if the argument -a=student is specified when calling the procedure (this is the default value for this argument).

If your sample of ntime observation pairs cannot be assumed to be normal, PROB can also be estimated by permutation methods (e.g. by using the argument -a=permute when calling comp_cor_4d). More precisely, in the case of ntime independent and identically distributed observations from a bivariate population of unknown form, we may consider the permutation of Y coordinates to test for the hypothesis of no correlation in the parent population. Let S( X, Y ) be the set of points obtained by permuting the coordinates of Y(:) in all ntime! possible ways. Then, there is no correlation between X and Y in each element of S( X , Y ) since any permutation uniformly makes sets of uncorrelated data. Hence, by randomly permuting the order of the elements of the Y(:) vector and recomputing the correlation coefficient between X(:) and this permuted vector many times (as determined by the -nb=number_of_shuffles argument), we can estimate the permutation distribution of abs( COR ), conditionally on the Y(:) vector, and compute critical probabilities PROB useful for testing the hypothesis of no correlation in the parent population (e.g. by counting the number of times the permuted values of abs( COR ) exceed the magnitude of the original correlation in the observed sample).

If your sample of ntime observation pairs cannot be assumed to be a random bivariate sample and the observations are auto-correlated in time, bootstrap procedures both in the time or frequency domains are available for estimating the critical probabilities, PROB, associated with the correlations coefficients [Davison_Hinkley] .

If the argument -a=bootstrap a blockwise bootstrap procedure in the time domain is used to estimate the critical probabilities, PROB. This is useful when the observations are serially correlated. In this algorithm, we consider the population of subsamples or overlapping blocks of length bootstrap_block_length formed from the time observations in the Y(:) vector. These overlapping blocks form a finite set defined by

BLK(i) = Y( i + 1 : i + bootstrap_block_length ) for i = 0 to ntime - bootstrap_block_length

Blockwise bootstrap is then realized by resampling randomly the blocks BLK(i) and gluing them together to form a kind of surrogate time series of length ntime. Finally, the correlation coefficient between X(:) and this surrogate time series is computed. This procedure is iterated many times, as determined by the -nb=number_of_shuffles argument, to estimate the bootstrap distribution of abs( COR ) and compute critical probabilities PROB useful for testing the hypothesis of no correlation in the parent population (e.g. by counting the number of times the bootstrap values of abs( COR ) exceed the magnitude of the original correlation in the observed sample). The -bp=, -bs= and -bl= arguments allow the user to determine the exact form of the blockwise bootstrap algorithm. The -bp= argument is particularly useful if your time series are cyclostationary since it forces all the blocks to start at specific observations which are the same day, month or season. The -bl= argument allows the user to choose the size of the blocks. See the remarks below for more details.

If the argument -a= is set to theiler or scramble, a frequency bootstrap procedure is used to generate independent surrogate time series with the same spectral characteristics as the original time series. The basic idea of these methods is to create simulated series by manipulating the Discrete Fourier Transform (DFT) of a given time series instead of resampling randomly blocks of this series in the time domain. There are many different ways for how to do these manipulations in the literature. Two of these methods are currently implemented in comp_cor_3d, the Theiler method [Theiler_etal] [Ebisuzaki] and the Davison and Hinkley method [Davison_Hinkley] [Braun_Kulperger] . Both methods assume that the input time series are stationary (e.g. they do not contain pure harmonic components such as a seasonal cycle or a well-defined trend).

The Theiler method (used when the argument -a= is set to theiler when calling comp_cor_4d) consists of randomly shifting the phases in the DFT of the Y(:) vector and back-transforms it to obtain a bootstrap sample in the time domain. As an illustration, assuming that ntime is odd and the DFT of Y(:) is the complex vector Z(:), then define the complex vector O(:) of ntime elements as

  • O( 1 ) = 1
  • O( k ) = exp( i.2.pi.U(k) ) for k = 2 to [ ntime + 1 ] / 2
  • O( k ) = - O( ntime - k ) for k = [ ntime + 1 ] / 2 + 1 to ntime

, where i = sqrt( -1 ) and the U(k) are a random sample drawn from an uniform distribution on [0, 1]. Then, the Theiler surrogate series is obtained by multiplying element by element the complex vectors Z(:) and O(:) and, finally, taking the inverse DFT of this new complex vector to obtain a bootstrap sample in the time domain. By construction, this simulated series is real-valued, independent of the vector X(:), but its sample mean and periodogram are identical to those of the vector Y(:).

The Davison and Hinkley approach (used when the argument -a= is set to scramble when calling comp_cor_4d) amounts also to a randomization of the phases of the Fourier coefficients of the DFT of Y(:), but also includes an additional step that modifies the amplitudes of these Fourier coefficients. Let, again, the vector U(:) be a random sample of ntime observations drawn from an uniform distribution on [0, 1], and, define, the complex vector O(:) as

  • O( 1 ) = 0
  • O( k ) = Z( k ) . exp( i.2.pi.U( k ) ) for k = 2 to ntime

and form the complex vector A(:) as

  • A( 1 ) = Z( 1 )
  • A( k ) = sqrt( 0.5 ) . [ O( k ) + conjg( O( ntime - k ) ) ] for k = 2 to ntime

Then, the Davison and Hinkley surrogate series is obtained by taking the inverse DFT of the complex vector A(:). Again, by construction, this simulated series is real-valued, independent of the vector X(:), with a sample mean identical to that of the vector Y(:), but, now its periodogram will be different from that of the vector Y(:). Note, however, that the mean spectrum obtained by averaging the periodograms of many Davison and Hinkley surrogate series will tend on average to the periodogram of the vector Y(:).

Finally, Davison and Hinkley [Davison_Hinkley] also discuss how to generate surrogate datasets with non-Gaussian distributions. Their approach can be used both in the context of the Theiler and, Davison and Hinkley methods, and is used to derived critical probabilities if the argument -a= is set to theiler2 or scramble2. These variations of the original methods are useful when the observations are serially correlated and have also an asymmetric marginal distribution.

For more details on both the Theiler or Davison and Hinkley surrogate methods, consult the references cited below.

By default, comp_cor_4d computes the sample correlation and regression coefficients, the associated critical probabilities for testing the nullity of the correlation coefficients and the z transforms of the correlation coefficients between the index time series and each point in the time series of the 3-D grid-mesh associated with the input fourdimensional NetCDF variable. The intercept coefficients of the regression lines between X and Y are also computed if the optional argument -intercept is specified when calling comp_cor_4d.

Moreover, these statistics may be computed by taking into account the periodicity of the input fourdimensional NetCDF variable if you suspect that the time series are cyclostationary (by using the -p=periodicity argument when calling the procedure). All the results are finally stored in an output NetCDF dataset, after repacking the statistics on the original 3-D grid of the input fourdimensional NetCDF variable.

Finally, if the NetCDF variable is tridimensional use comp_cor_3d instead of comp_cor_4d.

This procedure is parallelized if OpenMP is used. Moreover, this procedure computes the correlation and regression coefficients with only one pass through the data and an out-of-core strategy which is highly efficient on huge datasets.

Further Details

Usage

$ comp_cor_3d \
  -f=input_netcdf_file \
  -v=netcdf_variable \
  -vi=index_netcdf_variable \
  -fi=input_index_netcdf_file              (optional) \
  -m=input_mesh_mask_netcdf_file           (optional) \
  -g=grid_type                             (optional : n, t, u, v, w, f) \
  -x=lon1,lon2                             (optional) \
  -y=lat1,lat2                             (optional) \
  -z=level1,level2                         (optional) \
  -t=time1,time2                           (optional) \
  -p=periodicity                           (optional) \
  -a=type_of_analysis                      (optional : student, permute, bootstrap, \
                                                       theiler, theiler2, \
                                                       scramble, scramble2) \
  -rg=type_of_regression                   (optional : reg1, reg2) \
  -o=output_netcdf_file                    (optional) \
  -ti=itime1,itime2                        (optional) \
  -pi=iperiodicity,istep                   (optional) \
  -ni=index_for_2d_index_netcdf_variable   (optional) \
  -nb=number_of_shuffles                   (optional) \
  -bp=bootstrap_periodicity                (optional) \
  -bs=bootstrap_season                     (optional) \
  -bl=bootstrap_block_length               (optional) \
  -sm=smoothing_factor                     (optional) \
  -mi=missing_value                        (optional) \
  -regstd                                  (optional) \
  -intercept                               (optional) \
  -double                                  (optional) \
  -bigfile                                 (optional) \
  -hdf5                                    (optional) \
  -tlimited                                (optional)

By default

-fi=
the same as the -f= argument
-m=
an input_mesh_mask_netcdf_file is not used
-g=
the grid_type is set to n which means that the 3-D grid-mesh associated with the input netcdf_variable is assumed to be regular or Gaussian
-x=
the whole longitude domain associated with the netcdf_variable
-y=
the whole latitude domain associated with the netcdf_variable
-z=
the whole vertical resolution associated with the netcdf_variable
-t=
the whole time period associated with the netcdf_variable
-p=
the periodicity is set to 1.
-ti=
the whole time period associated with the index_netcdf_variable
-pi=
this parameter is not used
-ni=
if the index_netcdf_variable is bidimensional, the first time series is used
-a=
type_of_analysis is set to student
-rg=
type_of_regression is set to reg1
-nb=
number_of_shuffles is set to 99
-bp=
this parameter is set 1.
-bs=
this parameter is not used
-bl=
the bootstrap_block_length is set 1.
-sm=
no smoothing is applied to the index_netcdf_variable
-mi=
the missing_value is set to 1.e+20 in the output_netcdf_file
-o=
output_netcdf_file name is set to cor_netcdf_variable.index_netcdf_variable.nc
-regstd
the regression coefficients are computed in units of the input NetCDF variables. If -regstd is activated, the regression coefficients are computed in units of the netcdf_variable by standard-deviation of the index_netcdf_variable
-intercept
the intercept coefficients of the regressions are not computed. If -intercept is activated, the intercept coefficients of the regressions are computed and stored in the output_netcdf_file
-double
the results are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the results are stored as double-precision floating point numbers
-bigfile
a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable for which a correlation analysis must be computed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.

  2. The optional argument -m=input_mesh_mask_netcdf_file specifies the land-sea mask to apply to netcdf_variable for transforming this fourdimensional NetCDF variable as a rectangular matrix of observed variables before computing the correlation analysis. By default, it is assumed that each cell in the 3-D grid-mesh associated with the input fourdimensional NetCDF variable is a valid time series (e.g. missing values are not present).

    The geographical shapes and vertical resolution of the netcdf_variable (in the input_netcdf_file) and the mask (in the input_mesh_mask_netcdf_file) must agree if an input_mesh_mask_netcdf_file is used.

    Refer to comp_clim_4d or comp_mask_4d for creating a valid input_mesh_mask_netcdf_file NetCDF file for regular or gaussian grids before using comp_cor_4d.

  3. If -g= is set to t, u, v, w or f it is assumed that the input NetCDF variable is from an experiment with the ORCA model (R2, R4 or R05 resolutions). This argument is also used to determined the name of the mesh_mask_variable if an input_mesh_mask_netcdf_file is used.

  4. If the -x=lon1,lon2, -y=lat1,lat2 and -z=level1,level2 arguments are missing the whole geographical domain and vertical resolution associated with the netcdf_variable is used.

    The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2 where nlon is the number of longitude points in the grid associated with the NetCDF variable and it is assumed that the grid is periodic.

    Refer to comp_mask_4d for transforming geographical coordinates as indices or generating an appropriate mesh-mask before using comp_cor_4d.

  5. If the -t=time1,time2 argument is missing, data in the whole time period associated with the netcdf_variable is taken into account. The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.

    The selected time period (e.g. time2 - time1 + 1) must be a whole multiple of the periodicity if the -p= argument is specified.

  6. The -p=periodicity argument gives the periodicity of the input data for the netcdf_variable. For example, with monthly data -p=12 should be specified, with yearly data -p=1 may be used, etc.

    Note that the output NetCDF file will have periodicity time observations.

  7. The -vi=index_netcdf_variable specifies a time series for the correlation analysis. If the -vi=index_netcdf_variable is present, the -fi= argument must also be present and this argument specifies the NetCDF dataset which contains the index_netcdf_variable. However, if the NetCDF dataset which contains the index_netcdf_variable is the same as the NetCDF dataset specified by the -f= argument, it is not necessary to specify the -fi= argument.

  8. The -ni= argument specifies the index (e.g. an integer) for selecting the time series if the index_netcdf_variable specified in the -vi= argument is a 2D NetCDF variable. By default, the first time series is used, which is equivalent to set index_for_2d_netcdf_variable to 1.

  9. If the -ti=itime1,itime2 argument is missing, data in the whole time period associated with the index_netcdf_variable is taken into account. The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.

  10. The -pi= argument gives the periodicity and select the time step for the index_netcdf_variable. For example, to compute correlations with the January monthly time series extracted from the index_netcdf_variable which is assumed to be sampled every month, -pi=12,1 should be specified, with yearly data -pi=1,1 may be used, etc.

  11. The selected time periods for the netcdf_variable and index_netcdf_variable must agree. This means that the following equality must be verified

    (time2 - time1 + 1)/periodicity = ceiling((itime2 - itime1 - istep + 2)/iperiodicity),

    otherwise, an error message will be issued and the program will stop.

  12. The -a= argument selects the method for computing critical probabilities associated with the correlation coefficients:

    • If -a=student, a classical Student-Fisher t test is used.
    • If -a=permute, a permutation test is used.
    • If -a=bootstrap, a moving block bootstrap test is used.
    • If -a=theiler, a phase-scrambled bootstrap (Theiler method) test is used.
    • If -a=theiler2, a phase-scrambled bootstrap (Theiler method) test is used, but when phase-scrambling the index time series exact empirical margins are used instead of normal margins as in the theiler option.
    • If -a=scramble, a phase-scrambled bootstrap (Davison-Hinkley method) test is used.
    • If -a=scramble2, a phase-scrambled bootstrap (Davison-Hinkley method) test is used, but when phase-scrambling the index time series exact empirical margins are used instead of normal margins as in the scramble option.
  13. The -nb=number_of_shuffles argument specifies the number of shuffles for the phase-scrambled, bootstrap or permutation tests if -a=permute, bootstrap, theiler, theiler2, scramble or scramble2.

  14. The -bp=bootstrap_periodicity argument specifies that the index, i, of the first observation of each selected block in the moving block bootstrap algorithm verifies the condition i = 1 + bootstrap_periodicity.j where j is a random positive integer. bootstrap_periodicity must be greater than zero and less than the length of the time series. By default, bootstrap_periodicity is set to 1.

  15. The -bs=bootstrap_season argument specifies that the input time series is a repetition of the same season for different years and bootstrap_season specifies the length of the season. bootstrap_season must be greater than zero and the length of the time series must be a multiple of bootstrap_season. If the optional argument bootstrap_periodicity is used, bootstrap_season must also be greater or equal to bootstrap_periodicity. By default, bootstrap_season is set to the length of the time series.

  16. The -bl=bootstrap_block_length argument specifies the size of the blocks in the moving block bootstrap algorithm. bootstrap_block_length must be greater than zero and less than the length of the time series. If the optional argument bootstrap_periodicity is used, bootstrap_block_length must also be greater or equal to bootstrap_periodicity. Moreover, if the optional argument bootstrap_season is used, bootstrap_block_length must also be less than bootstrap_season. By default, bootstrap_block_length is set to 1 or to bootstrap_periodicity if this optional argument is used.

  17. The -rg= argument selects the method for computing the regression coefficients:

    • If -rg=reg1, the coefficients of the regression equation for predicting the netcdf_variable by the index_netcdf_variable are computed. This is the default.
    • If -rg=reg2, the coefficients of the regression equation for predicting the index_netcdf_variable by the netcdf_variable are computed.
  18. The -intercept argument specifies that the intercept coefficients of the regression equation must be computed and stored in the output NetCDF file. By default, the intercept coefficients are not computed.

  19. The -regstd argument specifies that the regression coefficients of the regression equation must be expressed in terms of units of the input NetCDF variable by standard-deviation of the index_netcdf_variable. By default, the regression coefficients are expressed in units of the input NetCDF variables.

  20. -sm=smoothing_factor means that the time series associated with the index_netcdf_variable (e.g. the -vi= argument) must be smoothed with a moving average of approximately 2.smoothing_factor + 1 terms before computing the correlations with the netcdf_variable (e.g. the -v= argument). smoothing_factor must be a strictly positive integer (>zero).

  21. The -mi=missing_value argument specifies the missing value indicator associated with the netcdf_variables in the output_netcdf_file. If the -mi= argument is not specified missing_value is set to 1.e+20.

  22. The -double argument specifies that the results are stored as double-precision floating point numbers in the output NetCDF file. By default, the results are stored as single-precision floating point numbers in the output NetCDF file.

  23. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g. -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher. If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

  24. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  25. It is assumed that the data has no missing values.

  26. Duplicate parameters are allowed, but this is always the last occurrence of a parameter which will be used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  27. For more details on correlation and regression analysis in the climate literature, see

    • “Statistical Analysis in Climate Research”, by von Storch, H., and Zwiers, F.W., Cambridge University press, Cambridge, UK, Chapter 8, 484 pp., 2002. ISBN: 9780521012300

    For more details on frequency or time series bootstrap procedures, see

    • “Testing for nonlinearity in time series: the method of surrogate data.” by Theiler, J., Eubank, S., Longtin, A., Galdrikian, B., and Farmer, J.D. Physica D, vol. 58, 77-94, 1992. doi: 10.1016/0167-2789(92)90102-s
    • “A method to estimate the statistical significance of a correlation when the data are serially correlated”, by Ebisuzaki, W., Journal of climate, vol. 10, 2147-2153, 1997. doi: 10.1175/1520-0442(1997)010<2147:AMTETS>2.0.CO;2
    • “Properties of a fourier bootstrap method for time series”, by Braun, W.J., and Kulperger, R.J., Communications in Statistics - Theory and Methods, vol 26, 1329-1336, 1997. doi: 10.1080/03610929708831985
    • “Bootstrap methods and their application”, by Davison, A.C., and Hinkley, D.V., Cambridge Univesity press, Cambridge, UK, 1997. doi: 10.1017/CBO9780511802843

Outputs

comp_cor_4d creates an output NetCDF file that contains the correlation and regression statistics and critical probabilities associated with these coefficients, taking into account eventually the periodicity of the data as determined by the -p=periodicity argument. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlev, nlat and nlon are the length of the vertical and spatial dimensions of the input NetCDF variable) and periodicity time observations, if -rg=reg1 :

  1. netcdf_variable_index_netcdf_variable_cor(periodicity,nlev,nlat,nlon) : the Pearson correlation coefficients between each point in the time series of the 3-D grid-mesh associated with the input NetCDF variable and the index_netcdf_variable time series.

  2. netcdf_variable_index_netcdf_variable_prob(periodicity,nlev,nlat,nlon) : the critical probabilities associated with two-sided tests of the correlation coefficients (e.g. the absolute value of the correlation is tested). These critical probabilities are computed under the null hypothesis that the corresponding correlation coefficients in the parent population are zero.

    The -a=type_of_analysis argument determines how these critical probabilities are computed.

  3. netcdf_variable_index_netcdf_variable_z(periodicity,nlev,nlat,nlon) : the Fisher z Transforms of the correlation coefficients for each point in the time series of the 3-D grid-mesh associated with the input NetCDF variable and the index_netcdf_variable time series.

  4. netcdf_variable_index_netcdf_variable_reg(periodicity,nlev,nlat,nlon) : the regression coefficients for predicting each point in the time series of the 3-D grid-mesh associated with the input NetCDF variable by the index_netcdf_variable time series.

    By default, the regression coefficients are expressed in units of the input NetCDF variable netcdf_variable by unit of the index_netcdf_variable time series. However, if the -regstd argument is specified the regression coefficients are expressed in terms of units of the input NetCDF variable netcdf_variable by standard-deviation of the index_netcdf_variable time series. Finally, if -rg=reg2 is specified the roles of the input NetCDF variables netcdf_variable and index_netcdf_variable are interchanged and the fitted regression models are for predicting the index_netcdf_variable by each time series of the 3-D grid-mesh associated with the input NetCDF variable netcdf_variable .

  5. netcdf_variable_index_netcdf_variable_int(periodicity,nlev,nlat,nlon) : the intercept coefficients in the regression models for predicting each point in the time series of the 3-D grid-mesh associated with the input NetCDF variable by the index_netcdf_variable time series.

    This variable is stored only if the -intercept argument has been specified when calling comp_cor_4d. Finally, if -rg=reg2 is specified the roles of the input NetCDF variables netcdf_variable and index_netcdf_variable are interchanged and the fitted regression models are for predicting the index_netcdf_variable by each time series of the 3-D grid-mesh associated with the input NetCDF variable netcdf_variable .

  6. netcdf_variable_index_netcdf_variable_nobs(periodicity) : the number of observations used to compute the correlation and regression coefficients.

All these statistics, excepted the netcdf_variable_index_netcdf_variable_nobs variable, are packed in fourdimensional variables whose first, second and third dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x=, -y= and -z= arguments. However, outside the selected domain, the output NetCDF variables are filled with missing values.

If -rg=reg2 , the naming convention for the variables is reversed, the index_netcdf_variable will be listed first and the netcdf_variable will appear after. For example, the name of the NetCDF variable storing the correlation coefficient will be index_netcdf_variable_netcdf_variable_cor instead of netcdf_variable_index_netcdf_variable_cor if -rg=reg2 .

Examples

  1. For computing monthly lead correlations from a fourdimensional NetCDF variable votemper in the NetCDF file ST7_1m_00101_20012_grid_T_votemper.nc and a December-January Nino34 SST index in the NetCDF file ST7_sst_nino34_dj.nc and store the results in a NetCDF file named cor_ST7_1m_votemper_nino34_dj_grid_T.nc, use the following commands (note that the critical probabilities associated with the correlations are estimated with the help of the Theiler method using 999 surrogate time series and cyclostationarity is assumed for the sosstsst variable since -p=12 is specified ) :

    $ comp_cor_4d \
      -f=ST7_1m_00101_20012_grid_T_votemper.nc \
      -v=votemper \
      -m=mesh_mask_ST7.nc \
      -p=12 \
      -fi=sst_nino34_dj.nc \
      -vi=sosstsst \
      -a=theiler \
      -nb=999 \
      -o=cor_ST7_1m_votemper_nino34_dj_grid_T.nc