comp_clim_3d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

12/09/2018

Purpose

Compute a climatology (e.g. means and standard-deviations) from a tridimensional variable extracted from a NetCDF dataset and, optionally, the mask and scale factors of the 2-D grid-mesh associated with the input NetCDF variable.

Mean and standard-deviation are computed for each point in the time series of the 2-D grid-mesh associated with the input NetCDF variable. These means and standard-deviations may be computed by taking into account the periodicity of the data. These statistics are stored in an output NetCDF dataset.

The mean is a simple but informative measure of the central tendency of a variable [vonStorch_Zwiers]. The standard-deviation is a conventional measure of the variation of a variable [vonStorch_Zwiers]. If X(:) is a vector of ntime observations for one grid-point in the time series of the 2-D grid-mesh, the mean and standard-deviation statistics for this grid-point are estimated, respectively, by

  • MEAN = sum( X(:) ) / ntime
  • STD = sqrt( sum( [X(:)-MEAN]**2 ) / ntime )

Note that the divisor used in calculating the standard-deviation is the number of observations. This is in contrast with the formula used in comp_stat_3d, which uses the number of degrees of freedom.
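As an illustration of the two formulas and of the divisor convention, here is a minimal Python sketch for a single grid-point. The function name mean_and_std and the toy values are hypothetical, and the comparison line assumes that "degrees of freedom" means ntime - 1, which the text above does not state explicitly.

```python
import math
import statistics

def mean_and_std(x):
    """Return (MEAN, STD) for one grid-point, using the divisor ntime."""
    ntime = len(x)
    mean = sum(x) / ntime
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / ntime)
    return mean, std

# Toy time series for one grid-point (illustrative values only).
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = mean_and_std(x)

print(mean, std)              # divisor ntime, as in the formulas above
print(statistics.pstdev(x))   # same convention (population standard deviation)
print(statistics.stdev(x))    # divisor ntime - 1 (assumed comp_stat_3d convention)
```

With the divisor ntime the estimate is always slightly smaller than with ntime - 1; the difference becomes negligible for long time series.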

If your data contains missing values, use comp_clim_miss_3d instead of comp_clim_3d to estimate means and standard deviations from your gappy dataset.

If you need more univariate statistics on your input variable such as skewness, kurtosis, etc., refer to comp_stat_3d. Finally, if the NetCDF variable is four-dimensional, use comp_clim_4d instead of comp_clim_3d.

This procedure is parallelized if OpenMP is used and the NCSTAT software has been built with the _PARALLEL_READ CPP macro. Moreover, this procedure computes the means and standard-deviations with only one pass through the data and an out-of-core strategy which is highly efficient on huge datasets.
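The text above does not say which one-pass algorithm NCSTAT uses. As a hedged illustration only, the sketch below shows one well-known way to obtain the mean and standard-deviation in a single pass while reading the data in chunks: each chunk is merged into a running (count, mean, sum of squared deviations) state. The names update and one_pass_stats are hypothetical, and the ntr parameter merely mimics the role of the -ntr= argument.

```python
import math

def update(state, chunk):
    """Merge one chunk of values into the running (n, mean, m2) state."""
    n_a, mean_a, m2_a = state
    n_b = len(chunk)
    if n_b == 0:
        return state
    mean_b = sum(chunk) / n_b
    m2_b = sum((v - mean_b) ** 2 for v in chunk)
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Combine the squared deviations of both parts around the new mean.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2

def one_pass_stats(series, ntr):
    """Read `series` ntr records at a time; return (mean, std), divisor n."""
    state = (0, 0.0, 0.0)
    for start in range(0, len(series), ntr):
        state = update(state, series[start:start + ntr])
    n, mean, m2 = state
    return mean, math.sqrt(m2 / n)

series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(one_pass_stats(series, ntr=3))
```

Only one chunk is held in memory at a time, which is the point of an out-of-core strategy; the result does not depend on the chunk size, up to rounding.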

Optionally, a mesh-mask NetCDF dataset may also be created. This dataset will contain a presence-absence mask and scale factor variables, which may be used to compute the surface of each cell in the 2-D grid-mesh associated with the input NetCDF variable. This mesh-mask NetCDF dataset will be used by other NCSTAT procedures such as comp_serie_3d, comp_eof_3d, etc.

Further Details

Usage

$ comp_clim_3d \
  -f=input_netcdf_file \
  -v=netcdf_variable \
  -p=periodicity                    (optional) \
  -x=lon1,lon2                      (optional) \
  -y=lat1,lat2                      (optional) \
  -t=time1,time2                    (optional) \
  -c=output_climatology_netcdf_file (optional) \
  -m=output_mesh_mask_netcdf_file   (optional) \
  -yl=latl1,latl2                   (optional) \
  -mi=missing_value                 (optional) \
  -fmsk=input_mesh_mask_netcdf_file (optional) \
  -vmsk=mesh_mask_netcdf_variable   (optional) \
  -val=mask_value                   (optional) \
  -rel=mask_relation                (optional : eq, gt, ge, lt, le) \
  -ntr=number_of_time_records       (optional) \
  -double                           (optional) \
  -bigfile                          (optional) \
  -hdf5                             (optional) \
  -tlimited                         (optional)

By default

-p=
the periodicity is equal to 1
-x=
the whole longitude domain associated with the netcdf_variable
-y=
the whole latitude domain associated with the netcdf_variable
-t=
the whole time period associated with the netcdf_variable
-c=
the output_climatology_netcdf_file is named clim_netcdf_variable.nc
-m=
the output_mesh_mask_netcdf_file is not created
-yl=
it is assumed that the domain is the whole globe when computing the scale factors
-mi=
the missing_value for the STD variable in the output NetCDF file is set to 1.e+20
-fmsk=
an input_mesh_mask_netcdf_file is not used when computing the presence-absence mask
-vmsk=
a mesh_mask_netcdf_variable is not used when computing the presence-absence mask
-val=
this argument is set to 1 when computing the presence-absence mask
-rel=
this argument is set to eq when computing the presence-absence mask
-ntr=
the number_of_time_records read in each iteration of the procedure is set to the periodicity
-double
the standard-deviations are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the standard-deviations are stored as double-precision floating point numbers
-bigfile
a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable for which statistics must be computed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file input_netcdf_file.

  2. The -p=periodicity argument gives the periodicity of the input data. For example, with monthly data -p=12 should be specified, with yearly data -p=1 may be used, etc. By default, the periodicity is set to 1. Note that the output NetCDF file will have periodicity time observations.

  3. If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the whole geographical domain associated with the netcdf_variable is used to construct the climatology.

    The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values for lon1 are not allowed.

    Refer to comp_mask_3d for transforming geographical coordinates into indices before using comp_clim_3d.

  4. If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used to construct the climatology and to compute the statistics.

    The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.

  5. It is assumed that the input data has no missing values (except missing values associated with a constant land-sea mask, as indicated by a missing_value or _FillValue attribute).

    If the input data does contain other missing values, use comp_clim_miss_3d instead of comp_clim_3d.

  6. If the -m=output_mesh_mask_netcdf_file argument is present and the -yl= argument is missing, it is assumed that the whole geographical domain associated with the NetCDF variable is the earth and that the 2-D grid-mesh is regular or Gaussian when computing the scale factors.

    If the domain is not the whole globe, the -yl= argument must be specified, otherwise the first and last columns (elements) of the first two scale factors are wrong.

    The -yl= argument specifies the latitude limits of the domain in degrees (latl1 and latl2 must be real numbers).

  7. If the -m=output_mesh_mask_netcdf_file argument is present, and if the -fmsk= and -vmsk= arguments are also specified, the presence-absence mask in the output_mesh_mask_netcdf_file is computed from the input mesh_mask_netcdf_variable (as specified by the -vmsk= and -fmsk= arguments) as follows :

    • output_mask(i,j) = 1 if input_mask(i,j) .mask_relation. mask_value is true
    • output_mask(i,j) = 0 otherwise

    where mask_relation is determined from the -rel= argument and mask_value from the -val= argument (mask_value is a real number).

    By default, mask_relation is eq and mask_value is 1. Both the -fmsk= and -vmsk= arguments must be present, otherwise the procedure will stop with an error message.

  8. If the -m=output_mesh_mask_netcdf_file argument is present and some scale factors cannot be computed, these scale factors are set to 1.

  9. The -mi=missing_value argument specifies the missing value indicator for the standard-deviation (STD) variable in the output_climatology_netcdf_file. If the -mi= argument is not specified and the NetCDF variable has a missing_value or _FillValue attribute, the missing_value is set to 1.e+20. This argument is not used if the NetCDF variable specified in the -v= argument has no missing_value or _FillValue attribute.

  10. The -ntr=number_of_time_records argument specifies how many time records are read in each iteration of the loop for reading the input NetCDF variable. By default, the number_of_time_records read in each iteration of the procedure is set to the periodicity (as specified by the -p= argument). On very large datasets, it may be useful to reduce the number_of_time_records in order to decrease the memory used by the procedure.

  11. The -double argument specifies that the standard-deviation variable must be stored as double-precision floating point numbers instead of single-precision floating point numbers in the output_climatology_netcdf_file.

  12. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros (e.g. -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher; the argument is recognized by the procedure only in that case. If this argument is specified, the output NetCDF file will be a 64-bit offset format file instead of a NetCDF classic format file.

  13. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 CPP macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher; the argument is recognized by the procedure only in that case. If this argument is specified, the output NetCDF file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file.

  14. Duplicate parameters are allowed, but only the last occurrence of a parameter is used for the computations. Moreover, the number of specified parameters must not be greater than the total number of allowed parameters.

  15. For more details on the use of univariate statistics in the climate literature, see

    • “Statistical Analysis in Climate Research”, by von Storch, H., and Zwiers, F.W., Cambridge University Press, Cambridge, UK, Chapter 2, 484 pp., 2002. ISBN: 9780521012300
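
The presence-absence mask rule described in remark 7 can be sketched in Python as follows. This is an illustration of the rule only, not the NCSTAT implementation; the function name presence_absence_mask, the RELATIONS table and the toy input mask are hypothetical, while the defaults (eq and 1) follow the -rel= and -val= defaults documented above.

```python
import operator

# Relations accepted by the -rel= argument, mapped to Python comparisons.
RELATIONS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}

def presence_absence_mask(input_mask, mask_relation="eq", mask_value=1.0):
    """Return 1 where `input_mask .mask_relation. mask_value` holds, else 0."""
    relation = RELATIONS[mask_relation]
    return [[1 if relation(v, mask_value) else 0 for v in row]
            for row in input_mask]

# Toy 2 x 3 input mask (illustrative values only).
input_mask = [[0.0, 0.5, 1.0],
              [1.0, 2.0, 0.0]]

print(presence_absence_mask(input_mask))             # [[0, 0, 1], [1, 0, 0]]
print(presence_absence_mask(input_mask, "ge", 0.5))  # [[0, 1, 1], [1, 1, 0]]
```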

Outputs

comp_clim_3d creates an output NetCDF file that contains the means, standard-deviations and number of observations of the input NetCDF variable, taking into account the periodicity of the data, if any, as determined by the -p=periodicity argument. The output NetCDF dataset contains the following NetCDF variables (in the description below, nlat and nlon are the lengths of the spatial dimensions of the input NetCDF variable) :

  1. netcdf_variable_mean(periodicity,nlat,nlon) : the means for each point in the time series of the 2-D grid-mesh associated with the input NetCDF variable.
  2. netcdf_variable_std(periodicity,nlat,nlon) : the standard-deviations for each point in the time series of the 2-D grid-mesh associated with the input NetCDF variable.
  3. netcdf_variable_nobs(periodicity) : the number of observations used to compute the statistics.
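
To make the role of the periodicity concrete, the following Python sketch folds the time series of one grid-point into periodicity groups and computes the three output quantities for each group. It assumes that the first selected time record corresponds to the first step of the cycle (for example January with -p=12) and that the series holds at least periodicity records; the function name climatology and the toy values are hypothetical.

```python
import math

def climatology(series, periodicity):
    """Return per-step (means, stds, nobs) lists of length `periodicity`."""
    means, stds, nobs = [], [], []
    for step in range(periodicity):
        values = series[step::periodicity]  # every periodicity-th record
        n = len(values)
        m = sum(values) / n
        s = math.sqrt(sum((v - m) ** 2 for v in values) / n)  # divisor n
        means.append(m)
        stds.append(s)
        nobs.append(n)
    return means, stds, nobs

# Two cycles of a toy series with periodicity 3 (illustrative values only).
series = [1.0, 10.0, 100.0, 3.0, 14.0, 104.0]
means, stds, nobs = climatology(series, 3)

print(means)  # [2.0, 12.0, 102.0]
print(stds)   # [1.0, 2.0, 2.0]
print(nobs)   # [2, 2, 2]
```

With periodicity 1 the whole series falls into a single group, which gives the overall mean and standard-deviation.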

The means and standard-deviations are packed in tridimensional variables whose first and second dimensions are exactly the same as those associated with the input NetCDF variable netcdf_variable even if you restrict the geographical domain with the -x= and -y= arguments. However, outside the selected domain, these output NetCDF variables are filled with missing values.
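
The layout described above can be pictured with a small Python sketch: the output keeps the full grid shape, and cells outside the selected index ranges receive the missing value. The function name restrict_domain and the toy grid are hypothetical; the index ranges are 1-based and inclusive, as for the -x= and -y= arguments, and the missing value used here is purely illustrative.

```python
def restrict_domain(field, lon_range, lat_range, missing_value):
    """Keep values inside the 1-based inclusive ranges; fill the rest."""
    lon1, lon2 = lon_range
    lat1, lat2 = lat_range
    return [[v if (lat1 <= j <= lat2 and lon1 <= i <= lon2) else missing_value
             for i, v in enumerate(row, start=1)]
            for j, row in enumerate(field, start=1)]

# Toy field with nlat = 3 rows and nlon = 4 columns (illustrative values).
field = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0]]

out = restrict_domain(field, lon_range=(2, 3), lat_range=(1, 2),
                      missing_value=-999.0)
for row in out:
    print(row)
```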

Optionally, comp_clim_3d can also create an output mesh-mask NetCDF file that contains the following NetCDF variables :

  1. netcdf_variable_nmask(nlat,nlon) : a presence-absence or land-sea 2-D mask associated with the input NetCDF variable.
  2. netcdf_variable_e1n(nlat,nlon) : the first scale factor associated with the 2-D grid-mesh of the input NetCDF variable.
  3. netcdf_variable_e2n(nlat,nlon) : the second scale factor associated with the 2-D grid-mesh of the input NetCDF variable.

Multiplying the two scale factors together gives the surface of each cell in the 2-D grid-mesh associated with the input NetCDF variable.
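
As a hedged illustration of how the mask and scale factors might be combined, the Python sketch below computes the surface of each cell as the product of the two scale factors and uses it to form an area-weighted mean over the cells where the mask is 1. The function name area_weighted_mean and all values are hypothetical; whether other NCSTAT procedures weight their statistics exactly this way is an assumption.

```python
def area_weighted_mean(field, nmask, e1n, e2n):
    """Area-weighted mean of `field` over cells where nmask == 1."""
    total_area = 0.0
    weighted_sum = 0.0
    for j in range(len(field)):
        for i in range(len(field[j])):
            if nmask[j][i] == 1:
                area = e1n[j][i] * e2n[j][i]  # surface of cell (j, i)
                total_area += area
                weighted_sum += area * field[j][i]
    return weighted_sum / total_area

# Toy 2 x 2 grid (illustrative values only).
field = [[10.0, 20.0], [30.0, 40.0]]
nmask = [[1, 1], [1, 0]]            # last cell is masked out
e1n = [[1.0, 1.0], [2.0, 2.0]]      # first scale factor
e2n = [[1.0, 1.0], [1.0, 1.0]]      # second scale factor

print(area_weighted_mean(field, nmask, e1n, e2n))  # 22.5
```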

Examples

  1. To compute monthly means and standard-deviations from the NetCDF file ST7_1m_00101_20012_grid_T_sosstsst.nc, which includes a NetCDF variable sosstsst, and store the results in the NetCDF file clim_ST7_1m_00101_20012_grid_T_sosstsst.nc, use the following command :

    $ comp_clim_3d \
      -f=ST7_1m_00101_20012_grid_T_sosstsst.nc \
      -v=sosstsst \
      -p=12 \
      -c=clim_ST7_1m_00101_20012_grid_T_sosstsst.nc
    
  2. To compute monthly means and standard-deviations from the NetCDF file sst.mnmean.nc, which includes a NetCDF variable sst, store the results in a NetCDF file named clim_sst.nc and, in addition, generate an associated mesh_mask_netcdf_file named mesh_mask_sst.nc, use the following command :

    $ comp_clim_3d \
      -f=sst.mnmean.nc \
      -v=sst \
      -p=12 \
      -m=mesh_mask_sst.nc