comp_norm_miss_3d

Authors

Pascal Terray (LOCEAN/IPSL)

Latest revision

13/09/2018

Purpose

Select, transform and normalize time series from a tridimensional variable with missing values extracted from a NetCDF dataset.

The procedure supports a wide variety of transformations on the input tridimensional NetCDF variable, such as:

  1. applying, then removing, the scale_factor and add_offset attributes if they are present
  2. changing the missing_value attribute (with the -mi= argument)
  3. applying a mesh-mask supplied as input to the procedure (with the -m= argument)
  4. selecting only specific time steps in the output file (with the -t=, -l= and -p= arguments)
  5. centering or standardizing the time series associated with selected cells of the 2-D grid-mesh associated with the input tridimensional NetCDF variable (with the -a= argument) with the help of an input climatology (specified with the -c= argument)
  6. reducing the spatial dimensions of the output NetCDF dataset (with the -compact argument)

An output NetCDF file containing the transformed tridimensional variable is created.

If your data does not contain missing values, use comp_norm_3d instead of comp_norm_miss_3d to transform your dataset.

Further Details

Usage

$ comp_norm_miss_3d \
  -f=input_netcdf_file \
  -v=netcdf_variable \
  -m=input_mesh_mask_netcdf_file    (optional) \
  -g=grid_type                      (optional : n, t, u, v, w, f) \
  -r=resolution                     (optional : r2, r4) \
  -b=nlon_orca,nlat_orca            (optional) \
  -x=lon1,lon2                      (optional) \
  -y=lat1,lat2                      (optional) \
  -t=time1,time2                    (optional) \
  -c=input_climatology_netcdf_file  (optional) \
  -a=type_of_transformation         (optional : scp, cov, cor, spa, tim) \
  -d=type_of_distance               (optional : dist2, ident) \
  -o=output_netcdf_file             (optional) \
  -p=periodicity                    (optional) \
  -l=selected_time_period           (optional) \
  -cv=climatology_netcdf_variable   (optional) \
  -mi=missing_value                 (optional) \
  -double                           (optional) \
  -bigfile                          (optional) \
  -hdf5                             (optional) \
  -compact                          (optional) \
  -tlimited                         (optional)

By default

-g=
the grid_type is set to n which means that the 2-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian
-r=
if the input netcdf_variable is from the NEMO or ORCA model (i.e. if the -g= argument is not set to n), the resolution is assumed to be r2
-b=
if -g= is not set to n, the dimensions of the 2-D grid-mesh, nlon_orca and nlat_orca, are determined from the -r= argument. However, you may override this default with the -b= argument
-x=
the whole longitude domain associated with the netcdf_variable
-y=
the whole latitude domain associated with the netcdf_variable
-t=
the whole time period associated with the netcdf_variable
-a=
the type_of_transformation is set to scp. This means that the time series in the 2-D grid-mesh associated with the input netcdf_variable are written as raw data without any centering or standardization
-d=
the type_of_distance is set to dist2.
-o=
the output_netcdf_file is named norm_netcdf_variable.nc
-p=
the periodicity is equal to the periodicity of the climatology if -a=cov or -a=cor or to time2 - time1 + 1 if -a=scp
-l=
the whole time period as specified by the -t=time1,time2 argument
-cv=
this argument has the same value as the -v= argument
-mi=
the missing_value in the output NetCDF file is set to 1.e+20
-double
the data are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the data are stored as double-precision floating point numbers
-bigfile
a NetCDF classic format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
-hdf5
a NetCDF classic format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
-compact
the output NetCDF file is not compacted
-tlimited
the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file

Remarks

  1. The -v=netcdf_variable argument specifies the NetCDF variable which must be transformed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.

  2. The optional argument -m=input_mesh_mask_netcdf_file specifies the land-sea mask to apply to netcdf_variable for transforming this tridimensional NetCDF variable. By default, it is assumed that each cell in the 2-D grid-mesh associated with the input tridimensional NetCDF variable is a valid time series which must be written in the output NetCDF file.

    The geographical shapes of the netcdf_variable (in the input_netcdf_file) and the mask (in the input_mesh_mask_netcdf_file) must agree if an input_mesh_mask_netcdf_file is used.

    Refer to comp_clim_3d or comp_mask_3d for creating a valid input_mesh_mask_netcdf_file for regular or Gaussian grids before using comp_norm_miss_3d.

  3. If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the whole geographical domain associated with the netcdf_variable is used.

    The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1; in this case, the longitude domain runs from nlon+lon1+1 to lon2, where nlon is the number of longitude points in the grid associated with the NetCDF variable, and the grid is assumed to be periodic.

    Refer to comp_mask_3d for transforming geographical coordinates into indices before using comp_norm_miss_3d.

  4. If the -t=time1,time2 argument is missing, data in the whole time period associated with the netcdf_variable is taken into account. The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.

  5. If -g= is set to t, u, v, w or f, it is assumed that the NetCDF variable is from an experiment with the ORCA model. In this case, the duplicate points from the ORCA grid are removed before the transformation, as far as possible and, in particular, when the whole globe is used as the geographical domain. On output, the duplicate points are restored when writing the output file if and only if the whole globe is used as the geographical domain. If -g= is set to n, it is assumed that the grid has no duplicate points.

  6. If -g= is set to t, u, v, w or f (i.e. if the NetCDF variable is from an experiment with the ORCA model), the -r= argument gives the resolution used. If:

    • -r=r2 the NetCDF variable is from an experiment with the ORCA R2 model
    • -r=r4 the NetCDF variable is from an experiment with the ORCA R4 model.
  7. If the NetCDF variable is from an experiment with the ORCA model, but the resolution is not R2 or R4, the dimensions of the ORCA grid must be specified explicitly with the -b= argument.

  8. The -a= argument specifies if the data are centered or standardized with an input climatology (specified with the -c= argument):

    • -a=scp means that the raw data are output
    • -a=cov means that the anomalies are output
    • -a=cor means that the standardized anomalies are output
    • -a=spa means that the standardized anomalies are output, but the anomalies are standardized by the standard-deviation averaged over the specified domain for each selected time step
    • -a=tim means that the standardized anomalies are output, but the anomalies are standardized by the standard-deviation averaged over the selected time steps for each grid-point.
  9. The input_climatology_netcdf_file specified with the -c= argument is needed only if -a=cov, -a=cor, -a=spa or -a=tim.

  10. If -a=cov, -a=cor, -a=spa or -a=tim, the selected time period must agree with the climatology. This means that the first selected time observation (time1 if the -t= argument is present) must correspond to the first day, month or season of the climatology specified with the -c= argument.

  11. The geographical shapes of the netcdf_variable (in the input_netcdf_file), the mask (in the input_mesh_mask_netcdf_file), the scale factors (in the input_mesh_mask_netcdf_file), and the climatology (in the input_climatology_netcdf_file) must agree.

  12. The -d=type_of_distance argument is used only if -a=spa is specified. If:

    • -d=dist2, the anomalies are standardized by a weighted standard-deviation. The sum of squares associated with a grid-point is weighted according to the surface area associated with that grid-point when computing the standard-deviation over the domain
    • -d=ident, the anomalies are standardized by a simple arithmetic standard-deviation.
  13. The -l= argument selects the indices of the time steps which must be included in the output NetCDF file. The indices of the time steps are counted from the start of the (selected) time period (e.g. time1 in the -t=time1,time2 argument or 1 if this argument is missing). The argument list can be specified in two forms:

    • -l=n1,n2,…nn standardizes and selects time steps n1, n2, … and nn.

      If a periodicity is defined (with the -p= option, or when -a= is set to cov, cor or spa), time steps n1, n2, … nn are selected for each period separately (see second example).

    • -l=n1:n2 standardizes and selects time steps from n1 to n2 (or from n1 to n2 for each period separately, if a periodicity is defined with the -p= option or when -a= is set to cov, cor or spa).

    The two forms of the -l= argument may be combined and repeated any number of times. Duplicate time steps are not allowed.

    Be careful with time period limits when specifying the -l= argument.

  14. If the -p= argument is specified and -a=cov, -a=cor, -a=spa or -a=tim, the periodicity deduced from the climatology (given by the -c= argument) overrides the -p= argument.

  15. If the variable used to compute the climatology does not have the same name as the variable specified by the -v= argument, use the -cv= argument to specify the variable name for the climatology.

  16. It is assumed that the specified netcdf_variable has a scalar missing_value or _FillValue attribute and that missing values in the data are identified by the value of this missing or _FillValue attribute.

  17. The -mi=missing_value argument specifies the missing value indicator associated with the netcdf_variable (specified by the -v= argument) in the output_netcdf_file. missing_value must be a real number outside the range of the netcdf_variable. If the -mi= argument is not specified, missing_value is set to 1.e+20.

  18. The -double argument specifies that the output NetCDF variable must be stored as double-precision floating point numbers instead of single-precision floating point numbers in the output_netcdf_file.

  19. The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 macros (e.g. -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher. If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.

  20. The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.

  21. If the -compact argument is specified and if a domain is selected (with the -x= and -y= arguments) then only data for the selected domain will be output. By default, the whole grid is stored (with missing values outside the selected domain).

  22. Duplicate parameters are allowed, but only the last occurrence of a parameter is used for the computations.
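
The effect of the -a= argument on one grid-point time series (remarks 8 to 10 and 16 to 17) can be illustrated with a minimal Python sketch. It is not the NCSTAT implementation. The function name transform, the input missing value -999.0, and the assumption that the climatology supplies one mean and one standard-deviation per step of the cycle are all hypothetical; only the scp, cov and cor cases are covered, and treating a zero standard-deviation as missing is a choice made for this sketch.

```python
MISSING_OUT = 1.0e20  # default output missing value, as stated for -mi=


def transform(series, clim_mean, clim_std, kind="scp",
              missing_in=-999.0, missing_out=MISSING_OUT):
    """Transform one grid-point time series (hypothetical helper).

    series     : raw values, first value aligned with the first
                 step of the climatology (see remark 10)
    clim_mean  : climatological mean, one value per step of the cycle
    clim_std   : climatological standard-deviation, same length
    kind       : "scp" (raw), "cov" (anomalies), "cor" (standardized)
    """
    period = len(clim_mean)  # periodicity deduced from the climatology
    out = []
    for i, value in enumerate(series):
        if value == missing_in:
            # Missing observations stay missing, with the output flag.
            out.append(missing_out)
            continue
        k = i % period  # position inside the climatological cycle
        if kind == "scp":
            out.append(value)
        elif kind == "cov":
            out.append(value - clim_mean[k])
        elif kind == "cor":
            if clim_std[k] == 0.0:
                # Assumption of this sketch: undefined ratio -> missing.
                out.append(missing_out)
            else:
                out.append((value - clim_mean[k]) / clim_std[k])
        else:
            raise ValueError("unsupported kind: " + kind)
    return out


# Two-step cycle, four observations, one of them missing.
raw = [20.0, -999.0, 23.0, 19.0]
anomalies = transform(raw, [19.0, 22.0], [2.0, 0.5], kind="cov")
standardized = transform(raw, [19.0, 22.0], [2.0, 0.5], kind="cor")
```

The index k = i % period only lines up with the climatology when the first selected observation matches the first step of the cycle, which is why remark 10 requires that alignment.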

Outputs

comp_norm_miss_3d creates an output NetCDF file that contains the coordinate NetCDF variables of the input NetCDF dataset input_netcdf_file and the transformed NetCDF variable. This NetCDF variable will have the same dimensions and name as the input NetCDF variable in the file input_netcdf_file (in the description below, nlat and nlon are the lengths of the spatial dimensions of the input NetCDF variable):

  1. netcdf_variable(ntime,nlat,nlon) : the transformed NetCDF variable as specified by the -m=, -a= and -mi= arguments.

By default, the whole grid associated with the input NetCDF variable is stored (with missing values outside the selected domain). Note, however, that if the -compact argument is used, the geographical dimensions of the output NetCDF variable will be reduced to the selected domain as specified by the -x= and -y= arguments (i.e. in this case nlat=lat2-lat1+1 and nlon=lon2-lon1+1). The number of time steps written in the output NetCDF file (i.e. ntime) is determined from the -t=, -l= and -p= arguments.
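
As a rough illustration of how the output dimensions follow from the arguments, the Python sketch below expands a -l= style list and counts the selected time steps, then computes the compacted spatial shape. It is one reading of the text above, not the NCSTAT code. The function names are hypothetical, the sketch only handles selected periods that contain a whole number of cycles, it returns the steps in increasing order, and it covers positive lon1 and lat1 only.

```python
def expand_selection(spec):
    """Expand a -l= style list such as "1:3,12" into [1, 2, 3, 12]."""
    steps = []
    for item in spec.split(","):
        if ":" in item:
            first, last = (int(part) for part in item.split(":"))
            steps.extend(range(first, last + 1))
        else:
            steps.append(int(item))
    if len(set(steps)) != len(steps):
        raise ValueError("duplicate time steps are not allowed")
    return steps


def selected_time_steps(time1, time2, spec, periodicity=None):
    """Return the 1-based input time indices kept in the output.

    Without a periodicity, the period is time2 - time1 + 1, so the
    -l= indices are simply counted from time1.
    """
    nsteps = time2 - time1 + 1
    period = nsteps if periodicity is None else periodicity
    if nsteps % period != 0:
        raise ValueError("this sketch only handles whole periods")
    offsets = expand_selection(spec)
    if any(o < 1 or o > period for o in offsets):
        raise ValueError("selected step outside the period")
    selected = []
    for start in range(time1, time2 + 1, period):
        # Apply the same selection to each period separately.
        selected.extend(start + o - 1 for o in offsets)
    return sorted(selected)


def compact_shape(lon1, lon2, lat1, lat2):
    """Spatial shape (nlat, nlon) of the output when -compact is used."""
    return (lat2 - lat1 + 1, lon2 - lon1 + 1)


# Two years of monthly data, first three months of each year.
steps = selected_time_steps(1, 24, "1:3", periodicity=12)
ntime = len(steps)
```

With these hypothetical inputs, ntime is 6 and the output variable would have shape (ntime, nlat, nlon), where nlat and nlon come from compact_shape when -compact is set.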

Examples

  1. To compute time series of (monthly) anomalies from a NetCDF variable sst with missing values, extracted from a file named Hadsst2_1m_190001_200512_sst.nc, and store the results in the NetCDF file anoma_Hadsst2_1m_190001_200512_sst.nc, use the following command:

    $ comp_norm_miss_3d \
      -f=Hadsst2_1m_190001_200512_sst.nc \
      -v=sst \
      -a=cov \
      -c=clim_ST7_1m_00101_20012_grid_T_sosstsst.nc \
      -o=anoma_Hadsst2_1m_190001_200512_sst.nc