comp_norm_miss_3d¶
Authors¶
Pascal Terray (LOCEAN/IPSL)
Latest revision¶
28/05/2024
Purpose¶
Select, transform and normalize a multi-channel time series from a tridimensional NetCDF variable with missing values extracted from a NetCDF dataset.
The procedure allows a wide variety of transformations on the input tridimensional NetCDF variable, such as:
- removing and applying scale_factor and add_offset attributes if they are present
- changing the missing_value attribute (with the -mi= argument)
- applying a given mesh-mask given in input of the procedure (with the -m= argument)
- selecting only specific time steps in the output file (with the -t=, -l= and -p= arguments)
- centering or standardizing the time series associated with selected cells of the 2-D grid-mesh associated with the input tridimensional NetCDF variable (with the -a= argument) with the help of an input climatology (specified with the -c= argument)
- reducing the spatial dimensions of the output NetCDF dataset (with the -compact argument)
An output NetCDF file containing the transformed tridimensional variable is created.
If your data do not contain missing values, use comp_norm_3d instead of comp_norm_miss_3d to transform your dataset.
Further Details¶
Usage¶
$ comp_norm_miss_3d \
-f=input_netcdf_file \
-v=netcdf_variable \
-m=input_mesh_mask_netcdf_file (optional) \
-g=grid_type (optional : n, t, u, v, w, f) \
-r=resolution (optional : r2, r4) \
-b=nlon_orca, nlat_orca (optional) \
-x=lon1,lon2 (optional) \
-y=lat1,lat2 (optional) \
-t=time1,time2 (optional) \
-c=input_climatology_netcdf_file (optional) \
-a=type_of_transformation (optional : scp, cov, cor, spa, tim) \
-d=type_of_distance (optional : dist2, ident) \
-o=output_netcdf_file (optional) \
-p=periodicity (optional) \
-l=selected_time_period (optional) \
-cv=climatology_netcdf_variable (optional) \
-mi=missing_value (optional) \
-double (optional) \
-bigfile (optional) \
-hdf5 (optional) \
-compact (optional) \
-tlimited (optional)
By default¶
- -g=
- the grid_type is set to n, which means that the 2-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian
- -r=
- if the input netcdf_variable is from the NEMO model (i.e., if the -g= argument is not set to n), the resolution is assumed to be r2
- -b=
- if -g= is not set to n, the dimensions of the 2-D grid-mesh, nlon_orca and nlat_orca, are determined from the -r= argument. However, you may override this default choice with the -b= argument
- -x=
- the whole longitude domain associated with the netcdf_variable
- -y=
- the whole latitude domain associated with the netcdf_variable
- -t=
- the whole time period associated with the netcdf_variable
- -a=
- the type_of_transformation is set to scp. This means that the time series associated with the input netcdf_variable are output as raw data without any centering or standardization
- -d=
- the type_of_distance is set to dist2
- -o=
- the output_netcdf_file is named norm_netcdf_variable.nc
- -p=
- the periodicity is equal to the periodicity of the climatology if -a=cov or -a=cor, or to time2-time1+1 if -a=scp
- -l=
- the whole time period as specified by the -t=time1,time2 argument
- -cv=
- this argument, which specifies the name of the NetCDF variable used to compute the input_climatology_netcdf_file, has by default the same value as the -v= argument
- -mi=
- the missing_value in the output NetCDF file is set to 1.e+20
- -double
- the data are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the data are stored as double-precision floating point numbers
- -bigfile
- a NetCDF classical format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
- -hdf5
- a NetCDF classical format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
- -compact
- the output NetCDF file is not compacted
- -tlimited
- the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file
Remarks¶
The -v=netcdf_variable argument specifies the NetCDF variable which must be transformed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.
The optional argument -m=input_mesh_mask_netcdf_file specifies the land-sea mask to apply to netcdf_variable before transforming this tridimensional NetCDF variable. By default, it is assumed that each cell in the 2-D grid-mesh associated with the input tridimensional NetCDF variable is a valid time series which must be written in the output NetCDF file. In other words, an input_mesh_mask_netcdf_file must be specified if the input netcdf_variable has masked values over all the land or ocean grid-points.
Note also that an input_mesh_mask_netcdf_file is required if -a=spa and -d=dist2 are specified, in order to get the scale factors of the 2-D grid-mesh associated with the input netcdf_variable, as these scale factors are used to compute the weighted standard-deviation over the selected domain with this combination of input arguments.
The geographical shapes of the netcdf_variable (in the input_netcdf_file) and the mask (in the input_mesh_mask_netcdf_file) must agree if an input_mesh_mask_netcdf_file is used.
Refer to comp_clim_miss_3d or comp_mask_3d for creating a valid input_mesh_mask_netcdf_file NetCDF file for regular or Gaussian grids before using comp_norm_miss_3d.
If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the whole geographical domain associated with the netcdf_variable is used.
The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case the longitude domain is from nlon+lon1+1 to lon2, where nlon is the number of longitude points in the grid associated with the NetCDF variable, and it is assumed that the grid is periodic.
Refer to comp_mask_3d for transforming geographical coordinates into indices before using comp_norm_miss_3d.
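The negative lon1 convention described above can be illustrated with a short Python sketch. It is not part of NCSTAT: the function name longitude_indices is hypothetical, and the handling of lon1=0 or of a wrapped range that overlaps itself (both rejected here) is an assumption, since this page does not specify those cases.

```python
def longitude_indices(lon1, lon2, nlon):
    """Return the 1-based longitude indices selected by -x=lon1,lon2.

    Illustrative only: follows the rule stated in the text, where a negative
    lon1 starts the domain at nlon+lon1+1 on a periodic grid.
    """
    if lon1 == 0 or not 1 <= lon2 <= nlon:
        raise ValueError("assumed here: lon1 != 0 and 1 <= lon2 <= nlon")
    if lon1 > 0:
        if lon1 > lon2:
            raise ValueError("lon1 must not exceed lon2")
        return list(range(lon1, lon2 + 1))
    first = nlon + lon1 + 1
    if first < 1 or first <= lon2:
        raise ValueError("wrapped range is out of bounds or overlaps itself")
    # Wrap around the periodic boundary: first..nlon, then 1..lon2.
    return list(range(first, nlon + 1)) + list(range(1, lon2 + 1))


print(longitude_indices(-2, 3, 10))  # [9, 10, 1, 2, 3]
```

With a 10-point periodic grid, -x=-2,3 would therefore select the last two longitudes followed by the first three, under the assumptions stated above.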
If the -t=time1,time2 argument is missing, data in the whole time period associated with the netcdf_variable is taken into account. The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.
If -g= is set to t, u, v, w or f, it is assumed that the NetCDF variable is from an experiment with the NEMO model. In this case, the duplicate points from the NEMO grid are removed before the transformation, as far as possible, and, in particular, if the whole globe is used as the geographical domain. On output, the duplicate points are restored when writing the output file if, and only if, the whole globe is used as the geographical domain.
If -g= is set to n, it is assumed that the grid is regular or Gaussian and has no duplicate points.
If -g= is set to t, u, v, w or f (i.e., if the NetCDF variable is from an experiment with the NEMO model), the -r= argument gives the resolution used. If:
- -r=r2, the input NetCDF variable is from an experiment with the ORCA R2 configuration of the NEMO model
- -r=r4, the input NetCDF variable is from an experiment with the ORCA R4 configuration of the NEMO model.
If the NetCDF variable is from an experiment with the NEMO model, but the resolution is not R2 or R4, the dimensions of the NEMO grid must be specified explicitly with the -b= argument.
The -a= argument specifies if the data must be centered or standardized with an input climatology (specified with the -c= argument):
- -a=scp means that the raw data are output
- -a=cov means that the anomalies are output
- -a=cor means that the standardized anomalies are output
- -a=spa means that the standardized anomalies are output, but the anomalies are standardized by the standard-deviation averaged over the specified domain for each selected time step
- -a=tim means that the standardized anomalies are output, but the anomalies are standardized by the standard-deviation averaged over the selected time steps for each grid-point.
The input_climatology_netcdf_file specified with the -c= argument is needed only if -a=cov, -a=cor, -a=spa or -a=tim.
If -a=cov, -a=cor, -a=spa or -a=tim, the selected time period must agree with the climatology. This means that the first selected time observation (time1 if the -t= argument is present) must correspond to the first day, month or season of the climatology specified with the -c= argument.
The geographical shapes of the netcdf_variable (in the input_netcdf_file), the mask (in the input_mesh_mask_netcdf_file), the scale factors (in the input_mesh_mask_netcdf_file), and the climatology (in the input_climatology_netcdf_file) must agree.
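As a rough illustration of the scp, cov and cor options for a single grid-point, the Python sketch below centers or standardizes a time series against a periodic climatology. It is not the NCSTAT implementation. The helper name normalize_series and its arguments are hypothetical; it assumes the climatology provides one mean and one standard-deviation per step of the cycle, that the first time step matches the first step of the climatology (as required above), and that a non-positive standard-deviation yields a missing value. The spa and tim options are not covered.

```python
MISSING = 1.0e20  # default -mi= value quoted on this page


def normalize_series(series, clim_mean, clim_std, transformation="cov", missing=MISSING):
    """Center or standardize one grid-point time series (illustrative sketch).

    clim_mean and clim_std hold one value per step of the climatological cycle,
    e.g. 12 values for a monthly climatology (an assumption of this sketch).
    """
    if transformation not in ("scp", "cov", "cor"):
        raise ValueError("only scp, cov and cor are sketched here")
    period = len(clim_mean)
    result = []
    for t, value in enumerate(series):
        k = t % period  # position of time step t inside the climatological cycle
        if value == missing:
            result.append(missing)  # missing values are propagated unchanged
        elif transformation == "scp":
            result.append(value)  # raw data
        elif clim_mean[k] == missing:
            result.append(missing)
        elif transformation == "cov":
            result.append(value - clim_mean[k])  # anomaly
        elif clim_std[k] == missing or clim_std[k] <= 0.0:
            result.append(missing)  # assumed behaviour, not documented here
        else:
            result.append((value - clim_mean[k]) / clim_std[k])  # standardized anomaly
    return result


sst = [10.0, 12.0, MISSING, 15.0]
print(normalize_series(sst, [9.0, 13.0], [2.0, 4.0], "cor"))  # [0.5, -0.25, 1e+20, 0.5]
```

The toy climatology has a periodicity of 2, so time steps 1 and 3 are compared with the first climatological step and time steps 2 and 4 with the second.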
The -d=type_of_distance argument is used only if -a=spa is specified. If:
- -d=dist2, the anomalies are standardized by a weighted standard-deviation. The sum of squares associated with a grid-point is weighted according to the surface associated with that grid-point when computing the standard-deviation over the domain
- -d=ident, the anomalies are standardized by a simple arithmetic mean of the standard-deviations in the domain.
The -l= argument selects the indices of the time steps which must be included in the output NetCDF file. The indices of the time steps are counted from the start of the (selected) time period (e.g., time1 in the -t=time1,time2 argument or 1 if this argument is missing). The argument list can be specified in two forms:
-l=n1,n2,…nn standardizes and selects the time steps n1, n2, … and nn.
If periodicity is defined (with the -p= option or if -a= is set to cov, cor or spa), the n1, n2, … nn time steps are selected for each period separately (see second example).
-l=n1:n2 standardizes and selects the time steps from n1 to n2 (or from n1 to n2 for each period separately, if periodicity is defined with the -p= option or if -a= is set to cov, cor or spa).
The two forms of the -l= argument may be combined and repeated any number of times. Duplicate time steps are not allowed.
Be careful with time period limits when specifying the -l= argument.
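The way the two forms of the -l= argument interact with the periodicity can be sketched in Python as follows. This is an illustration under stated assumptions, not the NCSTAT parser: the function names are hypothetical, the forms are assumed to be combined as comma-separated items, each step is assumed to lie between 1 and the periodicity, and steps falling beyond the end of the selected time period are simply dropped.

```python
def parse_time_selection(specs):
    """Turn -l= style strings such as "1,2" or "6:8" into a list of 1-based steps."""
    steps = []
    for spec in specs:
        for item in spec.split(","):
            if ":" in item:
                first, last = (int(part) for part in item.split(":"))
                if first > last:
                    raise ValueError("n1 must not exceed n2 in n1:n2")
                steps.extend(range(first, last + 1))
            else:
                steps.append(int(item))
    if len(set(steps)) != len(steps):
        raise ValueError("duplicate time steps are not allowed")
    return steps


def expand_over_periods(steps, ntime, periodicity):
    """Repeat the selected steps for each period of the selected time window."""
    selected = []
    for start in range(0, ntime, periodicity):
        for step in steps:
            if not 1 <= step <= periodicity:
                raise ValueError("assumed here: 1 <= step <= periodicity")
            index = start + step
            if index <= ntime:  # ignore steps past the end of the time period
                selected.append(index)
    return selected


steps = parse_time_selection(["6:8"])
print(expand_over_periods(steps, 24, 12))  # [6, 7, 8, 18, 19, 20]
```

With two years of monthly data and a periodicity of 12, a selection of 6:8 would thus pick the sixth to eighth time steps of each year. When no periodicity applies, passing periodicity equal to ntime returns the listed steps unchanged.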
If the -p= argument is specified and -a=cov, -a=cor, -a=spa or -a=tim, the periodicity deduced from the climatology (given by the -c= argument) overrides the -p= argument.
If the variable used to compute the input climatology does not have the same name as the variable specified by the -v= argument, use the -cv= argument to specify the variable name for the climatology.
The -mi=missing_value argument specifies the missing value indicator associated with the netcdf_variable (specified by the -v= argument) in the output_netcdf_file. missing_value must be a real number outside of the range of the netcdf_variable.
If the -mi= argument is not specified, missing_value is set to 1.e+20.
The -double argument specifies that the output NetCDF variable must be stored as double-precision floating point numbers instead of single-precision floating point numbers in the output_netcdf_file.
The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 macros (e.g., -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher.
If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.
The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 macro (e.g., -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher.
If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized in the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.
If the -compact argument is specified and if a domain is selected (with the -x= and -y= arguments) then only data for the selected domain will be output. By default, the whole grid is stored (with missing values outside the selected domain).
Duplicate parameters are allowed, but it is always the last occurrence of a parameter that is used for the computations.
It is assumed that the specified netcdf_variable has a scalar missing_value or _FillValue attribute and that missing values in the data are identified by the value of this missing_value or _FillValue attribute.
Outputs¶
comp_norm_miss_3d creates an output NetCDF file that contains the coordinate NetCDF variables of the input NetCDF dataset input_netcdf_file and the transformed NetCDF variable. This NetCDF variable will have the same dimensions and name as the input NetCDF variable in the file input_netcdf_file (in the description below, nlat and nlon are the lengths of the spatial dimensions of the input NetCDF variable)
- netcdf_variable(ntime,nlat,nlon) : the transformed NetCDF variable as specified by the -m=, -a= and -mi= arguments.
By default, the whole 2-D grid associated with the input NetCDF variable is stored (with missing values outside the selected domain). Note, however, that if the -compact argument is used, the geographical dimensions of the output NetCDF variable will be reduced to the selected domain as specified by the -x= and -y= arguments (i.e., in this case nlat=lat2-lat1+1 and nlon=lon2-lon1+1). The number of time steps written in the output NetCDF file (i.e., ntime) is determined from the -t=, -l= and -p= arguments.
Examples¶
To compute time series of (monthly) anomalies from a NetCDF variable sst with missing values extracted from a file named Hadsst2_1m_190001_200512_sst.nc and store the results in the NetCDF file anoma_Hadsst2_1m_190001_200512_sst.nc, use the following command:
$ comp_norm_miss_3d \
-f=Hadsst2_1m_190001_200512_sst.nc \
-v=sst \
-a=cov \
-c=clim_ST7_1m_00101_20012_grid_T_sosstsst.nc \
-o=anoma_Hadsst2_1m_190001_200512_sst.nc