comp_norm_3d
Authors
Eric Maisonnave (CERFACS) and Pascal Terray (LOCEAN/IPSL)
Latest revision
19/11/2017
Purpose
Select, transform and normalize time series from a tridimensional NetCDF variable extracted from a NetCDF dataset.
The procedure allows a large variety of transformations on the input tridimensional NetCDF variable such as:
- removing and applying scale_factor and add_offset attributes if they are present
- changing the missing_value attribute (with the -mi= argument)
- applying a mesh-mask supplied as input to the procedure (with the -m= argument)
- selecting only specific time steps in the output file (with the -t=, -l= and -p= arguments)
- centering or standardizing the time series associated with selected cells of the 2-D grid-mesh associated with the input tridimensional NetCDF variable (with the -a= argument) with the help of an input climatology (specified with the -c= argument)
- reducing the spatial dimensions of the output NetCDF dataset (with the -compact argument).
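The first item of the list above refers to the usual NetCDF packing convention, in which stored values are unpacked as value * scale_factor + add_offset. The following minimal sketch shows that rule; the function name unpack is hypothetical, and it is an assumption that comp_norm_3d applies exactly this formula internally.

```python
def unpack(packed, scale_factor=1.0, add_offset=0.0):
    """Unpack stored NetCDF values with the usual packing convention.

    unpacked = packed * scale_factor + add_offset; the defaults leave the
    data unchanged when the attributes are absent.
    """
    return [v * scale_factor + add_offset for v in packed]


# Hypothetical packed SST values stored as short integers.
print(unpack([100, 250], scale_factor=0.01, add_offset=20.0))
```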
An output NetCDF file containing the transformed tridimensional variable is created.
If your data contain missing values, use comp_norm_miss_3d instead of comp_norm_3d to transform your dataset.
Finally, if the NetCDF variable is four-dimensional, use comp_norm_4d instead of comp_norm_3d.
Further Details
Usage
$ comp_norm_3d \
-f=input_netcdf_file \
-v=netcdf_variable \
-m=input_mesh_mask_netcdf_file (optional) \
-g=grid_type (optional : n, t, u, v, w, f) \
-r=resolution (optional : r2, r4) \
-b=nlon_orca, nlat_orca (optional) \
-x=lon1,lon2 (optional) \
-y=lat1,lat2 (optional) \
-t=time1,time2 (optional) \
-c=input_climatology_netcdf_file (optional) \
-a=type_of_transformation (optional : scp, cov, cor, spa, tim) \
-d=type_of_distance (optional : dist2, ident) \
-o=output_netcdf_file (optional) \
-p=periodicity (optional) \
-l=selected_time_period (optional) \
-cv=climatology_netcdf_variable (optional) \
-mi=missing_value (optional) \
-double (optional) \
-bigfile (optional) \
-hdf5 (optional) \
-compact (optional) \
-tlimited (optional)
By default
- -g=
- the grid_type is set to n, which means that the 2-D grid-mesh associated with the input NetCDF variable is assumed to be regular or Gaussian
- -r=
- if the input netcdf_variable is from the NEMO or ORCA model (i.e. if the -g= argument is not set to n), the resolution is assumed to be r2
- -b=
- if -g= is not set to n, the dimensions of the 2-D grid-mesh, nlon_orca and nlat_orca, are determined from the -r= argument. However, you may override this default choice with the -b= argument
- -x=
- the whole longitude domain associated with the netcdf_variable
- -y=
- the whole latitude domain associated with the netcdf_variable
- -t=
- the whole time period associated with the netcdf_variable
- -a=
- the type_of_transformation is set to scp. This means that the time series in the 2-D grid-mesh associated with the input netcdf_variable are written as raw data without any centering or standardization
- -d=
- the type_of_distance is set to dist2
- -o=
- the output_netcdf_file is named norm_netcdf_variable.nc, where netcdf_variable is the name given with the -v= argument
- -p=
- the periodicity is equal to the periodicity of the climatology if -a=cov or -a=cor, or to time2-time1+1 if -a=scp
- -l=
- the whole time period as specified by the -t=time1,time2 argument
- -cv=
- this argument has the same value as the -v= argument
- -mi=
- the missing_value in the output NetCDF file is set to 1.e+20
- -double
- the data are stored as single-precision floating point numbers in the output NetCDF file. If -double is activated, the data are stored as double-precision floating point numbers
- -bigfile
- a NetCDF classic format file is created. If -bigfile is activated, the output NetCDF file is a 64-bit offset format file
- -hdf5
- a NetCDF classic format file is created. If -hdf5 is activated, the output NetCDF file is a NetCDF-4/HDF5 format file
- -compact
- the output NetCDF file is not compacted
- -tlimited
- the time dimension is defined as unlimited in the output NetCDF file. However, if -tlimited is activated, the time dimension is defined as limited in the output NetCDF file
Remarks
The -v=netcdf_variable argument specifies the NetCDF variable which must be transformed and the -f=input_netcdf_file argument specifies that this NetCDF variable must be extracted from the NetCDF file, input_netcdf_file.
The optional argument -m=input_mesh_mask_netcdf_file specifies the land-sea mask to apply to netcdf_variable for transforming this tridimensional NetCDF variable. By default, it is assumed that each cell in the 2-D grid-mesh associated with the input tridimensional NetCDF variable is a valid time series which must be written in the output NetCDF file.
The geographical shapes of the netcdf_variable (in the input_netcdf_file) and the mask (in the input_mesh_mask_netcdf_file) must agree if an input_mesh_mask_netcdf_file is used.
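For one time step, applying the mask can be pictured as in the minimal sketch below. It assumes that the mask holds 1 for valid (e.g. ocean) cells and 0 elsewhere, and that masked cells are written with the missing value (1.e+20 by default); both are assumptions, as is the hypothetical helper name apply_mask. The shape check mirrors the requirement stated above.

```python
MISSING_VALUE = 1.e+20  # default value used when -mi= is not given


def apply_mask(field, mask, missing=MISSING_VALUE):
    """Keep cells where mask is non-zero and flag the others as missing.

    field and mask are 2-D lists indexed [lat][lon]; mask is assumed to
    hold 1 for valid cells and 0 elsewhere.
    """
    if len(field) != len(mask) or any(
        len(row) != len(mrow) for row, mrow in zip(field, mask)
    ):
        raise ValueError("shapes of the variable and of the mask must agree")
    return [
        [value if m else missing for value, m in zip(row, mrow)]
        for row, mrow in zip(field, mask)
    ]
```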
Refer to comp_clim_3d or comp_mask_3d for creating a valid input_mesh_mask_netcdf_file NetCDF file for regular or Gaussian grids before using comp_norm_3d.
If the -x=lon1,lon2 and -y=lat1,lat2 arguments are missing, the whole geographical domain associated with the netcdf_variable is used.
The longitude or latitude range must be a vector of two integers specifying the first and last selected indices along each dimension. The indices are relative to 1. Negative values are allowed for lon1. In this case, the longitude domain is from nlon+lon1+1 to lon2, where nlon is the number of longitude points in the grid associated with the NetCDF variable, and the grid is assumed to be periodic.
Refer to comp_mask_3d for transforming geographical coordinates into indices before using comp_norm_3d.
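The wrap-around rule for a negative lon1 can be sketched as follows. The helper name lon_indices is hypothetical; it simply lists the 1-based longitude indices that the rule selects on a periodic grid of nlon points.

```python
def lon_indices(lon1, lon2, nlon):
    """Return the 1-based longitude indices selected by -x=lon1,lon2.

    A negative lon1 starts the domain at nlon + lon1 + 1 and wraps around
    the periodic grid up to lon2.
    """
    if lon1 < 0:
        start = nlon + lon1 + 1
        return list(range(start, nlon + 1)) + list(range(1, lon2 + 1))
    return list(range(lon1, lon2 + 1))


# A 180-point grid: -x=-10,20 covers indices 171..180 and then 1..20.
sel = lon_indices(-10, 20, 180)
print(sel[0], sel[-1], len(sel))  # 171 20 30
```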
If the -t=time1,time2 argument is missing, the whole time period associated with the netcdf_variable is used. The selected time period is a vector of two integers specifying the first and last time observations. The indices are relative to 1.
If -g= is set to t, u, v, w or f, it is assumed that the NetCDF variable is from an experiment with the ORCA model. In this case, the duplicate points of the ORCA grid are removed before the transformation, as far as possible, and in particular if the whole globe is used as the geographical domain. On output, the duplicate points are restored when writing the output file if, and only if, the whole globe is used as the geographical domain. If -g= is set to n, it is assumed that the grid has no duplicate points.
If -g= is set to t, u, v, w or f (i.e. if the NetCDF variable is from an experiment with the ORCA model), the -r= argument gives the resolution used:
- -r=r2 means that the NetCDF variable is from an experiment with the ORCA R2 model
- -r=r4 means that the NetCDF variable is from an experiment with the ORCA R4 model.
If the NetCDF variable is from an experiment with the ORCA model but the resolution is not R2 or R4, the dimensions of the ORCA grid must be specified explicitly with the -b= argument.
The -a= argument specifies whether the data are centered or standardized with an input climatology (specified with the -c= argument):
- -a=scp means that the raw data are output
- -a=cov means that the anomalies are output
- -a=cor means that the standardized anomalies are output
- -a=spa means that the standardized anomalies are output, but the anomalies are standardized by the standard-deviation averaged over the specified domain for each selected time step
- -a=tim means that the standardized anomalies are output, but the anomalies are standardized by the standard-deviation averaged over the selected time steps for each grid-point.
The input_climatology_netcdf_file specified with the -c= argument is needed only if -a=cov, -a=cor, -a=spa or -a=tim.
If -a=cov, -a=cor, -a=spa or -a=tim, the selected time period must agree with the climatology. This means that the first selected time observation (time1 if the -t= argument is present) must correspond to the first day, month or season of the climatology specified with the -c= argument.
The geographical shapes of the netcdf_variable (input_netcdf_file), the mask (input_mesh_mask_netcdf_file), the scale factors (input_mesh_mask_netcdf_file) and the climatology (input_climatology_netcdf_file) must agree.
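The scp, cov and cor options can be illustrated for a single grid-point with the minimal sketch below. It assumes that the climatology provides a mean and a standard-deviation for each of the p steps of a period (e.g. p=12 for a monthly climatology) and that the first selected time step falls on the first step of the climatology, as required above. The spa and tim options, which use averaged standard-deviations, are not reproduced. The function name and argument layout are illustrative, not part of NCSTAT.

```python
def transform(series, clim_mean, clim_std, kind="scp"):
    """Apply an -a= style transformation to the time series of one grid-point.

    series    : values of the selected time steps; series[0] must fall on
                the first step of the climatology
    clim_mean : climatological mean for each of the p steps of a period
    clim_std  : climatological standard-deviation for each step
    kind      : "scp" (raw data), "cov" (anomalies) or "cor" (standardized)
    """
    p = len(clim_mean)  # periodicity deduced from the climatology
    out = []
    for t, x in enumerate(series):
        k = t % p  # position of time step t inside its period
        if kind == "scp":
            out.append(x)
        elif kind == "cov":
            out.append(x - clim_mean[k])
        elif kind == "cor":
            out.append((x - clim_mean[k]) / clim_std[k])
        else:
            raise ValueError(f"unsupported transformation: {kind}")
    return out


# Two periods of a hypothetical 3-step climatology.
print(transform([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 2.0, 3.0], [2.0, 2.0, 2.0], "cor"))
# [0.0, 0.0, 0.0, 1.5, 1.5, 1.5]
```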
The -d=type_of_distance argument is used only if -a=spa is specified:
- -d=dist2 means that the anomalies are standardized by a weighted standard-deviation. The sum of squares associated with a grid-point is weighted according to the surface associated with that grid-point when computing the standard-deviation over the domain
- -d=ident means that the anomalies are standardized by a simple arithmetic standard-deviation.
The -l= argument selects the indices of the time steps which must be included in the output NetCDF file. The indices of the time steps are counted from the start of the (selected) time period (i.e. time1 in the -t=time1,time2 argument, or 1 if this argument is missing). The argument list can be specified in two forms:
- -l=n1,n2,…nn standardizes and selects the time steps n1, n2, … and nn. If a periodicity is defined (with the -p= option or if -a= is set to cov, cor or spa), the time steps n1, n2, … nn are selected for each period separately (see the second example below)
- -l=n1:n2 standardizes and selects the time steps from n1 to n2 (or from n1 to n2 in each period separately, if a periodicity is defined with the -p= option or if -a= is set to cov, cor or spa).
The two forms of the -l= argument may be combined and repeated any number of times. Duplicate time steps are not allowed.
Be careful with time period limits when specifying the -l= argument.
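The selection rule of the -l= argument can be sketched as below. This is only an illustration under stated assumptions: the two forms are assumed to be combined with commas (e.g. -l=1:3,7), the selected period is assumed to contain a whole number of periods, and out-of-range indices are simply rejected; the helper names parse_l and select_times are hypothetical.

```python
from typing import List, Optional


def parse_l(arg: str) -> List[int]:
    """Expand an -l= string such as "1:3,7" into 1-based step indices.

    Assumes the n1,n2,... and n1:n2 forms are combined with commas.
    """
    steps: List[int] = []
    for item in arg.split(","):
        if ":" in item:
            n1, n2 = (int(v) for v in item.split(":"))
            steps.extend(range(n1, n2 + 1))
        else:
            steps.append(int(item))
    if len(set(steps)) != len(steps):
        raise ValueError("duplicate time steps are not allowed")
    return steps


def select_times(arg: str, ntime: int, period: Optional[int] = None) -> List[int]:
    """Return the 1-based indices (counted from time1) kept in the output.

    Without a periodicity the -l= indices are used once over the whole
    selected period; with a periodicity p they are applied to each period.
    """
    steps = parse_l(arg)
    p = ntime if period is None else period
    # Assumption: ntime is a whole number of periods and indices fit in one.
    if ntime % p != 0 or not all(1 <= s <= p for s in steps):
        raise ValueError("-l= indices do not agree with the time period limits")
    return [start + s for start in range(0, ntime, p) for s in steps]


# Two 365-day years, keeping days 1 to 120 of each year (-p=365 -l=1:120).
kept = select_times("1:120", ntime=730, period=365)
print(len(kept), kept[119], kept[120])  # 240 120 366
```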
If the -p= argument is specified and -a=cov, -a=cor, -a=spa or -a=tim, the periodicity deduced from the climatology (given by the -c= argument) overrides the -p= argument.
If the variable used to compute the climatology does not have the same name as the variable specified by the -v= argument, use the -cv= argument to specify the name of the variable in the climatology file.
The -mi=missing_value argument specifies the missing value indicator associated with the netcdf_variable (specified by the -v= argument) in the output_netcdf_file. missing_value must be a real number outside of the range of the netcdf_variable. If the -mi= argument is not specified, missing_value is set to 1.e+20.
The -double argument specifies that the output NetCDF variable must be stored as double-precision floating point numbers instead of single-precision floating point numbers in the output_netcdf_file.
The -bigfile argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF36 or _USE_NETCDF4 macros (e.g. -D_USE_NETCDF36 or -D_USE_NETCDF4) and linked to the NetCDF 3.6 library or higher. If this argument is specified, the output_netcdf_file will be a 64-bit offset format file instead of a NetCDF classic format file. However, this argument is recognized by the procedure only if the NCSTAT software has been built with the _USE_NETCDF36 or _USE_NETCDF4 CPP macros.
The -hdf5 argument is allowed only if the NCSTAT software has been compiled with the _USE_NETCDF4 macro (e.g. -D_USE_NETCDF4) and linked to the NetCDF 4 library or higher. If this argument is specified, the output_netcdf_file will be a NetCDF-4/HDF5 format file instead of a NetCDF classic format file. However, this argument is recognized by the procedure only if the NCSTAT software has been built with the _USE_NETCDF4 CPP macro.
If the -compact argument is specified and if a domain is selected (with the -x= and -y= arguments), then only data for the selected domain will be output. By default, the whole grid is stored (with missing values outside the selected domain).
Duplicate parameters are allowed, but only the last occurrence of a parameter is used for the computations.
It is assumed that the data have no missing values. If this is not the case, use comp_norm_miss_3d instead of comp_norm_3d.
Outputs
comp_norm_3d creates an output NetCDF file that contains the coordinate NetCDF variables of the input NetCDF dataset input_netcdf_file and the transformed NetCDF variable. This NetCDF variable will have the same dimensions and name as the input NetCDF variable in the file input_netcdf_file (in the description below, nlat and nlon are the lengths of the spatial dimensions of the input NetCDF variable):
- netcdf_variable(ntime,nlat,nlon): the transformed NetCDF variable as specified by the -m=, -a= and -mi= arguments.
By default, the whole grid associated with the input NetCDF variable is stored (with missing values outside the selected domain). Note, however, that if the -compact argument is used, the geographical dimensions of the output NetCDF variable will be reduced to the selected domain as specified by the -x= and -y= arguments (i.e. in this case nlat=lat2-lat1+1 and nlon=lon2-lon1+1). The number of time steps written in the output NetCDF file (i.e. ntime) is determined from the -t=, -l= and -p= arguments.
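The difference between the default output grid and a compacted one can be sketched for a single time step as follows. The sketch assumes positive lon1 and lat1 (no longitude wrap-around) and the default missing value; the helper name output_grid is hypothetical.

```python
def output_grid(field, lon1, lon2, lat1, lat2, compact=False, missing=1.e+20):
    """Return the 2-D grid written for one time step.

    field is indexed [lat][lon]; lon1..lon2 and lat1..lat2 are 1-based,
    inclusive and assumed positive (no longitude wrap-around here).
    """
    if compact:
        # Only the selected domain: nlat = lat2-lat1+1, nlon = lon2-lon1+1.
        return [row[lon1 - 1:lon2] for row in field[lat1 - 1:lat2]]
    # Whole grid, with missing values outside the selected domain.
    return [
        [
            value if lat1 <= j <= lat2 and lon1 <= i <= lon2 else missing
            for i, value in enumerate(row, start=1)
        ]
        for j, row in enumerate(field, start=1)
    ]
```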
Examples
For computing time series of (monthly) anomalies from a NetCDF variable sosstsst stored in a file ST7_1m_00101_20012_grid_T_sosstsst.nc, applying a specific mask to the resulting time series and, finally, storing the results in the NetCDF file anoma_ST7_1m_00101_20012_grid_T_sosstsst.nc, use the following command (note that the output file is compacted):
$ comp_norm_3d \
-f=ST7_1m_00101_20012_grid_T_sosstsst.nc \
-v=sosstsst \
-g=t \
-m=meshmask.indopacific.nc \
-a=cov \
-c=clim_ST7_1m_00101_20012_grid_T_sosstsst.nc \
-o=anoma_ST7_1m_00101_20012_grid_T_sosstsst.nc \
-compact
For selecting the first 120 days of each year (with a 365-day calendar) from the daily NetCDF file ST7_1d_00101_20012_grid_T_sosstsst.nc, which includes a NetCDF variable sosstsst, and storing the results in the NetCDF file select_ST7_1d_00101_20012_grid_T_sosstsst.nc, use the following command:
$ comp_norm_3d \
-f=ST7_1d_00101_20012_grid_T_sosstsst.nc \
-v=sosstsst \
-m=meshmask.orca2.nc \
-g=t \
-p=365 \
-l=1:120 \
-o=select_ST7_1d_00101_20012_grid_T_sosstsst.nc