Usage

Basics

This section describes how to use NCSTAT operators and gives an overview of the available functionalities.

After NCSTAT has compiled successfully, it is recommended (though not mandatory) to add the directory containing the NCSTAT executables to your executable search path, if it is not already there. For example, if your Shell is sh or bash, execute the following command (preferably in your startup file):

$ export PATH="$EXECDIR:$PATH"

where EXECDIR is a Shell variable containing the name/path you have specified in your $NCSTATDIR/make.inc file for the location of the NCSTAT executables.
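
If the export line is placed in a startup file, it may be re-executed by nested Shells and add the same directory several times. The following is a minimal sketch of a guard against that, assuming a sh/bash Shell; the helper name path_prepend and the fallback directory are hypothetical and are not part of NCSTAT:

```shell
# Hypothetical helper (not part of NCSTAT): prepend a directory to PATH
# only if it is not already listed there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: nothing to do
        *) PATH="$1:$PATH" ;;     # otherwise put it first
    esac
    export PATH
}

# EXECDIR is assumed to hold the executables directory chosen in make.inc;
# the fallback below is only a placeholder for illustration.
EXECDIR="${EXECDIR:-$HOME/ncstat/bin}"
path_prepend "$EXECDIR"
```

Calling path_prepend a second time with the same directory leaves PATH unchanged.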

Assuming that you have updated your search path as described above, all the NCSTAT operators can be directly executed at the command line and follow this syntax:

NCSTAT_operator -arg1 -arg2 ... -argN

The last three characters of the name of each NCSTAT operator implicitly indicate which NetCDF variables the operator can process. For example, comp_clim_3d processes tridimensional NetCDF variables, comp_clim_4d processes fourdimensional NetCDF variables, and so on.

The number and meaning of the arguments, -arg1 -arg2 ... -argN, depend on the NCSTAT_operator, but all the arguments begin with a - and can be specified in any order. There are two categories of arguments in the NCSTAT operators:

  • arguments that switch a flag on or off and thus do not take any value
  • arguments that take a value (e.g., a string, an integer, a real number). In that case, the name of the argument is followed by = and you must give the value or values immediately after, with no space between = and the value or values (several values must be separated by commas).
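
As an illustration of both categories, here is a sketch of a call to comp_clim_3d that mixes value arguments and a switch argument. The file name sst.nc, the variable name sst and the numerical values are hypothetical; only the argument names come from the usage message of the operator. The operator is executed only if it is found in the search path:

```shell
# Value arguments are written name=value with no space around '=';
# several values are separated by commas (here, two time indices).
# -double is a switch argument and takes no value. The order is free.
cmd="comp_clim_3d -f=sst.nc -v=sst -p=12 -t=1,120 -double"

if command -v comp_clim_3d >/dev/null 2>&1; then
    $cmd                      # run the operator (sst.nc is a placeholder)
else
    echo "would run: $cmd"    # NCSTAT not found in PATH: just show the command
fi
```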

To quickly list the available arguments for a given NCSTAT operator, just execute this operator without any argument:

$ NCSTAT_operator

This prints on the screen the purpose of the NCSTAT operator, the list of available arguments for this operator and, finally, whether this operator is parallelized with OpenMP (taking into account how you have compiled NCSTAT). As an illustration, if you execute:

$ comp_clim_3d

You will obtain the following information on the screen (output may differ slightly depending on your compilation options):

Purpose :

Compute a climatology from a tridimensional variable
extracted from a NetCDF dataset.

Usage :

comp_clim_3d -f=input_netcdf_file
             -v=netcdf_variable
             -p=periodicity                    (optional)
             -x=lon1,lon2                      (optional)
             -y=lat1,lat2                      (optional)
             -t=time1,time2                    (optional)
             -c=output_climatology_netcdf_file (optional)
             -m=output_mesh_mask_netcdf_file   (optional)
             -yl=latl1,latl2                   (optional)
             -mi=missing_value                 (optional)
             -fmsk=input_mesh_mask_netcdf_file (optional)
             -vmsk=mesh_mask_netcdf_variable   (optional)
             -val=mask_value                   (optional)
             -rel=mask_relation                (optional : eq, gt, ge, lt, le)
             -ntr=number_of_time_records       (optional)
             -double                           (optional)
             -bigfile                          (optional)
             -hdf5                             (optional)
             -tlimited                         (optional)

By default :

-p= the periodicity is set to 1
-x= the whole longitude domain associated with the netcdf_variable
-y= the whole latitude domain associated with the netcdf_variable
-t= the whole time period associated with the netcdf_variable
-c=clim_netcdf_variable.nc
-m= the mesh-mask NetCDF file is not created
-yl= it is assumed that the domain is the whole globe or that the grid is regular
-mi= the missing_value for the STD variable is equal to 1.e+20
-fmsk= an input_mesh_mask_netcdf_file is not used
-vmsk= an input mesh_mask_netcdf_variable is not used
-val=1.
-rel=eq
-ntr= the number_of_time_records is equal to the periodicity
-double   : by default, the STD variable is stored as single floating numbers
-bigfile  : by default, a NetCDF classical format file is created
-hdf5     : by default, a NetCDF classical format file is created
-tlimited : by default, the time dimension is defined as unlimited

This procedure is parallelized with OpenMP

As you can see, some arguments are mandatory and others are optional. The mandatory arguments always take a value, but optional arguments can be of either type. Finally, the last line tells you whether this operator supports OpenMP parallelism.

Common arguments

The following arguments are available for almost all the NCSTAT operators and have the same meaning across the operators.

The common arguments with values of the NCSTAT operators are:

Common arguments with values of the NCSTAT operators

Argument  Meaning or use
-f=       specify an input NetCDF file
-v=       specify an input NetCDF variable
-c=       specify an input NetCDF climatology file produced by comp_clim_3d or similar operators
-x=       specify a longitude domain associated with an input NetCDF variable
-y=       specify a latitude domain associated with an input NetCDF variable
-z=       specify a vertical extension associated with an input NetCDF variable
-t=       specify a time interval associated with an input NetCDF variable
-p=       specify the periodicity of the input data
-m=       specify a mesh-mask NetCDF file produced by comp_clim_3d, comp_mask_3d or similar operators
-o=       specify an output NetCDF file
-mi=      specify a missing indicator value/attribute in the data

Note that for the -x=, -y= and -t= arguments, the longitude/latitude domains and the time interval are specified as integer indices. The coordinate NetCDF variables, if they exist, are not used for this selection, although they are always copied from the input NetCDF file to the output NetCDF files.

You can use the operators comp_mask_3d and comp_mask_4d to transform geographical coordinates into integer indices for use with the NCSTAT operators. Alternatively, you can select the region of interest for your analysis by passing a (land-sea) mask to the NCSTAT operator with the -m= argument; the appropriate mask can also be constructed with the help of the comp_mask_3d and comp_mask_4d operators.

The common arguments without any value of the NCSTAT operators are:

Common “switch” arguments of the NCSTAT operators

Argument   Meaning or use
-double    specify that the results of a NCSTAT operator must be stored as double floating-point numbers in the output NetCDF files
-tlimited  specify that the time dimension must be defined as limited in the output NetCDF files
-bigfile   specify that the output files must be 64-bit offset format NetCDF files
-hdf5      specify that the output files must be NetCDF-4/HDF5 format NetCDF files

Parallel execution

Users may request a specific number of OpenMP threads to distribute the work done by the NCSTAT operators, provided OpenMP support was activated when NCSTAT was compiled. As a general rule, do not request more OpenMP threads than the number of processors available on your machine (not counting the logical processors provided by hyperthreading), because this will result in a large loss of performance. Keep in mind also that the efficiency of shared memory parallelism as implemented in NCSTAT with OpenMP depends heavily on the workload of your shared memory computer at runtime.

More generally, threading performance of the NCSTAT operators will depend on a variety of factors including the compiler, the version of the OpenMP library, the processor type, the number of cores, the amount of available memory, whether hyperthreading is enabled and the mix of applications that are executing concurrently with the NCSTAT operator.

At the simplest level, the number of OpenMP threads used by the NCSTAT operators is controlled by setting the OMP_NUM_THREADS environment variable to the desired number of threads; this number then stays the same throughout the execution of the commands. OMP_NUM_THREADS must be defined before the NCSTAT operators are run in order to activate OpenMP parallelism.

Setting OpenMP environment variables is done the same way you set any other environment variables, and depends upon which Shell you use:

Setting the number of OpenMP threads to be used

Shell     Command line
csh/tcsh  setenv OMP_NUM_THREADS 8
sh/bash   export OMP_NUM_THREADS=8
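
To follow the rule of not requesting more threads than available processors, the requested value can be capped before it is exported. The sketch below assumes a sh/bash Shell and a system where getconf reports _NPROCESSORS_ONLN (common on Linux and macOS, but not guaranteed everywhere). Note that this count includes the logical processors provided by hyperthreading, so on such machines you may want a smaller value:

```shell
requested=8   # number of threads you would like to use (illustrative value)

# Number of online (logical) processors; fall back to 1 if getconf fails.
available=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Never ask for more threads than processors reported by the system.
if [ "$requested" -gt "$available" ]; then
    requested=$available
fi

export OMP_NUM_THREADS="$requested"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```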

In some cases, an OpenMP application will perform better if its OpenMP threads are bound to processors/cores (this is called “thread affinity”, “thread binding” or “processor affinity”) because this can result in better cache utilization, thereby reducing costly memory accesses. OpenMP version 3.1 API provides an environment variable to turn processor binding “on” or “off”. For example, to turn “on” thread binding you can use:

$ export OMP_PROC_BIND=TRUE   #if you are using a sh/bash Shell

Keep in mind also that the OpenMP standard does not specify how much stack space an OpenMP thread should have (this stack space is used to store private variables and arrays used only by each OpenMP thread). Consequently, OpenMP implementations differ in their default thread stack size, and on some systems the default can easily be exhausted by moderate or large applications. Threads that exceed their stack allocation may cause a segmentation fault, or the application may continue to run while data is being corrupted. If your OpenMP environment supports the OpenMP 3.0 OMP_STACKSIZE environment variable, you can use it to set the thread stack size prior to program execution. For example:

$ export OMP_STACKSIZE=10M   #if you are using a sh/bash Shell
$ export OMP_STACKSIZE=3000k #if you are using a sh/bash Shell

More generally, the run-time behaviour of the NCSTAT operators is also affected by other OpenMP environment variables (e.g., OMP_NESTED, OMP_MAX_ACTIVE_LEVELS or OMP_DYNAMIC) set just before the execution of the operators. However, starting with the OpenMP 5.0 API, the OMP_NESTED variable is deprecated and should be replaced by the OMP_MAX_ACTIVE_LEVELS variable, which has been available since the OpenMP 3.0 API. See OpenMP Environment Variables for more details on using the OMP_MAX_ACTIVE_LEVELS variable to turn nested OpenMP parallelism on or off and to set the level of nested parallelism in Fortran or C programs using OpenMP.

See also the official OpenMP documentation available at OpenMP or the more friendly tutorial OpenMP tutorial for more details and examples.

Note, in particular, that the STATPACK subroutines and functions used in the NCSTAT code may use OpenMP nested parallelism if the OMP_NESTED variable is set to TRUE or if OMP_MAX_ACTIVE_LEVELS is set to a value greater than 1. However, OpenMP nested parallelism is not recommended if you have compiled the NCSTAT operators or the STATPACK library with BLAS support and linked with a multi-threaded version of BLAS, such as [gotoblas], [openblas] or a vendor BLAS like Intel MKL [mkl]. In such cases, it is better to deactivate OpenMP nested parallelism before executing the NCSTAT operators, with the command:

$ export OMP_NESTED=FALSE   #if you are using a sh/bash Shell

or the command:

$ export OMP_MAX_ACTIVE_LEVELS=1   #if you are using a sh/bash Shell

and also to let OpenMP control the multi-threading in the BLAS library, if possible.

In the case of OpenBLAS [openblas] or GotoBLAS [gotoblas], this can be done by using the makefile option USE_OPENMP=1 when compiling OpenBLAS or GotoBLAS. Consult the OpenBLAS manual for more details [openblas]. On the other hand, if your OpenBLAS or GotoBLAS library has already been compiled with multi-threading enabled but without OpenMP support (this is the default setting), it is better to deactivate OpenBLAS or GotoBLAS multi-threading before executing your application; otherwise, OpenMP will not control the multi-threading in the BLAS library, which will likely result in a significant loss of performance. To do this, use a command like (for OpenBLAS):

$ export OPENBLAS_NUM_THREADS=1   #if you are using a sh/bash Shell

or (for GotoBLAS):

$ export GOTO_NUM_THREADS=1   #if you are using a sh/bash Shell

Similarly, for Intel MKL BLAS [mkl], it is better to let OpenMP control the multi-threading in the MKL BLAS. This can be done simply by undefining the Shell variable MKL_NUM_THREADS:

$ unset MKL_NUM_THREADS   #if you are using a sh/bash Shell
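
The BLAS-related settings above can be gathered in one place, for example in a startup file. The following sketch assumes a sh/bash Shell; the function name configure_blas_threads and its keywords are hypothetical and are not part of NCSTAT. The openblas and gotoblas cases are meant for libraries built with their own multi-threading but without OpenMP support:

```shell
# Hypothetical helper (not part of NCSTAT): apply the setting recommended
# above for the BLAS library that NCSTAT was linked with.
configure_blas_threads() {
    case "$1" in
        openblas) export OPENBLAS_NUM_THREADS=1 ;;  # OpenBLAS built without OpenMP support
        gotoblas) export GOTO_NUM_THREADS=1 ;;      # GotoBLAS built without OpenMP support
        mkl)      unset MKL_NUM_THREADS ;;          # let OpenMP control MKL threading
        *)        echo "unknown BLAS library: $1" >&2; return 1 ;;
    esac
}

# Deactivate nested OpenMP parallelism, then configure the BLAS library.
export OMP_MAX_ACTIVE_LEVELS=1
configure_blas_threads openblas
```

Replace openblas by gotoblas or mkl according to the library used at link time.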

Finally, if you suspect an error in the computations performed by a parallelized NCSTAT operator on your machine, a good strategy is to compare the results of a serial execution of this operator (i.e., by setting OMP_NUM_THREADS to 1) with those of a parallel execution of this operator (i.e., by setting OMP_NUM_THREADS to a value greater than 1). If the results differ, a second step is to recompile NCSTAT without the preprocessor cpp macro _PARALLEL_READ in order to detect if the origin of the problem is due to the OpenMP parallel reading of the NetCDF files and to some incompatibilities between the compiler options used to build the NetCDF library and NCSTAT.
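
The first step of this strategy can be sketched as follows for comp_clim_3d, assuming a sh/bash Shell. The input file sst.nc, the variable sst, the output file names and the helper compare_outputs are all hypothetical. The comparison uses the standard cmp utility, which compares bytes: a reported difference is only a hint to investigate further, since small floating-point differences between serial and parallel runs or metadata written in the files may also change the bytes.

```shell
# Hypothetical helper: report whether two files are byte-identical.
compare_outputs() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Run the operator serially, then in parallel, and compare the results.
# The block is skipped if comp_clim_3d is not found in the search path.
if command -v comp_clim_3d >/dev/null 2>&1; then
    OMP_NUM_THREADS=1 comp_clim_3d -f=sst.nc -v=sst -p=12 -c=clim_serial.nc
    OMP_NUM_THREADS=4 comp_clim_3d -f=sst.nc -v=sst -p=12 -c=clim_parallel.nc
    compare_outputs clim_serial.nc clim_parallel.nc
fi
```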

NCSTAT summary

This section re-organizes the NCSTAT operators by task, with a brief note indicating what each operator does. More details about each operator can be found in Chapter 2, where the full documentation for each NCSTAT operator is given in alphabetical order.

NCSTAT operators for compression/decompression of a NetCDF variable:

NCSTAT operators for compression/decompression

Operator  OpenMP  Description
          no      pack a tridimensional NetCDF variable
          no      unpack a tridimensional NetCDF variable

NCSTAT operators for computing mesh-mask files:

NCSTAT operators for computing mesh-mask files

Operator  OpenMP  Description
          no      compute a mesh-mask from a tridimensional NetCDF variable
          no      compute a mesh-mask from a fourdimensional NetCDF variable

NCSTAT operators for univariate statistics and computation of mesh-mask files:

NCSTAT operators for univariate statistics

Operator  OpenMP  Description
          yes     compute means and standard-deviations from a tridimensional NetCDF variable
          yes     compute means and standard-deviations from a fourdimensional NetCDF variable
          yes     compute means and standard-deviations from a tridimensional NetCDF variable with missing values
          yes     compute means and standard-deviations from a fourdimensional NetCDF variable with missing values
          yes     compute univariate statistics from a tridimensional NetCDF variable
          yes     compute univariate statistics from a fourdimensional NetCDF variable
          yes     compute univariate statistics from a tridimensional NetCDF variable with missing values

NCSTAT operators for composite analysis:

NCSTAT operators for composite analysis

Operator  OpenMP  Description
          yes     compute a composite analysis from a tridimensional NetCDF variable
          yes     compute a composite analysis from a fourdimensional NetCDF variable
          yes     compute a composite analysis from a tridimensional NetCDF variable with missing values

NCSTAT operators for transforming and time averaging of time series:

NCSTAT operators for computing, transforming and time averaging of time series

Operator  OpenMP  Description
          no      transform multi-channel time series from a tridimensional NetCDF variable
          no      transform multi-channel time series from a fourdimensional NetCDF variable
          no      transform multi-channel time series from a tridimensional NetCDF variable with missing values
          no      transform multi-channel time series from a fourdimensional NetCDF variable with missing values
          no      compute time-averages of multi-channel time series from a tridimensional NetCDF variable
          no      compute time-averages of multi-channel time series from a fourdimensional NetCDF variable
          no      compute time-averages of multi-channel time series from a tridimensional NetCDF variable with missing values
          no      compute time-averages of multi-channel time series from a fourdimensional NetCDF variable with missing values

NCSTAT operators for computing time series, cross-sections and vertical integrals:

NCSTAT operators for computing time series, cross-sections and vertical integrals

Operator  OpenMP  Description
          yes     compute a time series from a tridimensional NetCDF variable
          yes     compute a time series from a fourdimensional NetCDF variable
          yes     compute a time series from a tridimensional NetCDF variable with missing values
          yes     compute an index time series from two unidimensional NetCDF variables
          yes     compute a cross-section from a tridimensional NetCDF variable
          yes     compute a cross-section from a fourdimensional NetCDF variable
          yes     compute a cross-section from a tridimensional NetCDF variable with missing values
          yes     compute vertical integrals from a fourdimensional NetCDF variable
          yes     compute vertical integrals from a fourdimensional NetCDF variable with missing values

NCSTAT operators for decomposing time series:

NCSTAT operators for decomposing time series

Operator  OpenMP  Description
          no      decompose a time series from a unidimensional NetCDF variable by the STL method
          yes     decompose multi-channel time series from a tridimensional NetCDF variable by the STL method
          yes     decompose multi-channel time series from a fourdimensional NetCDF variable by the STL method
          no      estimate a trend from a unidimensional NetCDF variable by the LOESS method
          yes     estimate multi-channel trend from a tridimensional NetCDF variable by the LOESS method
          yes     estimate multi-channel trend from a fourdimensional NetCDF variable by the LOESS method

NCSTAT operators for filtering time series:

NCSTAT operators for filtering time series

Operator  OpenMP  Description
          no      filter a time series from a unidimensional NetCDF variable in a selected frequency band by windowed filtering
          yes     filter multi-channel time series from a tridimensional NetCDF variable in a selected frequency band by windowed filtering
          yes     filter multi-channel time series from a fourdimensional NetCDF variable in a selected frequency band by windowed filtering
          no      filter a time series from a unidimensional NetCDF variable in a selected frequency band by Lanczos filtering
          yes     filter multi-channel time series from a tridimensional NetCDF variable in a selected frequency band by Lanczos filtering
          yes     filter multi-channel time series from a fourdimensional NetCDF variable in a selected frequency band by Lanczos filtering
          no      filter a time series from a unidimensional NetCDF variable in a selected frequency band by linear symmetric filtering
          yes     filter multi-channel time series from a tridimensional NetCDF variable in a selected frequency band by linear symmetric filtering
          yes     filter multi-channel time series from a fourdimensional NetCDF variable in a selected frequency band by linear symmetric filtering
          no      estimate the transfer function of a Lanczos or linear symmetric filter

NCSTAT operators for correlation and regression analysis:

NCSTAT operators for correlation and regression analysis

Operator  OpenMP  Description
          yes     compute correlation and regression from an index time series and a unidimensional NetCDF variable
          yes     compute correlation and regression from an index time series and a tridimensional NetCDF variable
          yes     compute correlation and regression from an index time series and a fourdimensional NetCDF variable
          yes     compute correlation and regression from an index time series and a tridimensional NetCDF variable with missing values
          no      compute trend and regression from an index time series and a unidimensional NetCDF variable
          yes     compute trend and regression from an index time series and a tridimensional NetCDF variable
          yes     compute trend and regression from an index time series and a fourdimensional NetCDF variable

NCSTAT operators for multivariate statistics:

NCSTAT operators for multivariate statistics

Operator  OpenMP  Description
          yes     compute a Principal Component Analysis (PCA) from a tridimensional NetCDF variable
          yes     compute a Principal Component Analysis (PCA) from a fourdimensional NetCDF variable
          yes     compute a Principal Component Analysis (PCA) from a tridimensional NetCDF variable with missing values
          yes     compute a Maximum Covariance Analysis (MCA) from two tri- or fourdimensional NetCDF variables
          no      compute a PCA or MCA approximation of a tridimensional NetCDF variable
          no      compute a PCA or MCA approximation of a fourdimensional NetCDF variable
          yes     compute a PCA or MCA projection of a tridimensional NetCDF variable
          yes     compute a PCA or MCA projection of a fourdimensional NetCDF variable

NCSTAT operators for specialized multivariate statistics:

NCSTAT operators for orthogonal rotation of a partial PCA model

Operator  OpenMP  Description
          yes     perform an orthogonal rotation of a (partial) PCA model computed from a tridimensional NetCDF variable
          yes     perform an orthogonal rotation of a (partial) PCA model computed from a fourdimensional NetCDF variable
          yes     perform an orthogonal rotation of selected standardized Principal Components (PC) time series from a tridimensional NetCDF variable
          yes     perform an orthogonal rotation of selected standardized Principal Components (PC) time series from a fourdimensional NetCDF variable
          yes     perform an orthogonal rotation of selected standardized Principal Components (PC) time series from a tridimensional NetCDF variable
          yes     perform an orthogonal rotation of selected standardized Principal Components (PC) time series from a fourdimensional NetCDF variable

NCSTAT operators for power and cross-power spectrum analysis:

Operator  OpenMP  Description
          no      compute a power spectrum analysis from a unidimensional NetCDF variable
          no      compute a power or cross-power spectrum analysis from two unidimensional NetCDF variables
          yes     compute a power or cross-power spectrum analysis from an index time series and a tridimensional NetCDF variable
          yes     compute a power or cross-power spectrum analysis from an index time series and a tridimensional NetCDF variable
          no      compute variance estimates in a selected frequency band from a unidimensional NetCDF variable
          yes     compute variance estimates in a selected frequency band from a tridimensional NetCDF variable
          yes     compute variance estimates in a selected frequency band from a fourdimensional NetCDF variable
          no      compute power spectrum density ratios and their confidence intervals from power spectrum analyses of two unidimensional NetCDF variables
          no      compute power spectrum density ratios and their confidence intervals from power spectrum analyses of two tridimensional NetCDF variables
          no      perform nonparametric statistical tests about the difference and shape of two power spectra of two tridimensional NetCDF variables