Usage

Basics

This section describes how to use NCSTAT operators and gives an overview of the available functionalities.

After NCSTAT has compiled successfully, it is recommended, though not mandatory, to add the directory containing the NCSTAT executables to your executable search path if it is not already there. For example, if your Shell is sh or bash, execute the following command (preferably in your startup file):

$ export PATH="$EXECDIR:$PATH"

where EXECDIR is a Shell variable containing the name/path you have specified in your $NCSTATDIR/make.inc file for the location of the NCSTAT executables.
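
If the command is placed in a startup file, it may be executed several times in nested Shells. The following minimal sketch prepends EXECDIR only when it is not already in the search path. The fallback directory $HOME/ncstat/bin is a hypothetical placeholder, not a location defined by NCSTAT; replace it with the directory given in your make.inc file.

```shell
# EXECDIR should hold the directory of the NCSTAT executables.
# The fallback below is a hypothetical placeholder used only for illustration.
EXECDIR="${EXECDIR:-$HOME/ncstat/bin}"

# Prepend EXECDIR to PATH only if it is not already one of its entries.
case ":$PATH:" in
    *":$EXECDIR:"*) ;;                         # already present, nothing to do
    *) PATH="$EXECDIR:$PATH"; export PATH ;;
esac
```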

Assuming that you have updated your search path as described above, all the NCSTAT operators can be directly executed at the command line and follow this syntax:

NCSTAT_operator -arg1 -arg2 ... -argN

The last three characters of each NCSTAT operator name implicitly indicate which kind of NetCDF variable the operator can process. For example, comp_clim_3d processes tridimensional NetCDF variables, comp_clim_4d processes fourdimensional NetCDF variables, and so on.

The number and meaning of the arguments, -arg1 -arg2 ... -argN, depend on the NCSTAT_operator, but all the arguments begin with a - and can be specified in any order. There are two categories of arguments in the NCSTAT operators:

  • arguments that switch a flag on or off and thus don’t take any value
  • arguments that take a value (e.g. a string, an integer, a real number). In that case, the name of the argument is followed by = and you must give the value or values immediately after, with no space between = and the values.
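
To illustrate the two categories, the sketch below assembles a comp_clim_3d command line that mixes value arguments with one switch argument. The file name sst.nc, the variable name sst and the periodicity of 12 (monthly data) are hypothetical values chosen for illustration. The operator is only run if it is found in the search path and the input file exists.

```shell
infile="sst.nc"    # hypothetical input file
varname="sst"      # hypothetical NetCDF variable name

# Value arguments are written name=value with no space around "=";
# the switch argument -double takes no value. The order is free.
set -- comp_clim_3d -f="$infile" -v="$varname" -p=12 -double
echo "$*"

# Execute the command only when the operator is installed and the file exists.
if command -v comp_clim_3d >/dev/null 2>&1 && [ -f "$infile" ]; then
    "$@"
fi
```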

To quickly list the available arguments for a given NCSTAT operator, just execute this operator without any argument:

$ NCSTAT_operator

This will print on the screen the purpose of the NCSTAT operator, the list of available arguments for this operator and, finally, whether this operator is parallelized with OpenMP (taking into account how you have compiled NCSTAT). As an illustration, if you execute:

$ comp_clim_3d

You will obtain the following information on the screen (output may differ slightly depending on your compilation options):

Purpose :

Compute a climatology from a tridimensional variable
extracted from a NetCDF dataset.

Usage :

comp_clim_3d -f=input_netcdf_file
             -v=netcdf_variable
             -p=periodicity                    (optional)
             -x=lon1,lon2                      (optional)
             -y=lat1,lat2                      (optional)
             -t=time1,time2                    (optional)
             -c=output_climatology_netcdf_file (optional)
             -m=output_mesh_mask_netcdf_file   (optional)
             -yl=latl1,latl2                   (optional)
             -mi=missing_value                 (optional)
             -fmsk=input_mesh_mask_netcdf_file (optional)
             -vmsk=mesh_mask_netcdf_variable   (optional)
             -val=mask_value                   (optional)
             -rel=mask_relation                (optional : eq, gt, ge, lt, le)
             -ntr=number_of_time_records       (optional)
             -double                           (optional)
             -bigfile                          (optional)
             -hdf5                             (optional)
             -tlimited                         (optional)

By default :

-p= the periodicity is set to 1
-x= the whole longitude domain associated with the netcdf_variable
-y= the whole latitude domain associated with the netcdf_variable
-t= the whole time period associated with the netcdf_variable
-c=clim_netcdf_variable.nc
-m= the mesh-mask NetCDF file is not created
-yl= it is assumed that the domain is the whole globe
-mi= the missing_value for the STD variable is equal to 1.e+20
-fmsk= an input_mesh_mask_netcdf_file is not used
-vmsk= an input mesh_mask_netcdf_variable is not used
-val=1.
-rel=eq
-ntr= the number_of_time_records is equal to the periodicity
-double   : by default, the STD variable is stored as single floating numbers
-bigfile  : by default, a NetCDF classical format file is created
-hdf5     : by default, a NetCDF classical format file is created
-tlimited : by default, the time dimension is defined as unlimited

This procedure is parallelized with OpenMP

As you can see, some arguments are mandatory and others are optional. The mandatory arguments always take a value, but optional arguments can be of either type. Finally, the last line of the output tells you whether this operator supports OpenMP parallelism.
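
The help text can also be inspected from a script. The sketch below looks for the notice "is parallelized with OpenMP" shown in the output above. It assumes that operators without OpenMP support do not print this exact phrase and that the help text may be written to either standard output or standard error; neither point is stated in this section. The helper name reports_openmp is hypothetical.

```shell
# reports_openmp: succeeds if the help text read from standard input
# contains the OpenMP notice printed by parallelized operators.
reports_openmp() {
    grep -q "is parallelized with OpenMP"
}

# Check a few operators, skipping those that are not installed.
for op in comp_clim_3d comp_clim_4d; do
    if command -v "$op" >/dev/null 2>&1; then
        if "$op" 2>&1 | reports_openmp; then
            echo "$op: parallelized with OpenMP"
        else
            echo "$op: no OpenMP notice found"
        fi
    fi
done
```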

Common arguments

The following arguments are available for almost all the NCSTAT operators and have the same meaning across the operators.

The common arguments with values of the NCSTAT operators:

Argument  Meaning or use
-f=       use to specify an input NetCDF file
-v=       use to specify an input NetCDF variable
-c=       use to specify an input NetCDF climatology file produced by comp_clim_3d or similar operators
-x=       use to specify a longitude domain associated with an input NetCDF variable
-y=       use to specify a latitude domain associated with an input NetCDF variable
-z=       use to specify a vertical extension associated with an input NetCDF variable
-t=       use to specify a time interval associated with an input NetCDF variable
-p=       use to specify the periodicity of the input data
-m=       use to specify a mesh-mask NetCDF file produced by comp_clim_3d, comp_mask_3d or similar operators
-o=       use to specify an output NetCDF file
-mi=      use to specify a missing indicator value/attribute in the data

Note that for the -x=, -y= and -t= arguments, the latitude/longitude domains and the time interval are specified as integer indices and that the coordinate NetCDF variables, if they exist, are not used.

You can use the operators comp_mask_3d and comp_mask_4d to transform geographical coordinates into integer indices for use with the NCSTAT operators.
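
Because the time interval is given as indices, selecting whole years requires a little arithmetic. The sketch below builds a -t= argument for years counted from the start of the record. It assumes that indices start at 1 and that both bounds are inclusive, which is not stated above and should be checked against the documentation of the operator you use. The helper name time_range is hypothetical.

```shell
# time_range FIRST_YEAR LAST_YEAR [STEPS_PER_YEAR]
# Prints a -t= argument covering the given years of the record
# (years counted from 1; 12 time steps per year unless specified).
time_range() {
    first=$1
    last=$2
    steps=${3:-12}
    t1=$(( (first - 1) * steps + 1 ))   # first time step of FIRST_YEAR
    t2=$(( last * steps ))              # last time step of LAST_YEAR
    echo "-t=$t1,$t2"
}

time_range 3 5    # years 3 to 5 of a monthly series: prints -t=25,60
```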

The common arguments without any value of the NCSTAT operators are:

Argument   Meaning or use
-double    use to specify that the results of a NCSTAT operator must be stored as double floating-point numbers in the output NetCDF files
-tlimited  use to specify that the time dimension must be defined as limited in the output NetCDF files
-bigfile   use to specify that the output files must be 64-bit offset format NetCDF files
-hdf5      use to specify that the output files must be NetCDF-4/HDF5 format NetCDF files

Parallel execution

Users may request a specific number of OpenMP threads to distribute the work done by the NCSTAT operators, when OpenMP support has been activated at compilation of NCSTAT. As a general rule, don’t request more OpenMP threads than the number of processors available on your machine (not counting the logical processors added by hyperthreading), because doing so will result in a large loss of performance. Keep in mind also that the efficiency of shared memory parallelism as implemented in NCSTAT with OpenMP depends heavily on the workload of your shared memory computer at runtime.

More generally, threading performance of the NCSTAT operators will depend on a variety of factors including the compiler, the version of the OpenMP library, the processor type, the number of cores, the amount of available memory, whether hyperthreading is enabled and the mix of applications that are executing concurrently with the NCSTAT operator.

At the simplest level, the number of OpenMP threads used by the NCSTAT operators can be controlled by setting the OMP_NUM_THREADS OpenMP environment variable to the desired number of threads. The number of threads will then be the same throughout the execution of the commands. The OMP_NUM_THREADS environment variable must be defined before the NCSTAT operators are used in order to activate OpenMP parallelism.

Setting OpenMP environment variables is done the same way you set any other environment variables, and depends upon which Shell you use:

Setting the number of OpenMP threads to be used

Shell     Command line
csh/tcsh  setenv OMP_NUM_THREADS 8
sh/bash   export OMP_NUM_THREADS=8
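
To avoid requesting more threads than the machine provides, the requested value can be capped in a sh/bash script. The sketch below is one possible approach, assuming that getconf _NPROCESSORS_ONLN is available (it is on most Linux and macOS systems). Note that this value counts logical processors, so on a machine with hyperthreading it may overestimate the number of physical cores and a lower cap may be preferable.

```shell
requested=8     # desired number of OpenMP threads

# Number of online logical processors; fall back to 1 if it cannot be determined.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
case "$ncpu" in
    ''|*[!0-9]*) ncpu=1 ;;
esac

# Never request more threads than processors.
if [ "$requested" -gt "$ncpu" ]; then
    threads=$ncpu
else
    threads=$requested
fi

export OMP_NUM_THREADS="$threads"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```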

In some cases, an OpenMP application will perform better if its OpenMP threads are bound to processors/cores (this is called “thread affinity”, “thread binding” or “processor affinity”) because this can result in better cache utilization, thereby reducing costly memory accesses. OpenMP version 3.1 API provides an environment variable to turn processor binding “on” or “off”. For example, to turn “on” thread binding you can use:

$ export OMP_PROC_BIND=TRUE   #if you are using a sh/bash Shell

Keep in mind also that the OpenMP standard does not specify how much stack space an OpenMP thread should have. Consequently, implementations differ in the default thread stack size, and on some systems this default can easily be exhausted by moderate or large applications. Threads that exceed their stack allocation may cause a segmentation fault, or the application may continue to run while data is being corrupted. If your OpenMP environment supports the OpenMP 3.0 OMP_STACKSIZE environment variable, you can use it to set the thread stack size prior to program execution. For example:

$ export OMP_STACKSIZE=10M   #if you are using a sh/bash Shell
$ export OMP_STACKSIZE=3000k #if you are using a sh/bash Shell

More generally, the run-time behaviour of NCSTAT operators is also determined by other OpenMP environment variables (e.g. OMP_NESTED or OMP_DYNAMIC) set just before the execution of the operators. See the official OpenMP documentation or the more friendly OpenMP tutorial for more details and examples.

Note, in particular, that the STATPACK subroutines and functions used in the NCSTAT code may use OpenMP nested parallelism if the OMP_NESTED variable is set to TRUE. However, OpenMP nested parallelism is not recommended if you have compiled the NCSTAT operators or the STATPACK library with BLAS support and linked them with a multi-threaded version of BLAS, such as [gotoblas], [openblas] or a vendor BLAS like Intel MKL [mkl]. In such cases, it is better to deactivate OpenMP nested parallelism before executing the NCSTAT operators with the command:

$ export OMP_NESTED=FALSE   #if you are using a sh/bash Shell

and also to let OpenMP control the multithreading in the BLAS library, if possible.

In the case of OpenBLAS [openblas] or GotoBLAS [gotoblas], this can be done by using the makefile USE_OPENMP=1 option when compiling OpenBLAS or GotoBLAS. Consult the OpenBLAS manual for more details [openblas]. On the other hand, if your OpenBLAS or GotoBLAS library has already been compiled with multithreading enabled, but without support for OpenMP (this is the default setting), it is better to deactivate OpenBLAS or GotoBLAS multithreading before executing your application. Otherwise, OpenMP will not control the multithreading in the BLAS library, which will likely result in a significant loss of performance. To do this, use a command like (for OpenBLAS):

$ export OPENBLAS_NUM_THREADS=1   #if you are using a sh/bash Shell

or (for GotoBLAS):

$ export GOTO_NUM_THREADS=1   #if you are using a sh/bash Shell

Similarly, for Intel MKL BLAS [mkl], it is better to let OpenMP control the multithreading in the MKL BLAS. This can be done simply by undefining the Shell variable MKL_NUM_THREADS:

$ unset MKL_NUM_THREADS   #if you are using a sh/bash Shell
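
The BLAS-related settings above can be gathered in one small sh/bash helper, sketched below. The function name ncstat_blas_env and its flavour keywords are hypothetical and are not part of NCSTAT. The openblas and gotoblas cases apply to libraries built with their own multithreading and without OpenMP support; a library built with USE_OPENMP=1 needs no extra variable.

```shell
# ncstat_blas_env FLAVOUR
# Apply the recommended environment for a multi-threaded BLAS.
# FLAVOUR is one of: openblas, gotoblas, mkl (hypothetical keywords).
ncstat_blas_env() {
    export OMP_NESTED=FALSE                       # no nested OpenMP parallelism
    case "$1" in
        openblas) export OPENBLAS_NUM_THREADS=1 ;;  # disable OpenBLAS threading
        gotoblas) export GOTO_NUM_THREADS=1 ;;      # disable GotoBLAS threading
        mkl)      unset MKL_NUM_THREADS ;;          # let OpenMP control MKL
        *)        echo "unknown BLAS flavour: $1" >&2; return 1 ;;
    esac
}

ncstat_blas_env openblas
```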

Finally, if you suspect an error in the computations performed by a parallelized NCSTAT operator on your machine, a good strategy is to compare the results of a serial execution of this operator (i.e. by setting OMP_NUM_THREADS to 1) with those of a parallel execution of this operator (i.e. by setting OMP_NUM_THREADS to a value greater than 1). If the results differ, a second step is to recompile NCSTAT without the preprocessor cpp macro _PARALLEL_READ in order to detect if the origin of the problem is due to the OpenMP parallel reading of the NetCDF files and to some incompatibilities between the compiler options used to build the NetCDF library and NCSTAT.
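
The serial versus parallel comparison can be scripted as in the sketch below. The input file input.nc, the variable name var and the output file names are hypothetical. The comparison is byte-wise with cmp, which is only a rough check: outputs may differ in metadata or in the last digits of floating-point results without indicating an error, so a NetCDF-aware comparison tool may be more appropriate.

```shell
# compare_outputs FILE1 FILE2: prints "identical" or "differ" (byte-wise).
compare_outputs() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "differ"
    fi
}

# Run the operator serially and in parallel, then compare the two climatologies.
# The block is skipped when the operator or the hypothetical input file is missing.
if command -v comp_clim_3d >/dev/null 2>&1 && [ -f input.nc ]; then
    OMP_NUM_THREADS=1 comp_clim_3d -f=input.nc -v=var -c=clim_serial.nc
    OMP_NUM_THREADS=4 comp_clim_3d -f=input.nc -v=var -c=clim_parallel.nc
    compare_outputs clim_serial.nc clim_parallel.nc
fi
```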

NCSTAT summary

This section re-organizes the NCSTAT operators by task, with a brief note indicating what each operator does. More details about each operator can be found in Chapter 2, where the full documentation for each NCSTAT operator is given in alphabetical order.

NCSTAT operators for computing mesh-mask files:

Operator  OpenMP  Description
          no      compute a mesh-mask from a tridimensional NetCDF variable
          no      compute a mesh-mask from a fourdimensional NetCDF variable

NCSTAT operators for univariate statistics:

Operator  OpenMP  Description
          yes     compute means and standard-deviations from a tridimensional NetCDF variable
          yes     compute means and standard-deviations from a fourdimensional NetCDF variable
          yes     compute means and standard-deviations from a tridimensional NetCDF variable with missing values
          yes     compute means and standard-deviations from a fourdimensional NetCDF variable with missing values
          yes     compute univariate statistics from a tridimensional NetCDF variable
          yes     compute univariate statistics from a fourdimensional NetCDF variable
          yes     compute univariate statistics from a tridimensional NetCDF variable with missing values

NCSTAT operators for composite analysis:

Operator  OpenMP  Description
          yes     compute a composite analysis from a tridimensional NetCDF variable
          yes     compute a composite analysis from a fourdimensional NetCDF variable
          yes     compute a composite analysis from a tridimensional NetCDF variable with missing values

NCSTAT operators for transforming and time averaging of time series:

Operator  OpenMP  Description
          no      transform multichannel time series from a tridimensional NetCDF variable
          no      transform multichannel time series from a fourdimensional NetCDF variable
          no      transform multichannel time series from a tridimensional NetCDF variable with missing values
          no      compute time-averages of multichannel time series from a tridimensional NetCDF variable
          no      compute time-averages of multichannel time series from a fourdimensional NetCDF variable
          no      compute time-averages of multichannel time series from a tridimensional NetCDF variable with missing values

NCSTAT operators for compression/decompression:

Operator  OpenMP  Description
          no      pack a tridimensional NetCDF variable
          no      unpack a tridimensional NetCDF variable

NCSTAT operators for computing time series and cross-sections:

Operator  OpenMP  Description
          yes     compute a time series from a tridimensional NetCDF variable
          yes     compute a time series from a fourdimensional NetCDF variable
          yes     compute a time series from a tridimensional NetCDF variable with missing values
          yes     compute an index time series from two unidimensional NetCDF variables
          yes     compute a cross-section from a tridimensional NetCDF variable
          yes     compute a cross-section from a fourdimensional NetCDF variable
          yes     compute a cross-section from a tridimensional NetCDF variable with missing values

NCSTAT operators for decomposing time series:

Operator  OpenMP  Description
          no      decompose a time series from a unidimensional NetCDF variable by the STL method
          yes     decompose multichannel time series from a tridimensional NetCDF variable by the STL method
          yes     decompose multichannel time series from a fourdimensional NetCDF variable by the STL method
          no      estimate a trend from a unidimensional NetCDF variable by the LOESS method
          yes     estimate multichannel trend from a tridimensional NetCDF variable by the LOESS method
          yes     estimate multichannel trend from a fourdimensional NetCDF variable by the LOESS method

NCSTAT operators for filtering time series:

Operator  OpenMP  Description
          no      filter a time series from a unidimensional NetCDF variable in a selected frequency band by Lanczos filtering
          yes     filter multichannel time series from a tridimensional NetCDF variable in a selected frequency band by Lanczos filtering
          yes     filter multichannel time series from a fourdimensional NetCDF variable in a selected frequency band by Lanczos filtering
          no      filter a time series from a unidimensional NetCDF variable in a selected frequency band by linear symmetric filtering
          yes     filter multichannel time series from a tridimensional NetCDF variable in a selected frequency band by linear symmetric filtering
          yes     filter multichannel time series from a fourdimensional NetCDF variable in a selected frequency band by linear symmetric filtering
          no      estimate the transfer function of a Lanczos or linear symmetric filter

NCSTAT operators for correlation and regression analysis:

Operator  OpenMP  Description
          yes     compute correlation and regression from an index time series and a unidimensional NetCDF variable
          yes     compute correlation and regression from an index time series and a tridimensional NetCDF variable
          yes     compute correlation and regression from an index time series and a fourdimensional NetCDF variable
          yes     compute correlation and regression from an index time series and a tridimensional NetCDF variable with missing values
          no      compute trend and regression from an index time series and a unidimensional NetCDF variable
          yes     compute trend and regression from an index time series and a tridimensional NetCDF variable
          yes     compute trend and regression from an index time series and a fourdimensional NetCDF variable

NCSTAT operators for multivariate statistics:

Operator  OpenMP  Description
          yes     compute a Principal Component Analysis (PCA) from a tridimensional NetCDF variable
          yes     compute a Principal Component Analysis (PCA) from a fourdimensional NetCDF variable
          yes     compute a Principal Component Analysis (PCA) from a tridimensional NetCDF variable with missing values
          yes     compute a Maximum Covariance Analysis (MCA) from two tri- or fourdimensional NetCDF variables
          no      compute a PCA or MCA approximation of a tridimensional NetCDF variable
          no      compute a PCA or MCA approximation of a fourdimensional NetCDF variable
          yes     compute a PCA or MCA projection of a tridimensional NetCDF variable
          yes     compute a PCA or MCA projection of a fourdimensional NetCDF variable

NCSTAT operators for power spectrum analysis:

Operator  OpenMP  Description
          no      compute a Power Spectrum Analysis from a unidimensional NetCDF variable
          no      compute Power Spectrum Density ratios from Power Spectrum Analyses of two unidimensional NetCDF variables