Solved: Mac OS X 10.9.1 + MPICH2/3 + Intel Fortran 14.0.1 + Intel MKL 11.1 cause segmentation fault while using Data-Fitting Functions

oacikgoz · ‎01-16-2014

Earlier versions of Intel MKL (such as 10.x) work perfectly fine. I haven't found a single version of MPICH that does not crash the following code. The command line used to compile the code is

mpif90 -fast -I/opt/intel/composer_xe_2013_sp1.1.103/mkl/include main.f90 -L/opt/intel/composer_xe_2013_sp1.1.103/mkl/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lm -o main

Below is the minimal code that reproduces the error. I'm using 24 threads with mpiexec on the same computer. With only 4 or 8 threads, the code crashes only occasionally. MPICH is compiled using clang (gcc) and ifort, since I do not have the Intel C++ compiler.

INCLUDE "mkl_df.f90"

PROGRAM main

   USE MPI
   USE MKL_DF_TYPE
   USE MKL_DF

   IMPLICIT NONE

   INTEGER :: ierror, rank,nproc,splinestat

   TYPE(DF_TASK) :: taskval
   REAL(8), ALLOCATABLE, DIMENSION(:) :: V
   REAL(8), ALLOCATABLE, DIMENSION(:) :: inc
   INTEGER :: nIgrids,ny

   CALL MPI_INIT(ierror)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierror)
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierror)

   nIgrids=120
   ny=1
   ALLOCATE(inc(nIgrids))
   ALLOCATE(V(nIgrids))

   inc=0.0D0
   V=0.0D0

   splinestat = dfdnewtask1d(taskval,nIgrids,inc,DF_QUASI_UNIFORM_PARTITION,ny,V,DF_MATRIX_STORAGE_COLS)
   WRITE(*,*) 1,rank,splinestat

   CALL MPI_BARRIER(MPI_COMM_WORLD,ierror)

   CALL MPI_FINALIZE(ierror)

END PROGRAM main

VictoriyaS_F_Intel · ‎01-17-2014

Hello oacikgoz,

The reason for the crash is the following: the 'inc' array is initialized with zeroes before calling dfdnewtask1d. MKL Data Fitting requires that the array holding the partition values ('inc' in your code sample) be ordered such that x(i) < x(i+1) for each i = 1 ... nx-1.

Here is a quotation from the MKL reference guide; see the df?newtask1d function description:

If partition is non-uniform or quasi-uniform, the array should contain nx ordered values.
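
As a minimal sketch of that requirement, the reproducer can be reduced to a serial program that fills 'inc' with strictly increasing values before creating the task. The breakpoint values on [0, 1] and the program name are illustrative assumptions, not taken from the original code; the MKL calls and constants are the ones already used in the thread, plus dfdeletetask and DF_STATUS_OK.

```fortran
INCLUDE "mkl_df.f90"

PROGRAM ordered_partition

   USE MKL_DF_TYPE
   USE MKL_DF

   IMPLICIT NONE

   TYPE(DF_TASK) :: taskval
   REAL(8), ALLOCATABLE, DIMENSION(:) :: V
   REAL(8), ALLOCATABLE, DIMENSION(:) :: inc
   INTEGER :: nIgrids, ny, i, splinestat

   nIgrids = 120
   ny = 1
   ALLOCATE(inc(nIgrids))
   ALLOCATE(V(nIgrids))

   ! Assumed breakpoints: evenly spaced on [0, 1], so inc(i) < inc(i+1) holds.
   DO i = 1, nIgrids
      inc(i) = REAL(i - 1, 8) / REAL(nIgrids - 1, 8)
   END DO
   V = 0.0D0

   splinestat = dfdnewtask1d(taskval, nIgrids, inc, DF_QUASI_UNIFORM_PARTITION, &
                             ny, V, DF_MATRIX_STORAGE_COLS)

   IF (splinestat /= DF_STATUS_OK) THEN
      WRITE(*,*) 'dfdnewtask1d failed, status = ', splinestat
   ELSE
      ! Release the task once it is no longer needed.
      splinestat = dfdeletetask(taskval)
   END IF

END PROGRAM ordered_partition
```

The only change relative to the reproducer that matters here is the loop that replaces inc=0.0D0; MPI is left out to keep the sketch short.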

Best regards,

Victoriya

Ying_H_Intel · ‎01-16-2014

Hi

Could you please check whether the code runs if you call a different function, for example

status = vslsconvnewtask1d(task, mode, xshape, yshape, zshape)

Or does it run without the MPI code, like the examples in

/opt/intel/composer_xe_2013_sp1.1.103/mkl_example/datafittingf/ ?

I don't have such a system at hand to test, so I want to check whether this is a Data Fitting issue or an MPI + Data Fitting issue.

Best Regards,

Ying

VictoriyaS_F_Intel · ‎01-17-2014

Please see the definition of a partition in the "Mathematical Conventions" section:

Concept:

Partition of the interpolation interval [a, b], where

x_i denotes the breakpoints.
[x_i, x_{i+1}) denotes a sub-interval (cell) of size Δ_i = x_{i+1} - x_i.

Mathematical Notation:

{x_i}, i = 1, ..., n, where a = x_1 < x_2 < ... < x_n = b
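
The definition above can be checked in user code before a task is created. The following is a sketch only: is_strictly_increasing is a hypothetical helper written for this illustration, not an MKL routine, and the sample arrays are made up.

```fortran
PROGRAM check_partition

   IMPLICIT NONE

   REAL(8) :: good(4), bad(4)

   good = [0.0D0, 0.5D0, 1.0D0, 2.0D0]   ! a = x_1 < x_2 < ... < x_n = b
   bad  = 0.0D0                          ! all zeroes, as in the reproducer

   WRITE(*,*) 'good is a valid partition: ', is_strictly_increasing(good)
   WRITE(*,*) 'bad is a valid partition:  ', is_strictly_increasing(bad)

CONTAINS

   ! Hypothetical helper (not part of MKL): returns .TRUE. when
   ! x(i) < x(i+1) holds for every i = 1 ... SIZE(x)-1.
   PURE LOGICAL FUNCTION is_strictly_increasing(x)
      REAL(8), INTENT(IN) :: x(:)
      is_strictly_increasing = ALL(x(1:SIZE(x)-1) < x(2:SIZE(x)))
   END FUNCTION is_strictly_increasing

END PROGRAM check_partition
```

The first array passes the check and the second does not. The check makes one pass over the array, so for very large partitions it may be worth calling it only in debug builds.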

oacikgoz · ‎01-18-2014

Thanks for getting back to me! Ordering the x vector as Victoriya suggested fixed the problem; that was very helpful.

But instead of crashing the code, initialization routine should ideally return an error code. I think this is something to fix/implement. Don't you agree?

VictoriyaS_F_Intel · ‎01-20-2014

But instead of crashing the code, initialization routine should ideally return an error code. I think this is something to fix/implement. Don't you agree?

Yes, this is a valid remark. Previously we had the partition ordering check in the library, but it was removed for performance reasons: when the partition contains millions of points, checking whether the points are properly ordered is time-consuming.

What do you think about the following option:

1. Have the checks in the initialization function in order to have proper error code reporting, as you have suggested.
2. Provide the ability to switch off those checks by setting the DF_CHECK_FLAG parameter to DF_DISABLE_CHECK_FLAG. This could help speed things up when you are certain about the input parameters.

Here is the description of DF_CHECK_FLAG (see dfieditval):

Use DF_CHECK_FLAG for val_attr in order to control validation of parameters of Data Fitting computational routines such as Construct1d, Interpolate1D/InterpolateEx1d, and SearchCells1D/ SearchCellsEx1D, which can perform better with a small number of interpolation sites or integration limits (fewer than one dozen). The default mode, with checking of parameters enabled, should be used as you develop a Data Fitting-based application. After you complete development you can disable parameter checking in order to improve the performance of your application.

oacikgoz · ‎01-20-2014

In my actual code, I keep changing vector x during runtime (while preserving the memory address of course). That is to say, I initialize data-fitting only once but keep changing x and y vectors and re-run interpolation routine. In my case, ordering check on initialization is completely unnecessary. Having the option of turning it off would be a welcome addition.

But I'm confused, is the above suggestion a proposed addition to the next version of MKL, or is it already implemented? (DF_CHECK_FLAG that is)

VictoriyaS_F_Intel · ‎01-20-2014

DF_CHECK_FLAG functionality is already implemented and available since MKL 11.0.2 release.

Setting the DF_CHECK_FLAG parameter to DF_DISABLE_CHECK_FLAG disables some checks in Data Fitting functions, for example the checks that input pointers aren't NULL and other parameter validity checks.

Please note that you need to be sure all input parameters are valid when using the DF_CHECK_FLAG functionality, because library error reporting is affected when DF_DISABLE_CHECK_FLAG is set.
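
A rough sketch of how the flag might be set through dfieditval is shown below. It assumes a serial program with an ordered partition; the array sizes, breakpoint values and program name are illustrative assumptions, not taken from the thread.

```fortran
INCLUDE "mkl_df.f90"

PROGRAM disable_checks

   USE MKL_DF_TYPE
   USE MKL_DF

   IMPLICIT NONE

   INTEGER, PARAMETER :: nx = 120, ny = 1
   TYPE(DF_TASK) :: task
   REAL(8) :: x(nx), y(nx)
   INTEGER :: i, status

   ! Assumed inputs: ordered breakpoints 0, 1, ..., nx-1 and zero function values.
   DO i = 1, nx
      x(i) = REAL(i - 1, 8)
   END DO
   y = 0.0D0

   status = dfdnewtask1d(task, nx, x, DF_QUASI_UNIFORM_PARTITION, ny, y, &
                         DF_MATRIX_STORAGE_COLS)
   IF (status /= DF_STATUS_OK) STOP 'dfdnewtask1d failed'

   ! Turn parameter checking off for later computational routines.
   ! Only do this once the inputs are known to be valid.
   status = dfieditval(task, DF_CHECK_FLAG, DF_DISABLE_CHECK_FLAG)
   IF (status /= DF_STATUS_OK) WRITE(*,*) 'dfieditval failed, status = ', status

   ! ... spline construction and interpolation calls would go here ...

   status = dfdeletetask(task)

END PROGRAM disable_checks
```

Passing DF_ENABLE_CHECK_FLAG instead restores the default behaviour, which is the safer setting during development.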

We could use this approach to implement the partition ordering check: the check would be enabled by default, but you would be able to disable it with DF_CHECK_FLAG to speed up the computations. Would this be an appropriate option for you?