In attempting to execute the 2018 cluster sparse solver examples, I ran into a segmentation fault right after completion of "computing non-zeros for LL^T factorization."
I compiled manually, trying both 64-bit and 32-bit integers, with linker options taken from the MKL Link Line Advisor tool. The bundled makefile fails due to errors in execution.
- GNU Compiler
- MPICH2
- OpenMP threading
- System is an Intel(R) Xeon CPU running 64-bit Linux.
Compile Line:
mpicc -g -L${MKLROOT}/lib/intel64 -o cl_solvr_unsym_complex cl_solver_unsym_complex_c.c -Wl,--no-as-needed -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -DMKL_ILP64 -m64 -I${MKLROOT}/include
I ran under valgrind and gdb with little success at uncovering the issue. Below is the message I got from the valgrind run.
Percentage of computed non-zeros for LL^T factorization
15 % 95 % 100 %
==22595== Jump to the invalid address stated on the next line
==22595== at 0x0: ???
==22595== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22595==
==22595==
==22595== Process terminating with default action of signal 11 (SIGSEGV)
==22595== Bad permissions for mapped region at address 0x0
==22595== at 0x0: ???
==22595==
This post, https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/303093, is the closest one I could find to my issue. In light of Michael Chuvelev's response, I checked my linking and compile lines and ended up with the same results. As a last attempt before posting here, I linked all relevant libraries under MKL's library folder that use the right integer precision. That did not work either.
Is a fix in the works for this sort of error, or am I missing a critical step?
Hi Taylor,
Can you provide a reproducer so we can check the issue on our side?
Thanks,
Alex
Alex,
Of course. The source code is in the example archives in Intel's MKL install directory, specifically examples_cluster_c.tgz. Files cl_solver_unsym_distr_c.c and cl_solver_unsym_complex_c.c are the two I have run. The same error comes up with the corresponding files in examples_cluster_f.tgz.
Thanks,
Taylor
I checked this case on my side, RH7, ILP64 mode:
$ make
mpicc -g -L/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64 cl_solver_unsym_complex_c.c \
-Wl,--no-as-needed -Wl,--start-group \
/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64/libmkl_intel_ilp64.a \
/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64/libmkl_intel_thread.a \
/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64/libmkl_core.a \
/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_ilp64.a \
/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64/libmkl_scalapack_ilp64.a \
-Wl,--end-group -liomp5 -lpthread -lm -ldl -DMKL_ILP64 -m64 -I/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/include
$ mpicc -v
mpigcc for the Intel(R) MPI Library 2018 for Linux*
$ mpiexec -n 2 ./a.out
Here is part of the output, trimmed for brevity:
=== CPARDISO: solving a complex nonsymmetric system ===
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
..........
...........
The solution of the system is:
x [0] = 0.174768 0.021177
x [1] = -0.176471 -0.294118
x [2] = 0.049322 0.029598
x [3] = 0.042981 -0.031409
x [4] = -0.120859 -0.170860
x [5] = -0.369347 -0.000861
x [6] = 0.091610 0.125362
x [7] = 0.223941 0.139428
Relative residual = 5.551115e-17
TEST PASSED
mpiexec -version
Intel(R) MPI Library for Linux* OS, Version 2018 Build 20170713 (id: 17594)
Gennady,
Thank you for checking the case on your end. Below are the versions I am working with.
$ mpiexec -version
HYDRA build details:
Version: 3.0.4
Release Date: Wed Apr 24 10:08:10 CDT 2013
CC: gcc
CXX: g++
F77: ifort
F90: ifort
$ mpicc -v
mpicc for MPICH version 3.0.4
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
Here is the runtime output I get.
$ mpiexec -n 2 cl_solvr_unsym_complex
=== CPARDISO: solving a complex nonsymmetric system ===
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 32 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 8
number of non-zeros in A: 20
number of non-zeros in A (%): 31.250000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 5
size of largest supernode: 4
number of non-zeros in L: 27
number of non-zeros in U: 7
number of non-zeros in L+U: 34
Reordering completed ...
Percentage of computed non-zeros for LL^T factorization
95 % 100 %
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Thank you for your time,
Taylor
Hi Taylor,
Can you run
cat /proc/cpuinfo
to check which processor you use? I ran the test on my side and it works correctly:
HYDRA build details:
Version: 3.1.4
mpicc -O2 -g -qopenmp -D__cpardiso__ -qopenmp -DMKL_ILP64 -I/nfs/pdx/proj/mkl/MKLQA/mkl_release/mkl2018_20170720/__release_lnx/mkl/include -o cpardiso.exe ./cl_solver_unsym_complex_c.c -Wl,--no-as-needed -Wl,--start-group /nfs/pdx/proj/mkl/MKLQA/mkl_release/mkl2018_20170720/__release_lnx/mkl/lib/intel64/libmkl_intel_ilp64.a /nfs/pdx/proj/mkl/MKLQA/mkl_release/mkl2018_20170720/__release_lnx/mkl/lib/intel64/libmkl_intel_thread.a /nfs/pdx/proj/mkl/MKLQA/mkl_release/mkl2018_20170720/__release_lnx/mkl/lib/intel64/libmkl_core.a /nfs/pdx/proj/mkl/MKLQA/mkl_release/mkl2018_20170720/__release_lnx/mkl/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lm -lpthread -qopenmp -lifcore
export OMP_NUM_THREADS=16; export KMP_AFFINITY=compact,granularity=fine; mpiexec -n 2 ./cpardiso.exe
=== CPARDISO: solving a complex nonsymmetric system ===
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000005 s
Time spent in reordering of the initial matrix (reorder) : 0.004953 s
Time spent in symbolic factorization (symbfct) : 0.004758 s
Time spent in data preparations for factorization (parlist) : 0.000001 s
Time spent in allocation of internal data structures (malloc) : 0.000111 s
Time spent in additional calculations : 0.001195 s
Total time spent : 0.011023 s
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 16 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 8
number of non-zeros in A: 20
number of non-zeros in A (%): 31.250000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 5
size of largest supernode: 4
number of non-zeros in L: 27
number of non-zeros in U: 7
number of non-zeros in L+U: 34
Reordering completed ...
Percentage of computed non-zeros for LL^T factorization
95 % 100 %
=== CPARDISO: solving a complex nonsymmetric system ===
Single-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 0.450965 s
Time spent in allocation of internal data structures (malloc) : 0.000016 s
Time spent in additional calculations : 0.000002 s
Total time spent : 0.450983 s
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 16 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 8
number of non-zeros in A: 20
number of non-zeros in A (%): 31.250000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 5
size of largest supernode: 4
number of non-zeros in L: 27
number of non-zeros in U: 7
number of non-zeros in L+U: 34
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000001
Factorization completed ...
Solving system...
=== CPARDISO: solving a complex nonsymmetric system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 0.326002 s
Time spent in additional calculations : 0.670013 s
Total time spent : 0.996015 s
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 16 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 8
number of non-zeros in A: 20
number of non-zeros in A (%): 31.250000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 5
size of largest supernode: 4
number of non-zeros in L: 27
number of non-zeros in U: 7
number of non-zeros in L+U: 34
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000001
The solution of the system is:
x [0] = 0.174768 0.021177
x [1] = -0.176471 -0.294118
x [2] = 0.049322 0.029598
x [3] = 0.042981 -0.031409
x [4] = -0.120859 -0.170860
x [5] = -0.369347 -0.000861
x [6] = 0.091610 0.125362
x [7] = 0.223941 0.139428
Relative residual = 7.343435e-17
TEST PASSED
Alex,
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
stepping        : 7
microcode       : 0x710
cpu MHz         : 1291.164
cache size      : 20480 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat pln pts dtherm tpr_shadow vnmi flexpriority ept vpid xsaveopt
bogomips        : 5199.65
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
Thanks,
Taylor
Hmm, this is Sandy Bridge, but I checked the same problem on Ivy Bridge and see no problems.
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 1
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
Stepping: 4
CPU MHz: 1699.140
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 5586.45
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-9
NUMA node1 CPU(s): 10-19
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts
Gennady and Alex,
Have either of you run into a case where the executable compiles and links but crashes when it tries to jump to a NULL address? If so, do you know what causes the bad address to be used and how to fix the issue?
-Taylor