I am new to this forum and to Intel Parallel Studio, so this may be a very newbie question. I have a university project on a parallel sorting algorithm that I am trying to tune for both MPI performance and threading performance. I have a dual-core computer with Hyper-Threading (so 2 physical cores and 4 hardware threads), and I can usually run mpirun -np 4 without any issues to get a quick performance check of my code. When I try to run my application under the Application Performance Snapshot tool that ships with Intel Parallel Studio / VTune Amplifier, like this:
aps ./<app name> <parameters>
I get an error like this:
aps Error: [xine:06003] *** Process received signal ***
[xine:06003] Signal: Segmentation fault (11)
[xine:06003] Signal code: Address not mapped (1)
[xine:06003] Failing at address: 0x44000098
[xine:06003] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fcec42c8390]
[xine:06003] [ 1] /usr/lib/libmpi.so.12(MPI_Comm_rank+0x3e)[0x7fcec474c2de]
[xine:06003] [ 2] /opt/intel/vtune_amplifier_2018.2.0.551022/lib64/libmps.so(stat_init_post+0x6a)[0x7fcec4acf139]
[xine:06003] [ 3] /opt/intel/vtune_amplifier_2018.2.0.551022/lib64/libmps.so(MPI_Init+0xe0)[0x7fcec4a65644]
[xine:06003] [ 4] ./terasort[0x40109b]
[xine:06003] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fcec3f0d830]
[xine:06003] [ 6] ./terasort[0x401269]
[xine:06003] *** End of error message ***
aps Error: Cannot run the collection.
When I run the app using mpirun directly, it runs without issues on both my local computer and the university HPC cluster. I think I am missing something really basic here. I would really appreciate the help. Thanks!
Please specify which MPI implementation you used in these experiments.
From the stack trace it looks like you are using Open MPI, and in that case the crash is expected: APS currently doesn't support Open MPI (it has no ABI compatibility with Intel MPI).
In this case you can restrict the collection to hardware counters only: --collection-mode=hwc
If you don't need to profile your application at large scale, you can try amplxe-cl instead (probably the advanced hotspots analysis with stacks, or the concurrency analysis) together with the --trace-mpi option.
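A minimal invocation along those lines might look like the sketch below. The binary name ./terasort is taken from the stack trace above, and the result-directory name is made up; exact analysis-type names and options can vary between VTune Amplifier versions, so check amplxe-cl -help collect for your installation.

```shell
# Launch one VTune collection under mpirun; --trace-mpi tells amplxe-cl
# how to attribute data to ranks when a non-Intel MPI (e.g. Open MPI)
# is used. Each rank writes into its own result directory.
mpirun -np 4 amplxe-cl -collect advanced-hotspots --trace-mpi \
    -result-dir vtune_ah -- ./terasort <parameters>
```

Afterwards you can open the per-rank result directories in the VTune Amplifier GUI, or summarize them with amplxe-cl -report.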
One more comment: I've done some checking and found that --collection-mode=hwc won't help you after all. If you are interested in the APS hardware counters, you'll need to write a shell script that wraps your application and unsets LD_PRELOAD inside it.
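Such a wrapper could be sketched as follows (the script name run_app.sh is arbitrary, and ./terasort is assumed from the stack trace). The idea is that APS injects its MPI interposition library via LD_PRELOAD, which crashes with Open MPI; unsetting the variable before exec'ing the real binary avoids that layer while the hardware-counter collection still runs:

```shell
#!/bin/sh
# run_app.sh - hypothetical wrapper around the real application.
# APS sets LD_PRELOAD to inject its MPI profiling library; that library
# is ABI-incompatible with Open MPI and causes the segfault seen above.
unset LD_PRELOAD
# Hand over to the actual binary, forwarding all arguments.
exec ./terasort "$@"
```

You would then profile the wrapper instead of the binary, e.g. aps ./run_app.sh <parameters> (don't forget chmod +x run_app.sh first).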