Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

General questions about the Intel Trace Analyzer and Collector

Chi__YongQiang
Beginner

Hi:

I'm new here and have some questions about this tool: Intel Trace Analyzer and Collector.

Is this software Intel Xeon exclusive, or can it run on multiple platforms?

Also, if our company wants to purchase this tool, where should I ask?

 

Many thanks

Chi

Gergana_S_Intel
Employee

Hi Chi,

Thanks for getting in touch.  You should be able to run the Intel Trace Analyzer and Collector on any x86-based platform but it is meant to provide more detailed information on Xeon-based systems.  The better question will be whether the Intel Trace Analyzer and Collector supports other MPI implementations aside from the Intel MPI Library.  The answer is yes, if those implementations are MPICH-based.  More details are available here, in our User's Guide.

The Intel Trace Analyzer and Collector is only available through the Intel Parallel Studio XE Cluster Edition bundle.  Here are more details on how to evaluate or purchase the bundle.

Regards,
~Gergana

Chi__YongQiang
Beginner

Gergana S. (Intel) wrote:

Hi Chi,

Thanks for getting in touch.  You should be able to run the Intel Trace Analyzer and Collector on any x86-based platform but it is meant to provide more detailed information on Xeon-based systems.  The better question will be whether the Intel Trace Analyzer and Collector supports other MPI implementations aside from the Intel MPI Library.  The answer is yes, if those implementations are MPICH-based.  More details are available here, in our User's Guide.

The Intel Trace Analyzer and Collector is only available through the Intel Parallel Studio XE Cluster Edition bundle.  Here are more details on how to evaluate or purchase the bundle.

Regards,
~Gergana

 

Hi Gergana:

 

Thank you for your reply, I'll take a look.

 

Regards

 

Chi

Chi__YongQiang
Beginner

Gergana S. (Blackbelt) wrote:

Hi Chi,

Thanks for getting in touch.  You should be able to run the Intel Trace Analyzer and Collector on any x86-based platform but it is meant to provide more detailed information on Xeon-based systems.  The better question will be whether the Intel Trace Analyzer and Collector supports other MPI implementations aside from the Intel MPI Library.  The answer is yes, if those implementations are MPICH-based.  More details are available here, in our User's Guide.

The Intel Trace Analyzer and Collector is only available through the Intel Parallel Studio XE Cluster Edition bundle.  Here are more details on how to evaluate or purchase the bundle.

Regards,
~Gergana

 

Hi Gergana:

 

Sorry to trouble you again.

 

I tested it with the compatibility tool and it says OK. I have installed the trial version of Intel Trace Analyzer and Collector on our AMD system with MVAPICH2 2.3.1, and the installation seemed successful. Yet when I run the software with the Trace Analyzer and Collector, it reports an error message such as:

"

/codes/commercial/Intel/Parallel_Studio_XE/install/compilers_and_libraries_2019.3.199/linux/mpi/intel64/bin/mpirun: line 103: 31339 Floating point exception(core dumped) mpiexec.hydra "$@" 0<&0

"

Is that something related to the installation, or did I miss other steps to make it work?

 

Regards

 

Chi

Anatoliy_R_Intel
Employee

Hi, Chi.

Could you provide your command line and full output that you see?

--

Best regards, Anatoliy.

Chi__YongQiang
Beginner

Anatoliy R. (Intel) wrote:

Hi, Chi.

Could you provide your command line and full output that you see?

--

Best regards, Anatoliy.

Hi Anatoliy:

Here is what I used:

source /codes/commercial/Intel/Parallel_Studio_XE/install/parallel_studio_xe_2019.3.062/bin/psxevars.csh

setenv FI_PROVIDER tcp

setenv VT_CONFIG VTconfig.conf

mpirun -trace -hostfile local.host.hyp -np 2 pimpleFoam -parallel

and it returns:


/codes/commercial/Intel/Parallel_Studio_XE/install/compilers_and_libraries_2019.3.199/linux/mpi/intel64/bin/mpirun: line 103: 89898 Floating point exception(core dumped) mpiexec.hydra "$@" 0<&0
 

Regards

Anatoliy_R_Intel
Employee

Chi, do you observe this error only with the -trace option?

Could you also check the command lines below and provide the output:

1. I_MPI_DEBUG=5 mpirun -verbose -trace -hostfile local.host.hyp -np 2 pimpleFoam -parallel

2. FI_PROVIDER=sockets mpirun -trace -hostfile ...

3. mpirun -trace -hostfile local.host.hyp -np 2 hostname

4. I_MPI_HYDRA_TOPOLIB=ipl mpirun -trace -hostfile ...

Chi__YongQiang
Beginner

Anatoliy R. (Intel) wrote:

Chi, do you observe this error only with the -trace option?

Could you also check the command lines below and provide the output:

1. I_MPI_DEBUG=5 mpirun -verbose -trace -hostfile local.host.hyp -np 2 pimpleFoam -parallel

2. FI_PROVIDER=sockets mpirun -trace -hostfile ...

3. mpirun -trace -hostfile local.host.hyp -np 2 hostname

4. I_MPI_HYDRA_TOPOLIB=ipl mpirun -trace -hostfile ...

 

Hi Anatoliy:

Thanks for the reply.

This error message shows up after I 'source' the Intel Trace Analyzer environment. Even without the '-trace' option it still shows the same message.

Also, all the command lines / environment settings you provided come back with the same error message.

Maybe I missed something in the installation?

Many thanks

 

Chi

Anatoliy_R_Intel
Employee

Chi, when you source psxevars.csh, you source both Intel MPI and the Trace Analyzer and Collector. If it fails without the '-trace' option, then something is wrong with Intel MPI.

Please check that `mpirun -n 1 hostname` also fails.

I don't think anything is wrong with the installation.

 

The HYDRA_BSTRAP_XTERM=1 variable can help show where we get the floating point exception.

Please set this variable and run mpirun. After that you will see xterm windows with gdb launched. Then type `run` in each window, and you will see the floating point exception in one of the windows. Then type `bt`; it will show a backtrace. Please send me this backtrace.
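As a rough sketch, the session might look like this (sh syntax; in the csh environment implied by psxevars.csh it would be `setenv HYDRA_BSTRAP_XTERM 1`; a working X display is required):

```shell
# Hypothetical debugging session for the steps above; needs a working X display.
HYDRA_BSTRAP_XTERM=1 mpirun -n 1 hostname
# One xterm opens per launched process, with gdb already attached:
#   (gdb) run    # reproduce the Floating point exception
#   (gdb) bt     # copy this backtrace into the reply
```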

Shaw__Daniel
Beginner

Anatoliy R. (Intel) wrote:

Chi, when you source psxevars.csh, you source both Intel MPI and the Trace Analyzer and Collector. If it fails without the '-trace' option, then something is wrong with Intel MPI.

Please check that `mpirun -n 1 hostname` also fails.

I don't think anything is wrong with the installation.

 

The HYDRA_BSTRAP_XTERM=1 variable can help show where we get the floating point exception.

Please set this variable and run mpirun. After that you will see xterm windows with gdb launched. Then type `run` in each window, and you will see the floating point exception in one of the windows. Then type `bt`; it will show a backtrace. Please send me this backtrace.

Hi,

I'm working with Chi on the above. 

Having sourced the psxevars.csh file:

 - mpirun -n 1 hostname does indeed fail

 - With that env variable set, the command still fails, but doesn't seem to launch any xterm windows. (I checked DISPLAY, as I am able to launch other X-based windows from that terminal.)

Do you have any further guidance? 

Thanks,

Dan

Anatoliy_R_Intel
Employee

Hi, Daniel.

Yes, please run the command lines below and provide me all the output.

1. mpirun -verbose -n 1 hostname

2. HYDRA_BSTRAP_VALGRIND=1 mpirun -n 1 hostname

3. ${I_MPI_ROOT}/intel64/bin/cpuinfo

 

--

Best regards, Anatoliy.

Chi__YongQiang
Beginner

Anatoliy R. (Intel) wrote:

Hi, Daniel.

Yes, please run the command lines below and provide me all the output.

1. mpirun -verbose -n 1 hostname

2. HYDRA_BSTRAP_VALGRIND=1 mpirun -n 1 hostname

3. ${I_MPI_ROOT}/intel64/bin/cpuinfo

 

--

Best regards, Anatoliy.

 

Hi Anatoliy:

Thanks for the reply.

 

They all come back with the same error message:

'/codes/commercial/Intel/Parallel_Studio_XE/install/compilers_and_libraries_2019.3.199/linux/mpi/intel64/bin/mpirun: line 103: 26863 Floating point exception(core dumped) mpiexec.hydra "$@" 0<&0'

 

Even the 'cpuinfo' command comes back with:

'Floating exception (core dumped)'

 

Regards

 

Chi

Anatoliy_R_Intel
Employee

Hi,

 

It seems this is the same error as in https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/807359 and https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/812229.

 

As a workaround you can use the legacy hydra process manager.

Please try to run `PATH=${I_MPI_ROOT}/intel64/bin/legacy:${PATH} mpiexec.hydra ...`
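Note that the `PATH=...` prefix above is sh/bash syntax; in csh you would run `setenv PATH ${I_MPI_ROOT}/intel64/bin/legacy:${PATH}` first. The lookup mechanism can be sketched with a stub directory standing in for the real `${I_MPI_ROOT}/intel64/bin/legacy`:

```shell
# Sketch: why prepending the legacy directory changes which hydra runs.
# A stub stands in for the real ${I_MPI_ROOT}/intel64/bin/legacy/mpiexec.hydra.
tmp=$(mktemp -d)
mkdir -p "$tmp/legacy"
printf '#!/bin/sh\necho legacy-hydra\n' > "$tmp/legacy/mpiexec.hydra"
chmod +x "$tmp/legacy/mpiexec.hydra"

# With the stub directory first in PATH, its mpiexec.hydra is the one found:
PATH="$tmp/legacy:$PATH" mpiexec.hydra    # prints: legacy-hydra

rm -rf "$tmp"
```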

--

Best regards, Anatoliy.

Anatoliy_R_Intel
Employee

Hi, Chi

 

Could you also run lscpu?

--

Best regards, Anatoliy.

Chi__YongQiang
Beginner

Anatoliy R. (Intel) wrote:

Hi, Chi

 

Could you also run lscpu?

--

Best regards, Anatoliy.

 

Hi Anatoliy:

 

lscpu does work; it shows:

 

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            23
Model:                 1
Model name:            AMD EPYC 7281 16-Core Processor
Stepping:              2
CPU MHz:               2100.000
CPU max MHz:           2100.0000
CPU min MHz:           1200.0000
BogoMIPS:              4200.04
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             64K
L2 cache:              512K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3
NUMA node1 CPU(s):     4-7
NUMA node2 CPU(s):     8-11
NUMA node3 CPU(s):     12-15
NUMA node4 CPU(s):     16-19
NUMA node5 CPU(s):     20-23
NUMA node6 CPU(s):     24-27
NUMA node7 CPU(s):     28-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb hw_pstate sme retpoline_amd vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr ibpb arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
 

 

Regards

 

Chi

Chi__YongQiang
Beginner

Anatoliy R. (Intel) wrote:

Hi,

 

It seems this is the same error as in https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog... and https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog....

 

As a workaround you can use the legacy hydra process manager.

Please try to run `PATH=${I_MPI_ROOT}/intel64/bin/legacy:${PATH} mpiexec.hydra ...`

--

Best regards, Anatoliy.

 

Hi:

 

That is working now.

 

Many Thanks.

 

Chi

Chi__YongQiang
Beginner

Hi Anatoliy:

 

Sorry to trouble you again. The Intel Trace Collector is running now, but I think I have another issue (it should probably be a separate issue).

The Intel tracer does not seem able to stop: my program (an OpenFOAM solver) has stopped, but the tracer seems to continue running and writing files.

 

I changed VERBOSE=3; this is what the log file shows before the program starts running:

'

[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": timers...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": filters...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": logging...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": grouping...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": symbols...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": counter...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": statistics...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": PC tracing...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": source code locations...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": function tracing...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": file IO...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": frames...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": API...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": function tracing...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": requests...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": plugin...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": MPI Extension...
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: "init": done
[0 Wed Jun 12 12:50:09 2019] Intel(R) Trace Collector INFO: initialization completed successfully

'

 

And here is when the program stopped but the intel tracer continues to run:

'

Finalising parallel run
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "logging": requests...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "logging": internal info...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "logging": communicators...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "logging": done
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: wall clock: 1.560340e+09s - 1.560341e+09s since Epoch
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: maximum number of events on the same clock tick was 8: insufficient clock resolution
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: 1801016KB of total raw trace data
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "prepare": communicators...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "prepare": grouping...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "prepare": frames...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "prepare": done
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: Writing tracefile simplecFoam.stf in /nfs/elm/bigdisk/IT/PROJECTS/TRACING_APR19/mBikeTempTestCase/CASES/mBikeBenchmarkR049/0.0_0.0_Y0.0_R0.0_S0.0_V0.000_C32.0_Sq0.0_0.0_0.0_8.0_Sc12.8
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "prepare 2": source code locations...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "prepare 2": communicators...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "prepare 2": logging...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "prepare 2": done
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": communicators...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": grouping...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": statistics...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": source code locations...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": file IO...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": global operations...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": requests...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": symbols...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": counter...
[0 Wed Jun 12 13:08:14 2019] Intel(R) Trace Collector INFO: "write header": done

'

 

It keeps running for hours after the program has stopped, and I assume something is not right.

 

Regards

 

Chi

Anatoliy_R_Intel
Employee

Hi, Chi. 

 

Could you try to run with `export VT_KEEP_RAW_EVENTS=on` variable?
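A minimal check that the variable is set as intended (note that `export` is sh/bash syntax; in the csh environment from the earlier posts use `setenv VT_KEEP_RAW_EVENTS on`):

```shell
# sh/bash form:
export VT_KEEP_RAW_EVENTS=on
# csh/tcsh equivalent, matching the setenv commands used earlier in the thread:
#   setenv VT_KEEP_RAW_EVENTS on
echo "$VT_KEEP_RAW_EVENTS"    # prints: on
```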

--

Best regards, Anatoliy.

Chi__YongQiang
Beginner

Anatoliy R. (Intel) wrote:

Hi, Chi. 

 

Could you try to run with `export VT_KEEP_RAW_EVENTS=on` variable?

--

Best regards, Anatoliy.

 

Hi Anatoliy:

 

Thanks for the reply; the issue disappeared today. Maybe it was a mistake on our side in some network setting, or an OS update.

I'll update you if that happens again.

 

Again, the time you spent on this subject helps us a lot in understanding how this software works.

 

Best wishes

 

Chi
