I'm trying to use the Intel Trace Collector for the first time on a cluster machine (with Intel XE 2013 and ITAC 8.1.2.033).
I built my program in the standard production mode, and the bash script submitted to the PBS scheduler contained the following commands:
#PBS -l select=14:ncpus=16:mem=120gb:mpiprocs=16
module load intel/cs-xe-2013
mpirun -trace [path]/my_program [arguments]
Reading the log, I saw that the program reached its end 4 hours ago, but the job is still running: a .prot file of 17064 bytes and a .stf file of 0 bytes have been written.
Did I do something wrong?
Thank you in advance
Thanks for getting in touch. Based on what you've provided here, you're following all the correct steps. Sourcing itacvars.sh and using the -trace option are both correct.
Furthermore, it seems like the Intel Trace Collector starts up successfully as evidenced by the presence of the .prot file (which gets created first when the collector loads).
Now, the majority of the overhead for the Intel Trace Collector happens towards the end of your application run. After your application calls MPI_Finalize(), the collector starts writing out all the trace data it has collected. Depending on how big your application is, or how much MPI communication you've done, it could take a while. What's the usual runtime of your program?
I would suggest doing a smaller run: either with a smaller dataset or over a smaller number of ranks. Let me know how that goes.
I tried a small job with 16 processes and the behavior is the same. Is there a way to check that this tool is installed correctly? I'm only a user of the cluster, but I can report this issue to the support team...
Yes, let's try with a completely different application to make sure everything is installed correctly. Since you seem to have Intel MPI installed, let's compile one of the Hello World applications shipped in the <intel_mpi_install_dir>/test directory and run that with the "-trace" option. Ideally, use the same submission script you have for your existing application but simply replace the executable.
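A minimal sketch of that sanity check, assuming the Intel MPI environment is loaded so that I_MPI_ROOT points at the install directory and that the test directory contains a source file named test.c. The output name hello_trace and the rank count are placeholders, not values from this thread:

```shell
#!/bin/bash
# Sketch only: assumes I_MPI_ROOT is set by the Intel MPI environment and
# that a Hello World source named test.c sits in its test directory.
src="${I_MPI_ROOT}/test/test.c"
exe="hello_trace"                 # hypothetical output name

if command -v mpiicc >/dev/null 2>&1 && [ -f "$src" ]; then
    mpiicc "$src" -o "$exe"       # build with the Intel MPI compiler wrapper
    mpirun -trace -n 16 "./$exe"  # same -trace option as the real job
    ls -l "$exe".stf*             # trace files should be listed after the run
    status="ran"
else
    echo "Intel MPI compiler wrapper or test source not found; nothing run" >&2
    status="skipped"
fi
```

If the .stf files show up with a non-zero size here, the installation itself is probably fine and the problem is more likely tied to the size of the real run.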
Let me know how that goes.
I used Intel XE 2015 and the Hello World test succeeded. The cluster support team told me that Intel XE 2013 is not supported on CentOS 7, so maybe that was the problem.
Thanks for giving this a try. So a smaller application creates a trace file but running your application on a single machine still exhibits the problem.
Can you clarify what you mean by "support team of cluster"? Did you submit an issue via the Intel Premier Support site? If yes, let me know the number and I can look it up.
It's possible that the lack of OS support is the culprit here. Intel Parallel Studio XE 2013 is a pretty old piece of software, so that's not surprising. Do you have the option to upgrade to something newer? Our latest is 2015 Update 3. You can grab it from the Intel Registration Center.
By "support team of cluster" I mean the people who handle hardware and software configuration issues. I'm just a user of the cluster: if I have a problem I can only report it to them, and I cannot install the Intel compiler suite on my own. I wrote here to make sure that my procedure for the Intel Trace Collector was right.
Anyway, I have some news about my profiling with Intel XE 2015 (ITAC version 9.0.2.045). The program terminated unexpectedly, and the error log reported a problem flushing data into the /tmp folder, which does not have enough space.
So I ran the job again with the variable VT_FLUSH_PREFIX set to a directory (shared by all nodes) that has enough space. Unfortunately, in this case the job finished but the .stf file was not created, and at the end of the error log there are the following lines:
 Intel(R) Trace Collector INFO: 61.50MB trace data in RAM + 34442.81MB trace data flushed = 34504.31MB total
 Intel(R) Trace Collector INFO: 52.19MB trace data in RAM + 25951.06MB trace data flushed = 26003.25MB total
 Intel(R) Trace Collector INFO: 32.69MB trace data in RAM + 34471.62MB trace data flushed = 34504.31MB total
 Intel(R) Trace Collector INFO: 34.81MB trace data in RAM + 34469.50MB trace data flushed = 34504.31MB total
 Intel(R) Trace Collector INFO: 16.06MB trace data in RAM + 24987.06MB trace data flushed = 25003.12MB total
 Intel(R) Trace Collector INFO: 28.94MB trace data in RAM + 34475.38MB trace data flushed = 34504.31MB total
 Intel(R) Trace Collector INFO: 54.25MB trace data in RAM + 25949.00MB trace data flushed = 26003.25MB total
 Intel(R) Trace Collector INFO: 33.44MB trace data in RAM + 34470.88MB trace data flushed = 34504.31MB total
 Intel(R) Trace Collector INFO: Writing tracefile test.mpi.stf in [path of test.mpi]
node008.15576Exhausted 1048576 MQ irecv request descriptors, which usually indicates a user program error or insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576
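To get a sense of the overall volume, the per-process totals in the INFO lines can be added up. A small sketch, assuming the error log has been saved to a file and that every summary line contains "= <number>MB total" as in the excerpt above. The function name sum_trace_totals is made up for this example:

```shell
#!/bin/bash
# Sum the per-process "= <N>MB total" figures from Trace Collector INFO lines.
# Usage: sum_trace_totals <logfile>
sum_trace_totals() {
    awk '/Trace Collector INFO:/ && /MB total/ {
             v = $(NF-1)          # field such as "34504.31MB"
             sub(/MB$/, "", v)    # drop the unit
             sum += v; n++
         }
         END { printf "%d processes, %.2f MB total\n", n, sum }' "$1"
}
```

For the eight lines quoted above this adds up to 249531.17 MB, so the final trace file would be very large even if only those processes contributed.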
The strange thing is that if I check the directory that I set as VT_FLUSH_PREFIX it is empty (it was empty also during the job) whereas the data flushed seems to be many GB.
Do you have any suggestion?
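One way to narrow this down is to check, from inside the job script, which flush directory the job actually sees and whether it is writable and has room. A rough sketch: the free_mb helper is hypothetical, the fallback to /tmp simply mirrors where the earlier run flushed, and passing the variable with -genv assumes the Intel MPI mpirun:

```shell
#!/bin/bash
# Report free space (in MB) on the filesystem that holds a directory.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

flush_dir="${VT_FLUSH_PREFIX:-/tmp}"   # falls back to /tmp if the variable is unset
echo "flush directory: $flush_dir"
echo "free space:      $(free_mb "$flush_dir") MB"
[ -w "$flush_dir" ] || echo "warning: $flush_dir is not writable" >&2

# In the real job script (placeholders kept as in the original post):
#   export VT_FLUSH_PREFIX=[shared directory]
#   mpirun -genv VT_FLUSH_PREFIX "$VT_FLUSH_PREFIX" -trace [path]/my_program [arguments]
```

Comparing the reported free space with the per-process totals in the log gives a quick indication of whether the directory could hold the flushed data at all.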
 Intel(R) Trace Collector INFO: Writing tracefile test.mpi.stf
This starts after MPI_Finalize() is called. If your program exits immediately after that, the file may not be written.
At least, I found this to be a problem in my case with a small program.