5 Replies
Hi Matt,
I'm not familiar with the Slurm queuing system. I guess it is a resource-management utility for cluster systems.
VTune Amplifier XE can only collect performance data on one node of a cluster system; the user may install the product again on other nodes.
If you run VTune with an MPI job on a single chip, please refer to this article.
Here is another article for your reference on installing the product on a cluster system.
Regards, Peter
I read that article. It was informative; now I know why it isn't working. Basically it says:
The user can also view results via the GUI by using the command "amplxe-gui". You will find that only the process "python" is displayed, for example:
Root cause:
mpiexec doesn't run the MPI program directly. It opens a connection to MPI's mpd daemon via a socket and passes all the parameters, so the program is not a child process of mpiexec.
So, I now don't know what to do.
I tried
srun -n 1 amplxe-cl -V
and that works
Then I tried
srun -n 1 amplxe-cl -collect hotspots -r r0002hs -- mycode
and mycode doesn't run. I think the reason is that the parameters are not getting passed to my code, which srun launches.
Any other ideas?
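One way to test the guess that arguments are being lost is to dump the argument list at the top of the program and run it under the same srun/amplxe-cl command line. The following is only a minimal sketch; `format_args` is a hypothetical helper, not part of VTune or Slurm.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic helper (not part of VTune or Slurm): writes
 * "argv[0]=... argv[1]=..." into buf so it can be logged at startup.
 * Returns the number of arguments that fit completely in buf. */
static int format_args(char *buf, size_t size, int argc, char **argv)
{
    size_t used = 0;
    int i;

    assert(buf != NULL && argv != NULL);
    if (size == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < argc; i++) {
        int n = snprintf(buf + used, size - used, "%sargv[%d]=%s",
                         i ? " " : "", i, argv[i]);
        if (n < 0 || (size_t)n >= size - used) {
            buf[used] = '\0';   /* drop the partially written argument */
            break;
        }
        used += (size_t)n;
    }
    return i;
}
```

Calling `format_args(buf, sizeof buf, argc, argv)` as the first statement of `main` and printing `buf` to stderr would show exactly what the program received when launched as `srun -n 1 amplxe-cl -collect hotspots -r r0002hs -- mycode`.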
I tried the following in my code.
int pid = getpid();
sprintf(cmd, "amplxe-cl -collect hotspots -target-pid %d", pid);
system(cmd);
and I get an error: this analysis doesn't support system-wide profiling or attaching to a process.
But the examples in the docs on Intel's site suggest that it does. From
http://software.intel.com/sites/products/documentation/hpc/amplifierxe/en-us/lin/ug_docs/olh/cli_ref/target-pid.html#target-pid
$ amplxe-cl -collect hotspots -target-pid 1234
I am using version 11 Update 1.
Thanks
Final comment for a while. It looks like one can use the lightweight-hotspots collection; however, this requires the kernel modules to be installed, which, of course, they are not. I am guessing all the target-pid collectors require this module. I have put in a help ticket on our big-iron system, but they tend not to address these sorts of issues.
Matt
The attach-to-process (-target-pid) functionality on Linux was added in Update 3. Also, the command to perform the attach doesn't return until the target process finishes - you would need to use fork/exec rather than 'system' to perform a self-attach.
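The fork/exec approach could look roughly like the sketch below. It assumes a VTune build in which -target-pid is supported on Linux (Update 3 or later, as noted above) and that amplxe-cl is on the PATH; `spawn_attached_collector` is a hypothetical name. The parent must not wait for the child, because the collector only returns once the target (the parent itself) has finished.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that execs
 *   <tool> -collect hotspots -target-pid <target>
 * Returns the child's pid in the parent, or -1 if fork() failed.
 * The caller keeps running; it is the process being profiled. */
static pid_t spawn_attached_collector(const char *tool, pid_t target)
{
    char pid_arg[32];
    pid_t child;

    assert(tool != NULL);
    /* Format the pid before forking so the child only has to exec. */
    snprintf(pid_arg, sizeof pid_arg, "%ld", (long)target);

    child = fork();
    if (child == 0) {
        execlp(tool, tool, "-collect", "hotspots",
               "-target-pid", pid_arg, (char *)NULL);
        _exit(127);             /* reached only if exec failed */
    }
    return child;
}
```

In the application, one would call `spawn_attached_collector("amplxe-cl", getpid())` near the start of `main` and then carry on with the normal work. The collector may need some time to attach before the interesting code runs; that is not handled here.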
When running the original command you listed ('srun -n 1 amplxe-cl -collect hotspots -r r0002hs -- mycode') - did it produce the result directory? If so, look in the data.0 subdirectory for error messages. If present, they may give some indication of why your application is failing to run.
Note that the issue may not be just running under Slurm, but some issue with any of the technologies involved. (Slurm -> MPI -> Linux node )
Mark
