Distinguishing OFFLOAD_REPORT statements between multiple nodes

Gary_L_
Beginner

Hello all,

I have three nodes, each with two Xeon Phi coprocessors (5110P). I'm using OFFLOAD_REPORT to gather all of its wonderful information; however, I'm unable to tell which node a particular Xeon Phi belongs to. On each node, the MICs are numbered 0 to N-1, where N is the number of Xeon Phi cards on that node (in my case, 2). So on a 3-node system, the run generates 3 offload reports with identical tags and MIC IDs. See below for some sample output.

Does anyone know a way to get node-specific information, such as the rank ID or possibly the node's host name? So far I've been unable to find a solution. I typically use OFFLOAD_REPORT=2; however, even bumping it up to 3 did not provide any additional useful information. My compiler ignores values above 3, although I thought the report supported values up to 5, with output that was not very readable. I'd have to find the source.

Thanks

Gary

 

Sample Output:

[Offload] [MIC 0] [File]                    eam.c
[Offload] [MIC 0] [Line]                    334
[Offload] [MIC 0] [Tag]                     Tag 5
[Offload] [HOST]  [Tag 5] [CPU Time]        0.000343(seconds)
[Offload] [MIC 0] [Tag 5] [CPU->MIC Data]   4 (bytes)
[Offload] [MIC 0] [Tag 5] [MIC Time]        0.000152(seconds)
[Offload] [MIC 0] [Tag 5] [MIC->CPU Data]   0 (bytes)
[Offload] [MIC 1] [File]                    eam.c
[Offload] [MIC 1] [Line]                    334
[Offload] [MIC 1] [Tag]                     Tag 5
[Offload] [HOST]  [Tag 5] [CPU Time]        0.000328(seconds)
[Offload] [MIC 1] [Tag 5] [CPU->MIC Data]   4 (bytes)
[Offload] [MIC 1] [Tag 5] [MIC Time]        0.000147(seconds)
[Offload] [MIC 1] [Tag 5] [MIC->CPU Data]   0 (bytes)
[Offload] [MIC 0] [File]                    eam.c
[Offload] [MIC 0] [Line]                    334
[Offload] [MIC 0] [Tag]                     Tag 5
[Offload] [MIC 0] [File]                    eam.c
[Offload] [MIC 0] [Line]                    334
[Offload] [MIC 0] [Tag]                     Tag 5
[Offload] [MIC 1] [File]                    eam.c
[Offload] [MIC 1] [Line]                    334
[Offload] [MIC 1] [Tag]                     Tag 5
[Offload] [HOST]  [Tag 5] [CPU Time]        0.000734(seconds)
[Offload] [MIC 0] [Tag 5] [CPU->MIC Data]   4 (bytes)
[Offload] [MIC 0] [Tag 5] [MIC Time]        0.000463(seconds)
[Offload] [MIC 0] [Tag 5] [MIC->CPU Data]   0 (bytes)

 

5 Replies
Kevin_D_Intel
Employee

This is an interesting issue. I am not aware of a means to gather host-specific info via the offload-specific features.

Could you perhaps retrieve the host name/ID within your program and print it out at the start of execution to help distinguish which node each report came from?

I’ll ask Development whether it is possible to provide any host-specific details at the start of the offload report.

Gary_L_
Beginner

I've been trying to come up with a similar solution (printing the rank and the offload order); however, I cannot guarantee that the order of the print statements matches the order of the offload reports. Ideally they would match, but that is unlikely because of unpredictable I/O-related races between print statements. Oftentimes the offload reports aren't in any specific order at all, but thanks to the MIC ID and tag system it is easy to cross-reference them out of order. It would be ideal if the OFFLOAD_REPORT data from the devices used within an MPI rank could be captured by the application so that the rank ID could be injected into the offload report output. That way the rank and the offload report data would be printed on the same line.
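
To illustrate the "rank and report data on the same line" idea, here is a minimal Python sketch of a line-tagging helper. It is not a feature of OFFLOAD_REPORT or of the compiler. The function name `tag_lines` is hypothetical, and reading the rank from a `PMI_RANK` environment variable is an assumption about the launcher that should be checked on the actual system.

```python
import os
import socket


def tag_lines(lines, host, rank):
    """Yield each line prefixed with "[host:rank] " so that it stays
    identifiable even when output from several processes is interleaved."""
    prefix = f"[{host}:{rank}] "
    for line in lines:
        yield prefix + line.rstrip("\n")


# Host name of the node this process runs on.
host = socket.gethostname()
# ASSUMPTION: the MPI launcher exports the rank as PMI_RANK; "?" if it does not.
rank = os.environ.get("PMI_RANK", "?")

# One line copied from the sample output above, used as demo input.
demo = ["[Offload] [MIC 0] [Tag 5] [MIC Time]        0.000152(seconds)\n"]
for tagged in tag_lines(demo, host, rank):
    print(tagged)
```

To use this as a filter, `sys.stdin` could be passed in place of `demo`. How each process's stdout gets piped through such a filter depends on the launcher or a wrapper script and has not been verified here.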

Kevin_D_Intel
Employee

After some experimenting, I wonder whether the mpirun -l (lower-case “L”) option, which prefixes every line printed to stdout with the process rank, would help with your scenario.

Using this option with Intel® MPI on a host with two Xeon Phi™ cards running two MPI ranks produced the output shown below. Just as your case shows "Tag 5" overlapping among ranks, my (smaller) case shows a similar overlap of "Tag 0" from each rank.

$ mpiifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.3.187 Build 20150407

$ mpiifort -fpp mpi_hello.f90
$ export OFFLOAD_REPORT=1
$ mpirun -np 2 -l a.out

[1]  Hello world! I'm rank            1  out of            2  ranks
[0]   Checking for Intel(R) Xeon Phi(TM) (Target CPU) devices...
[0]
[0]     Number of Target devices installed:      2
[0]
[0]  Hello world! I'm rank            0  out of            2  ranks
[1] [Offload] [MIC 1] [File]                    mpi_hello.f90
[1] [Offload] [MIC 1] [Line]                    28
[1] [Offload] [MIC 1] [Tag]                     Tag 0
[1]  Hello from Phi:            1
[1] [Offload] [HOST]  [Tag 0] [CPU Time]        0.476401(seconds)
[1] [Offload] [MIC 1] [Tag 0] [MIC Time]        0.012917(seconds)
[1]
[1]  Final sum from Phi:    5050.000
[0] [Offload] [MIC 0] [File]                    mpi_hello.f90
[0] [Offload] [MIC 0] [Line]                    28
[0] [Offload] [MIC 0] [Tag]                     Tag 0
[0] [Offload] [HOST]  [Tag 0] [CPU Time]        0.569555(seconds)
[0] [Offload] [MIC 0] [Tag 0] [MIC Time]        0.009333(seconds)
[0]
[0]  Final sum from Phi:    5050.000
[0]  Hello from Phi:            0
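
Because the lines from different ranks still arrive interleaved, the rank-prefixed output can be regrouped afterwards. The following is a minimal Python sketch, assuming the `[rank]` prefix and `[Offload]` line layout look exactly like the output above. The names `LINE_RE` and `group_offload_lines` are hypothetical, and the sample input is a subset of the lines shown.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [1] [Offload] [MIC 1] [Tag 0] [MIC Time]        0.012917(seconds)
# The leading "[<rank>]" is the prefix added by "mpirun -l".
LINE_RE = re.compile(r"^\[(\d+)\]\s+\[Offload\]\s+\[(HOST|MIC \d+)\]\s+(.*)$")


def group_offload_lines(lines):
    """Return {rank: [(device, rest_of_line), ...]} for offload report lines."""
    by_rank = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # application output and empty prefixed lines are skipped
            rank, device, rest = match.groups()
            by_rank[int(rank)].append((device, rest.rstrip()))
    return dict(by_rank)


sample = """\
[1] [Offload] [MIC 1] [File]                    mpi_hello.f90
[1]  Hello from Phi:            1
[1] [Offload] [HOST]  [Tag 0] [CPU Time]        0.476401(seconds)
[1] [Offload] [MIC 1] [Tag 0] [MIC Time]        0.012917(seconds)
[0] [Offload] [MIC 0] [File]                    mpi_hello.f90
[0] [Offload] [HOST]  [Tag 0] [CPU Time]        0.569555(seconds)
[0] [Offload] [MIC 0] [Tag 0] [MIC Time]        0.009333(seconds)
""".splitlines()

grouped = group_offload_lines(sample)
for rank in sorted(grouped):
    for device, rest in grouped[rank]:
        print(f"rank {rank} {device}: {rest}")
```

Grouping by rank first keeps each process's report together, after which the MIC ID and tag can be used to cross-reference entries within that rank.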

 

Gary_L_
Beginner

Thank you! This works perfectly as a solution. 

Gary

Kevin_D_Intel
Employee

Excellent, I'm very happy to hear that, and you're very welcome. I think this will help someone else down the road, so thank you for posting this question.
