Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Looking for trace data

Aaron_H_
Beginner

Hello, I am doing a study in which I will analyze trace data collected by the Intel® Trace Analyzer and Collector. Any directions or leads on where I could find this data would be very much appreciated!

Thank You

-Aaron

James_T_Intel
Moderator

That is a very generic question.  Can you provide more detail about what you are seeking?  The Intel® Trace Analyzer and Collector has two components.  The Trace Collector will instrument your application and collect data on your MPI calls (with an API and the capability to instrument user code as well).  This data is the trace.  The trace can then be opened in the Trace Analyzer for analysis.
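
For example, instrumenting a region of user code with the API looks roughly like this.  This is a minimal sketch based on the VT.h API documented with the Trace Collector; the compute function and region name are placeholders of my own:

    /* Minimal sketch: marking a user function as its own named region
     * in the trace via the Trace Collector API (VT.h ships with ITAC). */
    #include <mpi.h>
    #include <VT.h>

    static void compute(void)      /* placeholder for your own work */
    {
        /* ... computation you want to see as a named region ... */
    }

    int main(int argc, char **argv)
    {
        int region;                /* handle for the traced region */

        /* When linked against the collector (e.g. via mpirun -trace),
           tracing is initialized as part of MPI_Init. */
        MPI_Init(&argc, &argv);

        VT_funcdef("compute", VT_NOCLASS, &region);  /* define region once */

        VT_begin(region);          /* record entry into the region */
        compute();
        VT_end(region);            /* record exit from the region */

        MPI_Finalize();
        return 0;
    }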

Aaron_H_
Beginner

I am looking for data that will show how fast machines complete and return MPI calls relative to other machines. My goal is to find a record of machines that take longer to complete MPI calls, possibly causing bottlenecks for the other machines that are forced to wait for their completion.

Gergana_S_Intel
Employee

Is your question mostly about how to use the Intel® Trace Analyzer and Collector (ITAC)?  If you already have an MPI application in mind, then ITAC can easily give you the data on how long each MPI call takes.

I would do the following:

  • Run my application (say test.exe) on machine1 with ITAC enabled:
    $ ssh machine1
    $ export VT_LOGFILE_NAME=test.mach1.stf
    $ mpirun -trace -n 4 ./test.exe
  • Run the same application on machine2 with ITAC enabled:
    $ ssh machine2
    $ export VT_LOGFILE_NAME=test.mach2.stf
    $ mpirun -trace -n 4 ./test.exe
  • Compare the two traces via the GUI:
    • Copy both trace files test.mach{1,2}.stf* to a single machine
    • Open one of them ($ traceanalyzer test.mach1.stf)
    • Go to View > Compare and select the other trace file (test.mach2.stf)
    • Review the results via the Event Timeline or the Flat Profile

Note that when transferring the trace files from one machine to another, make sure to copy all *.stf* files (not just the main .stf).  If your application is fairly small, you can even create a single trace file (if you have questions on how to do that, just reply back and I'll describe that option).
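
If you don't already have a test.exe handy, by the way, any small MPI program will do as a stand-in.  Here's a minimal sketch (my own toy example, not something that ships with the tool):

    /* Minimal stand-in for test.exe: each rank contributes to an
     * MPI_Allreduce so the trace contains real MPI calls to inspect. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* A collective involving every rank shows up clearly in the
           Event Timeline. */
        MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks = %d (expected %d)\n",
                   sum, size * (size - 1) / 2);

        MPI_Finalize();
        return 0;
    }

Compile it with mpiicc (or mpicc) and substitute the resulting binary for test.exe in the steps above.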

If you've never used the GUI before, we have a pretty nice tutorial posted online.  Or you're welcome to ask questions here.

Best regards,
~Gergana

Aaron_H_
Beginner

I don't have an MPI application. I am trying to get a "real-world" pattern of an MPI process where certain machines run slower than others and possibly cause bottlenecks for the other machines.

Gergana_S_Intel
Employee

Ah, that's a different issue.  When you say "certain machines run slower than others", do you mean that all of the machines are involved in running a single parallel MPI application but some are waiting around for a rank/machine to finish before they can continue?  Or do you mean an MPI application will run slower on some machines and faster on others (e.g. differences in architecture, cache size, etc. affect the performance of the application)?

~Gergana

Aaron_H_
Beginner

I am looking for a scenario where many machines are running a single parallel MPI application but some are waiting around for a rank/machine to finish before they can continue. However, the reason some machines finish faster than others is possibly due to some of the factors you listed, such as differences in architecture or cache size.

Gergana_S_Intel
Employee

I can recommend trying one of the example applications we ship with the tool, which implements a Poisson solver.  In the unoptimized version, the application creates a serialized bottleneck where some ranks have to wait on others to finish before they can continue.

Unfortunately, I'm not sure that would fulfill the second part of your requirement (that, even in a bottleneck situation, the cache size or architecture affects how fast a message is passed).  It might; I just haven't tested it.  That would really depend on what kind of optimizations the underlying MPI library applies when doing the message passing.

To check out the poisson application, look at <install_dir>/itac/9.0.2.045/examples/poisson/.
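
And if the poisson example doesn't show enough imbalance on your particular cluster, you can always force the wait pattern synthetically.  Here's a minimal sketch (my own toy program, not part of ITAC) where higher ranks stand in for slower machines:

    /* Synthetic bottleneck: each rank "computes" for a rank-dependent
     * time, so the fast ranks block in MPI_Barrier waiting for the
     * slowest one - the wait pattern described above. */
    #include <mpi.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        sleep((unsigned)rank);     /* simulate a slower machine */

        /* Fast ranks wait here until the slowest rank arrives. */
        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }

In the resulting trace, the Event Timeline should show the lower ranks sitting in MPI_Barrier while the highest rank finishes.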

Maybe others on this forum know other applications they can recommend.

Regards,
~Gergana

Aaron_H_
Beginner

I don't actually own a copy of the tool. Is it possible to look at the results of the tool running on the poisson solver without having the actual tool?

Gergana_S_Intel
Employee

You can grab an evaluation copy of the Intel® Parallel Studio XE Cluster Edition, which includes the tool.  That will allow you to run MPI applications and create trace files with ITAC.  Check out the main Parallel Studio XE page for an eval - they're free for 30 days.

If you don't have access to a cluster setup, I can also send you trace files that I've created in the past, and you can view them using the GUI only (so you don't have to compile and run anything).  But that would be limited to whatever environment I used at the time.  For this option, you still need to install the tool.

Finally, if you don't want to install anything, I can send you some screenshots of a specific area you're interested in.  You can go to the main ITAC page to see examples of what those screenshots will be.

Let me know,
~Gergana

Gergana_S_Intel
Employee

Aaron and I resolved this via email.  If anyone else needs access to this, make sure to register for an eval of the "Intel Parallel Studio XE Cluster Edition for Fortran and C++".  That includes all the MPI tools that Intel provides.

Regards,
~Gergana
