Hello, I am doing a study in which I will analyze trace data collected by the Intel® Trace Analyzer and Collector. Any directions or leads on where I could find this data would be very much appreciated!
That is a very generic question. Can you provide more detail about what you are seeking? The Intel® Trace Analyzer and Collector has two components. The Trace Collector instruments your application and collects data on its MPI calls (with an API and the capability to instrument user code as well). This data is the trace, which can then be opened in the Trace Analyzer for analysis.
I am looking for data that will show how fast machines complete and return MPI calls in relation to other machines. My goal is to find a record of machines that take longer to complete the MPI calls, possibly causing bottlenecks for the other machines that are forced to wait for their completion.
Is your question mostly about how to use the Intel® Trace Analyzer and Collector (ITAC)? If you have an MPI application in mind already, then the data on how long an MPI call takes is easily provided by the ITAC tool.
I would do the following: generate a trace of your MPI application with the Trace Collector, then open it in the Trace Analyzer. Note that when transferring the trace files from one machine to another, make sure to copy all *.stf* files (not just the main .stf). If your application is fairly small, you can even create a single trace file (if you have questions on how to do that, just reply back and I'll give you that option).
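In case it helps, the single-file option is normally requested through a Trace Collector configuration file (pointed to by the VT_CONFIG environment variable). A sketch, assuming the SINGLESTF format name and the file name below as placeholders - double-check the directive names against the Trace Collector reference for your release:

```
# vtconfig.conf - point VT_CONFIG at this file before running
# Write one self-contained .stf file instead of the multi-file layout
LOGFILE-FORMAT SINGLESTF
LOGFILE-NAME myapp.single.stf
```

A single-file trace is easier to copy around, at the cost of somewhat slower loading for large runs.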
If you've never used the GUI before, we have a pretty nice tutorial posted online. Or you're welcome to ask questions here.
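For reference, the typical command sequence looks like the following when using the Intel MPI Library (the application name is just a placeholder here; -trace links in the Trace Collector so a trace is written when the run finishes). Exact flags can vary slightly by version, so check the ITAC documentation for your release:

```sh
# Compile and link with the Trace Collector (Intel MPI compiler wrapper)
mpiicc -trace -o myapp myapp.c

# Run as usual; at MPI_Finalize a myapp.stf trace (plus companion *.stf* files) is written
mpirun -n 4 ./myapp

# Open the trace in the Trace Analyzer GUI
traceanalyzer myapp.stf
```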
I don't have an MPI application. I am trying to get a "real-world" pattern of an MPI process where certain machines run slower than others, possibly causing bottlenecks for the other machines.
Ah, that's a different issue. When you say "certain machines run slower than others", do you mean that all of the machines are involved in running a single parallel MPI application but some are waiting around for a rank/machine to finish before they can continue? Or do you mean an MPI application will run slower on some machines and faster on others (e.g. differences in architecture, cache size, etc. affect the performance of the application)?
I am looking for a scenario where many machines are running a single parallel MPI application but some are waiting around for a rank/machine to finish before they can continue. However, the reason that some machines finish faster than others would be some of the factors you listed, such as differences in architecture or cache size.
I can recommend trying one of the example applications we ship with the tool, which implements a Poisson solver. In the unoptimized version, the application causes a serialized bottleneck where some ranks have to wait on others to finish before being able to continue.
Unfortunately, I'm not sure that would fulfill the second part of your requirement (that, even in a bottleneck situation, the cache size or architecture is what affects how fast a message is passed). It might; I just haven't tested it. That would really depend on what type of optimizations the underlying MPI library implements when doing the message passing.
To check out the Poisson application, look at <install_dir>/itac/9.0.2.045/examples/poisson/.
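If you end up writing your own test case instead, the waiting pattern is easy to provoke. Here is a minimal sketch (this is not the shipped Poisson example; the sleep is just a stand-in for rank-dependent compute speed) that shows up in a trace as fast ranks idling in MPI_Barrier:

```c
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double t0 = MPI_Wtime();

    /* Simulated imbalance: higher ranks "compute" longer,
       standing in for slower machines, smaller caches, etc. */
    usleep(100000 * (unsigned)(rank + 1));

    /* Every rank blocks here until the slowest one arrives,
       so the fast ranks accumulate visible MPI wait time. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("rank %d of %d passed the barrier after %.2f s\n",
           rank, size, MPI_Wtime() - t0);

    MPI_Finalize();
    return 0;
}
```

Compiled with mpiicc -trace and run across a few ranks, the per-rank barrier wait time is exactly the kind of imbalance the Trace Analyzer's timeline and load-balance displays highlight.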
Maybe others on this forum know other applications they can recommend.
You can grab an evaluation copy of the Intel Parallel Studio XE Cluster Edition, which includes the tool. That will allow you to run MPI applications and create trace files with ITAC. Check out the main Parallel Studio XE page for an eval - they're free for 30 days.
If you don't have access to a cluster setup, I can also send you trace files that I've created in the past and you can view them using the GUI only (so you don't have to compile and run anything). But that would be limited to whatever environment I used at the time. For this one, you still need to install the tool.
Finally, if you don't want to install anything, I can send you some screenshots of a specific area you're interested in. You can go to the main ITAC page to see examples of what those screenshots will be.
Let me know,
Aaron and I resolved this via email. If anyone else needs access to this, make sure to register for an eval of the "Intel Parallel Studio XE Cluster Edition for Fortran and C++". That includes all the MPI tools that Intel provides.