Intel® Fortran Compiler

Weird performance problem in a parallel program

bradlepc
Beginner
426 Views
Here's a good one. I have a program that runs in parallel using MPI message passing. If you run two instances of the same program on a single host (dual Xeon), the code is incredibly slow, even though the CPU is almost idle and no paging is going on. Run the same program on two machines and performance is fine.

Using some performance tools, we were able to determine that message passing performance is fine on both configurations, but that the processes are spending a lot of time doing nothing (some sort of wait state). It's as if two independent processes are somehow vying for the same resource, but the obvious ones (memory, network bandwidth, I/O, etc.) all look OK.

Any suggestions?
6 Replies
bradlepc
Beginner
I have more information on this. We profiled the program two different ways:
1. (good performance) Two instances of the test program running in parallel, one on each of two different hosts
2. (terrible performance) Two instances of the same test program using the same test case, but both processes on the same host.

The profile of the bad case shows that most of the run time is in COMMITQQ@4. The good case shows the same number of calls to this function, but much less time spent in it.

It appears that COMMITQQ is somehow interacting in a bad way with other instances of itself. Overall bandwidth going to the files is minimal.

Steve Lionel, any suggestions on this?
Steven_L_Intel1
Employee
This is not my area of expertise. I had to look up COMMITQQ as I was unfamiliar with it. My guess is that it has to do with overhead for resource locking on the shared host, but that's about all I can say. Do you really need COMMITQQ called so frequently?

Steve
bradlepc
Beginner
There are no explicit calls to COMMITQQ. I think it's being called by flush(). We're trying to reduce the flush() calls. However, we can't eliminate them, and this code works fine on just about any other platform including identical hardware running Linux. Perhaps more telling is that the exact same executable works fine as long as we don't run 2 of them on the same host at the same time.

I think you're probably on the right track with locking. Do you have any COMMITQQ gurus?

Pete
bradlepc
Beginner
We did some more profiling using a different tool. The problem is definitely flush(). Some calls are ok, some are pathologically long. We'll try to get some specifics on file access, etc. Should have 'em tomorrow.
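
One way to pin down which calls are the pathological ones is to time every flush individually. The sketch below is only an illustration, written in Python rather than Fortran, and it is not the profiling tool mentioned above. `timed_flush` and `run_demo` are hypothetical names, the 0.05 second threshold is an arbitrary assumption, and pairing `flush()` with `os.fsync()` is only an assumed stand-in for whatever the Fortran runtime does when flush() ends up in COMMITQQ.

```python
import os
import tempfile
import time


def timed_flush(f, durations, slow_threshold_s=0.05):
    """Flush and commit f, append the elapsed time, return True if it was slow."""
    start = time.perf_counter()
    f.flush()             # move buffered data from the process to the OS
    os.fsync(f.fileno())  # ask the OS to commit that data to the device
    elapsed = time.perf_counter() - start
    durations.append(elapsed)
    return elapsed > slow_threshold_s  # threshold is an assumed cutoff


def run_demo(n_writes=20):
    """Write n_writes records to a scratch file, timing the flush after each."""
    durations = []
    slow_calls = 0
    with tempfile.NamedTemporaryFile("w", delete=False) as f:
        path = f.name
        for i in range(n_writes):
            f.write(f"record {i}\n")
            if timed_flush(f, durations):
                slow_calls += 1
    os.remove(path)
    return durations, slow_calls


if __name__ == "__main__":
    durations, slow_calls = run_demo()
    print(f"max {max(durations):.4f} s, slow calls: {slow_calls} of {len(durations)}")
```

Keeping the per-call durations, rather than just a total, makes it possible to see whether a few calls dominate the run time or whether every call is uniformly slow.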

Pete
Steven_L_Intel1
Employee
We definitely have, at vf-support, someone who understands flush() and who also understands multithread issues. You don't say what compiler version you're running - be sure it's 6.6B before submitting the support request.

Steve
bradlepc
Beginner
Well, we tried to come up with a standalone test program to duplicate this, but were not successful. The code with the problem is enormous and parallel, and so it's tough to know exactly what the required conditions are.

For reference, here is a summary of what we know:
- Program is large and runs in parallel using message passing.
- Program has several files open across the network (NTFS).
- It writes to these files periodically.
- It also calls flush() for these files fairly regularly.
- Some files are shared between the parallel processes, most are not.
- Flush calls on unit 6 (explicitly opened as a file) are often pathologically long when two instances of the program are running on the same host. This problem does not occur when there is only 1 instance per host. Unit 6 refers to a separate file for each process (no sharing, write only). Typical flush time is about 0.2 seconds.
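
The conditions in this list suggest one shape a standalone test could take: several writer processes on one host, each with its own write-only file, each flushing after every write and recording its slowest flush. The Python sketch below is not the test program that was tried, and there is no claim that it reproduces the problem. `worst_flush_times` and the `unit6_<k>.log` file names are hypothetical, and `flush()` plus `os.fsync()` is only an assumed analogue of the Fortran flush path.

```python
import os
import subprocess
import sys
import tempfile

# Source for one writer process: write, flush, commit, track the slowest flush.
WORKER = r"""
import os, sys, time
path, n = sys.argv[1], int(sys.argv[2])
worst = 0.0
with open(path, "w") as f:
    for i in range(n):
        f.write("line %d\n" % i)
        t0 = time.perf_counter()
        f.flush()
        os.fsync(f.fileno())
        worst = max(worst, time.perf_counter() - t0)
print(worst)
"""


def worst_flush_times(n_procs, n_writes=50, directory=None):
    """Run n_procs concurrent writers, one private file each.

    Returns a list with each writer's slowest flush time in seconds.
    """
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        procs = [
            subprocess.Popen(
                [sys.executable, "-c", WORKER,
                 os.path.join(tmp, f"unit6_{k}.log"), str(n_writes)],
                stdout=subprocess.PIPE,
                text=True,
            )
            for k in range(n_procs)
        ]
        # communicate() waits for each writer and returns its printed result.
        return [float(p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    print("1 process  :", worst_flush_times(1))
    print("2 processes:", worst_flush_times(2))
```

Comparing the one-process and two-process results on the same host is the point of the exercise. Passing a directory on a network share as `directory` would bring the sketch closer to the setup described in the list, though that is an assumption about what matters.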

We noted that the program is calling flush() unnecessarily, so we're hoping that cleaning these calls up will get around the problem.
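
Since the flush() calls can be reduced but not eliminated, one possible cleanup is to rate-limit them so that a flush request is honored only if enough time has passed since the last one. This is a minimal sketch in Python, not the change made to the actual program; `ThrottledFlusher` is a hypothetical name and the one-second default interval is an arbitrary assumption.

```python
import time


class ThrottledFlusher:
    """Forward flush requests to a stream at most once per min_interval_s."""

    def __init__(self, stream, min_interval_s=1.0, clock=time.monotonic):
        self.stream = stream
        self.min_interval_s = min_interval_s
        self.clock = clock        # injectable so the logic can be tested
        self.last_flush = None    # time of the last real flush, if any
        self.flush_count = 0      # number of flushes actually performed

    def maybe_flush(self):
        """Flush if the interval has elapsed; return True if a flush happened."""
        now = self.clock()
        if self.last_flush is not None and now - self.last_flush < self.min_interval_s:
            return False          # too soon, skip this request
        self.stream.flush()
        self.last_flush = now
        self.flush_count += 1
        return True

    def close(self):
        """Always flush once at the end so skipped requests lose nothing."""
        self.stream.flush()
        self.flush_count += 1


if __name__ == "__main__":
    import sys
    flusher = ThrottledFlusher(sys.stdout, min_interval_s=0.5)
    for i in range(5):
        print("step", i)
        flusher.maybe_flush()
    flusher.close()
```

The trade-off is that up to one interval's worth of output can still be sitting in the buffer if the process dies, so the interval should reflect how much recent output the application can afford to lose.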

Thanks,

Pete