Hi,
I am testing my MPI application on 2 KNCs attached to the same host CPU. I observe *strongly* fluctuating performance, by a factor of 10 or even more, for example between 10 and 160 Gflop/s per card. The variation occurs within a loop that does the same computation in every iteration. At 160 Gflop/s one loop iteration takes around 0.05 seconds, so the fluctuations happen on a timescale longer than that.
I am using:
I_MPI_FABRICS_LIST=dapl
I_MPI_DAPL_PROVIDER_LIST=ofa-v2-scif0
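The variables are set before launch, roughly like this (the hostfile contents and the rank counts below are placeholders, not my exact command):

export I_MPI_FABRICS_LIST=dapl
export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-scif0
# hostfile lists the two coprocessors, e.g. mic0 and mic1 (names are examples)
mpirun -f hostfile -ppn 28 -n 56 ./my_app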
Observations:
- If I use the InfiniBand card instead (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u), still with both coprocessors on the same host, the performance is consistent.
- If I reduce the number of cores used by my application the performance gets more stable: with 56 cores I still see fluctuations, with 48 cores it is mostly fine, though still visible.
- I did not observe anything peculiar with the OSU bandwidth benchmark, apart from a "dip" at 8 kB (which can be reduced by changing I_MPI_DAPL_DIRECT_COPY_THRESHOLD, but this parameter shows no influence on my actual application); see the sketch after this list.
- I tried 2 hardware setups: (1) a dual-socket server board, where (I think) the data has to pass through the southbridge and/or QPI(?), and (2) a system with a PLX PCIe switch. The fluctuations happen on both systems.
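For the threshold experiments I did something along these lines (the value below is just one I tried, not a recommendation, and osu_bw is wherever the OSU micro-benchmarks happen to be installed):

# bandwidth test between one rank on each coprocessor
export I_MPI_DAPL_DIRECT_COPY_THRESHOLD=16384
mpirun -f hostfile -n 2 -ppn 1 ./osu_bw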
Is there anything wrong with my configuration? Is this a known issue? Any suggestions?
Thanks
Simon
And you have mpxyd running in both cases (ofa-v2-scif0 and ofa-v2-mlx4_0-1u), right?
With ofa-v2-mlx4_0-1u, you are using the adapter's buffers regardless of whether you are going from a coprocessor to another node or to another coprocessor on the same node. With ofa-v2-scif0, you are using RDMA buffers set up in host memory. At least, that is my understanding. In any event, you are definitely following a different path through the host.
I suspect the answer to what is going on would show up if you set log_level to 0x4 (log data operations) and/or 0x10 (log perf) in /etc/mpxyd.conf. The solution will probably be modifying some of the buffer settings in that file. I will see if I can find someone who knows more about this to provide some guidance.
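Something like this in /etc/mpxyd.conf should enable it (0x14 is just 0x4 and 0x10 combined; the rest of the file stays whatever your MPSS install shipped with):

# /etc/mpxyd.conf
log_level=0x14    # 0x4 = log data operations, 0x10 = log perf
# restart the mpxyd daemon afterwards so the new level takes effect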
One more thing - the developers I talk to are going to want to know the host OS version, MPSS version and IMPI version. Could you let me know?
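If it helps, something along these lines should collect all of it (assuming micinfo is on the PATH and mpirun is the Intel MPI one):

micinfo                        # MPSS, driver, flash and coprocessor uOS versions
cat /etc/*release; uname -r    # host OS and kernel
mpirun -V                      # Intel MPI version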
HOST OS : Linux
OS Version : 2.6.32-358.el6.x86_64
Driver Version : 3.3-1
MPSS Version : 3.3
Flash Version : 2.1.02.0390
SMC Firmware Version : 1.16.5078
SMC Boot Loader Version : 1.8.4326
uOS Version : 2.6.38.8+mpss3.3
Intel MPI version 4.1.3.