Hi everyone. I have a question about the Barrier collective communication. Has anyone tested Barrier in the Intel MPI Benchmarks with a large number of processes (1 process per node, on more than 100 nodes)? I would like to know the average time (usec). Is there any chance that the average time decreases at large scale?
Thank you.
Phonlawat
On one of our internal clusters, I tested with 128 nodes, 1 rank per node. Configuration:
[plain]Dual Intel® Xeon™ E5-2697 v2
8*8GB 1600MHz Reg ECC DDR3
Mellanox* MCX353A-FCAT adapter
Mellanox* MSX6025F-1BFR switch
Red Hat* Enterprise Linux* 6.4
Intel® C++ Composer XE 2013 SP1 Update 2
Intel® MPI Library 4.1 Update 3
Intel® MPI Benchmarks 3.2.4[/plain]
I don't know exactly where the nodes were in the routing.
My numbers:
[plain]2 ranks - 1.44 usec
4 ranks - 3.51 usec
8 ranks - 5.53 usec
16 ranks - 7.29 usec
32 ranks - 9.61 usec
64 ranks - 12.29 usec
128 ranks - 14.65 usec[/plain]
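For what it's worth, those numbers grow roughly linearly in log2(ranks), which is what you would expect from a tree-based barrier algorithm (one extra tree level per doubling of ranks). A quick sanity check on the figures above, just arithmetic on the posted data:

```python
# Measured MPI_Barrier averages (ranks -> usec) from the post above.
times = {2: 1.44, 4: 3.51, 8: 5.53, 16: 7.29, 32: 9.61, 64: 12.29, 128: 14.65}

# Increase per doubling of ranks, i.e. per extra level of a binary tree.
ranks = sorted(times)
increments = [times[b] - times[a] for a, b in zip(ranks, ranks[1:])]
avg = sum(increments) / len(increments)
print(f"avg increase per doubling: {avg:.2f} usec")  # prints 2.20
```

The increments stay roughly constant (about 2.2 usec per doubling), which is consistent with barrier time scaling as O(log2 N) rather than linearly in the node count.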
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Thank you for the information. It makes me fairly confident that the average time increases reasonably with the number of processes. But suppose the results had come out like this instead:
[plain]2 ranks - 1.44 usec
4 ranks - 3.51 usec
8 ranks - 5.53 usec
16 ranks - 20 usec
32 ranks - 35 usec
64 ranks - 55 usec
128 ranks - 12 usec[/plain]
What would you think of this problem? Could something be wrong with some of the nodes?
Thank you.
Phonlawat
Interesting. How busy is your cluster? I noticed similar behavior in one run here, but only one. Our cluster was in use by others at the time (and usually is), so I'm willing to attribute that one run to the IB switches being busy with someone else's traffic. If you are getting this consistently, then we can dig further.
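To confirm the anomaly is reproducible, a loop like the following repeats the Barrier test several times (a sketch assuming Intel MPI's mpirun and the IMB-MPI1 binary are on your PATH; the hostfile name is a placeholder):

```shell
# Run the IMB Barrier benchmark 10 times, 128 ranks, 1 rank per node.
# "./hosts" is a placeholder hostfile listing one node per line.
for i in $(seq 1 10); do
    mpirun -n 128 -ppn 1 -hostfile ./hosts IMB-MPI1 Barrier
done
```

If every round shows the same spike at the same rank counts, the cause is likely in the nodes or fabric rather than transient congestion.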
Wow, same situation. I ran the Intel MPI Benchmarks with MPI_Barrier many times (about 10 rounds, all giving the same result) while others were running benchmark and scientific programs on other machines. I have two IB switches, and the nodes span both the same and different switches. I tried to measure the effect of multiple programs running simultaneously over the same FDR InfiniBand switches (see the picture attached below) and found that the InfiniBand network guarantees full bisection bandwidth, so it should isolate the effect of the other programs. That leaves me with two assumptions about this problem.
First assumption: I have run High Performance Linpack (HPL), built with the Intel compiler, on single nodes, and the results differ considerably between machines. Some machines give performance close to theoretical peak (~97% efficiency), while others perform poorly (~89% efficiency). I should try other sets of 16 and 64 nodes, but I am still not sure.
Second assumption: on some machines, a previous user's scientific program may still be running (it never stopped), or it stopped but left something wrong on those machines.
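One way to narrow down the first assumption is to collect the single-node HPL efficiencies and flag the underperforming nodes. A minimal sketch (the node names, efficiency values, and the 3-point threshold are made up for illustration, not measured data):

```python
# Hypothetical per-node HPL efficiencies (fraction of theoretical peak).
eff = {"node01": 0.97, "node02": 0.96, "node03": 0.89, "node04": 0.97}

# Use the median as the baseline so one bad node doesn't skew it.
median = sorted(eff.values())[len(eff) // 2]

# Flag nodes more than 3 points below the median as suspects.
suspects = [name for name, e in eff.items() if median - e > 0.03]
print(suspects)  # -> ['node03']
```

Any node flagged this way would be a good candidate for checking leftover processes (second assumption) before rerunning the barrier test on the remaining nodes.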