[2] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[12] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[26] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[30] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
malaga.ncl.res.in:20e0:3f1f6a20: 2052 us(2052 us): open_hca: device mlx4_0 not found
[2] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:20e0:3f1f6a20: 2209 us(157 us): open_hca: device mlx4_0 not found
[2] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
[6] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
malaga.ncl.res.in:20e7:df6dba20: 1967 us(1967 us): open_hca: device mlx4_0 not found
[16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:20e7:df6dba20: 2130 us(163 us): open_hca: device mlx4_0 not found
[16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
[22] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[4] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[14] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
malaga.ncl.res.in:20ec:99e6aa20: 3857 us(3857 us): open_hca: device mlx4_0 not found
[26] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:20ee:9340ba20: 3929 us(3929 us): open_hca: device mlx4_0 not found
[30] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:20ec:99e6aa20: 3972 us(115 us): open_hca: device mlx4_0 not found
[26] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
[10] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[2] MPI startup(): DAPL provider ofa-v2-ib0
[16] MPI startup(): DAPL provider ofa-v2-ib0
malaga.ncl.res.in:20ee:9340ba20: 4095 us(166 us): open_hca: device mlx4_0 not found
[30] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
[26] MPI startup(): DAPL provider ofa-v2-ib0
[30] MPI startup(): DAPL provider ofa-v2-ib0
[18] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[1] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[23] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[13] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[7] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[17] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
Can anyone please help me with this problem?
Hi Praveen,
Please attach your /etc/dat.conf file. What is the output from ibstat? Please run with I_MPI_DEBUG=2 and send the output.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Hi James,
Here are the files and output:
ibstat
CA 'qib0'
CA type: InfiniPath_QLE7340
Number of ports: 1
Firmware version:
Hardware version: 2
Node GUID: 0x001175000070a728
System image GUID: 0x001175000070a728
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x0761086a
Port GUID: 0x001175000070a728
Link layer: InfiniBand
cat /etc/dat.conf
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
ofa-v2-ehca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1" ""
ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mthca0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-cma-roe-eth3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth3 0" ""
ofa-v2-scm-roe-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-scm-roe-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""
OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""
OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""
OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""
OpenIB-ipath0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 1" ""
OpenIB-ipath0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 2" ""
OpenIB-ehca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ehca0 1" ""
OpenIB-iwarp u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""
OpenIB-cma-roe-eth2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""
OpenIB-cma-roe-eth3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth3 0" ""
OpenIB-scm-roe-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""
OpenIB-scm-roe-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""
cat matmul.o213
[0] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[14] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[26] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[2] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[4] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[6] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[12] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[24] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
malaga.ncl.res.in:21be:f82aa20: 826 us(826 us): open_hca: device mlx4_0 not found
[2] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:21be:f82aa20: 902 us(76 us): open_hca: device mlx4_0 not found
malaga.ncl.res.in:21bf:a6c55a20: 824 us(824 us): open_hca: device mlx4_0 not found
[4] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:21bf:a6c55a20: 898 us(74 us): open_hca: device mlx4_0 not found
[4] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
malaga.ncl.res.in:21c0:247eba20: 810 us(810 us): open_hca: device mlx4_0 not found
[6] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:21c3:971c0a20: 833 us(833 us): open_hca: device mlx4_0 not found
[12] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:21c3:971c0a20: 909 us(76 us): open_hca: device mlx4_0 not found
[12] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
malaga.ncl.res.in:21c4:dcb51a20: 1042 us(1042 us): open_hca: device mlx4_0 not found
[14] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:21c4:dcb51a20: 1118 us(76 us): open_hca: device mlx4_0 not found
[14] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
malaga.ncl.res.in:21ca:29e59a20: 954 us(954 us): open_hca: device mlx4_0 not found
[26] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:21ca:29e59a20: 1029 us(75 us): open_hca: device mlx4_0 not found
[26] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
[2] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
malaga.ncl.res.in:21c0:247eba20: 887 us(77 us): open_hca: device mlx4_0 not found
[6] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
malaga.ncl.res.in:21c5:db6f8a20: 826 us(826 us): open_hca: device mlx4_0 not found
[16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:21c5:db6f8a20: 904 us(78 us): open_hca: device mlx4_0 not found
[16] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
[22] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
malaga.ncl.res.in:21c9:96faaa20: 812 us(812 us): open_hca: device mlx4_0 not found
Hi Praveen,
Those messages indicate that the DAPL* provider mlx4_0 is not available, but do not indicate why. Please send the output from ibstat. Also, what command are you using to run your program? Setting I_MPI_DEBUG=2 should have given additional output.
James.
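A minimal sketch of how the startup messages above could be summarized, in case it helps to see which providers each rank tried and which one it ended up using. The function name `summarize_dapl_log` is hypothetical, and the regular expressions assume the exact line shapes pasted earlier in this thread; other Intel MPI or DAPL versions may print different text.

```python
import re
from collections import defaultdict

# Line shapes copied from the output pasted in this thread (an assumption:
# other library versions may format these messages differently).
TRY_RE = re.compile(
    r"^\[(\d+)\] DAPL startup\(\): trying to open default DAPL provider "
    r"from dat registry: (\S+)"
)
USE_RE = re.compile(r"^\[(\d+)\] MPI startup\(\): DAPL provider (\S+)")
MISSING_RE = re.compile(r"open_hca: device (\S+) not found")


def summarize_dapl_log(text):
    """Return (providers tried per rank, provider selected per rank, missing devices)."""
    tried = defaultdict(list)
    selected = {}
    missing = set()
    for line in text.splitlines():
        line = line.strip()
        m = TRY_RE.match(line)
        if m:
            tried[int(m.group(1))].append(m.group(2))
            continue
        m = USE_RE.match(line)
        if m:
            selected[int(m.group(1))] = m.group(2)
            continue
        m = MISSING_RE.search(line)
        if m:
            missing.add(m.group(1))
    return dict(tried), selected, missing


# Sample lines taken from the log in the first post.
sample = """\
[2] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
malaga.ncl.res.in:20e0:3f1f6a20: 2052 us(2052 us): open_hca: device mlx4_0 not found
[2] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-2
malaga.ncl.res.in:20e0:3f1f6a20: 2209 us(157 us): open_hca: device mlx4_0 not found
[2] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-ib0
[2] MPI startup(): DAPL provider ofa-v2-ib0
"""

tried, selected, missing = summarize_dapl_log(sample)
print("tried:", tried)
print("selected:", selected)
print("missing devices:", missing)
```

For the sample, rank 2 tries both mlx4_0 entries, fails to open the device, and then settles on ofa-v2-ib0.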
Hi James,
Thanks for the reply.
I am using this command:
mpiexec.hydra -machinefile ./NODE -np 32 -genv I_MPI_DEBUG=2 ./matmul.bin
$ ibstat
CA 'qib0'
CA type: InfiniPath_QLE7340
Number of ports: 1
Firmware version:
Hardware version: 2
Node GUID: 0x001175000070a728
System image GUID: 0x001175000070a728
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x0761086a
Port GUID: 0x001175000070a728
Link layer: InfiniBand
This is the ibstat output.
I have attached the output file.
Hi Praveen,
Everything is working as expected. As I said, the messages are due to the DAPL* provider mlx4_0 not being available. That is because you are using ib0 instead. By default, the Intel® MPI Library tries the entries in /etc/dat.conf in order.
I would suggest modifying your /etc/dat.conf file and putting the ofa-v2-ib0 line first, as this is the provider you are using. I would recommend either commenting out the ofa-v2-mlx4_0-1 and ofa-v2-mlx4_0-2 lines or moving them to the bottom of the file.
You can also set I_MPI_DAPL_PROVIDER=ofa-v2-ib0, and this will skip to the appropriate provider.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
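A minimal sketch of the reordering suggested above, assuming you want to preview the result before touching the real file. The helper `promote_provider` is hypothetical; it only works on text in memory and does not write to /etc/dat.conf, which would normally need root access and a backup first.

```python
def promote_provider(conf_text, provider):
    """Return conf_text with the entry for `provider` moved to the top.

    The provider name is matched against the first whitespace-separated
    field of each line, which is how the entries in this thread are laid out.
    """
    lines = conf_text.splitlines()
    chosen = [ln for ln in lines if ln.split()[:1] == [provider]]
    if not chosen:
        raise ValueError(f"provider {provider!r} not found")
    rest = [ln for ln in lines if ln.split()[:1] != [provider]]
    return "\n".join(chosen + rest) + "\n"


# Three entries copied from the /etc/dat.conf posted earlier in the thread.
sample_conf = '''\
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
'''

reordered = promote_provider(sample_conf, "ofa-v2-ib0")
print(reordered, end="")
```

All entries are kept; only the order changes, so the mlx4_0 lines end up after the ofa-v2-ib0 line instead of being tried first.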
Hi James,
Thanks for the support.
It is working perfectly.
Is there any way to check the performance of the IB?
Hi Praveen,
You can check directly using
ib_read_bw -d ib0&
ib_read_bw -w ib0 localhost
Or you can use the Intel® MPI Benchmarks to test MPI performance over the fabric. A binary is included with the Intel® MPI Library installation, or you can download the source at http://software.intel.com/en-us/articles/intel-mpi-benchmarks/.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
My team has run some Gromacs jobs, and the performance has dropped.
Earlier, the performance on a single node was 937.9 ns/day.
Today the performance is 832.1 ns/day.
So that's more than a 10% difference.
For the multi-node jobs:
Earlier performance: 1401.8 ns/day
(expected was 1876 ns/day, i.e. double the single-node performance)
Today's performance: 1317.2 ns/day
Any idea why the single-node performance has gone down?
(The jobs were run with mpiexec.hydra on both days.)
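A small sketch of the arithmetic behind these figures, using only the numbers quoted in this post. It assumes the multi-node jobs ran on two nodes, which is inferred from the "double the single-node performance" expectation and is not stated outright; the function names are hypothetical.

```python
def percent_drop(before, after):
    """Relative slowdown from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


def scaling_efficiency(multi_node, single_node, nodes):
    """Measured multi-node rate divided by ideal linear scaling."""
    return multi_node / (nodes * single_node)


NODES = 2  # assumption: "double the single-node performance" implies two nodes

single_earlier, single_today = 937.9, 832.1  # ns/day, from the post
multi_earlier, multi_today = 1401.8, 1317.2  # ns/day, from the post

single_drop = percent_drop(single_earlier, single_today)
multi_drop = percent_drop(multi_earlier, multi_today)
eff_earlier = scaling_efficiency(multi_earlier, single_earlier, NODES)
eff_today = scaling_efficiency(multi_today, single_today, NODES)

print(f"single-node drop: {single_drop:.1f}%")
print(f"multi-node drop:  {multi_drop:.1f}%")
print(f"scaling efficiency earlier: {eff_earlier:.2f}")
print(f"scaling efficiency today:   {eff_today:.2f}")
```

Under the two-node assumption, scaling efficiency is roughly 0.75 earlier and 0.79 today, which suggests the slowdown shows up mainly in the single-node rate rather than in how the job scales across the fabric.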
How can I fine-tune this setup for better performance?
Hi Praveen,
I don't have a solid answer as to why the performance would be different on a different day. Was a different job running as well on the lower performing day that could have used up some system resources? Did anything on the system change?
As to how to improve the performance, we offer quite a few options. The simplest is the automatic tuner, mpitune. Please see http://software.intel.com/en-us/articles/increase-cluster-mpi-application-performance-with-a-mpi-tune-up for more information about mpitune.
You can also use the Intel® Trace Analyzer and Collector to help locate MPI bottlenecks. Go to http://software.intel.com/en-us/intel-trace-analyzer/ for more information.
We have a performance and threading analysis tool, Intel® VTune™ Amplifier XE, which can provide information about hotspots and threading performance problems within your program as well. Visit http://software.intel.com/en-us/intel-vtune-amplifier-xe/ for more information.
If you decide one or more of these tools can help, look through the articles to find specific usage information, or feel free to ask and I can help point you in the correct direction or answer specific questions.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools