Hello Andrew,
You wrote:
However, when I spread out the 4 tasks on 2 nodes (still 4 total tasks, just 2 on each node), I get what seem to be numerical-/precision-related errors.
Could you please show an example of the error? Is it an application error or something related to MPI?
Could you please try to run the following scenarios with I_MPI_DEBUG=6 and provide the output:
1. 4 MPI tasks on a single node
2. 4 MPI tasks on 2 nodes (still 4 total tasks, just 2 on each node)
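For reference, a minimal command sketch of the two scenarios above, assuming a generic Intel MPI mpirun launch; the executable name ./app and the output file names are placeholders, and the actual launcher on your system (e.g., a batch-scheduler wrapper) may differ:
# Scenario 1: 4 MPI ranks on a single node
export I_MPI_DEBUG=6
mpirun -n 4 -ppn 4 ./app > debug_4tasks_1node.log 2>&1
# Scenario 2: 4 MPI ranks spread across 2 nodes (2 ranks per node)
mpirun -n 4 -ppn 2 ./app > debug_4tasks_2nodes.log 2>&1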
Hi Artem,
Thanks so much for your reply. The errors are all application errors.
I have attached what you asked for. Here are snippets of the I_MPI_DEBUG output from each run:
4 tasks, 1 node:
[0] MPI startup(): Intel(R) MPI Library, Version 5.0 Update 2 Build 20141030 (build id: 10994)
[0] MPI startup(): Copyright (C) 2003-2014 Intel Corporation. All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[2] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[3] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[3] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[3] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[2] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[0] MPI startup(): shm and dapl data transfer modes
[2] MPI startup(): shm and dapl data transfer modes
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[1] MPI startup(): shm and dapl data transfer modes
[0] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[2] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[2] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[3] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[3] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[0] MPI startup(): Device_reset_idx=0
[0] MPI startup(): Allgather: 1: 1-2861 & 0-8
[0] MPI startup(): Allgather: 3: 0-2147483647 & 0-8
[0] MPI startup(): Allgather: 1: 0-605 & 9-2147483647
[0] MPI startup(): Allgather: 3: 0-2147483647 & 9-2147483647
[0] MPI startup(): Allgatherv: 1: 0-2554 & 0-8
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 0-8
[0] MPI startup(): Allgatherv: 1: 0-272 & 9-16
[0] MPI startup(): Allgatherv: 2: 272-657 & 9-16
[0] MPI startup(): Allgatherv: 1: 657-2078 & 9-16
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 9-16
[0] MPI startup(): Allgatherv: 1: 0-1081 & 17-32
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 17-32
[0] MPI startup(): Allgatherv: 1: 0-547 & 33-64
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 33-64
[0] MPI startup(): Allgatherv: 1: 0-19 & 65-2147483647
[0] MPI startup(): Allgatherv: 2: 19-239 & 65-2147483647
[0] MPI startup(): Allgatherv: 1: 239-327 & 65-2147483647
[0] MPI startup(): Allgatherv: 4: 327-821 & 65-2147483647
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 65-2147483647
[0] MPI startup(): Allreduce: 1: 0-5738 & 0-4
[0] MPI startup(): Allreduce: 2: 5738-197433 & 0-4
[0] MPI startup(): Allreduce: 7: 197433-593742 & 0-4
[0] MPI startup(): Allreduce: 2: 0-2147483647 & 0-4
[0] MPI startup(): Allreduce: 1: 0-5655 & 5-8
[0] MPI startup(): Allreduce: 2: 5655-75166 & 5-8
[0] MPI startup(): Allreduce: 8: 75166-177639 & 5-8
[0] MPI startup(): Allreduce: 3: 177639-988014 & 5-8
[0] MPI startup(): Allreduce: 2: 988014-1643869 & 5-8
[0] MPI startup(): Allreduce: 8: 1643869-2494859 & 5-8
[0] MPI startup(): Allreduce: 2: 0-2147483647 & 5-8
[0] MPI startup(): Allreduce: 1: 0-587 & 9-16
[0] MPI startup(): Allreduce: 2: 587-3941 & 9-16
[0] MPI startup(): Allreduce: 1: 3941-9003 & 9-16
[0] MPI startup(): Allreduce: 2: 9003-101469 & 9-16
[0] MPI startup(): Allreduce: 8: 101469-355768 & 9-16
[0] MPI startup(): Allreduce: 3: 355768-3341814 & 9-16
[0] MPI startup(): Allreduce: 8: 0-2147483647 & 9-16
[0] MPI startup(): Allreduce: 1: 0-795 & 17-32
[0] MPI startup(): Allreduce: 2: 795-146567 & 17-32
[0] MPI startup(): Allreduce: 8: 146567-732118 & 17-32
[0] MPI startup(): Allreduce: 3: 0-2147483647 & 17-32
[0] MPI startup(): Allreduce: 1: 0-528 & 33-64
[0] MPI startup(): Allreduce: 2: 528-221277 & 33-64
[0] MPI startup(): Allreduce: 8: 221277-1440737 & 33-64
[0] MPI startup(): Allreduce: 3: 0-2147483647 & 33-64
[0] MPI startup(): Allreduce: 1: 0-481 & 65-128
[0] MPI startup(): Allreduce: 2: 481-593833 & 65-128
[0] MPI startup(): Allreduce: 8: 593833-2962021 & 65-128
[0] MPI startup(): Allreduce: 7: 0-2147483647 & 65-128
[0] MPI startup(): Allreduce: 1: 0-584 & 129-256
[0] MPI startup(): Allreduce: 2: 0-2147483647 & 129-256
[0] MPI startup(): Allreduce: 1: 0-604 & 257-2147483647
[0] MPI startup(): Allreduce: 2: 604-2997006 & 257-2147483647
[0] MPI startup(): Allreduce: 8: 0-2147483647 & 257-2147483647
[0] MPI startup(): Alltoall: 4: 0-2048 & 0-4
[0] MPI startup(): Alltoall: 2: 2049-8192 & 0-4
[0] MPI startup(): Alltoall: 4: 8193-16384 & 0-4
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 0-4
[0] MPI startup(): Alltoall: 1: 0-0 & 5-8
[0] MPI startup(): Alltoall: 4: 1-8 & 5-8
[0] MPI startup(): Alltoall: 1: 9-2585 & 5-8
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 5-8
[0] MPI startup(): Alltoall: 1: 0-2025 & 9-16
[0] MPI startup(): Alltoall: 4: 2026-3105 & 9-16
[0] MPI startup(): Alltoall: 2: 3106-19194 & 9-16
[0] MPI startup(): Alltoall: 3: 19195-42697 & 9-16
[0] MPI startup(): Alltoall: 4: 42698-131072 & 9-16
[0] MPI startup(): Alltoall: 3: 131073-414909 & 9-16
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 9-16
[0] MPI startup(): Alltoall: 2: 0-0 & 17-32
[0] MPI startup(): Alltoall: 1: 1-1026 & 17-32
[0] MPI startup(): Alltoall: 4: 1027-4096 & 17-32
[0] MPI startup(): Alltoall: 2: 4097-38696 & 17-32
[0] MPI startup(): Alltoall: 4: 38697-131072 & 17-32
[0] MPI startup(): Alltoall: 3: 0-2147483647 & 17-32
[0] MPI startup(): Alltoall: 3: 0-0 & 33-64
[0] MPI startup(): Alltoall: 1: 1-543 & 33-64
[0] MPI startup(): Alltoall: 4: 544-4096 & 33-64
[0] MPI startup(): Alltoall: 2: 4097-16384 & 33-64
[0] MPI startup(): Alltoall: 3: 16385-65536 & 33-64
[0] MPI startup(): Alltoall: 4: 65537-131072 & 33-64
[0] MPI startup(): Alltoall: 3: 0-2147483647 & 33-64
[0] MPI startup(): Alltoall: 1: 0-261 & 65-128
[0] MPI startup(): Alltoall: 4: 262-7180 & 65-128
[0] MPI startup(): Alltoall: 2: 7181-58902 & 65-128
[0] MPI startup(): Alltoall: 4: 58903-65536 & 65-128
[0] MPI startup(): Alltoall: 3: 65537-1048576 & 65-128
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 65-128
[0] MPI startup(): Alltoall: 1: 0-131 & 129-256
[0] MPI startup(): Alltoall: 4: 132-7193 & 129-256
[0] MPI startup(): Alltoall: 2: 7194-16813 & 129-256
[0] MPI startup(): Alltoall: 3: 16814-32768 & 129-256
[0] MPI startup(): Alltoall: 4: 32769-65536 & 129-256
[0] MPI startup(): Alltoall: 3: 0-2147483647 & 129-256
[0] MPI startup(): Alltoall: 1: 0-66 & 257-2147483647
[0] MPI startup(): Alltoall: 4: 67-6568 & 257-2147483647
[0] MPI startup(): Alltoall: 2: 6569-16572 & 257-2147483647
[0] MPI startup(): Alltoall: 3: 16573-32768 & 257-2147483647
[0] MPI startup(): Alltoall: 4: 32769-438901 & 257-2147483647
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 257-2147483647
[0] MPI startup(): Alltoallv: 0: 0-2147483647 & 0-8
[0] MPI startup(): Alltoallv: 0: 0-4 & 9-2147483647
[0] MPI startup(): Alltoallv: 2: 0-2147483647 & 9-2147483647
[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Barrier: 2: 0-2147483647 & 0-4
[0] MPI startup(): Barrier: 5: 0-2147483647 & 5-8
[0] MPI startup(): Barrier: 2: 0-2147483647 & 9-32
[0] MPI startup(): Barrier: 4: 0-2147483647 & 33-2147483647
[0] MPI startup(): Bcast: 4: 1-256 & 0-8
[0] MPI startup(): Bcast: 1: 257-17181 & 0-8
[0] MPI startup(): Bcast: 7: 17182-1048576 & 0-8
[0] MPI startup(): Bcast: 7: 0-2147483647 & 0-8
[0] MPI startup(): Bcast: 7: 0-2147483647 & 9-2147483647
[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gatherv: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-12 & 0-4
[0] MPI startup(): Reduce_scatter: 5: 12-27 & 0-4
[0] MPI startup(): Reduce_scatter: 3: 27-49 & 0-4
[0] MPI startup(): Reduce_scatter: 1: 49-187 & 0-4
[0] MPI startup(): Reduce_scatter: 3: 187-405673 & 0-4
[0] MPI startup(): Reduce_scatter: 4: 405673-594687 & 0-4
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-4
[0] MPI startup(): Reduce_scatter: 5: 0-24 & 5-8
[0] MPI startup(): Reduce_scatter: 1: 24-155 & 5-8
[0] MPI startup(): Reduce_scatter: 3: 155-204501 & 5-8
[0] MPI startup(): Reduce_scatter: 5: 204501-274267 & 5-8
[0] MPI startup(): Reduce_scatter: 4: 0-2147483647 & 5-8
[0] MPI startup(): Reduce_scatter: 1: 0-63 & 9-16
[0] MPI startup(): Reduce_scatter: 3: 63-72 & 9-16
[0] MPI startup(): Reduce_scatter: 1: 72-264 & 9-16
[0] MPI startup(): Reduce_scatter: 3: 264-168421 & 9-16
[0] MPI startup(): Reduce_scatter: 4: 168421-168421 & 9-16
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 9-16
[0] MPI startup(): Reduce_scatter: 3: 0-0 & 17-32
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 17-32
[0] MPI startup(): Reduce_scatter: 1: 4-12 & 17-32
[0] MPI startup(): Reduce_scatter: 5: 12-18 & 17-32
[0] MPI startup(): Reduce_scatter: 1: 18-419 & 17-32
[0] MPI startup(): Reduce_scatter: 3: 419-188739 & 17-32
[0] MPI startup(): Reduce_scatter: 4: 188739-716329 & 17-32
[0] MPI startup(): Reduce_scatter: 5: 716329-1365841 & 17-32
[0] MPI startup(): Reduce_scatter: 2: 1365841-2430194 & 17-32
[0] MPI startup(): Reduce_scatter: 4: 0-2147483647 & 17-32
[0] MPI startup(): Reduce_scatter: 3: 0-0 & 33-64
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 33-64
[0] MPI startup(): Reduce_scatter: 5: 4-17 & 33-64
[0] MPI startup(): Reduce_scatter: 1: 17-635 & 33-64
[0] MPI startup(): Reduce_scatter: 3: 635-202937 & 33-64
[0] MPI startup(): Reduce_scatter: 5: 202937-308253 & 33-64
[0] MPI startup(): Reduce_scatter: 4: 308253-1389874 & 33-64
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 33-64
[0] MPI startup(): Reduce_scatter: 3: 0-0 & 65-128
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 65-128
[0] MPI startup(): Reduce_scatter: 5: 4-16 & 65-128
[0] MPI startup(): Reduce_scatter: 1: 16-1238 & 65-128
[0] MPI startup(): Reduce_scatter: 3: 1238-280097 & 65-128
[0] MPI startup(): Reduce_scatter: 5: 280097-631434 & 65-128
[0] MPI startup(): Reduce_scatter: 4: 631434-2605072 & 65-128
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 65-128
[0] MPI startup(): Reduce_scatter: 2: 0-0 & 129-256
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 129-256
[0] MPI startup(): Reduce_scatter: 5: 4-16 & 129-256
[0] MPI startup(): Reduce_scatter: 1: 16-2418 & 129-256
[0] MPI startup(): Reduce_scatter: 3: 0-2147483647 & 129-256
[0] MPI startup(): Reduce_scatter: 2: 0-0 & 257-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 257-2147483647
[0] MPI startup(): Reduce_scatter: 5: 4-16 & 257-2147483647
[0] MPI startup(): Reduce_scatter: 1: 16-33182 & 257-2147483647
[0] MPI startup(): Reduce_scatter: 3: 33182-3763779 & 257-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-2147483647 & 257-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-256
[0] MPI startup(): Reduce: 3: 4-45 & 257-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 257-2147483647
[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatter: 3: 0-2147483647 & 0-8
[0] MPI startup(): Scatter: 3: 1-140 & 9-16
[0] MPI startup(): Scatter: 1: 141-1302 & 9-16
[0] MPI startup(): Scatter: 3: 0-2147483647 & 9-16
[0] MPI startup(): Scatter: 3: 1-159 & 17-32
[0] MPI startup(): Scatter: 1: 160-486 & 17-32
[0] MPI startup(): Scatter: 3: 0-2147483647 & 17-32
[0] MPI startup(): Scatter: 1: 1-149 & 33-64
[0] MPI startup(): Scatter: 3: 0-2147483647 & 33-64
[0] MPI startup(): Scatter: 1: 1-139 & 65-2147483647
[0] MPI startup(): Scatter: 3: 0-2147483647 & 65-2147483647
[0] MPI startup(): Scatterv: 1: 0-2147483647 & 0-256
[0] MPI startup(): Scatterv: 2: 0-2147483647 & 257-2147483647
[1] MPI startup(): Recognition=2 Platform(code=8 ippn=2 dev=6) Fabric(intra=1 inter=4 flags=0x0)
[3] MPI startup(): Recognition=2 Platform(code=8 ippn=2 dev=6) Fabric(intra=1 inter=4 flags=0x0)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 125336 c560-802.stampede.tacc.utexas.edu {8,9,10,11}
[0] MPI startup(): 1 125337 c560-802.stampede.tacc.utexas.edu {12,13,14,15}
[2] MPI startup(): Recognition=2 Platform(code=8 ippn=2 dev=6) Fabric(intra=1 inter=4 flags=0x0)
[0] MPI startup(): 2 125338 c560-802.stampede.tacc.utexas.edu {0,1,2,3}
[0] MPI startup(): 3 125339 c560-802.stampede.tacc.utexas.edu {4,5,6,7}
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=2 dev=6) Fabric(intra=1 inter=4 flags=0x0)
[0] MPI startup(): I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u
[0] MPI startup(): I_MPI_DEBUG=6
[0] MPI startup(): I_MPI_FABRICS=shm:dapl
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_MAP=mlx4_0:1,scif0:-1,mic0:1
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_PIN_MAPPING=4:0 8,1 12,2 0,3 4
4 tasks, 2 nodes:
[0] MPI startup(): Intel(R) MPI Library, Version 5.0 Update 2 Build 20141030 (build id: 10994)
[0] MPI startup(): Copyright (C) 2003-2014 Intel Corporation. All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[2] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[3] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[3] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[3] MPI startup(): shm and dapl data transfer modes
[2] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[2] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[1] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): shm and dapl data transfer modes
[2] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[2] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[3] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[3] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[0] MPI startup(): Device_reset_idx=0
[0] MPI startup(): Allgather: 1: 1-490 & 0-8
[0] MPI startup(): Allgather: 2: 491-558 & 0-8
[0] MPI startup(): Allgather: 1: 559-2319 & 0-8
[0] MPI startup(): Allgather: 3: 2320-46227 & 0-8
[0] MPI startup(): Allgather: 1: 46228-2215101 & 0-8
[0] MPI startup(): Allgather: 3: 0-2147483647 & 0-8
[0] MPI startup(): Allgather: 1: 1-1005 & 9-16
[0] MPI startup(): Allgather: 2: 1006-1042 & 9-16
[0] MPI startup(): Allgather: 1: 1043-2059 & 9-16
[0] MPI startup(): Allgather: 3: 0-2147483647 & 9-16
[0] MPI startup(): Allgather: 1: 1-2454 & 17-2147483647
[0] MPI startup(): Allgather: 3: 0-2147483647 & 17-2147483647
[0] MPI startup(): Allgatherv: 1: 0-3147 & 0-4
[0] MPI startup(): Allgatherv: 2: 3147-5622 & 0-4
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 0-4
[0] MPI startup(): Allgatherv: 1: 0-975 & 5-8
[0] MPI startup(): Allgatherv: 2: 975-4158 & 5-8
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 5-8
[0] MPI startup(): Allgatherv: 1: 0-2146 & 9-16
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 9-16
[0] MPI startup(): Allgatherv: 1: 0-81 & 17-32
[0] MPI startup(): Allgatherv: 2: 81-414 & 17-32
[0] MPI startup(): Allgatherv: 1: 414-1190 & 17-32
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 17-32
[0] MPI startup(): Allgatherv: 2: 0-1 & 33-2147483647
[0] MPI startup(): Allgatherv: 1: 1-3 & 33-2147483647
[0] MPI startup(): Allgatherv: 2: 3-783 & 33-2147483647
[0] MPI startup(): Allgatherv: 4: 783-1782 & 33-2147483647
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 33-2147483647
[0] MPI startup(): Allreduce: 7: 0-2084 & 0-4
[0] MPI startup(): Allreduce: 1: 2084-15216 & 0-4
[0] MPI startup(): Allreduce: 7: 15216-99715 & 0-4
[0] MPI startup(): Allreduce: 3: 99715-168666 & 0-4
[0] MPI startup(): Allreduce: 2: 168666-363889 & 0-4
[0] MPI startup(): Allreduce: 7: 0-2147483647 & 0-4
[0] MPI startup(): Allreduce: 1: 0-14978 & 5-8
[0] MPI startup(): Allreduce: 2: 14978-66879 & 5-8
[0] MPI startup(): Allreduce: 8: 66879-179296 & 5-8
[0] MPI startup(): Allreduce: 3: 179296-304801 & 5-8
[0] MPI startup(): Allreduce: 7: 304801-704509 & 5-8
[0] MPI startup(): Allreduce: 2: 0-2147483647 & 5-8
[0] MPI startup(): Allreduce: 1: 0-16405 & 9-16
[0] MPI startup(): Allreduce: 2: 16405-81784 & 9-16
[3] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=6) Fabric(intra=1 inter=4 flags=0x0)
[0] MPI startup(): Allreduce: 8: 81784-346385 & 9-16
[0] MPI startup(): Allreduce: 7: 346385-807546 & 9-16
[0] MPI startup(): Allreduce: 2: 807546-1259854 & 9-16
[2] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=6) Fabric(intra=1 inter=4 flags=0x0)
[0] MPI startup(): Allreduce: 3: 0-2147483647 & 9-16
[0] MPI startup(): Allreduce: 1: 0-8913 & 17-32
[0] MPI startup(): Allreduce: 2: 8913-103578 & 17-32
[0] MPI startup(): Allreduce: 8: 103578-615876 & 17-32
[0] MPI startup(): Allreduce: 2: 0-2147483647 & 17-32
[0] MPI startup(): Allreduce: 1: 0-1000 & 33-64
[0] MPI startup(): Allreduce: 2: 1000-2249 & 33-64
[0] MPI startup(): Allreduce: 1: 2249-6029 & 33-64
[0] MPI startup(): Allreduce: 2: 6029-325357 & 33-64
[0] MPI startup(): Allreduce: 8: 325357-1470976 & 33-64
[0] MPI startup(): Allreduce: 7: 1470976-2556670 & 33-64
[0] MPI startup(): Allreduce: 3: 0-2147483647 & 33-64
[0] MPI startup(): Allreduce: 1: 0-664 & 65-128
[0] MPI startup(): Allreduce: 2: 664-754706 & 65-128
[0] MPI startup(): Allreduce: 4: 754706-1663862 & 65-128
[0] MPI startup(): Allreduce: 2: 1663862-3269097 & 65-128
[0] MPI startup(): Allreduce: 7: 0-2147483647 & 65-128
[0] MPI startup(): Allreduce: 1: 0-789 & 129-2147483647
[0] MPI startup(): Allreduce: 2: 789-2247589 & 129-2147483647
[0] MPI startup(): Allreduce: 8: 0-2147483647 & 129-2147483647
[0] MPI startup(): Alltoall: 2: 0-1 & 0-2
[0] MPI startup(): Alltoall: 3: 2-64 & 0-2
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 0-2
[0] MPI startup(): Alltoall: 2: 0-0 & 3-4
[0] MPI startup(): Alltoall: 3: 1-119 & 3-4
[0] MPI startup(): Alltoall: 1: 120-256 & 3-4
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 3-4
[0] MPI startup(): Alltoall: 1: 0-1599 & 5-8
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 5-8
[0] MPI startup(): Alltoall: 2: 0-0 & 9-16
[0] MPI startup(): Alltoall: 1: 1-8 & 9-16
[0] MPI startup(): Alltoall: 2: 9-36445 & 9-16
[0] MPI startup(): Alltoall: 3: 36446-163048 & 9-16
[0] MPI startup(): Alltoall: 4: 163049-524288 & 9-16
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 9-16
[0] MPI startup(): Alltoall: 1: 0-789 & 17-32
[0] MPI startup(): Alltoall: 2: 790-78011 & 17-32
[0] MPI startup(): Alltoall: 3: 78012-378446 & 17-32
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 17-32
[0] MPI startup(): Alltoall: 1: 0-517 & 33-64
[0] MPI startup(): Alltoall: 4: 518-4155 & 33-64
[0] MPI startup(): Alltoall: 2: 4156-124007 & 33-64
[0] MPI startup(): Alltoall: 3: 124008-411471 & 33-64
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 33-64
[0] MPI startup(): Alltoall: 1: 0-260 & 65-128
[0] MPI startup(): Alltoall: 4: 261-4618 & 65-128
[0] MPI startup(): Alltoall: 2: 4619-65536 & 65-128
[0] MPI startup(): Alltoall: 3: 65537-262144 & 65-128
[0] MPI startup(): Alltoall: 4: 262145-611317 & 65-128
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 65-128
[0] MPI startup(): Alltoall: 1: 0-133 & 129-2147483647
[0] MPI startup(): Alltoall: 4: 134-5227 & 129-2147483647
[0] MPI startup(): Alltoall: 2: 5228-17246 & 129-2147483647
[0] MPI startup(): Alltoall: 4: 17247-32768 & 129-2147483647
[0] MPI startup(): Alltoall: 3: 32769-365013 & 129-2147483647
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 129-2147483647
[0] MPI startup(): Alltoallv: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Barrier: 1: 0-2147483647 & 0-2
[0] MPI startup(): Barrier: 3: 0-2147483647 & 3-4
[0] MPI startup(): Barrier: 5: 0-2147483647 & 5-8
[0] MPI startup(): Barrier: 2: 0-2147483647 & 9-32
[0] MPI startup(): Barrier: 3: 0-2147483647 & 33-128
[0] MPI startup(): Barrier: 4: 0-2147483647 & 129-2147483647
[0] MPI startup(): Bcast: 4: 1-806 & 0-4
[0] MPI startup(): Bcast: 7: 807-18093 & 0-4
[0] MPI startup(): Bcast: 6: 18094-51366 & 0-4
[0] MPI startup(): Bcast: 4: 51367-182526 & 0-4
[0] MPI startup(): Bcast: 1: 182527-618390 & 0-4
[0] MPI startup(): Bcast: 7: 0-2147483647 & 0-4
[0] MPI startup(): Bcast: 1: 1-24 & 5-8
[0] MPI startup(): Bcast: 4: 25-74 & 5-8
[0] MPI startup(): Bcast: 1: 75-18137 & 5-8
[0] MPI startup(): Bcast: 7: 18138-614661 & 5-8
[0] MPI startup(): Bcast: 1: 614662-1284626 & 5-8
[0] MPI startup(): Bcast: 2: 0-2147483647 & 5-8
[0] MPI startup(): Bcast: 1: 1-1 & 9-16
[0] MPI startup(): Bcast: 7: 2-158 & 9-16
[0] MPI startup(): Bcast: 1: 159-16955 & 9-16
[0] MPI startup(): Bcast: 7: 0-2147483647 & 9-16
[0] MPI startup(): Bcast: 7: 1-242 & 17-32
[0] MPI startup(): Bcast: 1: 243-10345 & 17-32
[0] MPI startup(): Bcast: 7: 0-2147483647 & 17-32
[0] MPI startup(): Bcast: 1: 1-1 & 33-2147483647
[0] MPI startup(): Bcast: 7: 2-737 & 33-2147483647
[0] MPI startup(): Bcast: 1: 738-5340 & 33-2147483647
[0] MPI startup(): Bcast: 7: 0-2147483647 & 33-2147483647
[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gatherv: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 1: 0-6 & 0-2
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-2
[0] MPI startup(): Reduce_scatter: 4: 0-5 & 3-4
[0] MPI startup(): Reduce_scatter: 5: 5-13 & 3-4
[0] MPI startup(): Reduce_scatter: 3: 13-59 & 3-4
[0] MPI startup(): Reduce_scatter: 1: 59-76 & 3-4
[0] MPI startup(): Reduce_scatter: 3: 76-91488 & 3-4
[0] MPI startup(): Reduce_scatter: 4: 91488-680063 & 3-4
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 3-4
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 5-8
[0] MPI startup(): Reduce_scatter: 5: 4-11 & 5-8
[0] MPI startup(): Reduce_scatter: 1: 11-31 & 5-8
[0] MPI startup(): Reduce_scatter: 3: 31-69615 & 5-8
[0] MPI startup(): Reduce_scatter: 2: 69615-202632 & 5-8
[0] MPI startup(): Reduce_scatter: 5: 202632-396082 & 5-8
[0] MPI startup(): Reduce_scatter: 4: 396082-1495696 & 5-8
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 5-8
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 9-16
[0] MPI startup(): Reduce_scatter: 1: 4-345 & 9-16
[0] MPI startup(): Reduce_scatter: 3: 345-79523 & 9-16
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 9-16
[0] MPI startup(): Reduce_scatter: 3: 0-0 & 17-32
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 17-32
[0] MPI startup(): Reduce_scatter: 1: 4-992 & 17-32
[0] MPI startup(): Reduce_scatter: 3: 992-71417 & 17-32
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 17-32
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 33-64
[0] MPI startup(): Reduce_scatter: 1: 4-1472 & 33-64
[0] MPI startup(): Reduce_scatter: 3: 1472-196592 & 33-64
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 33-64
[0] MPI startup(): Reduce_scatter: 3: 0-0 & 65-128
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 65-128
[0] MPI startup(): Reduce_scatter: 1: 4-32892 & 65-128
[0] MPI startup(): Reduce_scatter: 3: 32892-381072 & 65-128
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 65-128
[0] MPI startup(): Reduce_scatter: 2: 0-0 & 129-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-4 & 129-2147483647
[0] MPI startup(): Reduce_scatter: 1: 4-33262 & 129-2147483647
[0] MPI startup(): Reduce_scatter: 3: 33262-1571397 & 129-2147483647
[0] MPI startup(): Reduce_scatter: 5: 1571397-2211398 & 129-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-2147483647 & 129-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2
[0] MPI startup(): Reduce: 3: 0-10541 & 3-4
[0] MPI startup(): Reduce: 1: 0-2147483647 & 3-4
[0] MPI startup(): Reduce: 1: 0-2147483647 & 5-2147483647
[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatter: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatterv: 1: 0-2147483647 & 0-2147483647
[1] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=6) Fabric(intra=1 inter=4 flags=0x0)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 29454 c557-604.stampede.tacc.utexas.edu {8,9,10,11,12,13,14,15}
[0] MPI startup(): 1 29455 c557-604.stampede.tacc.utexas.edu {0,1,2,3,4,5,6,7}
[0] MPI startup(): 2 71567 c558-304.stampede.tacc.utexas.edu {8,9,10,11,12,13,14,15}
[0] MPI startup(): 3 71568 c558-304.stampede.tacc.utexas.edu {0,1,2,3,4,5,6,7}
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=6) Fabric(intra=1 inter=4 flags=0x0)
[0] MPI startup(): I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u
[0] MPI startup(): I_MPI_DEBUG=6
[0] MPI startup(): I_MPI_FABRICS=shm:dapl
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_MAP=mlx4_0:1,scif0:-1,mic0:1
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_PIN_MAPPING=2:0 8,1 0
Please note that I have to log in again in order to change the task configuration, which is why different nodes were used for the two cases. (The results are repeatable regardless.)
Thanks very much for your help!
Best,
Andrew
With mixed feelings, I can report that disabling InfiniBand altogether fixes the problem, i.e., export I_MPI_FABRICS=tcp.
Thanks,
Andrew
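For reference, a minimal sketch of the workaround described above; the executable name ./app is a placeholder, and the same setting can also be passed per run with Intel MPI's -genv option instead of exporting it:
# Fall back from shm:dapl (InfiniBand/DAPL) to plain TCP
export I_MPI_FABRICS=tcp
mpirun -n 4 -ppn 2 ./app
# Equivalent per-run form, without exporting the variable
mpirun -genv I_MPI_FABRICS tcp -n 4 -ppn 2 ./app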
Hi Andrew,
As far as I can see from the provided log files, this does not look like an MPI error:
ERROR 2189: fatal error -- debug output follows
Number of non normalized orbitals 194
Largest normalization error 16747.5406248335
Number of non orthogonal pairs 67803
Largest orthogonalization error 4957.20919383731
orbital 126 is not normalized: 2057.38967586926
orbital 127 is not normalized: 3530.87444859233
orbital 128 is not normalized: 2152.55926608134
orbital 129 is not normalized: 1262.41856845017
******** MANY ERRORS LIKE THESE, INDICATING NUMERICAL/PRECISION ISSUES ********
orbitals 126 and 1 are not orthogonal: -0.23166245E-01
orbitals 126 and 2 are not orthogonal: -0.59257524E-02
orbitals 126 and 3 are not orthogonal: -0.68079900E-02
orbitals 126 and 4 are not orthogonal: 0.15700180E-01
******** EVEN MORE ERRORS LIKE THESE, INDICATING NUMERICAL/PRECISION ISSUES ********
----------------------------------------------------------------------
Jaguar cannot recover from this error and will now abort.
For technical support please contact the Schrodinger Support Center at
http://www.schrodinger.com/supportcenter/ or help@schrodinger.com
----------------------------------------------------------------------
I'd check the application parameters (input data and so on).
The only thing we can do from the MPI perspective is to experiment with different collective algorithms. To do that, you can run the problem case (4 MPI processes on 2 nodes) with I_MPI_FABRICS=tcp and gather Intel MPI statistics (with I_MPI_STATS=20) to see which collective operations and message lengths the application uses most, and then try varying the algorithms for the most heavily used collectives. The statistics are saved to a stats.txt file by default; please provide it.
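A hedged sketch of the workflow described above, using the I_MPI_STATS and I_MPI_ADJUST_* controls of Intel MPI 5.x; the executable name ./app and the algorithm number chosen for MPI_Allreduce are placeholders (see the Intel MPI Library Reference Manual for the valid algorithm IDs per collective):
# Step 1: run over TCP and gather statistics (written to stats.txt by default)
export I_MPI_FABRICS=tcp
export I_MPI_STATS=20
mpirun -n 4 -ppn 2 ./app
# Step 2: after inspecting stats.txt, pin a specific algorithm for the
# most heavily used collectives, e.g. for MPI_Allreduce:
export I_MPI_ADJUST_ALLREDUCE=2
mpirun -n 4 -ppn 2 ./app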
Hi Artem,
Thank you so much for looking into this.
I might have been content with I_MPI_FABRICS=tcp solving the precision/numerical issue I originally posted about; it is apparently a known issue that Jaguar segfaults when using InfiniBand, at least with Open MPI and now, it seems, with my build against Intel MPI.
However, on a test job using 2 nodes, 1 task per node, and 16 threads per task, I am finding that my build with Intel MPI uses only 54% of the total CPU on average. In contrast, a version of the program built on a different machine (but run on this one) with its bundled Open MPI uses 70% of the total CPU on average (still not great, obviously).
Would you guess that your suggestion about gathering Intel MPI statistics would be the way to improve the efficiency of my Intel MPI build? Should I also try the Intel Trace Analyzer and Collector for this?
Regardless, I will provide the stats.txt file as soon as possible; in the meantime I was just wondering how closely related this problem might be.
Thank you,
Andrew
Hi Artem,
I apologize for the delay, but I have been doing lots of testing on this.
First, I got InfiniBand working with most of the subprograms that Jaguar calls, so I am now less concerned about the efficiency of the collectives at the moment.
I am also less concerned about my build using only 54% of the CPUs as opposed to the release version's 70%, because Jerome Vienne at TACC suggested that this sort of figure is hard to interpret and that it is best to compare overall execution time. Since my build is still 20% faster than the release version, I am okay with this.
However, I still tried what you suggested, setting I_MPI_STATS, but I found that setting it to anything other than 0 produced segfault-like errors.
So I figured that the Intel Trace Analyzer and Collector would give similar information. I am a novice with that software, but after running a small-to-medium-sized test system on 4 nodes I found that these were the most time-consuming MPI functions (out of a total run time of 1162 s). Note that the job runs 1 task on each node and uses all 16 CPUs for threading within each task:
- MPI_Barrier: 78s
- MPI_Allreduce: 46s
- MPI_Bcast: 19s
- MPI_Comm_create: 6s
Does this seem reasonable, or would you strongly recommend tuning the collective algorithms? I attached the STF files in case they can substitute for the stats.txt file I was unable to generate; I'm not really sure how to interpret these Intel Trace Analyzer results. Note that I also generated the "ideal" configurations in order to produce the imbalance diagrams...just "ls *.stf" and you'll see what I mean.
Thanks very much for your help, Artem.
Best,
Andrew
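For context, a minimal sketch of how such an ITAC trace is typically collected and opened, assuming the ITAC environment has been sourced (e.g., via its itacvars script); the executable name ./app is a placeholder, and the trace file is normally named after the binary:
# Collect a trace: -trace makes Intel MPI load the ITAC collector at runtime
mpirun -trace -n 4 -ppn 1 ./app
# Inspect the resulting .stf file in the Trace Analyzer GUI
traceanalyzer ./app.stf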
Hi Andrew,
There are a lot of .stf files in your archive; which one should I look at?
Also, I'd recommend trying the latest Intel MPI Library 5.1.1 and Intel Trace Analyzer and Collector (ITAC) 9.1.1 if possible; that may prevent the crashes you saw when using I_MPI_STATS. There is also a tool called MPI Performance Snapshot (part of ITAC) intended for preliminary analysis; it may help you pinpoint the problem area (if any).
Hi Artem,
I really apologize for my late reply. At this point I'm fine with how everything is working, so don't worry about the STF files. And thank you for your suggestions regarding the performance analysis tools.
Thanks very much for your fast and helpful replies, Artem!
Best,
Andrew
