Are you referring to the new fault-tolerant behavior in Intel MPI Library 4.0? The FT support is limited and does not cover collectives. Feel free to check the Reference Manual for details on the functionality provided.
The Intel MPI Library follows the MPI standard. Completion of MPI_Bcast on the root only means that the root has finished sending (its buffer may be reused). Completion on a non-root rank guarantees that it has received all the data (without losses). No additional confirmation is sent back to the root. Regards!
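To make the completion semantics concrete, here is a toy model in plain Python threads and queues. It is not MPI and not how the Intel MPI Library is implemented. The root "completes" once it has handed the payload to every channel, and it learns nothing about delivery unless an explicit acknowledgement round is added. In real MPI code that extra round could be something like an MPI_Barrier or MPI_Reduce after MPI_Bcast. The function name `simulated_bcast` and its `want_ack` flag are hypothetical.

```python
import queue
import threading


def simulated_bcast(data, num_receivers, want_ack=False):
    """Toy model of broadcast completion semantics (hypothetical, not MPI).

    Returns (confirmed, received): how many acknowledgements the root saw,
    and the payload each receiver ended up with.
    """
    channels = [queue.Queue() for _ in range(num_receivers)]
    acks = queue.Queue()
    received = [None] * num_receivers

    def receiver(rank):
        # Non-root "completion": returns only once the full payload arrived.
        received[rank] = channels[rank].get()
        if want_ack:
            # Extra, explicit confirmation message; a broadcast alone sends none.
            acks.put(rank)

    threads = [threading.Thread(target=receiver, args=(r,))
               for r in range(num_receivers)]
    for t in threads:
        t.start()

    # Root "completion": every send has been handed off (unbounded queues
    # accept immediately), so the root is done without hearing back.
    for ch in channels:
        ch.put(bytes(data))

    confirmed = 0
    if want_ack:
        # Only with the additional round does the root learn about delivery.
        for _ in range(num_receivers):
            acks.get()
            confirmed += 1

    for t in threads:
        t.join()
    return confirmed, received
```

With `want_ack=False` the root finishes with zero confirmations even though every receiver got the data; the acknowledgement has to be a separate, deliberate step.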
I've noticed that the time to broadcast 200 MB of data increases significantly as the number of nodes grows.
For example, broadcasting to 1 node takes < 2 s to complete.
Broadcasting to 2 nodes takes < 6 s.
Broadcasting to 3 nodes takes < 10 s.
Forcing a multicast would allow me to finish transmitting to all nodes in the shortest time.
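A quick bit of arithmetic on the timings reported above, treating the "<" upper bounds as if they were the measured values (an assumption). A least-squares line through the three points gives roughly 4 s per additional node, i.e. the growth looks linear rather than logarithmic in the node count. Three upper-bound points are far too few to say more than that.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


nodes = [1, 2, 3]
seconds = [2.0, 6.0, 10.0]  # upper bounds from the post: "< 2s", "< 6s", "< 10s"
payload_mb = 200

intercept, per_node = linear_fit(nodes, seconds)
# Effective rate at which each receiver obtains its 200 MB copy.
per_receiver_rate = [payload_mb / t for t in seconds]

print(f"~{per_node:.1f} s per extra node, intercept {intercept:.1f} s")
print("per-receiver MB/s:", [round(r, 1) for r in per_receiver_rate])
```

The fit is exact here (slope 4 s/node, intercept -2 s), and the per-receiver rate drops from 100 MB/s to about 33 MB/s and 20 MB/s as nodes are added.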
There are 2 options you could try:
1. Experiment with different broadcast algorithms using the I_MPI_ADJUST_BCAST environment variable (see the Reference Manual).
2. The OFA fabric in the Intel MPI Library supports a multi-rail feature. Set I_MPI_FABRICS=shm:ofa, I_MPI_OFA_NUM_ADAPTERS=
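Why the choice of algorithm matters for a 200 MB message can be sketched with the textbook latency/bandwidth (alpha/beta) cost model from the MPICH collectives literature. This is a generic model, not the Intel MPI Library's actual algorithms or costs. `alpha` is a per-message latency, `beta` a per-byte transfer time, `p` the number of processes (assumed a power of two), and `n` the message size in bytes. All parameter values below are hypothetical.

```python
import math


def binomial_bcast_time(p, n, alpha, beta):
    """Binomial tree: ceil(log2 p) rounds, each forwarding the whole message."""
    assert p >= 1
    return math.ceil(math.log2(p)) * (alpha + n * beta)


def scatter_allgather_bcast_time(p, n, alpha, beta):
    """Binomial scatter of n/p pieces, then a ring allgather.

    Model cost: (log2 p + p - 1) * alpha + 2 * (p - 1) / p * n * beta.
    """
    assert p >= 1
    return (math.log2(p) + p - 1) * alpha + 2 * (p - 1) / p * n * beta


# Hypothetical parameters: 10 us latency, ~100 MB/s per link.
alpha, beta = 1e-5, 1e-8
for size in (8, 200e6):
    tb = binomial_bcast_time(8, size, alpha, beta)
    ts = scatter_allgather_bcast_time(8, size, alpha, beta)
    print(f"n={size:.0f} B: binomial {tb:.6f} s, scatter+allgather {ts:.6f} s")
```

With these made-up numbers and p = 8, the model gives about 6.0 s for a binomial broadcast of 200 MB versus about 3.5 s for scatter+allgather, while for an 8-byte message the binomial tree wins. Real timings depend on the fabric and the library's implementation, which is why trying the algorithms empirically is the practical route.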
Are you working on Windows?
The OFA module is not supported on the Windows platform, sorry, and do not expect it in the near future. This means you cannot use the multi-rail feature either.
-genv I_MPI_ADJUST_BCAST '1:4-16;2:17-128;3:129-4096;7:4097-4000000'
This means that algorithm 1 (Binomial) will be used for messages from 4 to 16 bytes long, algorithm 2 (Recursive doubling) for messages from 17 to 128 bytes long, algorithm 3 (Ring) for messages from 129 to 4096 bytes long, and algorithm 7 (Shumilin's) for large messages (4097 to 4,000,000 bytes).
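To check which algorithm a given message size would map to under a setting like the one above, here is a small parser. It is a hypothetical helper, not part of the Intel MPI Library. It only handles the simple `alg:lo-hi;alg:lo-hi` form used in this example (the Reference Manual describes the full syntax). It returns None when a size falls outside every range, in which case presumably the library's default selection applies.

```python
def parse_adjust_bcast(spec):
    """Parse 'alg:lo-hi;alg:lo-hi;...' into (algorithm, lo, hi) tuples.

    Only the plain range form is handled; other syntax raises ValueError.
    """
    rules = []
    for item in spec.strip().strip("'\"").split(";"):
        item = item.strip()
        if not item:
            continue
        alg, sep, rng = item.partition(":")
        lo, dash, hi = rng.partition("-")
        if not sep or not dash:
            raise ValueError(f"unsupported entry: {item!r}")
        rules.append((int(alg), int(lo), int(hi)))
    return rules


def pick_algorithm(rules, msg_bytes):
    """Return the algorithm id whose inclusive byte range covers msg_bytes."""
    for alg, lo, hi in rules:
        if lo <= msg_bytes <= hi:
            return alg
    return None  # not covered by any rule


rules = parse_adjust_bcast("1:4-16;2:17-128;3:129-4096;7:4097-4000000")
for size in (10, 100, 4096, 1_000_000, 200 * 1024 * 1024):
    print(size, "->", pick_algorithm(rules, size))
```

Under this reading, a 200 MB broadcast lies beyond the last range (which stops at 4,000,000 bytes), so the upper bound would need to be raised for algorithm 7 to apply to the message sizes discussed in this thread.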
BTW: the Intel MPI Library doesn't support multicast communication.