
Blanco__Fernando · ‎06-08-2020

Dear experts,

I compiled the IMB benchmarks with intel19 and openmpi-4.0.3. All tests run OK except IMB_EXT.

If I run IMB_EXT on one node, everything is OK. If I run it on 2 nodes, several cases of the Accumulate benchmark complete, but others do not. For example,

Accumulate with 64 processes in Aggregate mode only prints this:

#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
0 1000 0.00 0.01 0.01 0.00
4 1000 1.60 1.60 1.60 0.00
8 1000 1.40 1.41 1.41 0.00
16 1000 1.42 1.44 1.44 0.00
32 1000 1.45 1.46 1.46 0.00
64 1000 1.53 1.54 1.53 0.00
128 1000 1.52 1.52 1.52 0.00
256 1000 1.55 1.56 1.56 0.00
512 1000 1.51 1.52 1.52 0.00
1024 1000 1.66 1.68 1.68 0.00
2048 1000 1.59 1.59 1.59 0.00
4096 1000 2.48 2.49 2.49 0.00
8192 1000 2.50 2.51 2.50 0.00
16384 1000 3.68 3.69 3.69 0.00

The program doesn't finish, but it doesn't do anything either; it just hangs.

If I compile the tests with CPPFLAGS=-DCHECK, I get errors in other Accumulate tests:

#-----------------------------------------------------------------------------
# Benchmarking Accumulate
# #processes = 32
# ( 32 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#
# MODE: NON-AGGREGATE
#
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
0 100 7.66 7.69 7.68 0.00
0: Error Accumulate,size = 4,sample #0
Process 0: Got invalid buffer:
Buffer entry: 105.599998
pos: 0
Process 0: Expected buffer:
Buffer entry: 52.799999
4 100 117.85 117.95 117.90 1.00
0: Error Accumulate,size = 8,sample #0
Process 0: Got invalid buffer:
Buffer entry: 105.599998
pos: 0
Process 0: Expected buffer:
Buffer entry: 52.799999
.....

......

0: Error Accumulate,size = 4194304,sample #0
Process 0: Got invalid buffer:
Buffer entry: 105.499992
pos: 0
Process 0: Expected buffer:
Buffer entry: 52.799999
4194304 10 156021.70 194528.12 193324.60 0.00

#-----------------------------------------------------------------------------
# Benchmarking Accumulate
# #processes = 64
#-----------------------------------------------------------------------------
#
# MODE: AGGREGATE
#
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
0 1000 0.00 0.01 0.01 0.00
4 1000 1.60 1.60 1.60 0.00
8 1000 1.40 1.41 1.41 0.00
16 1000 1.42 1.44 1.44 0.00
32 1000 1.45 1.46 1.46 0.00
64 1000 1.53 1.54 1.53 0.00
128 1000 1.52 1.52 1.52 0.00
256 1000 1.55 1.56 1.56 0.00
512 1000 1.51 1.52 1.52 0.00
1024 1000 1.66 1.68 1.68 0.00
2048 1000 1.59 1.59 1.59 0.00
4096 1000 2.48 2.49 2.49 0.00
8192 1000 2.50 2.51 2.50 0.00
16384 1000 3.68 3.69 3.69 0.00

In this test there are no errors, but it stops.

Any ideas about this problem?

PrasanthD_intel · ‎06-11-2020

Hi Fernando,

We have tried to replicate the scenario using Intel MPI at our end but haven't faced any errors.

Could you tell us whether you are using the benchmarks that came with Intel MPI (IMPI) or cloning them from GitHub?

If possible could you update to the latest version of MPI and see if the error persists?

Regards

Prasanth

Blanco__Fernando · ‎06-11-2020

Hi Prasanth

I downloaded the benchmarks from GitHub and I use openmpi-4.0.3. All tests work fine; only the EXT one fails.

PrasanthD_intel · ‎06-17-2020

Hi Fernando,

From the logs, the run stalled at the 32 KB message size (the last completed row is 16384 bytes). This is probably due to the large number of iterations (currently 1000).

1000 iterations might be too many for some interconnects.
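As a rough illustration of how the stall point can be read off the log, here is a minimal Python sketch. It takes a few rows copied from the output posted above, finds the last completed row, and derives the next message size. The helper names (`last_completed_size`, `next_size`) are made up for this example, and it assumes the default IMB size progression, where message sizes double from one row to the next.

```python
import re

# A few rows copied from the Accumulate output posted above
# (64 processes, Aggregate mode); the middle rows are left out for brevity.
LOG = """\
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
0 1000 0.00 0.01 0.01 0.00
4 1000 1.60 1.60 1.60 0.00
4096 1000 2.48 2.49 2.49 0.00
8192 1000 2.50 2.51 2.50 0.00
16384 1000 3.68 3.69 3.69 0.00
"""

# A result row: #bytes, #repetitions, then 3 or 4 numeric columns.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)(?:\s+[\d.]+){3,4}\s*$")


def last_completed_size(log_text):
    """Return (bytes, repetitions) of the last result row, or None."""
    last = None
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            last = (int(match.group(1)), int(match.group(2)))
    return last


def next_size(size):
    """Next message size, assuming the default doubling progression."""
    return 4 if size == 0 else size * 2


size, reps = last_completed_size(LOG)
stalled_at = next_size(size)
print(f"last completed: {size} bytes x {reps} repetitions")
print(f"likely stalled at: {stalled_at} bytes ({stalled_at // 1024} KB)")
```

With the rows above this reports 16384 bytes as the last completed size and 32768 bytes (32 KB) as the size where the run likely hung.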

Could you provide which interconnect you are using?

Can you check whether reducing the number of iterations helps, using the following flags:

-iter 20 -iter_policy off

The above flags will enforce 20 iterations for all message sizes and should let IMB complete significantly quicker.
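For reference, a full command line with these flags might be assembled as in the sketch below. Only `-iter 20 -iter_policy off` comes from this thread; the `mpirun` launcher, the `-np 64` process count, the `./IMB-EXT` binary path, and the `imb_ext_command` helper are assumptions for illustration and will need adjusting to your installation (for example, adding your hostfile options for the 2-node run).

```python
import shlex


def imb_ext_command(nprocs, benchmark="Accumulate", iterations=20,
                    launcher="mpirun", binary="./IMB-EXT"):
    """Build the argument list for an IMB-EXT run with a fixed iteration count.

    launcher and binary are placeholders; adapt them to your own setup.
    """
    return [
        launcher, "-np", str(nprocs),
        binary, benchmark,
        "-iter", str(iterations),   # fixed number of iterations
        "-iter_policy", "off",      # do not adjust iterations per message size
    ]


cmd = imb_ext_command(64)
print(shlex.join(cmd))
# To actually launch the run, pass the list to subprocess.run(cmd, check=True),
# ideally with a timeout so that a hang is detected instead of waiting forever.
```

Building the command as a list rather than a single string avoids shell quoting problems and makes it easy to change the benchmark name or the iteration count between runs.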

Regards

Prasanth

Blanco__Fernando · ‎06-18-2020

With -iter 20 -iter_policy off it works!!!

Thank you very much!

PrasanthD_intel · ‎06-18-2020

Hi Fernando,

Good to know that the benchmark is running successfully.

Can we close this thread now?

You can always raise a new thread if you have any queries.

regards,

--Prasanth

PrasanthD_intel · ‎07-06-2020

Hi Fernando,

We are closing this thread considering your issue has been completely resolved.

Please raise a new thread for further queries.

Thanks

Prasanth

IMB_EXT does not work