Intel® MPI Library

release vs. release_mt performance

alexey-medvedev-MSU

On the Lomonosov-2 supercomputer (http://hpc.msu.ru/node/159, partition "Test"), with IMPI2019u9, I've got a significant difference in IMB-MPI1 results between the release and release_mt library kinds. I wonder if this is an expected difference level (roughly 2x difference) or something must be tuned to get better figures on release_mt?

On 2 nodes with 14 cores each, using the verbs provider, I see the following on the PingPong test:

# mpiexec.hydra -np 28 -ppn 14 IMB-MPI1 pingpong -multi 0 -map 14x2 -npmin 28
release:
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.99         1.85         1.19         0.00
            1         1000         1.00         2.01         1.36         0.50
            2         1000         1.01         2.03         1.39         0.98
            4         1000         1.03         2.07         1.44         1.93
            8         1000         1.02         2.09         1.44         3.82
           16         1000         0.99         2.09         1.43         7.65
           32         1000         1.00         2.11         1.44        15.19
           64         1000         1.07         2.17         1.49        29.55
          128         1000         1.09         2.25         1.55        56.99
          256         1000         1.58         2.79         2.11        91.81
          512         1000         1.69         2.96         2.26       172.76
         1024         1000         1.93         3.16         2.48       324.01
         2048         1000         2.40         3.77         3.06       543.28
         4096         1000         3.73         5.31         4.62       771.92
         8192         1000         7.05         9.27         8.17       883.79
        16384         1000        14.28        15.84        15.02      1034.12
        32768         1000        26.20        28.64        28.23      1144.21
        65536          640        52.23        56.97        56.20      1150.41
       131072          320       106.09       113.83       112.49      1151.49
       262144          160       198.19       228.21       222.12      1148.68
       524288           80       414.78       452.38       445.68      1158.94
      1048576           40       852.25       910.42       895.15      1151.75
      2097152           20      1674.94      1800.72      1771.30      1164.62
      4194304           10      3328.46      3580.92      3500.30      1171.29

release_mt:
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         1.17         1.18         1.18         0.00
            1         1000         1.04         1.05         1.05         1.91
            2         1000         1.06         1.06         1.06         3.76
            4         1000         1.05         1.06         1.06         7.55
            8         1000         1.05         1.06         1.06        15.08
           16         1000         1.07         1.08         1.07        29.74
           32         1000         1.09         1.09         1.09        58.51
           64         1000         1.09         1.11         1.10       115.46
          128         1000         1.09         1.10         1.10       232.43
          256         1000         1.16         1.17         1.17       437.35
          512         1000         1.60         1.61         1.60       637.11
         1024         1000         1.96         1.97         1.96      1039.45
         2048         1000         2.33         2.35         2.34      1746.43
         4096         1000         3.26         3.28         3.27      2499.51
         8192         1000         9.00         9.10         9.06      1801.16
        16384         1000        14.64        14.73        14.68      2224.33
        32768         1000        18.79        18.89        18.84      3469.87
        65536          640        33.53        33.82        33.70      3875.75
       131072          320        46.41        47.26        46.91      5546.99
       262144          160       170.80       190.11       181.74      2757.88
       524288           80       298.61       342.64       324.36      3060.27
      1048576           40       517.24       593.49       561.51      3533.58
      2097152           20      1783.70      1903.42      1855.13      2203.56
      4194304           10      2774.07      3524.14      3203.87      2380.33

 

On the SendRecv test:

# mpiexec.hydra -np 28 -ppn 14 IMB-MPI1 sendrecv -multi 0 -map 14x2 -npmin 28
release:
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         1.49         1.49         1.49         0.00
            1         1000         1.58         1.59         1.59         1.26
            2         1000         1.62         1.63         1.62         2.46
            4         1000         1.66         1.67         1.66         4.80
            8         1000         1.67         1.68         1.67         9.55
           16         1000         1.67         1.67         1.67        19.14
           32         1000         1.68         1.68         1.68        38.03
           64         1000         1.72         1.73         1.72        74.09
          128         1000         1.77         1.78         1.77       143.92
          256         1000         2.34         2.34         2.34       218.43
          512         1000         2.51         2.52         2.51       406.40
         1024         1000         2.86         2.87         2.86       714.63
         2048         1000         3.65         3.66         3.66      1118.13
         4096         1000         8.22         8.25         8.24       992.80
         8192         1000        19.13        19.24        19.20       851.43
        16384         1000        33.33        33.51        33.45       977.78
        32768         1000        57.93        58.25        58.16      1125.04
        65536          640       115.23       115.76       115.58      1132.32
       131072          320       228.99       229.29       229.16      1143.31
       262144          160       456.14       462.42       459.60      1133.80
       524288           80       888.48       916.33       907.38      1144.32
      1048576           40      1791.33      1836.76      1819.39      1141.77
      2097152           20      3621.92      3685.21      3663.56      1138.14
      4194304           10      7400.42      7462.25      7447.10      1124.14

release_mt:
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.97         0.98         0.97         0.00
            1         1000         1.01         1.02         1.02         1.96
            2         1000         1.06         1.06         1.06         3.76
            4         1000         1.05         1.06         1.06         7.54
            8         1000         1.05         1.06         1.06        15.10
           16         1000         1.05         1.06         1.05        30.25
           32         1000         1.09         1.10         1.10        58.13
           64         1000         1.09         1.10         1.10       116.52
          128         1000         1.11         1.12         1.12       228.76
          256         1000         1.13         1.13         1.13       451.59
          512         1000         1.61         1.62         1.61       633.24
         1024         1000         1.93         1.94         1.94      1054.86
         2048         1000         2.33         2.35         2.34      1744.99
         4096         1000         3.24         3.26         3.25      2514.90
         8192         1000         8.87         8.99         8.93      1823.41
        16384         1000        14.05        14.13        14.08      2319.07
        32768         1000        18.73        18.83        18.77      3480.01
        65536          640        33.52        33.98        33.74      3857.27
       131072          320        76.41        77.55        76.98      3380.47
       262144          160       167.88       185.72       177.44      2823.00
       524288           80       279.85       333.56       307.80      3143.58
      1048576           40       512.18       604.98       555.47      3466.50
      2097152           20      1660.85      1872.98      1807.15      2239.38
      4194304           10      2820.28      3385.53      3197.91      2477.78


On the Allreduce test:

# mpiexec.hydra -np 28 -ppn 14 IMB-MPI1 allreduce -npmin 28
release:
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.04         0.09         0.04
            4         1000         2.54         3.10         2.85
            8         1000         2.77         3.06         2.94
           16         1000         2.29         3.03         2.69
           32         1000         2.38         3.04         2.74
           64         1000         2.94         3.83         3.12
          128         1000         3.14         4.16         3.49
          256         1000         3.73         5.41         4.45
          512         1000         3.45         5.44         4.34
         1024         1000         4.91         6.64         5.63
         2048         1000         6.89         9.04         7.82
         4096         1000         9.65        13.11        11.24
         8192         1000        16.25        20.12        18.07
        16384         1000        30.61        44.25        34.31
        32768         1000        41.53        65.61        49.35
        65536          640        68.45       101.01        82.05
       131072          320       125.73       171.21       146.80
       262144          160       255.96       322.65       289.61
       524288           80       579.32       754.60       654.68
      1048576           40      1296.14      1642.46      1473.95
      2097152           20      3037.64      3842.48      3528.60
      4194304           10      7157.87      8204.61      7885.93
release_mt:
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.04         0.09         0.05
            4         1000         2.42         3.39         2.91
            8         1000         2.50         4.17         3.49
           16         1000         2.81         4.45         3.84
           32         1000         2.87         4.43         3.88
           64         1000         2.90         5.01         4.12
          128         1000         4.24         5.65         4.94
          256         1000         4.93         7.18         6.03
          512         1000         4.10         6.96         5.67
         1024         1000         5.40         8.31         7.00
         2048         1000         8.27        11.40         9.98
         4096         1000        12.66        17.31        15.15
         8192         1000        37.85        52.78        44.80
        16384         1000        64.47        85.19        77.01
        32768         1000       104.25       133.76       126.37
        65536          640       203.90       249.45       238.23
       131072          320       398.31       485.98       466.45
       262144          160       798.27      1023.07       959.00
       524288           80      1656.68      2016.64      1929.01
      1048576           40      3252.24      4003.58      3813.06
      2097152           20      6578.58      8080.62      7716.11
      4194304           10     14086.20     17127.67     16398.42
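To put a number on the "roughly 2x" figure, here is a minimal Python sketch that divides the release_mt t_avg by the release t_avg for a few Allreduce message sizes. The values are copied from the two Allreduce tables above; the helper name `slowdown` and the choice of sample sizes are only illustrative assumptions, not part of IMB or Intel MPI.

```python
# t_avg [usec] values copied from the two Allreduce tables above
# (a hand-picked subset of message sizes, chosen only for illustration).
release = {4: 2.85, 8192: 18.07, 1048576: 1473.95, 4194304: 7885.93}
release_mt = {4: 2.91, 8192: 44.80, 1048576: 3813.06, 4194304: 16398.42}


def slowdown(base, other):
    """Return {message size: other / base} for sizes present in both tables."""
    return {n: other[n] / base[n] for n in sorted(base) if n in other}


# Ratio > 1 means release_mt took longer than release for that size.
ratios = slowdown(release, release_mt)
for nbytes, ratio in ratios.items():
    print(f"{nbytes:>8} bytes: release_mt / release = {ratio:.2f}")
```

A ratio close to 1 means the two library kinds behave alike for that message size; larger values show where release_mt falls behind in this particular run.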

 

alexey-medvedev-MSU

(Sorry, I mixed up the release/release_mt headers in the SendRecv and PingPong datasets.)

--
Regards,
Alexey

alexey-medvedev-MSU

(One more mistake: for these PingPong and SendRecv runs, the correct IMB-MPI1 map argument is "-map 2x14", so this is intra-node communication, not cross-node.)

PrasanthD_intel
Moderator

Hi Alexey,

We have also observed similar timing differences between the two versions.

We will investigate further and get back to you.

 

Regards

Prasanth

PrasanthD_intel
Moderator

Hi Alexey,

 

After contacting the internal team, I got the following response:

"We shouldn't compare performances of release and release_mt. Release_mt is only for the advanced users who want to test and take advantage of the latest features and is currently not intended for public usage (general public)."

Also, the benchmarks for which you are testing don't have any release_mt features.

We believe this answers your question. Let us know if you have any other queries; otherwise, we can close this thread.

 

Regards

Prasanth

 

alexey-medvedev-MSU

Hi Prasanth,

>> We believe this answers your question

Well, OK, the original question was: "I wonder if this is an expected difference level (roughly 2x difference) or something must be tuned to get better figures on release_mt?"

Shall I take this as the answer: "this IS an expected difference level"?

A few more questions arise:

Could you point out where in the release documentation it is stated that "Release_mt is only for the advanced users" and "is currently not intended for public usage"? I searched the text documents in the installation package, as well as the "developer guide", "release notes" and "known issues" on the Web, but failed to find anything like this.

Then, in my humble opinion, I AM an advanced user, moreover we've paid for 1 year support. Am I allowed to use release_mt and have some appropriate support, or my case is still a case of "public usage", which is not intended?

Since I_MPI_ASYNC_PROGRESS=1 is allowed only with the release_mt library kind, does it mean that the asynchronous progress feature is also "currently not intended for public usage"? If so, when did this feature switch from a normal feature (as it used to be in Intel MPI 2017 and 2018) to this state? Was this declared in the release notes?

Given the lack of documentation on this topic, could you please tell me whether I_MPI_ASYNC_PROGRESS and release_mt are still "not intended for public usage" in IMPI 2021beta?

>> Also, the benchmarks for which you are testing don't have any release_mt features.

I'd like to comment on this: any typical large-scale HPC application uses both blocking and non-blocking communications of various kinds. So mixed micro-benchmarking of blocking and non-blocking MPI interfaces, with and without communication-computation overlap, is always appropriate.

--
Regards,
Alexey

PrasanthD_intel
Moderator

Hi Alexey,


Sorry if there has been a miscommunication. When I mentioned "not for public usage", what I meant is that release_mt is for advanced users and not for general users.


==>Am I allowed to use release_mt and have some appropriate support, or my case is still a case of "public usage", which is not intended?

Yes, there is support for release_mt.


Since, as you said, "I AM an advanced user, moreover we've paid for 1-year support", you can raise a ticket at https://software.intel.com/content/www/us/en/develop/support/priority-support.html and receive immediate support.


Also, I am escalating this thread to the internal team for better support.


Regards

Prasanth


alexey-medvedev-MSU

Hi Prasanth,

OK, I think it is clear that in IMPI 2019 the "release_mt" kind may be slower than "release" in some cases, and that is OK. What is not quite clear is the status of the I_MPI_ASYNC_PROGRESS feature, which is strongly tied to "release_mt": is it fully supported, a preview, experimental, recommended for limited usage scenarios, intended for a limited subset of product users, or does it have some other status? Is this status changing in IMPI 2021? I have a feeling that this topic should be clarified; it does not seem to be accurately and explicitly defined in the documentation.

I will also submit the question on this topic via https://software.intel.com/content/www/us/en/develop/support/priority-support.html a bit later.

Thanks for the help!

--
Regards,
Alexey 

 
