Intel® HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

How to set the size of CQ?

oleotiger
Novice

The command ibv_devinfo -v reports the following device information:

        max_cq:                         20480
        max_cqe:                        65535

 

The maximum number of CQEs is 65535.
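
The two limits quoted above can also be pulled out of the ibv_devinfo -v output programmatically. The following is a minimal sketch, assuming the output keeps the "name: value" layout shown in this post. parse_device_limits is a hypothetical helper, not part of any library, and on a host with several devices it keeps only the last value it sees.

```python
import re

def parse_device_limits(devinfo_text):
    """Extract max_cq and max_cqe from `ibv_devinfo -v` style text."""
    limits = {}
    for line in devinfo_text.splitlines():
        # Match lines such as "        max_cqe:        65535".
        match = re.match(r"\s*(max_cq|max_cqe):\s+(\d+)\s*$", line)
        if match:
            limits[match.group(1)] = int(match.group(2))
    return limits

# Sample text copied from the post; in practice it would come from the command.
sample = """\
        max_cq:                         20480
        max_cqe:                        65535
"""
limits = parse_device_limits(sample)
print(limits)  # prints {'max_cq': 20480, 'max_cqe': 65535}
```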

 

With OpenMPI, I can set UCX_RC_TX_CQ_LEN (default 4096) to 65535. This guarantees that the number of CQEs will not exceed the limit configured on the network card.
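
As an illustration of the idea, here is a small sketch that caps a requested CQ length at the device limit before exporting it. It assumes only what the post states: that UCX reads UCX_RC_TX_CQ_LEN from the environment. ucx_cq_env is a hypothetical helper name, and the numbers are examples.

```python
import os

def ucx_cq_env(requested_len, device_max_cqe, base_env=None):
    """Return a copy of the environment with UCX_RC_TX_CQ_LEN capped at device_max_cqe."""
    if requested_len < 1:
        raise ValueError("CQ length must be positive")
    # Copy so the caller's environment is not modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["UCX_RC_TX_CQ_LEN"] = str(min(requested_len, device_max_cqe))
    return env

# A request above the device limit is reduced to the limit.
env = ucx_cq_env(requested_len=100000, device_max_cqe=65535, base_env={})
print(env["UCX_RC_TX_CQ_LEN"])  # prints 65535
```

The returned dictionary could then be passed as the env argument of subprocess.run when launching the MPI job. The launch command is left out here because it depends on the installation.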

 

Is there an environment variable in libfabric or MPI that can limit the number of CQEs?

 

 

 

 

VarshaS_Intel
Moderator

Hi,

  Thanks for reaching out to us.

  We are working on your issue and we will get back to you soon.


Thanks & Regards

Varsha


VarshaS_Intel
Moderator

Hi,

Could you please tell us which libfabric provider (Mellanox / PSM2 / Verbs) you are currently using?

 

Thanks & Regards

Varsha 

 

oleotiger
Novice

I have two servers with different libfabric providers.

One is equipped with a Mellanox network card (RoCE protocol).

One is equipped with a Huawei Hi1822 network card (RoCE supported).

The maximum number of CQEs on the Hi1822 is 65535, so I'm looking for a way to set the maximum number of CQEs at the transport layer.

OpenUCX can control the number of CQEs through environment variables.

Is there any way in libfabric or Intel MPI to limit the maximum number of CQEs created?

VarshaS_Intel
Moderator

Hi,

Thank you for providing the information.


>>>With openmpi, I can set UCX_RC_TX_CQ_LEN(default 4096) to 65535.

Could you please let us know how you are checking that OpenMPI is obeying the setting of UCX_RC_TX_CQ_LEN?


Thanks & Regards

Varsha


oleotiger
Novice

As the network card information above shows, the max_cqe of my network card is 65535.

When I set UCX_RC_TX_CQ_LEN to 65535 or any smaller value, I can run the OSU benchmark.

When I set UCX_RC_TX_CQ_LEN to 65536 or any larger value, I cannot run OSU; it raises exceptions.

So I conclude that OpenMPI is obeying the setting of UCX_RC_TX_CQ_LEN.
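
The boundary found by hand in this experiment can also be located with a binary search. This is only a sketch under stated assumptions: probe is a hypothetical callable that would run the benchmark with a given UCX_RC_TX_CQ_LEN and report success, and acceptance is assumed to be monotonic (every value up to some threshold works, every larger value fails). The fake_probe below merely imitates the behavior reported above.

```python
def largest_accepted(probe, low, high):
    """Binary-search the largest value in [low, high] for which probe(value) is True.

    Assumes probe is monotonic: True up to some threshold, False above it.
    Returns None if even `low` is rejected.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low

def fake_probe(cq_len):
    # Stand-in for "run the benchmark with UCX_RC_TX_CQ_LEN=cq_len and check success".
    return cq_len <= 65535

result = largest_accepted(fake_probe, 1, 1 << 20)
print(result)  # prints 65535
```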

VarshaS_Intel
Moderator

Hi,

 

Thanks for providing all the requested information.

 

We want to add some points regarding UCX:

• Intel MPI uses UCX as the backend for InfiniBand. The UCX variables belong to the UCX framework and are not specific to Intel MPI or OpenMPI.

• UCX is a collaboration between industry, laboratories, and academia to create an open-source, production-grade communication framework for data-centric and HPC applications.


 

>>> Is there any way of libfabric or intelmpi that can limit the max number of cqe created?

As long as UCX honors UCX_RC_TX_CQ_LEN, the MPI application is expected to see the effect of setting it.

 

In line with your experiment, we followed all the steps you provided using Intel MPI and observed behavior similar to what you saw with OpenMPI.

 

When we ran the command "ibv_devinfo -v", we got max_cqe = 4194303.

[Image: ucxexception.png]

Also, we want to mention that we cannot say whether a particular MPI library is obeying or disobeying the UCX_RC_TX_CQ_LEN setting.

 

Thanks & Regards

Varsha

 

oleotiger
Novice

If I'm using OFED, Intel MPI uses the stack libfabric --> OFED --> IB verbs/API. In this path there is no UCX. When an application runs this way, is there any way to control the number of CQEs (as UCX_RC_TX_CQ_LEN does)?

VarshaS_Intel
Moderator

Hi,

 

>>>I want to know that if application is running in this way, is there any way that we can control the size of CQE ( as what UCX_RC_TX_CQ_LEN does)?

The Intel MPI Library uses the UCX library only when the mlx provider is used. On the verbs side, no such control (i.e., an equivalent of UCX_RC_TX_CQ_LEN) exists.
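
The rule stated above can be written down as a tiny check. This sketch encodes only what this reply says (UCX is involved when the mlx provider is used, and not otherwise). ucx_settings_apply is a hypothetical helper, and how the provider is actually selected (for example through libfabric's FI_PROVIDER variable) is outside its scope.

```python
def ucx_settings_apply(provider):
    """Return True if UCX_* variables are expected to matter for this provider name."""
    # Per the reply above, only the mlx provider goes through UCX.
    return provider.strip().lower() == "mlx"

for name in ("mlx", "verbs", "psm2"):
    print(name, ucx_settings_apply(name))
```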

 

Thanks & Regards

Varsha

 

VarshaS_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update?


Thanks & Regards

Varsha


VarshaS_Intel
Moderator

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards

Varsha


