
Intel oneAPI MPI runtime 2021.6 -- wrong CPU topology reported using hwloc

AlefRome
Novice

Hi All,

 

This is to report that the Intel MPI Library runtime (versions 2021.5 and 2021.6) seems to have a buggy implementation when relying on the hwloc topology library.

 

When setting the environment variable I_MPI_HYDRA_TOPOLIB=hwloc (which is the default), the "cpuinfo" utility is not able to detect core IDs/placement and core/cache sharing.

 

This happens in the following conditions:

 

  • Operating Systems: CentOS 7, RHEL 7, RHEL 8, Oracle Linux 7, Oracle Linux 8
  • Microarch families: Skylake, Cascade Lake, Ice Lake

 

The problem causes incorrect CPU pinning for the MPI processes, which are all assigned to CPU #0 of each node, causing starvation and performance penalties.
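As a rough illustration of the symptom described above, the following Python sketch shows one way to detect it. It assumes each rank can report its own affinity mask (on Linux, `os.sched_getaffinity` provides this); the helper names and the sample per-rank data are hypothetical and are not part of the Intel MPI tooling.

```python
import os


def current_affinity():
    """Return the sorted CPUs the calling process may run on (Linux only)."""
    return sorted(os.sched_getaffinity(0))


def all_pinned_to_same_cpu(affinities):
    """True when every rank is restricted to one and the same single CPU.

    `affinities` maps a rank number to the set of CPUs it is allowed on.
    """
    masks = [frozenset(cpus) for cpus in affinities.values()]
    return (
        len(masks) > 1
        and all(len(mask) == 1 for mask in masks)
        and len(set(masks)) == 1
    )


# Hypothetical per-rank affinities collected on one node (illustration only).
broken = {0: {0}, 1: {0}, 2: {0}, 3: {0}}   # every rank stuck on CPU 0
healthy = {0: {0}, 1: {1}, 2: {2}, 3: {3}}  # one CPU per rank
```

Here `all_pinned_to_same_cpu(broken)` is true and `all_pinned_to_same_cpu(healthy)` is false; in a real job each rank would call `current_affinity()` and the results would be gathered per node.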

 

When setting I_MPI_HYDRA_TOPOLIB=ipl, the "cpuinfo" utility shows the expected results, but mpiexec.hydra crashes when run with this setting (as reported in this thread: https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/Intel-oneAPI-MPI-runtime-2021-6-mpiexec-hydr...).

 

In the two archives that I have attached, you can find the following files:

 

  • intel-cpuinfo-utility-ipl.txt: the output from the "cpuinfo" utility when I_MPI_HYDRA_TOPOLIB is set to "ipl"
  • intel-cpuinfo-utility-hwloc.txt: the output from the "cpuinfo" utility when I_MPI_HYDRA_TOPOLIB is set to "hwloc"
  • hwloc-ls.txt: the output of the command "hwloc-ls" to show the topology
  • lstopo.xml: the output from hwloc, saved in XML format (this may allow replication of the problem in your labs)
  • lscpu.txt: the output from the Linux lscpu utility
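For anyone who wants to reproduce the two "cpuinfo" captures listed above and compare them, here is a minimal Python sketch. It assumes the "cpuinfo" utility is on the PATH and reads I_MPI_HYDRA_TOPOLIB from its environment, as described in this post; the function names are hypothetical.

```python
import difflib
import os
import shutil
import subprocess


def topolib_env(topolib, base_env=None):
    """Copy the environment and select the topology library ("hwloc" or "ipl")."""
    env = dict(os.environ if base_env is None else base_env)
    env["I_MPI_HYDRA_TOPOLIB"] = topolib
    return env


def capture_cpuinfo(topolib):
    """Run cpuinfo under the given topology library; None if it is not installed."""
    exe = shutil.which("cpuinfo")
    if exe is None:
        return None
    result = subprocess.run(
        [exe], env=topolib_env(topolib),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def diff_outputs(hwloc_out, ipl_out):
    """Return unified-diff lines between the two captured outputs."""
    return list(difflib.unified_diff(
        hwloc_out.splitlines(), ipl_out.splitlines(),
        fromfile="hwloc", tofile="ipl", lineterm="",
    ))
```

Calling `diff_outputs(capture_cpuinfo("hwloc"), capture_cpuinfo("ipl"))` on an affected machine should show where the core ID and cache sharing sections differ, provided both captures succeeded.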

 

Thank you very much.

Best Regards.

Pietro.

 

7 Replies
SantoshY_Intel
Moderator

Hi,

 

Thank you for posting in Intel Communities.

 

We tried this on a Linux machine, but we could not reproduce your issue at our end.

 

Below are the specifications of the machine:

Operating System: CentOS 7

Model name: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

Processor: Ice Lake

 

The "cpuinfo" utility reports the topology correctly on this machine, as shown in the screenshot below:

SantoshY_Intel_0-1653308596512.png

 

 

>>"The problem causes incorrect CPU pinning for the MPI processes, which are all assigned to CPU #0 of each node, causing starvation and performance penalties."
CPU pinning for MPI processes is working as expected at our end without any issues.

 

Thanks & Regards,

Santosh

AlefRome
Novice

Hi Santosh, thanks for having quickly verified this on your side.

 

We have tried the hwloc topology library on another cloud provider and it works as expected.

Therefore, we tried to understand the differences in the two CPU topologies.

 

In the working situation, we have the following topology reported by the hwloc "lstopo-no-graphics" command: 

 

Machine (31GB)
  Package L#0 + L3 L#0 (54MB)
    L2 L#0 (1280KB) + L1d L#0 (48KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (1280KB) + L1d L#1 (48KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#9)
    L2 L#2 (1280KB) + L1d L#2 (48KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#10)
    L2 L#3 (1280KB) + L1d L#3 (48KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#11)
    L2 L#4 (1280KB) + L1d L#4 (48KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#12)
    L2 L#5 (1280KB) + L1d L#5 (48KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#13)
    L2 L#6 (1280KB) + L1d L#6 (48KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#14)
    L2 L#7 (1280KB) + L1d L#7 (48KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)

 

In the non-working scenario (on another cloud provider), we have the following hwloc "lstopo-no-graphics" output:

 

Machine (55GB)
  Package L#0 + L3 L#0 (16MB)
    L2 L#0 (4096KB) + Core L#0
      L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0)
      L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1)
    L2 L#1 (4096KB) + Core L#1
      L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2)
      L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3)
    L2 L#2 (4096KB) + Core L#2
      L1d L#4 (32KB) + L1i L#4 (32KB) + PU L#4 (P#4)
      L1d L#5 (32KB) + L1i L#5 (32KB) + PU L#5 (P#5)
    L2 L#3 (4096KB) + Core L#3
      L1d L#6 (32KB) + L1i L#6 (32KB) + PU L#6 (P#6)
      L1d L#7 (32KB) + L1i L#7 (32KB) + PU L#7 (P#7)

 

Basically, in the second case we have the "core" on the same line as the "L2" cache, and the PU on the same line as the L1d and L1i caches.
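To make the difference between the two layouts concrete, here is a small Python sketch that extracts the core-to-PU mapping from "lstopo-no-graphics" text output. It is only an illustration of how a parser can handle both layouts (PU on its own line, or PU sharing a line with its L1 caches); it says nothing about how the MPI runtime actually reads the topology. The function name is hypothetical and the sample strings are excerpts of the two outputs above.

```python
import re
from collections import defaultdict

CORE_RE = re.compile(r"Core L#(\d+)")
PU_RE = re.compile(r"PU L#(\d+) \(P#(\d+)\)")


def core_to_pus(lstopo_text):
    """Map each core's logical index to the OS indexes (P#) of its PUs."""
    mapping = defaultdict(list)
    current_core = None
    for line in lstopo_text.splitlines():
        core = CORE_RE.search(line)
        if core:
            # A Core may appear alone or after caches on the same line.
            current_core = int(core.group(1))
        pu = PU_RE.search(line)
        if pu and current_core is not None:
            # A PU may appear alone or after its L1 caches on the same line.
            mapping[current_core].append(int(pu.group(2)))
    return dict(mapping)


WORKING = """\
    L2 L#0 (1280KB) + L1d L#0 (48KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (1280KB) + L1d L#1 (48KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#9)
"""

NON_WORKING = """\
    L2 L#0 (4096KB) + Core L#0
      L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0)
      L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1)
    L2 L#1 (4096KB) + Core L#1
      L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2)
      L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3)
"""

print(core_to_pus(WORKING))      # {0: [0, 8], 1: [1, 9]}
print(core_to_pus(NON_WORKING))  # {0: [0, 1], 1: [2, 3]}
```

Both excerpts yield two PUs per core; what differs is the PU numbering (siblings are P#0/P#8 in one case and P#0/P#1 in the other) and the fact that the L1 caches are per PU rather than per core in the non-working layout.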

 

Could it be that the code implementing the hwloc library inside the MPI runtime cannot correctly "interpret" this configuration?

This refers to an Intel(R) Xeon(R) Gold 6354 running on a custom KVM hypervisor.

 

In the tar.gz files that I have attached to the thread, you have the XML file representing the topology of the non-working scenario.

Perhaps you may be able to partially see the issue by following the steps below:

 

  • copy the "lstopo.xml" file from one of the attached tar.gz files to your /tmp (/tmp/lstopo.xml)
  • export the environment variables as follows
export HWLOC_XMLFILE=/tmp/lstopo.xml
export HWLOC_THISSYSTEM=1
export HWLOC_COMPONENTS="xml,-custom,-no_os,-linux,-x86,-linuxpci,-synthetic"
export HWLOC_COMPONENTS_VERBOSE=1
  • finally, run the "cpuinfo" utility (and/or the lstopo-no-graphics to study the topology)
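The steps above can also be scripted. The following Python sketch sets the same four environment variables and runs a command against the saved topology; it checks that the XML file exists first, since a missing file is an easy mistake to make. The function names are hypothetical, and the command to run ("cpuinfo" or "lstopo-no-graphics") is assumed to be on the PATH.

```python
import os
import subprocess


def hwloc_replay_env(xml_path, base_env=None):
    """Copy the environment and point hwloc at a saved XML topology."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "HWLOC_XMLFILE": xml_path,
        "HWLOC_THISSYSTEM": "1",
        "HWLOC_COMPONENTS": "xml,-custom,-no_os,-linux,-x86,-linuxpci,-synthetic",
        "HWLOC_COMPONENTS_VERBOSE": "1",
    })
    return env


def run_with_saved_topology(command, xml_path="/tmp/lstopo.xml"):
    """Run `command` (a list of arguments) with hwloc reading `xml_path`."""
    if not os.path.isfile(xml_path):
        raise FileNotFoundError(f"topology file not found: {xml_path}")
    return subprocess.run(
        command, env=hwloc_replay_env(xml_path),
        capture_output=True, text=True, check=False,
    )
```

For example, `run_with_saved_topology(["cpuinfo"]).stdout` would hold the utility's output for the saved topology, assuming the file has been copied to /tmp/lstopo.xml.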

 

Thank you very much for the time you are dedicating to this.

Kind Regards.

Pietro.

SantoshY_Intel
Moderator

Hi,

 

>>"We have tried the hwloc topology library on another cloud provider and it works as expected."

Could you please let us know which cloud provider you are using?

 

>>"Therefore, we tried to understand the differences in the two CPU topologies."

Are you comparing the working and non-working situations using the same processor (Intel(R) Xeon(R) Gold 6354)?

 

I tried with the lstopo.xml file and followed your instructions, but it didn't help. We are getting an error, as shown in the attachment (Screenshot1.png).

 

So, do we need to change the contents of the XML file (lstopo.xml), or do any additional configuration?

 

Please get back to us with the above-requested details.

 

Thanks & Regards,

Santosh

 

AlefRome
Novice

Hi Santosh, thanks.

 


>>"Could you please let us know which cloud provider you are using?"

Sure. The non-working scenario occurs on Oracle Cloud Infrastructure (with both Intel Xeon Platinum and Intel Xeon Gold based virtual machines). The virtual machine shapes affected by the issue are VM.Standard2.X, VM.Standard3.X and VM.Optimized3.X.

 

>>"Are you comparing the working & non-working situations using the same CPU processor( Intel(R) Xeon(R) Gold 6354)?"

The working scenario is on AWS, using an Intel Xeon Platinum processor (instance type is c5.4xlarge).

 

We think that the processor model is irrelevant. We suspect that the issue is caused by "how" the cloud provider hypervisor (customized, based on KVM) presents the CPU topology to the guest VM.

If this can help, we're available to provide you with SSH access to a VM on Oracle Cloud Infrastructure, with administrative privileges.

 

Regarding the error in the screenshot you attached: it seems that the /tmp/lstopo.xml file cannot be found (I/O warning). Could you please double check that the file has been copied from the tar.gz to /tmp?
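A quick way to rule out a missing or damaged file is to load it before pointing hwloc at it. The Python sketch below parses the XML and counts objects by type. It assumes the hwloc export uses a <topology> root with nested <object type="..."> elements; the function names and the tiny sample document are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_topology_string(xml_text):
    """Count objects by type in an hwloc-style XML document given as text."""
    root = ET.fromstring(xml_text)
    if root.tag != "topology":
        raise ValueError(f"unexpected root element: {root.tag}")
    return Counter(obj.get("type") for obj in root.iter("object"))


def summarize_topology_file(path="/tmp/lstopo.xml"):
    """Same as above for a file; open() raises FileNotFoundError if missing."""
    with open(path, encoding="utf-8") as handle:
        return summarize_topology_string(handle.read())


# Hypothetical, heavily reduced sample (assumed structure, illustration only).
SAMPLE_XML = """<topology>
  <object type="Machine" os_index="0">
    <object type="Package" os_index="0">
      <object type="Core" os_index="0">
        <object type="PU" os_index="0"/>
        <object type="PU" os_index="1"/>
      </object>
    </object>
  </object>
</topology>"""

counts = summarize_topology_string(SAMPLE_XML)
```

If `summarize_topology_file()` raises FileNotFoundError, the copy to /tmp did not happen; if it returns plausible Core and PU counts, the file is at least readable and well-formed.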

 

The "lstopo.xml" file might help your Dev Team analyze the MPI library code that reads the CPU topology and populates its internal structures with the collected information, since this is what leads to the incorrect MPI process pinning.

 

Many thanks and Kind Regards.

Pietro.

 

 

SantoshY_Intel
Moderator

Hi,


Thanks for providing all the details.


We have reported your issue to the concerned development team. They are looking into it.


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


We have provided your feedback to the relevant team. At this moment there is no visibility on when a fix will be implemented and available for use. Please let me know if we can go ahead and close this case.


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


We are closing this thread. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Thanks & Regards,

Santosh

