OOxed
Novice
1,255 Views

Intel Omni-Path implementation in an opensource HPC stack

Hi,

I am currently building an open source HPC stack: https://github.com/oxedions/banquise (Banquise, an HPC stack based on Salt).

This stack is made to deploy and maintain an HPC cluster, but it can also be used as a base for other tasks.

However, I lack the interconnect hardware needed to create networks other than basic Ethernet. I would like to add Intel Omni-Path compatibility to the stack.

Is there specific documentation describing how to set up the Intel interconnect on RHEL 7, and also on Debian/SLES? Also, is there a specific place to download the packages, which I suppose contain libraries, kernel modules, and tools?

With my best regards

Ox

1 Solution
idata
Community Manager
84 Views

Hello Ox,

Regarding your question: "I would like to add the Intel OmniPath compatibility in the stack. And documentation describing how to setup Intel interconnect on RHEL 7, and also Debian/SLES ?"

Here is some information I found: http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Fa... You could start on page 21.

I am going to research this further and will get back to you as soon as possible.

If there is anything else we can help with, please feel free to ask.

Best regards,

Henry A.

9 Replies
idata
Community Manager
84 Views

Hello Ox Oxedions,

Please help us by providing more information so that we can better assist you with your question:

1. Are you trying to build an Intel OPA cluster?

2. Are you looking for documents on how to set up Intel's OPA fabric?

3. Are you looking for OPA software?

4. Please let us know what you mean by "Interconnect".

5. Please provide us with more detailed information.

We look forward to your response.

Best regards

Sergio S.

OOxed
Novice
84 Views

Hi Henry, Hi Sergio,

Thank you very much for your answers.

Henry: This is exactly what I was looking for. I didn't have the time to look deeper, but it seems very similar to the Mellanox OFED setup. I just have to extract what I need from the CLI install script (the RPMs it installs, and maybe other things done by this script).

What I still fail to understand is where the subnet manager (if needed) should be running, and whether opensm can be used as the subnet manager. Maybe Intel Omni-Path assumes the fabric switches are always in charge of this task, and that no SM should be running on a Linux server. It must be described somewhere in the PDF; I will find it.

Sergio:

1. Are you trying to build an Intel OPA cluster?

No, I don't have this hardware at my disposal. However, Intel OPA clusters are now common in HPC, and this is why I want to add OPA support to my open source tool (see part 5).

2. Are you looking for documents on how to setup Intel's OPA fabric?

Yes.

3. Are you looking for OPA software?

Yes. I found the download procedure in the document provided by Henry, so I will get it that way.

4. Please let us know on what do you mean with "Interconnect"

Of course. In common HPC clusters, there are two kinds of networks: an Ethernet network, for administration purposes, and an interconnect. Interconnects are used for parallel computations (data exchanges between processes) and for I/O (parallel file system or NFS), because they provide very low latency and high bandwidth. Well-known interconnects are QLogic (now Intel) InfiniBand, Mellanox InfiniBand, Cray Aries and Intel Omni-Path.

Another difference with standard Ethernet is that these interconnect networks rely on a subnet manager, which scans the network to build a map with the shortest path to all nodes and to discover new elements. Most of the time, interconnect networks use a specific topology to provide better performance (ftree, hypercube, alltoall, etc).
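To make the "map with shortest path to all nodes" idea concrete, here is a minimal sketch in Python. It is only an illustration: the node and switch names and the `shortest_path_hops` helper are hypothetical, and a real subnet manager does far more than this (address assignment, programming switch routing tables, topology-aware routing).

```python
from collections import deque

def shortest_path_hops(links, source):
    """Return the hop count from source to every reachable element (breadth-first search)."""
    # Build an undirected adjacency map from the list of cables.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    hops = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in sorted(neighbours.get(current, ())):
            if nxt not in hops:  # first visit is the shortest path in BFS
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    return hops

# Hypothetical two-level tree: one spine switch, two leaf switches, four nodes.
FABRIC_LINKS = [
    ("node1", "leaf1"), ("node2", "leaf1"),
    ("node3", "leaf2"), ("node4", "leaf2"),
    ("leaf1", "spine1"), ("leaf2", "spine1"),
]

for name, count in sorted(shortest_path_hops(FABRIC_LINKS, "node1").items()):
    print(name, count)
```

Adding a new cable to `FABRIC_LINKS` and running the scan again is the toy equivalent of the subnet manager discovering a new element.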

5. Please provide us more detailed information.

I am working in my free time on an open source project called Banquise. It is the result of what I think the next-generation stack for HPC should be: no scripts, one tool, and the ability to replay the "apply" step indefinitely, either to check that everything is OK or to update things.

The aim of this project is to allow small universities and companies to easily deploy and maintain an HPC cluster, so I need to provide a simple way to set up many things, interconnects in particular, if I want my project to succeed. I don't have the hardware, so everything I develop for specific hardware is done "virtually" and will be tested later.

My aim here is to do something similar to this for Intel Omni-Path (assuming it works like Mellanox IB):

- a general "client side" state that installs the basic RPMs, starts the needed services, loads the needed kernel modules, and then sets up IPoIB.

- a specific "server side" state that installs what is needed on the management server (subnet manager, monitoring tools, etc).

- some probes for monitoring (like the perfquery tool for InfiniBand) to be added to Shinken.
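As an illustration of the monitoring probe idea, here is a minimal Python sketch that follows the Nagios plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL), which Shinken also understands. The `check_error_counters` function, the thresholds and the sample counter values are assumptions made for this example; a real probe would first have to collect the counters from whatever tool the fabric software provides.

```python
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}

def check_error_counters(counters, warn=1, crit=100):
    """Map port error counters to a Nagios-style (status, message) pair.

    counters: dict of counter name -> integer value (already collected).
    warn/crit: illustrative thresholds applied to the worst counter.
    """
    worst = max(counters.values(), default=0)
    if worst >= crit:
        status = CRITICAL
    elif worst >= warn:
        status = WARNING
    else:
        status = OK
    details = ", ".join(f"{name}={value}" for name, value in sorted(counters.items()))
    return status, f"{LABELS[status]} - {details}"

# Sample values only; not taken from real hardware.
status, message = check_error_counters({"SymbolErrorCounter": 0, "LinkDownedCounter": 3})
print(message)
```

A plugin wrapper would print the message and then call `sys.exit(status)` so that the monitoring server can pick up the state.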

I do not plan to provide a way to configure switches; that is the job of Intel's tools. I simply assume that the switches are already up and configured, or that the models used are passive.
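The client/server split described above could be modelled roughly as follows. This is a sketch in Python rather than an actual Salt state, and every package, kernel module and service name in it is an assumption that would need to be checked against the Intel Omni-Path Fabric Software Installation Guide before use.

```python
# All names below are assumptions for illustration; verify them against the
# Intel Omni-Path Fabric Software Installation Guide for the target release.
CLIENT_PLAN = {
    "packages": ["opa-basic-tools", "libpsm2"],
    "kernel_modules": ["hfi1", "ib_ipoib"],
    "services": [],
}

# Extra items for the management server (fabric/subnet manager and tools).
SERVER_EXTRA = {
    "packages": ["opa-fm", "opa-fastfabric"],
    "kernel_modules": [],
    "services": ["opafm"],
}

def plan_for(role):
    """Return what a host needs for the given role: 'client' or 'server'."""
    if role not in ("client", "server"):
        raise ValueError(f"unknown role: {role!r}")
    # Every host gets the client plan; copy the lists so callers cannot mutate the defaults.
    plan = {key: list(values) for key, values in CLIENT_PLAN.items()}
    if role == "server":
        for key, values in SERVER_EXTRA.items():
            plan[key].extend(values)
    return plan

for role in ("client", "server"):
    print(role, plan_for(role))
```

Keeping the server plan as "client plus extras" means the management server is also a normal fabric member, and switches are deliberately left out, in line with the assumption that they are already configured.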

I hope this information helps you understand my goal and what I am looking for.

With my best regards

Ox

idata
Community Manager
84 Views

Hello Ox Oxedions,

Thank you for taking the time to provide additional information. We are going to look into it and will let you know as soon as we have a response for you.

Thank you for your patience on this matter.

Regards.

Sergio S.

OOxed
Novice
84 Views

Hi Sergio,

Thank you very much.

I am the one asking, so I will be patient.

And I still have a few things to code before I can add the interconnect states :-)

With my best regards

Ox

idata
Community Manager
84 Views

Hello Ox,

Thanks for the update.

We are still working on your case, and we will reply as soon as we have an update.

Best regards,

Caesar B.

idata
Community Manager
84 Views

Hello Ox,

Here are documents on how to set up Intel's Omni-Path Fabric.

A good starting point is the "Intel Omni-Path Fabric Staging Guide" (http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Fa...), which covers the 10.2 and 10.3 releases; the name may change to something like Setup Guide in the 10.4 release.

For switch hardware setup, detailed instructions are documented in the "Intel® Omni-Path Fabric Switches Hardware Installation Guide" (http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Fa...). Setup of the host interface card is documented in the "Intel® Omni-Path Host Fabric Interface Installation Guide" (http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_Omni_...).

For software setup, detailed instructions are documented in the "Intel® Omni-Path Fabric Software Installation Guide" (http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Fa...). All documents are available for public download and are updated with each OPA software release: http://www.intel.com/content/www/us/en/support/network-and-i-o/fabric-products/000016242.html

Intel uses the term "Fabric" for the interconnect networks you describe: networks that rely on a subnet manager, which scans the network to build a map with the shortest path to all nodes and discovers new elements. These interconnect networks use specific topologies to provide better performance (ftree, hypercube, alltoall, etc).

Please note that other documents are also released with each Omni-Path (OPA) software release. Some examples are the "Intel® Performance Scaled Messaging 2 (PSM2) Programmer's Guide" (http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_PSM2_PG_H76473_...) and the "Intel® Omni-Path Fabric Performance Tuning User Guide" (http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Pe...).

Please let me know if you have any more questions.

Best regards,

Caesar B.

OOxed
Novice
84 Views

Hi Caesar,

My apologies for the delay in answering.

Using this, I will add experimental support for Intel OPA in the next release of my tool, and then find someone with the right hardware to test it.

Because I am focusing on the main engine to enable multiple node groups (needed for monitoring, but also useful for other software) and multiple networks, it may not be released before this summer.

Thank you very much for all of these files and links!

With my best regards

Ox

idata
Community Manager
84 Views

You are welcome. Do not hesitate to contact us if you need further assistance.

Best regards,

Steven V.