Hi, I need help!!
We built a cluster today with one MIC card in every node; the servers use E5-2670 v3 CPUs. We installed the nfs, nis, rsh, ssh, and mpss services, plus Intel Parallel Studio 2015 Update 2. When I run xhpl_offload_intel64 (from mkl/benchmarks/mp_linpack/bin_intel/intel64/) on one node, there is no problem. However, when I try to run Linpack on 2 nodes with the command

    mpirun --perhost 1 -n 2 -hosts ibmic01,ibmic03 -genv MIC_LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -genv LD_LIBRARY_PATH $LD_LIBRARY_PATH ./xhpl_offload_intel64

I get a segmentation fault error, and no process appears on any node. The same thing happens when I run parallel applications like WRF and PALABOS.
What's wrong with our cluster? Is there a problem with my environment, or with our compiler?
Please help me within 24 hours. Thank you!!
A couple of things right off the top:
The offload routines expect the MPI code to run on the host systems; that code then offloads some of the work to the coprocessors. If the hosts are named mynode1, mynode2, etc., then the coprocessors would by default be named mynode1-mic0, mynode2-mic0, etc. In the case of this code, the names you pass to -hosts should be the host names mynode1, mynode2, etc., not the coprocessor names.
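For example, a corrected launch along those lines might look like this (a sketch; mynode1 and mynode2 are placeholders for your actual host names):

    # One MPI rank per host; each rank offloads work to its node's coprocessor.
    mpirun -perhost 1 -n 2 -hosts mynode1,mynode2 \
        -genv MIC_LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
        -genv LD_LIBRARY_PATH $LD_LIBRARY_PATH \
        ./xhpl_offload_intel64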
When you are using OFED, the default InfiniBand interface to a coprocessor is named mic0:ib, but the coprocessor host names stay mynode1-mic0 (or simply mic0 if you are looking at a single node). If you run one of the codes that actually runs MPI ranks on the coprocessor, rather than just offloading work to it, Intel MPI will check for InfiniBand connections to the nodes listed in the -hosts option and use them if available.
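If you want to see which fabric Intel MPI actually picks, one way (a sketch using Intel MPI's I_MPI_DEBUG variable) is to raise the debug level and check the startup output:

    # At debug level 5, Intel MPI prints the fabric/provider each rank selected.
    mpirun -perhost 1 -n 2 -hosts mynode1,mynode2 \
        -genv I_MPI_DEBUG 5 \
        ./xhpl_offload_intel64
    # Look for lines naming the chosen fabric (e.g. shm, dapl, tcp) near the top.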
First, try changing the names in -hosts to the names of the systems in which the coprocessor cards are installed and see if the code runs for you. If you run micsmc on one of those hosts, you can see how heavily the coprocessor on that node is being used. As far as the configuration of your cluster goes, there may be some issues you want to revisit. For example, what do the names ibmic01, ibmic03, etc. refer to? Are you running IPoIB, which is not necessary for using the InfiniBand adapters with MPI? When you say you installed rsh, did you explicitly install it on the coprocessor?
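A couple of quick checks along those lines (a sketch, run from one of the hosts; note that mpirun's default launcher relies on passwordless ssh between the hosts):

    # What do ibmic01 / ibmic03 actually resolve to -- hosts or coprocessors?
    getent hosts ibmic01 ibmic03
    # Passwordless ssh must work in both directions for mpirun to start remote ranks.
    ssh ibmic01 hostname
    ssh ibmic03 hostname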
I was looking around for the best documentation I could point you to for setting up a cluster, but I can't find what I was looking for just now. I'll post the link when I find it.
Hello Rancho,
Also, I would like to know whether you have been able to run any MPI program successfully with the Intel Xeon Phi coprocessors in your cluster. You can refer to this forum thread for how to run a sample MPI program on the host and the coprocessor.
Let us know if that thread helps. If not, we can assist you further in running the Intel Optimized MP Linpack on your cluster.
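For instance, a minimal host-side check along those lines (a sketch, assuming Intel MPI's mpivars.sh has been sourced and the bundled test program is in its usual location under $I_MPI_ROOT; adjust paths to your install):

    # Build and run the small test program that ships with Intel MPI.
    mpiicc $I_MPI_ROOT/test/test.c -o mpi_test
    mpirun -perhost 1 -n 2 -hosts mynode1,mynode2 ./mpi_test
    # To place ranks on the coprocessors themselves, you would typically also
    # build a -mmic binary and set I_MPI_MIC=enable.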
Thanks
Hi, I think the coprocessor is fine. My problem is that I can't run a parallel application like Linpack on 2 or more nodes, and I don't know what's going wrong!
Thank you anyway!
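One quick way to isolate that (a sketch using the host names from the original command; hostname here is just a trivial stand-in for the real application):

    # If even this fails across the two nodes, the problem is in the MPI/host
    # setup (name resolution, ssh, fabric) rather than in Linpack or WRF.
    mpirun -perhost 1 -n 2 -hosts ibmic01,ibmic03 hostname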
