Dear all,
I've compiled and configured HPL to run on a system with a Xeon Phi, but I have a problem with the Linpack run.
I copied HPL.dat and xhpl to /tmp on mic0, enabled I_MPI_MIC, and ran /sbin/sysctl -w net.ipv4.ip_forward=1.
Then I've run the following command:
HOST# mpirun -hosts mic0 -n 114 -wdir /tmp ./xhpl
I opened top on the MIC card and the run actually started, but after a minute I got the following message:
Connection to mic0 closed by remote host.
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
After that HPL stopped and all the terminals open on mic0 were closed.
Do you have any suggestions?
Thanks as always for your help!
Hi Francesca,
My suggestion is to try with a small number of ranks (n = 1) to see whether the same problem occurs. Thanks.
Hello,
I ran some tests and saw the following behavior.
I usually calculate the HPL Ns from 80% of the memory, so since the Phi cards have 6GB, I used 4.8GB, which gives Ns=24495. With this value I get the issue I described in my previous post.
I then tried Ns=14495 (just as a test) and HPL ran fine without any issue.
During these tests I always used n = 228, so I suppose it is not a matter of the number of ranks but could be a memory-related problem.
What do you think about this?
Is there any memory limit on the Phi cards?
For the record I'm using hpl-2.1 and CentOS 6.3.
Thank you very much!
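
The sizing rule described in this post (pick N so that the N x N double-precision matrix fills a given fraction of memory) can be sketched as follows. The function names are hypothetical, and the sketch assumes 8 bytes per matrix element and a decimal 6 GB card, which is what reproduces the value quoted above; it ignores everything else that occupies card memory.

```python
import math

BYTES_PER_DOUBLE = 8  # HPL stores the coefficient matrix in double precision


def hpl_problem_size(mem_bytes: float, fraction: float) -> int:
    """Return N such that an N x N matrix of doubles fills about `fraction` of `mem_bytes`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return round(math.sqrt(fraction * mem_bytes / BYTES_PER_DOUBLE))


def matrix_bytes(n: int) -> int:
    """Bytes needed for the N x N matrix alone (no workspace, no MPI buffers)."""
    return BYTES_PER_DOUBLE * n * n


CARD_BYTES = 6e9  # assumption: "6GB" taken as decimal gigabytes

print(hpl_problem_size(CARD_BYTES, 0.8))   # N for the 80% rule
print(matrix_bytes(14495) / CARD_BYTES)    # fraction of the card used by Ns=14495
```

Under these assumptions the 80% rule gives 24495, matching the post, and Ns=14495 corresponds to roughly 28% of the card.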
Hello,
A little update: with n=16 I don't have the issue if I calculate Ns from 50% of the memory (if I run the same test with n=228, the connection problem occurs).
Then I tried n=16 with Ns calculated from 80% of the memory, and the connection problem occurred again.
Am I missing some configuration step or HPL setting?
Thanks as always!
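
One way to look at these combinations is to ask how much memory is left per MPI rank once the matrix itself is allocated. The sketch below is only arithmetic under stated assumptions: a decimal 6 GB card, 8 bytes per element, N=19365 as the value the 50% rule would give, and a hypothetical helper `headroom_per_rank`. It does not know how much memory each rank, the MPI library, or the card's operating system actually needs.

```python
def headroom_per_rank(card_bytes: int, n: int, ranks: int) -> float:
    """Bytes left per rank after subtracting the N x N matrix of doubles from the card."""
    matrix = 8 * n * n
    return (card_bytes - matrix) / ranks


CARD_BYTES = 6_000_000_000  # assumption: decimal 6 GB

# (N, ranks) pairs: 19365 ~ 50% rule (assumed), 24495 ~ 80% rule (from the thread)
for n, ranks in [(19365, 16), (19365, 228), (24495, 16), (24495, 228)]:
    mb = headroom_per_rank(CARD_BYTES, n, ranks) / 1e6
    print(f"N={n} ranks={ranks}: {mb:.1f} MB left per rank")
```

These figures do not explain the failures by themselves; they only show how little slack remains in each case before any per-rank or system overhead is counted.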
Could you point me to where I can get your HPL code? I need to test it.
Thanks.
Hello,
I've downloaded the source code from here: http://www.netlib.org/benchmark/hpl/hpl-2.1.tar.gz
I've attached the Make file I used to compile it.
Looking at the Phi monitor, I also found that much more memory is allocated than the amount I used to calculate N. So it seems that I got that error and the program exited because too much memory was allocated ( > 6GB).
Do you know why it behaves this way? Are there any particular memory settings or variables to export?
I've tried to summarize what I saw in a table, which I've also attached.
The same behaviour occurs if I use the Intel mp_linpack under the mkl folder.
Thank you very much for your support!
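
To compare the memory actually available on the card with the amount assumed when computing N, one option is to read `/proc/meminfo` on the coprocessor before the run (for example with `ssh mic0 cat /proc/meminfo`, assuming ssh access to the card). A minimal sketch of parsing that output follows; the function name and the sample figures are made up for illustration and are not measurements from this thread.

```python
def parse_meminfo(text: str) -> dict:
    """Map /proc/meminfo field names to their integer values (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


# Illustrative sample only; real values must come from the card itself.
SAMPLE = """\
MemTotal:        5878664 kB
MemFree:         5400000 kB
Buffers:               0 kB
"""

info = parse_meminfo(SAMPLE)
free_bytes = info["MemFree"] * 1024
print(f"free on card: {free_bytes / 1e9:.2f} GB")
```

Sizing N from the free figure rather than from the nominal card capacity leaves room for whatever the card's operating system and copied files already occupy.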
Hello,
I've downloaded the HPL source code from here: http://www.netlib.org/benchmark/hpl/hpl-2.1.tar.gz
I've attached the Make file I used to compile it.
Looking at the mic monitor, I also discovered that much more memory is allocated than the amount I used to calculate N. In fact, the HPL application exits and is killed when it allocates too much memory ( >6GB).
I've tried to summarize my tests in a table, which I've also attached.
Do you know why it behaves this way? Are there any memory settings or variables to export that I'm missing?
Thank you for your support!
P.S. Sorry if this post is a duplicate, but I lost the previous one that I wrote this morning :)
Hi,
I tried but failed to build HPL-2.1 using your Makefile. Below are the steps I tried:
1. Download hpl-2.1.tar.gz
2. Untar the file:
> gunzip hpl-2.1.tar.gz; tar -xvf hpl-2.1.tar
That created a directory called "hpl-2.1"
3. Copy that directory under /opt/. Now we have /opt/hpl-2.1
4. Rename your file "make.intelmic.txt" to "Makefile" and place it under /opt/hpl-2.1:
> mv make.intelmic.txt /opt/hpl-2.1/Makefile
5. Build it:
> make
make: *** No targets. Stop
I am not sure why your Makefile doesn't work for me.
Hello,
Everything is OK up to step 4. In that step you have to rename make.intelmic.txt to Make.Intel under the directory /opt/hpl-2.1:
mv make.intelmic.txt /opt/hpl-2.1/Make.Intel
Then you can build it:
(under /opt/hpl-2.1/ )
make arch=Intel
Let me know if it now works for you too.
Thank you in advance
Hi,
Thank you for the instructions. I am now able to build and run the application. After I transferred xhpl and HPL.dat to mic0, I ran it successfully with n=114 and with n=256:
> mpirun -host mic0 -n 256 -wdir /tmp ./xhpl
For your information, the system I use is RHEL 6.2 and it has MPSS 4982-15 installed. What MPSS version do you have? Did you rebuild your MPSS for CentOS 6.3? If so, with what version of the compiler?
Also, make sure to transfer mpiexec, pmi_proxy, libmpi.so.4 and libmpigf.so.4 to the coprocessor mic0 before running MPI.
Thank you.
Hello,
Thank you for your tests. How much memory did you use? Which value did you assign to N?
My system has CentOS 6.3 with MPSS 5889-16 installed. I downloaded the package KNC_gold_update_2-2.1.5889-16-rhel-6.3.tar from the Intel website (it matches the kernel version I have, 2.6.32-279.el6.x86_64), so I did not rebuild MPSS for my distribution.
I'm using icc 13.1.0 20130121 and the mpiicc under /opt/intel/impi/4.1.0/bin64/. Is that the correct mpiicc to use, or should I use the one under /opt/intel/impi/4.1.0/mic/bin/?
I also transferred all the files you mentioned; without them the run can't start at all.
Thanks for your support!
Hi Francesca,
Using the micsmc tool, I saw that about 25% of the memory is used when the program runs. I just use whatever default value of N applies when I type "mpirun -host mic0 -n 256 -wdir /tmp ./xhpl".
Your MPSS version is more recent than mine; your Intel Composer version is the same. Could you verify the MPI version by typing "ls -l /opt/intel/impi", please? Mine is 4.1.0.030.
Thanks.
Hello,
I have the 4.1.0.030 version of impi.
In my case micsmc showed that the memory is over-allocated, which is why HPL crashed.
Did you set any variable like OMP_NUM_THREADS during your run?
Thanks for your support!
I didn't set any environment variables at all. The only difference is that you are running MPSS built for RHEL on CentOS; maybe that is the cause? Thank you.
These solutions are useful to me as well.
Hi,
I am having trouble running Linpack on an Intel Xeon Phi 5110P. I am using Intel mpss-3.1.2 with n=24000. I ran mpirun -n 200 -host mic0 -wdir /tmp ./xhpl, but it gives this error:
HPL ERROR from process # 0, on line 246 of function HPL_pdtest:
>>> [0,0] Memory allocation failed for A, x and b. Skip. <<<
Thanks in advance.
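
As a rough check on the allocation that fails here, the sketch below estimates the size of the full matrix for N=24000 and the share each process would hold if it were split evenly over a P x Q process grid. The even split is an approximation, the 10 x 20 grid is only an assumed way of arranging 200 ranks (the actual values come from HPL.dat), and `local_matrix_bytes` is a hypothetical helper.

```python
def local_matrix_bytes(n: int, p: int, q: int) -> float:
    """Approximate bytes of the N x N double matrix held by one process on a P x Q grid."""
    return 8 * n * n / (p * q)


N = 24000
P, Q = 10, 20  # assumption: one possible grid for 200 ranks

total_gb = 8 * N * N / 1e9
per_rank_mb = local_matrix_bytes(N, P, Q) / 1e6
print(f"full matrix: {total_gb} GB, per rank: {per_rank_mb} MB")
```

The matrix alone comes to about 4.6 GB, so whether the allocation succeeds depends on how much of the card's memory is still free once 200 processes and the MPI library are loaded.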
