Hi all,
I've got a weird problem: I wanted to test the GFLOPS performance of the Xeon Phis that are entrusted to me: 2x Xeon Phi 5110P and 1x Xeon Phi 7120. I read that the LINPACK benchmark ships with Intel's MKL libraries and that a Xeon Phi version is included. So I grabbed the binaries and ran them on my Xeon Phis.
On the 7120 (with mpss 3.3.2) the benchmark runs fine:
Thu Feb 12 16:58:54 CET 2015
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Thu Feb 12 16:58:54 2015
CPU frequency: 1.238 GHz
Number of CPUs: 1
Number of cores: 244
Number of threads: 244
Parameters are set to:
Number of tests: 14
Number of equations to solve (problem size) : 2048 4096 6144 8192 10240 12288 14336 16384 18432 20480 22528 24576 26624 28672
Leading dimension of array : 2112 6208 6208 8256 10304 12352 14400 18496 18496 20544 22592 26688 26688 28736
Number of trials to run : 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Maximum memory requested that can be used=6591927552, at the size=28672
Performance Summary (GFlops)
Size   LDA    Align.  Average   Maximal
2048   2112   4       62.4610   89.8029
4096   6208   4       254.9105  260.5183
6144   6208   4       399.6637  404.3374
8192   8256   4       484.3184  491.6444
10240  10304  4       577.4737  587.8460
12288  12352  4       639.3712  643.3008
14336  14400  4       696.0603  701.3388
16384  18496  4       744.9810  748.8416
18432  18496  4       788.7247  791.7044
20480  20544  4       818.3679  820.8570
22528  22592  4       846.7491  848.7561
24576  26688  4       868.7217  870.2109
26624  26688  4       884.2233  885.7552
28672  28736  4       896.8622  896.9412
Residual checks PASSED
End of test
However, on both 5110Ps (with mpss 3.4.2) the benchmark gets killed before it completes!
mic0 $ cd linpack/
mic0 $ export LD_LIBRARY_PATH=$PWD
mic0 $ ./runme_mic
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
Fri Feb 13 10:01:12 CET 2015
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Fri Feb 13 10:01:12 2015
CPU frequency: 1.053 GHz
Number of CPUs: 1
Number of cores: 240
Number of threads: 240
Parameters are set to:
Number of tests: 14
Number of equations to solve (problem size) : 2048 4096 6144 8192 10240 12288 14336 16384 18432 20480 22528 24576 26624 28672
Leading dimension of array : 2112 6208 6208 8256 10304 12352 14400 18496 18496 20544 22592 26688 26688 28736
Number of trials to run : 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Maximum memory requested that can be used=6591927552, at the size=28672
=================== Timing linear equation system solver ===================
Size   LDA    Align.  Time(s)  GFlops    Residual      Residual(norm)  Check
2048   2112   4       0.596    9.6303    4.795780e-12  3.950479e-02    pass
2048   2112   4       0.073    78.7107   4.795780e-12  3.950479e-02    pass
2048   2112   4       0.074    77.8766   4.795780e-12  3.950479e-02    pass
4096   6208   4       0.214    214.2289  2.216840e-11  4.613649e-02    pass
4096   6208   4       0.203    225.7619  2.216840e-11  4.613649e-02    pass
4096   6208   4       0.204    224.5814  2.216840e-11  4.613649e-02    pass
6144   6208   4       0.457    338.6425  3.562570e-11  3.301736e-02    pass
6144   6208   4       0.445    347.2770  3.562570e-11  3.301736e-02    pass
6144   6208   4       0.446    346.9953  3.562570e-11  3.301736e-02    pass
8192   8256   4       0.900    407.1775  7.232445e-11  3.782865e-02    pass
8192   8256   4       0.869    421.7898  7.232445e-11  3.782865e-02    pass
8192   8256   4       0.867    422.8278  7.232445e-11  3.782865e-02    pass
10240  10304  4       1.449    494.0793  1.010026e-10  3.389721e-02    pass
10240  10304  4       1.373    521.5753  1.010026e-10  3.389721e-02    pass
10240  10304  4       1.371    522.2989  1.010026e-10  3.389721e-02    pass
12288  12352  4       2.241    552.0942  1.454923e-10  3.393283e-02    pass
12288  12352  4       2.184    566.5285  1.454923e-10  3.393283e-02    pass
12288  12352  4       2.185    566.1465  1.454923e-10  3.393283e-02    pass
14336  14400  4       3.313    592.9472  2.006193e-10  3.448820e-02    pass
14336  14400  4       3.228    608.5453  2.006193e-10  3.448820e-02    pass
14336  14400  4       3.224    609.3674  2.006193e-10  3.448820e-02    pass
16384  18496  4       4.621    634.5835  2.524725e-10  3.324476e-02    pass
16384  18496  4       4.462    657.1922  2.524725e-10  3.324476e-02    pass
16384  18496  4       4.461    657.3274  2.524725e-10  3.324476e-02    pass
./runme_mic: line 45: 5271 Killed ./xlinpack_$arch lininput_$arch
Done: Fri Feb 13 10:05:15 CET 2015
How can I debug this? A 'gdb' run shows nothing; it just reports that all threads get killed. The "runme_mic" script comes from the MKL kit itself:
#!/bin/sh
[....]
echo "This is a SAMPLE run script for SMP LINPACK. Change it to reflect"
echo "the correct number of CPUs/threads, problem input files, etc.."
# Setting up affinity for better threading performance
export KMP_AFFINITY=explicit,granularity=fine,proclist=[1-$(($(cat /proc/cpuinfo|grep proc|wc -l)-1)),0]
arch=mic
{
date
./xlinpack_$arch lininput_$arch
echo -n "Done: "
date
} | tee lin_$arch.txt
What's going wrong? I've tried it with binaries from both the Intel v14 and Intel v15 compilers.
Run micsmc on the host and monitor memory usage.
1) Look at the available RAM before starting the program.
2) Look at the available RAM while the program runs.
You might find that 1) already shows less available RAM than expected at program start. This may be due to the RAM disk having too many files loaded into it.
*** do this for both MICs and observe the difference ***
Jim Dempsey
I also notice that the outputs of the two tests are different. Is the second one writing its outputs (one per test) to files on the RAMDISK?
Jim Dempsey
Hi Jim,
Thanks for the pointer on the RAM usage. I was (and am) auto-installing some RPMs on the Phis, and they were eating up just enough memory on the 5110Ps to cause the linpack benchmark to fail. On the 7120 there's 16 GB of RAM and the problem never occurs.
With the ramdisk (root partition) as small as possible I can now successfully run the linpack benchmark on the 5110Ps as well.
It might be worth mentioning more explicitly in the documentation that all RPMs that are auto-installed will have a direct effect on the RAM available to the applications running on the Phi. I'm using:
# cat /etc/mpss/conf.d/rpm.conf
Overlay rpm /var/mpss/rpm on
For reference, with the RPMs loaded 'micsmc -m' reports
# micsmc -m
mic0 (mem):
Free Memory: ............. 7158.93 MB
Total Memory: ............ 7697.61 MB
Memory Usage: ............ 538.68 MB
and without
# micsmc -m
mic0 (mem):
Free Memory: ............. 7397.18 MB
Total Memory: ............ 7697.61 MB
Memory Usage: ............ 300.43 MB
That difference is just enough to cause linpack to crash, even though it states that it's grabbing "only" 6591927552 bytes (= 6286.55 MB) of memory.
Ticket closed.
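To make the arithmetic in this post explicit, here is a minimal shell sketch that pulls the Free Memory figure out of 'micsmc -m'-style text and subtracts the benchmark's request. The function names free_mb and headroom_mb are made up for illustration, the parsing assumes the output layout quoted in this post, and 1 MB is taken as 1048576 bytes (which matches the 6286.55 MB figure above).

```shell
#!/bin/sh
# Hypothetical helpers, not part of MPSS or MKL.

# Read `micsmc -m`-style text on stdin and print the Free Memory value in MB.
# Assumes a line shaped like: "Free Memory: ............. 7158.93 MB"
free_mb() {
    awk '/Free Memory/ { print $(NF-1); exit }'
}

# Print free MB minus a request given in bytes, rounded to two decimals.
# Usage: headroom_mb FREE_MB REQUEST_BYTES
headroom_mb() {
    awk -v free="$1" -v req="$2" \
        'BEGIN { printf "%.2f\n", free - req / 1048576 }'
}
```

On the host this could be used as `headroom_mb "$(micsmc -m | free_mb)" 6591927552`. With the numbers from this post, the margin is 872.38 MB with the RPMs loaded and 1110.63 MB without, so the failing configuration still had several hundred MB nominally free; whatever the run needs beyond the matrix itself has to fit into that margin.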
Don't forget that you also have 240 x size of stack. If you are in the habit of being overly generous with your stack size, it can eat up memory fast.
Jim Dempsey
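As a rough illustration of that multiplication, the sketch below estimates the worst-case stack reservation for a given thread count and per-thread stack size. The helper names to_kb and stack_total_mb are hypothetical, only the K, M and G suffixes are handled, and the result is an upper bound on what is reserved; how much of it is actually touched depends on the program.

```shell
#!/bin/sh
# Hypothetical helpers for a back-of-the-envelope estimate.

# Convert a size such as "4M" or "512K" to kilobytes.
# Only K, M and G suffixes are handled in this sketch.
to_kb() {
    num=${1%[KkMmGg]}
    case $1 in
        *[Kk]) echo "$num" ;;
        *[Mm]) echo $(( $num * 1024 )) ;;
        *[Gg]) echo $(( $num * 1024 * 1024 )) ;;
        *) echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Worst-case stack reservation in MB.
# Usage: stack_total_mb THREADS STACK_SIZE   (e.g. stack_total_mb 240 4M)
stack_total_mb() {
    kb=$(to_kb "$2") || return 1
    echo $(( $1 * $kb / 1024 ))
}
```

For 240 threads, `stack_total_mb 240 4M` prints 960 and `stack_total_mb 240 1M` prints 240, which shows how quickly a generous stack setting adds up next to a multi-gigabyte matrix.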
I am having the same issue. The lowest Free Memory value I see in 'micsmc -m' is around 80 MB. All I can get from the output is the following:
Maximum memory requested that can be used=5247612160, at the size=24576
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
Is there something I have to change in the script? I already tried 'export KMP_STACKSIZE=1M' and 'ulimit -s unlimited'.
Your KMP_STACKSIZE is way too small. You should start with the default, which I believe is 4M, and go up from there as needed. Obviously that is a problem if you really only have 80 MB to play with.
So, what is your free memory, with nothing running on the coprocessor?
Running nothing, I get the following:
Free Memory: ............. 4678.87 MB
Total Memory: ............ 5740.88 MB
Memory Usage: ............ 1062.02 MB
Looks like you've got a 3000 series Phi with 6 GB of RAM; I actually have no idea how to get linpack to run on that - hopefully someone from Intel will be able to tell us :)
With no user codes running, a memory usage of 1062 MB is very high. If you are installing extra rpm files, you might want to go back through them and see what you can do without. You might also want to run 'top' and look for any programs you didn't realize were there, that are consuming a lot of extra memory. With the kernel and the daemons, you should find maybe a dozen programs using any significant memory. The mpssd and coi_daemon will probably be the largest things you find. Be on the lookout for anything using more memory than that.
As far as adapting the Linpack benchmarks for systems with smaller memory, './xlinpack_mic -e' should print out the extended help which tells you how to modify the input files. Basically, omit any of the tests where 8 * problem size * leading dimension won't fit comfortably on your system.
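The rule of thumb above (8 * problem size * leading dimension) can be sketched as a small shell check. The function names matrix_bytes and fits_budget are hypothetical, the budget is something you pick yourself from the free memory on your card, and the estimate covers the matrix only, so leave some margin.

```shell
#!/bin/sh
# Hypothetical helpers; the 8 is the size of one double-precision value.

# Approximate bytes needed for one test: 8 * problem size * leading dimension.
# Usage: matrix_bytes SIZE LDA
matrix_bytes() {
    echo $(( 8 * $1 * $2 ))
}

# Print the "SIZE LDA" pairs whose matrix fits within a byte budget.
# Usage: fits_budget BUDGET_BYTES "SIZE1 LDA1" "SIZE2 LDA2" ...
fits_budget() {
    budget=$1
    shift
    for pair in "$@"; do
        size=${pair% *}   # text before the space
        lda=${pair#* }    # text after the space
        if [ "$(matrix_bytes "$size" "$lda")" -le "$budget" ]; then
            echo "$pair"
        fi
    done
}
```

For example, `matrix_bytes 28672 28736` prints 6591348736, slightly below the 6591927552 bytes the benchmark reports for that size, and `fits_budget 4000000000 "16384 18496" "28672 28736"` keeps only the first pair. The sizes that survive are the ones to leave in the input file.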
