Software Archive
Read-only legacy content

Running Linpack in native mode

Chao_M_
Beginner
1,227 Views

Hi all,

I tried to run HPL in native mode (I am using the pre-compiled binaries and the run script shipped with MKL).

I set OMP_NUM_THREADS and MKL_NUM_THREADS, but I see only one thread running on the MIC.

Do you have any suggestions?
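For reference, the environment described above might look like the following sketch. The value 244 is an assumption (a 61-core card with 4 hardware threads per core), and KMP_AFFINITY is an optional addition not mentioned in the original post:

```shell
# Sketch of a native-run thread environment (values are assumptions, not from the thread)
export OMP_NUM_THREADS=244    # 61 cores x 4 hardware threads per core
export MKL_NUM_THREADS=244
export KMP_AFFINITY=compact   # optional thread pinning; "scatter" is another common choice
echo "$OMP_NUM_THREADS $MKL_NUM_THREADS"
```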

Thanks,

Chao

8 Replies
Sunny_G_Intel
Employee

Hello Chao,

Can you please let me know if you are trying to run the benchmark using the runme_hybrid_mic script provided with Intel® MKL? What is the output of the following command on your coprocessor?

cat /proc/cpuinfo | grep proc | wc -l

Also, when you say "I only find one thread running on MIC", are you getting this information from micsmc? If not, can you please let me know your method.

I would recommend going through the runme_hybrid_mic file provided with Intel® MKL (default: /opt/intel/mkl/benchmarks/mp_linpack/bin_intel/mic) to verify the steps for running HPL on the coprocessor (i.e., that you have copied all the required binaries to the coprocessor).

Let me know if you still have any problems running the benchmark on the coprocessor.

Thanks

Sunny_G_Intel
Employee

Hi Chao,

In case you were trying to run linpack (and not mp_linpack), you can do that using the following steps:

# Assuming a default installation
cd /opt/intel/mkl/benchmarks/linpack
scp *mic* mic0:/tmp/.

# Copy the coprocessor OpenMP runtime to /usr/lib64, or copy it to any
# directory on the coprocessor and adjust LD_LIBRARY_PATH accordingly
scp /opt/intel/composer_xe_2015.1.133/compiler/lib/mic/libiomp5* mic0:/usr/lib64/.

ssh mic0
cd /tmp
./runme_mic

If everything is correct, you should see the following as part of the benchmark output:

...

Number of cores: 244
Number of threads: 244

...

Thanks

 

Chao_M_
Beginner

Thank you, Sunny.

I use the default runme_hybrid_mic script to run the job.

After the job starts, I use the "top" command to check its status; I found only one xhpl_hybrid_mic thread, and its CPU usage is only 0.4%.

For a native run, which N and NB should I use?

Thank you,

Chao 

Sunny_G_Intel
Employee

Hello Chao,

While using the default runme_hybrid_mic script to run the Linpack benchmark, you can check the number of threads either by looking at the output or by using micsmc.

Method 1: Looking at the output

You can check the output file lin_mic.txt, generated in the same directory where you started the benchmark:

cat lin_mic.txt | head
Wed Feb  4 10:14:16 PST 2015
Intel(R) Optimized LINPACK Benchmark data
 
Current date/time: Wed Feb  4 10:14:16 2015
 
CPU frequency:    1.238 GHz
Number of CPUs: 1                 //Number of host cpu
Number of cores: 244              //Number of hardware threads (61 cores * 4 threads/core) 
Number of threads: 244            
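The "Number of cores" figure in the comment above is just the core/thread arithmetic, which can be checked in the shell:

```shell
# 61 physical cores x 4 hardware threads per core on the 61-core coprocessor
echo $((61 * 4))   # prints 244
```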

Method 2: Using micsmc

1. Start your benchmark and then on another terminal launch 'micsmc' as follows:

micsmc &

If you change the display (view) mode to "Go to Core Histogram View", you will be able to see the real-time core utilization of your coprocessor card:

[Screenshot: Coprocessor_Histogram_View.JPG]

Chao_M_
Beginner

Hi Sunny,

There is no lin_mic.txt file; here is what micsmc shows:

[Screenshot: mic0.JPG]

I use the run script provided by MKL.

Thanks,

Chao

Sunny_G_Intel
Employee

Hi Chao,

As per your earlier note, it looks like you are trying to run the multi-node version of Linpack (in the mp_linpack directory) rather than the single-node OpenMP version (in the linpack directory). The multi-node version requires additional configuration, as described in the runme_hybrid_mic script. Some of the important points to note from the script are:

# USAGE NOTES: This script should be uploaded to MIC and executed there, NOT on host CPU.
# Before running this script on MIC, please upload/copy the following necessary files to MIC:
#
# 1. HPL input file     (scp HPL_hybrid.dat mic0:/tmp)
# 2. HPL binary for MIC (scp xhpl_hybrid_mic mic0:/tmp/xhpl_hybrid_mic)
# 3. OpenMP run-time library for MIC (libiomp5). Change the path set to LD_LIBRARY_PATH below accordingly.
# 4. mpiexec.hydra process manager and MPI run-time libraries (Please follow the instructions documented at <intel mpi installdir>/doc/ for installing Intel MPI on MIC)
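The first three steps above can be sketched as host-side commands. This is illustrative only: it assumes the default install paths quoted in this thread, and the commands target coprocessor hardware. Step 4 (Intel MPI on the coprocessor) follows the Intel MPI installation documentation and is not sketched here.

```shell
# Sketch: upload the mp_linpack pieces to the coprocessor (assumes default MKL install)
cd /opt/intel/mkl/benchmarks/mp_linpack/bin_intel/mic
scp HPL_hybrid.dat mic0:/tmp                                                 # 1. HPL input file
scp xhpl_hybrid_mic mic0:/tmp/xhpl_hybrid_mic                                # 2. HPL binary for MIC
scp /opt/intel/composer_xe_2015.1.133/compiler/lib/mic/libiomp5* mic0:/tmp   # 3. OpenMP runtime (libiomp5)
```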

So before running the multi-node version, can you please run the single-node version (runme_mic) from the default directory /opt/intel/mkl/benchmarks/linpack? I have listed the steps for running Linpack natively on a coprocessor in one of the notes above. Once you run the runme_mic script, you can verify the number of threads using micsmc or by looking at the output (lin_mic.txt).

Thanks

Chao_M_
Beginner

Thanks Sunny.

But I only got 887 GFlops with the 7120P; any suggestions to improve it?

CPU frequency:    1.238 GHz
Number of CPUs: 1
Number of cores: 244
Number of threads: 244

 

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
2048   2112   4       64.1158  91.1890
4096   6208   4       241.1838 256.7588
6144   6208   4       384.2308 397.0202
8192   8256   4       477.7925 482.2928
10240  10304  4       551.0781 563.5887
12288  12352  4       619.4961 621.5703
14336  14400  4       693.8810 701.4037
16384  18496  4       728.9239 732.7727
18432  18496  4       774.1562 779.4835
20480  20544  4       805.1722 810.5539
22528  22592  4       836.2796 839.8996
24576  26688  4       855.9451 859.7642
26624  26688  4       875.5359 877.2316
28672  28736  4       887.0325 887.5356

Regards,

Chao

McCalpinJohn
Honored Contributor III

The published results for the Xeon Phi 7120P (http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-linpack-stream.html) use a problem size of N=43072. The results above are still clearly increasing with problem size, so you might want to continue with larger sizes to see if you can close the gap.
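As a rough yardstick (a back-of-envelope assumption, not a figure from this thread): taking the 7120P as 61 cores at the 1.238 GHz shown in the output above, with 16 double-precision flops per cycle per core, gives a peak of about 1208 GFlops, so the measured 887 GFlops is roughly 73% efficiency:

```shell
# Hypothetical peak and efficiency estimate for the measured 887 GFlops
# (assumes 61 cores x 1.238 GHz x 16 DP flops/cycle for the 7120P)
awk 'BEGIN { peak = 61 * 1.238 * 16; printf "peak %.1f GFlops, efficiency %.1f%%\n", peak, 887 / peak * 100 }'
```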
