Binary error on Xeon Phi

Joe
Beginner
1,227 Views

I wanted to run a Fortran code on one coprocessor of a Xeon Phi. I compiled the file on the main processor (gfortran -fopenmp program.f95 -o test), copied the output file (test) to the coprocessor via scp, and tried to execute it (./test). That gave the following error: "-bash: ./test: cannot execute binary file".

I'm completely new to Xeon Phi. Please help me get rid of this error.

Thank you

 

0 Kudos
17 Replies
McCalpinJohn
Honored Contributor III

The first generation Xeon Phi coprocessor ("Knights Corner") uses a different binary format than any other processor.

You need to add the "-mmic" flag to the compiler invocation to produce a binary that will run on these first generation Xeon Phi cards.
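
One way to confirm which architecture a binary was built for is to read the machine field of its ELF header. The sketch below is illustrative and not part of the original posts: the function names elf_machine and elf_target are hypothetical, and it assumes a little-endian host with a POSIX od. The values 62 (EM_X86_64) and 181 (EM_K1OM) are the ELF machine codes for x86-64 and Knights Corner.

```shell
# Print the ELF e_machine value of a file (the 2 bytes at offset 18).
# Assumption: little-endian host and little-endian ELF file.
elf_machine() {
    od -An -tu2 -j18 -N2 "$1" 2>/dev/null | tr -d ' \n'
}

# Map the value to a readable name.
# 62 is EM_X86_64 (host), 181 is EM_K1OM (Knights Corner).
elf_target() {
    case "$(elf_machine "$1")" in
        62)  echo "x86-64 host binary" ;;
        181) echo "Knights Corner (k1om) binary" ;;
        *)   echo "other or not an ELF file" ;;
    esac
}

# Hypothetical usage on the file from the question:
# elf_target ./test
```

A binary built without -mmic would report the x86-64 case, which is why the coprocessor's shell refuses to execute it.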

Joe
Beginner

Thank you for the response. I tried adding the "-mmic" flag to the compiler invocation, and it reports "unrecognized command line option ‘-mmic’".

Kevin_D_Intel
Employee

gfortran does not accept -mmic; it is an Intel compiler option, so you will need the Intel development tools, at least the Intel Fortran compiler and run-time libraries (RTLs): https://software.intel.com/en-us/intel-parallel-studio-xe

There are free trials, and free tools for students, educators, academic researchers, and open source contributors who qualify: https://software.intel.com/en-us/intel-parallel-studio-xe/try-buy

See the article Building a Native Application for Intel® Xeon Phi™ Coprocessors. Note that the article mixes the Linux and Windows forms of compiler options. You would use the hyphen forms, as in: ifort -mmic -qopenmp program.f95

Here is a small example:

PROGRAM  omp_example
IMPLICIT NONE

INTEGER, EXTERNAL :: OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS
INTEGER :: nthreads, thread_id

!$OMP PARALLEL private(nthreads, thread_id) num_threads(2)
  thread_id = OMP_GET_THREAD_NUM()

  write (*,"(A,I3)")"Thread id: ", thread_id

  if (thread_id == 0) then
     nthreads = OMP_GET_NUM_THREADS()
     write (*,"(A,I3)")"Num threads: ", nthreads
  end if
!$OMP END PARALLEL
END

 

$ export SINK_LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2017.1.118/linux/compiler/lib/intel64_lin_mic

$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.1.132 Build 20161005

$ ifort -qopenmp -mmic sample_omp.f90
$ /opt/intel/mic/bin/micnativeloadex a.out
Thread id:   0
Thread id:   1
Num threads:   2

 

Some additional documentation is available under Programming and Compiling for Intel® Many Integrated Core Architecture.

Joe
Beginner

Thank you very much.

That was perfect. I've got the "ifort" compiler now, but compiling the example gives an error:

$ export SINK_LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2017.1.118/linux/compiler/lib/intel64_lin_mic
$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 16.0.3.210 Build 20160415
Copyright (C) 1985-2016 Intel Corporation.  All rights reserved.
$ ifort -qopenmp -mmic sample.f95
x86_64-k1om-linux-ld:sample.f95: file format not recognized; treating as linker script
x86_64-k1om-linux-ld:sample.f95:1: syntax error

"sample.f95" is the one provided above.

I'm still able to compile and run the program on the main processor via the "gfortran" compiler.

Kevin_D_Intel
Employee

That error comes from the .f95 file extension: ifort does not recognize .f95 as a Fortran source extension, so it passes the file straight to the linker. I recommend renaming the file from sample.f95 to sample.f90. The .f90 extension implies free-form source. If you really want to keep .f95, then you need to use: ifort -qopenmp -mmic -free -Tf sample.f95
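
The two options above can be sketched as a small helper that picks the command line from the file extension. This is only an illustration: the function name knc_compile_cmd is hypothetical, it prints the command rather than running it, and it only covers the .f90 and .f95 cases discussed here.

```shell
# Print (not run) an ifort command line for a native Knights Corner build.
# Hypothetical helper; only the two extensions from this thread are handled.
knc_compile_cmd() {
    src=$1
    case "$src" in
        # .f90 is recognized as free-form Fortran source.
        *.f90) echo "ifort -qopenmp -mmic $src" ;;
        # .f95 must be forced: -free selects free form, -Tf marks the file as Fortran source.
        *.f95) echo "ifort -qopenmp -mmic -free -Tf $src" ;;
        *)     echo "unhandled extension: $src" >&2; return 1 ;;
    esac
}

knc_compile_cmd sample.f90
knc_compile_cmd sample.f95
```

Renaming the file is the simpler route, since the extra options then never need to be remembered.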

Joe
Beginner

Thank you for the response.

The file compiles without problems now, but micnativeloadex fails:

$ export SINK_LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2017.1.118/linux/compiler/lib/intel64_lin_mic
$ ifort -qopenmp -mmic -free -Tf sample.f95
$ /opt/intel/mic/bin/micnativeloadex a.out
Error creating remote process, at least one library dependency is missing.
Please check the list of dependencies below to see which
one is missing and update the SINK_LD_LIBRARY_PATH
environment variable to include the missing library.


Dependency information for a.out

	Full path was resolved as 
	/home/work/JOE/a.out

	Binary was built for Intel(R) Xeon Phi(TM) Coprocessor
	(codename: Knights Corner) architecture

	SINK_LD_LIBRARY_PATH = /opt/intel/compilers_and_libraries_2017.1.118/linux/compiler/lib/intel64_lin_mic

	Dependencies Found:
		(none found)

	Dependencies Not Found Locally (but may exist already on the coprocessor):
		libm.so.6
		libiomp5.so
		libpthread.so.0
		libc.so.6
		libgcc_s.so.1
		libdl.so.2

 

I also tried to run the executable directly on one of the coprocessors:

$./a.out
./a.out: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory

 

Kevin_D_Intel
Employee

Great. Sorry for misleading you earlier. You need to adjust the SINK_LD_LIBRARY_PATH setting to match your 16.0 compiler and then re-run.

Try setting as follows: SINK_LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries/linux/lib/intel64_lin_mic
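
Before re-running, it can help to check that the directories listed in SINK_LD_LIBRARY_PATH really contain the library that was reported missing. The following is a minimal sketch, not an Intel tool: path_has_lib is a hypothetical helper that walks a colon-separated list and prints the first match.

```shell
# Print the first directory entry in a colon-separated list that contains a file.
# Hypothetical usage: path_has_lib "$SINK_LD_LIBRARY_PATH" libiomp5.so
path_has_lib() {
    list=$1
    lib=$2
    old_ifs=$IFS
    IFS=:
    for dir in $list; do
        if [ -e "$dir/$lib" ]; then
            IFS=$old_ifs
            echo "$dir/$lib"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Warn if the OpenMP run-time is not reachable through the current setting.
if ! path_has_lib "${SINK_LD_LIBRARY_PATH:-}" libiomp5.so; then
    echo "libiomp5.so not found in SINK_LD_LIBRARY_PATH" >&2
fi
```

If the helper prints nothing, the "Dependencies Found: (none found)" message from micnativeloadex is the expected outcome.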


 

Joe
Beginner

It's working fine on the main processor.

But it still shows the same error when I copy the executable and run it on one of the coprocessors.

$./a.out
./a.out: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory


 

TimP
Honored Contributor III

You must copy libiomp5.so over from the host, e.g. to /usr/lib64/, or set up a mount and add its location to LD_LIBRARY_PATH.

Joe
Beginner

How do I locate "libiomp5" and copy it to "/usr/lib64/"?

Loc_N_Intel
Employee

First, verify that libiomp5.so for MIC exists in the current compiler installation:

$ ls -l /opt/intel/compilers_and_libraries_2017.1.118/linux/compiler/lib/mic/libiomp5.so

If so, then copy it to a coprocessor (e.g., the coprocessor mic0):

$ sudo scp /opt/intel/compilers_and_libraries_2017.1.118/linux/compiler/lib/mic/libiomp5.so mic0:/usr/lib64
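
If the compiler is installed under a different version directory, the library can be searched for instead of typing the full path. This is a rough sketch under stated assumptions: find_mic_lib is a hypothetical helper, and the directory names it matches ("mic" and names ending in "_mic") are taken from the paths quoted in this thread, not from a documented layout.

```shell
# List Knights Corner builds of a library under an install root.
# Assumption: MIC libraries live in a directory named "mic" or ending in "_mic".
# Note: find does not descend into symlinked directories by default.
find_mic_lib() {
    root=$1
    lib=$2
    find "$root" -name "$lib" \( -path '*/mic/*' -o -path '*_mic/*' \) 2>/dev/null
}

# Hypothetical usage; /opt/intel is the install root used in this thread:
# find_mic_lib /opt/intel libiomp5.so
```

Any path it prints can then be used as the source argument of the scp command above.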

 

Kevin_D_Intel
Employee

As the others noted, if you're logging into the card and running the executable directly, as you showed later on, then you are responsible for making certain the needed RTLs are present on the card and that LD_LIBRARY_PATH is set accordingly in the login environment on the card.

If you don't need that level of interaction, then micnativeloadex will deal with all this for you. The article Building a Native Application for Intel® Xeon Phi™ Coprocessors cited earlier has the needed details for running with micnativeloadex or interactively on the card itself, including transferring files and setting the needed environment variables.

Joe
Beginner

Kevin D (Intel) ; Nguyen ; Loc Q,Tim P ; Mccalpin, John ; Thank you all, I really appreciate your help.

Joe
Beginner

How do I know the coprocessor details from the command line?

The main processor in my machine has 12 cores, and when I run a parallelized code it takes 2 hrs with 48 threads. The coprocessor, I think, has 61 cores; I tried 244 threads there and the program takes about 2.30 hrs. Where am I going wrong? I'm only changing the number of threads when running the code on the coprocessor, along with setting "KMP_AFFINITY=SCATTER". Do I need to do something more?

Paulius_V_
Beginner

Joe wrote:

How do I know the coprocessor details from the command line?

The main processor in my machine has 12 cores, and when I run a parallelized code it takes 2 hrs with 48 threads. The coprocessor, I think, has 61 cores; I tried 244 threads there and the program takes about 2.30 hrs. Where am I going wrong? I'm only changing the number of threads when running the code on the coprocessor, along with setting "KMP_AFFINITY=SCATTER". Do I need to do something more?

 

First of all, scalar performance on KNC is poor. Make sure that your code makes heavy use of vector instructions. I suggest the Intel Advisor HPC analysis; compile with -g. Also, if you're using the 2017 Intel tools, there is a new roofline feature (enable it by setting ADVIXE_EXPERIMENTAL=roofline). This will allow you to set some expectations for performance on either architecture.

Test the scaling of your application on x86 and MIC. On KNC you need at least two-way HT to be able to exploit all the hardware, so for a quick test you could try running on 120 threads, i.e. (61-1)*2. I suggest you leave one core for the OS.
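
The thread-count arithmetic can be written down as a small helper. This is a sketch with assumptions: knc_threads is a hypothetical name, and the 61-core figure is the poster's own estimate for the card rather than a verified value.

```shell
# Thread count that leaves one core free for the OS.
# Arguments: number of cores on the card, threads to run per core.
knc_threads() {
    cores=$1
    per_core=$2
    echo $(( (cores - 1) * per_core ))
}

# 61 cores is the poster's estimate for this card, not a verified figure.
OMP_NUM_THREADS=$(knc_threads 61 2)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"   # prints OMP_NUM_THREADS=120
```

Calling knc_threads 61 4 gives 240, the other end of the range to try in a scaling test.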

 

Joe
Beginner

Hey, here is a part of my code (it calculates the interaction between particles in a system) which I was trying to vectorize with 'simd'. I'm getting the following warning: "confined_l70.f95(137): (col. 1) warning #13379: loop was not vectorized with "simd"". Line 137 is the line starting the first 'do' loop.

subroutine force(ax,ay,xx,xy,mtot,L,m_start,m_end)
implicit none
integer, intent (in) :: mtot,m_start,m_end
integer :: i,j
real (kind = 8), intent (inout) :: ax(mtot),ay(mtot)
real (kind = 8), intent (in) :: xx(mtot),xy(mtot),L
real (kind = 8) :: rsqd,ff,r2i,r6i,rcut,rcut2,virij,rx,ry
rx = 0.D0 ; ry = 0.D0
ax = 0.D0 ; ay = 0.D0
rcut = 1.12246205 !2**(1.0d0/6.0d0)
rcut2 = rcut*rcut !1.259921054
call omp_set_num_threads(240)
!$omp do simd private(rx,ry,rsqd,r2i,r6i,virij,ff) collapse(2)
do i=1,mtot-1
    do j=i+1,mtot
        rsqd = 0.D0
        rx = xx(i) - xx(j)
        rx = rx - l*anint(rx/l)
        ry = xy(i) - xy(j)
        rsqd = rsqd + rx*rx + ry*ry
        if (rsqd.le.rcut2) then
            r2i = 1/rsqd
            r6i = r2i**3
            virij = 48*(r6i*r6i-0.5D0*r6i)
            ff = virij*r2i
            if (ff .gt. 10000) then
                EXIT
            end if
            !$omp atomic
            ax(i) = ax(i) + rx*ff
            !$omp atomic
            ay(i) = ay(i) + ry*ff
            !$omp atomic
            ax(j) = ax(j) - rx*ff
            !$omp atomic
            ay(j) = ay(j) - ry*ff
        end if
    end do
end do
!$omp end do simd
end subroutine force

The code was running fine after parallelization and the results were really good. The program compiles fine if only 'simd' is used without the 'do' construct (!$omp simd (...)(....)), but in that case the program uses only one core. I'm not sure whether this is the right way of doing it. Could anyone please help me figure out the problem?
