
COIEngineGetHandle fails on SLES11 SP3

Taras_Shapovalov
Beginner
623 Views

Hello,

I have installed MPSS 3.4.1 on SLES11 SP3, and for some reason the COI "hello world" example does not work. The installed MIC card itself is working fine (COIEngineGetCount reports 1 device), but COIEngineGetHandle() returns COI_NOT_INITIALIZED:

micdev-sles11sp3:~ # coitrace -engine -buffer -event -pipeline -process ./coi_test
COIEngineGetCount [ThID:0x2aaaac4daca0]
    in_ISA = COI_ISA_MIC
    out_pNumEngines = 0x7fffffffdc74 0x00000001 (hex) : 1 (dec)

1 engines available
COIEngineGetHandle [ThID:0x2aaaac4daca0]
    in_ISA = COI_ISA_MIC
    in_EngineIndex = 0x00000000 (hex) : 0 (dec)
    out_pEngineHandle = 0x7fffffffdc78 0x400870

COIEngineGetHandle result COI_NOT_INITIALIZED
micdev-sles11sp3:~ #

micdev-sles11sp3:~ # miccheck
MicCheck 3.4-r1
Copyright 2013 Intel Corporation All Rights Reserved

Executing default tests for host
  Test 0: Check number of devices the OS sees in the system ... pass
  Test 1: Check mic driver is loaded ... pass
  Test 2: Check number of devices driver sees in the system ... pass
  Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
  Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
  Test 5 (mic0): Check ras daemon is available in device ... pass
  Test 6 (mic0): Check running flash version is correct ... pass
  Test 7 (mic0): Check running SMC firmware version is correct ... pass

Status: OK
micdev-sles11sp3:~ #

micdev-sles11sp3:~ # ssh mic0 hostname
micdev-sles11sp3-mic0
micdev-sles11sp3:~ #

micdev-sles11sp3:~ # cat /sys/class/mic/mic0/family
x100
micdev-sles11sp3:~ #

 

With a similar setup on CentOS 6, the COI examples work as expected. Could you please advise me what to check, and where? Are there any known issues with the COI engine on SLES11 SP3?

 

Best regards,

 

7 Replies
Sunny_G_Intel
Employee

Hello Taras,

Can you please try running the tutorial examples provided with the Intel® MPSS installation before we investigate further? You can find the tutorial samples in the /usr/share/doc/intel-coi-<MPSS_Version>/tutorials directory after completing the Intel® MPSS installation.

Please verify whether you get the same COI_NOT_INITIALIZED error when running the tutorial examples:

cd /usr/share/doc/intel-coi-3.4.1-1/tutorials/coi_simple
make
cd release/
./coi_simple_source_host

Thanks

Taras_Shapovalov
Beginner

Hello Sunny,

Thanks for the advice, but it behaves exactly the same way: it hangs for some time in COIEngineGetHandle and then fails. Further investigation has shown that COIEngineGetHandle() works for several seconds right after coi_daemon is restarted on the MIC. So if I restart coi_daemon and immediately run coi_simple_source_host, it works the first several times, but if I run it again a few seconds later, it fails consistently.

micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release # ssh micdev-sles11sp3-mic0
[root@micdev-sles11sp3-mic0 ~]# /etc/init.d/coi restart
coi: Stopping COI daemon
coi: Starting coi_daemon...
     Using default values
     Increasing ulimit -n to 10240 for COI daemon
[   OK  ] coi_daemon started...
[root@micdev-sles11sp3-mic0 ~]# logout
Connection to micdev-sles11sp3-mic0 closed.
micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release # ./coi_simple_source_host
1 engines available
Got engine handle
Created sink process coi_simple_sink_mic
Created pipeline
Got handle to sink function Foo
Called sink function Foo("Hello COI" [10 bytes])
Function returned "Hello COI"
Destroyed pipeline
Destroyed process
Exiting
micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release # ./coi_simple_source_host
1 engines available
Got engine handle
Created sink process coi_simple_sink_mic
Created pipeline
Got handle to sink function Foo
Called sink function Foo("Hello COI" [10 bytes])
Function returned "Hello COI"
Destroyed pipeline
Destroyed process
Exiting
micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release # ./coi_simple_source_host
1 engines available
COIEngineGetHandle(COI_ISA_MIC, 0, &engine) returned COI_NOT_INITIALIZED
micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release #
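
To pin down how soon after a restart the failure appears, the manual runs above could be scripted. This is only a minimal sketch, assuming the failure always shows up as COI_NOT_INITIALIZED in the program's output (as in the transcript). first_failure is a hypothetical helper name, and the delay and run count are arbitrary choices.

```shell
#!/bin/sh
# first_failure DELAY MAX_RUNS CMD [ARGS...]   (hypothetical helper)
# Runs CMD up to MAX_RUNS times, sleeping DELAY seconds after each
# passing run, and reports the first run whose output contains
# COI_NOT_INITIALIZED.
# The body is a subshell "( ... )" so its variables do not leak out.
first_failure() (
    delay="$1"
    max_runs="$2"
    shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Capture stdout and stderr of the test program.
        out="$("$@" 2>&1)"
        case "$out" in
            *COI_NOT_INITIALIZED*)
                echo "run $i: COI_NOT_INITIALIZED"
                exit 1
                ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    echo "all $max_runs runs passed"
)
```

Right after restarting coi_daemon on the card, something like `first_failure 1 30 ./coi_simple_source_host` would then show at which run the handle lookup starts to fail.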


According to strace of coi_daemon, once it stops working the daemon also stops making any system calls (strace shows no new lines when ./coi_simple_source_host is started). I guess it might be a deadlock in coi_daemon. I see that coi_daemon can also write a log to a file, but this feature is disabled by default. Could you tell me how to enable it? Maybe there will be something interesting in the log.
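
One way to look at the deadlock guess from outside the daemon is to check what state each of its threads is in. The following is a sketch only, assuming the card's Linux exposes the usual /proc/<pid>/task/<tid>/status files. thread_states is a hypothetical helper, not part of MPSS.

```shell
#!/bin/sh
# thread_states PID   (hypothetical helper, Linux /proc only)
# Prints "TID STATE" for every thread of PID, taken from the
# "State:" line of /proc/PID/task/TID/status.
thread_states() (
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        # Skip unreadable entries and the unexpanded glob when PID does not exist.
        [ -r "$task/status" ] || continue
        state="$(sed -n 's/^State:[[:space:]]*//p' "$task/status")"
        echo "${task##*/} $state"
    done
)
```

On the card this could be called as `thread_states "$(pidof coi_daemon)"`, assuming pidof is available there. Threads that all stay in a sleeping state while a host program is waiting would be consistent with the strace observation, though not proof of a deadlock.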

 

Thanks,

 

Sunny_G_Intel
Employee

Hello Taras,

Can you give me some information about the version of the compiler you are using? I was unable to replicate your problem on SLES SP2 with MPSS version 3.4.1. Also, as far as I could find, no one else has reported this issue. So I am getting a system prepared with SLES SP3 and MPSS 3.4.1. I will get back to you once I have the results.

Thank you.

Taras_Shapovalov
Beginner

Hi Sunny,

It is a standard SLES version of g++, nothing custom.

micdev-sles11sp3:~ # cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3

micdev-sles11sp3:~ # g++ -v
Using built-in specs.
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3 --enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.3 --enable-linux-futex --without-system-libunwind --with-cpu=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux)
micdev-sles11sp3:~ #

Thanks,

Sunny_G_Intel
Employee

Hello Taras,

While I am waiting for a SLES 11 SP3 system to be set up, can you please verify that the network settings are all correct? You mentioned that the COI sample program executes successfully a couple of times before it starts to fail with the COI_NOT_INITIALIZED error. Once this happens, can you check whether you can still SSH to the same MIC card immediately after the failure?

If you see problems with SSH after the COI error, and you are using the default network settings (static pair topology) with the default IP addresses, please try resetting the network configuration using the following command:

micctrl --network=default [mic card list]
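
The SSH check could be scripted so that the probe runs the moment the COI call fails. This is a sketch under assumptions: key-based login to the card is set up (BatchMode=yes makes ssh fail instead of prompting for a password), and the failure appears as COI_NOT_INITIALIZED in the sample's output. probe_after_failure and the SSH_CMD override are hypothetical names used only for illustration.

```shell
#!/bin/sh
# probe_after_failure HOST CMD [ARGS...]   (hypothetical helper)
# Runs CMD; if its output contains COI_NOT_INITIALIZED, immediately
# tries a non-interactive ssh to HOST and reports whether it works.
# SSH_CMD can replace the ssh binary, e.g. for a dry run.
probe_after_failure() (
    host="$1"
    shift
    if "$@" 2>&1 | grep -q 'COI_NOT_INITIALIZED'; then
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "COI failed, ssh to $host still works"
        else
            echo "COI failed, ssh to $host also fails"
        fi
    else
        echo "COI ok"
    fi
)
```

For example: `probe_after_failure mic0 ./coi_simple_source_host`. If ssh still works right after the failure, the network configuration is less likely to be the cause.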

Thanks

Sunny_G_Intel
Employee

Hello Taras,

I was able to configure a SLES 11 SP3 based system with MPSS 3.4.1. However, on repeating the above steps I could not reproduce the error you are experiencing. I would recommend first verifying your network configuration. If possible, try resetting your network settings to the default configuration, test COI, and then reapply your custom network settings.

Also, I would like to know about any additional configuration steps you performed besides the MPSS installation steps listed in the readme.txt file for your MPSS version (https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss).

Thanks

Taras_Shapovalov
Beginner

Hi Sunny,

Sorry for the delay. I found out that coi_daemon functions correctly when one third-party process is not running. For some reason the pbs_mom process (from the TORQUE workload manager) is doing something with coi_daemon that makes it stop responding to other requests. When pbs_mom is stopped, coi_simple_source_host works fine. I will work with the TORQUE developers to find out the reasons for this behaviour, and if we find anything MPSS-related I will let you know. Thanks for your help!
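
The correlation with pbs_mom can be recorded by printing its state next to each test result. This is a sketch, assuming the failure shows up as COI_NOT_INITIALIZED in the output and that it is run on the machine where pbs_mom lives. is_running and run_labelled are hypothetical helper names, and how pbs_mom is stopped and started depends on the TORQUE installation.

```shell
#!/bin/sh
# is_running NAME   (hypothetical helper)
# Succeeds if a process whose command name is exactly NAME exists.
is_running() {
    ps -A -o comm= | grep -qxF -- "$1"
}

# run_labelled SUSPECT CMD [ARGS...]   (hypothetical helper)
# Runs CMD and prints the state of the SUSPECT process alongside
# whether CMD's output contained COI_NOT_INITIALIZED.
run_labelled() (
    suspect="$1"
    shift
    if is_running "$suspect"; then state="running"; else state="stopped"; fi
    if "$@" 2>&1 | grep -q 'COI_NOT_INITIALIZED'; then
        result="FAIL"
    else
        result="ok"
    fi
    echo "$suspect=$state result=$result"
)
```

Running `run_labelled pbs_mom ./coi_simple_source_host` a few times with pbs_mom started and a few times with it stopped gives a small table that can be attached to a bug report.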

Best regards,

 
