Hello,
I have installed MPSS 3.4.1 on SLES11 SP3, and for some reason the COI "hello world" example does not work. The installed MIC card appears to be working (COIEngineGetCount reports 1 device), but COIEngineGetHandle() returns COI_NOT_INITIALIZED:
micdev-sles11sp3:~ # coitrace -engine -buffer -event -pipeline -process ./coi_test
COIEngineGetCount [ThID:0x2aaaac4daca0]
in_ISA = COI_ISA_MIC
out_pNumEngines = 0x7fffffffdc74 0x00000001 (hex) : 1 (dec)
1 engines available
COIEngineGetHandle [ThID:0x2aaaac4daca0]
in_ISA = COI_ISA_MIC
in_EngineIndex = 0x00000000 (hex) : 0 (dec)
out_pEngineHandle = 0x7fffffffdc78 0x400870
COIEngineGetHandle result COI_NOT_INITIALIZED
micdev-sles11sp3:~ #
micdev-sles11sp3:~ # miccheck
MicCheck 3.4-r1
Copyright 2013 Intel Corporation All Rights Reserved
Executing default tests for host
Test 0: Check number of devices the OS sees in the system ... pass
Test 1: Check mic driver is loaded ... pass
Test 2: Check number of devices driver sees in the system ... pass
Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
Test 5 (mic0): Check ras daemon is available in device ... pass
Test 6 (mic0): Check running flash version is correct ... pass
Test 7 (mic0): Check running SMC firmware version is correct ... pass
Status: OK
micdev-sles11sp3:~ #
micdev-sles11sp3:~ # ssh mic0 hostname
micdev-sles11sp3-mic0
micdev-sles11sp3:~ #
micdev-sles11sp3:~ # cat /sys/class/mic/mic0/family
x100
micdev-sles11sp3:~ #
With a similar setup on CentOS 6, the COI examples work as expected. Could you please advise me what I should check, and where? Are there any known issues with the COI engine on SLES11 SP3?
Best regards,
Hello Taras,
Can you please try running the tutorial examples provided with the Intel® MPSS installation before we investigate further? You can find the tutorial samples in the /usr/share/doc/intel-coi-<MPSS_Version>/tutorials directory after completing the Intel® MPSS installation.
Please verify whether you get the same COI_NOT_INITIALIZED error when running the tutorial examples:
cd /usr/share/doc/intel-coi-3.4.1-1/tutorials/coi_simple
make
cd release/
./coi_simple_source_host
Thanks
Hello Sunny,
Thanks for the advice, but it behaves exactly the same way: it hangs for some time in COIEngineGetHandle and then fails. Further investigation showed that COIEngineGetHandle() works for several seconds right after coi_daemon is restarted on the MIC. So if I restart coi_daemon and immediately run coi_simple_source_host, it works the first few times, but if I run it again several seconds later, it fails consistently.
micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release # ssh micdev-sles11sp3-mic0
[root@micdev-sles11sp3-mic0 ~]# /etc/init.d/coi restart
coi: Stopping COI daemon
coi: Starting coi_daemon...
Using default values
Increasing ulimit -n to 10240 for COI daemon
[ OK ] coi_daemon started...
[root@micdev-sles11sp3-mic0 ~]# logout
Connection to micdev-sles11sp3-mic0 closed.
micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release # ./coi_simple_source_host
1 engines available
Got engine handle
Created sink process coi_simple_sink_mic
Created pipeline
Got handle to sink function Foo
Called sink function Foo("Hello COI" [10 bytes])
Function returned "Hello COI"
Destroyed pipeline
Destroyed process
Exiting
micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release # ./coi_simple_source_host
1 engines available
Got engine handle
Created sink process coi_simple_sink_mic
Created pipeline
Got handle to sink function Foo
Called sink function Foo("Hello COI" [10 bytes])
Function returned "Hello COI"
Destroyed pipeline
Destroyed process
Exiting
micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release # ./coi_simple_source_host
1 engines available
COIEngineGetHandle(COI_ISA_MIC, 0, &engine) returned COI_NOT_INITIALIZED
micdev-sles11sp3:/usr/share/doc/intel-coi-3.4-1/tutorials/coi_simple/release #
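The length of the window in which the sample still works after a coi_daemon restart could be measured with a small probe loop run on the host. This is only a minimal sketch: the helper names `probe` and `first_failure` are hypothetical, and it assumes that the sample binary exits with a non-zero status when COIEngineGetHandle fails, which the transcript above does not confirm.

```python
import subprocess
import time


def probe(cmd, attempts=5, interval=1.0):
    """Run cmd repeatedly; return a list of (seconds since start, exit code)."""
    start = time.monotonic()
    results = []
    for i in range(attempts):
        elapsed = time.monotonic() - start
        # Output is discarded; only the exit status is recorded.
        code = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
        results.append((elapsed, code))
        if i < attempts - 1:
            time.sleep(interval)
    return results


def first_failure(results):
    """Return the elapsed time of the first non-zero exit code, or None."""
    for elapsed, code in results:
        if code != 0:
            return elapsed
    return None
```

Right after restarting coi_daemon, something like `first_failure(probe(["./coi_simple_source_host"], attempts=20, interval=2.0))` would give a rough estimate of when the failures begin.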
According to strace on coi_daemon, once it stops working the daemon also stops making any system calls (strace shows no new lines when ./coi_simple_source_host is started). I suspect a deadlock in coi_daemon. I see that coi_daemon can also write a log to a file, but this feature is disabled by default. Could you tell me how to enable it? Maybe the log will show something interesting.
Thanks,
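One way to cross-check the deadlock suspicion is to look at the scheduler state of every thread of the daemon, since strace attached to a PID without -f follows only that one thread. The sketch below reads the standard Linux /proc/<pid>/task/<tid>/status files. The function name `thread_states` is hypothetical, and it assumes a Python interpreter is available on the system where coi_daemon runs, which may not be the case on the stock card image.

```python
import os


def thread_states(pid):
    """Map each thread ID of pid to its state letter (R, S, D, ...)."""
    states = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "status")) as fh:
                for line in fh:
                    if line.startswith("State:"):
                        # The line looks like "State:\tS (sleeping)".
                        states[int(tid)] = line.split()[1]
                        break
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return states
```

If every thread stays in state S across repeated calls while client requests are pending, that would be consistent with the daemon being blocked, although it does not prove a deadlock on its own.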
Hello Taras,
Can you give me some information about the version of the compiler you are using? I was unable to reproduce your problem on SLES SP2 with MPSS version 3.4.1. Also, in my research I have not found anyone else reporting this issue, so I am having a system prepared with SLES SP3 and MPSS 3.4.1. I will get back to you once I have the results.
Thank you.
Hi Sunny,
It is the standard SLES version of g++, nothing custom.
micdev-sles11sp3:~ # cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3
micdev-sles11sp3:~ # g++ -v
Using built-in specs.
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3 --enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.3 --enable-linux-futex --without-system-libunwind --with-cpu=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux)
micdev-sles11sp3:~ #
Thanks,
Hello Taras,
While I am waiting for a SLES 11 SP3 system to be set up, can you please verify that the network settings are all correct? You mentioned that the COI sample program executes successfully a couple of times before it starts to fail with the COI_NOT_INITIALIZED error. Once this happens, can you check whether you can still SSH to the same MIC card immediately after the failure?
If you see problems with SSH after the COI error and you are using the default network settings (static pair topology) with default IP addresses, please try resetting the network configuration with the following command:
micctrl --network=default [mic card list]
Thanks
Hello Taras,
I was able to configure a SLES 11 SP3 based system with MPSS 3.4.1. However, on repeating the above steps I could not reproduce the error you are experiencing. I would recommend first verifying your network configuration. If possible, reset your network settings to the default configuration, test COI, and then reapply your custom network settings.
I would also like to know about any additional configuration steps you performed beyond the MPSS installation steps listed in the readme.txt file for your MPSS version (https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss).
Thanks
Hi Sunny,
Sorry for the delay. I found that coi_daemon functions correctly when one third-party process is not running. For some reason the pbs_mom process (from the TORQUE workload manager) does something to coi_daemon that makes it stop responding to other requests. When pbs_mom is stopped, coi_simple_source_host works fine. I will follow up with the TORQUE developers to find out the reason for this behaviour, and if we find anything MPSS-related I will let you know. Thanks for your help!
Best regards,
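For anyone hitting the same symptom, a quick check for a running pbs_mom (or any other suspect process) can be done by scanning /proc. This is a minimal sketch rather than an MPSS or TORQUE tool: the function name `pids_by_name` is hypothetical, and it relies only on the standard Linux /proc/<pid>/comm file, which holds the process name truncated to 15 characters.

```python
import os


def pids_by_name(name):
    """Return the sorted PIDs whose /proc/<pid>/comm equals name."""
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # Not a process directory.
        try:
            with open("/proc/%s/comm" % entry) as fh:
                if fh.read().rstrip("\n") == name:
                    matches.append(int(entry))
        except OSError:
            # The process exited while /proc was being scanned; skip it.
            continue
    return sorted(matches)
```

A non-empty result from `pids_by_name("pbs_mom")` on the system where pbs_mom runs would suggest stopping it before re-running coi_simple_source_host, as described above.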