Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Struggling to get Automatic Offload working with MIC/MKL 2017

AndrewC
New Contributor I

I have a MIC card in a Microway Xeon workstation, and it seems to be functioning as expected (see the micinfo output below).

 

After updating to MKL 2017 Update 3, I am struggling to get Automatic Offload (AO) to function. I created a simple DGEMM test program and have been calling DGEMM with square matrix sizes up to 16384, but I cannot get AO to kick in.

In prior versions of MKL, I could see AO working at sizes of about 4096x4096 on this same machine.

The following environment variables are set:

MKL_MIC_ENABLE=1
OFFLOAD_REPORT=2
MKL_MIC_DISABLE_HOST_FALLBACK=1
MIC_LD_LIBRARY_PATH=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017\windows\mkl\lib\intel64_win_mic
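
Environment variables are inherited per process, so a test program started from an IDE or a different shell may not see the values listed above. A minimal sketch for checking this from inside the test program follows; the variable names come from the list above, while the helper names `ao_var_is_set` and `ao_report_env` are hypothetical and not part of MKL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Variable names copied from the post; the helper names are hypothetical. */
static const char *const AO_VARS[] = {
    "MKL_MIC_ENABLE",
    "OFFLOAD_REPORT",
    "MKL_MIC_DISABLE_HOST_FALLBACK",
    "MIC_LD_LIBRARY_PATH",
};
enum { AO_VAR_COUNT = sizeof AO_VARS / sizeof AO_VARS[0] };

/* Returns 1 if the variable exists and is non-empty in this process. */
static int ao_var_is_set(const char *name)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL && value[0] != '\0';
}

/* Prints each variable's value (or a marker) and returns how many are unset. */
static int ao_report_env(FILE *out)
{
    int missing = 0;
    for (int i = 0; i < AO_VAR_COUNT; ++i) {
        if (ao_var_is_set(AO_VARS[i])) {
            fprintf(out, "%s=%s\n", AO_VARS[i], getenv(AO_VARS[i]));
        } else {
            fprintf(out, "%s is NOT set\n", AO_VARS[i]);
            ++missing;
        }
    }
    return missing;
}
```

Calling `ao_report_env(stderr)` at the top of `main`, before the first DGEMM call, would show what the process actually inherited. It only reports the environment; it says nothing about whether MKL honors the values.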

 

>micinfo
MicInfo Utility Log
Copyright 2011-2013 Intel Corporation All Rights Reserved.

Created Wed Jun 14 11:34:46 2017


        System Info
                HOST OS                 : Windows
                OS Version              : Microsoft Windows 7 Professi
                Driver Version          : 3.3.30726.0
                MPSS Version            : 3.3.30726.0
                Host Physical Memory    : 32709 MB

Device No: 0, Device Name: mic0

        Version
                Flash Version            : 2.1.02.0390
                SMC Firmware Version     : 1.16.5078
                SMC Boot Loader Version  : 1.8.4326
                uOS Version              : 2.6.38.8+mpss3.3
                Device Serial Number     : ADKC32800563

        Board
                Vendor ID                : 0x8086
                Device ID                : 0x225d
                Subsystem ID             : 0x3608
                Coprocessor Stepping ID  : 2
                PCIe Width               : x16
                PCIe Speed               : 5 GT/s
                PCIe Max payload size    : 256 bytes
                PCIe Max read req size   : 512 bytes
                Coprocessor Model        : 0x01
                Coprocessor Model Ext    : 0x00
                Coprocessor Type         : 0x00
                Coprocessor Family       : 0x0b
                Coprocessor Family Ext   : 0x00
                Coprocessor Stepping     : C0
                Board SKU                : C0PRQ-3120/3140 P/A
                ECC Mode                 : Enabled
                SMC HW Revision          : Product 300W Active CS

        Cores
                Total No of Active Cores : 57
                Voltage                  : 1039000 uV
                Frequency                : 1100000 kHz
              

 

5 Replies
Jing_Xu
Employee

Could you try upgrading MPSS and updating the OS to Windows 7 SP1, then check whether that improves things?

Gennady_F_Intel
Moderator

Hello, our BLAS expert tried the size you mention, 4096 x 4096 x 4096, on a Linux system (a comparable Windows system is not available right now). With the MKL 2017 Update 3 build, the code does not offload at that size. This is due to a commit that targeted a dynamic offload threshold for KNL and also affected the KNC code branch. However, for larger sizes the customer should see offload happen (6000 x 6000 x 6000, see below). If the user is using dimensions as large as 16000, offload should kick in, unless there is a problem with the environment configuration, as I mentioned in the previous email.

./knc_ao $
I,CPU,0.000000e+00
[MKL] [MIC --]   [AO Function] DGEMM
[MKL] [MIC --]   [AO DGEMM Workdivision] 0.48 0.52
[MKL] [MIC 00] [AO DGEMM CPU Time] 4.562893 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time] 0.505998 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 443520000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 466560000 bytes
[MKL] [MIC --]   [AO Function] DGEMM
[MKL] [MIC --]   [AO DGEMM Workdivision] 0.48 0.52
[MKL] [MIC 00] [AO DGEMM CPU Time] 0.511411 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time] 0.262164 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 443520000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 466560000 bytes
[MKL] [MIC --]   [AO Function] DGEMM
[MKL] [MIC --]   [AO DGEMM Workdivision] 0.48 0.52
[MKL] [MIC 00] [AO DGEMM CPU Time] 0.521218 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time] 0.261800 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 443520000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 466560000 bytes
[MKL] [MIC --]   [AO Function] DGEMM
[MKL] [MIC --]   [AO DGEMM Workdivision] 0.48 0.52
[MKL] [MIC 00] [AO DGEMM CPU Time] 0.517664 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time] 0.262245 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 443520000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 466560000 bytes
R,dgemm,n,n,1.0,1.0,6000,6000,6000,n,n,6000,6000,6000,6000,6000,6000,8.339619e+11
S,dgemm,n,n,1.0,1.0,6000,6000,6000,8.339619e+11

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
CPU socket(s):         2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2693.628
BogoMIPS:              5386.06
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11
NUMA node1 CPU(s):     12-23
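
To put the three problem sizes mentioned in this reply (4096, 6000, and 16384) side by side, the following sketch computes the storage for one square matrix of doubles and the nominal operation count of a square DGEMM. The helper names are hypothetical, and the 2*n^3 figure is the usual textbook count, not something taken from MKL's offload heuristics.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes held by one n x n matrix of doubles (8 bytes per element
 * on the platforms discussed here). */
static uint64_t square_matrix_bytes(uint64_t n)
{
    assert(n > 0);
    return n * n * (uint64_t)sizeof(double);
}

/* Nominal floating-point operations for C = alpha*A*B + beta*C
 * with square n x n operands: about 2*n^3. */
static uint64_t square_dgemm_flops(uint64_t n)
{
    assert(n > 0);
    return 2u * n * n * n;
}
```

For n = 4096 each matrix is 128 MiB, for n = 6000 it is 288,000,000 bytes, and for n = 16384 it is 2 GiB, so the three operands of the largest case need 6 GiB in total.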

AndrewC
New Contributor I

Sadly, after updating and reflashing to MPSS 3.8, the Phi card seems DOA.

The reflash process went smoothly, with no errors.

When I call mkl_mic_enable(), the software now crashes.

 

 

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC>micinfo
MicInfo Utility Log
Created Fri Jun 16 08:42:07 2017


        System Info
                HOST OS                 : Windows
                OS Version              : Microsoft Windows 7 Professional
                Driver Version          : 3.8.2.4191
                MPSS Version            : 3.8.2.4191
                Host Physical Memory    : 32709 MB

Device No: 0, Device Name: mic0

        Version
                Flash Version            : NotAvailable
                SMC Firmware Version     : NotAvailable
                SMC Boot Loader Version  : NotAvailable
                Coprocessor OS Version   : NotAvailable
                Device Serial Number     : NotAvailable

        Board
                Vendor ID                : 0x8086
                Device ID                : 0x225d
                Subsystem ID             : 0x3608
                Coprocessor Stepping ID  : 2
                PCIe Width               : x16
                PCIe Speed               : 5 GT/s
                PCIe Max payload size    : 256 bytes
                PCIe Max read req size   : 512 bytes
                Coprocessor Model        : 0x01
                Coprocessor Model Ext    : 0x00
                Coprocessor Type         : 0x00
                Coprocessor Family       : 0x0b
                Coprocessor Family Ext   : 0x00
                Coprocessor Stepping     : C0
                Board SKU                : C0PRQ-3120/3140 P/A
                ECC Mode                 : NotAvailable
                SMC HW Revision          : NotAvailable

        Cores
                Total No of Active Cores : NotAvailable
                Voltage                  : NotAvailable
                Frequency                : NotAvailable

        Thermal
                Fan Speed Control        : NotAvailable
                Fan RPM                  : NotAvailable
                Fan PWM                  : NotAvailable
                Die Temp                 : NotAvailable

        GDDR
                GDDR Vendor              : NotAvailable
                GDDR Version             : NotAvailable
                GDDR Density             : NotAvailable
                GDDR Size                : NotAvailable
                GDDR Technology          : NotAvailable
                GDDR Speed               : NotAvailable
                GDDR Frequency           : NotAvailable
                GDDR Voltage             : NotAvailable

AndrewC
New Contributor I

I have moved my discussion to the Xeon Phi forum to figure out why the card is dead.

AndrewC
New Contributor I

OK, I finally got my card working again; the install of MPSS 3.8.2 had somehow trashed a critical file.

Back to the same problem: I am trying with n=16384, and no AO is happening.

    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, A, n, B, n, beta, C, n);
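
For readers unfamiliar with the argument order, here is a naive reference for what a column-major, no-transpose DGEMM computes. It is a sketch for checking results on small inputs only: `ref_dgemm_colmajor_nn` is a hypothetical helper, not MKL code, and it never offloads anything.

```c
#include <assert.h>
#include <stddef.h>

/* Naive reference for C = alpha*A*B + beta*C, column-major, no transposes.
 * A is m x k with leading dimension lda, B is k x n (ldb), C is m x n (ldc).
 * Unlike BLAS, this reads C even when beta is 0. */
static void ref_dgemm_colmajor_nn(size_t m, size_t n, size_t k,
                                  double alpha, const double *A, size_t lda,
                                  const double *B, size_t ldb,
                                  double beta, double *C, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p) {
                /* Element (i, p) of A times element (p, j) of B. */
                sum += A[i + p * lda] * B[p + j * ldb];
            }
            C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
    }
}
```

With square matrices, as in the call above, m, n, k and all three leading dimensions are the same value n. The triple loop is far too slow for n=16384; it is meant for validating a small case before scaling up.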

 
