Intel® oneAPI Math Kernel Library

Struggling to get Automatic Offload working with MIC/MKL 2017

AndrewC
New Contributor III

I have a MIC (Xeon Phi) card in a Microway Xeon workstation that seems to be functioning as expected (see the micinfo output below).

After updating to MKL 2017 Update 3, I cannot get Automatic Offload (AO) to work. I wrote a simple DGEMM test program and have been calling DGEMM with square matrix sizes up to 16384, but AO never kicks in.

With earlier versions of MKL, I could see AO working at sizes of about 4096x4096 on this same machine.
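For a sense of scale: a square DGEMM of order n performs about 2*n^3 floating-point operations, while each double-precision operand occupies 8*n^2 bytes, so the work per byte moved grows linearly with n. That is presumably why AO only engages above some size threshold. The sketch below computes both quantities for the sizes mentioned here; the helper names `matrix_bytes`, `dgemm_flops` and `print_dgemm_size` are hypothetical, and the code is plain C with no MKL dependency.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers: memory footprint and work of a DGEMM. */

/* Bytes needed to hold one rows x cols matrix of doubles. */
unsigned long long matrix_bytes(unsigned long long rows, unsigned long long cols)
{
    return rows * cols * (unsigned long long)sizeof(double);
}

/* Operations in C = alpha*A*B + beta*C with A m x k and B k x n:
   one multiply and one add per inner-product term, i.e. 2*m*n*k. */
double dgemm_flops(double m, double n, double k)
{
    return 2.0 * m * n * k;
}

/* Print per-matrix size, total size of A, B and C, and flop count for order n. */
void print_dgemm_size(unsigned long long n)
{
    unsigned long long one;
    assert(n > 0);
    one = matrix_bytes(n, n);
    printf("n=%llu: %.1f MiB per matrix, %.1f MiB for A+B+C, %.3e flops\n",
           n, one / 1048576.0, 3.0 * one / 1048576.0,
           dgemm_flops((double)n, (double)n, (double)n));
}
```

For example, `print_dgemm_size(4096)` reports 128.0 MiB per matrix, while `print_dgemm_size(16384)` reports 2048.0 MiB (2 GiB) per matrix, 6 GiB for all three operands, and roughly 8.8e12 flops.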

The following environment variables are set:

MKL_MIC_ENABLE=1
OFFLOAD_REPORT=2
MKL_MIC_DISABLE_HOST_FALLBACK=1
MIC_LD_LIBRARY_PATH=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017\windows\mkl\lib\intel64_win_mic
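Because AO is being enabled through these variables, it is worth confirming that the test program's own process actually sees them (a console window opened before the variables were set, for instance, will not). A minimal sketch using only the C standard library; the helper name `check_ao_env` is hypothetical, and the variable list is copied from the settings above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Variable names copied from the settings listed above. */
static const char *const AO_VARS[] = {
    "MKL_MIC_ENABLE",
    "OFFLOAD_REPORT",
    "MKL_MIC_DISABLE_HOST_FALLBACK",
    "MIC_LD_LIBRARY_PATH",
};

/* Hypothetical helper: print each AO-related variable as this process
   sees it and return how many are unset. Also warns when
   MKL_MIC_ENABLE is set to something other than "1". */
int check_ao_env(FILE *out)
{
    size_t i;
    int missing = 0;
    const char *enable;
    assert(out != NULL);
    for (i = 0; i < sizeof AO_VARS / sizeof AO_VARS[0]; ++i) {
        const char *val = getenv(AO_VARS[i]);
        if (val == NULL) {
            fprintf(out, "%s is NOT set\n", AO_VARS[i]);
            ++missing;
        } else {
            fprintf(out, "%s=%s\n", AO_VARS[i], val);
        }
    }
    enable = getenv("MKL_MIC_ENABLE");
    if (enable != NULL && strcmp(enable, "1") != 0)
        fprintf(out, "warning: MKL_MIC_ENABLE is not 1\n");
    return missing;
}
```

Calling `check_ao_env(stdout)` at the start of `main`, before the first DGEMM call, prints each variable or flags it as missing; a nonzero return means at least one of them is unset in that process.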


>micinfo
MicInfo Utility Log
Copyright 2011-2013 Intel Corporation All Rights Reserved.

Created Wed Jun 14 11:34:46 2017


        System Info
                HOST OS                 : Windows
                OS Version              : Microsoft Windows 7 Professi
                Driver Version          : 3.3.30726.0
                MPSS Version            : 3.3.30726.0
                Host Physical Memory    : 32709 MB

Device No: 0, Device Name: mic0

        Version
                Flash Version            : 2.1.02.0390
                SMC Firmware Version     : 1.16.5078
                SMC Boot Loader Version  : 1.8.4326
                uOS Version              : 2.6.38.8+mpss3.3
                Device Serial Number     : ADKC32800563

        Board
                Vendor ID                : 0x8086
                Device ID                : 0x225d
                Subsystem ID             : 0x3608
                Coprocessor Stepping ID  : 2
                PCIe Width               : x16
                PCIe Speed               : 5 GT/s
                PCIe Max payload size    : 256 bytes
                PCIe Max read req size   : 512 bytes
                Coprocessor Model        : 0x01
                Coprocessor Model Ext    : 0x00
                Coprocessor Type         : 0x00
                Coprocessor Family       : 0x0b
                Coprocessor Family Ext   : 0x00
                Coprocessor Stepping     : C0
                Board SKU                : C0PRQ-3120/3140 P/A
                ECC Mode                 : Enabled
                SMC HW Revision          : Product 300W Active CS

        Cores
                Total No of Active Cores : 57
                Voltage                  : 1039000 uV
                Frequency                : 1100000 kHz

Jing_Xu
Employee

Could you try upgrading MPSS and updating the OS to Windows 7 SP1, and then see whether things improve?

Gennady_F_Intel
Moderator

Hello, our BLAS expert tried the size you mention, 4096 x 4096 x 4096, on a Linux system (a comparable Windows system is not available right now). With the MKL 2017 Update 3 build, the code does not offload at that size. This was caused by a commit targeting a dynamic offload threshold for KNL that also affected the KNC code branch. However, for larger sizes the customer should see offload happen (6000 x 6000 x 6000, see below). If the user is working with dimensions as large as 16000, offload should kick in, unless there is a problem with the environment configuration, as I mentioned in the previous email.

./knc_ao $
I,CPU,0.000000e+00

[MKL] [MIC --]   [AO Function] DGEMM
[MKL] [MIC --]   [AO DGEMM Workdivision] 0.48 0.52
[MKL] [MIC 00] [AO DGEMM CPU Time] 4.562893 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time] 0.505998 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 443520000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 466560000 bytes

[MKL] [MIC --]   [AO Function] DGEMM
[MKL] [MIC --]   [AO DGEMM Workdivision] 0.48 0.52
[MKL] [MIC 00] [AO DGEMM CPU Time] 0.511411 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time] 0.262164 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 443520000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 466560000 bytes

[MKL] [MIC --]   [AO Function] DGEMM
[MKL] [MIC --]   [AO DGEMM Workdivision] 0.48 0.52
[MKL] [MIC 00] [AO DGEMM CPU Time] 0.521218 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time] 0.261800 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 443520000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 466560000 bytes

[MKL] [MIC --]   [AO Function] DGEMM
[MKL] [MIC --]   [AO DGEMM Workdivision] 0.48 0.52
[MKL] [MIC 00] [AO DGEMM CPU Time] 0.517664 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time] 0.262245 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 443520000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 466560000 bytes

R,dgemm,n,n,1.0,1.0,6000,6000,6000,n,n,6000,6000,6000,6000,6000,6000,8.339619e+11
S,dgemm,n,n,1.0,1.0,6000,6000,6000,8.339619e+11
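A report like the one above is the quickest way to tell whether AO engaged: each offloaded call is announced by an `[AO Function]` line, followed by a `Workdivision` line with the split fractions. A small sketch for checking captured console output automatically; `count_ao_calls` and `parse_workdivision` are hypothetical helpers, and the line format is taken only from the sample output above, not from MKL documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count "[AO Function]" markers in captured report
   text. A count of zero means no call was offloaded. */
int count_ao_calls(const char *report)
{
    const char *marker = "[AO Function]";
    const char *p = report;
    int count = 0;
    assert(report != NULL);
    while ((p = strstr(p, marker)) != NULL) {
        ++count;
        p += strlen(marker);
    }
    return count;
}

/* Hypothetical helper: extract the two fractions from a line such as
   "[MKL] [MIC --]   [AO DGEMM Workdivision] 0.48 0.52".
   Returns 1 on success, 0 if the line is not a Workdivision line. */
int parse_workdivision(const char *line, double *first, double *second)
{
    const char *tag = "Workdivision]";
    const char *p = strstr(line, tag);
    if (p == NULL)
        return 0;
    return sscanf(p + strlen(tag), "%lf %lf", first, second) == 2;
}
```

Running the 16384 test with `OFFLOAD_REPORT=2`, capturing its console output and getting a count of zero from `count_ao_calls` would confirm that AO was skipped entirely rather than merely running slowly.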

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
CPU socket(s):         2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2693.628
BogoMIPS:              5386.06
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11
NUMA node1 CPU(s):     12-23

AndrewC
New Contributor III

Sadly, after updating and reflashing to MPSS 3.8, the Phi card seems to be DOA.

The reflash process went smoothly, with no errors.

Now the software crashes when I call mkl_mic_enable().

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC>micinfo
MicInfo Utility Log
Created Fri Jun 16 08:42:07 2017


        System Info
                HOST OS                 : Windows
                OS Version              : Microsoft Windows 7 Professional
                Driver Version          : 3.8.2.4191
                MPSS Version            : 3.8.2.4191
                Host Physical Memory    : 32709 MB

Device No: 0, Device Name: mic0

        Version
                Flash Version            : NotAvailable
                SMC Firmware Version     : NotAvailable
                SMC Boot Loader Version  : NotAvailable
                Coprocessor OS Version   : NotAvailable
                Device Serial Number     : NotAvailable

        Board
                Vendor ID                : 0x8086
                Device ID                : 0x225d
                Subsystem ID             : 0x3608
                Coprocessor Stepping ID  : 2
                PCIe Width               : x16
                PCIe Speed               : 5 GT/s
                PCIe Max payload size    : 256 bytes
                PCIe Max read req size   : 512 bytes
                Coprocessor Model        : 0x01
                Coprocessor Model Ext    : 0x00
                Coprocessor Type         : 0x00
                Coprocessor Family       : 0x0b
                Coprocessor Family Ext   : 0x00
                Coprocessor Stepping     : C0
                Board SKU                : C0PRQ-3120/3140 P/A
                ECC Mode                 : NotAvailable
                SMC HW Revision          : NotAvailable

        Cores
                Total No of Active Cores : NotAvailable
                Voltage                  : NotAvailable
                Frequency                : NotAvailable

        Thermal
                Fan Speed Control        : NotAvailable
                Fan RPM                  : NotAvailable
                Fan PWM                  : NotAvailable
                Die Temp                 : NotAvailable

        GDDR
                GDDR Vendor              : NotAvailable
                GDDR Version             : NotAvailable
                GDDR Density             : NotAvailable
                GDDR Size                : NotAvailable
                GDDR Technology          : NotAvailable
                GDDR Speed               : NotAvailable
                GDDR Frequency           : NotAvailable
                GDDR Voltage             : NotAvailable

AndrewC
New Contributor III

I have moved my discussion to the Xeon Phi forum to figure out why the card is dead.

AndrewC
New Contributor III

OK, I finally got my card working again: the install of MPSS 3.8.2 had somehow trashed a critical file.

Back to the original problem: I am trying n=16384, and AO still does not happen.

    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, A, n, B, n, beta, C, n);
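With `CblasColMajor` and `CblasNoTrans` for both operands, a call like this computes C := alpha*A*B + beta*C, where element (i, j) of each matrix is stored at index i + j*ld. A naive reference of the same operation, useful for checking MKL's result on a small n regardless of whether AO is active, might look like the sketch below; `ref_dgemm_nn` is a hypothetical name and the code has no MKL dependency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical naive reference: column-major C := alpha*A*B + beta*C,
   no transposes. A is m x k (leading dim lda), B is k x n (ldb),
   C is m x n (ldc). Argument order follows cblas_dgemm for the
   CblasColMajor / CblasNoTrans / CblasNoTrans case. */
void ref_dgemm_nn(int m, int n, int k, double alpha,
                  const double *A, int lda,
                  const double *B, int ldb,
                  double beta, double *C, int ldc)
{
    int i, j, p;
    assert(lda >= m && ldb >= k && ldc >= m);
    for (j = 0; j < n; ++j) {
        for (i = 0; i < m; ++i) {
            double sum = 0.0;
            double old;
            for (p = 0; p < k; ++p)
                sum += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            /* As in reference BLAS, C is not read when beta == 0. */
            old = (beta == 0.0) ? 0.0 : beta * C[i + (size_t)j * ldc];
            C[i + (size_t)j * ldc] = alpha * sum + old;
        }
    }
}
```

Comparing this against `cblas_dgemm` with a small tolerance on a modest size such as n = 512 is a quick sanity check; at n = 16384 the triple loop is far too slow to be practical.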
