VTune does not recognize OpenCL SDK/MediaSDK and the checker fails

user1900

Hi,

VTune does not recognize the Intel OpenCL SDK or the Intel Media SDK. What check does VTune perform to locate them?

What is wrong with my VTune installation? (I am using the 2018 version.)

I have both installed.

If I go to Platform Analysis, GPU Hotspots it says: "Cannot collect GPU hardware metrics. Make sure the Intel OpenCL SDK or Intel Media SDK is installed.".

I am running amplxe-gui as root, and no matter what I select, the checkbox "Trace OpenCL and Intel Media SDK programs (Intel Graphics Driver only)" is always unchecked under Advanced Hotspots.

I am on Arch Linux with the vtsspp module loaded correctly, an Intel Core i5-6200U, and HD Graphics 520.
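
For anyone who wants to double-check the "module loaded" part on their own machine, here is a minimal Python sketch. It is not part of VTune, and the helper names are made up for illustration. It only assumes the standard Linux /proc/modules layout, where the first field on each line is the module name:

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def is_module_loaded(name, path="/proc/modules"):
    """Check whether a kernel module is listed in the given file (Linux only)."""
    with open(path) as handle:
        return name in loaded_modules(handle.read())
```

Calling is_module_loaded("vtsspp") on the target machine should return True if the module is loaded. It says nothing about whether VTune can actually use it.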

I ran the self-checker script, and this is the output (the log is attached):

$ sudo ./amplxe-self-checker.sh

Intel(R) VTune(TM) Amplifier Self Check Utility
Copyright (C) 2009-2017 Intel Corporation. All rights reserved.
Build Number: 525261

Instrumentation based analysis check
Example of analysis types: Hotspots, Concurrency, Locks and Waits
    Collection: Ok
amplxe: Warning: Can't find 32-bit pin tool. 32-bit processes will not be profiled.
    Finalization: Ok
    Report: Fail

HW event-based analysis check (Intel driver)
Example of analysis types: Advanced Hotspots, HPC Performance Characterization, etc.
    Collection: Ok
amplxe: Warning: To enable hardware event-base sampling, VTune Amplifier has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok
    Report: Fail

HW event-based analysis check (Intel driver)
Example of analysis types: General Exploration
    Collection: Ok
amplxe: Warning: To enable hardware event-base sampling, VTune Amplifier has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok
    Report: Fail

HW event-based analysis with uncore events (Intel driver)
Example of analysis types: Memory Access
    Collection: Ok
amplxe: Warning: To enable hardware event-base sampling, VTune Amplifier has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok
    Report: Fail

HW event-based analysis with stacks (Intel driver)
Example of analysis types: Advanced Hotspots with Stacks, etc.
    Collection: Ok
amplxe: Warning: To enable hardware event-base sampling, VTune Amplifier has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok
    Report: Fail

The check observed a product failure on your system.
Review errors in the output above to fix a problem or contact Intel technical support.

Log location: /tmp/amplxe-tmp-root/self-checker-2017.12.11_13.52.17/log.txt
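
To make the pattern in the output above easier to see, here is a minimal Python sketch that lists which stage failed in each check. It is not an Intel tool, and the function names are hypothetical. It assumes the self-checker output has been saved as plain text in the same layout as above: a title line, an "Example of analysis types:" line, then indented "Collection/Finalization/Report" lines.

```python
import re

# Matches the indented stage lines, e.g. "    Report: Fail".
STAGE_RE = re.compile(r"^\s*(Collection|Finalization|Report):\s*(\w+)\s*$")


def summarize_self_check(text):
    """Parse self-checker output into a list of checks with per-stage status."""
    checks = []
    previous = ""  # last non-empty line seen, used as the check title
    for line in text.splitlines():
        if line.startswith("Example of analysis types:"):
            checks.append({
                "title": previous.strip(),
                "examples": line.split(":", 1)[1].strip(),
                "stages": {},
            })
        else:
            match = STAGE_RE.match(line)
            if match and checks:
                checks[-1]["stages"][match.group(1)] = match.group(2)
        if line.strip():
            previous = line
    return checks


def failed_stages(checks):
    """Return (title, examples, stage) for every stage whose status is not Ok."""
    return [
        (check["title"], check["examples"], stage)
        for check in checks
        for stage, status in check["stages"].items()
        if status != "Ok"
    ]
```

A list is used rather than a dictionary keyed by title because two of the checks above share the same title ("HW event-based analysis check (Intel driver)").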

 

Every check fails at the Report stage. VTune itself still works when I run it, but it does not show anything related to OpenCL, which is what I need.

And here is the clinfo output:

 

Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.0
  Platform Name:                                 Intel(R) OpenCL
  Platform Vendor:                               Intel(R) Corporation
  Platform Extensions:                           cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir


  Platform Name:                                 Intel(R) OpenCL
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     8086h
  Max compute units:                             24
  Max work items dimensions:                     3
    Max work items[0]:                           256
    Max work items[1]:                           256
    Max work items[2]:                           256
  Max work group size:                           256
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      16
  Native vector width short:                     8
  Native vector width int:                       4
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1000Mhz
  Address bits:                                  64
  Max memory allocation:                         3296224870
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          128
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            16384
  Max image 3D height:                           16384
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    524288
  Global memory size:                            6592449741
  Constant buffer size:                          3296224870
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  1
  Max pipe packet size:                          1024
  Max global variable size:                      65536
  Max global variable preferred total size:      3296224870
  Max read/write image args:                     128
  Max on device events:                          1024
  Queue on device max size:                      67108864
  Max on device queues:                          1
  Queue on device preferred size:                131072
  SVM capabilities:
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           No
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           64
  Preferred global atomic alignment:             64
  Preferred local atomic alignment:              64
  Kernel Preferred work group size multiple:     32
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    83
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x11b1110
  Name:                                          Intel(R) HD Graphics
  Vendor:                                        Intel(R) Corporation
  Device OpenCL C version:                       OpenCL C 2.0
  Driver version:                                r4.0.59481
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0
  Extensions:                                    cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_driver_diagnostics cl_intel_media_block_io cl_intel_motion_estimation cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_required_subgroup_size cl_intel_subgroups cl_intel_subgroups_short cl_intel_va_api_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp16 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_khr_spir cl_khr_subgroups


  Device Type:                                   CL_DEVICE_TYPE_CPU
  Vendor ID:                                     8086h
  Max compute units:                             4
  Max work items dimensions:                     3
    Max work items[0]:                           8192
    Max work items[1]:                           8192
    Max work items[2]:                           8192
  Max work group size:                           8192
  Preferred vector width char:                   1
  Preferred vector width short:                  1
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      32
  Native vector width short:                     16
  Native vector width int:                       8
  Native vector width long:                      4
  Native vector width float:                     8
  Native vector width double:                    4
  Max clock frequency:                           2300Mhz
  Address bits:                                  64
  Max memory allocation:                         2062761984
  Image support:                                 Yes
  Max number of images read arguments:           480
  Max number of images write arguments:          480
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    480
  Max size of kernel argument:                   3840
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               No
    Round to +ve and infinity:                   No
    IEEE754-2008 fused multiply-add:             No
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    262144
  Global memory size:                            8251047936
  Constant buffer size:                          131072
  Max number of constant args:                   480
  Local memory type:                             Global
  Local memory size:                             32768
  Max pipe arguments:                            16
  Max pipe active reservations:                  65535
  Max pipe packet size:                          1024
  Max global variable size:                      65536
  Max global variable preferred total size:      65536
  Max read/write image args:                     480
  Max on device events:                          4294967295
  Queue on device max size:                      4294967295
  Max on device queues:                          4294967295
  Queue on device preferred size:                4294967295
  SVM capabilities:
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           No
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           64
  Preferred global atomic alignment:             64
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     128
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue on Host properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x11b1110
  Name:                                          Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
  Vendor:                                        Intel(R) Corporation
  Device OpenCL C version:                       OpenCL C 2.0
  Driver version:                                1.2.0.400
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 (Build 400)
  Extensions:                                    cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer

 

 
