Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel® ICX Compiler, Intel® DPC++ Compatibility Tool, and GDB*

Request for Intel OpenCL Offline Compiler (OCLOC)

Viet-Duc
Novice

 

Hi,

 

I am trying to perform ahead-of-time (AOT) compilation following the instructions here:

https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compilation/ahead-of-time-compilation.html

https://software.intel.com/content/www/us/en/develop/documentation/installation-guide-for-intel-oneapi-toolkits-linux/top/install-opencl-offline-compiler-ocloc.html

Using the following compiler options:

dpcpp -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xs "-device Gen9,dg1,Gen12HP" vector-add.cpp

The error message is:

dpcpp: error: unable to execute command: Executable "ocloc" doesn't exist!

As I understand it, the NDA queue is not configured to accept interactive jobs (qsub -I).

If I compile on the login node and run on Xe_HP, the performance is not as good as expected. For instance, the measured memory bandwidth is approximately 40% of the theoretical maximum of 800 GB/s.

Would you make ocloc available system-wide? If not, is there a way to install it locally?

Regards.

1 Solution
Jie_L_Intel
Employee

DevCloud does not have ocloc installed at the moment. If you have your own development machine on which oneAPI can be installed, you could try it by following the installation guide:

https://software.intel.com/content/www/us/en/develop/documentation/installation-guide-for-intel-oneapi-toolkits-linux/top/installation/install-opencl-offline-compiler-ocloc.html
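After installing on your own machine, a quick sanity check (just a sketch; the exact install location depends on your setup) is to confirm that ocloc is on PATH before invoking dpcpp with an AOT target:

```shell
# Check whether the OpenCL offline compiler is reachable before attempting AOT.
if command -v ocloc >/dev/null 2>&1; then
    echo "ocloc found at: $(command -v ocloc)"
else
    echo "ocloc missing - AOT for spir64_gen targets will fail"
fi
```

If the check fails, the dpcpp AOT invocation will produce the same "Executable \"ocloc\" doesn't exist!" error reported above.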


3 Replies
Gopika_Intel
Moderator

Hi,

Thank you for reaching out. We are checking this internally. We will get back to you as soon as we get an update.

Regards

Gopika


Viet-Duc
Novice

Thanks for moving the post to the appropriate forum.

 

I am not sure whether I am allowed to share performance numbers for NDA hardware here. Nevertheless, I need to make my case.

The measurement is done using BabelStream, which is an extension of the STREAM benchmark for heterogeneous platforms.

The triad kernel implemented in SYCL is as follows:

template <class T>
void SYCLStream<T>::triad()
{
  const T scalar = startScalar;
  queue->submit([&](handler &cgh)
  {
    auto ka = d_a->template get_access<access::mode::write>(cgh);
    auto kb = d_b->template get_access<access::mode::read>(cgh);
    auto kc = d_c->template get_access<access::mode::read>(cgh);
    cgh.parallel_for<triad_kernel>(range<1>{array_size}, [=](id<1> idx)
    {
      // Triad: a[i] = b[i] + scalar * c[i]
      ka[idx] = kb[idx] + scalar * kc[idx];
    });
  });
  queue->wait();
}

This implementation does not use USM, and the work-group size is left to the OpenCL runtime, which may affect performance.

 

Result for DG1 is below:

Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using SYCL device Intel(R) Iris(R) Xe MAX Graphics [0x4905]
Driver: 21.11.19310
Function    MBytes/sec  Min (sec)   Max         Average
Triad       59800.403   0.01347     0.01361     0.01354 

This translates to 86% of theoretical bandwidth.

 

Result for Xe_HP is below:

...
Using SYCL device Intel(R) Graphics [0x0205]
Driver: 21.12.019357+embargo458
Function    MBytes/sec  Min (sec)   Max         Average
Triad       282888.513  0.00285     0.00368     0.00345

This translates to only 36% of the theoretical bandwidth. Tuning the work-group size may yield roughly an additional 10%.

Thus, I am trying to use AOT compilation, since the NDA queue does not accept interactive jobs.

 

Your insights on this issue are much appreciated.

Thanks.
