Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and GDB*

Quick random number on SYCL device

danielarndt
Beginner
951 Views

I need a random value in kernels, preferably obtained by just reading a hardware counter. In CUDA I would simply use clock64(). Is there something similar I could use on Intel GPUs with SYCL?

AbhishekD_Intel
Moderator
912 Views

Hi Daniel,


Thanks for reaching out to us.

There are multiple ways to generate random numbers in your DPC++ code. Depending on the use case, you can either generate the random numbers inside your kernel or generate them on the host and pass them to the kernel.


Depending on your requirements, you can choose among different engines and feed their output into different distributions to produce various random sequences.

Please refer to the link below for in-depth details on random number generation with oneAPI DPC++ and the oneDPL library.

https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top/ra...


You can also use the Intel oneMKL library to generate random numbers. For more details on the oneMKL RNG usage model, please refer to the link below.

https://software.intel.com/content/www/us/en/develop/documentation/oneapi-mkl-dpcpp-developer-refere...


You can choose any of the random number generators depending on your use-case.

Hope the provided details will help to solve your issues.



Warm Regards,

Abhishek


danielarndt
Beginner
905 Views

Hi Abhishek,

 

Thank you for your response. Unfortunately, using the random generators in oneDPL or oneMKL doesn't really seem to fit my use case.

What I need is rather an entropy source on the device, so that concurrently running threads obtain significantly different values. The generated distribution doesn't need to be particularly elaborate. Also, generating these values on the host is not really an option since I don't control the invocation of the device function (and ideally don't even have access to the sycl::item).

Hence, my question remains whether there is a way to access any hardware counter on the device that fulfills this purpose. Using assembly or intrinsics would be fine, in the worst case even just for Intel GPUs.

 

Best,

Daniel

AbhishekD_Intel
Moderator
832 Views

Hi Daniel,

 

Thanks for the details. Here is one way to handle the scenario you describe. You can generate a random seed on the CPU and pass that seed value to the kernel. Inside the kernel, each work-item then constructs its own random number engine from the seed and its own offset, so that concurrently running work-items obtain different, hard-to-predict values without sharing any engine state.
Please refer to the code snippet below for more insight.

 

 

// Headers this snippet relies on (not shown in the original post)
#include <CL/sycl.hpp>
#include <oneapi/dpl/random>
#include <chrono>
#include <cstdint>
#include <vector>

sycl::queue queue(sycl::default_selector{});

std::int64_t nsamples = 20;

// Generate a seed value on the host from the current time
auto now = std::chrono::system_clock::now();
auto seed = static_cast<std::uint32_t>(
    std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()).count());

std::vector<float> x(nsamples);
{
    sycl::buffer<float, 1> x_buf(x.data(), sycl::range<1>(x.size()));

    queue.submit([&] (sycl::handler &cgh) {

        auto x_acc = x_buf.get_access<sycl::access::mode::write>(cgh);

        cgh.parallel_for<class count_kernel>(sycl::range<1>(nsamples),
            [=](sycl::item<1> idx) {
            // Each work-item uses its linear id as the engine offset
            std::uint64_t offset = idx.get_linear_id();
            // Create minstd_rand engine (use any engine)
            oneapi::dpl::minstd_rand engine(seed, offset);

            // Create float uniform_real_distribution distribution (use any distribution)
            oneapi::dpl::uniform_real_distribution<float> distr;

            // Generate random number
            auto res = distr(engine);

            // Use results from "res"
            ....
        });
    });
}

 

 

Intel does not yet provide a dedicated function that exposes a hardware counter you could use to generate unpredictable random numbers.

Hope this will help you to generate the randomness according to your use-case.

 

 

Warm Regards,

Abhishek
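
For readers without a SYCL toolchain, the seed-plus-offset idea in the snippet above can be tried on the host with the standard library. This is only a minimal sketch: it assumes that the two-argument oneDPL constructor engine(seed, offset) behaves like seeding the engine and then skipping offset values, and the helper names engine_for_work_item and sample_for_work_item are hypothetical, not part of oneDPL.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: host-side stand-in for
// oneapi::dpl::minstd_rand engine(seed, offset).
// Assumption: the offset means "skip this many values after seeding".
inline std::minstd_rand engine_for_work_item(std::uint32_t seed,
                                             std::uint64_t offset) {
    std::minstd_rand engine(seed);
    engine.discard(offset);  // advance the sequence by `offset` steps
    return engine;
}

// Hypothetical helper: one float sample for the work-item with this offset.
inline float sample_for_work_item(std::uint32_t seed, std::uint64_t offset) {
    std::minstd_rand engine = engine_for_work_item(seed, offset);
    std::uniform_real_distribution<float> distr;  // default range is [0, 1)
    return distr(engine);
}
```

With the same seed, the work-item at offset 1 starts where the work-item at offset 0 would be after one draw, so neighboring work-items walk the same sequence from different positions. That gives distinct values per work-item, but it still requires a seed from the host, which is the part that does not fit Daniel's use case.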

 

danielarndt
Beginner
813 Views

Abhishek,

 

Thank you for your answer and the code snippet. Unfortunately, I don't have direct access to the call site, so I can't generate anything on the host and copy it over without unreasonably large changes to the framework. I would welcome any DPC++ extension that goes in the direction of my use case.

The actual use case is to manage resources specific to each running thread. For now, a simple compare-exchange on a lock array, always starting from the same guess, is good enough (although slow due to contention).

 

Best,

Daniel
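
The lock-array scheme described in this post can be sketched on the host as follows. This is an illustration under stated assumptions, not Daniel's actual framework code: std::atomic stands in for whatever atomic type the device code uses, and acquire_slot and release_slot are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical helper: claim one free slot in a lock array of size n.
// The search begins at `start` and wraps around. Returns the index of
// the claimed slot, or n if every slot is already taken.
inline std::size_t acquire_slot(std::atomic<int>* locks, std::size_t n,
                                std::size_t start) {
    for (std::size_t probe = 0; probe < n; ++probe) {
        const std::size_t i = (start + probe) % n;
        int expected = 0;  // 0 = free, 1 = taken
        if (locks[i].compare_exchange_strong(expected, 1)) {
            return i;  // this caller now owns slot i
        }
    }
    return n;  // no free slot found
}

// Hypothetical helper: give a previously claimed slot back.
inline void release_slot(std::atomic<int>* locks, std::size_t i) {
    locks[i].store(0);
}
```

If every thread passes the same `start`, they all contend for the same first slots, which is the slowdown mentioned above. A `start` that differs between concurrently running threads spreads the first compare-exchange attempts across the array, which is why a cheap per-thread entropy source is useful here.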

AbhishekD_Intel
Moderator
805 Views

Hi Daniel,

 

Thanks for the details. We are working on it and we will get back to you.

 

-Abhishek

 

Alina_S_Intel
Employee
766 Views

Thank you for your patience while we were preparing the answer to your question.


There is an OpenCL built-in intrinsic, intel_get_cycle_counter, that you can call inside your GPU kernels to read the GPU execution unit timestamp. For more information, please refer to https://github.com/intel/pti-gpu/blob/master/chapters/binary_instrumentation/OpenCLBuiltIn.md


#include <CL/sycl.hpp>
#include <iostream>

using namespace sycl;

typedef unsigned long ulong;

// Declare the OpenCL built-in for device compilation only
#ifdef __SYCL_DEVICE_ONLY__
extern SYCL_EXTERNAL ulong __attribute__((overloadable)) intel_get_cycle_counter( void );
#endif

int main() {
  queue q(gpu_selector{});
  std::cout << "Device : " << q.get_device().get_info<info::device::name>() << std::endl;

  q.submit([&](handler &h) {
    // Stream used to print from inside the kernel
    auto out = stream(1024, 768, h);

    h.parallel_for(range<1>(100000), [=](auto i) {
      out << "intel_get_cycle_counter: ";
#ifdef __SYCL_DEVICE_ONLY__
      ulong cycle_counter = intel_get_cycle_counter();
      out << cycle_counter << endl;
#endif
    });
  }).wait();

  std::cout << "Done" << std::endl;
  return 0;
}
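
A raw timestamp is not yet a well-spread random value: counters read at nearly the same moment may differ only in a few low bits. One common remedy is to pass the counter through an integer mixing function. The sketch below uses the splitmix64 finalizer for this; it is host-runnable standard C++, the names mix64 and counter_to_index are hypothetical, and the choice of splitmix64 is an illustration, not something the answer above prescribes.

```cpp
#include <cstddef>
#include <cstdint>

// splitmix64 step: add the golden-ratio increment, then scramble with
// two xorshift-multiply rounds and a final xorshift. The mapping is a
// bijection on 64-bit values, so distinct inputs give distinct outputs.
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Hypothetical helper: turn a counter reading into an index in [0, n).
// n must be greater than zero.
inline std::size_t counter_to_index(std::uint64_t counter, std::size_t n) {
    return static_cast<std::size_t>(mix64(counter) % n);
}
```

In a kernel, the argument would be the value returned by intel_get_cycle_counter(), and the resulting index could serve as a per-thread starting guess. The result is suitable for spreading threads apart, not for statistical or cryptographic randomness.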


danielarndt
Beginner
756 Views

Thanks @Alina_S_Intel! That's pretty much what I was looking for.

Alina_S_Intel
Employee
601 Views

We will no longer respond to this thread.  

If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

Thanks,

