I need a random value in kernels, preferably obtained by just reading a hardware counter. In CUDA I would simply use clock64(). Is there something similar I could use on Intel GPUs with SYCL?
Hi Daniel,
Thanks for reaching out to us.
There are multiple ways to implement random number generation in your DPC++ code. Depending on the use case, you can either generate the random numbers inside your kernel or pass randomly generated values to the kernel.
Depending on your requirements, you can choose among different engines and use their output to generate various random sequences (using different distributions).
Please refer to the below link for in-depth details of the random number generation usage using oneAPI DPC++ and oneDPL library.
You can also use Intel oneMKL Library to generate random numbers, for more details please refer to the below link for oneMKL RNG model usage.
You can choose any of the random number generators depending on your use-case.
Hope the provided details will help to solve your issues.
Warm Regards,
Abhishek
Hi Abhishek,
Thank you for your response. Unfortunately, using the random generators in oneDPL or oneMKL doesn't really seem to fit my use case.
What I need/want is rather an entropy source on the device so that concurrently running threads obtain significantly different values. The generated distribution doesn't need to be particularly elaborate. Also generating these values on the host is not really an option since I don't control the invocation of the device function (and don't even have access to the sycl::item ideally).
Hence, my question still is whether there is a way to access any hardware counter on the device that fulfills this purpose (using assembly or intrinsics would be fine, in the worst case even just for Intel GPUs).
Best,
Daniel
Hi Daniel,
Thanks for the details. Here is one way to handle the scenario you describe. You can generate a random seed on the CPU and pass that seed value to the kernel. Inside the kernel, feed the seed and each thread's offset to a random number engine; this gives every thread its own thread-safe, hard-to-predict random values, and you can combine the engine with any of the distribution functions.
Please refer to the below code snippet for more insight.
// Headers needed by this fragment
#include <CL/sycl.hpp>
#include <oneapi/dpl/random>
#include <chrono>
#include <cstdint>
#include <vector>

// The statements below go inside a function body
sycl::queue queue(sycl::default_selector{});
std::int64_t nsamples = 20;
// Generate the seed value on the host from the current time
auto now = std::chrono::system_clock::now();
auto seed = static_cast<std::uint32_t>(
    std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()).count());
std::vector<float> x(nsamples);
{
    sycl::buffer<float, 1> x_buf(x.data(), sycl::range<1>(x.size()));
    queue.submit([&] (sycl::handler &cgh) {
        auto x_acc = x_buf.get_access<sycl::access::mode::write>(cgh);
        cgh.parallel_for<class count_kernel>(sycl::range<1>(nsamples),
            [=](sycl::item<1> idx) {
                // Each work-item uses its linear id as the engine offset
                std::uint64_t offset = idx.get_linear_id();
                // Create minstd_rand engine (use any engine)
                oneapi::dpl::minstd_rand engine(seed, offset);
                // Create float uniform_real_distribution distribution (use any distribution)
                oneapi::dpl::uniform_real_distribution<float> distr;
                // Generate random number
                auto res = distr(engine);
                // Use results from "res", for example by writing them through x_acc
                // ...
            });
    });
}
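To make the role of the seed and the per-thread offset concrete, here is a host-only sketch in standard C++ that needs neither SYCL nor oneDPL. It assumes that the engine's offset argument behaves like skipping that many values of the sequence, which is emulated here with std::minstd_rand and discard(). The helper name sample_for_work_item is hypothetical and not part of any library.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: the raw value that work-item `offset` would draw
// from a shared seed. Assumption: engine(seed, offset) is emulated on the
// host by seeding a std::minstd_rand and discarding `offset` values.
inline std::uint32_t sample_for_work_item(std::uint32_t seed, std::uint64_t offset) {
    std::minstd_rand engine(seed);
    engine.discard(offset);  // skip the values used by lower-numbered work-items
    return static_cast<std::uint32_t>(engine());
}
```

With the same seed, every offset picks a different position in one deterministic sequence, so threads get distinct values, but the result is only as unpredictable as the seed itself.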
Intel does not yet provide a dedicated function that exposes a hardware counter for generating unpredictable random numbers.
Hope this will help you to generate the randomness according to your use-case.
Warm Regards,
Abhishek
Abhishek,
Thank you for your answer and the code snippet. Unfortunately, I don't have direct access to the call site, and I can't generate anything on the host to copy over without making unreasonably large changes to the framework. I would be happy about any DPC++ extensions going in the direction of my use case.
The actual use case is to manage resources specific to each running thread. For now, a simple compare exchange with a lock array always starting from the same guess is good enough (although slow due to contention).
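For readers following along, the lock-array scheme described above might look roughly like the sketch below. It is written in host-side standard C++, with std::atomic standing in for device atomics, and is not the code from the framework in question. The names SlotPool, acquire, and release are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical pool of N per-thread resources guarded by a lock array.
template <std::size_t N>
struct SlotPool {
    std::array<std::atomic<int>, N> locks{};  // 0 = free, 1 = taken

    // Probe every slot once, starting at guess % N.
    // Returns the acquired slot index, or -1 if all slots are taken.
    int acquire(std::uint64_t guess) {
        const std::size_t start = static_cast<std::size_t>(guess % N);
        for (std::size_t k = 0; k < N; ++k) {
            const std::size_t i = (start + k) % N;
            int expected = 0;
            if (locks[i].compare_exchange_strong(expected, 1,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed)) {
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Mark a previously acquired slot as free again.
    void release(int slot) {
        locks[static_cast<std::size_t>(slot)].store(0, std::memory_order_release);
    }
};
```

If every thread passes the same guess, all of them start probing at the same slot and contend on the same atomics. A guess that differs between concurrently running threads spreads the first probes across the array, which is why an on-device entropy source is useful here.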
Best,
Daniel
Hi Daniel,
Thanks for the details. We are working on it and we will get back to you.
-Abhishek
Thank you for your patience while we were preparing the answer to your question.
There is an OpenCL built-in intrinsic intel_get_cycle_counter. You can use it inside your GPU kernels to get the GPU execution unit timestamp. For more information, please refer to https://github.com/intel/pti-gpu/blob/master/chapters/binary_instrumentation/OpenCLBuiltIn.md
#include <CL/sycl.hpp>
#include <iostream>
using namespace sycl;
typedef unsigned long ulong;
// Declare the OpenCL built-in so that the device compiler can resolve it
#ifdef __SYCL_DEVICE_ONLY__
extern SYCL_EXTERNAL ulong __attribute__((overloadable)) intel_get_cycle_counter( void );
#endif
int main() {
    queue q(gpu_selector{});
    std::cout << "Device : " << q.get_device().get_info<info::device::name>() << std::endl;
    q.submit([&](handler &h) {
        // Device-side output stream used to print from inside the kernel
        auto out = stream(1024, 768, h);
        h.parallel_for(range<1>(100000), [=](auto i) {
            out << "intel_get_cycle_counter: ";
#ifdef __SYCL_DEVICE_ONLY__
            // Read the execution unit timestamp on the device
            ulong cycle_counter = intel_get_cycle_counter();
            out << cycle_counter << endl;
#endif
        });
    }).wait();
    std::cout << "Done" << std::endl;
    return 0;
}
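Counter values read by work-items that run at nearly the same time may be close together, so it can help to scramble the value before using it, for example as a starting index into a table. The sketch below takes the counter as a plain 64-bit argument so that it also runs on the host. It uses the splitmix64 finalizer as the mixing step, which is a choice made for this illustration and not something taken from the Intel documentation. The names mix64 and start_index are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// splitmix64 finalizer: a bijective scrambling of a 64-bit value.
// Note that it maps 0 to 0.
inline std::uint64_t mix64(std::uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Hypothetical helper: turn a raw counter value into an index in [0, n_slots).
// Assumption: n_slots is greater than zero.
inline std::size_t start_index(std::uint64_t counter, std::size_t n_slots) {
    return static_cast<std::size_t>(mix64(counter) % n_slots);
}
```

Inside a kernel, the value returned by intel_get_cycle_counter() would be passed as counter. Because the mixing is bijective, two different counter values never collapse to the same 64-bit result, although they can still map to the same index after the modulo.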
Thanks @Alina_S_Intel! That's pretty much what I was looking for.
We will no longer respond to this thread.
If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Thanks,
