I would like to know what "mysize" is in the program below.
In the oneAPI GPU Optimization Guide (oneapi_optimization-guide-gpu_2023.1-771772-773648), page 25, there is a code example, VectorAdd2():
....
h.parallel_for(
    sycl::nd_range<1>(num_groups * wg_size, wg_size),
    [=](sycl::nd_item<1> index) [[intel::reqd_sub_group_size(32)]] {
      size_t grp_id = index.get_group()[0];
      size_t loc_id = index.get_local_id();
      size_t start = grp_id * mysize;
....
Would you please give me more information about this "mysize"? It appears several times after this example, but I could not find a definition anywhere.
Hi,
Thank you for posting on Intel Communities.
"mysize" is the number of elements that each work group in the kernel processes; within a group, the work items (threads) divide that chunk among themselves, each striding through it by the work-group size. We tried running a sample code and it worked fine for us.
Please find attached the sample code that we used.
If you face any issues, please let us know.
Thanks & Regards,
Vankudothu Vaishnavi.
Vankudothu,
Thank you for the explanation. Could you upload or send me the code mysize.cpp?
-Mitsuboh
Hi,
Thanks for accepting the solution. Please find below the mysize.cpp code as requested. If your issue is resolved, could you please let us know if we can go ahead and close this thread on our end?
Thanks and Regards,
Vaishnavi Vankudothu.
mysize.cpp
#include <CL/sycl.hpp>
#include <iostream>
#include <chrono>
#include <vector>
#include <cstdlib>

constexpr size_t groups = 2;
constexpr size_t mysize = 1024 / groups; // elements processed by each work group

using IntArray = std::vector<int>;

int VectorAdd2(sycl::queue &q, const IntArray &a, const IntArray &b,
               IntArray &sum, int iter) {
  sycl::range num_items{a.size()};
  sycl::buffer a_buf(a);
  sycl::buffer b_buf(b);
  sycl::buffer sum_buf(sum.data(), num_items);
  size_t num_groups = groups;
  size_t wg_size = 512;
  // The maximum work-group size could be queried from the device instead of
  // hard-coding 512.
  auto start = std::chrono::steady_clock::now();
  q.submit([&](auto &h) {
    // Input accessors
    sycl::accessor a_acc(a_buf, h, sycl::read_only);
    sycl::accessor b_acc(b_buf, h, sycl::read_only);
    // Output accessor
    sycl::accessor sum_acc(sum_buf, h, sycl::write_only, sycl::no_init);
    h.parallel_for(
        sycl::nd_range<1>(num_groups * wg_size, wg_size),
        [=](sycl::nd_item<1> index) [[intel::reqd_sub_group_size(32)]] {
          size_t grp_id = index.get_group()[0];
          size_t loc_id = index.get_local_id();
          size_t start = grp_id * mysize; // first element of this group's chunk
          size_t end = start + mysize;    // one past the last element of the chunk
          for (int j = 0; j < iter; j++)
            for (size_t i = start + loc_id; i < end; i += wg_size) {
              sum_acc[i] = a_acc[i] + b_acc[i];
            }
        });
  });
  q.wait();
  auto end = std::chrono::steady_clock::now();
  auto elapsed =
      std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  std::cout << "VectorAdd2<" << groups << "> completed on device - took "
            << elapsed << " u-secs\n";
  return elapsed;
} // end VectorAdd2

int main() {
  // Create a SYCL device queue
  sycl::queue q{sycl::cpu_selector_v};
  const int SIZE = 32 * 32;
  IntArray a(SIZE, 1);
  IntArray b(SIZE, 2);
  IntArray sum(SIZE, 0);
  int elapsed_time = VectorAdd2(q, a, b, sum, 1);
  // Print the output
  std::cout << "Elapsed time: " << elapsed_time << " u-secs\n";
  std::cout << "Output array: ";
  for (const auto &s : sum) {
    std::cout << s << " ";
  }
  std::cout << std::endl;
  return 0;
}
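As a rough usage sketch (assuming the oneAPI DPC++/C++ compiler icpx is installed and the oneAPI environment is set up; the exact command may differ on your system), the file can be built and run with:
icpx -fsycl mysize.cpp -o mysize   # compile with SYCL support
./mysize                           # prints the elapsed time and the 1024-element sum array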
Hi,
We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Thanks & Regards,
Vankudothu Vaishnavi.