I would like to know what "mysize" is in the program below.
In the oneAPI GPU Optimization Guide (oneapi_optimization-guide-gpu_2023.1-771772-773648), page 25, there is a code example, VectorAdd2():
....
h.parallel_for(
    sycl::nd_range<1>(num_groups * wg_size, wg_size),
    [=](sycl::nd_item<1> index) [[intel::reqd_sub_group_size(32)]] {
      size_t grp_id = index.get_group()[0];
      size_t loc_id = index.get_local_id();
      size_t start = grp_id * mysize;
....
Would you please give me more information about this "mysize"? It appears several times after this example, but I could not find a definition anywhere.
Hi,
Thank you for posting on Intel Communities.
"mysize" is the number of elements that each work group in the kernel processes; within a group, the work items (threads) divide that chunk among themselves, each striding through it by the work-group size. We tried running a sample code and it worked fine for us.
Please find attached the sample code that we used.
If you face any issues, please let us know.
Thanks & Regards,
Vankudothu Vaishnavi.
Vankudothu,
Thank you for the explanation. Could you upload or send me the code mysize.cpp?
-Mitsuboh
Hi,
Thanks for accepting the solution. Please find below the mysize.cpp code as requested. If your issue is resolved, could you please let us know if we can go ahead and close this thread on our end?
Thanks and Regards,
Vaishnavi Vankudothu.
mysize.cpp
#include <CL/sycl.hpp>
#include <iostream>
#include <chrono>
#include <vector>
#include <cstdlib>

constexpr size_t groups = 2;
constexpr size_t mysize = 1024 / groups; // elements processed by each work group

using IntArray = std::vector<int>;

int VectorAdd2(sycl::queue &q, const IntArray &a, const IntArray &b,
               IntArray &sum, int iter) {
  sycl::range num_items{a.size()};
  sycl::buffer a_buf(a);
  sycl::buffer b_buf(b);
  sycl::buffer sum_buf(sum.data(), num_items);
  size_t num_groups = groups;
  size_t wg_size = 512;
  // The maximum work-group size could be queried from the device instead of
  // hard-coding 512.
  auto start = std::chrono::steady_clock::now();
  q.submit([&](auto &h) {
    // Input accessors
    sycl::accessor a_acc(a_buf, h, sycl::read_only);
    sycl::accessor b_acc(b_buf, h, sycl::read_only);
    // Output accessor
    sycl::accessor sum_acc(sum_buf, h, sycl::write_only, sycl::no_init);
    h.parallel_for(
        sycl::nd_range<1>(num_groups * wg_size, wg_size),
        [=](sycl::nd_item<1> index) [[intel::reqd_sub_group_size(32)]] {
          size_t grp_id = index.get_group()[0];
          size_t loc_id = index.get_local_id();
          size_t start = grp_id * mysize; // first element of this group's chunk
          size_t end = start + mysize;    // one past the last element of the chunk
          for (int j = 0; j < iter; j++)
            for (size_t i = start + loc_id; i < end; i += wg_size) {
              sum_acc[i] = a_acc[i] + b_acc[i];
            }
        });
  });
  q.wait();
  auto end = std::chrono::steady_clock::now();
  auto elapsed =
      std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  std::cout << "VectorAdd2<" << groups << "> completed on device - took "
            << elapsed << " u-secs\n";
  return elapsed;
} // end VectorAdd2

int main() {
  // Create a SYCL device queue
  sycl::queue q{sycl::cpu_selector_v};
  const int SIZE = 32 * 32;
  IntArray a(SIZE, 1);
  IntArray b(SIZE, 2);
  IntArray sum(SIZE, 0);
  int elapsed_time = VectorAdd2(q, a, b, sum, 1);
  // Print the output
  std::cout << "Elapsed time: " << elapsed_time << " u-secs\n";
  std::cout << "Output array: ";
  for (const auto &s : sum) {
    std::cout << s << " ";
  }
  std::cout << std::endl;
  return 0;
}
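As a rough usage sketch (assuming the oneAPI DPC++/C++ compiler icpx is installed and the oneAPI environment is set up; the exact command may differ on your system), the file can be built and run with:
icpx -fsycl mysize.cpp -o mysize   # compile with SYCL support
./mysize                           # prints the elapsed time and the 1024-element sum array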
Hi,
We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Thanks & Regards,
Vankudothu Vaishnavi.