Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel ICX Compiler, Intel® DPC++ Compatibility Tool, and GDB*

Undefined variable "mysize" in the kernel Vector Add2 example

Mitsuboh
Novice
1,036 Views

I would like to know what "mysize" is in the program below.

In the oneAPI GPU Optimization Guide (oneapi_optimization-guide-gpu_2023.1-771772-773648), page 25, there is a code example, VectorAdd2().

 

....
h.parallel_for(
 sycl::nd_range<1>(num_groups * wg_size, wg_size),
 [=](sycl::nd_item<1> index) [[intel::reqd_sub_group_size(32)]] {
  size_t grp_id = index.get_group()[0];
  size_t loc_id = index.get_local_id();
  size_t start = grp_id * mysize;
....

 

Would you please give me more information about "mysize"? It appears several times after this example, but I could not find its definition.

4 Replies
VaishnaviV_Intel
Moderator
987 Views

Hi,

 

Thank you for posting on Intel communities.

"mysize" is the number of elements that each work-group in the kernel processes: work-group grp_id covers the index range from grp_id * mysize up to (but not including) grp_id * mysize + mysize, and the work-items in that group step through this range with a stride of wg_size. We ran a sample code and it worked fine for us.

Please find the attachment for the sample code that we used.

Output: [screenshot of the program output, VaishnaviV_Intel_0-1683538902880.png]

 

If you face any issues, please let us know.

Thanks & Regards,

Vankudothu Vaishnavi.

 

 

Mitsuboh
Novice
981 Views

Vankudothu,

Thank you for the explanation.  Could you upload or send me the code mysize.cpp ?

-Mitsuboh

VaishnaviV_Intel
Moderator
938 Views

Hi,

 

Thanks for accepting the solution. Please find below the mysize.cpp code as requested. If your issue is resolved, could you please let us know if we can go ahead and close this thread on our end?

 

Thanks and Regards,

Vaishnavi Vankudothu.

 

mysize.cpp

 

#include <sycl/sycl.hpp>
#include <chrono>
#include <iostream>
#include <vector>

constexpr size_t groups = 2;
// Number of elements handled by each work-group: the 1024 elements
// (SIZE in main) are split evenly across the groups.
constexpr size_t mysize = 1024 / groups;

using IntArray = std::vector<int>;

// Adds a and b element-wise into sum on the device.
// Returns the elapsed time in microseconds.
int VectorAdd2(sycl::queue &q, const IntArray &a, const IntArray &b,
               IntArray &sum, int iter) {
  sycl::range num_items{a.size()};
  sycl::buffer a_buf(a);
  sycl::buffer b_buf(b);
  sycl::buffer sum_buf(sum.data(), num_items);
  size_t num_groups = groups;
  // Fixed work-group size; the device's maximum work-group size
  // could be queried and used instead of 512.
  size_t wg_size = 512;
  auto t_start = std::chrono::steady_clock::now();
  q.submit([&](sycl::handler &h) {
    // Input accessors
    sycl::accessor a_acc(a_buf, h, sycl::read_only);
    sycl::accessor b_acc(b_buf, h, sycl::read_only);
    // Output accessor
    sycl::accessor sum_acc(sum_buf, h, sycl::write_only, sycl::no_init);
    h.parallel_for(
        sycl::nd_range<1>(num_groups * wg_size, wg_size),
        [=](sycl::nd_item<1> index) [[intel::reqd_sub_group_size(32)]] {
          size_t grp_id = index.get_group()[0];
          size_t loc_id = index.get_local_id();
          // Each work-group owns the chunk [start, end) of mysize elements.
          size_t start = grp_id * mysize;
          size_t end = start + mysize;
          for (int j = 0; j < iter; j++)
            // Work-items of the group stride through the chunk by wg_size.
            for (size_t i = start + loc_id; i < end; i += wg_size) {
              sum_acc[i] = a_acc[i] + b_acc[i];
            }
        });
  });
  q.wait();
  auto t_end = std::chrono::steady_clock::now();
  // Convert explicitly so the value matches the "u-secs" label.
  auto usecs = std::chrono::duration_cast<std::chrono::microseconds>(
                   t_end - t_start).count();
  std::cout << "VectorAdd2<" << groups << "> completed on device - took "
            << usecs << " u-secs\n";
  return static_cast<int>(usecs);
} // end VectorAdd2

int main() {
  // Create a SYCL device queue
  sycl::queue q{sycl::cpu_selector_v};
  // Must equal groups * mysize so that the work-groups cover the whole array.
  const int SIZE = 32 * 32;
  IntArray a(SIZE, 1);
  IntArray b(SIZE, 2);
  IntArray sum(SIZE, 0);
  int elapsed_time = VectorAdd2(q, a, b, sum, 1);

  // Print the output
  std::cout << "Elapsed time: " << elapsed_time << " u-secs\n";
  std::cout << "Output array: ";
  for (const auto &s : sum) {
    std::cout << s << " ";
  }
  std::cout << std::endl;

  return 0;
}
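
To see what "mysize" does without needing a device, the index arithmetic of the kernel above can be replayed on the host. This is only a sketch in plain C++ (no SYCL): the helper names count_writes and covers_exactly_once are hypothetical and do not come from the guide or from the posted code. It mirrors the kernel's start, end, and stride expressions and counts how often each array index would be written in one pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of the kernel's indexing (hypothetical helper, no SYCL).
// For each array index, counts how many work-items would write it in one
// pass of the inner loop.
std::vector<int> count_writes(std::size_t num_groups, std::size_t wg_size,
                              std::size_t mysize) {
  assert(wg_size > 0);  // a zero stride would never terminate
  std::vector<int> writes(num_groups * mysize, 0);
  for (std::size_t grp_id = 0; grp_id < num_groups; ++grp_id) {
    const std::size_t start = grp_id * mysize;  // first index of this group's chunk
    const std::size_t end = start + mysize;     // one past the chunk's last index
    for (std::size_t loc_id = 0; loc_id < wg_size; ++loc_id) {
      // Same loop as in the kernel: stride through the chunk by wg_size.
      for (std::size_t i = start + loc_id; i < end; i += wg_size) {
        ++writes[i];
      }
    }
  }
  return writes;
}

// True if every index in [0, num_groups * mysize) is written exactly once.
bool covers_exactly_once(std::size_t num_groups, std::size_t wg_size,
                         std::size_t mysize) {
  for (int w : count_writes(num_groups, wg_size, mysize)) {
    if (w != 1) return false;
  }
  return true;
}
```

With the values from the posted program (2 groups, wg_size 512, mysize 512), covers_exactly_once(2, 512, 512) holds and each work-item handles a single element. With a smaller work-group size such as 256, each work-item would handle mysize / wg_size = 2 elements, and the array is still covered as long as the array length equals groups * mysize.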

 

VaishnaviV_Intel
Moderator
869 Views

Hi,


We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Thanks & Regards,

Vankudothu Vaishnavi.

