Intel® oneAPI DPC++/C++ Compiler
Talk to fellow users of Intel® oneAPI DPC++/C++ Compiler and companion tools like Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and Intel® Distribution for GDB*

Undefined variable "mysize" in the kernel Vector Add2 example

Mitsuboh
Novice
2,476 Views

I would like to know what "mysize" is in the program below.

In the oneAPI GPU Optimization Guide (oneapi_optimization-guide-gpu_2023.1-771772-773648), page 25, there is a code example, VectorAdd2().

 

....
h.parallel_for(
 sycl::nd_range<1>(num_groups * wg_size, wg_size),
 [=](sycl::nd_item<1> index) [[intel::reqd_sub_group_size(32)]] {
  size_t grp_id = index.get_group()[0];
  size_t loc_id = index.get_local_id();
  size_t start = grp_id * mysize;
....

 

Would you please give me more information about "mysize"? It appears several times after this example, but I could not find its definition.

VaishnaviV_Intel
Employee
2,427 Views

Hi,

 

Thank you for posting on Intel communities.

"mysize" is the number of elements that each work item in the kernel should process i.e how many no. of elements it should process for each thread in the work group. We tried running a sample code and it worked fine for us.

Please find the attachment for the sample code that we used.

Output:

[Screenshot: VaishnaviV_Intel_0-1683538902880.png]

 

If you face any issues, please let us know.

Thanks & Regards,

Vankudothu Vaishnavi.
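
To make the role of "mysize" concrete, here is a minimal host-only sketch in plain C++ (no SYCL) of the index arithmetic from the kernel quoted in the question. It only mirrors the lines "start = grp_id * mysize", "end = start + mysize" and the strided loop "i += wg_size". The names Chunk, chunk_for_group and indices_for_work_item are hypothetical helpers introduced for illustration; they are not part of the guide or of SYCL.

```cpp
#include <cstddef>
#include <vector>

// Half-open index range [begin, end) owned by one work-group.
struct Chunk {
  std::size_t begin;
  std::size_t end;
};

// Mirrors "start = grp_id * mysize; end = start + mysize" from the kernel.
inline Chunk chunk_for_group(std::size_t grp_id, std::size_t mysize) {
  return Chunk{grp_id * mysize, grp_id * mysize + mysize};
}

// Hypothetical host-side helper: lists the indices one work-item would visit,
// i.e. the loop "for (i = start + loc_id; i < end; i += wg_size)".
inline std::vector<std::size_t> indices_for_work_item(std::size_t grp_id,
                                                      std::size_t loc_id,
                                                      std::size_t mysize,
                                                      std::size_t wg_size) {
  const Chunk c = chunk_for_group(grp_id, mysize);
  std::vector<std::size_t> visited;
  for (std::size_t i = c.begin + loc_id; i < c.end; i += wg_size) {
    visited.push_back(i);
  }
  return visited;
}
```

With mysize = 512 and wg_size = 512, indices_for_work_item(1, 3, 512, 512) contains only index 515, so each work-item handles a single element. With a smaller wg_size of 256, work-item 3 of group 0 would visit indices 3 and 259.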

 

 

0 Kudos
Mitsuboh
Novice
2,421 Views

Vankudothu,

Thank you for the explanation.  Could you upload or send me the code mysize.cpp ?

-Mitsuboh

0 Kudos
VaishnaviV_Intel
Employee
2,378 Views

Hi,

 

Thanks for accepting the solution. Please find below the mysize.cpp code as requested. If your issue is resolved, could you please let us know if we can go ahead and close this thread on our end?

 

Thanks and Regards,

Vaishnavi Vankudothu.

 

mysize.cpp

 

#include <sycl/sycl.hpp>
#include <chrono>
#include <iostream>
#include <vector>

constexpr size_t groups = 2;
// Number of elements each work-group processes (1024 elements in total).
constexpr size_t mysize = 1024 / groups;

using IntArray = std::vector<int>;

int VectorAdd2(sycl::queue &q, const IntArray &a, const IntArray &b,
               IntArray &sum, int iter) {
  sycl::range num_items{a.size()};
  sycl::buffer a_buf(a);
  sycl::buffer b_buf(b);
  sycl::buffer sum_buf(sum.data(), num_items);
  size_t num_groups = groups;
  // Fixed work-group size; the device's maximum work-group size could be
  // used instead of 512.
  size_t wg_size = 512;

  auto start = std::chrono::steady_clock::now();
  q.submit([&](auto &h) {
    // Input accessors
    sycl::accessor a_acc(a_buf, h, sycl::read_only);
    sycl::accessor b_acc(b_buf, h, sycl::read_only);
    // Output accessor
    sycl::accessor sum_acc(sum_buf, h, sycl::write_only, sycl::no_init);

    h.parallel_for(
        sycl::nd_range<1>(num_groups * wg_size, wg_size),
        [=](sycl::nd_item<1> index) [[intel::reqd_sub_group_size(32)]] {
          size_t grp_id = index.get_group()[0];
          size_t loc_id = index.get_local_id();
          // Each work-group owns the half-open range [start, end).
          size_t start = grp_id * mysize;
          size_t end = start + mysize;
          for (int j = 0; j < iter; j++)
            // Work-items of the group stride through the range by wg_size.
            for (size_t i = start + loc_id; i < end; i += wg_size) {
              sum_acc[i] = a_acc[i] + b_acc[i];
            }
        });
  });
  q.wait();
  auto end = std::chrono::steady_clock::now();

  // Convert explicitly: the raw tick count of steady_clock is not
  // guaranteed to be in microseconds.
  auto usecs =
      std::chrono::duration_cast<std::chrono::microseconds>(end - start)
          .count();
  std::cout << "VectorAdd2<" << groups << "> completed on device - took "
            << usecs << " u-secs\n";
  return static_cast<int>(usecs);
} // end VectorAdd2

int main() {
  // Create a SYCL device queue
  sycl::queue q{sycl::cpu_selector_v};
  const int SIZE = 32 * 32;
  IntArray a(SIZE, 1);
  IntArray b(SIZE, 2);
  IntArray sum(SIZE, 0);
  int elapsed_time = VectorAdd2(q, a, b, sum, 1);

  // Print the output
  std::cout << "Elapsed time: " << elapsed_time << " u-secs\n";
  std::cout << "Output array: ";
  for (const auto &s : sum) {
    std::cout << s << " ";
  }
  std::cout << std::endl;

  return 0;
}
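
The sample fixes mysize at 1024 / groups, which matches the 32*32-element arrays created in main(). As a rough sketch only, the chunk size could instead be derived from the array length. The code below is a plain C++ host emulation of the kernel's loops, not SYCL code. The names chunk_size_for and emulate_vector_add are hypothetical, groups is assumed to be greater than zero, and the clamp on the end index is an addition that is not in the posted kernel (it keeps the last group in bounds when the length is not a multiple of groups).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: elements per work-group, rounded up so that
// groups * chunk covers all n elements. Assumes groups > 0.
inline std::size_t chunk_size_for(std::size_t n, std::size_t groups) {
  return (n + groups - 1) / groups;
}

// Host-only emulation of the kernel's index loops (an illustration,
// not the guide's code). Iterates over every (group, local id) pair.
inline std::vector<int> emulate_vector_add(const std::vector<int> &a,
                                           const std::vector<int> &b,
                                           std::size_t groups,
                                           std::size_t wg_size) {
  const std::size_t n = a.size();
  const std::size_t mysize = chunk_size_for(n, groups);
  std::vector<int> sum(n, 0);
  for (std::size_t grp_id = 0; grp_id < groups; ++grp_id) {
    const std::size_t start = grp_id * mysize;
    // Clamp (not in the posted kernel) so the last group stays in bounds.
    const std::size_t end = std::min(start + mysize, n);
    for (std::size_t loc_id = 0; loc_id < wg_size; ++loc_id) {
      for (std::size_t i = start + loc_id; i < end; i += wg_size) {
        sum[i] = a[i] + b[i];
      }
    }
  }
  return sum;
}
```

For the thread's configuration, chunk_size_for(1024, 2) gives 512, the same value as the hard-coded constant.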

 

0 Kudos
VaishnaviV_Intel
Employee
2,309 Views

Hi,


We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Thanks & Regards,

Vankudothu Vaishnavi.


0 Kudos