Intel® oneAPI DPC++/C++ Compiler
Talk to fellow users of Intel® oneAPI DPC++/C++ Compiler and companion tools like Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and Intel® Distribution for GDB*

Skipping items in parallel_for()

leilag
Novice
4,601 views
Hi,
If I don't want to iterate over all the items, is there an efficient way to break out of parallel_for()?
 
Thanks,
Leila
1 Solution
Sravani_K_Intel
Moderator
4,292 views

Hi Leila,


Compiler engineers confirmed that the warning was emitted by mistake and the loop is being successfully unrolled. The false warning issue is now fixed and you will no longer see it in the upcoming release of the compiler.


12 Replies
VidyalathaB_Intel
Moderator
4,578 views

Hi,

Thanks for reaching out to us.

Could you please tell us the use case in which you want to break out of parallel_for(), and how your current code skips some of the iterations?

That will help us look into it on our end.

Thanks & Regards,

Vidya.

 

 

leilag
Novice
4,570 views

Hi Vidya,

This is my current code:

void init_velocity(queue q, const int m, const int n, double psi[DOMAIN_SIZE],
                   double u[DOMAIN_SIZE], double v[DOMAIN_SIZE], double p[DOMAIN_SIZE]) {
    ...
    // DOMAIN_SIZE = (m+2)*(n+2)
    auto R = range<1>{DOMAIN_SIZE};
    buffer<double, 1> u_buf(u, R), v_buf(v, R), p_buf(p, R), psi_buf(psi, R);

    q.submit([&](handler &h) {
        auto psi = psi_buf.get_access(h, read_only);
        auto u = u_buf.get_access(h, write_only);
        auto v = v_buf.get_access(h, write_only);
        auto p = p_buf.get_access(h, write_only);

        h.parallel_for(R, [=](auto ij) {
            // Recover the 2D node position (i, j) from the linear index.
            int j = ij % (n+2);
            int i = (int) (ij - j) / (n+2);

            int ijm1 = ij - 1;      // neighbor at (i, j-1)
            int im1j = ij - (n+2);  // neighbor at (i-1, j)

            // Exterior nodes are skipped; only interior nodes are updated.
            if (i==0 || j==0 || i == m+1 || j == n+1) {}
            else {
                u[ij] = -(psi[ij] - psi[ijm1]) / dy;
                v[ij] = (psi[ij] - psi[im1j]) / dx;
                p[ij] = pcf * (cos(2. * (i-1) * di) + cos(2. * (j-1) * dj)) + 50000.;
            }
        });
    });
}
 
 
To skip the exterior nodes, I have used `if (i==0 || j==0 || i == m+1 || j==n+1) {}`.  I might also care about some random interior nodes as well!
 
Please let me know if I need to explain more.
 
Thanks,
Leila
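
Editor's sketch: the index arithmetic in the kernel above can be checked on the host without a SYCL toolchain. The following is a minimal, hedged illustration in plain C++; the names `Node`, `split_index`, `is_exterior`, and `count_interior` are hypothetical and do not come from the original code. It assumes the same row-major layout with row length n+2.

```cpp
#include <cstddef>

// Hypothetical type: the (i, j) position of a node on the padded grid.
struct Node {
    std::size_t i;
    std::size_t j;
};

// Splits a linear index ij into (i, j), assuming rows of length n + 2.
inline Node split_index(std::size_t ij, std::size_t n) {
    return Node{ij / (n + 2), ij % (n + 2)};
}

// Same test as the `if` in the kernel: true for nodes on the outer frame.
inline bool is_exterior(Node p, std::size_t m, std::size_t n) {
    return p.i == 0 || p.j == 0 || p.i == m + 1 || p.j == n + 1;
}

// Counts the nodes the guarded kernel would actually update.
inline std::size_t count_interior(std::size_t m, std::size_t n) {
    std::size_t count = 0;
    const std::size_t total = (m + 2) * (n + 2);
    for (std::size_t ij = 0; ij < total; ++ij) {
        if (!is_exterior(split_index(ij, n), m, n)) ++count;
    }
    return count;
}
```

With this guard every one of the (m+2)*(n+2) work-items is still launched; the exterior ones simply do nothing, which is why the approach is simple but not the tightest possible launch.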
leilag
Novice
4,537 views

Any comments/thoughts?

VidyalathaB_Intel
Moderator
4,520 views

Hi,

>>  I might also care about some random interior nodes as well

Could you please give us some more details regarding the above statement, which might help us in sorting out the issue.

Regards,

Vidya.


leilag
Novice
4,513 views

Hello,

 

Thank you for getting back to me.

The following figure shows a physical problem similar to the one I want to solve.

The exterior nodes (red and blue) are the ones that I want to skip. This is what I am trying to do in the code above. That is, is there a way to exclude some items from a parallel_for() operation?

 

Honestly, this is my main concern right now. I am wondering whether there are other ways to do this with DPC++, or whether I should just make my peace with this naive approach.

 

The random interior node that I mentioned above could be something in the middle (labeled "Node m,n") that I might want to skip or treat differently. But if I can find an answer to my main question (skipping the exterior nodes), this should be easy to figure out.

 

[Figure: leilag_0-1625577933068.png]

 

Thank you for helping me with this and please let me know if I need to explain more.

 

Leila

leilag
Novice
4,506 views

Hi again,

 

Another example in C would be 

for (int i=2; i<global_size-1; i++) {}

instead of 

for (int i=0; i<global_size; i++) {}

That is, could we limit the scope of parallel_for(), or even increase the stride?

 

Thanks,

Leila
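
Editor's sketch: one common way to get a restricted or strided loop out of a data-parallel launch is to launch over a smaller range and map each work-item number back to the original loop index. The plain C++ below is a hedged illustration of that mapping only; `strided_count`, `strided_index`, and `visited_indices` are hypothetical names, and the host loop stands in for the kernel launch. In SYCL the same arithmetic would sit inside a parallel_for over a range of `count` items.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of work-items needed to visit first, first + stride, ...
// while staying below `last` (exclusive upper bound).
inline std::size_t strided_count(std::size_t first, std::size_t last, std::size_t stride) {
    assert(stride > 0);
    if (first >= last) return 0;
    return (last - first + stride - 1) / stride;
}

// Maps work-item number k of the reduced range back to the original index.
inline std::size_t strided_index(std::size_t k, std::size_t first, std::size_t stride) {
    return first + k * stride;
}

// Host stand-in for the kernel: records which original indices get visited.
inline std::vector<std::size_t> visited_indices(std::size_t first, std::size_t last,
                                                std::size_t stride) {
    const std::size_t count = strided_count(first, last, stride);
    std::vector<std::size_t> visited;
    visited.reserve(count);
    for (std::size_t k = 0; k < count; ++k) {
        visited.push_back(strided_index(k, first, stride));
    }
    return visited;
}
```

For the C loop above with global_size = 10, this corresponds to first = 2, last = 9, and stride = 1; a larger stride only changes the count and the multiplication.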

VidyalathaB_Intel
Moderator
4,461 views

Hi,

Thanks for your patience.

We are looking into this issue internally. We will get back to you soon.

Regards,

Vidya.


Sravani_K_Intel
Moderator
4,371 views

Hi Leila,


The exterior nodes can be skipped by using a two-dimensional nd-range (https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_nd_range_kernels) and adjusting the range accordingly. For random interior nodes, what you are doing might be the best possible option.
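
Editor's sketch: "adjusting the range" can be read as launching over the m x n interior only and shifting each index by one in both dimensions. The plain C++ below is a hedged, host-side illustration of that index shift, not the moderator's code; `interior_to_linear` and `fill_interior` are hypothetical names, and the two nested loops stand in for a parallel_for over a range<2>{m, n}. It assumes the padded grid is stored row-major with row length n+2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maps an index (ii, jj) of the reduced m x n interior range onto the
// padded (m+2) x (n+2) row-major grid.
inline std::size_t interior_to_linear(std::size_t ii, std::size_t jj, std::size_t n) {
    const std::size_t i = ii + 1;  // shift past the first boundary row
    const std::size_t j = jj + 1;  // shift past the first boundary column
    return i * (n + 2) + j;
}

// Host stand-in for a kernel launched over the interior only:
// no work is spent on exterior nodes and no `if` guard is needed.
inline std::vector<double> fill_interior(std::size_t m, std::size_t n, double value) {
    std::vector<double> grid((m + 2) * (n + 2), 0.0);
    for (std::size_t ii = 0; ii < m; ++ii) {
        for (std::size_t jj = 0; jj < n; ++jj) {
            const std::size_t idx = interior_to_linear(ii, jj, n);
            assert(idx < grid.size());
            grid[idx] = value;
        }
    }
    return grid;
}
```

Compared with guarding every work-item, this launches m*n items instead of (m+2)*(n+2), at the cost of one addition per dimension inside the kernel.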




leilag
Novice
4,347 views

Hello Sravani,

 

Thanks for your answer!

I did try a 2D range at the beginning, which is what I was ideally hoping to do, but a compiler warning dissuaded me!

// 2D range

#include <vector>
#include <chrono>
#include <fstream>
#include <iostream>
#include <cmath>
#include "dpc_common.hpp"
#include <limits>
#include <CL/sycl.hpp>
#if FPGA || FPGA_EMULATOR
#include <CL/sycl/INTEL/fpga_extensions.hpp>
#endif

using namespace std;
using namespace sycl;

#define M 4
#define N 5
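#define DOMAIN_SIZE ((M+2)*(N+2))  // parenthesized so the macro expands safely inside larger expressions

int main () {
    
    //int dim = 2;
    auto R = range<2>{M+2, N+2};
    
    double u[DOMAIN_SIZE];
    // Initialize the host array before the buffer takes ownership of it.
    for (int i=0;i<DOMAIN_SIZE; i++) u[i] = 0.;
    buffer<double, 2> u_buf(u, R);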
    
    default_selector d_selector;
    queue q(d_selector);
    
    
    
    q.submit([&](handler &h) {
        
        auto u = u_buf.get_access(h, write_only);
        
        h.parallel_for(R, [=](auto index) {
            
            u[index[0]][index[1]] =  index[0] + index[1];
            
        }); 
    });
    
    host_accessor u_read(u_buf, read_only);
    for (int i=0; i<M+2; i++)
        for (int j=0; j<N+2; j++)
            std::cout << "u[" << i << "][" << j << "] = " << u_read[i][j] << std::endl;
    
    return(0);
}

And here is the warning:

In file included from t002.cpp:6:
In file included from /glob/development-tools/versions/oneapi/2021.2/inteloneapi/dev-utilities/2021.2.0/include/dpc_common.hpp:15:
In file included from /glob/development-tools/versions/oneapi/2021.2/inteloneapi/compiler/2021.2.0/linux/bin/../include/sycl/CL/sycl.hpp:11:
In file included from /glob/development-tools/versions/oneapi/2021.2/inteloneapi/compiler/2021.2.0/linux/bin/../include/sycl/CL/sycl/ONEAPI/atomic.hpp:11:
In file included from /glob/development-tools/versions/oneapi/2021.2/inteloneapi/compiler/2021.2.0/linux/bin/../include/sycl/CL/sycl/ONEAPI/atomic_accessor.hpp:14:
/glob/development-tools/versions/oneapi/2021.2/inteloneapi/compiler/2021.2.0/linux/bin/../include/sycl/CL/sycl/accessor.hpp:883:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    for (int I = 0; I < AdjustedDim; ++I) {
    ^
1 warning generated.

Is there a reason why dpcpp cannot optimize a 1d array with a 2d range?

 

Thanks!

Leila

Sravani_K_Intel
Moderator
4,299 views

Hi,


Sorry for the delay in my response.

I tried this code with the latest internal compiler build and no longer see the warning. I am seeking clarification on this from the compiler team and will get back to you once I hear back.


Sravani_K_Intel
Moderator
4,293 views

Hi Leila,


Compiler engineers confirmed that the warning was emitted by mistake and the loop is being successfully unrolled. The false warning issue is now fixed and you will no longer see it in the upcoming release of the compiler.


leilag
Novice
4,236 views

Hello Sravani,

 

Thanks for checking that for me.

 

Best,

Leila
