OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

structs and compiler optimizations

Richard_S_7
Beginner

Hello,

I have a question concerning the usage of structs. My current kernel accesses two buffers using a struct in the following way:

struct pair {
    float first;
    float second;
};

inline const float f(const struct pair param) {
    return param.first * param.second;
}

inline const struct pair access_func(__global float const * const a, __global float const * const b, const int i) {
    // Load element i of each buffer into one struct.
    struct pair res = {
            a[ i ],
            b[ i ]
    };
    return res;
}

// slow
__kernel ...(__global float const * const a, __global float const * const b)
{
 // ...
 
 x = f( access_func( a, b, i ) );
 
 // ...
}

When I change the kernel as follows, it runs much faster:

// fast
__kernel ...(__global float const * const a, __global float const * const b)
{
 // ...
 
 x = a[ i ] * b[ i ];
 
 // ...
}

Is there a way to get the compiler to perform this optimization automatically? The NVIDIA compiler appears to do it, since I see no runtime difference between the two versions on an NVIDIA GPU.

Thanks in advance!

Jeffrey_M_Intel1
Employee

As I understand your code, the issue seems to be at least partly about the access function. Could we summarize your request as that you are looking for better inlining of address calculations instead of executing access_func for each work item?

Richard_S_7
Beginner

Jeffrey M. (Intel) wrote:

Could we summarize your request as that you are looking for better inlining of address calculations instead of executing access_func for each work item?

Yes, that is correct. The code has to be written this way because it is generated automatically. As I mentioned in my first post, the NVIDIA compiler is able to perform this optimization. Could it perhaps be enabled through additional keywords?
