structs and compiler optimizations

Richard_S_7 · 06-03-2017

Hello,

I have a question concerning the usage of structs. My current kernel accesses two buffers using a struct in the following way:

struct pair {
    float first;
    float second;
};

inline const float f(const struct pair param) {
    return param.first * param.second;
}

inline const struct pair access_func(__global float const * const a, __global float const * const b, const int i) {
    struct pair res = {
            a[ i ],
            b[ i ]
    };
    return res;
}

// slow
__kernel ...(__global float const * const a, __global float const * const b)
{
 // ...
 
 x = f( access_func( a, b, i ) );
 
 // ...
}

When I alter the kernel in the following way it runs much faster:

// fast
__kernel ...(__global float const * const a, __global float const * const b)
{
 // ...
 
 x = a[ i ] * b[ i ];
 
 // ...
}

Is there a way to let the compiler do this optimization? The NVIDIA compiler seems to be able to do this, since I don't see a difference in runtime on a GPU.

Thanks in advance!

Jeffrey_M_Intel1 · 06-04-2017

As I understand your code, the issue seems to be at least partially about the access function. Could we summarize your request as that you are looking for better inlining of address calculations instead of executing access_func for each work item?
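To make sure we are talking about the same transformation, here is a minimal host-side sketch in plain C. It is not taken from the kernel in question: the `__global` qualifiers are dropped so it builds with an ordinary C compiler, the helpers are assumed to index both buffers with `i`, and the names `slow_form`, `fast_form`, and `check_forms` are hypothetical, introduced only for illustration. It shows the struct-based call chain next to the expression it should reduce to once both helpers are inlined.

```c
#include <assert.h>

struct pair {
    float first;
    float second;
};

/* Multiplies the two members, as f() does in the original post. */
static inline float f(const struct pair param) {
    return param.first * param.second;
}

/* Loads element i of both buffers into a temporary struct.
   Assumption: the address-space qualifiers are omitted for host C. */
static inline struct pair access_func(const float *a, const float *b,
                                      const int i) {
    struct pair res = { a[i], b[i] };
    return res;
}

/* Struct-based form, mirroring the "slow" kernel body (hypothetical name). */
static float slow_form(const float *a, const float *b, int i) {
    return f(access_func(a, b, i));
}

/* The expression the call chain should collapse to after inlining,
   mirroring the "fast" kernel body (hypothetical name). */
static float fast_form(const float *a, const float *b, int i) {
    return a[i] * b[i];
}

/* Asserts that both forms give the same value for the first n elements. */
static void check_forms(const float *a, const float *b, int n) {
    for (int i = 0; i < n; ++i) {
        assert(slow_form(a, b, i) == fast_form(a, b, i));
    }
}
```

Calling `check_forms(a, b, n)` on two small arrays confirms that both forms compute the same product, so any runtime difference would come from how the compiler handles the temporary struct and the two calls, not from the arithmetic itself.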

Richard_S_7 · 06-05-2017

Jeffrey M. (Intel) wrote:

Could we summarize your request as that you are looking for better inlining of address calculations instead of executing access_func for each work item?

Yes, that is correct. The code has to be written this way because it is generated automatically. As I mentioned in my first post, the NVIDIA compiler is able to do the optimization. Maybe the optimization can be enabled with additional keywords?