OpenCL* for CPU

Bizarre constant memory access for structure in Intel GPU

Khaled__Mahmoud

Hi all,

I am seeing strange behavior when accessing a location in the __constant address space through an array of structs.
I reduced the case to minimal C++ host code and OpenCL kernel code and attached both to this post.
Here is some background.
I have an OpenCL kernel with the following struct:

typedef struct __attribute__((packed)) buffer_1_struct {
	uint	s2d1;
	uint	s2d2;
	ulong	s2d3;
	char	s2d4[2];
	char	s2d5[2];
	ulong	s2d6;
	ulong	s2d7;
} struct2_t;

On the host side, I create an array of this structure, where each element is 36 bytes (packed), and pass it as a buffer to the kernel. In the attached files, I create the array with two elements.
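The 36-byte figure can be checked on the host. This is a minimal sketch, assuming a GCC- or Clang-compatible host compiler; host_struct2_t and s2d3_offset are hypothetical names that do not come from the attached code. OpenCL's uint and ulong are 32 and 64 bits wide, so fixed-width types stand in for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side mirror of the kernel's struct2_t. */
typedef struct __attribute__((packed)) host_struct2 {
    uint32_t s2d1;    /* byte offset  0 */
    uint32_t s2d2;    /* byte offset  4 */
    uint64_t s2d3;    /* byte offset  8 */
    char     s2d4[2]; /* byte offset 16 */
    char     s2d5[2]; /* byte offset 18 */
    uint64_t s2d6;    /* byte offset 20 */
    uint64_t s2d7;    /* byte offset 28 */
} host_struct2_t;

/* Fails at compile time if the packed layout is not 36 bytes. */
static_assert(sizeof(host_struct2_t) == 36, "packed element should be 36 bytes");

/* Byte offset of s2d3 inside element `index` of a tightly packed array. */
size_t s2d3_offset(size_t index) {
    return index * sizeof(host_struct2_t) + offsetof(host_struct2_t, s2d3);
}
```

With this layout, s2d3_offset(1) evaluates to 44, which is the 36+8 offset used in the byte-based access further down.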
When I read the second array element and access the struct member s2d3 (at byte offset 8) on the GPU, I get zero. This is how I normally access it:

((__constant struct2_t*)buffer02)[get_global_offset(2)].s2d3

The problem is observed when get_global_offset(2) = 1.


However, when I access it through a byte-based offset, I retrieve the data correctly on the GPU. Here is how I access it:

*((__constant ulong*)(((__constant char*)buffer02)+36+8))

Surprisingly, both expressions point to the same address and I cast both to the same pointer type, but the values read through them differ.
Here is what happens as an OpenCL code snippet:

#define STRUCT_2_SIZE (sizeof(struct2_t))
#define STRUCT_2_s2d3_idx (2*sizeof(uint))
......
printf("z-offset=%d\n",get_global_offset(2));
printf("struct-2-size=%d\n",STRUCT_2_SIZE);

__constant ulong* adr1 = ((__constant ulong*)(((__constant char*)buffer02)+STRUCT_2_SIZE+STRUCT_2_s2d3_idx));
__constant ulong* adr2 = &((__constant struct2_t*)buffer02)[get_global_offset(2)].s2d3;

printf("adr1=%d\n",adr1);
printf("adr2=%d\n",adr2);
		
if(adr1 == adr2)
	printf("The two addresses are equal !\n");
else
	printf("The two addresses are different !\n");

printf("val1=%lu\n",*adr1);
printf("val2=%lu\n",*adr2);

if(*adr1 == *adr2)
	printf("The two values are equal !\n");
else
	printf("The two values are different !\n");

 

The full code is attached and here is the output:

z-offset=1
struct-2-size=36
adr1=1770913836
adr2=1770913836
The two addresses are equal !
val1=5632
val2=0
The two values are different !
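One detail in this output may be worth checking: the two addresses are equal, but the printed value is not a multiple of 8, so the ulong read is not naturally aligned. The sketch below is plain C arithmetic on the printed number, with hypothetical helper names, and it assumes the %d output preserves the low bits of the address. It does not by itself establish that misalignment causes the differing values.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if `address` is a multiple of `alignment`, otherwise 0. */
int is_aligned(uint64_t address, uint64_t alignment) {
    assert(alignment != 0);
    return address % alignment == 0;
}

/* Base address of the buffer, given the address of s2d3 in element `index`.
   36 is the packed element size and 8 the offset of s2d3 within an element. */
uint64_t buffer_base(uint64_t s2d3_address, uint64_t index) {
    return s2d3_address - (index * 36u + 8u);
}
```

For the address printed above, is_aligned(1770913836, 8) is 0 while is_aligned(1770913836, 4) is 1. The base computed for index 1 is 1770913792, which is 8-byte aligned, so the misalignment comes from the 36-byte element stride rather than from the buffer itself.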

Some notes on when this happens:

1- It happens only on the GPU. If you run the attached code on the CPU, it works fine.
2- This code is in a kernel function (func2), and the problem happens only when I call certain other functions, in a certain sequence, before it. See the attached code.
3- The attached code shows the minimal case. Removing some of its lines makes the problem disappear.
4- I use SDK version 7.0.0.2511 on Windows 10, building against the x64 OpenCL library.
5- My machine has an Intel Core i5 6200U CPU (with integrated Intel® HD Graphics 520 GPU).

 

I hope someone from Intel can advise on this case or report it as a bug to be resolved.

 

Regards,

 

 

Michael_C_Intel1

Hi MahmoudK,

Thanks for the write-up. I wanted to dig into this a bit since the original post was so detailed. I reproduced the issue on Windows 10 with the integrated graphics of an Intel Core i5-6300U.

Changing __constant to __global allowed the kernel to execute successfully. Per the OpenCL C 2.0 spec, there is a maximum number of __constant arguments available to a kernel. That limit can be queried from the device at run time: https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/constant.html

CL_DEVICE_MAX_CONSTANT_ARGS for clGetDeviceInfo
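As a rough sketch of how such limits might be used, the helper below decides whether a set of buffers can be passed as __constant arguments. It assumes the two limits were already obtained from clGetDeviceInfo with CL_DEVICE_MAX_CONSTANT_ARGS (a cl_uint) and CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE (a cl_ulong). The function name and the fallback policy are hypothetical, and plain C types stand in for the OpenCL ones so the snippet compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every buffer can be passed as a __constant argument,
   0 if the caller should fall back to __global instead. */
int fits_in_constant(const uint64_t *buffer_sizes, size_t count,
                     uint32_t max_constant_args,
                     uint64_t max_constant_buffer_size) {
    assert(buffer_sizes != NULL || count == 0);

    /* Too many __constant arguments for one kernel. */
    if (count > max_constant_args)
        return 0;

    /* Each individual allocation must fit within the device limit. */
    for (size_t i = 0; i < count; ++i) {
        if (buffer_sizes[i] > max_constant_buffer_size)
            return 0;
    }
    return 1;
}
```

A host program could call this once per kernel after querying the device and choose the __global variant of the kernel when it returns 0.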

This is a spec difference between 1.2 and 2.0. I hope this leads to a workable and portable resolution for real code and helps developers moderate their __constant usage.

 

Sidebar: __attribute__ ((packed))

For other developers viewing this topic, please see the restriction against bit fields in OpenCL. Structs similar to the one in the original post may run contrary to that restriction.

Workaround approaches tend to involve manual padding, or shipping the raw data across and then masking and recomposing it.
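Both workarounds can be sketched in host-side C. The names below are hypothetical: padded_struct2_t adds four explicit padding bytes so that every 64-bit member is naturally aligned without the packed attribute, and read_u64_le recomposes a 64-bit value from raw bytes, assuming little-endian byte order in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Manual padding: same members as the packed struct, plus 4 pad bytes,
   so s2d6 and s2d7 land on 8-byte boundaries. */
typedef struct padded_struct2 {
    uint32_t s2d1;    /* byte offset  0 */
    uint32_t s2d2;    /* byte offset  4 */
    uint64_t s2d3;    /* byte offset  8 */
    char     s2d4[2]; /* byte offset 16 */
    char     s2d5[2]; /* byte offset 18 */
    char     pad[4];  /* byte offset 20, explicit padding */
    uint64_t s2d6;    /* byte offset 24 */
    uint64_t s2d7;    /* byte offset 32 */
} padded_struct2_t;

static_assert(sizeof(padded_struct2_t) == 40, "padded element should be 40 bytes");

/* Recomposition: build a 64-bit value from 8 little-endian bytes using
   shifts, so no misaligned 64-bit load is ever issued. */
uint64_t read_u64_le(const unsigned char *bytes) {
    uint64_t value = 0;
    for (int i = 7; i >= 0; --i)
        value = (value << 8) | bytes[i];
    return value;
}
```

With the padded layout the element stride is 40 bytes, so s2d3 in every array element stays 8-byte aligned as long as the buffer itself is. The trade-off is four extra bytes per element.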

Recommendation: avoid relying on undefined behavior outside of the specification, regardless of how consistent the behavior is!

 

-MichaelC.

 
