OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be ask in the FPGA Intel® High Level Design forum.

Counting algorithm

Polar01
Beginner
123 Views

Hi,

I have the following situations :

a) an array of 'int' where each int contains a 'flag' (a value defined by a bit)

b) the set is sorted by 'flag'

I use the following kernel to count the number of 'type of flag'. To do this I simply detect where the flag differ and put this value back to a structure.

Sometimes it works, sometimes not !

typedefstruct
{
uintlastId;
uinttasksCount_CreateCameraRays;
uinttasksCount_Trace;
uinttasksCount_Shade;
uinttasksCount_Shade_AO;
uinttasksCount_ImageReconstruction;
}clTasksTypes;

//ALSODEFINEDIN'Types.cl'
#defineTASK_HASH_CREATE_CAMERA_RAY((uint)1<<24)
#defineTASK_HASH_TRACE((uint)2<<24)
#defineTASK_HASH_SHADE((uint)4<<24)
#defineTASK_HASH_SHADE_AO((uint)8<<24)
#defineTASK_HASH_RECONSTRUCT_IMAGE((uint)16<<24)


//#defineIS_FLAG_SET(VAL,FLAG)(VAL&FLAG)
//#defineIS_FLAG_SET(VAL,FLAG)((VAL&FLAG)==FLAG)
#defineIS_FLAG_SET(VAL,FLAG)((VAL&FLAG)!=0)

__kernel
voidkernel__countTasksTypes(
__globaluint*indices,
__globalclTasksTypes*tasksTypes,
uinttasksCount
)
{
size_tgid=get_global_id(0);

if(gid<1)
{
tasksTypes->tasksCount_CreateCameraRays=0;
tasksTypes->tasksCount_Trace=0;
tasksTypes->tasksCount_Shade=0;
tasksTypes->tasksCount_Shade_AO=0;
tasksTypes->tasksCount_ImageReconstruction=0;
barrier(CLK_GLOBAL_MEM_FENCE);
return;
}

barrier(CLK_GLOBAL_MEM_FENCE);

if(gid>=tasksCount)
return;

//Savethelasthash/keyvalue
if(gid==tasksCount-1)
tasksTypes->lastId=indices[(tasksCount-1)*2];

constuinthashKey1=indices[(gid*2)-2];
constuinthashKey2=indices[gid*2];
if(hashKey1!=hashKey2)
{
if(IS_FLAG_SET(hashKey1,TASK_HASH_CREATE_CAMERA_RAY))
tasksTypes->tasksCount_CreateCameraRays=gid;
elseif(IS_FLAG_SET(hashKey1,TASK_HASH_TRACE))
tasksTypes->tasksCount_Trace=gid;
elseif(IS_FLAG_SET(hashKey1,TASK_HASH_SHADE))
tasksTypes->tasksCount_Shade=gid;
elseif(IS_FLAG_SET(hashKey1,TASK_HASH_SHADE_AO))
tasksTypes->tasksCount_Shade_AO=gid;
elseif(IS_FLAG_SET(hashKey1,TASK_HASH_RECONSTRUCT_IMAGE))
tasksTypes->tasksCount_ImageReconstruction=gid;
}
}

There is only one special case, handled on the host side, it is for the last type of flag.

constunsignedinttasksCount=_tasksBatchSize;
_kernel__CountTasksTypes->setArg(0,*_clBuffer_TasksIndices);
_kernel__CountTasksTypes->setArg(1,*_clBuffer_CountTasksTypes);
_kernel__CountTasksTypes->setArg(2,tasksCount);

_queue->enqueueNDRangeKernel(*_kernel__CountTasksTypes,cl::NullRange,cl::NDRange(globalWork),cl::NDRange(_workGroupSize_kernel__CountTasksTypes),0,0);

_queue->enqueueReadBuffer(*_clBuffer_CountTasksTypes,CL_TRUE,0,sizeof(clTasksTypes),&_cpuBuffer_TasksTypes);

//Computethelastcount
unsignedinttotalCount=_cpuBuffer_TasksTypes.tasksCount_CreateCameraRays+_cpuBuffer_TasksTypes.tasksCount_Trace+_cpuBuffer_TasksTypes.tasksCount_Shade+_cpuBuffer_TasksTypes.tasksCount_Shade_AO+_cpuBuffer_TasksTypes.tasksCount_ImageReconstruction;
if(_cpuBuffer_TasksTypes.lastId&TASK_HASH_CREATE_CAMERA_RAY)
_cpuBuffer_TasksTypes.tasksCount_CreateCameraRays=_tasksBatchSize-totalCount;
elseif(_cpuBuffer_TasksTypes.lastId&TASK_HASH_TRACE)
_cpuBuffer_TasksTypes.tasksCount_Trace=_tasksBatchSize-totalCount;
elseif(_cpuBuffer_TasksTypes.lastId&TASK_HASH_SHADE)
_cpuBuffer_TasksTypes.tasksCount_Shade=_tasksBatchSize-totalCount;
elseif(_cpuBuffer_TasksTypes.lastId&TASK_HASH_SHADE_AO)
_cpuBuffer_TasksTypes.tasksCount_Shade_AO=_tasksBatchSize-totalCount;
elseif(_cpuBuffer_TasksTypes.lastId&TASK_HASH_RECONSTRUCT_IMAGE)
_cpuBuffer_TasksTypes.tasksCount_ImageReconstruction=_tasksBatchSize-totalCount;

I can't find why it does not work, it should be simple no ?

Any suggestion is welcomed, thanks

0 Kudos
5 Replies
Doron_S_Intel
Employee
123 Views
Hey,

I don't know whether it relates to your problem, but your code is illegal. OpenCL requires that if any work item in a work group encounters a barrier instruction, all work items in that workgroup encounter the same barrier instruction. The first if statement (gid < 1) violates this assuming your work-group size is larger than one - other work items will still encounter "a" barrier, but not the same one.

Doron Singer
Polar01
Beginner
123 Views
Thanks,

no it is not the problem :P

About the barrier, it is the case because, either gid <1 and I have a barrier and return, either I have a barrier. So, finally all the workitems have a barrier.

Krys
Doron_S_Intel
Employee
123 Views
My understanding of the spec is that it's not enough that all work items encounter "a barrier", but rather they must all encounter "the barrier", as in the exact same instruction on the exact same line. Please re-visit the spec and let me know if you disagree.
Polar01
Beginner
123 Views
Honnestly,
I don't know... I have read the opencl spec. and it is not really clear ! If it is the case, I'm not sure it is doable on some GPU architecture...
Anyway it limit the way we build our algorithms ! I think that it is something that MUST be defined in the specification... they should update the spec to gives more details ! Maybe you have some contact with peoples at the Khronos group ?
Thx
Doron_S_Intel
Employee
123 Views
The Khronos group have message boards on their webpage. This looks like a good place to ask:
http://www.khronos.org/message_boards/viewforum.php?f=28
Reply