Re: Ordering of channel operations

tde_m · ‎06-18-2019

Hello,

I'm currently struggling with enforcing the order of write/read to channels.

I know from the Intel programming guide that the compiler may reorder two independent channel operations in order to generate more efficient hardware.

However, this seems to occur even if there is a clear dependence. I've created a minimal working example for this:

#pragma OPENCL EXTENSION cl_intel_channels : enable
 
//message
typedef struct{
	bool request;
	int data;
}message_t;
 
//represents the status of the computation
typedef struct{
	bool start;
	message_t m;
}computation_t;
 
channel message_t channels[2] __attribute__((depth(2)));
 
// Auxiliary function for receiving data
void receive(computation_t *status, int *data){
    if(status->start){
        //at the beginning send the request for data
        write_channel_intel(channels[0],status->m);
        status->start=false;
    }
    //receive the data and store it
    status->m=read_channel_intel(channels[1]);
    *data=status->m.data;
}
 
__kernel void comp(const int N, const int start, __global int *mem){
    int data;
    computation_t status;
    status.start=true;
    status.m.data=N;
    for(int i=0;i<N;i++)
    {
        //receive data, increment and store it to memory
        receive(&status,&data);
        data++;
        mem[i]=data;
    }
 
}
 
 
//generates a stream of data upon request
__kernel void generator(){
	//receive the request
	message_t m=read_channel_intel(channels[0]);
	for(int i=0;i<m.data;i++)
	{
            message_t send;
            send.data=i;
            send.request=false;
            write_channel_intel(channels[1],send);
	}
}

The "comp" kernel consists of a pipelined loop in which it receives data from the "generator" kernel through the "receive" function. On the first iteration, it sends a request to the generator so that the generator produces the right amount of data.

If I try to compile this, the channel operations of the "receive" function are reordered, as the compilation report shows.

This occurs even if there is a clear dependency between the two.

Clearly, if the hardware executes the read first, this leads to a deadlock: "comp" blocks reading channels[1] while "generator" is still waiting for the request on channels[0].

This happens with Quartus 18.1 and 19.1 (Stratix 10 as target board).

In your opinion, is this a compiler bug, or do I have to handle this in a different way?

Thanks

HRZ · ‎06-22-2019

Regarding the channel reordering, I think I now understand that the compiler always detaches channel operations from other read/write operations and uses extra registers (register renaming?) to handle dependencies such as the one discussed here, which makes sense. Hence, in this case, if a cycle of channels did not exist, the channel operations in the "receive" function would still have been reordered, but no data corruption would have happened because the dependency is handled using extra registers. However, because of the cycle of channels combined with the channel reordering, a deadlock happens at run time unless channel ordering is enforced using mem_fence.
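
For readers landing here, a minimal sketch of what enforcing the order with mem_fence could look like for the "receive" function from the original post. It assumes the CLK_CHANNEL_MEM_FENCE flag described in the Intel FPGA SDK for OpenCL programming guide. Placing the fence right after the conditional write is my assumption and not something confirmed in this thread. The type and channel declarations are copied from the original post so the fragment is self-contained.

```c
#pragma OPENCL EXTENSION cl_intel_channels : enable

// Same declarations as in the original post
typedef struct { bool request; int data; } message_t;
typedef struct { bool start; message_t m; } computation_t;

channel message_t channels[2] __attribute__((depth(2)));

// Auxiliary function for receiving data, with an explicit channel fence
void receive(computation_t *status, int *data) {
    if (status->start) {
        // On the first call, send the request for data
        write_channel_intel(channels[0], status->m);
        status->start = false;
    }
    // Assumed fix: ask the compiler to keep the channel write above
    // ordered before the channel read below
    mem_fence(CLK_CHANNEL_MEM_FENCE);
    // Receive the data and store it
    status->m = read_channel_intel(channels[1]);
    *data = status->m.data;
}
```

Whether the reordering is gone should be checked in the compilation report after rebuilding, since the fence may also affect how the loop in "comp" is pipelined.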

Still, since I also thought all this time that channel reordering would not happen when data dependencies are involved, I would say the relationship between channel ordering and data dependencies could be very confusing for people who do not come across this thread, and it would be best to explain it somewhere in the documentation.

Douglas_P_Intel · ‎06-24-2019

I agree that the documentation needs to be clarified regarding channel ordering. I will request clarification in the documents.