- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I'm trying to write a basic example of a systolic array. It is essentially a chain of processing elements connected through channels: the first one read data from memory and inject it into the chain. Each of the internal PE read the data from the previous neighbor, increments it by one and forward to the following neighbor. At the end, data is written into memory.
The code is the following. I have used the num_compute_units to have hardware replication.
#pragma OPENCL EXTENSION cl_intel_channels : enable
#define N 16
channel float chan[N-1];
//reads 10 elements and inject into channel
void start_pe(__global const float *A){
for(int i=0;i<10;i++)
write_channel_intel(chan[0],A[0]);
}
//read the element, increments and then pass it to the next one
void generic_pe(const int idx){
for(int i=0;i<10;i++){
float el=read_channel_intel(chan[idx-1]);
el++;
write_channel_intel(chan[idx],el);
}
}
//receives the elements and write them in memory
void end_pe(__global float * B){
for(int i=0;i<10;i++) {
float el=read_channel_intel(chan[N-2]);
B[i]=el;
}
}
__attribute__((max_work_group_size(64,1,1)))
__attribute__((num_compute_units(N)))
__kernel void chain(__global const float * restrict A,__global float * restrict B)
{
const size_t gid = get_global_id(0);
if(gid==0)
start_pe(A);
else
if(gid==N-1)
end_pe(B);
else
generic_pe(gid);
}
Each PEs, loops over 10 elements.
With Quartus 18.0, emulation is ok, but when I try to compile for the arria10, the compilations stops immediately by stating:
Compiler Error: Indexing into channel array chan could not be resolved to all constants
and indicate the line 16.
It looks strange to me, since the index is indeed constant.
In the Programming Guide it is written that "channels extension does not support dynamic indexing into arrays of channel IDs" and to use static indexes.
However, in the same guide (under "Using Channels with Kernel Copies"), it seems that channel arrays indexed with the return value of global id are valid.
Do you have any idea of the source of the problem?
Thanks
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You seem to be confusing replication of NDRange kernels with Single Work-item ones. The former is completely controlled by the compiler and you CANNOT use it to design a systolic array since you cannot customize the kernel copies. What the guide is referring to is replication of Single Work-item kernels where you can then use the get_compute_id() function to get a compile-time constant ID for each kernel copy and use it to customize each copy and its channel connections. In your case the get_global_id() returns a run-time variable value which cannot be used as channel index.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, thanks. I should use the get_compute_id!
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page