Re: Can a global variable be shared between two kernels in the same aocx file?

Altera_Forum · ‎12-05-2013

Say that, in the same aocx file, one kernel (which the host calls first) writes to a global variable, while another kernel (which the host calls second) reads from it. Is that possible?

Altera_Forum · ‎12-05-2013

If by global you mean file scope variables, the answer is yes, but there would be no synchronization between the kernels accessing that shared data. If the kernels operate sequentially then there would be no problem, but if they operate concurrently then you run the risk of a data hazard.

I could write more, but if you describe what you are trying to achieve I can probably describe a safer method to share data among kernels.

Altera_Forum · ‎12-05-2013

Correct, I mean file scope global variables. Do they stay in DDR? I want to share a lot of intermediate results between two kernels that are executed sequentially, so the shared data will be safe. But I do not want to share the intermediate results via the CPU. Another question: if the two kernels access the data simultaneously, do you plan to use atomic operations to share data?

--- Quote Start ---

If by global you mean file scope variables, the answer is yes, but there would be no synchronization between the kernels accessing that shared data. If the kernels operate sequentially then there would be no problem, but if they operate concurrently then you run the risk of a data hazard.

I could write more, but if you describe what you are trying to achieve I can probably describe a safer method to share data among kernels.

--- Quote End ---

Altera_Forum · ‎12-05-2013

You might want to put the data into a named section and get the linker script(s) to assign it to a fixed address.

You'll need to worry about cache coherency.

Also there are no atomic RMW cycles on the Avalon bus, so you need to use something else to ensure correct sequencing of accesses.

Altera_Forum · ‎12-05-2013

I think a linker script is only needed when several files are involved. Even if a named section is created, how can I guarantee that the host does not use that section when it calls the device malloc function?

--- Quote Start ---

You might want to put the data into a named section and get the linker script(s) to assign it to a fixed address.

--- Quote End ---

According to the "Altera SDK for OpenCL Programming Guide", atomic operations are supported. Is that true?

--- Quote Start ---

Also there are no atomic RMW cycles on the Avalon bus, so you need to use something else to ensure correct sequencing of accesses.

--- Quote End ---

If the kernels write to and read from the DDR directly and sequentially, will coherence be preserved?

--- Quote Start ---

You'll need to worry about cache coherency.

--- Quote End ---

Altera_Forum · ‎12-05-2013

A __global buffer that is passed to both kernels can achieve this. Let's say I wanted to pass the results from "Kernel A" to "Kernel B". I would allocate a __global buffer using clCreateBuffer and pass that buffer to both kernels. Kernel A writes to the buffer and Kernel B reads from it. Since they both use the same buffer, there is no need to copy the data up to the host and then back down to the target. By passing the buffer I mean sending it as an argument to both kernels. You would not need to enqueue a buffer data movement because you are just using it as a scratchpad between kernels.
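To make the shared-buffer idea concrete, here is a minimal sketch of what the two kernels might look like. The kernel names, argument order, and the arithmetic are made up for illustration, and both kernels are written as single work-item loops. The #ifndef block is only a host-side shim (an assumption, not part of any SDK) so the same source can be desk-checked with an ordinary C compiler.

```c
/* Host-compilation shim (assumption, for desk-checking only):
 * an OpenCL C compiler defines __OPENCL_VERSION__ and skips this. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical Kernel A: writes intermediate results into the scratch buffer. */
__kernel void kernel_a(__global const int *in, __global int *scratch, int n)
{
    for (int i = 0; i < n; i++)
        scratch[i] = in[i] * 2;   /* placeholder computation */
}

/* Hypothetical Kernel B: reads what Kernel A left in the same buffer. */
__kernel void kernel_b(__global const int *scratch, __global int *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = scratch[i] + 1;  /* placeholder computation */
}
```

On the host, the single cl_mem returned by clCreateBuffer would be set with clSetKernelArg as argument 1 of kernel_a and as argument 0 of kernel_b, and the two kernels would be enqueued one after the other. No clEnqueueReadBuffer or clEnqueueWriteBuffer call is needed for the scratch buffer itself.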

If you add software pipelining into the mix so that you can operate both kernels concurrently then you would want to have multiple buffers and manage the scheduling on the host side so that you don't have Kernel B trying to read from the same buffer that Kernel A is still writing to.
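As a rough illustration of the host-side scheduling, a two-buffer ("ping-pong") scheme can be reduced to choosing buffer indices per pipeline step. The helper names and the choice of two buffers are assumptions for this sketch. A real host program would still need events or queue ordering to make sure each kernel has finished with a buffer before it is reused.

```c
/* Assumption: two scratch buffers are enough for a two-stage pipeline. */
#define NUM_BUFFERS 2

/* Index of the buffer Kernel A fills during pipeline step `step` (0, 1, 2, ...). */
static int write_slot(int step)
{
    return step % NUM_BUFFERS;
}

/* Index of the buffer Kernel B drains during the same step: the one
 * Kernel A filled in the previous step. Only meaningful for step >= 1. */
static int read_slot(int step)
{
    return (step - 1) % NUM_BUFFERS;
}
```

With this indexing, Kernel A and Kernel B never touch the same buffer within one step, which is the property the host scheduling has to guarantee.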

Altera_Forum · ‎12-08-2013

Thanks.

Can we pass the results from "Kernel A" to "Kernel B" via an internal buffer in the FPGA?

Can the private memory (implemented using FPGA registers) be shared by all work-items in the work-group?

Altera_Forum · ‎12-13-2013

Sorry about the late response. Passing data like that is outside of the spec; however, channels were introduced in version 13.1. They allow one kernel to push data into a channel and another kernel to pull the data out of it. If you go to the design example page and search for "kernel-to-kernel channels" you'll find examples that use this feature: http://www.altera.com/support/examples/opencl/opencl.html

Many of these examples use single work-item kernels to implement the algorithms more efficiently on an FPGA. When I think of using kernel-to-kernel channels I normally start thinking about single work-item kernels as well. The reason is that with the typical NDRange programming model work-items can be executed in any order, so data would be pushed into and popped out of the channel in that same unknown order. With single work-item kernels you can control the execution order because only one work-item is in flight, and the parallelism comes from loop pipelining instead.

That was probably a lot to chew on so I recommend taking a look at examples like FIR or FFT to learn more.

Altera_Forum · ‎04-15-2014

@BadOmen: Is there any official documentation on channels yet?

Altera_Forum · ‎04-15-2014

I just checked and it doesn't look like the official docs have it covered yet. These slides have some information but typically I just look at an example when I need to remember how to use the API: http://www.altera.com/support/examples/download/exm_opencl_asian_option_opencl_fpga.pdf

The API is fairly easy to use: you create a channel, and there are functions for writing data into the channel and reading data from it. The main thing to remember when using channels is that you can only write to a channel from one location in your code, and only read from it from one location. For example, you can't do something like this:

    if (x > 5)
        write_channel_altera(MY_CHANNEL, x);
    else
        write_channel_altera(MY_CHANNEL, (x + 5));

Instead you would write the code like this to ensure there is only one location writing to the channel:

    if (x > 5)
        temp_var = x;
    else
        temp_var = x + 5;

    write_channel_altera(MY_CHANNEL, temp_var);
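For context, here is a hedged sketch of how that single write site could sit inside a producer/consumer pair of single work-item kernels. The kernel names and arguments are hypothetical. The channel declaration, the cl_altera_channels pragma, and read_channel_altera follow my understanding of the Altera channels extension, so check them against the design examples. The #else branch is only a host-side stand-in (an assumption, not part of the SDK) that replaces the channel with a small FIFO so the logic can be desk-checked with an ordinary C compiler.

```c
#ifdef __OPENCL_VERSION__
#pragma OPENCL EXTENSION cl_altera_channels : enable
channel int MY_CHANNEL;            /* channel name taken from the post above */
#else
/* Host-only stand-in (assumption): a tiny FIFO replaces the channel. */
#define __kernel
#define __global
static int fifo[64];
static int fifo_head, fifo_tail;
#define write_channel_altera(ch, v) (fifo[fifo_tail++] = (v))
#define read_channel_altera(ch)     (fifo[fifo_head++])
#endif

/* Hypothetical producer: one loop, one write site for the channel. */
__kernel void producer(__global const int *in, int n)
{
    for (int i = 0; i < n; i++) {
        int x = in[i];
        int temp_var = (x > 5) ? x : x + 5;
        write_channel_altera(MY_CHANNEL, temp_var);  /* the only write site */
    }
}

/* Hypothetical consumer: one loop, one read site for the channel. */
__kernel void consumer(__global int *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = read_channel_altera(MY_CHANNEL);    /* the only read site */
}
```

On real hardware both kernels would typically need to be in flight at the same time (for example, enqueued on separate command queues), since the producer can stall once the channel is full and the consumer stalls while it is empty.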