I am new to this forum. I am starting to design software in C based on CPU+GPU programming. The idea is to offload processing to the GPU when the tasks in some parts of the software become heavy. I read the Gen8 paper, which discusses CPU+GPU shared ring memory, and I am looking for information on how to program an application to use these features. Could you recommend APIs, debugging tools, and tools to visualize and investigate how the software behaves across the cache layers (L1, L2, L3)?
I am currently using a MacBook Air with the following device:
Device Intel(R) Core(TM) i5-4260U CPU @ 1.40GHz supports OpenCL 1.2