Error Using an OpenCL Precompiled Kernel

tamara_s_ · 10-09-2017

I am attempting to use TBB with an OpenCL-enabled FPGA, but I am running into the following error:
"Error: Global work offset has been specified. That is an OpenCL 1.1 feature not supported by this version of the Intel(R) FPGA SDK for OpenCL(TM)
Failed to enqueue a kernel; error code: -56
TBB Warning: Exact exception propagation is requested by application but the linked library is built without support for it
terminate called after throwing an instance of 'tbb::captured_exception'
what(): Unidentified exception
Aborted (core dumped)"

I'm not sure what could be throwing this error; any suggestions would be greatly appreciated. I am using TBB 2017 Update 2. Thanks.

TBB is able to detect the emulated FPGA as an OpenCL device, but an error occurs when it attempts to call the kernel k1. The kernel simply reads data from an OpenCL buffer and prints it to the screen. The .cl file used to generate the precompiled .aocx file is:

__kernel void k1(__global int *temp_int){
    const int index = get_global_id(0);
    printf("output = %d\n",temp_int[index]);
}

The main.cpp file is:

#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#define TBB_PREVIEW_FLOW_GRAPH_FEATURES 1
#include "tbb/tbb_config.h"
#include <cstdio>
#include "../common/utility.h"
#if __TBB_PREVIEW_OPENCL_NODE
#if _MSC_VER
#pragma warning(disable : 4503)
#endif

#include "CL/opencl.h"
//#include "tbb/flow_graph.h"
#include "tbb/flow_graph_opencl_node.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/tick_count.h"
#include "tbb/tbb_thread.h"

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#if (__INTEL_COMPILER && __INTEL_COMPILER <=1500) || __clang__
#define BROKEN_FUNCTION_POINTER_DEDUCTION(...) __VA_ARGS__()
#else
#define BROKEN_FUNCTION_POINTER_DEDUCTION(...) __VA_ARGS__
#endif

#include "AOCLUtils/aocl_utils.h"
#include <algorithm>
#include <numeric>

using namespace tbb::flow;

int main(int argc, char *argv[]) {

    /*
     * Declare Graph
     */
    opencl_graph g;
    printf("\n**Querying Devices using TBB code**\n");
    tbb::flow::interface9::opencl_device_list tbb_devices = g.available_devices(); // requires -std=c++11 in the makefile's CXX flags
    std::cout << "Number of Opencl devices found using TBB code = " << tbb_devices.size() << std::endl;

    // CL_DEVICE_TYPE_ALL is a query mask, not a type a device reports, so test a specific type bit instead
    tbb::flow::interface9::opencl_device_list::const_iterator it = std::find_if(tbb_devices.cbegin(), tbb_devices.cend(), [](const tbb::flow::interface9::opencl_device &d){
            cl_device_type type;
            d.info(CL_DEVICE_TYPE,type);
            return (type & CL_DEVICE_TYPE_ACCELERATOR) != 0;
        });

    for(tbb::flow::interface9::opencl_device d: tbb_devices){
        std::cout << "Device: " << d.name() << std::endl;
        std::cout << " Platform: " << d.platform_name() << std::endl;
    }
    opencl_device_list chosenDevices;
    // Only one device is needed; use the first one found
    chosenDevices.add(*tbb_devices.cbegin());

    // Initialize Factory
    g.opencl_factory().init({ *tbb_devices.cbegin()});

    /*
     * Declare precompiled file to use
     */
    // EMULATOR BUILD
    opencl_program<> p(g,opencl_program_type::PRECOMPILED,"fpga_emulator.aocx");

    /*
     * Nodes
     */
    typedef opencl_buffer<int> buffer_t; // buffer of ints passed to k1

    opencl_node< tuple<buffer_t>> k1(g,p.get_kernel("k1"));
    k1.set_range({BROKEN_FUNCTION_POINTER_DEDUCTION({1})}); // reference github test_opencl_node.cpp
    
    /*
     * Graph Configuration
     */
    split_node<tuple<buffer_t>> s(g);
    make_edge(output_port<0>(s),input_port<0>(k1));

    /*
     * Initialize Variables
     */
    buffer_t b(g,10);
    std::fill(b.begin(), b.end(), 1); // set all 10 elements of the buffer to 1

    /*
     * Start the graph
     */
    input_port<0>(k1).try_put(b);
    g.wait_for_all();

    return 0;
}

#endif /* __TBB_PREVIEW_OPENCL_NODE */

Nikita_P_Intel · 10-10-2017

Hi!

As for your error: according to the OpenCL 1.0 specification, "global_work_offset must currently be a NULL value." You can try setting this parameter to NULL in the clEnqueueNDRangeKernel call inside flow_graph_opencl_node.h and see what happens, but I suspect that will not be enough.

Currently, opencl_node does not support execution on FPGAs. We have not yet decided on a usage model or an API design for that. Could you please describe your experience and the scenario in which you use opencl_node and Flow Graph with an FPGA? It would really help us.

Thanks,

Nikita

tamara_s_ · 10-10-2017

Thanks, Nikita. I changed global_work_offset to NULL, and now the code appears to hang, but it no longer gives me the error it did before. Now that I know where to look, I'll keep working in flow_graph_opencl_node.h to see if I can get it working.

We are working on a problem that is better suited to a pipeline, which is why we prefer the FPGA.