Intel® oneAPI Threading Building Blocks

opencl_node (or streaming_node) with more outputs than inputs



Is there a way to make an opencl_node (or a custom streaming_node) that has more output ports than input ports?

I have tried, but I cannot get the graph to execute: it seems to want a try_put() on the output ports as well before the kernel runs.

I have this example, which doesn't work:

    graph g;

    gpu_device_selector gpu_selector;

    opencl_program<> program("");

    opencl_node< tuple<opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>> > myopenclnode(g, program.get_kernel("clCopy2"), gpu_selector);

    join_node < tuple<opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>>> join_node(g);

    function_node< tuple<opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>> > myOutputWriter(g, unlimited, [](const tuple<opencl_buffer<cl_uchar>, opencl_buffer<cl_uchar>>& input) {
        opencl_buffer<cl_uchar> buffer1 = std::get<0>(input);
        opencl_buffer<cl_uchar> buffer2 = std::get<1>(input);

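        // Print both buffers as C strings (assumes they are NUL-terminated).
        printf("'%s' '%s'\r\n", reinterpret_cast<const char*>(buffer1.begin()), reinterpret_cast<const char*>(buffer2.begin()));
    });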

    make_edge(output_port<1>(myopenclnode), input_port<0>(join_node));
    make_edge(output_port<2>(myopenclnode), input_port<1>(join_node));

    make_edge(join_node, myOutputWriter);

    const char str[] = "Hello world";
    opencl_buffer<cl_uchar> a(sizeof(str));
    std::copy_n(str, sizeof(str), a.begin());

    opencl_buffer<cl_uchar> b(sizeof(str));
    opencl_buffer<cl_uchar> c(sizeof(str));

    myopenclnode.set_args(port_ref<0>(), b, c);



The kernel just copies argument 1 to arguments 2 and 3.

However, the kernel is never executed in this example.

If I do a try_put() on input_port<1> and input_port<2>, it works fine.


Hi Nikolaj,

The opencl_node implementation waits for input on every input port before it starts executing the kernel.

Let's look at the use case in a bit more detail. Since the kernel copies the first argument to the second and the third, the memory for the last two arguments also has to be provided somehow, right? Otherwise, how would the node know where to copy the data coming from the first parameter? Calling try_put() on every port is how you tell the node about all the memory it needs to execute its encapsulated kernel.
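To make that firing rule concrete, here is a minimal sketch in plain C++ with no TBB or OpenCL involved. It is only a toy model of the behaviour described above, not the actual opencl_node implementation; the names ToyKernelNode and as_text are hypothetical and exist only for this illustration. Each kernel argument gets one port, and the "kernel" (copy argument 0 into arguments 1 and 2) runs only once every port holds a buffer.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a three-argument kernel node (NOT the TBB implementation).
// Every kernel argument has its own port, and the kernel fires only once all
// ports have received a buffer.
class ToyKernelNode {
public:
    typedef std::vector<unsigned char> Buffer;

    ToyKernelNode() : fired_(false) { filled_.fill(false); }

    // Store a buffer on the given port and run the kernel if all ports are full.
    // Returns false for an out-of-range port or a port that is already occupied.
    bool try_put(std::size_t port, const Buffer& buf) {
        if (port >= kPorts || filled_[port]) return false;
        args_[port] = buf;
        filled_[port] = true;
        if (all_filled()) run_kernel();
        return true;
    }

    bool fired() const { return fired_; }

    // After the kernel has run, each port's buffer doubles as that port's output.
    const Buffer& output(std::size_t port) const { return args_[port]; }

private:
    static const std::size_t kPorts = 3;

    bool all_filled() const {
        for (std::size_t i = 0; i < kPorts; ++i)
            if (!filled_[i]) return false;
        return true;
    }

    // "Kernel": copy argument 0 into arguments 1 and 2, as in the question.
    void run_kernel() {
        assert(all_filled());
        for (std::size_t i = 1; i < kPorts; ++i) {
            const std::size_t n = std::min(args_[0].size(), args_[i].size());
            std::copy_n(args_[0].begin(), n, args_[i].begin());
        }
        fired_ = true;
    }

    std::array<Buffer, kPorts> args_;
    std::array<bool, kPorts> filled_;
    bool fired_;
};

// Helper for inspecting a buffer as text.
inline std::string as_text(const ToyKernelNode::Buffer& b) {
    return std::string(b.begin(), b.end());
}
```

In this model, putting a buffer only on port 0 leaves the node idle; it runs after ports 1 and 2 have also received their destination buffers, which matches what you observed with the real node.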

If you have a use case where the described logic does not apply, please tell us the details so we can better understand and discuss it.

Regards, Aleksei
