Intel® oneAPI Base Toolkit
Support for the core tools and libraries within the base toolkit that are used to build and deploy high-performance data-centric applications.

range<2> in DPC++

nnain1
New Contributor I

Can I declare a range<2> variable without initializing it, like this?

range<2> optimalLocalSize;

My code fails to compile with the error "call to deleted constructor of range<2>".

I am compiling the following code.

 

 

#include <CL/sycl.hpp>

#include <cmath>
#include <iostream>

  /* These public-domain headers implement useful image reading and writing
   * functions. */
#ifdef _MSC_VER
typedef unsigned int uint;
#endif
#include "stb/stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb/stb_image_write.h"

class fillGaussian;
class GaussianKernel;

using namespace cl::sycl;
/* It is possible to refer to the enum name in these using statements, used
 * here to make referencing the members more convenient (for example). */
using co = cl::sycl::image_channel_order;
using ct = cl::sycl::image_channel_type;

/* Attempts to determine a good local size. The OpenCL implementation can
 * do the same, but the best way to *control* performance is to choose the
 * sizes. The method here is to choose the largest number, leq 64, which is
 * a power-of-two, and divides the global work size evenly. In this code,
 * it might prove most optimal to pad the image along one dimension so that
 * the local size could be 64, but this introduces other complexities. */
range<2> get_optimal_local_range(cl::sycl::range<2> globalSize,
    cl::sycl::device d) {
    range<2> optimalLocalSize;
    /* 64 is a good local size on GPU-like devices, as each compute unit is
     * made of many smaller processors. On non-GPU devices, 4 is a common vector
     * width. */
    if (d.is_gpu()) {
        optimalLocalSize = range<2>(64, 1);
    }
    else {
        optimalLocalSize = range<2>(4, 1);
    }
    /* Here, for each dimension, we make sure that it divides the global size
     * evenly. If it doesn't, we try the next lowest power of two. Eventually
     * it will reach one, if the global size has no power of two component. */
    for (int i = 0; i < 2; ++i) {
        while (globalSize[i] % optimalLocalSize[i]) {
            optimalLocalSize[i] = optimalLocalSize[i] >> 1;
        }
    }
    return optimalLocalSize;
}

int main(int argc, char* argv[]) {
    /* The image dimensions will be set by the library, as will the number of
     * channels. However, passing a number of channels will force the image
     * data to be returned in that format, regardless of what the original image
     * looked like. The header has a mapping from int values to types - 4 means
     * RGBA. */
    int inputWidth, inputHeight, inputChannels;
    /* The data is returned as an unsigned char *, but due to OpenCL
     * restrictions, we must use it as a void *. Data is deallocated on program
     * exit. */
    const int numChannels = 4;
    void* inputData = nullptr;
    void* outputData = nullptr;

    if (argc < 2) {
        std::cout
            << "Please provide a JPEG or PNG image as an argument to this program."
            << std::endl;
        return 1;
    }

    inputData = stbi_load(argv[1], &inputWidth, &inputHeight, &inputChannels,
        numChannels);
    if (inputData == nullptr) {
        std::cout << "Failed to load image file (is argv[1] a valid image file?)"
            << std::endl;
        return 1;
    }
    outputData = new char[inputWidth * inputHeight * numChannels];

    const float pi = std::atan(1) * 4;
    static constexpr auto stddev = 2;

    /* This range represents the full amount of work to be done across the
     * image. We dispatch one thread per pixel. */
    range<2> imgRange(inputHeight, inputWidth);
    /* This is the range representing the size of the blur. */
    range<2> gaussianRange(6 * stddev, 6 * stddev);
    queue myQueue([](cl::sycl::exception_list l) {
        for (auto ep : l) {
            try {
                std::rethrow_exception(ep);
            }
            catch (const cl::sycl::exception& e) {
                std::cout << "Async exception caught:\n" << e.what() << "\n";
                throw;
            }
        }
        });

    {
        buffer<float, 2> gaussian(gaussianRange);
        myQueue.submit([&](cl::sycl::handler& cgh) {
            auto globalGaussian =
                gaussian.get_access<access::mode::discard_write>(cgh);
            cgh.parallel_for<fillGaussian>(gaussianRange, [=](cl::sycl::item<2> i) {
                auto x = i[0] - 3 * stddev, y = i[1] - 3 * stddev;
                auto elem =
                    cl::sycl::exp(-1.f * (x * x + y * y) / (2 * stddev * stddev)) /
                    (2 * pi * stddev * stddev);
                globalGaussian[i] = elem;
                });
            });

        /* Images need a void * pointing to the data, and enums describing the
         * type of the image (since a void * carries no type information). It
         * also needs a range which describes the image's dimensions. */
        image<2> image_in(inputData, co::rgba, ct::unorm_int8, imgRange);
        image<2> image_out(outputData, co::rgba, ct::unorm_int8, imgRange);

        myQueue.submit([&](handler& cgh) {
            /* The nd_range contains the total work (as mentioned previously) as
             * well as the local work size (i.e. the number of threads in the local
             * group). Here, we attempt to find a range close to the device's
             * preferred size that also divides the global size neatly. */
            auto r = get_optimal_local_range(imgRange, myQueue.get_device());
            auto myRange = nd_range<2>(imgRange, r);
            /* Images still require accessors, like buffers, except the target is
             * always access::target::image. */
            accessor<float4, 2, access::mode::read, access::target::image> inPtr(
                image_in, cgh);
            accessor<float4, 2, access::mode::write, access::target::image> outPtr(
                image_out, cgh);
            auto globalGaussian = gaussian.get_access<access::mode::read>(cgh);
            /* The sampler is used to map user-provided co-ordinates to pixels in
             * the image. */
            sampler smpl(coordinate_normalization_mode::unnormalized,
                addressing_mode::clamp, filtering_mode::nearest);

            cgh.parallel_for<GaussianKernel>(myRange, [=](nd_item<2> itemID) {
                float4 newPixel = float4(0.0f, 0.0f, 0.0f, 0.0f);
                constexpr auto offset = 3 * stddev;

                for (int x = -offset; x < offset; x++) {
                    for (int y = -offset; y < offset; y++) {
                        auto inputCoords =
                            int2(itemID.get_global_id(1) + x, itemID.get_global_id(0) + y);
                        newPixel += inPtr.read(inputCoords, smpl) *
                            globalGaussian[y + offset][x + offset];
                    }
                }

                auto outputCoords =
                    int2(itemID.get_global_id(1), itemID.get_global_id(0));
                newPixel.w() = 1.f;
                outPtr.write(outputCoords, newPixel);
                });
            });
        myQueue.wait_and_throw();
    }

    /* Attempt to change the name from x.png or x.jpg to x-blurred.png and so
     * on. If the code cannot find a '.', it simply appends "-blurred" to the
     * name. */
    std::string outputFilePath;
    std::string inputName(argv[1]);
    auto pos = inputName.find_last_of(".");
    if (pos == std::string::npos) {
        outputFilePath = inputName + "-blurred";
    }
    else {
        std::string ext = inputName.substr(pos, inputName.size() - pos);
        inputName.erase(pos, inputName.size());
        outputFilePath = inputName + "-blurred" + ext;
    }

    stbi_write_png(outputFilePath.c_str(), inputWidth, inputHeight, numChannels,
        outputData, 0);

    std::cout << "Image successfully blurred!\n";
    return 0;
}

 

 

1 Solution
RahulV_intel
Moderator

Hi Nyan,

Your code needs one small correction. You declared range<2> optimalLocalSize without initializing it. The default constructor of range<2> is deleted, so a declaration with no arguments is rejected with the "call to deleted constructor" error.

The correction is as follows:

range<2> optimalLocalSize(0, 0); // instead of "range<2> optimalLocalSize;" on line number 32

Attaching modified code for your reference. Let us know if you are able to compile/run it.
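
As an illustration of the same fix outside SYCL, here is a minimal sketch in plain C++. Range2 is a hypothetical stand-in for range<2> (it is not part of SYCL), and the isGpu flag is an assumed replacement for the d.is_gpu() query, so the snippet builds without a SYCL toolchain. Instead of a placeholder (0, 0) value, it initializes the variable directly from the device check, which is another way to avoid the deleted default constructor.

```cpp
#include <cstddef>

// Hypothetical stand-in for range<2>: like the class in the question,
// it cannot be default-constructed.
struct Range2 {
    std::size_t dims[2];
    Range2() = delete;
    Range2(std::size_t d0, std::size_t d1) : dims{d0, d1} {}
    std::size_t& operator[](int i) { return dims[i]; }
    std::size_t operator[](int i) const { return dims[i]; }
};

// Same halving logic as get_optimal_local_range in the question, with a
// bool (an assumption for this sketch) in place of the device query.
Range2 optimal_local_range(const Range2& globalSize, bool isGpu) {
    // Range2 local;  // would not compile: call to deleted constructor
    Range2 local = isGpu ? Range2(64, 1) : Range2(4, 1);
    for (int i = 0; i < 2; ++i) {
        // Halve until the local size divides the global size evenly.
        // The loop always stops, because every size is divisible by 1.
        while (globalSize[i] % local[i]) {
            local[i] >>= 1;
        }
    }
    return local;
}
```

For example, a global size of (480, 640) on a GPU-like device gives a local size of (32, 1), because 64 does not divide 480 but 32 does.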

 

--Rahul

5 Replies
nnain1
New Contributor I

Thanks a lot.

RahulV_intel
Moderator

Hi Nyan,

Let us know if we can close this thread.

 

--Rahul

nnain1
New Contributor I

Yes, please. Thank you.

RahulV_intel
Moderator

Thanks for the confirmation, Nyan. 

 
