<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Debugging in DPC++ code in CPU in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162090#M239</link>
    <description>&lt;P&gt;I am exploring the code below and trying to debug it on the CPU device in Visual Studio 2019 (VS2019) on Windows 10.&lt;/P&gt;&lt;P&gt;I followed the instructions at this link: &lt;A href="https://software.intel.com/en-us/get-started-with-debugging-dpcpp-windows"&gt;https://software.intel.com/en-us/get-started-with-debugging-dpcpp-windows&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;(1) Opened the Registry Editor and set the data as shown:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="regedit.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/10682i3976A0BCCF5B4918/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="regedit.png" alt="regedit.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;(2) Started VS2019:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="vstd.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/10683i2B069A7D113A108C/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="vstd.png" alt="vstd.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;(3) Configured the project properties:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="general.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/10684i86FB8A0628493F9A/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="general.png" alt="general.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="debugging.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/10685i6E417D64334BF1BA/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="debugging.png" alt="debugging.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;However, a breakpoint set inside the parallel_for kernel is never hit.&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;cgh.parallel_for&amp;lt;fillGaussian&amp;gt;(gaussianRange, [=](cl::sycl::item&amp;lt;2&amp;gt; i) {
    auto x = i[0] - 3 * stddev, y = i[1] - 3 * stddev;
    auto elem = exp(-1.f * (x * x + y * y) / (2 * stddev * stddev)) / (2 * pi * stddev * stddev);
    globalGaussian[i] = elem;
});&lt;/PRE&gt;

&lt;P&gt;What could be the issue?&lt;/P&gt;
&lt;P&gt;The complete code is given below. The program also crashes at the parallel_for loop shown above. Why does it crash there?&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;#include &amp;lt;CL/sycl.hpp&amp;gt;
#include &amp;lt;cmath&amp;gt;
#include &amp;lt;iostream&amp;gt;
#ifdef _MSC_VER
typedef unsigned int uint;
#endif
#include "stb/stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb/stb_image_write.h"
class fillGaussian;
class GaussianKernel;
using namespace cl::sycl;
using namespace std;
/* It is possible to refer to the enum name in these using statements, used
 * here to make referencing the members more convenient (for example). */
using co = cl::sycl::image_channel_order;
using ct = cl::sycl::image_channel_type;
/* Attempts to determine a good local size. The OpenCL implementation can
 * do the same, but the best way to *control* performance is to choose the
 * sizes. The method here is to choose the largest number, leq 64, which is
 * a power-of-two, and divides the global work size evenly. In this code,
 * it might prove most optimal to pad the image along one dimension so that
 * the local size could be 64, but this introduces other complexities. */
range&amp;lt;2&amp;gt; get_optimal_local_range(cl::sycl::range&amp;lt;2&amp;gt; globalSize, cl::sycl::device d) {
    range&amp;lt;2&amp;gt; optimalLocalSize(0,0);
    /* 64 is a good local size on GPU-like devices, as each compute unit is
     * made of many smaller processors. On non-GPU devices, 4 is a common vector
     * width. */
    if (d.is_gpu()) {
        optimalLocalSize = range&amp;lt;2&amp;gt;(64, 1);
    }
    else {
        optimalLocalSize = range&amp;lt;2&amp;gt;(4, 1);
    }
    /* Here, for each dimension, we make sure that it divides the global size
     * evenly. If it doesn't, we try the next lowest power of two. Eventually
     * it will reach one, if the global size has no power of two component. */
    for (int i = 0; i &amp;lt; 2; ++i) {
        while (globalSize[i] % optimalLocalSize[i]) {
            optimalLocalSize[i] = optimalLocalSize[i] &amp;gt;&amp;gt; 1;
        }
    }
    return optimalLocalSize;
}

int main(int argc, char* argv[]) {
    /* The image dimensions will be set by the library, as will the number of
     * channels. However, passing a number of channels will force the image
     * data to be returned in that format, regardless of what the original image
     * looked like. The header has a mapping from int values to types - 4 means
     * RGBA. */
    int inputWidth, inputHeight, inputChannels;
    /* The data is returned as an unsigned char *, but due to OpenCL
     * restrictions, we must use it as a void *. Data is deallocated on program
     * exit. */
    const int numChannels = 4;
    void* inputData = nullptr;
    void* outputData = nullptr;

    /*if (argc &amp;lt; 2) {
        std::cout &amp;lt;&amp;lt; "Please provide a JPEG or PNG image as an argument to this program." &amp;lt;&amp;lt; std::endl;
    }*/

    inputData = stbi_load("SBLA3510014B.18128057.0.3_n.jpeg"/*argv[1]*/, &amp;amp;inputWidth, &amp;amp;inputHeight, &amp;amp;inputChannels, numChannels);
    if (inputData == nullptr) {
        std::cout &amp;lt;&amp;lt; "Failed to load image file (is argv[1] a valid image file?)" &amp;lt;&amp;lt; std::endl;
        return 1;
    }
    outputData = new char[inputWidth * inputHeight * numChannels];

    const float pi = atan(1) * 4;
    static constexpr auto stddev = 2;

    /* This range represents the full amount of work to be done across the
     * image. We dispatch one thread per pixel. */
    range&amp;lt;2&amp;gt; imgRange(inputHeight, inputWidth);
    /* This is the range representing the size of the blur. */
    range&amp;lt;2&amp;gt; gaussianRange(6 * stddev, 6 * stddev);
    queue myQueue([](cl::sycl::exception_list l) {
        for (auto ep : l) {
            try {
                std::rethrow_exception(ep);
            }
            catch (const cl::sycl::exception&amp;amp; e) {
                std::cout &amp;lt;&amp;lt; "Async exception caught:\n" &amp;lt;&amp;lt; e.what() &amp;lt;&amp;lt; "\n";
                throw;
            }
        }
    });

    {
        buffer&amp;lt;float, 2&amp;gt; gaussian(gaussianRange);
        myQueue.submit([&amp;amp;](cl::sycl::handler&amp;amp; cgh) {
            auto globalGaussian = gaussian.get_access&amp;lt;access::mode::discard_write&amp;gt;(cgh);
            cgh.parallel_for&amp;lt;fillGaussian&amp;gt;(gaussianRange, [=](cl::sycl::item&amp;lt;2&amp;gt; i) {
                auto x = i[0] - 3 * stddev, y = i[1] - 3 * stddev;
                auto elem = exp(-1.f * (x * x + y * y) / (2 * stddev * stddev)) / (2 * pi * stddev * stddev);
                globalGaussian[i] = elem;
            });
        });

        /* Images need a void * pointing to the data, and enums describing the
         * type of the image (since a void * carries no type information). It
         * also needs a range which describes the image's dimensions. */
        image&amp;lt;2&amp;gt; image_in(inputData, co::rgba, ct::unorm_int8, imgRange);
        image&amp;lt;2&amp;gt; image_out(outputData, co::rgba, ct::unorm_int8, imgRange);

        myQueue.submit([&amp;amp;](handler&amp;amp; cgh) {
            /* The nd_range contains the total work (as mentioned previously) as
             * well as the local work size (i.e. the number of threads in the local
             * group). Here, we attempt to find a range close to the device's
             * preferred size that also divides the global size neatly. */
            auto r = get_optimal_local_range(imgRange, myQueue.get_device());
            auto myRange = nd_range&amp;lt;2&amp;gt;(imgRange, r);
            /* Images still require accessors, like buffers, except the target is
             * always access::target::image. */
            accessor&amp;lt;float4, 2, access::mode::read, access::target::image&amp;gt; inPtr(image_in, cgh);
            accessor&amp;lt;float4, 2, access::mode::write, access::target::image&amp;gt; outPtr(image_out, cgh);
            auto globalGaussian = gaussian.get_access&amp;lt;access::mode::read&amp;gt;(cgh);
            /* The sampler is used to map user-provided co-ordinates to pixels in
             * the image. */
            sampler smpl(coordinate_normalization_mode::unnormalized, addressing_mode::clamp, filtering_mode::nearest);

            cgh.parallel_for&amp;lt;GaussianKernel&amp;gt;(myRange, [=](nd_item&amp;lt;2&amp;gt; itemID) {
                float4 newPixel = float4(0.0f, 0.0f, 0.0f, 0.0f);
                constexpr auto offset = 3 * stddev;

                for (int x = -offset; x &amp;lt; offset; x++) {
                    for (int y = -offset; y &amp;lt; offset; y++) {
                        auto inputCoords = int2(itemID.get_global_id(1) + x, itemID.get_global_id(0) + y);
                        newPixel += inPtr.read(inputCoords, smpl) * globalGaussian[y + offset][x + offset];
                    }
                }

                auto outputCoords = int2(itemID.get_global_id(1), itemID.get_global_id(0));
                newPixel.w() = 1.f;
                outPtr.write(outputCoords, newPixel);
            });
        });
        myQueue.wait_and_throw();
    }

    /* Attempt to change the name from x.png or x.jpg to x-blurred.png and so
     * on. If the code cannot find a '.', it simply appends "-blurred" to the
     * name. */
    string outputFilePath;
    string inputName("SBLA3510014B.18128057.0.3_n.jpeg"/*argv[1]*/);
    auto pos = inputName.find_last_of(".");
    if (pos == std::string::npos) {
        outputFilePath = inputName + "-blurred";
    }
    else {
        string ext = inputName.substr(pos, inputName.size() - pos);
        inputName.erase(pos, inputName.size());
        outputFilePath = inputName + "-blurred" + ext;
    }

    stbi_write_png(outputFilePath.c_str(), inputWidth, inputHeight, numChannels,
        outputData, 0);

    std::cout &amp;lt;&amp;lt; "Image successfully blurred!\n";
    return 0;
}
&lt;/PRE&gt;</description>
    <pubDate>Tue, 14 Apr 2020 15:49:20 GMT</pubDate>
    <dc:creator>nnain1</dc:creator>
    <dc:date>2020-04-14T15:49:20Z</dc:date>
    <item>
      <title>Debugging in DPC++ code in CPU</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162090#M239</link>
      <pubDate>Tue, 14 Apr 2020 15:49:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162090#M239</guid>
      <dc:creator>nnain1</dc:creator>
      <dc:date>2020-04-14T15:49:20Z</dc:date>
    </item>
    <item>
      <title>Hi Nyan,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162091#M240</link>
      <description>&lt;P&gt;Hi Nyan,&lt;/P&gt;&lt;P&gt;Thanks for reaching out to us!&lt;/P&gt;&lt;P&gt;We were able to reproduce your issue. It is a known issue, and we have already raised an internal ticket for it.&lt;/P&gt;&lt;P&gt;It is likely to be fixed in a future release of the oneAPI Base Toolkit. We are escalating this to the concerned team.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Goutham&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2020 10:15:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162091#M240</guid>
      <dc:creator>GouthamK_Intel</dc:creator>
      <dc:date>2020-04-15T10:15:35Z</dc:date>
    </item>
    <item>
      <title>Thanks for the reply.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162092#M241</link>
      <description>&lt;P&gt;Thanks for the reply.&lt;/P&gt;&lt;P&gt;Apart from the breakpoint inside the parallel_for kernel not being hit, are you able to run the program? Does it crash at lines 96 and 97?&lt;/P&gt;&lt;P&gt;auto x = i[0] - 3 * stddev, y = i[1] - 3 * stddev;&lt;/P&gt;&lt;P&gt;auto elem = exp(-1.f * (x * x + y * y) / (2 * stddev * stddev)) / (2 * pi * stddev * stddev);&lt;/P&gt;&lt;P&gt;What could be the reason?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="crush.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/10690i4E824DD6937CFE9A/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="crush.png" alt="crush.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 18 Apr 2020 09:06:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162092#M241</guid>
      <dc:creator>nnain1</dc:creator>
      <dc:date>2020-04-18T09:06:20Z</dc:date>
    </item>
    <item>
      <title>Hi Nyan,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162093#M242</link>
      <description>&lt;P&gt;Hi Nyan,&lt;/P&gt;&lt;P&gt;Please make the following changes to your debugging setup and let me know whether it works.&lt;/P&gt;&lt;P&gt;1. Additionally set the environment variables CL_CONFIG_USE_NATIVE_DEBUGGER=1 and SYCL_PROGRAM_COMPILE_OPTIONS=-g -cl-opt-disable.&lt;/P&gt;&lt;P&gt;2. Uncheck 'Require source files to exactly match the original version' in Visual Studio (Tools &amp;gt; Options &amp;gt; Debugging &amp;gt; General).&lt;/P&gt;&lt;P&gt;It should work! Let me know if you have further queries.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2020 09:38:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162093#M242</guid>
      <dc:creator>Subarnarek_G_Intel</dc:creator>
      <dc:date>2020-04-21T09:38:20Z</dc:date>
    </item>
    <item>
      <title>Yes program doesn't crash</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162094#M243</link>
      <description>&lt;P&gt;Yes, the program no longer crashes at the parallel_for loop after setting SYCL_PROGRAM_COMPILE_OPTIONS=-g -cl-opt-disable.&lt;/P&gt;&lt;P&gt;I can't find the 'Require source files to exactly match the original version' option under the Tools tab in VS2019.&lt;/P&gt;&lt;P&gt;However, the breakpoint inside the parallel_for is still not hit.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2020 13:18:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162094#M243</guid>
      <dc:creator>nnain1</dc:creator>
      <dc:date>2020-04-21T13:18:52Z</dc:date>
    </item>
    <item>
      <title>But program still crashes</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162095#M244</link>
      <description>&lt;P&gt;But the program still crashes when running the second parallel_for loop at line 124.&lt;/P&gt;&lt;P&gt;The error is shown in the attached image.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="exception.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/10693i24BF6E7CE99ABBAB/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="exception.png" alt="exception.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2020 13:45:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1162095#M244</guid>
      <dc:creator>nnain1</dc:creator>
      <dc:date>2020-04-21T13:45:02Z</dc:date>
    </item>
    <item>
      <title>Re:Debugging in DPC++ code in CPU</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1199686#M637</link>
      <description>&lt;P&gt;Please find the attached screenshot showing where to uncheck &lt;STRONG&gt;'Require source files to exactly match the original version'&lt;/STRONG&gt; under the Tools tab of Visual Studio.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Aug 2020 10:42:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1199686#M637</guid>
      <dc:creator>Subarnarek_G_Intel</dc:creator>
      <dc:date>2020-08-12T10:42:25Z</dc:date>
    </item>
    <item>
      <title>Re:Debugging in DPC++ code in CPU</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1282234#M1149</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Since you mentioned that your program no longer crashes and we have not heard back from you, we assume that your issue is resolved and are closing this case.&lt;/P&gt;&lt;P&gt;We will no longer respond to this thread. If you require any additional assistance from Intel, please start a new thread.&lt;/P&gt;&lt;P&gt;Any further interaction in this thread will be considered community only.&lt;/P&gt;&lt;P&gt;Have a good day.&lt;/P&gt;&lt;P&gt;&lt;I&gt;Thanks &amp;amp; Regards&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;Goutham&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 May 2021 07:26:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Debugging-in-DPC-code-in-CPU/m-p/1282234#M1149</guid>
      <dc:creator>GouthamK_Intel</dc:creator>
      <dc:date>2021-05-18T07:26:50Z</dc:date>
    </item>
  </channel>
</rss>

