<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic oneAPI runtime causing segmentation faults while using joint_matrix calls in GPU Compute Software</title>
    <link>https://community.intel.com/t5/GPU-Compute-Software/one-API-runtime-causing-segmentation-issues-while-using-joint/m-p/1661176#M1714</link>
    <description>&lt;P&gt;I have an Intel NUC (NUC12SNKi72) with an onboard A770 GPU.&amp;nbsp;&lt;/P&gt;&lt;P&gt;CPU - ADL i7 12700H&amp;nbsp;&lt;/P&gt;&lt;P&gt;RAM 64GB&lt;/P&gt;&lt;P&gt;GPU A770&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have installed the oneAPI 2025 Base Toolkit in WSL Ubuntu on my Windows machine. It detects my A770 GPU, as shown by sycl-ls.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x5690] OpenCL 3.0 NEO&amp;nbsp; [23.17.26241.33]&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x46a6] OpenCL 3.0 NEO&amp;nbsp; [23.17.26241.33]&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I execute any GPU kernel that calls the joint_matrix* APIs, it causes a segmentation fault. Here is sample XMX code using joint_matrix_fill:&lt;/P&gt;&lt;P&gt;#include &amp;lt;sycl/sycl.hpp&amp;gt;&lt;BR /&gt;#include &amp;lt;sycl/ext/oneapi/matrix/matrix.hpp&amp;gt;&lt;BR /&gt;#include &amp;lt;iostream&amp;gt;&lt;/P&gt;&lt;P&gt;using namespace sycl::ext::oneapi::experimental::matrix;&lt;/P&gt;&lt;P&gt;constexpr size_t TM = 8; // Tile dimensions&lt;BR /&gt;constexpr size_t TN = 8;&lt;/P&gt;&lt;P&gt;void test_joint_matrix_fill(sycl::queue &amp;amp;q, size_t SG_SZ) {&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Testing joint_matrix_fill with subgroup size: " &amp;lt;&amp;lt; SG_SZ &amp;lt;&amp;lt; "\n";&lt;/P&gt;&lt;P&gt;try {&lt;BR /&gt;sycl::buffer&amp;lt;float, 2&amp;gt; bufC(sycl::range&amp;lt;2&amp;gt;(TM, TN)); // Buffer for storing results&lt;/P&gt;&lt;P&gt;if (SG_SZ == 8) {&lt;BR /&gt;q.submit([&amp;amp;](sycl::handler &amp;amp;h) {&lt;BR /&gt;auto accC = bufC.get_access&amp;lt;sycl::access::mode::write&amp;gt;(h);&lt;/P&gt;&lt;P&gt;h.parallel_for(&lt;BR /&gt;sycl::nd_range&amp;lt;2&amp;gt;({1, 8}, {1, 
8}),&lt;BR /&gt;[=](sycl::nd_item&amp;lt;2&amp;gt; it) [[intel::reqd_sub_group_size(8)]] {&lt;BR /&gt;sycl::sub_group sg = it.get_sub_group();&lt;/P&gt;&lt;P&gt;joint_matrix&amp;lt;sycl::sub_group, float, use::accumulator, TM, TN&amp;gt; sub_acc;&lt;/P&gt;&lt;P&gt;// Step 1: Initialize joint_matrix with a constant value&lt;BR /&gt;joint_matrix_fill(sg, sub_acc, 1.0f);&lt;/P&gt;&lt;P&gt;// Step 2: Store the joint_matrix result back to global memory&lt;BR /&gt;joint_matrix_store(sg, sub_acc, accC.get_pointer(), TN, layout::row_major);&lt;BR /&gt;});&lt;BR /&gt;}).wait();&lt;BR /&gt;} else if (SG_SZ == 16) {&lt;BR /&gt;q.submit([&amp;amp;](sycl::handler &amp;amp;h) {&lt;BR /&gt;auto accC = bufC.get_access&amp;lt;sycl::access::mode::write&amp;gt;(h);&lt;/P&gt;&lt;P&gt;h.parallel_for(&lt;BR /&gt;sycl::nd_range&amp;lt;2&amp;gt;({1, 16}, {1, 16}),&lt;BR /&gt;[=](sycl::nd_item&amp;lt;2&amp;gt; it) [[intel::reqd_sub_group_size(16)]] {&lt;BR /&gt;sycl::sub_group sg = it.get_sub_group();&lt;/P&gt;&lt;P&gt;joint_matrix&amp;lt;sycl::sub_group, float, use::accumulator, TM, TN&amp;gt; sub_acc;&lt;/P&gt;&lt;P&gt;// Step 1: Initialize joint_matrix with a constant value&lt;BR /&gt;joint_matrix_fill(sg, sub_acc, 1.0f);&lt;/P&gt;&lt;P&gt;// Step 2: Store the joint_matrix result back to global memory&lt;BR /&gt;joint_matrix_store(sg, sub_acc, accC.get_pointer(), TN, layout::row_major);&lt;BR /&gt;});&lt;BR /&gt;}).wait();&lt;BR /&gt;} else if (SG_SZ == 32) {&lt;BR /&gt;q.submit([&amp;amp;](sycl::handler &amp;amp;h) {&lt;BR /&gt;auto accC = bufC.get_access&amp;lt;sycl::access::mode::write&amp;gt;(h);&lt;/P&gt;&lt;P&gt;h.parallel_for(&lt;BR /&gt;sycl::nd_range&amp;lt;2&amp;gt;({1, 32}, {1, 32}),&lt;BR /&gt;[=](sycl::nd_item&amp;lt;2&amp;gt; it) [[intel::reqd_sub_group_size(32)]] {&lt;BR /&gt;sycl::sub_group sg = it.get_sub_group();&lt;/P&gt;&lt;P&gt;joint_matrix&amp;lt;sycl::sub_group, float, use::accumulator, TM, TN&amp;gt; sub_acc;&lt;/P&gt;&lt;P&gt;// Step 1: Initialize joint_matrix 
with a constant value&lt;BR /&gt;joint_matrix_fill(sg, sub_acc, 1.0f);&lt;/P&gt;&lt;P&gt;// Step 2: Store the joint_matrix result back to global memory&lt;BR /&gt;joint_matrix_store(sg, sub_acc, accC.get_pointer(), TN, layout::row_major);&lt;BR /&gt;});&lt;BR /&gt;}).wait();&lt;BR /&gt;} else {&lt;BR /&gt;std::cerr &amp;lt;&amp;lt; "Unsupported subgroup size: " &amp;lt;&amp;lt; SG_SZ &amp;lt;&amp;lt; "\n";&lt;BR /&gt;return;&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;// Retrieve and print results&lt;BR /&gt;auto hostC = bufC.get_access&amp;lt;sycl::access::mode::read&amp;gt;();&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Resultant matrix C:\n";&lt;BR /&gt;for (size_t i = 0; i &amp;lt; TM; i++) {&lt;BR /&gt;for (size_t j = 0; j &amp;lt; TN; j++) {&lt;BR /&gt;std::cout &amp;lt;&amp;lt; hostC[i][j] &amp;lt;&amp;lt; " ";&lt;BR /&gt;}&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "\n";&lt;BR /&gt;}&lt;BR /&gt;} catch (sycl::exception const &amp;amp;e) {&lt;BR /&gt;std::cerr &amp;lt;&amp;lt; "SYCL exception caught: " &amp;lt;&amp;lt; e.what() &amp;lt;&amp;lt; "\n";&lt;BR /&gt;}&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;int main() {&lt;BR /&gt;try {&lt;BR /&gt;// Initialize the SYCL queue&lt;BR /&gt;sycl::queue q{sycl::default_selector{}};&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Running on device: " &amp;lt;&amp;lt; q.get_device().get_info&amp;lt;sycl::info::device::name&amp;gt;() &amp;lt;&amp;lt; "\n";&lt;/P&gt;&lt;P&gt;// Query supported subgroup sizes&lt;BR /&gt;auto subgroup_sizes = q.get_device().get_info&amp;lt;sycl::info::device::sub_group_sizes&amp;gt;();&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Supported subgroup sizes: ";&lt;BR /&gt;for (const auto &amp;amp;size : subgroup_sizes) {&lt;BR /&gt;std::cout &amp;lt;&amp;lt; size &amp;lt;&amp;lt; " ";&lt;BR /&gt;}&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "\n";&lt;/P&gt;&lt;P&gt;// Test with each supported subgroup size&lt;BR /&gt;for (const auto &amp;amp;SG_SZ : subgroup_sizes) {&lt;BR /&gt;test_joint_matrix_fill(q, SG_SZ);&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;// 
Optional: Force CPU execution to isolate GPU-specific issues&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Testing on CPU...\n";&lt;BR /&gt;sycl::queue cpu_queue{sycl::cpu_selector{}};&lt;BR /&gt;test_joint_matrix_fill(cpu_queue, 8); // Default subgroup size for CPU&lt;/P&gt;&lt;P&gt;} catch (sycl::exception const &amp;amp;e) {&lt;BR /&gt;std::cerr &amp;lt;&amp;lt; "SYCL exception caught during initialization: " &amp;lt;&amp;lt; e.what() &amp;lt;&amp;lt; "\n";&lt;BR /&gt;return 1;&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;return 0;&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;Compile this code with icpx -fsycl &amp;lt;file&amp;gt; -o &amp;lt;output&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From the GDB back trace --&amp;gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#35&amp;nbsp; 0x0000000000404888 in test_joint_matrix_fill (q=..., SG_SZ=8) at xmx.cpp:17&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#34&amp;nbsp; 0x0000000000404d61 in sycl::_V1::queue::submit&amp;lt;test_joint_matrix_fill(...)&amp;gt;::submit(...) at sycl/queue.hpp:359&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#8&amp;nbsp; 0x00007fffd9e10973 in ?? () from /lib/x86_64-linux-gnu/libigc.so.1&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#14 0x00007fffe4edfc0f in ?? 
() from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#17 0x00007ffff4edf47c in urProgramBuild () from /home/vaibhav/intel/oneapi/compiler/2025.0/lib/libur_adapter_opencl.so.0&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;libigc.so.1&lt;/STRONG&gt;&lt;SPAN&gt;: Intel Graphics Compiler (IGC), which compiles kernels for Intel GPUs.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;libigdrcl.so&lt;/STRONG&gt;&lt;SPAN&gt;: Intel OpenCL GPU runtime (NEO) library, which manages GPU tasks.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;urProgramBuild&lt;/STRONG&gt;&lt;SPAN&gt;: Part of the Unified Runtime (UR) that manages kernel program building.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Because the trace passes through urProgramBuild into libigc.so.1, the crash appears to occur while the GPU runtime is building (JIT-compiling) the kernel.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I need your team's support to check this issue with the oneAPI runtime on Linux using a NUC with an A770.&amp;nbsp; I tried this on another NUC with an A770, and the issue is reproducible there as well.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 28 Jan 2025 09:52:01 GMT</pubDate>
    <dc:creator>Vaibhav_S_Intel</dc:creator>
    <dc:date>2025-01-28T09:52:01Z</dc:date>
    <item>
      <title>oneAPI runtime causing segmentation faults while using joint_matrix calls</title>
      <link>https://community.intel.com/t5/GPU-Compute-Software/one-API-runtime-causing-segmentation-issues-while-using-joint/m-p/1661176#M1714</link>
      <description>&lt;P&gt;I have an Intel NUC (NUC12SNKi72) with an onboard A770 GPU.&amp;nbsp;&lt;/P&gt;&lt;P&gt;CPU - ADL i7 12700H&amp;nbsp;&lt;/P&gt;&lt;P&gt;RAM 64GB&lt;/P&gt;&lt;P&gt;GPU A770&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have installed the oneAPI 2025 Base Toolkit in WSL Ubuntu on my Windows machine. It detects my A770 GPU, as shown by sycl-ls.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x5690] OpenCL 3.0 NEO&amp;nbsp; [23.17.26241.33]&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x46a6] OpenCL 3.0 NEO&amp;nbsp; [23.17.26241.33]&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I execute any GPU kernel that calls the joint_matrix* APIs, it causes a segmentation fault. Here is sample XMX code using joint_matrix_fill:&lt;/P&gt;&lt;P&gt;#include &amp;lt;sycl/sycl.hpp&amp;gt;&lt;BR /&gt;#include &amp;lt;sycl/ext/oneapi/matrix/matrix.hpp&amp;gt;&lt;BR /&gt;#include &amp;lt;iostream&amp;gt;&lt;/P&gt;&lt;P&gt;using namespace sycl::ext::oneapi::experimental::matrix;&lt;/P&gt;&lt;P&gt;constexpr size_t TM = 8; // Tile dimensions&lt;BR /&gt;constexpr size_t TN = 8;&lt;/P&gt;&lt;P&gt;void test_joint_matrix_fill(sycl::queue &amp;amp;q, size_t SG_SZ) {&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Testing joint_matrix_fill with subgroup size: " &amp;lt;&amp;lt; SG_SZ &amp;lt;&amp;lt; "\n";&lt;/P&gt;&lt;P&gt;try {&lt;BR /&gt;sycl::buffer&amp;lt;float, 2&amp;gt; bufC(sycl::range&amp;lt;2&amp;gt;(TM, TN)); // Buffer for storing results&lt;/P&gt;&lt;P&gt;if (SG_SZ == 8) {&lt;BR /&gt;q.submit([&amp;amp;](sycl::handler &amp;amp;h) {&lt;BR /&gt;auto accC = bufC.get_access&amp;lt;sycl::access::mode::write&amp;gt;(h);&lt;/P&gt;&lt;P&gt;h.parallel_for(&lt;BR /&gt;sycl::nd_range&amp;lt;2&amp;gt;({1, 8}, 
{1, 8}),&lt;BR /&gt;[=](sycl::nd_item&amp;lt;2&amp;gt; it) [[intel::reqd_sub_group_size(8)]] {&lt;BR /&gt;sycl::sub_group sg = it.get_sub_group();&lt;/P&gt;&lt;P&gt;joint_matrix&amp;lt;sycl::sub_group, float, use::accumulator, TM, TN&amp;gt; sub_acc;&lt;/P&gt;&lt;P&gt;// Step 1: Initialize joint_matrix with a constant value&lt;BR /&gt;joint_matrix_fill(sg, sub_acc, 1.0f);&lt;/P&gt;&lt;P&gt;// Step 2: Store the joint_matrix result back to global memory&lt;BR /&gt;joint_matrix_store(sg, sub_acc, accC.get_pointer(), TN, layout::row_major);&lt;BR /&gt;});&lt;BR /&gt;}).wait();&lt;BR /&gt;} else if (SG_SZ == 16) {&lt;BR /&gt;q.submit([&amp;amp;](sycl::handler &amp;amp;h) {&lt;BR /&gt;auto accC = bufC.get_access&amp;lt;sycl::access::mode::write&amp;gt;(h);&lt;/P&gt;&lt;P&gt;h.parallel_for(&lt;BR /&gt;sycl::nd_range&amp;lt;2&amp;gt;({1, 16}, {1, 16}),&lt;BR /&gt;[=](sycl::nd_item&amp;lt;2&amp;gt; it) [[intel::reqd_sub_group_size(16)]] {&lt;BR /&gt;sycl::sub_group sg = it.get_sub_group();&lt;/P&gt;&lt;P&gt;joint_matrix&amp;lt;sycl::sub_group, float, use::accumulator, TM, TN&amp;gt; sub_acc;&lt;/P&gt;&lt;P&gt;// Step 1: Initialize joint_matrix with a constant value&lt;BR /&gt;joint_matrix_fill(sg, sub_acc, 1.0f);&lt;/P&gt;&lt;P&gt;// Step 2: Store the joint_matrix result back to global memory&lt;BR /&gt;joint_matrix_store(sg, sub_acc, accC.get_pointer(), TN, layout::row_major);&lt;BR /&gt;});&lt;BR /&gt;}).wait();&lt;BR /&gt;} else if (SG_SZ == 32) {&lt;BR /&gt;q.submit([&amp;amp;](sycl::handler &amp;amp;h) {&lt;BR /&gt;auto accC = bufC.get_access&amp;lt;sycl::access::mode::write&amp;gt;(h);&lt;/P&gt;&lt;P&gt;h.parallel_for(&lt;BR /&gt;sycl::nd_range&amp;lt;2&amp;gt;({1, 32}, {1, 32}),&lt;BR /&gt;[=](sycl::nd_item&amp;lt;2&amp;gt; it) [[intel::reqd_sub_group_size(32)]] {&lt;BR /&gt;sycl::sub_group sg = it.get_sub_group();&lt;/P&gt;&lt;P&gt;joint_matrix&amp;lt;sycl::sub_group, float, use::accumulator, TM, TN&amp;gt; sub_acc;&lt;/P&gt;&lt;P&gt;// Step 1: Initialize 
joint_matrix with a constant value&lt;BR /&gt;joint_matrix_fill(sg, sub_acc, 1.0f);&lt;/P&gt;&lt;P&gt;// Step 2: Store the joint_matrix result back to global memory&lt;BR /&gt;joint_matrix_store(sg, sub_acc, accC.get_pointer(), TN, layout::row_major);&lt;BR /&gt;});&lt;BR /&gt;}).wait();&lt;BR /&gt;} else {&lt;BR /&gt;std::cerr &amp;lt;&amp;lt; "Unsupported subgroup size: " &amp;lt;&amp;lt; SG_SZ &amp;lt;&amp;lt; "\n";&lt;BR /&gt;return;&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;// Retrieve and print results&lt;BR /&gt;auto hostC = bufC.get_access&amp;lt;sycl::access::mode::read&amp;gt;();&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Resultant matrix C:\n";&lt;BR /&gt;for (size_t i = 0; i &amp;lt; TM; i++) {&lt;BR /&gt;for (size_t j = 0; j &amp;lt; TN; j++) {&lt;BR /&gt;std::cout &amp;lt;&amp;lt; hostC[i][j] &amp;lt;&amp;lt; " ";&lt;BR /&gt;}&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "\n";&lt;BR /&gt;}&lt;BR /&gt;} catch (sycl::exception const &amp;amp;e) {&lt;BR /&gt;std::cerr &amp;lt;&amp;lt; "SYCL exception caught: " &amp;lt;&amp;lt; e.what() &amp;lt;&amp;lt; "\n";&lt;BR /&gt;}&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;int main() {&lt;BR /&gt;try {&lt;BR /&gt;// Initialize the SYCL queue&lt;BR /&gt;sycl::queue q{sycl::default_selector{}};&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Running on device: " &amp;lt;&amp;lt; q.get_device().get_info&amp;lt;sycl::info::device::name&amp;gt;() &amp;lt;&amp;lt; "\n";&lt;/P&gt;&lt;P&gt;// Query supported subgroup sizes&lt;BR /&gt;auto subgroup_sizes = q.get_device().get_info&amp;lt;sycl::info::device::sub_group_sizes&amp;gt;();&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Supported subgroup sizes: ";&lt;BR /&gt;for (const auto &amp;amp;size : subgroup_sizes) {&lt;BR /&gt;std::cout &amp;lt;&amp;lt; size &amp;lt;&amp;lt; " ";&lt;BR /&gt;}&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "\n";&lt;/P&gt;&lt;P&gt;// Test with each supported subgroup size&lt;BR /&gt;for (const auto &amp;amp;SG_SZ : subgroup_sizes) {&lt;BR /&gt;test_joint_matrix_fill(q, SG_SZ);&lt;BR 
/&gt;}&lt;/P&gt;&lt;P&gt;// Optional: Force CPU execution to isolate GPU-specific issues&lt;BR /&gt;std::cout &amp;lt;&amp;lt; "Testing on CPU...\n";&lt;BR /&gt;sycl::queue cpu_queue{sycl::cpu_selector{}};&lt;BR /&gt;test_joint_matrix_fill(cpu_queue, 8); // Default subgroup size for CPU&lt;/P&gt;&lt;P&gt;} catch (sycl::exception const &amp;amp;e) {&lt;BR /&gt;std::cerr &amp;lt;&amp;lt; "SYCL exception caught during initialization: " &amp;lt;&amp;lt; e.what() &amp;lt;&amp;lt; "\n";&lt;BR /&gt;return 1;&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;return 0;&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;Compile this code with icpx -fsycl &amp;lt;file&amp;gt; -o &amp;lt;output&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From the GDB back trace --&amp;gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#35&amp;nbsp; 0x0000000000404888 in test_joint_matrix_fill (q=..., SG_SZ=8) at xmx.cpp:17&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#34&amp;nbsp; 0x0000000000404d61 in sycl::_V1::queue::submit&amp;lt;test_joint_matrix_fill(...)&amp;gt;::submit(...) at sycl/queue.hpp:359&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#8&amp;nbsp; 0x00007fffd9e10973 in ?? () from /lib/x86_64-linux-gnu/libigc.so.1&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#14 0x00007fffe4edfc0f in ?? 
() from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;#17 0x00007ffff4edf47c in urProgramBuild () from /home/vaibhav/intel/oneapi/compiler/2025.0/lib/libur_adapter_opencl.so.0&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;libigc.so.1&lt;/STRONG&gt;&lt;SPAN&gt;: Intel Graphics Compiler (IGC), which compiles kernels for Intel GPUs.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;libigdrcl.so&lt;/STRONG&gt;&lt;SPAN&gt;: Intel OpenCL GPU runtime (NEO) library, which manages GPU tasks.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;urProgramBuild&lt;/STRONG&gt;&lt;SPAN&gt;: Part of the Unified Runtime (UR) that manages kernel program building.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Because the trace passes through urProgramBuild into libigc.so.1, the crash appears to occur while the GPU runtime is building (JIT-compiling) the kernel.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I need your team's support to check this issue with the oneAPI runtime on Linux using a NUC with an A770.&amp;nbsp; I tried this on another NUC with an A770, and the issue is reproducible there as well.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 28 Jan 2025 09:52:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/GPU-Compute-Software/one-API-runtime-causing-segmentation-issues-while-using-joint/m-p/1661176#M1714</guid>
      <dc:creator>Vaibhav_S_Intel</dc:creator>
      <dc:date>2025-01-28T09:52:01Z</dc:date>
    </item>
  </channel>
</rss>

