<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Write a parallel code in C++ 11 and OpenMP that solves efficiently the main operation in a (CNN) in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Write-a-parallel-code-in-C-11-and-OpenMP-that-solves-efficiently/m-p/1603422#M3816</link>
    <description>&lt;P&gt;To maximize parallelism on a computer with an M3 processor that has 11 CPU cores and 11 threads, it is essential to optimize the workload distribution and ensure efficient resource utilization.&lt;/P&gt;&lt;P&gt;Implementation details:&lt;BR /&gt;Inputs: a file with the input matrix (you choose the size) and a kernel (fixed size 4x4).&lt;/P&gt;&lt;P&gt;The goal is to first have working sequential code for the four operations. Then, parallelize the operations that can be efficiently parallelized. Pay special attention to data races (multiple threads requesting the same input) and to concurrency conflicts (multiple threads updating the same output). As a general suggestion, to achieve the best performance you should group in one thread (or in multiple threads executed on the same CPU) all the operations that work on the same input data. This avoids costly copies of data to multiple locations.&lt;/P&gt;&lt;P&gt;There are multiple ways to parallelize the code. You can parallelize a single convolution, or you can parallelize across convolutions (each thread executes a 4x4 convolution). Please discuss the benefit of each solution and evaluate the performance of both.&lt;/P&gt;&lt;P&gt;Suggestion: when you parallelize across convolutions, note that if multiple threads take subsequent sliding convolutions they will all need the same part of the input data, thus....&lt;/P&gt;&lt;P&gt;You need to create an OpenMP file with the implementation of the convolution and a main file for testing the function. The main will:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;read the input matrix from a text file (matrix.txt), randomly generated or static, you choose&lt;/LI&gt;&lt;LI&gt;read the kernel from a text file (kernel.txt), fixed 4x4 size, you choose the values&lt;/LI&gt;&lt;LI&gt;apply the convolution and save the result in a file&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You need to present a performance report where you show the measured execution time of the sequential implementation (for simplicity, just set the number of threads to 1) and of various parallel implementations (degree of parallelism, thread distribution, thread grouping, etc.). Write your considerations in a PDF document to add to the submission.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Consider "zero padding" by performing the convolution over the whole input matrix, up to the last column and the last row. Please do not add 3 extra rows and 3 extra columns of zeros to the input matrix; try a smarter solution instead. The output matrix will have the same size as the input matrix.&lt;/LI&gt;&lt;LI&gt;Consider bigger input matrix sizes and discuss if and why the performance improves.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Does anyone have any ideas?&lt;BR /&gt;&lt;SPAN&gt;My idea was to create 10 submatrices, each managed by a thread, with a master thread overseeing the operation. Each submatrix will be of appropriate size. For instance, if I have an input matrix of 100x100, the matrix will be divided into 10 submatrices of 52x22 to ensure all possible combinations are covered.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 03 Jun 2024 20:47:09 GMT</pubDate>
    <dc:creator>martaasperanza</dc:creator>
    <dc:date>2024-06-03T20:47:09Z</dc:date>
    <item>
      <title>Write a parallel code in C++ 11 and OpenMP that solves efficiently the main operation in a (CNN)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Write-a-parallel-code-in-C-11-and-OpenMP-that-solves-efficiently/m-p/1603422#M3816</link>
      <description>&lt;P&gt;To maximize parallelism on a computer with an M3 processor that has 11 CPU cores and 11 threads, it is essential to optimize the workload distribution and ensure efficient resource utilization.&lt;/P&gt;&lt;P&gt;Implementation details:&lt;BR /&gt;Inputs: a file with the input matrix (you choose the size) and a kernel (fixed size 4x4).&lt;/P&gt;&lt;P&gt;The goal is to first have working sequential code for the four operations. Then, parallelize the operations that can be efficiently parallelized. Pay special attention to data races (multiple threads requesting the same input) and to concurrency conflicts (multiple threads updating the same output). As a general suggestion, to achieve the best performance you should group in one thread (or in multiple threads executed on the same CPU) all the operations that work on the same input data. This avoids costly copies of data to multiple locations.&lt;/P&gt;&lt;P&gt;There are multiple ways to parallelize the code. You can parallelize a single convolution, or you can parallelize across convolutions (each thread executes a 4x4 convolution). Please discuss the benefit of each solution and evaluate the performance of both.&lt;/P&gt;&lt;P&gt;Suggestion: when you parallelize across convolutions, note that if multiple threads take subsequent sliding convolutions they will all need the same part of the input data, thus....&lt;/P&gt;&lt;P&gt;You need to create an OpenMP file with the implementation of the convolution and a main file for testing the function. The main will:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;read the input matrix from a text file (matrix.txt), randomly generated or static, you choose&lt;/LI&gt;&lt;LI&gt;read the kernel from a text file (kernel.txt), fixed 4x4 size, you choose the values&lt;/LI&gt;&lt;LI&gt;apply the convolution and save the result in a file&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You need to present a performance report where you show the measured execution time of the sequential implementation (for simplicity, just set the number of threads to 1) and of various parallel implementations (degree of parallelism, thread distribution, thread grouping, etc.). Write your considerations in a PDF document to add to the submission.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Consider "zero padding" by performing the convolution over the whole input matrix, up to the last column and the last row. Please do not add 3 extra rows and 3 extra columns of zeros to the input matrix; try a smarter solution instead. The output matrix will have the same size as the input matrix.&lt;/LI&gt;&lt;LI&gt;Consider bigger input matrix sizes and discuss if and why the performance improves.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Does anyone have any ideas?&lt;BR /&gt;&lt;SPAN&gt;My idea was to create 10 submatrices, each managed by a thread, with a master thread overseeing the operation. Each submatrix will be of appropriate size. For instance, if I have an input matrix of 100x100, the matrix will be divided into 10 submatrices of 52x22 to ensure all possible combinations are covered.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2024 20:47:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Write-a-parallel-code-in-C-11-and-OpenMP-that-solves-efficiently/m-p/1603422#M3816</guid>
      <dc:creator>martaasperanza</dc:creator>
      <dc:date>2024-06-03T20:47:09Z</dc:date>
    </item>
    <item>
      <title>Re: Write a parallel code in C++ 11 and OpenMP that solves efficiently the main operation in a (CNN)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Write-a-parallel-code-in-C-11-and-OpenMP-that-solves-efficiently/m-p/1604564#M3826</link>
      <description>&lt;P&gt;This forum is specifically for discussing problems related to the Intel oneAPI DPC++/C++ Compiler.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jun 2024 19:36:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Write-a-parallel-code-in-C-11-and-OpenMP-that-solves-efficiently/m-p/1604564#M3826</guid>
      <dc:creator>Alex_Y_Intel</dc:creator>
      <dc:date>2024-06-06T19:36:30Z</dc:date>
    </item>
  </channel>
</rss>

