<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Temporary arrays in Sparse Matrix Vector Multiply Format Prototype Package in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026897#M40489</link>
    <description>&lt;P&gt;Considering "The Intel® Math Kernel Library Sparse Matrix Vector Multiply Format Prototype Package", I have two questions:&lt;/P&gt;

&lt;P&gt;1) The use of a temporary array for each thread may not pay off when the number of threads increases, especially when Xeon Phi is considered. Is this problem efficiently solved in the package? In which call are memory allocated for the temporary arrays?&lt;STRONG&gt; &lt;/STRONG&gt;In sparseCreateESBMatrix()&lt;SPAN&gt;,&lt;/SPAN&gt; sparseDcsr2esb() or sparseDesbmv() function?&lt;/P&gt;

&lt;P&gt;2) How is the reduction of temporary arrays performed on Xeon Phi? Is there any architecture-specific way? As far as I know, it is not mentioned in the paper describing the package.&lt;/P&gt;</description>
    <pubDate>Thu, 29 May 2014 13:21:38 GMT</pubDate>
    <dc:creator>kadir</dc:creator>
    <dc:date>2014-05-29T13:21:38Z</dc:date>
    <item>
      <title>Temporary arrays in Sparse Matrix Vector Multiply Format Prototype Package</title>
      <link>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026897#M40489</link>
      <description>&lt;P&gt;Considering "The Intel® Math Kernel Library Sparse Matrix Vector Multiply Format Prototype Package", I have two questions:&lt;/P&gt;

&lt;P&gt;1) The use of a temporary array for each thread may not pay off when the number of threads increases, especially when Xeon Phi is considered. Is this problem efficiently solved in the package? In which call are memory allocated for the temporary arrays?&lt;STRONG&gt; &lt;/STRONG&gt;In sparseCreateESBMatrix()&lt;SPAN&gt;,&lt;/SPAN&gt; sparseDcsr2esb() or sparseDesbmv() function?&lt;/P&gt;

&lt;P&gt;2) How is the reduction of temporary arrays performed on Xeon Phi? Is there any architecture-specific way? As far as I know, it is not mentioned in the paper describing the package.&lt;/P&gt;</description>
      <pubDate>Thu, 29 May 2014 13:21:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026897#M40489</guid>
      <dc:creator>kadir</dc:creator>
      <dc:date>2014-05-29T13:21:38Z</dc:date>
    </item>
    <item>
      <title>Kadir,</title>
      <link>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026898#M40490</link>
      <description>&lt;P&gt;Kadir,&lt;/P&gt;

&lt;P&gt;One of our MKL team members is looking at your question and will get back to you soon.&lt;/P&gt;

&lt;P&gt;Regards&lt;BR /&gt;
	--&lt;BR /&gt;
	Taylor&lt;/P&gt;</description>
      <pubDate>Fri, 30 May 2014 16:57:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026898#M40490</guid>
      <dc:creator>TaylorIoTKidd</dc:creator>
      <dc:date>2014-05-30T16:57:51Z</dc:date>
    </item>
    <item>
      <title>Hi Kadir, The package performs</title>
      <link>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026899#M40491</link>
      <description>&lt;P&gt;Hi Kadir,&lt;/P&gt;
&lt;P&gt;The package performs sparse matrix-vector multiplication for the general case only, i.e. y = alpha * A * x + beta * y, so this case requires almost no synchronization between threads. For the ESB functionality, the sparseDcsr2esb() function allocates the following arrays: the arrays for the matrix representation in ESB format, and some arrays for workload balancing and synchronization. The sizes of these arrays depend on the structure of the sparse matrix, not on the number of threads.&lt;/P&gt;
&lt;P&gt;Because synchronization may be required only between consecutive threads, the SpMV functionality should scale well on Intel Xeon Phi.&lt;/P&gt;
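&lt;P&gt;To illustrate why synchronization can be limited to consecutive threads, here is a minimal sequential Python sketch. It rests on assumptions and is not the package's code: it uses a plain CSR matrix (row_ptr, col_idx, vals) instead of ESB, splits the nonzeros evenly across threads for load balancing, and resolves rows shared at thread boundaries with a per-thread "carry" value that is added in a fix-up pass after a barrier, a common atomic-free technique. The function name spmv_nnz_balanced and its parameters are hypothetical.&lt;/P&gt;

```python
from bisect import bisect_right


def spmv_nnz_balanced(row_ptr, col_idx, vals, x, n_threads):
    """Hypothetical sketch: y = A * x for a CSR matrix with the nonzeros split
    evenly across threads (simulated sequentially) and no atomic updates."""
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    y = [0.0] * n_rows
    # One carry per thread: the partial sum of the thread's first row, which
    # the preceding thread(s) may also have contributed to.
    carry_row = [-1] * n_threads
    carry_val = [0.0] * n_threads
    for t in range(n_threads):  # each iteration models one thread
        start = t * nnz // n_threads
        end = (t + 1) * nnz // n_threads
        if start == end:
            continue  # fewer nonzeros than threads: nothing to do
        row = bisect_right(row_ptr, start) - 1  # row holding nonzero `start`
        first_row, acc = row, 0.0
        for k in range(start, end):
            while k >= row_ptr[row + 1]:  # k has moved past the current row
                if row == first_row:
                    carry_row[t], carry_val[t] = row, acc
                else:
                    y[row] = acc  # no other thread writes this row directly
                row, acc = row + 1, 0.0
            acc += vals[k] * x[col_idx[k]]
        # Last row touched: the next thread may continue it via its own carry.
        if row == first_row:
            carry_row[t], carry_val[t] = row, acc
        else:
            y[row] = acc
    # After a barrier: fold in the carries (a short sequential fix-up pass).
    for t in range(n_threads):
        if carry_row[t] >= 0:
            y[carry_row[t]] += carry_val[t]
    return y
```

&lt;P&gt;Because each thread receives a contiguous range of nonzeros, a row can only be shared by a run of consecutive threads, and only a thread's first row can have been started by its predecessor. In this sketch the extra storage is one (row, value) pair per thread rather than a full copy of y per thread; the package's own workload-balancing and synchronization arrays may be organized differently.&lt;/P&gt;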
&lt;P&gt;Could you please clarify your questions about temporary arrays?&lt;/P&gt;
&lt;P&gt;Regards, Sergey&lt;/P&gt;</description>
      <pubDate>Mon, 02 Jun 2014 07:58:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026899#M40491</guid>
      <dc:creator>Sergey_P_Intel2</dc:creator>
      <dc:date>2014-06-02T07:58:01Z</dc:date>
    </item>
    <item>
      <title>2nd paragraph of Subsection 4</title>
      <link>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026900#M40492</link>
      <description>&lt;P&gt;The second paragraph of Subsection 4.3, entitled "SpMV Kernel with ESB Format", of the paper describing the package reads as follows:&lt;/P&gt;

&lt;P&gt;"In practice, we also parallelize the SpMV operation across col-&lt;BR /&gt;
	umn blocks. Here, we use a temporary copy of y for each block and&lt;BR /&gt;
	use a reduce operation across these temporary copies at the end of&lt;BR /&gt;
	the computation."&lt;/P&gt;

&lt;P&gt;Caption of Algorithm 2 is as follows:&lt;/P&gt;

&lt;P&gt;"Algorithm 2: Multiply the i th column block of matrix in ESB&lt;BR /&gt;
	format"&lt;/P&gt;

&lt;P&gt;As far as I understand, there is a synchronization point after the multiplication of the column blocks. After that synchronization point, the temporary copies of y (one per block) undergo a reduction operation.&lt;/P&gt;
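&lt;P&gt;The scheme quoted above can be sketched sequentially in Python as follows. This is a hypothetical illustration, not the package's implementation: it uses a plain CSR matrix in place of ESB, computes y = A * x (alpha = 1, beta = 0), and the function spmv_column_blocks and its parameters are made up for the example. Each column block writes into its own full-length temporary copy of y, and the copies are summed after the synchronization point, so the extra memory is n_blocks * n_rows values and grows linearly with the number of blocks (threads).&lt;/P&gt;

```python
def spmv_column_blocks(row_ptr, col_idx, vals, x, n_blocks):
    """Hypothetical sketch: y = A * x for a CSR matrix, parallelized (here
    simulated sequentially) across column blocks, each with its own copy of y."""
    n_rows = len(row_ptr) - 1
    block_width = (len(x) + n_blocks - 1) // n_blocks  # columns per block
    # One full-length temporary copy of y per block: n_blocks * n_rows values.
    y_tmp = [[0.0] * n_rows for _ in range(n_blocks)]
    # Phase 1: each block (one thread in a parallel version) multiplies its columns.
    for b in range(n_blocks):
        for i in range(n_rows):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j // block_width == b:  # nonzero lies in column block b
                    y_tmp[b][i] += vals[k] * x[j]
    # Synchronization point, then Phase 2: reduce the temporary copies into y.
    return [sum(parts) for parts in zip(*y_tmp)]
```

&lt;P&gt;For the 3 x 4 matrix [[1, 0, 2, 0], [0, 3, 0, 4], [5, 0, 0, 6]] with x = [1, 1, 1, 1] and two blocks, the two temporary copies are [1, 3, 5] and [2, 4, 6], and the reduction gives y = [3, 7, 11].&lt;/P&gt;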

</description>
      <pubDate>Wed, 04 Jun 2014 07:04:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026900#M40492</guid>
      <dc:creator>kadir</dc:creator>
      <dc:date>2014-06-04T07:04:09Z</dc:date>
    </item>
    <item>
      <title>Hi Kadir, Matrix in ESB format</title>
      <link>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026901#M40493</link>
      <description>&lt;P&gt;Hi Kadir,&lt;/P&gt;
&lt;P&gt;A matrix in ESB format is stored in slices: each slice consists of 8 rows and is stored in ELLPACK format. In fact, each thread computes its update of the output vector y as a 512-bit vector register variable (8 double-precision values, one per row of the slice). So when several threads compute parts of the same slice (parallelizing across column blocks), they only need to synchronize the updates of these small variables. Therefore, we don't need to allocate much additional memory for each thread.&lt;/P&gt;
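&lt;P&gt;A minimal Python sketch of this idea, under stated assumptions: the slice layout (8 rows, ELLPACK entries grouped by position, padding with column 0 and value 0.0) and the helper names to_ellpack_slice, slice_partial and combine are hypothetical, not the package's API. Each call to slice_partial stands in for one thread working on one column block of a slice, and its 8-element result plays the role of the 512-bit register. Threads that share a slice only have to merge these 8 values instead of a full-length copy of y.&lt;/P&gt;

```python
SLICE_HEIGHT = 8  # rows per slice; 8 doubles x 64 bits = one 512-bit register


def to_ellpack_slice(rows):
    """Pack up to 8 rows, each a list of (col, val) pairs, into ELLPACK form.

    Returns (cols, vals), two lists of `width` groups of 8 entries, where group k
    holds the k-th nonzero of every row. Short rows are padded with column 0 and
    value 0.0, so padding contributes nothing to the product.
    """
    if len(rows) > SLICE_HEIGHT:
        raise ValueError("a slice holds at most 8 rows")
    rows = rows + [[]] * (SLICE_HEIGHT - len(rows))  # pad with empty rows
    width = max(len(row) for row in rows)
    cols = [[0] * SLICE_HEIGHT for _ in range(width)]
    vals = [[0.0] * SLICE_HEIGHT for _ in range(width)]
    for r, row in enumerate(rows):
        for k, (c, v) in enumerate(row):
            cols[k][r] = c
            vals[k][r] = v
    return cols, vals


def slice_partial(cols, vals, x, block, block_width):
    """Partial sums of one slice restricted to one column block: the 8 values
    a single thread would accumulate in its 512-bit register."""
    acc = [0.0] * SLICE_HEIGHT
    for col_group, val_group in zip(cols, vals):
        for r in range(SLICE_HEIGHT):  # one SIMD lane per row of the slice
            c = col_group[r]
            if c // block_width == block:
                acc[r] += val_group[r] * x[c]
    return acc


def combine(partials):
    """Merge the 8-value partial sums of the threads that shared a slice."""
    return [sum(lane) for lane in zip(*partials)]
```

&lt;P&gt;For the three rows [(0, 1.0), (2, 2.0)], [(1, 3.0), (3, 4.0)] and [(0, 5.0), (3, 6.0)], x = [1.0] * 4 and block_width = 2, the two partial sums are [1, 3, 5, 0, 0, 0, 0, 0] and [2, 4, 6, 0, 0, 0, 0, 0], and combining them yields [3, 7, 11, 0, 0, 0, 0, 0].&lt;/P&gt;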
&lt;P&gt;Regards, Sergey&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jun 2014 04:23:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026901#M40493</guid>
      <dc:creator>Sergey_P_Intel2</dc:creator>
      <dc:date>2014-06-06T04:23:32Z</dc:date>
    </item>
    <item>
      <title>How are the synchronizations</title>
      <link>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026902#M40494</link>
      <description>&lt;P&gt;How are the synchronization and atomic updates implemented? The performance of #pragma omp atomic is far too low. Are there any other efficient solutions for synchronization and atomic updates when more than 200 threads run on Intel Xeon Phi? I could not find any information in the paper about how the synchronization and atomic updates are implemented.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jun 2014 08:01:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Temporary-arrays-in-Sparse-Matrix-Vector-Multiply-Format/m-p/1026902#M40494</guid>
      <dc:creator>kadir</dc:creator>
      <dc:date>2014-06-06T08:01:47Z</dc:date>
    </item>
  </channel>
</rss>

