<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Edoardo, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Asynchronus-offloading-problem-on-the-Intel-Xeon-Phic-7120P/m-p/1158000#M27730</link>
    <description></description>
    <pubDate>Tue, 27 Mar 2018 06:00:00 GMT</pubDate>
    <dc:creator>Ying_H_Intel</dc:creator>
    <dc:date>2018-03-27T06:00:00Z</dc:date>
    <item>
      <title>Asynchronous offloading problem on the Intel Xeon Phi 7120P coprocessor</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Asynchronus-offloading-problem-on-the-Intel-Xeon-Phic-7120P/m-p/1157999#M27729</link>
      <description>&lt;DIV class="field field-name-body field-type-text-with-summary field-label-hidden"&gt;
	&lt;DIV class="field-items"&gt;
		&lt;DIV class="field-item even"&gt;
			&lt;P&gt;Hello,&lt;/P&gt;

			&lt;P&gt;I implemented the Conjugate Gradient (CG) algorithm with MKL library functions on an Intel Xeon family system. The CG code runs fine on the Intel Xeon processor without offloading. The problem appears when I try to offload some operations (the sparse matrix-vector products) to the Intel Xeon Phi 7120P coprocessor.&lt;/P&gt;

			&lt;P&gt;At line 209 of cg_mkl_csr_intel.c (attached), I start an asynchronous transfer of the matrix arrays and continue with other operations until line 237, where execution waits for the data before computing the product A * x. The attached cg_execution.txt, which contains the output of running the cg_mkl_csr_intel executable, shows that the asynchronous transfer starts without problems. However, when the data is needed for the product at line 238 of cg_mkl_csr_intel.c, the following error is raised: "offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)". I have not been able to identify the cause of this error, hence this post.&lt;/P&gt;

			&lt;P&gt;I compile the cg_mkl_csr_intel.c file with the following command line:&lt;BR /&gt;
				icc -O3 -qopenmp cg_mkl_csr_intel.c -lm -mkl -o cg_mkl_csr_intel&lt;/P&gt;

			&lt;P&gt;I run the executable with:&lt;BR /&gt;
				./cg_mkl_csr_intel msym8.txt 8 1e-12&lt;/P&gt;

			&lt;P&gt;where msym8.txt is a text file containing a sparse symmetric matrix in CSR format (also attached to this post).&lt;/P&gt;
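			&lt;P&gt;For reference, the product that mkl_cspblas_dcsrgemv computes for transa = 'N' can be written out in plain C. This is a minimal host-side sketch, not MKL's implementation. csr_matvec is a hypothetical name. The sketch assumes zero-based CSR arrays with int indices (MKL's default LP64 interface): row has length nrows+1, and col and val have length nnz. Because csrgemv treats the arrays as a general matrix, a symmetric matrix must have both triangles stored for the result to equal A * x.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;
```c
/* Host-side sketch of y = A * x for a square matrix in zero-based CSR
 * (3-array) format. Hypothetical helper, not an MKL routine.
 *   row: nrows + 1 offsets, row[0] == 0, row[nrows] == nnz
 *   col: zero-based column index of each stored entry
 *   val: value of each stored entry                               */
void csr_matvec(int nrows, const double *val, const int *row,
                const int *col, const double *x, double *y)
{
    for (int i = 0; i != nrows; ++i) {
        double sum = 0.0;
        /* entries of row i occupy positions row[i] .. row[i+1]-1 */
        for (int k = row[i]; k != row[i + 1]; ++k)
            sum += val[k] * x[col[k]];
        y[i] = sum;
    }
}
```
&lt;/PRE&gt;

			&lt;P&gt;Comparing this host result with the Ax returned from the coprocessor on a small matrix is one way to check the data layout independently of the offload clauses.&lt;/P&gt;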

			&lt;P&gt;I appreciate any help you can provide to solve this issue.&lt;/P&gt;

			&lt;P&gt;Kind regards,&lt;BR /&gt;
				Edoardo&lt;/P&gt;
		&lt;/DIV&gt;
	&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 22 Mar 2018 15:47:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Asynchronus-offloading-problem-on-the-Intel-Xeon-Phic-7120P/m-p/1157999#M27729</guid>
      <dc:creator>Coronado__Edoardo</dc:creator>
      <dc:date>2018-03-22T15:47:24Z</dc:date>
    </item>
    <item>
      <title>Hi Edoardo,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Asynchronus-offloading-problem-on-the-Intel-Xeon-Phic-7120P/m-p/1158000#M27730</link>
      <description>&lt;P&gt;Hi Edoardo,&lt;/P&gt;

&lt;P&gt;A: MKL provides MIC offload samples under the MKL install folder. Could you please try them first and see whether they work?&lt;/P&gt;

&lt;P&gt;B: There is a simple sample in the MKL User Guide:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;/* Upload A and B to the card, and do not deallocate them after the pragma.
 * C is uploaded and downloaded back, but the allocated memory is retained. */
#pragma offload target(mic:0) \
    in(A: length(matrix_elements) alloc_if(1) free_if(0)) \
    in(B: length(matrix_elements) alloc_if(1) free_if(0)) \
    in(transa, transb, N, alpha, beta) \
    inout(C: length(matrix_elements) alloc_if(1) free_if(0))
{
    sgemm(&amp;amp;transa, &amp;amp;transb, &amp;amp;N, &amp;amp;N, &amp;amp;N, &amp;amp;alpha, A, &amp;amp;N, B, &amp;amp;N,
          &amp;amp;beta, C, &amp;amp;N);
}
/* Change C here */
/* Reuse A and B on the card, and upload the new C. Free all the memory on
 * the card. */
#pragma offload target(mic:0) \
    nocopy(A: length(matrix_elements) alloc_if(0) free_if(1)) \
    nocopy(B: length(matrix_elements) alloc_if(0) free_if(1)) \
    in(transa, transb, N, alpha, beta) \
    inout(C: length(matrix_elements) alloc_if(0) free_if(1))
{
    sgemm(&amp;amp;transa, &amp;amp;transb, &amp;amp;N, &amp;amp;N, &amp;amp;N, &amp;amp;alpha, A, &amp;amp;N, B, &amp;amp;N,
          &amp;amp;beta, C, &amp;amp;N);
}&lt;/PRE&gt;

&lt;P&gt;&lt;B&gt;See Also&lt;/B&gt;&lt;BR /&gt;
Intel® Software Documentation Library for Intel® Compiler User and Reference Guides&lt;/P&gt;
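
&lt;P&gt;To check what the offloaded sgemm returns for a small N, a naive host-side reference can help. This is a minimal sketch. It assumes transa = transb = 'N', square N x N single-precision matrices, and the column-major layout of the Fortran-style sgemm call above. ref_sgemm_nn is a hypothetical name, not an MKL function.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;
```c
/* Naive reference for C = alpha * A * B + beta * C with square n x n
 * column-major matrices: element (i, j) is stored at index i + j * n.
 * Hypothetical checking helper, not an MKL routine. Unlike BLAS, it
 * reads C even when beta == 0, so C must be initialized.            */
void ref_sgemm_nn(int n, float alpha, const float *A, const float *B,
                  float beta, float *C)
{
    for (int j = 0; j != n; ++j) {
        for (int i = 0; i != n; ++i) {
            float sum = 0.0f;
            for (int p = 0; p != n; ++p)
                sum += A[i + p * n] * B[p + j * n];
            C[i + j * n] = alpha * sum + beta * C[i + j * n];
        }
    }
}
```
&lt;/PRE&gt;

&lt;P&gt;Compare its output with C after the first offload region within a small tolerance, since MKL may sum in a different order. This helps separate data-movement problems from numerical ones.&lt;/P&gt;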

&lt;P&gt;Also, in your code it seems that the input array x and the output array Ax are never transferred to or allocated on the coprocessor. Please consider transferring them as well, for example:&lt;/P&gt;

&lt;PRE&gt;#pragma offload_transfer target(mic:0) signal(mat.val) \
    in(nrows, nnz) \
    in(mat.row:length(nrows+1) ALLOC RETAIN) \
    in(mat.col:length(nnz) ALLOC RETAIN) \
    in(mat.val:length(nnz) ALLOC RETAIN)
{}

#pragma offload target(mic:0) wait(mat.val) \
    in(transa, nrows) \
    nocopy(mat.row:length(nrows+1) REUSE RETAIN) \
    nocopy(mat.col:length(nnz) REUSE RETAIN) \
    nocopy(mat.val:length(nnz) REUSE RETAIN) \
    &lt;STRONG&gt;in(x:length(nrows)) out(Ax:length(nrows))&lt;/STRONG&gt; \
    num_threads( numThrds )&lt;/PRE&gt;

&lt;P&gt;Best regards,&lt;BR /&gt;
Ying&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 06:00:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Asynchronus-offloading-problem-on-the-Intel-Xeon-Phic-7120P/m-p/1158000#M27730</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-03-27T06:00:00Z</dc:date>
    </item>
    <item>
      <title>A) The sgemm.c example that</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Asynchronus-offloading-problem-on-the-Intel-Xeon-Phic-7120P/m-p/1158001#M27731</link>
      <description>&lt;P&gt;A) The sgemm.c example that is in the MKL install folders runs fine.&lt;/P&gt;

&lt;P&gt;B) I removed the statement that started the asynchronous transfer. Now I start all transfers with the first sparse matrix-vector product (outside the loop):&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#pragma offload target( mic: 0 ) \
                in( transa, nrows ) \
                in( mat.val: length(nnz)     ALLOC RETAIN ) \
                in( mat.row: length(nrows+1) ALLOC RETAIN ) \
                in( mat.col: length(nnz)     ALLOC RETAIN ) \
                in(       x: length(nrows)   ALLOC FREE   ) \
                out(     Ax: length(nrows)   ALLOC FREE   ) \
                num_threads( numThrds )
{
     mkl_cspblas_dcsrgemv( &amp;amp;transa, &amp;amp;nrows, mat.val, mat.row, mat.col, x, Ax );     //     Ax = A * x
}&lt;/PRE&gt;

&lt;P&gt;For the second product (inside the loop), I have:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#pragma offload target( mic: 0 ) \
                in( transa, nrows ) \
                nocopy( mat.val: length(nnz)     REUSE RETAIN ) \
                nocopy( mat.row: length(nrows+1) REUSE RETAIN ) \
                nocopy( mat.col: length(nnz)     REUSE RETAIN ) \
                in(           p: length(nrows)   ALLOC FREE   ) \
                out(          v: length(nrows)   ALLOC FREE   ) \
                num_threads( numThrds )
{
     mkl_cspblas_dcsrgemv( &amp;amp;transa, &amp;amp;nrows, mat.val, mat.row, mat.col, p, v ); //      v = A * p
}&lt;/PRE&gt;
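
&lt;P&gt;For context, here is how the two products above fit into a standard, unpreconditioned CG iteration. This is a host-only sketch under stated assumptions, not the attached cg_mkl_csr_intel.c. csr_mv, dot and cg_solve are hypothetical names. csr_mv computes the same zero-based CSR product as mkl_cspblas_dcsrgemv with transa = 'N'. The loop stops once ||r||² is at most tol²·||b||², which may differ from the convergence test in the original code. The comments mark where the offloaded Ax = A * x (before the loop) and v = A * p (inside the loop) would go.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;
```c
/* Host-only sketch of unpreconditioned CG for a symmetric positive
 * definite matrix in zero-based CSR format. All names are hypothetical. */

/* y = A * x, same product as mkl_cspblas_dcsrgemv with transa = 'N' */
static void csr_mv(int n, const double *val, const int *row,
                   const int *col, const double *x, double *y)
{
    for (int i = 0; i != n; ++i) {
        double s = 0.0;
        for (int k = row[i]; k != row[i + 1]; ++k)
            s += val[k] * x[col[k]];
        y[i] = s;
    }
}

static double dot(int n, const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i != n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Solves A x = b starting from the initial guess in x. r, p and v are
 * caller-provided work arrays of length n. Returns the iteration count. */
int cg_solve(int n, const double *val, const int *row, const int *col,
             const double *b, double *x, double tol, int max_iter,
             double *r, double *p, double *v)
{
    /* first product, outside the loop: Ax = A * x (kept in v) */
    csr_mv(n, val, row, col, x, v);
    for (int i = 0; i != n; ++i) {
        r[i] = b[i] - v[i];
        p[i] = r[i];
    }
    double rr = dot(n, r, r);
    const double stop = tol * tol * dot(n, b, b);
    int it = 0;
    while (max_iter > it) {
        if (stop >= rr)          /* ||r||^2 at most tol^2 * ||b||^2 */
            break;
        /* second product, inside the loop: v = A * p */
        csr_mv(n, val, row, col, p, v);
        const double alpha = rr / dot(n, p, v);
        for (int i = 0; i != n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * v[i];
        }
        const double rr_new = dot(n, r, r);
        const double beta = rr_new / rr;
        for (int i = 0; i != n; ++i)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    return it;
}
```
&lt;/PRE&gt;

&lt;P&gt;Running it on the host for a small SPD matrix gives a reference solution to compare with the offloaded version. In exact arithmetic, CG terminates in at most n iterations.&lt;/P&gt;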

&lt;P&gt;I free the allocated memory, both when the convergence condition is fulfilled and after the loop completes, with:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#pragma offload_transfer target( mic: 0) \
                         nocopy( mat.val: length(nnz)     REUSE FREE ) \
                         nocopy( mat.row: length(nrows+1) REUSE FREE ) \
                         nocopy( mat.col: length(nnz)     REUSE FREE )&lt;/PRE&gt;
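
&lt;P&gt;The ALLOC, RETAIN, REUSE and FREE keywords in these pragmas are not defined in the post, so they are presumably macros in the attached source. A common convention, assumed here, maps them to the alloc_if/free_if modifiers used in the MKL User Guide sample from the earlier reply:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;
```c
/* Assumed definitions of the shorthand used in the pragmas above
 * (a common convention; the real ones live in the attached source). */
#define ALLOC  alloc_if(1)   /* allocate device memory on entry             */
#define REUSE  alloc_if(0)   /* reuse memory allocated by an earlier pragma */
#define FREE   free_if(1)    /* release device memory on exit               */
#define RETAIN free_if(0)    /* keep device memory after the pragma         */
```
&lt;/PRE&gt;

&lt;P&gt;Under these definitions, mat.val, mat.row and mat.col are allocated once (ALLOC RETAIN), reused inside the loop (REUSE RETAIN), and released by the final offload_transfer (REUSE FREE). Meanwhile, x, Ax, p and v are allocated and freed within each pragma (ALLOC FREE).&lt;/P&gt;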

&lt;P&gt;As you can see, I am allocating and deallocating all required memory on the device, and I still get the same error message:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)&lt;/PRE&gt;

&lt;P&gt;Again, I am compiling the source file with:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;icc -O3 cg_mkl_csr_intel.c -lm -mkl -o cg_mkl_csr_intel
&lt;/PRE&gt;

&lt;P&gt;and running the executable with:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;./cg_mkl_csr_intel msym8.txt 10 1e-12&lt;/PRE&gt;

&lt;P&gt;I am attaching the source file (cg_mkl_csr_intel.c) and the matrix file (msym8.txt).&lt;/P&gt;

&lt;P&gt;Regards&lt;/P&gt;

&lt;P&gt;Edoardo&lt;/P&gt;</description>
      <pubDate>Fri, 13 Apr 2018 15:39:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Asynchronus-offloading-problem-on-the-Intel-Xeon-Phic-7120P/m-p/1158001#M27731</guid>
      <dc:creator>Coronado__Edoardo</dc:creator>
      <dc:date>2018-04-13T15:39:43Z</dc:date>
    </item>
  </channel>
</rss>

