<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Fast 3D DCT for N~1M in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888410#M10179</link>
    <description>Hi Alexander, is there an OpenMP version of trig_transform, or how can I use trig_transform&lt;BR /&gt;together with OpenMP? I have a 3D variable, and two of its dimensions need to be trig_transformed, so I want to&lt;BR /&gt;call these subroutines from OpenMP.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Could you please give me an example? I would appreciate it. I have tried many forms, but I have only succeeded in the case without OpenMP.&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;BR /&gt;</description>
    <pubDate>Wed, 02 Jun 2010 23:52:30 GMT</pubDate>
    <dc:creator>zgsun</dc:creator>
    <dc:date>2010-06-02T23:52:30Z</dc:date>
    <item>
      <title>Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888405#M10174</link>
      <description>I am writing a 3D acoustics simulator for which I need to repeatedly compute a DCT (more precisely, DCT-II) on a 3D array of size around 128x128x128. I have been exploring all the options I have. Earlier, the only good option I had was FFTW, which unfortunately doesn't SSE-optimize its DCT functions. But now MKL 10.0 has DCT support, which is really great. As I went through the docs, however, I didn't see any functions for doing 3D DCT transforms in the Trigonometric Transform toolset. All I saw were functions for doing 1D transforms. Of course, since the FFT interface is used internally, and MKL's FFT works in 3D, it is possible to do a 3D transform too.&lt;BR /&gt;&lt;BR /&gt;So, two questions:&lt;BR /&gt;1. Does MKL have an API for doing fast 3D DCTs?&lt;BR /&gt;2. A (real-&amp;gt;real) DCT should ideally be about 4x faster than a complex-&amp;gt;complex FFT on the same number of elements. Does MKL exhibit this behavior? How efficiently does it reduce the DCT to an FFT?&lt;BR /&gt;&lt;BR /&gt;Thanks a lot!&lt;BR /&gt;-Nikunj.&lt;BR /&gt;--&lt;BR /&gt;</description>
      <pubDate>Sun, 25 Nov 2007 09:07:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888405#M10174</guid>
      <dc:creator>nikunj1729</dc:creator>
      <dc:date>2007-11-25T09:07:03Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888406#M10175</link>
      <description>I was looking for the same thing, but I wasn't able to find anything in the manual. Since the Poisson solver works in 3D and uses the DCT, it should be trivial to implement a working 3D DCT (given the source code, which is not available :-)). Until then, I'll keep on using FFTW (which supports SSE, by the way).&lt;BR /&gt;</description>
      <pubDate>Wed, 27 Feb 2008 21:20:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888406#M10175</guid>
      <dc:creator>mambru37</dc:creator>
      <dc:date>2008-02-27T21:20:14Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888407#M10176</link>
      <description>Really, does FFTW SSE-optimize the DCT as well? I know the older versions (from when I posted the original message) didn't have SSE optimizations for the DCT, and the docs mentioned this. There was some talk of adding them for the R2R transforms, but I don't know if it's actually been done.&lt;BR /&gt;&lt;BR /&gt;Thanks for the reply!&lt;BR /&gt;-Nikunj.&lt;BR /&gt;--&lt;BR /&gt;</description>
      <pubDate>Wed, 27 Feb 2008 21:29:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888407#M10176</guid>
      <dc:creator>nikunj1729</dc:creator>
      <dc:date>2008-02-27T21:29:17Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888408#M10177</link>
      <description>Hi,&lt;BR /&gt;Are there any updates on this thread's question?&lt;BR /&gt;Is the 3D DCT available in MKL? If yes, please give an example.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Andriy&lt;BR /&gt;</description>
      <pubDate>Thu, 28 May 2009 04:04:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888408#M10177</guid>
      <dc:creator>amyron</dc:creator>
      <dc:date>2009-05-28T04:04:34Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888409#M10178</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/419744"&gt;amyron&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Hi,&lt;BR /&gt;Are there any updates on this thread's question?&lt;BR /&gt;Is the 3D DCT available in MKL? If yes, please give an example.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Andriy&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;
&lt;P&gt;Hi Andriy,&lt;/P&gt;
&lt;P&gt;MKL does not provide a 3D DCT directly, but you can build one by applying the 1D DCT along each axis in turn, for example:&lt;/P&gt;
&lt;P&gt;tt_type_x = tt_type_y = tt_type_z = MKL_COSINE_TRANSFORM;&lt;BR /&gt;d_init_trig_transform(&amp;amp;n_x,&amp;amp;tt_type_x,ipar_x,dpar_x,&amp;amp;ir);&lt;BR /&gt;d_init_trig_transform(&amp;amp;n_y,&amp;amp;tt_type_y,ipar_y,dpar_y,&amp;amp;ir);&lt;BR /&gt;d_init_trig_transform(&amp;amp;n_z,&amp;amp;tt_type_z,ipar_z,dpar_z,&amp;amp;ir);&lt;BR /&gt;d_commit_trig_transform(f,&amp;amp;handle_x,ipar_x,dpar_x,&amp;amp;ir);&lt;BR /&gt;d_commit_trig_transform(f,&amp;amp;handle_y,ipar_y,dpar_y,&amp;amp;ir);&lt;BR /&gt;d_commit_trig_transform(f,&amp;amp;handle_z,ipar_z,dpar_z,&amp;amp;ir);&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Loop over j,k {d_backward_trig_transform(f(:,j,k),&amp;amp;handle_x,ipar_x,dpar_x,&amp;amp;ir);}&lt;BR /&gt;Loop over i,k {d_backward_trig_transform(f(i,:,k),&amp;amp;handle_y,ipar_y,dpar_y,&amp;amp;ir);}&lt;BR /&gt;Loop over i,j {d_backward_trig_transform(f(i,j,:),&amp;amp;handle_z,ipar_z,dpar_z,&amp;amp;ir);}&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;free_trig_transform(&amp;amp;handle_x,ipar_x,&amp;amp;ir);&lt;BR /&gt;free_trig_transform(&amp;amp;handle_y,ipar_y,&amp;amp;ir);&lt;BR /&gt;free_trig_transform(&amp;amp;handle_z,ipar_z,&amp;amp;ir);&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;If you would like to see 3D DCT functionality in MKL, you can file a feature request at &lt;A href="https://premier.intel.com/"&gt;https://premier.intel.com&lt;/A&gt;&lt;BR /&gt;With best regards,&lt;BR /&gt;Alexander&lt;/P&gt;</description>
      <pubDate>Thu, 28 May 2009 04:51:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888409#M10178</guid>
      <dc:creator>Alexander_K_Intel2</dc:creator>
      <dc:date>2009-05-28T04:51:36Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888410#M10179</link>
      <description>Hi Alexander, is there an OpenMP version of trig_transform, or how can I use trig_transform&lt;BR /&gt;together with OpenMP? I have a 3D variable, and two of its dimensions need to be trig_transformed, so I want to&lt;BR /&gt;call these subroutines from OpenMP.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Could you please give me an example? I would appreciate it. I have tried many forms, but I have only succeeded in the case without OpenMP.&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;BR /&gt;</description>
      <pubDate>Wed, 02 Jun 2010 23:52:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888410#M10179</guid>
      <dc:creator>zgsun</dc:creator>
      <dc:date>2010-06-02T23:52:30Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888411#M10180</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;There is no 2D version of TT with OpenMP parallelization, but you can construct one yourself from the 1D TT routines. For example:&lt;/P&gt;

&lt;P&gt;tt_type_x = tt_type_y = MKL_COSINE_TRANSFORM;&lt;/P&gt;&lt;P&gt;d_init_trig_transform(&amp;amp;n_x,&amp;amp;tt_type_x,ipar_x,dpar_x,&amp;amp;ir);&lt;/P&gt;&lt;P&gt;d_init_trig_transform(&amp;amp;n_y,&amp;amp;tt_type_y,ipar_y,dpar_y,&amp;amp;ir);&lt;/P&gt;

&lt;P&gt;ipar_x[9] = number_of_threads;&lt;/P&gt;

&lt;P&gt;ipar_y[9] = number_of_threads;&lt;/P&gt;

&lt;P&gt;d_commit_trig_transform(f,&amp;amp;handle_x,ipar_x,dpar_x,&amp;amp;ir);&lt;BR /&gt;
d_commit_trig_transform(f,&amp;amp;handle_y,ipar_y,dpar_y,&amp;amp;ir);&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;P&gt;#OMP parallel for&lt;/P&gt;

&lt;P&gt;Loop over j
{d_backward_trig_transform(f(:,j,k),&amp;amp;handle_x,ipar_x,dpar_x,&amp;amp;ir);}&lt;/P&gt;

&lt;P&gt;#OMP end parallel&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;

&lt;P&gt;#OMP parallel for&lt;BR /&gt;
Loop over i
{d_backward_trig_transform(f(i,:,k),&amp;amp;handle_y,ipar_y,dpar_y,&amp;amp;ir);}&lt;/P&gt;

&lt;P&gt;#OMP end parallel&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;P&gt;free_trig_transform(&amp;amp;handle_x,ipar_x,&amp;amp;ir);&lt;BR /&gt;
free_trig_transform(&amp;amp;handle_y,ipar_y,&amp;amp;ir);&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;



&lt;P&gt;ipar[9] specifies the number of OpenMP threads used to run the TT routines in the OpenMP environment of the Poisson Library.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Does this variant work for you?&lt;/P&gt;

&lt;P&gt;With best regards,&lt;/P&gt;

&lt;P&gt;Alexander Kalinkin&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jun 2010 01:54:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888411#M10180</guid>
      <dc:creator>Alexander_K_Intel2</dc:creator>
      <dc:date>2010-06-03T01:54:03Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888412#M10181</link>
      <description>Thank you so much!&lt;BR /&gt;&lt;BR /&gt;I did get it working with 8 threads, but only for the variable f(33,33,10); for f of other sizes, the results&lt;BR /&gt;are incorrect, which is strange. Could you please give me some hints?&lt;BR /&gt;&lt;BR /&gt;At the same time, I noticed that trig_transform has a serious overhead problem: with 8 threads, it&lt;BR /&gt;is even slower than with a single thread, surprisingly. Maybe I am doing something wrong here?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Thanks, zhigang</description>
      <pubDate>Thu, 03 Jun 2010 07:07:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888412#M10181</guid>
      <dc:creator>zgsun</dc:creator>
      <dc:date>2010-06-03T07:07:07Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888413#M10182</link>
      <description>Hi everybody,&lt;DIV&gt;are you interested in C examples only, or in the F90 API as well?&lt;/DIV&gt;&lt;DIV&gt;--Gennady&lt;/DIV&gt;</description>
      <pubDate>Thu, 03 Jun 2010 07:21:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888413#M10182</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-06-03T07:21:27Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888414#M10183</link>
      <description>zhigang,&lt;DIV&gt;that is a really strange situation. Does your test work correctly on 1 thread? And which processors are you using?&lt;/DIV&gt;&lt;DIV&gt;With best regards,&lt;/DIV&gt;&lt;DIV&gt;Alexander Kalinkin&lt;/DIV&gt;</description>
      <pubDate>Thu, 03 Jun 2010 07:24:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888414#M10183</guid>
      <dc:creator>Alexander_K_Intel2</dc:creator>
      <dc:date>2010-06-03T07:24:13Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888415#M10184</link>
      <description>Hi Alexander, the results with 1 thread are correct.&lt;BR /&gt;The machine I am using is an Intel Xeon CPU E5430 @ 2.66GHz.&lt;BR /&gt;&lt;BR /&gt;In fact, I was delighted by the performance of trig_transform at first. Then I wanted to see how it performs&lt;BR /&gt;with OpenMP. I would appreciate it if you could help me with this. I have tried many ways, but they all failed.&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;BR /&gt;</description>
      <pubDate>Thu, 03 Jun 2010 15:27:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888415#M10184</guid>
      <dc:creator>zgsun</dc:creator>
      <dc:date>2010-06-03T15:27:59Z</dc:date>
    </item>
    <item>
      <title>Re: Fast 3D DCT for N~1M</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888416#M10185</link>
      <description>Hi zhigang,&lt;DIV&gt;I've checked a test case that I prepared myself, and it works correctly. Could you attach your example here? It would be great if the example were not extremely big!&lt;/DIV&gt;&lt;DIV&gt;With best regards,&lt;/DIV&gt;&lt;DIV&gt;Alexander Kalinkin&lt;/DIV&gt;</description>
      <pubDate>Wed, 09 Jun 2010 00:38:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-3D-DCT-for-N-1M/m-p/888416#M10185</guid>
      <dc:creator>Alexander_K_Intel2</dc:creator>
      <dc:date>2010-06-09T00:38:33Z</dc:date>
    </item>
  </channel>
</rss>

