<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: OpenMP &amp; KAP preprocessor in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933245#M2586</link>
    <description>I forgot to mention: when I instead use the Fortran subroutine parallelized by KAP with the OpenMP directives and compile with the -openmp option, the activity of the two CPUs confirms that the code runs with two threads, one on each processor:&lt;BR /&gt;&lt;BR /&gt;top - 21:51:55 up 29 days,  2:44,  2 users,  load average: 1.31, 0.52, 0.42 &lt;BR /&gt;Tasks:  62 total,   2 running,  60 sleeping,   0 stopped,   0 zombie &lt;BR /&gt; Cpu0 : 99.7% us,  0.3% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si &lt;BR /&gt; Cpu1 : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si &lt;BR /&gt; Cpu2 :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si &lt;BR /&gt; Cpu3 :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si &lt;BR /&gt;Mem:   3106328k total,  2433712k used,   672616k free,   127052k buffers &lt;BR /&gt;Swap:  6193016k total,    10172k used,  6182844k free,  1664648k cached &lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;bye&lt;BR /&gt;Filippo</description>
    <pubDate>Fri, 29 Oct 2004 01:10:22 GMT</pubDate>
    <dc:creator>denaro</dc:creator>
    <dc:date>2004-10-29T01:10:22Z</dc:date>
    <item>
      <title>OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933236#M2577</link>
      <description>Hi all,&lt;BR /&gt;I was wondering what happened to the KAP products after Intel acquired KAI. Does kf90 still exist for Intel CPUs?&lt;BR /&gt;&lt;BR /&gt;I have my own CFD code which, thanks to my old KAP preprocessor running on a Compaq Alpha, is written with OpenMP directives. On the two-processor Alpha I get a very good speedup (almost 1.8), but when I compile the code on a two-processor Xeon with the options:&lt;BR /&gt;&lt;BR /&gt;-fpp -tpp7 -xN -axN -O3 -ipo -align -openmp&lt;BR /&gt;&lt;BR /&gt;the parallel code runs slower than the sequential one! Should I conclude that the code generated by KAP on the Alpha does not meet the requirements for Intel CPUs? &lt;BR /&gt;&lt;BR /&gt;This is my installation environment on the Xeon-based machine:&lt;BR /&gt;&lt;BR /&gt;...........................&lt;BR /&gt;[les@venere provasor3d]$ uname -a&lt;BR /&gt;Linux venere 2.6.3-7mdk-p3-smp-64GB #1 SMP Wed Mar 17 15:34:39 CET 2004 i686 unknown unknown GNU/Linux&lt;BR /&gt;[les@venere provasor3d]$ &lt;BR /&gt;&lt;BR /&gt;...........................&lt;BR /&gt;&lt;BR /&gt;[les@venere provasor3d]$ ifort -V&lt;BR /&gt;Intel Fortran Compiler for 32-bit applications, Version 8.0   Build 20031016Z Package ID: l_fc_p_8.0.035&lt;BR /&gt;Copyright (C) 1985-2003 Intel Corporation.  All rights reserved.&lt;BR /&gt;...........................&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Has anyone experienced something similar?&lt;BR /&gt;Thanks</description>
      <pubDate>Mon, 25 Oct 2004 23:09:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933236#M2577</guid>
      <dc:creator>denaro</dc:creator>
      <dc:date>2004-10-25T23:09:38Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933237#M2578</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Hello,&lt;BR /&gt;If your OpenMP code had a 1.8 speedup on a dual-Alpha, I would expect to see good speedup on a dual-Xeon. The -xN optimization could be interfering with parallelization. Try removing the -xN and -axN options (by the way, the -x and -ax options are mutually exclusive) and see if that helps parallelism at the expense of serial performance. Either way, I suggest contacting &lt;A href="https://shale.intel.com/registrationcenter/protected/Default.asp" target="_blank"&gt;Intel Premier Support&lt;/A&gt; for tech support questions like this.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;The Intel compilers print a message whenever a loop is vectorized or parallelized. What message does the compiler print for your source file when you compile with both -xN and -openmp?&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;The KAI KAP products are no longer for sale. However, the Intel compilers use the same OpenMP implementation as the old KAP products. In fact, the library is still called Guide.&lt;/DIV&gt;
&lt;DIV&gt;Best regards,&lt;BR /&gt;Henry&lt;/DIV&gt;</description>
      <pubDate>Wed, 27 Oct 2004 03:42:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933237#M2578</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2004-10-27T03:42:23Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933238#M2579</link>
      <description>Dear Henry,&lt;BR /&gt;thanks for the response. I will repeat all the tests without the options you indicated and let you know the results. &lt;BR /&gt;&lt;BR /&gt;What is really strange is that ifort with the -parallel option does not recognize as parallelizable the same loops (written in the original serial Fortran code) that KAP automatically parallelizes on the Alpha ... Therefore, I used the transformed sources (the *.cmp.f files) produced by KAP, which contain the OpenMP directives. I compiled them with the -openmp option, and ifort then recognizes the parallel regions. However, the execution times are greater than those of the sequential run... &lt;BR /&gt;Is it possible to obtain, with some ifort option, the transformed sources with OpenMP directives? I could then compare the two sources.&lt;BR /&gt;sincerely&lt;BR /&gt;Filippo</description>
      <pubDate>Wed, 27 Oct 2004 04:20:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933238#M2579</guid>
      <dc:creator>denaro</dc:creator>
      <dc:date>2004-10-27T04:20:52Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933239#M2580</link>
      <description>I'm curious about your notation that you are running the 64GB kernel.  If you are needing that in order to run parallel, maybe the overhead of the PAE addressing is overcoming the benefit of parallelization.  I have no experience with that, as usually we don't expect any chance of good 32-bit performance with more than a 4GB kernel.</description>
      <pubDate>Wed, 27 Oct 2004 06:46:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933239#M2580</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2004-10-27T06:46:45Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933240#M2581</link>
      <description>&lt;DIV&gt;Hi Filippo,&lt;/DIV&gt;
&lt;DIV&gt;Unfortunately, there's no compiler option to output the transformed code from automatic parallelization.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Henry&lt;/DIV&gt;</description>
      <pubDate>Thu, 28 Oct 2004 04:31:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933240#M2581</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2004-10-28T04:31:36Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933241#M2582</link>
      <description>Hi,&lt;BR /&gt;I see. I think Intel should add this option ... everyone working on parallel code could see the effects in the transformed source and possibly work on it.&lt;BR /&gt;&lt;BR /&gt;However, I ran some more tests without the -xN option and the performance actually improved. The -O3 optimization does not influence the execution time. Anyway, the performance is still far from what I get on the Alpha ... some of the difference should be due to the 64-bit OS ... but I expected the Xeon to do better ..&lt;BR /&gt;&lt;BR /&gt;Best Regards&lt;BR /&gt;Filippo</description>
      <pubDate>Thu, 28 Oct 2004 04:43:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933241#M2582</guid>
      <dc:creator>denaro</dc:creator>
      <dc:date>2004-10-28T04:43:46Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933242#M2583</link>
      <description>Dear all&lt;BR /&gt;&lt;BR /&gt;after some work I have completed several execution-time tests on the two-processor Alpha and the two-processor Xeon. I hope someone will be interested in reading the results and can give me some suggestions.&lt;BR /&gt;&lt;BR /&gt;Rather than using my whole code, I concentrated all the tests on a simple iterative solver for the linear system derived from the finite-difference discretization of a three-dimensional elliptic equation (I hope someone has experience with this). I attach the tar file with the Fortran sources and the .ini file, in case someone wants to repeat my experiments.&lt;BR /&gt;Again, the behavior of the Intel compiler is somewhat strange: it cannot parallelize loops that I am sure can be parallelized and that KAP actually does parallelize!&lt;BR /&gt;The test reports follow, organized by machine (Xeon, then Alpha).&lt;BR /&gt;I hope someone is interested in this topic ...&lt;BR /&gt;With Regards&lt;BR /&gt;Filippo &lt;BR /&gt;&lt;BR /&gt;####################  XEON  ###################&lt;BR /&gt;_____________________________________________________________________________________&lt;BR /&gt;&lt;BR /&gt;--- Execution times on Xeon two-processors.&lt;BR /&gt;The original sources (i.e. the  *.f files) are compiled with ifort and auto-parallelizer:&lt;BR /&gt;&lt;BR /&gt;[les@venere provasor3d]$ make&lt;BR /&gt;ifort -fpp -tpp7 -O3 -ipo -align -parallel  -par_threshold0  -c  -I/dati/provasor3d/ calcphi3d.f&lt;BR /&gt;ifort -I/dati/provasor3d/ -fpp -tpp7 -O3 -ipo -align -parallel  -par_threshold0  provasor3d.f -o provasor3d 
&lt;BR /&gt;calcphi3d.o &lt;BR /&gt;IPO: using IR for /home/les/tmp/ifortUPvFYg.o&lt;BR /&gt;IPO: using IR for calcphi3d.o&lt;BR /&gt;IPO: performing multi-file optimizations&lt;BR /&gt;provasor3d.f(127) : (col. 2) remark: LOOP WAS AUTO-PARALLELIZED.&lt;BR /&gt;provasor3d.f(127) : (col. 26) remark: LOOP WAS AUTO-PARALLELIZED.&lt;BR /&gt;[les@venere provasor3d]$ &lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;1) sequential run, &lt;BR /&gt;by means of variable nsc=1 in the file sor.ini, it is called the sequential subroutine 2&lt;BR /&gt;that compiler ifort is unable to parallelize (the same as KAP due to dependendances), i.e. there are no &lt;BR /&gt;parallel directive.&lt;BR /&gt;&lt;BR /&gt;[les@venere provasor3d]$ time -p provasor3d&lt;BR /&gt; nx , ny , nz =          40          40          40&lt;BR /&gt; n , m , l =          41          42          41&lt;BR /&gt; 1 - SOR sequenziale con coef.matrice scalari;&lt;BR /&gt; 2 - SOR sequenziale con coef.matrice vettori;&lt;BR /&gt; 3 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.scalari&lt;BR /&gt; 4 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.vettori&lt;BR /&gt; 5 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.scalari&lt;BR /&gt; 6 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.vettori&lt;BR /&gt;  Si sceglie:            1&lt;BR /&gt; SOR sequenziale - coef. scalari x, z&lt;BR /&gt; Chiamata routine 2&lt;BR /&gt; N. 
iterazioni per convergenza ellittica:        4847&lt;BR /&gt;real 7.93&lt;BR /&gt;user 7.68&lt;BR /&gt;sys 0.24&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;2) parallel run, &lt;BR /&gt;by means of variable nsc=5 in the file sor.ini, subroutine 9 is called,&lt;BR /&gt;which KAP was able to parallelize (thanks to the white/black-red coloring) but ifort CANNOT!&lt;BR /&gt;&lt;BR /&gt;[les@venere provasor3d]$ time -p provasor3d&lt;BR /&gt; nx , ny , nz =          40          40          40&lt;BR /&gt; n , m , l =          41          42          41&lt;BR /&gt; 1 - SOR sequenziale con coef.matrice scalari;&lt;BR /&gt; 2 - SOR sequenziale con coef.matrice vettori;&lt;BR /&gt; 3 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.scalari&lt;BR /&gt; 4 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.vettori&lt;BR /&gt; 5 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.scalari&lt;BR /&gt; 6 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.vettori&lt;BR /&gt;  Si sceglie:            5&lt;BR /&gt; SOR parallelo (i,j - b/r) (k-zebra) -coef.scalari x,y&lt;BR /&gt; Chiamata routine 9&lt;BR /&gt; N. iterazioni per convergenza ellittica:        4845&lt;BR /&gt;real 9.54&lt;BR /&gt;user 9.30&lt;BR /&gt;sys 0.24&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;__________________________________________________________________________________________&lt;BR /&gt;&lt;BR /&gt;--- Execution times on Xeon two-processors.&lt;BR /&gt;The sources generated by KAP (i.e. the  *.cmp.f files) are compiled with ifort without &lt;BR /&gt;the auto-parallelizer but with the -openmp option:&lt;BR /&gt;&lt;BR /&gt;[les@venere provasor3d]$ make&lt;BR /&gt;ifort -fpp -tpp7 -O3 -ipo -align -openmp -openmp  -c  -I/dati/provasor3d/ calcphi3d.cmp.f&lt;BR /&gt;ifort -I/dati/provasor3d/ -fpp -tpp7 -O3 -ipo -align -openmp  provasor3d.cmp.f -o provasor3d 
&lt;BR /&gt;calcphi3d.cmp.o &lt;BR /&gt;IPO: using IR for /home/les/tmp/ifortMhqYPF.o&lt;BR /&gt;IPO: using IR for calcphi3d.cmp.o&lt;BR /&gt;IPO: performing multi-file optimizations&lt;BR /&gt;provasor3d.cmp.f(159) : (col. 6) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(156) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(234) : (col. 6) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(231) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(332) : (col. 6) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(331) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(415) : (col. 6) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(413) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(442) : (col. 6) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(440) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(468) : (col. 6) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.&lt;BR /&gt;provasor3d.cmp.f(466) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.&lt;BR /&gt;calcphi3d.cmp.f(4814) : (col. 6) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.&lt;BR /&gt;calcphi3d.cmp.f(4808) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.&lt;BR /&gt; ...    truncated for the sake of brevity  ...&lt;BR /&gt; ...   ....                    ....&lt;BR /&gt;calcphi3d.cmp.f(4045) : (col. 
6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.&lt;BR /&gt;[les@venere provasor3d]$ &lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;3) same calling as 1)&lt;BR /&gt;by means of variable nsc=1 in the file sor.ini, it is called the subroutine 2 that is sequential&lt;BR /&gt;&lt;BR /&gt;[les@venere provasor3d]$ time -p provasor3d&lt;BR /&gt; nx , ny , nz =          40          40          40&lt;BR /&gt; n , m , l =          41          42          41&lt;BR /&gt; 1 - SOR sequenziale con coef.matrice scalari;&lt;BR /&gt; 2 - SOR sequenziale con coef.matrice vettori;&lt;BR /&gt; 3 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.scalari&lt;BR /&gt; 4 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.vettori&lt;BR /&gt; 5 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.scalari&lt;BR /&gt; 6 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.vettori&lt;BR /&gt;  Si sceglie:            1&lt;BR /&gt; SOR sequenziale - coef. scalari x, z&lt;BR /&gt; Chiamata routine 2&lt;BR /&gt; N. iterazioni per convergenza ellittica:        4847&lt;BR /&gt;real 7.14&lt;BR /&gt;user 7.07&lt;BR /&gt;sys 0.23&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;4) same calling as 2)&lt;BR /&gt;by means of variable nsc=5 in the file sor.ini, it is called the subroutine 9&lt;BR /&gt;with the white/black-red couloring and OpenMp directives&lt;BR /&gt;&lt;BR /&gt;[les@venere provasor3d]$ time -p provasor3d&lt;BR /&gt; nx , ny , nz =          40          40          40&lt;BR /&gt; n , m , l =          41          42          41&lt;BR /&gt; 1 - SOR sequenziale con coef.matrice scalari;&lt;BR /&gt; 2 - SOR sequenziale con coef.matrice vettori;&lt;BR /&gt; 3 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.scalari&lt;BR /&gt; 4 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.vettori&lt;BR /&gt; 5 - SOR
 parallelo (i,j - b/r)  (k- zebra) - coef.scalari&lt;BR /&gt; 6 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.vettori&lt;BR /&gt;  Si sceglie:            5&lt;BR /&gt; SOR parallelo (i,j - b/r) (k-zebra) -coef.scalari x,y&lt;BR /&gt; Chiamata routine 9&lt;BR /&gt; N. iterazioni per convergenza ellittica:        4845&lt;BR /&gt;real 6.43&lt;BR /&gt;user 12.20&lt;BR /&gt;sys 0.33&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;#################  Alpha ###############&lt;BR /&gt;_____________________________________________________________________________________&lt;BR /&gt;&lt;BR /&gt;--- Execution times on Alpha two-processors compiled with KAP and auto-parallelizer:&lt;BR /&gt;&lt;BR /&gt;kf90 -I/utenti/denaro/provasor3d/ -fkapargs='-conc' -O5 -omp -fast  -tune ev6 -a&lt;BR /&gt;rch host -assume nounderscore  provasor3d.f -o provasor3d  calcphi3d.o&lt;BR /&gt; KAP/Tru64_U_F90   4.4 k340504 20010517         28-Oct-2004   11:18:27&lt;BR /&gt;KAP/Tru64_U_F90 4.4 k340504 20010517      : 0 errors in file provasor3d.f&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;1) sequential run, &lt;BR /&gt;by means of variable nsc=1 in the file sor.ini, it is called the sequential subroutine 2&lt;BR /&gt;that compiler KAP is unable to parallelize (due to dependendances), i.e. there are no &lt;BR /&gt;OpenMP directive.&lt;BR /&gt;&lt;BR /&gt;ds20dia.ing.unina2.it&amp;gt; time provasor3d&lt;BR /&gt; nx , ny , nz =          40          40          40&lt;BR /&gt; n , m , l =          41          42          41&lt;BR /&gt; 1 - SOR sequenziale con coef.matrice scalari;&lt;BR /&gt; 2 - SOR sequenziale con coef.matrice vettori;&lt;BR /&gt; 3 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.scalari&lt;BR /&gt; 4 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.vettori&lt;BR /&gt; 5 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.scalari&lt;BR /&gt; 6 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.vettori&lt;BR /&gt;  Si sceglie:            1&lt;BR /&gt; SOR sequenziale - coef. 
scalari x, z&lt;BR /&gt; Chiamata routine 2&lt;BR /&gt; N. iterazioni per convergenza ellittica:        4847&lt;BR /&gt;&lt;BR /&gt;real   16.8&lt;BR /&gt;user   16.1&lt;BR /&gt;sys    0.7&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;2) parallel run, &lt;BR /&gt;by means of variable nsc=5 in the file sor.ini, it is called the parallel subroutine 9&lt;BR /&gt;that compiler KAP is able to parallelize (due to white/black-red couloring), i.e. there are &lt;BR /&gt;OpenMP directive inserted in the source.&lt;BR /&gt;&lt;BR /&gt;ds20dia.ing.unina2.it&amp;gt; time provasor3d&lt;BR /&gt; nx , ny , nz =          40          40          40&lt;BR /&gt; n , m , l =          41          42          41&lt;BR /&gt; 1 - SOR sequenziale con coef.matrice scalari;&lt;BR /&gt; 2 - SOR sequenziale con coef.matrice vettori;&lt;BR /&gt; 3 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.scalari&lt;BR /&gt; 4 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.vettori&lt;BR /&gt; 5 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.scalari&lt;BR /&gt; 6 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.vettori&lt;BR /&gt;  Si sceglie:            5&lt;BR /&gt; SOR parallelo (i,j - b/r) (k-zebra) -coef.scalari x,y&lt;BR /&gt; Chiamata routine 9&lt;BR /&gt; N. 
iterazioni per convergenza ellittica:        4845&lt;BR /&gt;&lt;BR /&gt;real   7.2&lt;BR /&gt;user   12.4&lt;BR /&gt;sys    0.7&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;__________________________________________________________________________________________&lt;BR /&gt;&lt;BR /&gt;--- Execution times on Alpha two-processors compiled with KAP without auto-parallelizer:&lt;BR /&gt;&lt;BR /&gt;kf90  -O5 -fast  -tune ev6 -arch host -assume nounderscore  -c  -I/utenti/denaro&lt;BR /&gt;/provasor3d/ calcphi3d.f&lt;BR /&gt; KAP/Tru64_U_F90   4.4 k340504 20010517         28-Oct-2004   11:46:54&lt;BR /&gt;KAP/Tru64_U_F90 4.4 k340504 20010517      : 0 errors in file calcphi3d.f&lt;BR /&gt;kf90 -I/utenti/denaro/provasor3d/ -O5 -fast  -tune ev6 -arch host -assume nounde&lt;BR /&gt;rscore  provasor3d.f -o provasor3d  calcphi3d.o&lt;BR /&gt; KAP/Tru64_U_F90   4.4 k340504 20010517         28-Oct-2004   11:47:19&lt;BR /&gt;KAP/Tru64_U_F90 4.4 k340504 20010517      : 0 errors in fil
e provasor3d.f&lt;BR /&gt;&lt;BR /&gt;3) same calling as 1)&lt;BR /&gt;by means of variable nsc=1 in the file sor.ini, it is called the subroutine 2&lt;BR /&gt;&lt;BR /&gt;ds20dia.ing.unina2.it&amp;gt; time provasor3d&lt;BR /&gt; nx , ny , nz =          40          40          40&lt;BR /&gt; n , m , l =          41          42          41&lt;BR /&gt; 1 - SOR sequenziale con coef.matrice scalari;&lt;BR /&gt; 2 - SOR sequenziale con coef.matrice vettori;&lt;BR /&gt; 3 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.scalari&lt;BR /&gt; 4 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.vettori&lt;BR /&gt; 5 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.scalari&lt;BR /&gt; 6 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.vettori&lt;BR /&gt;  Si sceglie:            1&lt;BR /&gt; SOR sequenziale - coef. scalari x, z&lt;BR /&gt; Chiamata routine 2&lt;BR /&gt; N. iterazioni per convergenza ellittica:        4847&lt;BR /&gt;&lt;BR /&gt;real   16.5&lt;BR /&gt;user   15.9&lt;BR /&gt;sys    0.6&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;4) same calling as 2)&lt;BR /&gt;by means of variable nsc=5 in the file sor.ini, it is called the subroutine 9&lt;BR /&gt;with the white/black-red couloring without parallel loops&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;ds20dia.ing.unina2.it&amp;gt; time provasor3d&lt;BR /&gt; nx , ny , nz =          40          40          40&lt;BR /&gt; n , m , l =          41          42          41&lt;BR /&gt; 1 - SOR sequenziale con coef.matrice scalari;&lt;BR /&gt; 2 - SOR sequenziale con coef.matrice vettori;&lt;BR /&gt; 3 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.scalari&lt;BR /&gt; 4 - SOR parallelo (i,j - b/r)  (k- tutti) - coef.vettori&lt;BR /&gt; 5 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.scalari&lt;BR /&gt; 6 - SOR parallelo (i,j - b/r)  (k- zebra) - coef.vettori&lt;BR /&gt;  Si sceglie:            5&lt;BR /&gt; SOR parallelo (i,j - b/r) (k-zebra) -coef.scalari x,y&lt;BR /&gt; Chiamata routine 9&lt;BR /&gt; N. 
iterazioni per convergenza ellittica:        4845&lt;BR /&gt;&lt;BR /&gt;real   10.7&lt;BR /&gt;user   10.1&lt;BR /&gt;sys    0.6</description>
      <pubDate>Thu, 28 Oct 2004 18:20:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933242#M2583</guid>
      <dc:creator>denaro</dc:creator>
      <dc:date>2004-10-28T18:20:58Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933243#M2584</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Hello Filippo,&lt;/DIV&gt;
&lt;DIV&gt;The compiler will give a report showing which loops were parallelized and explanations why some loops could not be parallelized. Just use the -par_report option with the -parallel option.&lt;/DIV&gt;
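For readers unfamiliar with the "b/r" (black/red) routines named in the test report above, here is a minimal sketch of why such a coloring removes the loop-carried dependences. It is an illustration only, not the code attached to the thread: it assumes a 2-D Laplace problem with a 5-point stencil, and all names (make_grid, color_points, sweep_color, red_black_sor) are hypothetical. The solver discussed in the thread is 3-D Fortran.

```python
# Illustrative sketch only (assumed 2-D Laplace problem, 5-point stencil).
# None of these names come from the sources attached to the thread.

def make_grid(n):
    """Return an n x n grid, zero everywhere except a top boundary row of 1.0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [1.0] * n
    return grid


def color_points(n, color):
    """List the interior points (i, j) whose parity (i + j) % 2 equals color."""
    return [(i, j)
            for i in range(1, n - 1)
            for j in range(1, n - 1)
            if (i + j) % 2 == color]


def sweep_color(grid, points, omega):
    """Relax the given same-color points in the order supplied.

    Each update reads only the four neighbours, which all have the other
    color, so no point in `points` reads a value written during this sweep.
    """
    for i, j in points:
        gs = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                     + grid[i][j - 1] + grid[i][j + 1])
        grid[i][j] += omega * (gs - grid[i][j])


def red_black_sor(n, omega, iterations, reverse=False):
    """Run red/black SOR; `reverse` visits each color in reversed order."""
    grid = make_grid(n)
    red = color_points(n, 0)
    black = color_points(n, 1)
    if reverse:
        red.reverse()
        black.reverse()
    for _ in range(iterations):
        sweep_color(grid, red, omega)
        sweep_color(grid, black, omega)
    return grid


forward = red_black_sor(n=8, omega=1.5, iterations=20)
backward = red_black_sor(n=8, omega=1.5, iterations=20, reverse=True)
print(forward == backward)  # prints True: visiting order within a color is irrelevant
```

Because the result does not depend on the order in which points of one color are visited, the updates within a color can be shared among threads. A compiler can only exploit this automatically if it can prove that independence from the array subscripts, which may be where the two auto-parallelizers differ.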
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;I encourage you to submit a feature request to Premier Support saying that you would like an option to get the transformed code from automatic parallelization.&lt;/DIV&gt;
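As a side note for readers comparing the two machines, the wall-clock ("real") times in the test report earlier in this thread can be turned into like-for-like speedups for routine 9. The sketch below only does that arithmetic. The pairing of runs is an assumption: the Xeon build with -parallel, in which ifort did not parallelize routine 9 and user time is close to real time, is taken as the serial baseline.

```python
# Wall-clock ("real") seconds copied from the test report in this thread.
# Treating the first entry of each pair as the serial baseline is an assumption.
xeon_serial = 9.54     # Xeon, routine 9, ifort -parallel (loops not parallelized)
xeon_openmp = 6.43     # Xeon, routine 9, KAP *.cmp.f sources + ifort -openmp
alpha_serial = 10.7    # Alpha, routine 9, kf90 without the auto-parallelizer
alpha_parallel = 7.2   # Alpha, routine 9, kf90 with -conc and -omp


def speedup(serial_seconds, parallel_seconds):
    """Classic speedup: serial wall time divided by parallel wall time."""
    return serial_seconds / parallel_seconds


xeon = speedup(xeon_serial, xeon_openmp)
alpha = speedup(alpha_serial, alpha_parallel)
print(f"Xeon  routine 9 speedup: {xeon:.2f}")   # prints 1.48
print(f"Alpha routine 9 speedup: {alpha:.2f}")  # prints 1.49
```

Under that pairing both machines show a factor of roughly 1.5 for routine 9 on this small 40 x 40 x 40 case. This suggests, though it does not prove, that the KAP-generated OpenMP code scales similarly on the Xeon once the same algorithm is compared in both runs.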
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Best regards,&lt;/DIV&gt;
&lt;DIV&gt;Henry&lt;/DIV&gt;</description>
      <pubDate>Thu, 28 Oct 2004 21:19:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933243#M2584</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2004-10-28T21:19:43Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933244#M2585</link>
      <description>Dear Henry,&lt;BR /&gt;I have already used the -par_report option, and I see that the loops are not considered candidates for parallelization. On the other hand, I wrote the code in such a way that the dependences are eliminated, and KAP actually recognizes this and parallelizes the loops. I am unable to understand the problem...&lt;BR /&gt;This is the report for the subroutine that ifort cannot parallelize:&lt;BR /&gt;&lt;BR /&gt;[les@venere ocean]$ ifort -c -O3 -parallel -par_report3 calcphi.f&lt;BR /&gt;   procedure: calcphi_seq_brij_zebrak_scalxz_ndy_perxz&lt;BR /&gt;   serial loop: line 42: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 70: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 91: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 112: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 145: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 275: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 304: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 326: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 348: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 356: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 517: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 356: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 356: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 590: not a parallel candidate due to missing zero-trip test&lt;BR /&gt;   serial loop: line 33: not a parallel candidate due to the loop being lexically discontinuous&lt;BR /&gt;   serial loop: 
line 150&lt;BR /&gt;      anti data dependence assumed from line 160 to line 168, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 160 to line 178, due to "er"&lt;BR /&gt;      output data dependence assumed from line 160 to line 168, due to "er"&lt;BR /&gt;      output data dependence assumed from line 160 to line 178, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 160 to line 168, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 160 to line 178, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 168 to line 160, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 168 to line 168, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 168 to line 178, due to "er"&lt;BR /&gt;      output data dependence assumed from line 168 to line 160, due to "er"&lt;BR /&gt;      output data dependence assumed from line 168 to line 168, due to "er"&lt;BR /&gt;      output data dependence assumed from line 168 to line 178, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 168 to line 160, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 168 to line 168, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 168 to line 178, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 178 to line 160, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 178 to line 168, due to "er"&lt;BR /&gt;      output data dependence assumed from line 178 to line 160, due to "er"&lt;BR /&gt;      output data dependence assumed from line 178 to line 168, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 178 to line 160, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 178 to line 168, due to "er"&lt;BR /&gt;   serial loop: line 183&lt;BR /&gt;      anti data dep
endence assumed from line 193 to line 193, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 193 to line 203, due to "er"&lt;BR /&gt;      output data dependence assumed from line 193 to line 193, due to "er"&lt;BR /&gt;      output data dependence assumed from line 193 to line 203, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 193 to line 193, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 193 to line 203, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 203 to line 193, due to "er"&lt;BR /&gt;      output data dependence assumed from line 203 to line 193, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 203 to line 193, due to "er"&lt;BR /&gt;   serial loop: line 207&lt;BR /&gt;      anti data dependence assumed from line 217 to line 217, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 217 to line 227, due to "er"&lt;BR /&gt;      output data dependence assumed from line 217 to line 217, due to "er"&lt;BR /&gt;      output data dependence assumed from line 217 to line 227, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 217 to line 217, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 217 to line 227, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 227 to line 217, due to "er"&lt;BR /&gt;      output data dependence assumed from line 227 to line 217, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 227 to line 217, due to "er"&lt;BR /&gt;   serial loop: line 232&lt;BR /&gt;      anti data dependence assumed from line 242 to line 250, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 242 to line 260, due to "er"&lt;BR /&gt;      output data dependence assumed from line 242 to line 250, due to "er"&lt;BR /&gt;      output data dependence assumed from line 242 to line 260, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 242 to line 
250, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 242 to line 260, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 250 to line 242, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 250 to line 250, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 250 to line 260, due to "er"&lt;BR /&gt;      output data dependence assumed from line 250 to line 242, due to "er"&lt;BR /&gt;      output data dependence assumed from line 250 to line 250, due to "er"&lt;BR /&gt;      output data dependence assumed from line 250 to line 260, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 250 to line 242, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 250 to line 250, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 250 to line 260, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 260 to line 242, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 260 to line 250, due to "er"&lt;BR /&gt;      output data dependence assumed from line 260 to line 242, due to "er"&lt;BR /&gt;      output data dependence assumed from line 260 to line 250, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 260 to line 242, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 260 to line 250, due to "er"&lt;BR /&gt;   serial loop: line 392&lt;BR /&gt;      anti data dependence assumed from line 402 to line 410, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 402 to line 420, due to "er"&lt;BR /&gt;      output data dependence assumed from line 402 to line 410, due to "er"&lt;BR /&gt;      output data dependence assumed from line 402 to line 420, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 402 to line 410, due to "er"&lt;BR /&gt;      flow data d
ependence assumed from line 402 to line 420, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 410 to line 402, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 410 to line 410, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 410 to line 420, due to "er"&lt;BR /&gt;      output data dependence assumed from line 410 to line 402, due to "er"&lt;BR /&gt;      output data dependence assumed from line 410 to line 410, due to "er"&lt;BR /&gt;      output data dependence assumed from line 410 to line 420, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 410 to line 402, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 410 to line 410, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 410 to line 420, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 420 to line 402, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 420 to line 410, due to "er"&lt;BR /&gt;      output data dependence assumed from line 420 to line 402, due to "er"&lt;BR /&gt;      output data dependence assumed from line 420 to line 410, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 420 to line 402, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 420 to line 410, due to "er"&lt;BR /&gt;   serial loop: line 425&lt;BR /&gt;      anti data dependence assumed from line 435 to line 435, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 435 to line 445, due to "er"&lt;BR /&gt;      output data dependence assumed from line 435 to line 435, due to "er"&lt;BR /&gt;      output data dependence assumed from line 435 to line 445, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 435 to line 435, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 435 to line 445, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 445 to line 435, due to "er"&lt;BR /&gt;      
output data dependence assumed from line 445 to line 435, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 445 to line 435, due to "er"&lt;BR /&gt;   serial loop: line 449&lt;BR /&gt;      anti data dependence assumed from line 459 to line 459, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 459 to line 469, due to "er"&lt;BR /&gt;      output data dependence assumed from line 459 to line 459, due to "er"&lt;BR /&gt;      output data dependence assumed from line 459 to line 469, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 459 to line 459, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 459 to line 469, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 469 to line 459, due to "er"&lt;BR /&gt;      output data dependence assumed from line 469 to line 459, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 469 to line 459, due to "er"&lt;BR /&gt;   serial loop: line 474&lt;BR /&gt;      anti data dependence assumed from line 484 to line 492, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 484 to line 502, due to "er"&lt;BR /&gt;      output data dependence assumed from line 484 to line 492, due to "er"&lt;BR /&gt;      output data dependence assumed from line 484 to line 502, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 484 to line 492, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 484 to line 502, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 492 to line 484, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 492 to line 492, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 492 to line 502, due to "er"&lt;BR /&gt;      output data dependence assumed from line 492 to line 484, due to "er"&lt;BR /&gt;      output data
 dependence assumed from line 492 to line 492, due to "er"&lt;BR /&gt;      output data dependence assumed from line 492 to line 502, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 492 to line 484, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 492 to line 492, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 492 to line 502, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 502 to line 484, due to "er"&lt;BR /&gt;      anti data dependence assumed from line 502 to line 492, due to "er"&lt;BR /&gt;      output data dependence assumed from line 502 to line 484, due to "er"&lt;BR /&gt;      output data dependence assumed from line 502 to line 492, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 502 to line 484, due to "er"&lt;BR /&gt;      flow data dependence assumed from line 502 to line 492, due to "er"</description>
      <pubDate>Fri, 29 Oct 2004 00:48:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933244#M2585</guid>
      <dc:creator>denaro</dc:creator>
      <dc:date>2004-10-29T00:48:34Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933245#M2586</link>
      <description>I forgot to mention: when I instead use the Fortran subroutine parallelized by KAP with OpenMP directives and compile with the -openmp option, the activity of the two CPUs confirms that the code runs with two threads, each on its own processor:&lt;BR /&gt;&lt;BR /&gt;top - 21:51:55 up 29 days,  2:44,  2 users,  load average: 1.31, 0.52, 0.42 &lt;BR /&gt;Tasks:  62 total,   2 running,  60 sleeping,   0 stopped,   0 zombie &lt;BR /&gt; Cpu0 : 99.7% us,  0.3% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si &lt;BR /&gt; Cpu1 : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si &lt;BR /&gt; Cpu2 :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si &lt;BR /&gt; Cpu3 :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si &lt;BR /&gt;Mem:   3106328k total,  2433712k used,   672616k free,   127052k buffers &lt;BR /&gt;Swap:  6193016k total,    10172k used,  6182844k free,  1664648k cached &lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;bye&lt;BR /&gt;Filippo</description>
      <pubDate>Fri, 29 Oct 2004 01:10:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933245#M2586</guid>
      <dc:creator>denaro</dc:creator>
      <dc:date>2004-10-29T01:10:22Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933246#M2587</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Filippo -&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Do you get any better parallelization results from some of the loops if you put in the compiler directive to ignore dependencies? This would be...&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV style="BORDER-RIGHT: black 1px solid; PADDING-RIGHT: 10px; BORDER-TOP: black 1px solid; PADDING-LEFT: 10px; PADDING-BOTTOM: 10px; BORDER-LEFT: black 1px solid; PADDING-TOP: 10px; BORDER-BOTTOM: black 1px solid"&gt;&lt;SPAN class="text_smallest"&gt;Code:&lt;/SPAN&gt;&lt;PRE&gt;!DEC$ IVDEP
&lt;/PRE&gt;&lt;/DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;See pages 14-23 to 14-25 of the Intel Fortran Language Reference for more details on the directive.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Since the messages you are getting from the compiler are &lt;EM&gt;assuming &lt;/EM&gt;that there is a dependency (and both you and KAP know that this is not a true dependence), the above directive may be enough to cue the Intel compilers to that same fact.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
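&lt;DIV&gt;As a rough sketch of where the directive goes, it is placed immediately before the DO loop it applies to. The program below is hypothetical: the index array idx and the sizes are made up for illustration, and only the array name "er" is borrowed from the compiler messages. Whether the directive also helps the auto-parallelizer, and not just the vectorizer, may depend on the compiler version, so treat that as an assumption to check against the report output.&lt;/DIV&gt;
&lt;DIV&gt;&lt;PRE&gt;program ivdep_sketch
  implicit none
  integer, parameter :: n = 8
  integer :: i, idx(n)
  real :: er(n)

  ! Hypothetical data: idx is a permutation of 1..n, so no two
  ! iterations of the second loop touch the same element of er.
  do i = 1, n
    idx(i) = n + 1 - i
    er(i) = real(i)
  end do

  ! The compiler cannot prove that idx has no repeated values, so it
  ! assumes a dependence on er. The directive tells it to ignore that
  ! assumed dependence; other compilers treat the line as a comment.
!DEC$ IVDEP
  do i = 1, n
    er(idx(i)) = er(idx(i)) + 1.0
  end do

  print *, er
end program ivdep_sketch
&lt;/PRE&gt;&lt;/DIV&gt;
&lt;DIV&gt;The directive is only a promise from the programmer. It is safe here because idx is a permutation; if idx contained repeated values, the loop could give wrong results once the compiler acts on the directive.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;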
&lt;DIV&gt;I realize that it will be a pain to put in directives by hand when a tool (KAP) can see through the dependence automatically. There will always be situations where this will be the case. Automatic detection of safe loops will never beat old-fashioned human inspection and knowledge.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;--clay&lt;/DIV&gt;</description>
      <pubDate>Wed, 03 Nov 2004 08:08:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933246#M2587</guid>
      <dc:creator>ClayB</dc:creator>
      <dc:date>2004-11-03T08:08:17Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP &amp; KAP preprocessor</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933247#M2588</link>
      <description>Dear Clay,&lt;BR /&gt;&lt;BR /&gt;Many thanks for your suggestion; I will try it and look at the performance again.&lt;BR /&gt;As a matter of fact, I usually read the code that KAP generates to help me improve the original source. When KAP fails to parallelize a loop, I know that I have to rework that part of the code to eliminate the dependencies (when possible).&lt;BR /&gt;For example, I am fairly sure that one of the problems degrading my performance on the Xeon is that KAP interchanges the loops. I originally wrote the loops in k,j,i order for 3D arrays to optimize cache use. KAP swaps the j and k loops, but this does not degrade performance on the Alpha, thanks to its larger L2 cache. On the Xeon, however, the interchange may have some impact and hurt parallel performance. Could that be the case?&lt;BR /&gt;&lt;BR /&gt;I think this way of working is the best way to combine human insight with auto-parallelization software. Unfortunately, Intel acquired the KAP product and then killed it ....&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Filippo</description>
      <pubDate>Wed, 03 Nov 2004 16:54:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/OpenMP-KAP-preprocessor/m-p/933247#M2588</guid>
      <dc:creator>denaro</dc:creator>
      <dc:date>2004-11-03T16:54:10Z</dc:date>
    </item>
  </channel>
</rss>

