<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The code of flop in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800003#M667</link>
    <description>I compiled the code downloaded from &lt;A href="http://www.netlib.org/benchmark/linpack-pc.c" target="_blank"&gt;http://www.netlib.org/benchmark/linpack-pc.c&lt;/A&gt; and ran it. My monitoring tool shows 34M, 435M, 1800M, 2045M ... and so on.&lt;BR /&gt;</description>
    <pubDate>Fri, 02 Mar 2012 13:32:03 GMT</pubDate>
    <dc:creator>Guanghui</dc:creator>
    <dc:date>2012-03-02T13:32:03Z</dc:date>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/799997#M661</link>
      <description>&lt;P&gt;I want to understand FLOPs better. SNB uses ports, not an FPU, to do floating-point operations. What program code could test FLOPs on SNB? Or how can I calculate FLOPs in code?&lt;/P&gt;</description>
      <pubDate>Wed, 29 Feb 2012 16:03:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/799997#M661</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2012-02-29T16:03:29Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/799998#M662</link>
      <description>Hello GHui,&lt;BR /&gt;This is a complicated question.&lt;BR /&gt;FLOP is just a floating point operation.&lt;BR /&gt;There is an article on measuring the FLOPs on SNB (and other processor families) at &lt;A href="http://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs/" target="_blank"&gt;http://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs/&lt;/A&gt;&lt;BR /&gt;Hopefully this helps,&lt;BR /&gt;Pat</description>
      <pubDate>Wed, 29 Feb 2012 16:32:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/799998#M662</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-02-29T16:32:44Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/799999#M663</link>
      <description>Thanks, Pat. The material is very useful.&lt;BR /&gt;I do an "a[i]+=b[i]*c[i]" calculation. It's about 529M per second on Nehalem. But other tools that monitor via the PMU display about 1864M per second. I want to know how to understand this difference.&lt;BR /&gt;</description>
      <pubDate>Thu, 01 Mar 2012 05:33:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/799999#M663</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2012-03-01T05:33:03Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800000#M664</link>
      <description>The code is like this.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;[cpp]#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;time.h&amp;gt;
#include &amp;lt;sys/time.h&amp;gt;

#define NUM 20000
#define TIMEi 10000

int main(int argc,char **argv)
{
        float a[NUM],b[NUM],c[NUM];
        long i,j,k;
        double result;
        struct timeval tv1,tv2;

        printf("init data\n");
        for(i=0;i&amp;lt;NUM;i++){
                a[i]=b[i]=c[i]=0.2;
        }

        printf("start FP op\n");
        long iMax=TIMEi;

        while(1)
        {
                gettimeofday(&amp;amp;tv1,NULL);
                for(i=0;i&amp;lt;iMax;i++){
                        for(j=0;j&amp;lt;NUM;j++){
                                a[i]+=b[i]*c[i];
                        }
                }
                gettimeofday(&amp;amp;tv2,NULL);
                float dt=(float)(tv2.tv_sec*1000000+tv2.tv_usec-tv1.tv_sec*1000000-tv1.tv_usec)/1000000;
                //float dt=tv2.tv_usec-tv1.tv_usec;
                result=(double)2*iMax*NUM/dt/1000000;
                printf("MFlops:%lf %lf\n",result,dt);
        }

        return 0;
}
[/cpp] &lt;BR /&gt;</description>
      <pubDate>Thu, 01 Mar 2012 09:00:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800000#M664</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2012-03-01T09:00:18Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800001#M665</link>
      <description>Did you have a chance to look at the &lt;STRONG&gt;Linpack 100x100 Benchmark&lt;/STRONG&gt; in C/C++ for PCs?&lt;BR /&gt;</description>
      <pubDate>Thu, 01 Mar 2012 14:35:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800001#M665</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-03-01T14:35:44Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800002#M666</link>
      <description>I have already replied. Can the reply be let through?</description>
      <pubDate>Fri, 02 Mar 2012 07:30:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800002#M666</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2012-03-02T07:30:45Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800003#M667</link>
      <description>I compiled the code downloaded from &lt;A href="http://www.netlib.org/benchmark/linpack-pc.c" target="_blank"&gt;http://www.netlib.org/benchmark/linpack-pc.c&lt;/A&gt; and ran it. My monitoring tool shows 34M, 435M, 1800M, 2045M ... and so on.&lt;BR /&gt;</description>
      <pubDate>Fri, 02 Mar 2012 13:32:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800003#M667</guid>
      <dc:creator>Guanghui</dc:creator>
      <dc:date>2012-03-02T13:32:03Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800004#M668</link>
      <description>Hello GHui,&lt;BR /&gt;I'll try running your program this weekend.&lt;BR /&gt;Pat</description>
      <pubDate>Fri, 02 Mar 2012 13:55:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800004#M668</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-03-02T13:55:57Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800005#M669</link>
      <description>Hello GHui,&lt;BR /&gt;When I compile the program with 'gcc -O0 -g ghui_flops.c -o ghui_flops' on my Sandy Bridge "Intel Core i7-2820QM CPU @ 2.30GHz" processor and run it, I get:&lt;P&gt;snb-d2:/home/pfay/flops # ./ghui_flops&lt;BR /&gt;init data&lt;BR /&gt;start FP op&lt;BR /&gt;MFlops:726.550135 0.550547&lt;BR /&gt;MFlops:734.875382 0.544310&lt;BR /&gt;&lt;BR /&gt;The assembly code shows that the generated code for the inner loop loads each b[] and c[] value, does the multiply, then adds the value to a[] and stores it. The compiler generates SSE2 vector instructions but only uses 1 of the 4 available single-precision values in the xmm* registers.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I modified your program to print the ops (in this case floating-point ops) and to only do 9 outer loops.&lt;BR /&gt;And I ran it under 'perf stat' using the FP_COMP_OPS_EXE.SSE_FP_SCALAR_SINGLE event (event=0x10, umask=0x20).&lt;BR /&gt;See the SDM vol 3 section 19.3 for Sandy Bridge events.&lt;BR /&gt;To specify a 'raw' event with the 'perf' utility, you have to say '-e rXXYY' where XX is the umask and YY is the event number.&lt;/P&gt;&lt;P&gt;snb-d2:/home/pfay/flops # perf stat -e r2010 ./ghui_flops&lt;BR /&gt;init data&lt;BR /&gt;start FP op&lt;BR /&gt;MFlops:734.230559 Mops= 400.000000 0.544788&lt;BR /&gt;MFlops:734.408536 Mops= 400.000000 0.544656&lt;BR /&gt;MFlops:734.424691 Mops= 400.000000 0.544644&lt;BR /&gt;MFlops:733.564013 Mops= 400.000000 0.545283&lt;BR /&gt;MFlops:734.520429 Mops= 400.000000 0.544573&lt;BR /&gt;MFlops:734.523162 Mops= 400.000000 0.544571&lt;BR /&gt;MFlops:733.920691 Mops= 400.000000 0.545018&lt;BR /&gt;MFlops:734.270968 Mops= 400.000000 0.544758&lt;BR /&gt;MFlops:733.686477 Mops= 400.000000 0.545192&lt;BR /&gt;tot_Mops= 3600.000000, tot_time= 4.903483, overall Mops/sec= 734.172011&lt;BR /&gt;&lt;BR /&gt;Performance counter stats for './ghui_flops':&lt;BR /&gt;3635739254 raw 0x2010&lt;BR /&gt;4.904172008 seconds time elapsed&lt;BR /&gt;&lt;BR /&gt;Note that the FP_COMP_OPS_EXE.SSE_FP_SCALAR_SINGLE*1.0e-6 count is almost the same as the tot_Mops (3635.7 count versus tot_Mops= 3600).&lt;BR /&gt;The Mops calculation expects 2 SSE_FP_SCALAR_SINGLE operations per iteration. The 2 ops are the multiply and the add.&lt;BR /&gt;&lt;BR /&gt;If I now compile with -O to get some optimizations:&lt;BR /&gt;snb-d2:/home/pfay/flops # gcc ghui_flops.c -o ghui_flops -g -O&lt;BR /&gt;&lt;BR /&gt;And run it:&lt;BR /&gt;snb-d2:/home/pfay/flops # perf stat -e r2010 ./ghui_flops&lt;BR /&gt;init data&lt;BR /&gt;start FP op&lt;BR /&gt;MFlops:2172.460865 Mops= 400.000000 0.184123&lt;BR /&gt;MFlops:2223.580994 Mops= 400.000000 0.179890&lt;BR /&gt;MFlops:2222.938540 Mops= 400.000000 0.179942&lt;BR /&gt;MFlops:2221.691522 Mops= 400.000000 0.180043&lt;BR /&gt;MFlops:2222.963208 Mops= 400.000000 0.179940&lt;BR /&gt;MFlops:2223.358514 Mops= 400.000000 0.179908&lt;BR /&gt;MFlops:2223.420391 Mops= 400.000000 0.179903&lt;BR /&gt;MFlops:2222.827359 Mops= 400.000000 0.179951&lt;BR /&gt;MFlops:2223.148600 Mops= 400.000000 0.179925&lt;BR /&gt;tot_Mops= 3600.000000, tot_time= 1.623625, overall Mops/sec= 2217.260765&lt;BR /&gt;&lt;BR /&gt;Performance counter stats for './ghui_flops':&lt;BR /&gt;1800206977 raw 0x2010&lt;BR /&gt;1.625062683 seconds time elapsed&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Now we are getting near 1 Flop per clocktick... what's going on?&lt;BR /&gt;Generate the assembly: gcc ghui_flops.c -o ghui_flops.s -g -O -S -c&lt;BR /&gt;Inspecting ghui_flops.s shows the load/multiply/add/store loop has been replaced with (more or less)&lt;/P&gt;&lt;P&gt;for(i=0;i&amp;lt;iMax;i++){&lt;BR /&gt; float x = b[i]*c[i];&lt;BR /&gt; for(j=0;j&amp;lt;NUM;j++) {&lt;BR /&gt; a[i]+=x;&lt;BR /&gt; }&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;So the multiply has been moved out of the inner loop. This explains why 'perf stat' reports half the expected number of flops.&lt;BR /&gt;&lt;BR /&gt;Why are we only using 1 of the 4 single-precision values in the xmm registers?&lt;BR /&gt;If we swap the 2 'for()' loops to:&lt;/P&gt;[cpp]for(j=0;j&amp;lt;NUM;j++){
  for(i=0;i&amp;lt;iMax;i++){
    a[i]+=b[i]*c[i];
  }
}
[/cpp]&lt;P&gt;&lt;BR /&gt;Then the compiler can auto-vectorize and we can compile it with:&lt;BR /&gt;gcc ghui_flops.c -o ghui_flops -g -O3 -ftree-vectorizer-verbose=3&lt;BR /&gt;The -ftree... option tells you which loops it can vectorize.&lt;BR /&gt;&lt;BR /&gt;Then when we run we see&lt;/P&gt;[plain]snb-d2:/home/pfay/flops # perf stat -e cycles -e r4010 ./ghui_flops
init data
start FP op
MFlops:7673.419523 Mops= 400.000000 0.052128
MFlops:7719.772351 Mops= 400.000000 0.051815
MFlops:7696.007600 Mops= 400.000000 0.051975
MFlops:7732.756103 Mops= 400.000000 0.051728
MFlops:7683.442094 Mops= 400.000000 0.052060
MFlops:7714.263820 Mops= 400.000000 0.051852
MFlops:7741.286464 Mops= 400.000000 0.051671
MFlops:7722.156901 Mops= 400.000000 0.051799
MFlops:7723.647785 Mops= 400.000000 0.051789
tot_Mops= 3600.000000, tot_time= 0.466817, overall Mops/sec= 7711.801490

 Performance counter stats for './ghui_flops':

     1557784235  cycles
     1012898876  raw 0x4010

    0.468101879  seconds time elapsed

[/plain]&lt;P&gt;&lt;BR /&gt;Now we are running 3.5x faster. We are getting about 2.6 Flops/cycle (about 4,052 Mops over about 1,558 M cycles).&lt;BR /&gt;Note that I changed the event to 'r4010', which is event FP_COMP_OPS_EXE.SSE_PACKED_SINGLE (event=0x10, umask=0x40). And I added the '-e cycles' option to get clockticks.&lt;BR /&gt;&lt;BR /&gt;The perf event count is 1,012.9 M.&lt;BR /&gt;Each SSE_PACKED_SINGLE instruction does 4 operations, so we have to multiply it by 4 to get ops.&lt;BR /&gt;We would expect at least 3600 Mops, and perf counted about 4,052 Mops.&lt;BR /&gt;&lt;BR /&gt;I'll attach my updated source code file in another post (this post is already too long).&lt;BR /&gt;Does this make sense?&lt;BR /&gt;Pat&lt;/P&gt;</description>
      <pubDate>Sun, 04 Mar 2012 05:39:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800005#M669</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-03-04T05:39:23Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800006#M670</link>
      <description>Here is my modified version of your source code.&lt;BR /&gt;&lt;BR /&gt;[cpp]#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;time.h&amp;gt;
#include &amp;lt;sys/time.h&amp;gt;

#define NUM 20000
#define TIMEi 10000

int main(int argc,char **argv)
{
    float a[NUM],b[NUM],c[NUM];
    int i,j,k, m;
    double result, ops, tot_ops, tot_time;
    struct timeval tv1,tv2;

    printf("init data\n");
    for(i=0;i&amp;lt;NUM;i++){
        a[i]=b[i]=c[i]=0.2;
    }

    printf("start FP op\n");
    long iMax=TIMEi;

    m = 0;
    tot_ops = 0.0;
    tot_time = 0.0;
    while(++m &amp;lt; 10)
    {
        gettimeofday(&amp;amp;tv1,NULL);
        for(i=0;i&amp;lt;iMax;i++){
            for(j=0;j&amp;lt;NUM;j++){
                a[i]+=b[i]*c[i];
            }
        }
        gettimeofday(&amp;amp;tv2,NULL);
        float dt=(float)(tv2.tv_sec*1000000+tv2.tv_usec-tv1.tv_sec*1000000-tv1.tv_usec)/1000000;
        //float dt=tv2.tv_usec-tv1.tv_usec;
        ops = (double)2*iMax*NUM;
        result=ops/dt/1000000.0;
        printf("MFlops:%lf Mops= %f %lf\n",result,1.0e-6*ops, dt);
        tot_ops += ops;
        tot_time+= dt;
    }
    if(argc &amp;gt; 10) // just put this in so compiler doesn't optimize everything away.
    {
        float d=0;
        for(j=0;j&amp;lt;NUM;j++){
            d += a[j];
        }
        printf("d= %f\n", d);
    }
    printf("tot_Mops= %f, tot_time= %f, overall Mops/sec= %f\n", 1.0e-6*tot_ops, tot_time, 1.0e-6*tot_ops/tot_time);
    return 0;
}

[/cpp]</description>
      <pubDate>Sun, 04 Mar 2012 05:48:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800006#M670</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-03-04T05:48:45Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800007#M671</link>
      <description>Hi Patrick,&lt;BR /&gt;&lt;BR /&gt;Thank you! I'll try to use the code to verify the performance (in &lt;STRONG&gt;Flops&lt;/STRONG&gt;) of my test computers.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Sergey</description>
      <pubDate>Sun, 04 Mar 2012 23:13:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800007#M671</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-03-04T23:13:02Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800008#M672</link>
      <description>Hello Pat.&lt;BR /&gt;Thank you very much.&lt;BR /&gt;GHui.</description>
      <pubDate>Mon, 05 Mar 2012 12:20:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800008#M672</guid>
      <dc:creator>Guanghui</dc:creator>
      <dc:date>2012-03-05T12:20:49Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800009#M673</link>
      <description>&lt;P&gt;One of the loops in my code has an error. The loop:&lt;/P&gt;[cpp]for(j=0;j&lt;NUM&gt;;   
}   [/cpp]&lt;P&gt;should be:&lt;/P&gt;[cpp]for(j=0;j&lt;NUM&gt;;   
}   [/cpp]&lt;P&gt;&lt;BR /&gt;The loop will only get executed if someone ever passes 10 or more args to the program.&lt;BR /&gt;I put the loop in because, for one intermediate version of my code, the compiler optimized away the loops.&lt;BR /&gt;Pat&lt;/P&gt;&lt;/NUM&gt;&lt;/NUM&gt;</description>
      <pubDate>Mon, 05 Mar 2012 18:05:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800009#M673</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-03-05T18:05:48Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800010#M674</link>
      <description>Thanks for the update, Patrick! I haven't had time to run the test yet, but I hope to spend some time on it soon.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Sergey&lt;BR /&gt;</description>
      <pubDate>Tue, 06 Mar 2012 01:34:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800010#M674</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-03-06T01:34:06Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800011#M675</link>
      <description>Hi Pat,&lt;BR /&gt;&lt;BR /&gt;For the past several days, I have not been able to understand the following piece of code. Can it change what the compiler generates? How can this piece of code affect the compiler?&lt;BR /&gt;&lt;BR /&gt;[cpp]    if(argc &amp;gt; 10) // just put this in so compiler doesn't optimize everything away.
    {
        float d=0;
        for(j=0;j&amp;lt;NUM;j++){
            d += a[j];
        }
        printf("d= %f\n", d);
    }  [/cpp]</description>
      <pubDate>Tue, 06 Mar 2012 16:40:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800011#M675</guid>
      <dc:creator>Guanghui</dc:creator>
      <dc:date>2012-03-06T16:40:35Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800012#M676</link>
      <description>Hello Guanghui,&lt;BR /&gt;Sometimes compilers will optimize away sections of code.&lt;BR /&gt;For instance, if the compiler sees that the result array a[] isn't used anywhere and nothing depends on the result, why not just delete the whole 'a[i] += b[i] * c[i];' loop?&lt;BR /&gt;This is what happened when I compiled before I added the 'if(argc &amp;gt; 10)' logic.&lt;BR /&gt;&lt;BR /&gt;The 'if(argc &amp;gt; 10)' logic means the compiler can't tell (at compile time) whether the results in the a[] array are used or not. So the compiler leaves in the 'a[i] += b[i] * c[i];' loop.&lt;BR /&gt;Does this make sense?&lt;BR /&gt;Pat</description>
      <pubDate>Tue, 06 Mar 2012 17:10:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800012#M676</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-03-06T17:10:03Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800013#M677</link>
      <description>Yes. Thank you very much, Pat.&lt;BR /&gt;</description>
      <pubDate>Wed, 07 Mar 2012 11:02:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800013#M677</guid>
      <dc:creator>Guanghui</dc:creator>
      <dc:date>2012-03-07T11:02:41Z</dc:date>
    </item>
    <item>
      <title>The code of flop</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800014#M678</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1331130181687="58" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=335837" href="https://community.intel.com/en-us/profile/335837/" class="basic"&gt;Patrick Fay (Intel)&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;...&lt;BR /&gt;Sometimes &lt;STRONG&gt;compilers will optimize away sections of code&lt;/STRONG&gt;.&lt;BR /&gt;For instance, if the compiler sees that the result array a[] isn't used anywhere and nothing depends on the result, why not just delete the whole 'a[i] += b[i] * c[i];' loop?&lt;BR /&gt;...&lt;BR /&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;BR /&gt;I'm always concerned in such cases.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Mar 2012 15:24:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/The-code-of-flop/m-p/800014#M678</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-03-07T15:24:11Z</dc:date>
    </item>
  </channel>
</rss>

