<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic After investigating this in Analyzers</title>
    <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100976#M15933</link>
    <description>&lt;P style="word-wrap: break-word; font-size: 13.008px; line-height: 19.512px;"&gt;After the developer investigated this issue, it seems this is a limitation of building with "-O3", not something specific to VTune.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 13.008px; line-height: 19.512px;"&gt;During the optimization stage, the compiler can produce code that doesn’t exactly reflect the source. Note that there are 5 loops in the binary code, and the sub1 function has about 7 code ranges inserted at different places inside ‘test’ rather than a single contiguous code range.&lt;BR /&gt;
&lt;BR /&gt;
I think it is not a VTune issue; other tools (for example perf) will show the same.&lt;/P&gt;</description>
    <pubDate>Tue, 24 May 2016 06:15:52 GMT</pubDate>
    <dc:creator>Peter_W_Intel</dc:creator>
    <dc:date>2016-05-24T06:15:52Z</dc:date>
    <item>
      <title>Unexpected behaviour while vtuning inlined and IPOed code</title>
      <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100968#M15925</link>
      <description>&lt;P&gt;Dear Forum members,&lt;/P&gt;

&lt;P&gt;I have encountered some unusual behaviour in VTune displaying the time spent in various subroutines inlined by IPO. I managed to reproduce my problem in a simple example using ifort&amp;nbsp;version 16.0.1 and VTune Update 1 (build 434111):&lt;/P&gt;

&lt;P&gt;test.f90&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;PROGRAM test
  USE m1
  IMPLICIT NONE

  INTEGER, PARAMETER :: no_repeats = 100
  INTEGER, PARAMETER :: n = 100000000
  INTEGER :: repeat, i
  REAL :: gamma, delta, epsilon
  REAL, DIMENSION(:), ALLOCATABLE :: a, b, c


  ALLOCATE( a(1:n) )
  ALLOCATE( b(1:n) )
  ALLOCATE( c(1:n) )

  a = 1.0
  b = 1.0
  c = 2.0

  DO repeat = 1, no_repeats
    DO i = 1, n

      epsilon = b(i)
      CALL sub1( i+1, epsilon, gamma )
      epsilon = c(i)
      CALL sub1( i+2, epsilon, delta )
      a(i) = a(i) + gamma * delta
    ENDDO  ! n
  ENDDO    ! repeat

END PROGRAM test

&lt;/PRE&gt;

&lt;P&gt;sub2.f90&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;MODULE m1
  IMPLICIT NONE
  CONTAINS

  SUBROUTINE sub1( k, alpha, beta )
    IMPLICIT NONE

    INTEGER, INTENT(IN) :: k
    REAL, INTENT(IN) :: alpha
    REAL, INTENT(OUT) :: beta


    IF( MOD( k, 10 ) == 0 ) THEN
      beta = 4.0 * alpha
    ELSE
      beta = 2.0 * alpha
    ENDIF

  END SUBROUTINE sub1

END MODULE m1
&lt;/PRE&gt;

&lt;P&gt;compile.sh&lt;/P&gt;

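&lt;PRE class="brush:bash;"&gt;#!/bin/sh

# Compile the module first so that m1.mod exists when test.f90 is compiled
ifort -c -no-vec -O3 -ipo -debug full sub2.f90
ifort -c -no-vec -O3 -ipo -debug full test.f90
# Link; with -ipo the interprocedural optimization runs at this step
ifort -no-vec -O3 -ipo -debug full sub2.o test.o -o test
&lt;/PRE&gt;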

&lt;P&gt;After running this program under VTune, the Top-down window shows plausible results: both instances of the inlined subroutine sub1 are assigned similar runtimes:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="topdown-marked.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/8600i427C8909E65E1A4E/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="topdown-marked.png" alt="topdown-marked.png" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;However, if I open the source code of "test" to check the time consumption of the various inlined instances of sub1, all time (0.188s) is assigned to the second inline instance:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="testsource-marked.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/8601iA07F743DC2289404/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="testsource-marked.png" alt="testsource-marked.png" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;This is confirmed by the stack information on the right-hand side of the screen, where both "contributions" are assigned to line 26 of test.f90:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="testsource1-marked.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/8602i06EFCC8283EEE149/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="testsource1-marked.png" alt="testsource1-marked.png" /&gt;&lt;/span&gt;, &lt;span class="lia-inline-image-display-wrapper" image-alt="testsource2-marked.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/8603iDC8FD7345BDF5AC4/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="testsource2-marked.png" alt="testsource2-marked.png" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Moreover, when I check the assembly code and the times assigned to the machine instructions, all instructions belonging to both instances of sub1 seem to have reasonable timings, but the lea at 0x40284d is attributed the entire time of both subroutines (0.188s). I do understand that assembly-level information is not necessarily precise, given the stochastic nature of this kind of performance testing and the fact that I am not using hardware-counter-based methods, but I still think it is a sign that something is going wrong:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="testdisassembly1.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/8604i3281F1AED67A41F2/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="testdisassembly1.png" alt="testdisassembly1.png" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="testdisassembly2.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/8605i02ECC37A2CEB20B1/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="testdisassembly2.png" alt="testdisassembly2.png" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Overall, my problem is that I cannot check the time consumption of the individual instances of the inlined subroutines. The whole phenomenon cannot be blamed on the fact that VTune sometimes attributes the time of an instruction to another instruction nearby, often one or a few instructions later, since inserting some complicated code after line 24 does not change the timings of sub1 and sub2. I have the impression that VTune is booking the time of the inlined code to the wrong place. A possible workaround would be to manually create a second copy of sub1 called sub2 and inline them separately. However, this does not work, since the time of both sub1 and sub2 is then attributed to sub2. Furthermore, in a real program (a heavily inlined spaghetti of some 100000 Fortran lines with OpenMP involved) this isn't feasible, since the inlined subroutines are not small and their full call tree, i.e. all called subroutines, would have to be duplicated (triplicated, quadruplicated ...).&lt;/P&gt;

&lt;P&gt;Is there some way to check these timings?&lt;/P&gt;

&lt;P&gt;Thank you for your help,&lt;/P&gt;

&lt;P&gt;Jozsef&lt;/P&gt;

</description>
      <pubDate>Sun, 06 Mar 2016 11:32:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100968#M15925</guid>
      <dc:creator>Jozsef_K_</dc:creator>
      <dc:date>2016-03-06T11:32:02Z</dc:date>
    </item>
    <item>
      <title>-debug doesn't default to</title>
      <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100969#M15926</link>
      <description>&lt;P&gt;-debug doesn't default to -debug:inline-debug-info, in case that was your question.&lt;/P&gt;</description>
      <pubDate>Sun, 06 Mar 2016 12:54:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100969#M15926</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-03-06T12:54:06Z</dc:date>
    </item>
    <item>
      <title>Dear Tim,</title>
      <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100970#M15927</link>
      <description>&lt;P&gt;Dear Tim,&lt;/P&gt;

&lt;P&gt;Thank you for your help. Unfortunately, when I replace "-debug full" with "-debug inline-debug-info", the problem still persists.&lt;/P&gt;

&lt;P&gt;Jozsef&lt;/P&gt;

</description>
      <pubDate>Sun, 06 Mar 2016 13:29:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100970#M15927</guid>
      <dc:creator>Jozsef_K_</dc:creator>
      <dc:date>2016-03-06T13:29:49Z</dc:date>
    </item>
    <item>
      <title>I can repeat this behavior, I</title>
      <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100971#M15928</link>
      <description>&lt;P&gt;I can reproduce this behavior. My environment:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;&amp;gt; amplxe-cl --version
Intel(R) VTune(TM) Amplifier XE 2016 Update 2 (build 444464) Command Line Tool
Copyright (C) 2009-2015 Intel Corporation. All rights reserved.
&amp;gt; ifort --version
ifort (IFORT) 16.0.1 20151021
&lt;/PRE&gt;

&lt;P&gt;I used compile_o3.sh to build:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;#!/bin/sh
ifort -c -no-vec -O3 -ipo -debug inline-debug-info sub2.f90
ifort -c -no-vec -O3 -ipo -debug inline-debug-info test.f90
ifort -no-vec -O3 -ipo -debug inline-debug-info sub2.o test.o -o test_o3
&lt;/PRE&gt;

&lt;P&gt;Then I used basic hotspots to collect and display the result; the run lasted about 8 seconds on my machine:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;&amp;gt; amplxe-cl -c hotspots -- ./test_o3
&amp;gt; amplxe-cl -R hotspots
amplxe: Using result path `/home/peter/problem_report/fort_inline/r009hs'
amplxe: Executing actions 75 % Generating a report
Function  CPU Time  CPU Time:Effective Time  CPU Time:Effective Time:Idle  CPU Time:Effective Time:Poor  CPU Time:Effective Time:Ok  CPU Time:Effective Time:Ideal  CPU Time:Effective Time:Over  CPU Time:Spin Time  CPU Time:Overhead Time  Module   Function (Full)  Source File  Start Address
--------  --------  -----------------------  ----------------------------  ----------------------------  --------------------------  -----------------------------  ----------------------------  ------------------  ----------------------  -------  ---------------  -----------  -------------
test      7.770s    7.770s                   0s                            7.770s                        0s                          0s                             0s                            0s                  0s                      test_o3  test             test.f90     0x402e30
&lt;STRONG&gt;sub1      0.150s&lt;/STRONG&gt;    0.150s                   0s                            0.150s                        0s                          0s                             0s                            0s                  0s                      test_o3  sub1             sub2.f90     0x40305b
sub1      0.010s    0.010s                   0s                            0.010s                        0s                          0s                             0s                            0s                  0s                      test_o3  sub1             sub2.f90     0x403050
amplxe: Executing actions 100 % done
&lt;/PRE&gt;

&lt;P&gt;This suggests that sub1() took only ~0.15s, which doesn't make sense. If I disable all optimization, I get the expected result.&lt;/P&gt;

&lt;P&gt;I used "sh compile_o0.sh" to build:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;#!/bin/sh
ifort -c -no-vec -O0 -debug full sub2.f90
ifort -c -no-vec -O0 -debug full test.f90
ifort -no-vec -O0 -debug full sub2.o test.o -o test_o0
&lt;/PRE&gt;

&lt;P&gt;I collected data and displayed the report again:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;&amp;gt; amplxe-cl -c hotspots -d 8 -- ./test_o0
&amp;gt; amplxe-cl -R hotspots
amplxe: Using result path `/home/peter/problem_report/fort_inline/r010hs'
amplxe: Executing actions 75 % Generating a report
Function  CPU Time  CPU Time:Effective Time  CPU Time:Effective Time:Idle  CPU Time:Effective Time:Poor  CPU Time:Effective Time:Ok  CPU Time:Effective Time:Ideal  CPU Time:Effective Time:Over  CPU Time:Spin Time  CPU Time:Overhead Time  Module   Function (Full)  Source File  Start Address
--------  --------  -----------------------  ----------------------------  ----------------------------  --------------------------  -----------------------------  ----------------------------  ------------------  ----------------------  -------  ---------------  -----------  -------------
&lt;STRONG&gt;sub1&lt;/STRONG&gt;      4.270s    4.270s                   0s                            4.270s                        0s                          0s                             0s                            0s                  0s                      test_o0  sub1             sub2.f90     0x402e36
test      3.590s    3.590s                   0s                            3.590s                        0s                          0s                             0s                            0s                  0s                      test_o0  test             test.f90     0x402ea0
amplxe: Executing actions 100 % done
&lt;/PRE&gt;

&lt;P&gt;With optimization disabled, the CPU time of sub1() is as expected.&lt;/P&gt;

&lt;P&gt;Is this a bug in VTune(TM) Amplifier or in ifort?&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2016 05:42:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100971#M15928</guid>
      <dc:creator>Peter_W_Intel</dc:creator>
      <dc:date>2016-03-07T05:42:00Z</dc:date>
    </item>
    <item>
      <title>I have forwarded this issue</title>
      <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100972#M15929</link>
      <description>&lt;P&gt;I have forwarded this issue to the developer; I will post an update when I get a solution.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2016 07:10:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100972#M15929</guid>
      <dc:creator>Peter_W_Intel</dc:creator>
      <dc:date>2016-03-07T07:10:43Z</dc:date>
    </item>
    <item>
      <title>Well, it is always hard to</title>
      <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100973#M15930</link>
      <description>&lt;P&gt;Well, it is always hard to measure exact timings of inlined functions, since the compiler can significantly rearrange instructions during optimization. You cannot rely on the source view in this case because the compiler may lose the original source line attribution.&lt;/P&gt;
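
&lt;P&gt;To make the source-line point concrete, here is a minimal Python sketch (a toy model, not how VTune is implemented) of a sampling profiler looking up each sampled address in the line table from the debug info. The addresses and both line tables are made up; lines 24 and 26 stand for the two sub1 call sites in test.f90. When every interleaved range is tagged with the second call site, all samples land on line 26:&lt;/P&gt;

```python
from bisect import bisect_right
from collections import Counter


def attribute(samples, line_table):
    """Count samples per source line.

    line_table holds (start_address, source_line) rows sorted by address;
    each row covers the addresses up to the start of the next row.
    Assumes every sample lies at or after the first row's start address.
    """
    starts = [start for start, _ in line_table]
    hits = Counter()
    for address in samples:
        row = bisect_right(starts, address) - 1  # last row starting at or before address
        hits[line_table[row][1]] += 1
    return dict(hits)


# Hypothetical layout: instructions of the two inlined calls are interleaved.
accurate = [(0x100, 24), (0x104, 26), (0x108, 24), (0x10C, 26)]
# Same code, but every range is tagged with the second call site.
lossy = [(0x100, 26), (0x104, 26), (0x108, 26), (0x10C, 26)]

samples = [0x100, 0x105, 0x108, 0x10D]  # made-up sampled addresses

by_line_accurate = attribute(samples, accurate)
by_line_lossy = attribute(samples, lossy)
print(by_line_accurate)
print(by_line_lossy)
```

&lt;P&gt;With the accurate table the four samples split evenly between the two call sites; with the lossy one the first call site disappears from the source view, even though the samples themselves are identical.&lt;/P&gt;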

&lt;P&gt;As in your case: you can see in the VTune disassembly view that the instructions of both inline instances are mixed together. There is an IP+1 issue documented in the release notes (the time is attributed to the next instruction), which causes the imul timing of the first instance to be attributed to the second one (it lands on the add). The stack shows a confusing source line because the compiler did not generate accurate debug info for both instances; as I said, it is hard to preserve the original source line attribution under intensive optimization.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Mar 2016 20:15:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100973#M15930</guid>
      <dc:creator>Vitaly_S_Intel</dc:creator>
      <dc:date>2016-03-08T20:15:02Z</dc:date>
    </item>
    <item>
      <title>Dear Vitaly, thank you for</title>
      <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100974#M15931</link>
      <description>&lt;P&gt;Dear Vitaly, thank you for your help. I understand the troubles arising from the IP+1 issue and also I'm aware of the troubles of the performance measurement of heavily inlined and optimized code. I'm happy to browse the disassembled code to find the details and draw conclusions after some consideration.&lt;/P&gt;

&lt;P&gt;However, I still think something is going wrong here and the bookkeeping is buggy. Looking at the last two screenshots in my post, it is visible that the second inlined copy of sub1 at line 26 took 0.188 s to complete. However, the first corresponding machine instruction, a lea at 0x40284d, took exactly the same 0.188 s. So the time for the rest of the instructions (highlighted by VTune in the last screenshot) is not accounted for anywhere. Similarly, in the previous screenshot VTune highlighted the instructions corresponding to the first call to sub1; the time of these instructions is more than zero but is not attributed to the first call in the source view (the left side of the screen). Moreover, I think that the lea is unreasonably expensive here compared to the time needed by the floating point operations (hidden behind line 20 due to the IP+1 issue).&lt;/P&gt;
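
&lt;P&gt;To illustrate why I do not think IP+1 alone explains this, here is a minimal Python sketch of the effect as I understand it: each instruction's time shows up on the following instruction, so undoing it means shifting every value back by one slot. The instruction names and millisecond values are made up, not taken from my screenshots, and real skid need not be exactly one instruction:&lt;/P&gt;

```python
def shift_back(instructions, observed_ms):
    """Undo a one-instruction skid: move each observed time to the previous instruction.

    instructions is the execution order within one straight-line block
    (names are unique in this toy example); observed_ms maps each name to
    the time the profiler reported for it. The first instruction has no
    predecessor, so its time stays where it is.
    """
    corrected = {name: 0 for name in instructions}
    for index, name in enumerate(instructions):
        target = instructions[max(index - 1, 0)]
        corrected[target] += observed_ms.get(name, 0)
    return corrected


# Hypothetical block: the imul is the slow instruction, but its time lands on the add.
instructions = ["imul", "add", "lea", "mulss"]
observed_ms = {"imul": 0, "add": 100, "lea": 0, "mulss": 50}

corrected_ms = shift_back(instructions, observed_ms)
print(corrected_ms)
print(sum(observed_ms.values()), sum(corrected_ms.values()))
```

&lt;P&gt;A shift like this only moves time between neighbouring instructions; the total stays the same. That is why time that is visible on the instructions of the first call but missing from its source line looks like a bookkeeping problem to me rather than IP+1.&lt;/P&gt;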

&lt;P&gt;I have encountered several similar situations where the timings were sometimes obviously wrong. In one case the time of dozens of inlined subroutines was attributed to a single inlined copy; in other cases the output was completely implausible (a few pushq instructions before a function call were attributed a significant time although they were executed a total of seven times). The example I provided above is a simple demonstration of the core problem.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Mar 2016 16:31:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100974#M15931</guid>
      <dc:creator>Jozsef_K_</dc:creator>
      <dc:date>2016-03-09T16:31:59Z</dc:date>
    </item>
    <item>
      <title>The problem on my side was,</title>
      <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100975#M15932</link>
      <description>&lt;P&gt;The problem on my side was that most of the CPU time landed on the loop entry and the rest on a branch, which doesn't make sense. You can find a clue by clicking on a source line that you expect to take a lot of CPU time and then finding its associated assembly line. See this example:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="ifort_inline.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/10288i4B98E562DCD87DCD/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="ifort_inline.png" alt="ifort_inline.png" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Line #54 has no CPU time; it was counted on line #40, but the instruction address is 0x4030e2 -&amp;gt; 0x4030f8. IP + n?&lt;/P&gt;

&lt;P&gt;The 0.18s CPU time on the inlined function is another issue.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Mar 2016 04:02:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100975#M15932</guid>
      <dc:creator>Peter_W_Intel</dc:creator>
      <dc:date>2016-03-10T04:02:16Z</dc:date>
    </item>
    <item>
      <title>After investigating this</title>
      <link>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100976#M15933</link>
      <description>&lt;P style="word-wrap: break-word; font-size: 13.008px; line-height: 19.512px;"&gt;After the developer investigated this issue, it seems this is a limitation of building with "-O3", not something specific to VTune.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 13.008px; line-height: 19.512px;"&gt;During the optimization stage, the compiler can produce code that doesn’t exactly reflect the source. Note that there are 5 loops in the binary code, and the sub1 function has about 7 code ranges inserted at different places inside ‘test’ rather than a single contiguous code range.&lt;BR /&gt;
&lt;BR /&gt;
I think it is not a VTune issue; other tools (for example perf) will show the same.&lt;/P&gt;</description>
      <pubDate>Tue, 24 May 2016 06:15:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Unexpected-behaviour-while-vtuning-inlined-and-IPOed-code/m-p/1100976#M15933</guid>
      <dc:creator>Peter_W_Intel</dc:creator>
      <dc:date>2016-05-24T06:15:52Z</dc:date>
    </item>
  </channel>
</rss>

