Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Analyzing overhead of polymorphic code: no branching metrics available?

T_C
Beginner
9,927 Views

Hi,

I am analyzing the difference between two designs that process millions of messages. One design uses polymorphism and the other doesn't. In the polymorphic design, each message is represented by a polymorphic subtype.

I have profiled both designs using VTune. The high-level summary data seems to make sense: the polymorphic design has a higher "branch mispredict" rate, a higher CPI and a higher "ICache misses" rate than the non-polymorphic version implemented with IF statements.

The polymorphic design has a line of source code like this:

object->virtualFunction();

and this is called millions of times (where the subtype changes each time). I expect the polymorphic design to be slower because of branch target mispredictions and instruction cache misses. As said above, the VTune "summary" tab seems to confirm this. However, when I look at the metrics next to the line of source code, there are no metrics at all except for:

  • Filled pipeline slots total -> Retiring -> General retirement
  • Filled pipeline slots self -> Retiring -> General retirement
  • Unfilled pipeline slots total -> Front end bound -> Front end bandwidth -> Front end bandwidth MITE
  • Unfilled pipeline slots self -> Front end bound -> Front end bandwidth -> Front end bandwidth MITE

None of the branch prediction columns have data, nor do the instruction cache miss columns.

Could somebody please comment on whether this seems sensible? To me it doesn't: how can there be no branch misprediction or instruction cache miss statistics for a line of polymorphic code where the branch target will constantly be changing per message?
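
For readers following along, the two designs being compared might look roughly like the sketch below. The original post does not show its message types, so every name here (Kind, PlainMessage, Message, MessageA, MessageB, sumPlain, sumVirtual) is hypothetical and the handler bodies are placeholders.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical message kinds; the original post does not show its types.
enum class Kind { A, B };

// Design 1: a single concrete type, dispatched with an IF statement.
struct PlainMessage {
    Kind kind;
    std::int64_t payload;
};

inline std::int64_t handlePlain(const PlainMessage& m) {
    if (m.kind == Kind::A) {      // conditional branch (direction is predicted)
        return m.payload + 1;
    }
    return m.payload * 2;
}

// Design 2: one polymorphic subtype per message kind.
struct Message {
    virtual ~Message() = default;
    virtual std::int64_t handle() const = 0;
};

struct MessageA : Message {
    std::int64_t payload;
    explicit MessageA(std::int64_t p) : payload(p) {}
    std::int64_t handle() const override { return payload + 1; }
};

struct MessageB : Message {
    std::int64_t payload;
    explicit MessageB(std::int64_t p) : payload(p) {}
    std::int64_t handle() const override { return payload * 2; }
};

// Processes every message with the IF-based handler.
std::int64_t sumPlain(const std::vector<PlainMessage>& msgs) {
    std::int64_t sum = 0;
    for (const PlainMessage& m : msgs) sum += handlePlain(m);
    return sum;
}

// Processes every message through a virtual call.
std::int64_t sumVirtual(const std::vector<std::unique_ptr<Message>>& msgs) {
    std::int64_t sum = 0;
    for (const auto& m : msgs) {
        sum += m->handle();       // indirect call (target is predicted)
    }
    return sum;
}
```

Both functions compute the same result for equivalent inputs; what differs is the kind of branch the CPU has to predict in the hot loop.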

Peter_W_Intel
Employee
4,287 Views

When optimization switches are enabled, some frequently called functions may be inlined, which reduces the number of branch instructions. You have two options to verify this:

1. Disable the optimization switches when compiling.

2. If you use Intel C/C++ Composer 13.1 or above, build with something like "CFLAGS=-O2 -g -inline-debug-info" and turn on "inline enabled" in the VTune report.

If you see branch misprediction metric data in the Summary tab, which source lines contribute to it? Use the bottom-up report to see the metric per function and source line.

Regards, Peter  

T_C
Beginner
4,287 Views

Hi Peter, thanks for replying.

The compiler wouldn't be able to inline the polymorphic function call because it wouldn't know the exact subtype of the object (preventing inlining is another cost to Polymorphism). The exact code for the virtual method can only be known at run-time via the vptr and the vtable, surely?
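
A small sketch of the distinction being drawn here, using hypothetical types (Shape, Triangle, Square) that are not from the thread. When only the base type is visible, the call normally goes through the vptr and vtable. When the static type is a `final` class, the language permits the compiler to bind the call directly, which also makes inlining possible. Whether a given compiler actually does so depends on its optimizer and settings.

```cpp
// Hypothetical types for illustration; not taken from the original post.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

// `final`: no class can derive from Triangle, so no further override exists.
struct Triangle final : Shape {
    int sides() const override { return 3; }
};

struct Square : Shape {
    int sides() const override { return 4; }
};

// Dynamic type unknown at this call site: the call normally goes through
// the vptr/vtable and is an indirect branch.
int viaBase(const Shape& s) { return s.sides(); }

// Static type is a final class: the compiler is allowed to call
// Triangle::sides directly (and possibly inline it).
int viaFinal(const Triangle& t) { return t.sides(); }
```

In the message-processing case from the question, the subtype changes per message, so the call site resembles viaBase rather than viaFinal.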

Peter_W_Intel
Employee
4,287 Views

Hm, it could be an issue if the method's entry address is obtained at run time via the vptr from the table. I think VTune cannot know, before the program runs, exactly which function is called (it relies on debug info). -Peter

T_C
Beginner
4,287 Views

Ok but the line of code:

object->virtualFunction();

is translated into ASM. This ASM will then have an instruction address. When the CPU attempts to execute this unconditional branch it will try to predict the branch target address. Whether the prediction is right or wrong, surely VTune should still be able to say that, for the line object->virtualFunction(), the CPU guessed the branch target correctly/incorrectly X% of the time?

I'd be ever so surprised if VTune cannot profile polymorphic method calls/unconditional branches!

Peter_W_Intel
Employee
4,287 Views

Could you provide me with a simple test case of a polymorphic method? Otherwise I may try to construct one myself, but I have limited time today...

T_C
Beginner
4,287 Views

Here you go:

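#include <iostream>
#include <x86intrin.h>  // __rdtsc() with GCC/Clang/ICC; use <intrin.h> with MSVC

class Parent {
public:
    virtual ~Parent() {}   // needed so that delete through Parent* is well defined
    virtual void f() = 0;
};

class Child1 : public Parent {
public:
    virtual void f() {
        std::cout << "Child1" << std::endl;
    }
};

class Child2 : public Parent {
public:
    virtual void f() {
        std::cout << "Child2" << std::endl;
    }
};

int main() {
    Parent* p;

    for (int i = 0; i < 100000; i++) {
        // The low bit of the timestamp counter picks the subtype,
        // so the branch target of p->f() is hard to predict.
        if (__rdtsc() % 2 == 0) {
            p = new Child1();
        }
        else {
            p = new Child2();
        }
        p->f();    // virtual call resolved at run time via the vtable
        delete p;  // the original leaked every object
    }
}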

 

Bernard
Valued Contributor I
4,287 Views

>>>Ok but the line of code:

object->virtualFunction();

is translated in to ASM. This ASM will then have an instruction address>>>

Usually the object is addressed through the this pointer (on Windows it is passed in the ecx register), and the object in turn holds a pointer to the vtbl. If I am correct, the new operator returns a pointer to the newly heap-allocated object. The vtbl stores function pointers to the class's virtual member functions.
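
The mechanism described above can be imitated by hand. The sketch below is only an approximation of what a typical compiler generates, not the output of any particular compiler, and all names in it (Object, VTable, Method, child1_f, child2_f, callF) are made up for illustration.

```cpp
// Hand-written imitation of virtual dispatch; all names are hypothetical.
struct Object;                               // forward declaration
using Method = int (*)(const Object*);       // `this` is passed explicitly

// One slot per virtual function of the class.
struct VTable {
    Method f;
};

// In typical ABIs the vptr is a hidden member at the start of the object.
struct Object {
    const VTable* vptr;
};

// Stand-ins for Child1::f and Child2::f; they return an id instead of printing.
int child1_f(const Object*) { return 1; }
int child2_f(const Object*) { return 2; }

// One table per class, shared by all objects of that class.
const VTable child1_vtable = { &child1_f };
const VTable child2_vtable = { &child2_f };

// Rough equivalent of `object->virtualFunction()`:
// load the vptr, load the slot, then make an indirect call.
int callF(const Object* obj) {
    return obj->vptr->f(obj);
}
```

The call site in callF is a single instruction address whose target changes with the object, which is the situation the question is about.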

T_C
Beginner
4,287 Views

Yes, I don't disagree with you there (it's kind of what I was saying earlier). So I was expecting VTune to keep track of the branch target mispredictions for the instruction address referring to object->virtualmethod() and show them to me in the profiler?

Bernard
Valued Contributor I
4,287 Views

The compiler cannot know at compile time the outcome of the branch, because it depends on the rdtsc() intrinsic call and a modulo operation. Which virtual function is actually called is resolved at run time through the object's vptr and vtbl (RTTI is not needed for the call itself). The compiler will probably compare the remainder of the modulo operation (probably held in the edx register) with zero and insert a conditional jump to one of two branches, each of which creates an object whose vptr refers to a different vtbl.

It would be nice to see the assembly code.

T_C
Beginner
4,287 Views

I understand that object->virtualfunction() will resolve to ASM (obviously). Are you saying that because it's a polymorphic call, resolved at run time, VTune cannot measure the branch mispredictions? That doesn't make sense, because VTune can handle normal conditional branch (IF statement) mispredictions at run time, so why can't polymorphic branch target mispredictions be displayed?

All mispredictions effectively happen at run time, because that is when the CPU makes a prediction and either gets it right or mispredicts. So I'm unsure why polymorphic branch target predictions would be beyond VTune's capability?

Bernard
Valued Contributor I
4,287 Views

>>>So I was expecting vtune to keep track of the branch target mispredictions for the instruction address referring to object->virtualmethod() and show me in the profiler>>>

It seems strange that VTune cannot display data related to branch misprediction. I am not sure whether this is related to polymorphic code execution. The VTune driver, which runs in kernel mode, has no knowledge of code polymorphism. Put very simply, it only "reads" the values of the branch predicted/mispredicted counters and tracks the current instruction pointer so that functions can be resolved with the help of pdb files (this could be done by a different module).

T_C
Beginner
4,287 Views

When I profiled the example code I posted above, I couldn't see any branch misprediction metrics for

p->f();

Are you able to put my code in to your environment and see whether you can?

Bernard
Valued Contributor I
4,287 Views

T C wrote:

>>>Are you saying because its a polymorphic call, resolved at run-time VTune cannot measure the branch mispredictions?>>>

Hi

You probably misunderstood me. My previous post #10 was about a possible implementation of the main() function at machine code level.

I am sure that the VTune kernel driver, which accesses the CPU branch prediction/misprediction counters, can track the type of branch in your code. It is probably some higher-level module of VTune that is responsible for parsing and analyzing the code being profiled.

Bernard
Valued Contributor I
4,287 Views

<<<Are you able to put my code in to your environment and see whether you can?>>>

Yes I will test your code today and provide the result.

 

T_C
Beginner
4,287 Views

I misunderstood :)

By looking at this question on branch target buffer predictions:

http://software.intel.com/en-us/forums/topic/392268

it looks like I should be seeing branch target misprediction metrics in the "branch mispredictions" column of the source view, on the line of the polymorphic function call?

Bernard
Valued Contributor I
4,287 Views

Let's wait for a response from the Intel engineers.

Bernard
Valued Contributor I
4,287 Views

Hi

I ran a VTune General Exploration analysis on your code and can confirm your results. That means no branch misprediction data was collected for the polymorphic code.

T_C
Beginner
4,287 Views

Intel, please tell me there is a solution/fix for measuring branch target predictions for polymorphic methods?

Peter_W_Intel
Employee
4,287 Views

@ T C

Thanks for your test case. However, I cannot see any problem with the "branch misprediction" metric in the report.

I used Intel C/C++ Composer 13.0 SP1 to build ("icpc -g ploymorphic.cpp -o ploymorphic"), then used General Exploration analysis to profile.

In the result, I can see the misprediction metric in the main() function, which ran Child1::f() and f(). Please see the attached screenshots.

Regards, Peter

Peter_W_Intel
Employee
4,257 Views

I attached the asm report from VTune Amplifier XE. You can see the statement "p = new Child1();", which called Child1->f(), followed by a jump to the statement "p->f()" (<Block 12> in the assembly code), which caused the high branch misprediction count.
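
Since the report above attributes the mispredictions to main() as a whole, one possible follow-up is a variant of the test case that keeps allocation, the IF statement and console output out of the measured loop, so that the virtual call is the only hard-to-predict branch left in it. This is a sketch under assumptions: makeObjects and callAll are hypothetical helpers, the seeded std::mt19937 replaces __rdtsc() to make runs repeatable, and it has not been verified how VTune attributes events for this code.

```cpp
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

struct Parent {
    virtual ~Parent() = default;
    virtual int f() const = 0;
};

// f() returns a value instead of printing, so I/O does not dominate the profile.
struct Child1 : Parent { int f() const override { return 1; } };
struct Child2 : Parent { int f() const override { return 2; } };

// Setup phase: allocate all objects up front in a pseudo-random type order.
// The seed is an arbitrary choice that makes runs repeatable.
std::vector<std::unique_ptr<Parent>> makeObjects(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::unique_ptr<Parent>> objs;
    objs.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        if (rng() % 2 == 0) {
            objs.push_back(std::unique_ptr<Parent>(new Child1()));
        } else {
            objs.push_back(std::unique_ptr<Parent>(new Child2()));
        }
    }
    return objs;
}

// Measured phase: the loop body contains only the virtual call.
long callAll(const std::vector<std::unique_ptr<Parent>>& objs) {
    long sum = 0;
    for (const auto& p : objs) {
        sum += p->f();   // indirect call whose target varies per element
    }
    return sum;
}
```

A driver would call makeObjects once and then callAll many times, and the profile of callAll could then be compared against a run where the objects are created in a fixed, predictable type order.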
