Hi,
I am analyzing the difference between two designs that process millions of messages. One design uses polymorphism and the other doesn't; in the polymorphic design, each message is represented by a polymorphic subtype.
I have profiled both designs using VTune. The high-level summary data seems to make sense: the polymorphic design has a higher "branch mispredict" rate, a higher CPI, and a higher "ICache misses" rate than the non-polymorphic version, which is implemented with IF statements.
The polymorphic design has a line of source code like this:
object->virtualFunction();
and this is called millions of times (with the subtype changing each time). I expect the polymorphic design to be slower because of branch target mispredictions and instruction cache misses. As said above, the VTune "summary" tab seems to confirm this. However, when I look at the metrics next to that line of source code, there are no metrics at all except for:
- Filled pipeline slots total -> Retiring -> General retirement
- Filled pipeline slots self -> Retiring -> General retirement
- Unfilled pipeline slots total -> Front end bound -> Front end bandwidth -> Front end bandwidth MITE
- Unfilled pipeline slots self -> Front end bound -> Front end bandwidth -> Front end bandwidth MITE
None of the branch prediction columns have data, nor do the instruction cache miss columns.
Could somebody please comment on whether this seems sensible? To me it doesn't: how can there be no branch misprediction or instruction cache miss statistics for a line of polymorphic code where the branch target constantly changes from message to message?
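For anyone following along, the two designs being compared can be sketched roughly as below. This is only an illustration under assumptions: the message kinds (`Order`, `Cancel`), the class names, and the return values are hypothetical and not taken from the real code base.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical message kinds; the real ones are not shown in this thread.
enum class Kind : std::uint8_t { Order, Cancel };

// Design A: one polymorphic subtype per message kind.
struct Message {
    virtual ~Message() = default;
    virtual int process() const = 0;
};
struct OrderMsg : Message { int process() const override { return 1; } };
struct CancelMsg : Message { int process() const override { return 2; } };

// Design B: a single struct with a tag, dispatched with IF statements.
struct TaggedMessage { Kind kind; };

inline int processTagged(const TaggedMessage& m) {
    if (m.kind == Kind::Order) return 1;
    if (m.kind == Kind::Cancel) return 2;
    assert(false && "unknown message kind");
    return 0;
}

// Both loops sum the results so the optimizer cannot discard the calls.
inline long runPolymorphic(const std::vector<std::unique_ptr<Message>>& msgs) {
    long total = 0;
    for (const auto& m : msgs) total += m->process();      // indirect call through the vtable
    return total;
}

inline long runTagged(const std::vector<TaggedMessage>& msgs) {
    long total = 0;
    for (const auto& m : msgs) total += processTagged(m);  // conditional branches
    return total;
}
```

In design A the hot spot is a single indirect call whose target depends on the dynamic type; in design B it is a chain of conditional branches that the compiler can see through and inline.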
When optimization switches are enabled, functions that are called frequently may be inlined, so the number of branch instructions is reduced. You have two options to verify this:
1. Disable optimization switches when compiling.
2. With Intel C/C++ Composer 13.1 or later, build with something like "CFLAGS=-O2 -g -inline-debug-info" and turn on "inline enabled" in the VTune report.
If you see branch misprediction metric data in the Summary tab, which source lines contribute to it? See the bottom-up report for the metric per function/source line.
Regards, Peter
Hi Peter, thanks for replying.
The compiler wouldn't be able to inline the polymorphic function call because it doesn't know the exact subtype of the object (losing inlining is another cost of polymorphism). The exact code for the virtual method can only be known at run time via the vptr and the vtable, surely?
OK, but the line of code:
object->virtualFunction();
is translated into ASM. That ASM then has an instruction address. When the CPU attempts to execute this unconditional (indirect) branch, it will try to predict the branch target address. Whether the prediction is right or wrong, surely VTune should still be able to say that for the line object->virtualFunction() the CPU guessed the branch target correctly/incorrectly X% of the time?
I'd be very surprised if VTune cannot profile polymorphic method calls/unconditional branches!
Here you go:
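#include <iostream>
#ifdef _MSC_VER
#include <intrin.h>     // __rdtsc with MSVC
#else
#include <x86intrin.h>  // __rdtsc with GCC-compatible compilers
#endif

class Parent{
public:
    virtual ~Parent() {}   // objects are deleted through a Parent*
    virtual void f() = 0;
};
class Child1 : public Parent{
public:
    virtual void f(){
        std::cout << "Child1" << std::endl;
    }
};
class Child2 : public Parent{
public:
    virtual void f(){
        std::cout << "Child2" << std::endl;
    }
};
int main(){
    for(int i=0; i<100000; i++){
        Parent* p;
        // The low bit of the timestamp counter picks the subtype, so the
        // target of the virtual call below is hard to predict.
        if(__rdtsc() % 2 == 0){
            p = new Child1();
        }
        else{
            p = new Child2();
        }
        p->f();    // virtual call whose target changes between iterations
        delete p;  // free the object before the next iteration
    }
}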
>>>Ok but the line of code:
object->virtualFunction();
is translated in to ASM. This ASM will then have an instruction address>>>
Usually the object is addressed through the this pointer (on 32-bit Windows it is passed in the ecx register), and the object's vptr points to the vtbl table. If I am correct, the new operator implementation returns a pointer to the newly heap-allocated object. The vtbl table stores pointers to the class's virtual member functions.
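To make the vptr/vtbl description concrete, here is a hand-rolled sketch of roughly what a compiler tends to generate for a class with one virtual function. It is an illustration only: the names (`Object`, `VTable`, `callF`, `child1_f`, `child2_f`) are made up, and real ABIs differ in detail.

```cpp
#include <cassert>

// Hand-written stand-in for compiler-generated virtual dispatch.
// All names here are hypothetical and for illustration only.
struct Object;                       // forward declaration
using VirtualFn = int (*)(Object*);  // slot type: receives the "this" pointer

struct VTable { VirtualFn f; };      // one slot per virtual function

struct Object {
    const VTable* vptr;              // hidden pointer member in typical ABIs
};

inline int child1_f(Object*) { return 1; }
inline int child2_f(Object*) { return 2; }

// One table per concrete type, shared by all objects of that type.
static const VTable kChild1VTable = { child1_f };
static const VTable kChild2VTable = { child2_f };

// Equivalent of p->f(): load the vptr, load the slot, make an indirect call.
inline int callF(Object* p) {
    assert(p != nullptr && p->vptr != nullptr);
    return p->vptr->f(p);
}
```

The single indirect call inside `callF` is the instruction whose target the CPU has to predict; its address is the same on every iteration even though the target changes with the object's type.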
Yes, I don't disagree with you there (it's more or less what I was saying earlier). So I was expecting VTune to keep track of the branch target mispredictions for the instruction address corresponding to object->virtualFunction() and show them to me in the profiler?
The compiler cannot know at compile time the outcome of the branch, which depends on the __rdtsc() intrinsic call and the modulo operation. The virtual function that actually gets called is resolved at run time through the object's vptr and vtable (RTTI is not involved in the dispatch itself). The compiler will probably compare the remainder of the modulo operation, which is probably held in the edx register, with zero and insert a conditional jump between the two paths, each of which creates an object whose vptr points to its own class's vtbl table.
It would be nice to see the assembly code.
I understand that object->virtualFunction() will resolve to ASM (obviously). Are you saying that because it's a polymorphic call, resolved at run time, VTune cannot measure the branch mispredictions? That doesn't make sense, because VTune can handle normal conditional branch (IF statement) mispredictions at run time, so why can't polymorphic branch target mispredictions be displayed?
All mispredictions effectively happen at run time, because that is when the CPU makes its prediction and either gets it right or mispredicts. So I'm unsure why polymorphic branch target predictions would be beyond VTune's capability?
>>>So I was expecting vtune to keep track of the branch target mispredictions for the instruction address referring to object->virtualmethod() and show me in the profiler>>>
It seems strange that VTune cannot display data related to branch misprediction. I am not sure whether this is related to polymorphic code execution. The VTune driver, which runs in kernel mode, has no knowledge of code polymorphism. In a very simplified description, it only "reads" the values of the branch predicted/mispredicted counters and tracks the current instruction pointer so that functions can be resolved with the help of pdb files (this could be done by a different module).
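As a toy model of that description (and explicitly not VTune's actual implementation), attributing sampled instruction pointers to source lines could look like the sketch below. The `LineRange` table stands in for debug information, and all names and addresses are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy model only: this is NOT how VTune is implemented.
// Each entry maps a half-open address range [begin, end) to a source line.
struct LineRange {
    std::uint64_t begin;
    std::uint64_t end;
    int line;
};

// Credit every sampled instruction pointer to the source line whose address
// range contains it. Samples that match no range are counted under line 0.
inline std::map<int, int> attributeSamples(
        const std::vector<LineRange>& table,
        const std::vector<std::uint64_t>& sampledIps) {
    std::map<int, int> hits;
    for (std::uint64_t ip : sampledIps) {
        int line = 0;
        for (const LineRange& r : table) {
            assert(r.begin <= r.end);
            if (ip >= r.begin && ip < r.end) { line = r.line; break; }
        }
        ++hits[line];
    }
    return hits;
}
```

In this model a sample is credited to whichever line owns the recorded address, so an event count can show up on a line other than the one you expect if the recorded address belongs to a neighboring instruction.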
When I profiled the example code I posted above, I couldn't see any branch misprediction metrics for
p->f();
Are you able to put my code into your environment and see whether you can?
Hi,
You probably misunderstood me. My previous post #10 was about a possible implementation of the main() function at the machine-code level.
I am sure that the VTune kernel driver, which accesses the CPU branch prediction/misprediction counters, can track the type of branch in your code. It is probably some higher-level module of VTune that is responsible for parsing and analyzing the code being profiled.
I misunderstood :)
By looking at this question on branch target buffer predictions:
http://software.intel.com/en-us/forums/topic/392268
it looks like I should be seeing branch target misprediction metrics in the "branch mispredictions" column when you're in the source view for the line of the polymorphic function?
?
@ T C
Thanks for your test case. However, I cannot see any problem with the "branch misprediction" metric in the report.
I used Intel C/C++ Composer 13.0 SP1 to build ("icpc -g ploymorphic.cpp -o ploymorphic"), then used General Exploration analysis to profile.
In the result, I can see the misprediction metric in the main() function, which ran Child1::f() and f(). Please see the attached screenshots.
Regards, Peter