Analyzers
Support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

Compare two FIFO implementations

softarts
See the single-producer/single-consumer FIFO code below. code1 seems more compact and has only one if-else branch.

Time: the former (code1) costs about 160 ns per insert on average (measured with clock_gettime); the latter (code2) costs about 20 ns per insert, which surprised me!
CPI: the former is about 30, the latter is about 8.

I haven't checked other events.

The reason might be that code1 contains an expensive "div" instruction, but how can I prove it?


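code1:

bool insert( const T &rtItem )
{
    uint32_t ww = (uint32_t) atomic_read( &w );
    uint32_t rr = (uint32_t) atomic_read( &r );

    // w and r are free-running counters; the queue is full when they differ by size
    if (predict_true(ww - rr != size))
    {
        // '%' with a runtime size is where a div instruction would typically come from
        f[ ww % size ] = rtItem;
        atomic_inc( &w );
        return true;
    }
    return false;
}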

/////////////////////
code2:

bool insert( const T &rtItem )
{
    uint32_t nw;
    bool tRet = false;
    uint32_t ww = (uint32_t) atomic_read( &w );
    uint32_t rr = (uint32_t) atomic_read( &r );

    // next write index, wrapping to 0 at the end of the buffer (no modulo)
    if ( predict_true( ( ww + 1 ) < size ) )
        nw = ww + 1;
    else
        nw = 0;

    // the queue is full when the next write index would reach the read index
    if ( predict_true( nw != rr ) )
    {
        f[ ww ] = rtItem;
        atomic_set( &w, nw );
        tRet = true;
    }
    return tRet;
}
Vladimir_T_Intel
Quoting - softarts
The reason might be that code1 contains an expensive "div" instruction, but how can I prove it?

Look at the distribution of the cycle count event over the code (the disassembly view is the most useful for this). It shows which instruction, or group of instructions, costs the most clock ticks.
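As a complement to the disassembly view, one way to test the "div" hypothesis is to time the two index schemes on their own, with the atomics and the full-queue check removed. The following is only a minimal sketch under stated assumptions: ModCursor, WrapCursor and ns_per_store are hypothetical names that do not appear in the original code, and the numbers it produces depend on the compiler, optimization level and CPU, so they are a hint rather than proof.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical cursor mimicking code1: free-running counter, slot found with '%'.
struct ModCursor {
    uint32_t w = 0;
    uint32_t slot(uint32_t size) const { assert(size > 0); return w % size; }
    void advance(uint32_t /*size*/) { ++w; }
};

// Hypothetical cursor mimicking code2: the index itself wraps, no modulo.
struct WrapCursor {
    uint32_t w = 0;
    uint32_t slot(uint32_t size) const { assert(w < size); return w; }
    void advance(uint32_t size) { w = (w + 1 < size) ? w + 1 : 0; }
};

// Average nanoseconds per store for a given cursor type.
template <typename Cursor>
double ns_per_store(uint32_t size, uint32_t iterations) {
    assert(size > 0 && iterations > 0);
    std::vector<uint32_t> buf(size);

    // Read the size through a volatile so the compiler cannot treat it as a
    // compile-time constant and replace the divide with cheaper arithmetic.
    volatile uint32_t runtime_size = size;
    const uint32_t n = runtime_size;

    Cursor c;
    const auto t0 = std::chrono::steady_clock::now();
    for (uint32_t i = 0; i < iterations; ++i) {
        buf[c.slot(n)] = i;
        c.advance(n);
    }
    const auto t1 = std::chrono::steady_clock::now();

    // Consume the buffer so the stores in the timed loop stay observable.
    uint32_t checksum = 0;
    for (uint32_t v : buf) checksum += v;
    volatile uint32_t sink = checksum;
    (void)sink;

    return std::chrono::duration<double, std::nano>(t1 - t0).count() / iterations;
}
```

Calling ns_per_store<ModCursor>(size, iterations) and ns_per_store<WrapCursor>(size, iterations) with the same arguments gives two figures to compare. If the modulo version is clearly slower here as well, that supports the "div" explanation, and the disassembly view can then confirm whether a div instruction is actually emitted for ww % size and how many clock ticks land on it.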