
Comparing two FIFO implementations

softarts
Beginner
See the two single-producer/single-consumer FIFO implementations below. code1 looks more compact and has only one if-else branch.

Time: code1 costs about 160 ns per insert on average (measured with clock_gettime), while code2 costs about 20 ns per insert, which surprised me.
CPI: about 30 for code1, about 8 for code2.

I haven't checked any other events.

The reason might be that code1 contains an expensive "div" instruction, but how can I prove it?


code1:

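bool insert( const T &rtItem )
{
    // w and r are free-running counters; only their difference and remainder are used
    uint32_t ww = (uint32_t) atomic_read( &w );
    uint32_t rr = (uint32_t) atomic_read( &r );

    // full when the writer is exactly size items ahead of the reader
    if (predict_true(ww - rr != size))
    {
        f[ ww % size ] = rtItem;   // the modulo compiles to a div when size is not a compile-time constant
        atomic_inc( &w );
        return true;
    }
    return false;
}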

/////////////////////
code2:
bool insert( const T &rtItem )
{
    uint32_t nw;
    bool tRet = false;
    uint32_t ww = (uint32_t) atomic_read( &w );
    uint32_t rr = (uint32_t) atomic_read( &r );

    // advance the write index, wrapping to 0 at the end of the buffer
    if ( predict_true( ( ww + 1 ) < size ) )
        nw = ww + 1;
    else
        nw = 0;

    // full when the next write index would catch up with the read index
    if ( predict_true( nw != rr ) )
    {
        f[ ww ] = rtItem;
        atomic_set( &w, nw );
        tRet = true;
    }
    return tRet;
}
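
For anyone who wants to compile the two variants side by side, here is a minimal sketch of both. It is not the original code: std::atomic stands in for atomic_read/atomic_inc/atomic_set, predict_true is dropped, and the class names ModuloFifo and WrapFifo as well as the remove() methods are hypothetical additions, since the post only shows insert().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Variant 1 (shape of code1): free-running counters, slot chosen with a modulo.
template <typename T>
class ModuloFifo {
public:
    explicit ModuloFifo(uint32_t size) : size_(size), f_(size), w_(0), r_(0) {
        assert(size > 0);
    }

    bool insert(const T& item) {
        uint32_t ww = w_.load(std::memory_order_relaxed);  // only the producer writes w_
        uint32_t rr = r_.load(std::memory_order_acquire);
        if (ww - rr == size_) return false;                // full: size_ items stored
        f_[ww % size_] = item;                             // integer division on a runtime size
        w_.store(ww + 1, std::memory_order_release);
        return true;
    }

    // Hypothetical consumer side, not shown in the original post.
    bool remove(T& out) {
        uint32_t rr = r_.load(std::memory_order_relaxed);  // only the consumer writes r_
        uint32_t ww = w_.load(std::memory_order_acquire);
        if (ww == rr) return false;                        // empty
        out = f_[rr % size_];
        r_.store(rr + 1, std::memory_order_release);
        return true;
    }

private:
    const uint32_t size_;
    std::vector<T> f_;
    std::atomic<uint32_t> w_;
    std::atomic<uint32_t> r_;
};

// Variant 2 (shape of code2): indices stay in [0, size) and wrap with a compare.
template <typename T>
class WrapFifo {
public:
    explicit WrapFifo(uint32_t size) : size_(size), f_(size), w_(0), r_(0) {
        assert(size > 1);
    }

    bool insert(const T& item) {
        uint32_t ww = w_.load(std::memory_order_relaxed);
        uint32_t rr = r_.load(std::memory_order_acquire);
        uint32_t nw = (ww + 1 < size_) ? ww + 1 : 0;       // wrap without dividing
        if (nw == rr) return false;                        // full: one slot always stays empty
        f_[ww] = item;
        w_.store(nw, std::memory_order_release);
        return true;
    }

    // Hypothetical consumer side, not shown in the original post.
    bool remove(T& out) {
        uint32_t rr = r_.load(std::memory_order_relaxed);
        uint32_t ww = w_.load(std::memory_order_acquire);
        if (rr == ww) return false;                        // empty
        out = f_[rr];
        r_.store((rr + 1 < size_) ? rr + 1 : 0, std::memory_order_release);
        return true;
    }

private:
    const uint32_t size_;
    std::vector<T> f_;
    std::atomic<uint32_t> w_;
    std::atomic<uint32_t> r_;
};
```

Note that the two are not exact drop-in replacements for each other: with the same size, the modulo variant holds size items, while the wrap variant holds size - 1 because it keeps one slot empty to tell "full" from "empty". Also, with free-running 32-bit counters the modulo variant only keeps a continuous slot sequence across counter overflow when size is a power of two.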
Vladimir_T_Intel
Moderator
Quoting - softarts
The reason might be that code1 contains an expensive "div" instruction, but how can I prove it?

Look at the distribution of the cycle-count event over the code (the disassembly view is the most useful for this). It shows which instruction or group of instructions costs the most clock ticks.
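
As a complement to the profiler view, one could also isolate the index arithmetic in a small experiment. The sketch below is only an illustration under assumptions: the function names (walk_modulo, walk_wrap, walk_mask, ns_per_step) are hypothetical, the mask variant is a swapped-in technique that requires a power-of-two size, and the timings depend on the compiler and CPU. If the modulo walk is clearly slower than the other two with a size the compiler cannot see at compile time, that is consistent with the div being the expensive part.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Slot selection as in code1: a free-running counter reduced with a modulo.
inline uint64_t walk_modulo(uint32_t size, uint32_t steps) {
    uint64_t sum = 0;
    for (uint32_t ww = 0; ww < steps; ++ww)
        sum += ww % size;                      // div when size is a runtime value
    return sum;
}

// Slot selection as in code2: the index wraps with a compare.
inline uint64_t walk_wrap(uint32_t size, uint32_t steps) {
    uint64_t sum = 0;
    uint32_t ww = 0;
    for (uint32_t i = 0; i < steps; ++i) {
        sum += ww;
        ww = (ww + 1 < size) ? ww + 1 : 0;     // no division
    }
    return sum;
}

// Alternative that keeps code1's shape: power-of-two size and a bit mask.
inline uint64_t walk_mask(uint32_t size, uint32_t steps) {
    assert(size != 0 && (size & (size - 1)) == 0);  // must be a power of two
    uint64_t sum = 0;
    for (uint32_t ww = 0; ww < steps; ++ww)
        sum += ww & (size - 1);
    return sum;
}

// Times one walk and returns nanoseconds per step. The checksum is handed
// back so the caller can use it and the loop is not optimized away.
template <typename F>
double ns_per_step(F walk, uint32_t size, uint32_t steps, uint64_t& checksum) {
    assert(steps > 0);
    auto t0 = std::chrono::steady_clock::now();
    checksum = walk(size, steps);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / steps;
}
```

To use it, call ns_per_step for each walk with the same size and a large step count, taking size from a runtime source such as a command-line argument so the compiler cannot replace the modulo with a multiply or a mask. All three walks visit the same slot sequence for a power-of-two size, so their checksums should match.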