The application captures packets from the NIC (it uses the PACKET_MMAP mechanism and copies data from kernel memory, so there are few system calls).
Even when I slow the network traffic to 2-3 Mbps, the application still performs very badly under VTune and captures only 200 packets/second.
Without VTune it can capture 800 Mbps+ of traffic (200K packets/s).
Why is performance impacted so much?
The code looks like this:
while (1)
{
    buffer = new Buffer();                 // one allocation per packet
    memcpy(buffer, kernel_buffer, size);   // copy the packet out of the mmap'ed kernel ring
    ...
}
One more question:
I also found that the pthread_spin_lock time is very high. Does VTune run 'real' threads on a multi-core PC,
or is it just 'fake' multithreading, with software simulating the threads?
My VTune is version 9.1 for Linux, downloaded on 2008/11/28.
6 Replies
Quoting - softarts
The application captures packets from the NIC (it uses the PACKET_MMAP mechanism and copies data from kernel memory, so there are few system calls).
Even when I slow the network traffic to 2-3 Mbps, the application still performs very badly under VTune and captures only 200 packets/second.
Without VTune it can capture 800 Mbps+ of traffic (200K packets/s).
Why is performance impacted so much?
The code looks like this:
while (1)
{
    buffer = new Buffer();
    memcpy(buffer, kernel_buffer, size);
    ...
}
Quoting - Vladimir Tsymbal (Intel)
This is just my best guess. Depending on the SAV set in VTune and how often data is copied from user space to kernel space in your app, there could be mutual interference between the VTune sampling driver's interrupt handler and the application. Try increasing the SAV for the events, and keep the number of events being collected small.
It's not sampling, it's call graph.
Would too much profiling work in the kernel cause this?
VTune call graph certainly may be expected to kill performance of an application with real time aspects. Yes, it's single threaded. gprof could perform call graphing with much less distortion of performance.
Quoting - softarts
it's not sample,but call graph
It's better to specify the type of analysis in the original question. It would help us make fewer guesses.
For this program, VTune call graph also gave a different result.
The purpose is to compare 'if-else' with a 'function pointer array'
and measure the impact of branch prediction.
The program output:
test caseA
branchA time=2510484704
test caseB
branchB time=1934092704
But VTune shows a different result:
caseB spent much more time than caseA (about 5:1).
Why does call graph show such a difference?
----code----------------------
inline void HandleFuncA0(int y)
{ count = y + 3; }
inline void HandleFuncA1(int y)
{ count = y + 7; }
inline void HandleFuncA2(int y)
{ count = y - 2; }
...
inline void BranchA2(int x)
{
    if (x > RANGE*9/10)
        HandleFuncA1(x);
    else if (x > RANGE*8/10)
        HandleFuncA2(x);
    ...
}
inline void BranchB(int x)
{
    array    // dispatch through the function pointer array; the index expression is missing from the post
}
void test_caseA()
{
    printf("test caseA\n");
    timespec l_startTime, l_endTime, l_interval;
    int ret;
    long long i_period;
    clock_gettime(CLOCK_REALTIME, &l_startTime);
    for (int i = 0; i < ...; i++) {    // loop bound is missing from the post
        BranchA2(inp[i%3000]); // inp[] has been initialized with rand()
    }
    clock_gettime(CLOCK_REALTIME, &l_endTime);
    ret = delta_t(&l_interval, &l_startTime, &l_endTime);
    i_period = l_interval.tv_sec*1000000000LL + l_interval.tv_nsec;  // LL avoids overflow where long is 32 bits
    printf("branchA time=%lld\n", i_period);  // %lld matches long long
}
void test_case4B()
{
    ... // same as test_caseA
    for (xxx)
        BranchB(inp[i%3000]); // inp[] has been initialized with rand()
    ...
}
Quoting - softarts
For this program, VTune call graph also gave a different result.
The purpose is to compare 'if-else' with a 'function pointer array'
and measure the impact of branch prediction.
The program output:
test caseA
branchA time=2510484704
test caseB
branchB time=1934092704
But VTune shows a different result:
caseB spent much more time than caseA (about 5:1).
Why does call graph show such a difference?
