
Multi-core issue with vtune profiler

mllouay

Hi,

I have a dual-core board on which I am running the VTune Performance Analyzer for Linux to check the performance of my application "pthread_test".

I have written two small test programs. The first has a main and one thread; each of them runs nothing but a while(1) loop. The second has only a main, with no threads, and it also runs nothing but a while(1) loop.

When I run the first program (main and a thread) and watch it with "top" on Linux, the program takes 200% (each core counts as 100%). When I run the second program (main only), it takes 100% (one main running on one core).

On the other hand, when I run the same two programs under the VTune Performance Analyzer on Linux, the first application (main and a thread) takes 99.65% of clockticks and the second takes 97.67% of clockticks.

One thread can't take more than 100% of one core's processing, just as in top (200% --> 100% for each core). If the VTune Performance Analyzer clockticks % numbers cover both cores, how come my program takes ~99% of both cores? Shouldn't I see the main taking ~50% of both cores and the thread taking ~50% of both cores, for a total of 100% for both cores?

I would like to know how the Performance Analyzer works with dual cores.
clockticks percentages

How do the output percentages (clockticks) work when using dual cores? Are they for one core or for both cores?

Note: I have included those two programs below.

Thanks for your help,

Louay M.


Program One ( main and thread ):

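#include <pthread.h>
#include <stddef.h>

/* Thread body: spin forever so that the thread keeps one core busy. */
void * threadwhile_1_Thread ( void * arg )
{
    (void) arg;
    while( 1 ) { }
    return NULL;
}

int main ( void )
{
    pthread_t while_1_Thread;
    int Thread_Id_11;

    /* pthread_create returns 0 on success (an error code, not a thread id). */
    Thread_Id_11 = pthread_create ( &while_1_Thread, NULL, threadwhile_1_Thread, NULL );
    if ( Thread_Id_11 != 0 )
        return 1;

    /* The main thread spins as well, keeping the second core busy. */
    while( 1 )
    { }

    return 0;
}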


Program Two ( main only):

/* No headers are needed: this program makes no library calls. */

int main ( void )
{
while( 1 )
{ }

return 0;
}

Vladimir_T_Intel

Do not confuse the results of the top utility (CPU load) with the results of VTune Event Based Sampling.

The top utility reports CPU workload, calculated as the ratio of busy to idle (halted) CPU Clockticks over a time interval.

VTune reports the number of un-halted Clockticks that it took to execute your application during collection. 100% Clockticks against your process would mean that, during the collection, the CPU executed only your application.

Example. Consider a simple program that executes a while(1){} style loop for half a second, sleeps for the other half of the second, then executes the loop again, sleeps again, and so on. If the integration interval of the top utility is 1 sec, it will show 50% CPU workload. VTune event based sampling will show a result close to 100% Clockticks, provided that the application is the only active process in the system.
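To put numbers on the example above, here is a minimal sketch in C of the two ways of counting. The helper names and the tick counts are my own assumptions for illustration; they are not part of top or VTune, and the real tools gather their data differently.

```c
/* Hypothetical helper: CPU load in the style of top, i.e. busy
   clockticks as a share of all clockticks (busy + halted) in the
   measurement interval. */
double top_load_percent(double busy_ticks, double halted_ticks)
{
    double total = busy_ticks + halted_ticks;
    return total > 0.0 ? 100.0 * busy_ticks / total : 0.0;
}

/* Hypothetical helper: share of one process in the unhalted clockticks
   seen by event based sampling. Halted clockticks never enter the sum,
   so sleeping time does not lower the percentage. */
double ebs_clockticks_percent(double process_unhalted, double other_unhalted)
{
    double total = process_unhalted + other_unhalted;
    return total > 0.0 ? 100.0 * process_unhalted / total : 0.0;
}
```

Assuming a 1 GHz core observed for one second, with 0.5e9 clockticks spent in the loop and 0.5e9 halted, top_load_percent(0.5e9, 0.5e9) gives 50. If other processes used, say, 0.005e9 unhalted clockticks in the same second, ebs_clockticks_percent(0.5e9, 0.005e9) gives roughly 99.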

Let's get back to your example.

Case 1: Single-threaded simple application (a while(1){} style loop inside) executed on a dual core.

As a result of event based sampling you will get a bit less than 100% Clockticks in total, and N Clocktick events. If you break the results down by Processor0 and Processor1, you will see a bit less than 100% Clockticks against each of the two processors. Surprising? It becomes clear when you check the Clockticks event counts per processor: the sum of N0 and N1 will be equal to N. Two conclusions can be drawn here:

- The OS scheduled the application on both cores, alternating between them.

- On both cores, the application took almost 100% of the UNHALTED Clockticks. VTune does not count halted Clockticks (unlike top).
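The per-processor breakdown described above can be sketched as follows. The struct, its field names, and the sample counts are hypothetical and only illustrate the arithmetic; they do not reflect VTune's data format.

```c
#include <stddef.h>

/* Hypothetical per-core record of unhalted clocktick events. */
struct core_samples {
    unsigned long long app_unhalted;   /* events attributed to the application */
    unsigned long long other_unhalted; /* events attributed to everything else */
};

/* Share of the application in the unhalted clockticks of one core. */
double core_percent(const struct core_samples *c)
{
    unsigned long long total = c->app_unhalted + c->other_unhalted;
    return total ? 100.0 * (double)c->app_unhalted / (double)total : 0.0;
}

/* Total application events: N = N0 + N1 + ... */
unsigned long long app_total(const struct core_samples *cores, size_t n)
{
    unsigned long long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += cores[i].app_unhalted;
    return sum;
}

/* Share of the application in the unhalted clockticks of all cores. */
double overall_percent(const struct core_samples *cores, size_t n)
{
    unsigned long long app = 0, total = 0;
    for (size_t i = 0; i < n; ++i) {
        app += cores[i].app_unhalted;
        total += cores[i].app_unhalted + cores[i].other_unhalted;
    }
    return total ? 100.0 * (double)app / (double)total : 0.0;
}
```

With made-up counts of {600, 6} on Processor0 and {400, 4} on Processor1, each core shows about 99% and so does the total, while N = 600 + 400 = 1000. The percentages say nothing about how the work was split between the cores; only the event counts N0 and N1 do.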

Case 2: Two-threaded simple application (a while(1){} style loop inside each thread) executed on a dual core.

As a result of event based sampling you will see the same picture regarding the percentages. However, the total number N of collected Clockticks will be twice as large as in the first case, provided that the running time of the two applications was the same.
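A rough way to see why N doubles is the estimate below. It is a sketch under simplifying assumptions of mine: a fixed core frequency, every busy thread spinning for the whole run, and no competition from other processes. The function name is hypothetical.

```c
/* Hypothetical estimate of the unhalted clockticks an application
   accumulates when each busy thread keeps one core spinning for the
   whole run. At most one thread per core can run at any moment. */
unsigned long long expected_app_clockticks(unsigned long long core_hz,
                                           unsigned long long seconds,
                                           unsigned busy_threads,
                                           unsigned cores)
{
    unsigned running = busy_threads < cores ? busy_threads : cores;
    return core_hz * seconds * running;
}
```

For a 10 second run on a dual core at an assumed 1 GHz, one spinning thread yields 1e10 clockticks and two spinning threads yield 2e10, while the application's share of unhalted clockticks stays close to 100% in both cases.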

I hope this answered your questions.
