Hi,
I have a dual-core board on which I'm running the VTune Performance Analyzer
for Linux to check the performance of my application "pthread_test".
I have written two small programs. The first has a main and
a thread. The thread only runs a while(1) loop. The main
also only runs a while(1) loop. The second program has a main
only, with no threads. This main also only runs a while(1) loop.
When I run the first program (main and a thread) and check it with "top"
in Linux, I see that the program takes 200% (each core represents
100%). When I run the second program (main only), I see that my
program takes 100% (one main runs on one core).
On the other hand, when I run the same two programs under the VTune Performance
Analyzer on Linux, I see that my first application (main and a thread)
takes 99.65% of clockticks, and the second program takes 97.67% of clockticks.
One thread can't take more than 100% of one core's processing time, just as
in top (200% --> 100% for each core). If the VTune Performance Analyzer
clockticks % numbers cover both cores, how can my program take ~99% of
both cores? Shouldn't I see the main taking ~50% of both cores and the
thread taking ~50% of both cores, for a total of 100% across both cores?
I would like to know how the Performance Analyzer works with dual cores.
How do the clockticks percentages in the output work on a dual-core
system? Are they for one core or for both cores?
Note: I have included those two programs below.
Thanks for your help,
Louay M.
Program One (main and thread):
#include <pthread.h>

/* Thread body: spin forever. */
void *threadwhile_1_Thread(void *arg)
{
    (void)arg;
    while (1) { }
}

int main(void)
{
    pthread_t while_1_Thread;
    int Thread_Id_11;

    /* pthread_create returns 0 on success (an error code, not a thread id). */
    Thread_Id_11 = pthread_create(&while_1_Thread, NULL,
                                  threadwhile_1_Thread, NULL);
    (void)Thread_Id_11;
    while (1) { }
    return 0;
}
Program Two (main only):
/* No headers are needed for this program. */
int main(void)
{
    while (1) { }
    return 0;
}
Do not confuse the results of the top utility (CPU load) with the results of VTune Event Based Sampling.
The top utility reports CPU load based on the ratio of busy to idle (halted) CPU clockticks over a time interval.
VTune reports the number of unhalted clockticks spent executing your application during collection. 100% clockticks attributed to your process would mean that the CPU executed only your application during the collection.
Example: consider a simple program that runs a while(1){} style loop for half a second, sleeps for the other half second, then runs the loop again, sleeps again, and so on. If the integration interval of top is 1 second, it will show 50% CPU load. VTune event based sampling will show close to 100% clockticks, provided that the application is the only active process in the system.
Let's get back to your example.
Case 1: A single-threaded simple application (a while(1){} style loop) executed on a dual-core system.
Event based sampling will report a bit less than 100% clockticks in total, with N clocktick events. If you break the results down by Processor0 and Processor1, you will see a bit less than 100% clockticks on each processor. Surprising? It makes sense once you check the clocktick event counts per processor: the counts N0 and N1 add up to N. Two conclusions can be drawn:
- The OS scheduled the application on both cores alternately.
- On each core, the application accounted for almost 100% of the UNHALTED clockticks. VTune does not count halted clockticks (unlike top).
Case 2: A two-threaded simple application (a while(1){} style loop in each thread) executed on a dual-core system.
Event based sampling will show the same picture for the percentages. However, the total number N of collected clockticks will be twice as large as in the first case, provided both applications ran for the same length of time.
I hope this answered your questions.