Piyush_J_Intel
Employee
72 Views

VTune Number of Cores Ambiguity

Hi, while running VTune Amplifier XE 2015 I am encountering a situation I don't understand. I am running a program on my i5-4300U under a 64-bit operating system. The program is not parallelized at all, so as I understand it, it should run on a single core. When I run the Basic Hotspots analysis, I can see that only one thread is spawned. But when I run the Advanced Hotspots analysis, go to the Bottom-up view, and choose "Core / H/W / Function / Call Stack" as the grouping, it shows both of my 2 cores being used. It is unclear to me how and why this happens, since it is the same program analyzed in two different ways. Also, when I parallelize the program for 2 cores using pthreads, it still shows 2 cores being used. Please explain what exactly is happening so that I can do a proper analysis. I have attached both results for your perusal.

7 Replies
David_A_Intel1
Employee

Unless you *pin* the thread to a core, the OS can schedule the thread to run on any core that it wants to.  You will normally see the thread bouncing back and forth between cores.
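To see the migration David describes without a profiler, a single thread can record which logical CPU it is running on while it works. The sketch below is Linux/glibc-specific (`sched_getcpu()`; on Windows, `GetCurrentProcessorNumber()` plays the same role), and `count_cpus_visited` is a hypothetical helper name, not part of any library:

```cpp
#include <sched.h>   // sched_getcpu (glibc; g++ defines _GNU_SOURCE, which it needs)
#include <bitset>
#include <cstddef>

// Hypothetical helper: spin for `iterations` steps on the calling thread and
// record every logical CPU it was observed on. Returns the number of distinct
// CPUs seen. Nothing here pins the thread, so the OS is free to move it.
std::size_t count_cpus_visited(long iterations)
{
    std::bitset<1024> seen;        // bit i set => thread was seen on CPU i
    volatile double sink = 0.0;    // keeps the busy loop from being optimized out

    for (long i = 0; i < iterations; ++i) {
        sink = sink + static_cast<double>(i) * 0.5;
        if (i % 100000 == 0) {     // sample periodically, including at i == 0
            int cpu = sched_getcpu();
            if (cpu >= 0 && cpu < 1024)
                seen.set(static_cast<std::size_t>(cpu));
        }
    }
    return seen.count();
}
```

On a multi-core machine a long enough run will often report more than one CPU for this single thread, which is the same effect VTune's per-core grouping shows; the exact count depends on the scheduler and system load.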

Piyush_J_Intel
Employee

Hi, thanks for the reply. Is there a way to do that in a Visual Studio C++ project? Also, wouldn't bouncing back and forth give worse performance because of the loss of cache and TLB locality? To the best of my knowledge, shouldn't this be avoided?

David_A_Intel1
Employee

Google turns up the SetThreadAffinityMask function, whose documentation states:

Setting an affinity mask for a process or thread can result in threads receiving less processor time, as the system is restricted from running the threads on certain processors. In most cases, it is better to let the system select an available processor.

WRT cache and TLB, I guess I'm unsure. Some benchmarks do pin the single thread, so maybe it matters. You could try it, measure performance, and see if it makes any difference. ;) Usually, though, it is more important to multi-thread your code than to worry about pinning a single-threaded app!
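As a minimal sketch of pinning the calling thread in a C++ project, the hypothetical helper below wraps `SetThreadAffinityMask` on Windows (a bitmask of allowed logical processors; it returns the previous mask, or 0 on failure) and, for comparison, the GNU extension `pthread_setaffinity_np` on Linux. It only handles one processor group on Windows (at most 64 logical processors), which is an assumption that fits a laptop part like the i5-4300U:

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <pthread.h>   // pthread_setaffinity_np (GNU extension; g++ enables it)
#include <sched.h>     // cpu_set_t, CPU_ZERO, CPU_SET
#endif

// Hypothetical helper: restrict the calling thread to one logical processor.
// Returns true on success, false if `cpu` is out of range or the call fails.
bool pin_current_thread(unsigned cpu)
{
#ifdef _WIN32
    if (cpu >= sizeof(DWORD_PTR) * 8)      // mask covers the current group only
        return false;
    DWORD_PTR mask = static_cast<DWORD_PTR>(1) << cpu;
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    if (cpu >= static_cast<unsigned>(CPU_SETSIZE))
        return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
#endif
}
```

Calling `pin_current_thread(0)` at the start of `main` would keep a single-threaded program on logical processor 0, which is one way to run the "does pinning matter?" measurement David suggests: time the same workload with and without the call.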

TimP
Black Belt

Piyush J. (Intel) wrote:

Hi, Thanks for the reply. Is there a way to achieve that in visual studio in C++ project. Also would not bouncing back and forth give a worse performance because of the loss of locality of cache and TLB? So to the best of my knowledge shouldn't this be avoided?

Yes, it's possible that migrating between cores will affect the correlation of the various events you have selected. On a single CPU running 1 thread with a shared last-level cache, the effect would not normally be large. If it's a long run, you could open Task Manager and set the process's affinity there. With 2 threads on a 2-core Hyper-Threaded CPU, you would want to pin them to separate physical cores; otherwise they could spend much of the time sharing a single core and, in my experience, that can reduce the turbo clock speedup even for the single-threaded segments of the application in between. I would either launch via a script (.bat) or, if running Intel OpenMP or /Qparallel, set OMP_PROC_BIND=spread in the VTune environment variables. That should also work when running OpenMP with 1 thread.
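To make "pin them to separate physical cores" concrete, the sketch below computes one affinity mask per thread so that each physical core gets one thread before any core gets a second. It is hypothetical and rests on a loudly stated assumption: logical processors 2k and 2k+1 are the two hyperthreads of physical core k. That numbering is common but not guaranteed, so real code should query the topology. It is similar in spirit to, not an implementation of, what OMP_PROC_BIND=spread asks an OpenMP runtime to do:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: one single-bit affinity mask per thread (bit n = logical
// CPU n), spreading threads across physical cores first.
// ASSUMPTION: logical CPUs core*threads_per_core .. core*threads_per_core +
// threads_per_core - 1 are the hyperthread siblings of physical core `core`.
std::vector<std::uint64_t> spread_masks(unsigned threads,
                                        unsigned physical_cores,
                                        unsigned threads_per_core = 2)
{
    std::vector<std::uint64_t> masks;
    if (physical_cores == 0 || threads_per_core == 0)
        return masks;

    for (unsigned t = 0; t < threads; ++t) {
        unsigned core = t % physical_cores;                          // round-robin over cores
        unsigned sibling = (t / physical_cores) % threads_per_core;  // later passes use next sibling
        unsigned logical = core * threads_per_core + sibling;
        if (logical >= 64)                                           // does not fit in 64 bits
            break;
        masks.push_back(std::uint64_t{1} << logical);
    }
    return masks;
}
```

For a 2-core, 4-thread part like the i5-4300U, `spread_masks(2, 2)` yields masks 0x1 and 0x4 (logical processors 0 and 2, one per physical core under the assumed numbering). On Windows, each thread could cast its mask to `DWORD_PTR` and pass it to `SetThreadAffinityMask`.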

Piyush_J_Intel
Employee

Hi Tim, thank you for the explanation. This is exactly what I want to achieve: when I run only one thread, it should be confined to one core, and when I run 2 threads, they should run separately on two different cores. I was thinking of disabling all cores except one via System Configuration -> Advanced Boot Options, to make sure that only one core is used when running a single thread. When running two threads, I would just pin them to separate cores using the SetThreadAffinityMask function that MrAnderson suggested above. Unfortunately I am using pthreads, and although pthreads has a pinning call similar to OpenMP's, I am unsure whether it would work on Windows, even though OpenMP calls pthreads in the background.

TimP
Black Belt

Intel OpenMP for Windows doesn't use pthreads (libgomp for gcc does). So I suppose you would use pthread setaffinity (pthread_setaffinity_np) to accomplish this, and it should work for either 1 or 2 threads.

It took me an embarrassingly long time to realize that these i5 CPUs are hyperthreaded even if there is no BIOS option to control it.
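The pthread setaffinity route for 1 or 2 threads can be sketched as follows. This assumes Linux/glibc, where `pthread_setaffinity_np` is a GNU extension; whether a particular Windows pthreads port offers the same call is not something this sketch establishes, so on Windows `SetThreadAffinityMask` remains the safer assumption. `run_pinned_workers` is a hypothetical name. It takes CPUs from the process's allowed set rather than hard-coding 0 and 1, and it does not know which of those CPUs are hyperthread siblings:

```cpp
#include <pthread.h>   // pthread_setaffinity_np, pthread_self
#include <sched.h>     // sched_getaffinity, sched_getcpu, cpu_set_t
#include <thread>
#include <vector>

// Hypothetical helper: start `n` workers, pin each to a different CPU from the
// process's allowed set, and return the CPU each worker observed itself on.
// Returns an empty vector if fewer than `n` CPUs are available.
std::vector<int> run_pinned_workers(unsigned n)
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return {};

    std::vector<int> cpus;                      // CPU assigned to each worker
    for (int c = 0; c < CPU_SETSIZE && cpus.size() < n; ++c)
        if (CPU_ISSET(c, &allowed))
            cpus.push_back(c);
    if (cpus.size() < n)
        return {};

    std::vector<int> observed(n, -1);           // each worker writes only its own slot
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([i, &cpus, &observed] {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpus[i], &one);
            // Pin this worker first; the real work would follow here.
            if (pthread_setaffinity_np(pthread_self(), sizeof(one), &one) == 0)
                observed[i] = sched_getcpu();   // expected to equal cpus[i]
        });
    }
    for (auto& w : workers)
        w.join();
    return observed;
}
```

With `n = 2`, the two workers report different CPUs, so a per-core view in a profiler should no longer show each thread spread across both cores.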

Piyush_J_Intel
Employee

Oh, OK. That should work then. Thanks a lot for all the information. Yes, but I'm not too worried about hyperthreading, as I can't think of a situation where it would hurt my performance results (if anything, it might improve them). I was more worried about one thread being bounced between different cores, because with that happening it was difficult for me to judge the exact number of cores that should be allocated to the program.
