nikorasu
Beginner

Performance Analysis of multicores

Hi everyone,

I'm currently working on my final year project and I would like some help with performance analysis on multicore systems.

Any suggestions on how to get started on comparing the speedup between running a program on 4 cores vs running the same thing on a single core?

So far, I have tried collecting branch and cache sampling data. However, I can't find anything significant enough to explain the gap between the expected speedup (4x) and the actual one (2.8+).

Thanks,
TM
3 Replies
Vladimir_T_Intel
Moderator

Hi,

As far as I understand, you are struggling with the performance scalability of your application on a multicore system. In this case I would not recommend sticking to CPU microarchitecture analysis and profiling (collecting branch or cache events); this is not the right place to start when tuning a multithreaded application. Your first goal should be understanding what prevents your program from scaling on 4 cores. There are many possible reasons; the most common is excessive use of data shared between threads. For the sake of correctness you protect shared data with synchronization primitives (like critical sections or semaphores), and this serializes execution of the application. You have to understand the threading profile of the application and find the critical places where the program executes serially or does not use all available cores. Then you have to think about improving the data model. Going this way, you will achieve your goal faster.
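
As a rough sanity check before profiling, the measured speedup can be turned into an implied serial fraction. The sketch below assumes the simple Amdahl's law model (fixed workload, no parallel overhead), which is an assumption and not something measured in this thread. The function names are made up for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def estimated_serial_fraction(measured_speedup, cores):
    """Serial fraction implied by a measured speedup (Karp-Flatt metric)."""
    if cores < 2:
        raise ValueError("need at least 2 cores to estimate a serial fraction")
    return (1.0 / measured_speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    cores = 4
    measured = 2.8  # speedup figure quoted in the question
    serial = estimated_serial_fraction(measured, cores)
    print(f"implied serial fraction: {serial:.3f}")
    # Feeding the estimate back into Amdahl's law should reproduce the input.
    print(f"predicted speedup: {amdahl_speedup(1.0 - serial, cores):.2f}")
```

Under this model, a speedup of 2.8 on 4 cores corresponds to roughly 14% of the run behaving serially. The model lumps lock contention, load imbalance, and truly sequential code into that one number, so it only says how much to look for, not where it is.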

There are at least two tools that can help you do such analysis. The first is Intel Thread Profiler, which ships with VTune Performance Analyzer. You might also be interested in a new tool, Intel Parallel Amplifier, which combines many capabilities of VTune and Thread Profiler in one tool. It is part of Intel Parallel Studio, which is in beta phase now, and you can sign up for the beta here: http://www.intel.com/go/parallel
TimP
Black Belt

Intel OpenMP includes a link option openmp_profile, which is better suited than Thread Profiler for initial investigation with OpenMP. Parallel Studio doesn't support Fortran, in case that was your choice.
Vladimir_T_Intel
Moderator

Quoting - tim18
Intel OpenMP includes a link option openmp_profile, which is better suited than Thread Profiler for initial investigation with OpenMP. Parallel Studio doesn't support Fortran, in case that was your choice.

Yeah, I forgot to mention that Parallel Studio is oriented toward C/C++ programming. Thanks for noticing that.
With regard to OpenMP, Thread Profiler works fine if the /Qopenmp switch is used during compilation.