I'm currently working on my final year project and I would like some help on doing the abovementioned.
Any suggestions on how to get started on comparing the speedup between running a program on 4 cores vs running the same thing on a single core?
So far, I have tried collecting data from branch sampling and cache sampling. However, I can't find anything significant enough to explain the gap between the expected speedup (4x) and the actual one (a little over 2.8x).
Thanks,
TM
As far as I understand, you are struggling with the performance scalability of your application on a multicore system. In this case I would not recommend sticking to CPU microarchitecture analysis and profiling (collecting branching or cache events); that is not the right way to start tuning a multithreaded application. Your first goal should be to understand what prevents your program from scaling on 4 cores. There are many possible reasons, and the most common is excessive use of data shared between threads. For the sake of correctness you protect shared data with synchronization primitives (like critical sections or semaphores), and this serializes execution of the application. You have to understand the threading profile of the application and find the critical places where it executes serially or does not use all available cores. Then you have to think about improving the data model. Going this way, you will reach your goal faster.
There are at least two tools that can help you with such analysis. Intel Thread Profiler comes together with VTune Performance Analyzer. You might also be interested in a new tool, Intel Parallel Amplifier, which combines many capabilities of VTune and Thread Profiler in one tool. It's part of Intel Parallel Studio, which is in beta now, and you can sign up for the beta here: http://www.intel.com/go/parallel
Yeah.. I forgot to mention that Parallel Studio is C/C++ programming oriented. Thanks for noticing that.
WRT OpenMP, Thread Profiler works fine if the /Qopenmp switch is used during compilation.