Yes. The workload does not change, and there is no direct indicator to compare them, so I suggested using CL (concurrency level) and execution time.
Sometimes you can use Critical Path data to compare with the serial result. I assume that you have reassigned the work to different threads and start them at almost the same time stamp, so the work in each thread terminates at a different time:
T1 T2 T3 T4 T5
w1
w2
w3
So CP = T4; compare this with the serial result.
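As a rough illustration of this comparison, here is a minimal Python sketch. It assumes all threads start at the same time, so the critical path is simply the latest finish time. The finish times and the serial time are made-up numbers, not measurements, and `critical_path` and `speedup` are hypothetical helper names, not part of VTune.

```python
def critical_path(finish_times):
    """Latest finish time among work items that all start at time 0.

    Assumption: every thread starts at (almost) the same time stamp,
    so the critical path equals the longest-running thread.
    """
    return max(finish_times.values())


def speedup(serial_time, finish_times):
    """Ratio of the serial run time to the parallel critical path."""
    return serial_time / critical_path(finish_times)


# Hypothetical finish times in seconds for three work items.
finish = {"w1": 2.0, "w2": 3.0, "w3": 4.0}
serial = 9.0  # hypothetical serial execution time in seconds

cp = critical_path(finish)
print(f"critical path: {cp} s, speedup: {speedup(serial, finish)}x")
```

With these example numbers the serial run would take 9 s while the parallel run is bounded by the 4 s work item, which is why the critical path, not the sum of the per-thread times, is the value to set against the serial result.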
Thanks, Peter
VTune Amplifier XE can help identify the hotspots, and there are usually two kinds of improvement:
1. The workload of the hot function can be parallelized (as you have done), so that it makes better use of the multi-core system and, as a result, the execution time is reduced. You can use Concurrency Analysis to check whether the concurrency level improves. You are right: the total workload is not reduced but parallelized, so it is the program's execution time in the Summary report that goes down. You might review the bottom-up report with the grouping "Thread / Function / Call stack" to see the parallel workload in each thread. Check whether the threads are imbalanced and whether the algorithm needs adjusting again.
2. After completing the parallelization work, you can move on to microarchitecture-level tuning, such as branch misprediction, cache misses, etc. You adjust your code or use the Intel C++ compiler's advanced optimization options. As a result, the execution time of the hot functions themselves is reduced, which is quite different from parallelism optimization.
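To make the imbalance check in point 1 concrete, here is a small Python sketch. It assumes you have copied the per-thread CPU time of the hot function out of the bottom-up report by hand; the numbers below are invented, and `imbalance_ratio` is a hypothetical helper, not something VTune provides.

```python
def imbalance_ratio(cpu_time_per_thread):
    """Busiest thread's CPU time divided by the mean across threads.

    1.0 means perfectly balanced; larger values mean one thread
    carries more than its share of the parallel workload.
    """
    times = list(cpu_time_per_thread.values())
    mean = sum(times) / len(times)
    return max(times) / mean


# Hypothetical per-thread CPU time (seconds) for the hot function.
per_thread = {"T1": 4.0, "T2": 2.0, "T3": 3.0, "T4": 3.0}

ratio = imbalance_ratio(per_thread)
print(f"imbalance ratio: {ratio:.2f}")
```

A ratio well above 1.0 would suggest redistributing the work between threads before moving on to microarchitecture-level tuning.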
Regards, Peter