Hi all,
I am trying to work out why the same function has a higher CPI (cycles per instruction) when it runs in parallel than when it runs on a single thread, and I can't find the cause.
I use VTune as my performance analyzer. I checked for a caching problem by looking at the L1 and L2 data/instruction cache miss rates, but nothing stands out. Now I don't know what to look for.
Any ideas or suggestions would be greatly appreciated.
Thanks
Olivier
3 Replies
When the combined memory bus utilization of the threads approaches the bus's maximum capacity, CPI will rise in parallel runs even if there are no caching problems.
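To illustrate the effect described above, here is a minimal sketch of a toy model, not a description of any real processor. It assumes CPI splits into a compute part and a memory-stall part, and that the stall part stretches in proportion to how far total demand exceeds the bus peak. The function name, parameters, and all numbers are hypothetical.

```python
def estimated_cpi(cpi_compute, cpi_memory_single, demand_per_thread_gbs,
                  n_threads, peak_bus_gbs):
    """Toy model (assumption): memory-stall cycles stretch once the
    combined demand of all threads exceeds the peak bus bandwidth."""
    total_demand = demand_per_thread_gbs * n_threads
    # Below saturation nothing changes; above it, every thread waits
    # proportionally longer for the shared bus.
    stretch = max(1.0, total_demand / peak_bus_gbs)
    return cpi_compute + cpi_memory_single * stretch


if __name__ == "__main__":
    # Hypothetical workload: 4 GB/s per thread on a 10 GB/s bus.
    for n in (1, 2, 4, 8):
        print(n, round(estimated_cpi(1.0, 0.5, 4.0, n, 10.0), 2))
```

In this model the cache miss rate per thread is identical in every run; only the time each miss takes changes, which is consistent with seeing higher CPI without any change in L1/L2 miss rates.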
Thanks, that's what I suspected, but is there any way to confirm it? I don't know which VTune event to monitor, or which other tool to use, to be certain that this is the cause of the overhead.
Thanks again.
Olivier
Check the posts on bus utilization in the VTune forum, or do the correlation yourself: compare how each function's performance scales with the number of threads against its memory traffic rate.
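The correlation suggested above could be sketched roughly as follows. This assumes you have already exported, for one function, the total clockticks, instructions retired, bytes moved over the memory bus, and elapsed time for a single-threaded and a multi-threaded run, and that you know the platform's peak bandwidth (for example from a streaming benchmark). The `Run` type, the `compare` helper, the thresholds, and all numbers are hypothetical, not VTune output.

```python
from dataclasses import dataclass


@dataclass
class Run:
    threads: int
    clockticks: float     # summed over all threads
    instructions: float   # instructions retired, summed over all threads
    bus_bytes: float      # bytes transferred to/from memory
    seconds: float        # elapsed wall-clock time

    @property
    def cpi(self):
        return self.clockticks / self.instructions

    @property
    def bandwidth_gbs(self):
        return self.bus_bytes / self.seconds / 1e9


def compare(single, parallel, peak_bus_gbs, saturation=0.8, cpi_growth=1.1):
    """Flag a function as likely bandwidth-bound when its CPI grows in the
    parallel run while bus utilization is near the peak (thresholds are
    arbitrary assumptions)."""
    cpi_ratio = parallel.cpi / single.cpi
    utilization = parallel.bandwidth_gbs / peak_bus_gbs
    return {
        "cpi_ratio": cpi_ratio,
        "speedup": single.seconds / parallel.seconds,
        "bus_utilization": utilization,
        "likely_bandwidth_bound": cpi_ratio > cpi_growth and utilization >= saturation,
    }


if __name__ == "__main__":
    # Made-up measurements for one function.
    single = Run(threads=1, clockticks=3.0e9, instructions=2.0e9,
                 bus_bytes=4.0e9, seconds=1.0)
    parallel = Run(threads=4, clockticks=3.6e9, instructions=2.0e9,
                   bus_bytes=4.0e9, seconds=0.3)
    print(compare(single, parallel, peak_bus_gbs=15.0))
```

If the functions whose CPI grows the most are also the ones with the highest traffic, and total bandwidth sits close to the peak, bus saturation is a plausible explanation; if CPI grows while utilization stays low, something else is probably responsible.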
