Store Type | Time to transfer, 8MB (us) | Time to transfer, 2GB (us) | BW, 8MB (GB/s) | BW, 2GB (GB/s) |
Regular store | 62 | 28912 | 129.0322581 | 69.17542889 |
Vector store | 147 | 48228 | 54.42176871 | 41.46968566 |
Vector NT store | 105 | 33625 | 76.19047619 | 59.4795539 |
It looks like vector stores, including non-temporal (NT) stores, are slower and have lower throughput than the regular store. This result is difficult to explain, since at least the vector NT store instructions should ideally save bandwidth and deliver high throughput when the message size is substantially larger than the cache. Is there any reason for this behavior? I would appreciate your feedback on this.
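As a sanity check on the table, the bandwidth columns can be recomputed from the transfer times. The sketch below assumes decimal units (8MB = 8e6 bytes, 2GB = 2e9 bytes), which is what reproduces the posted figures; the function and variable names are hypothetical and not part of the original benchmark.

```python
def bandwidth_gb_per_s(num_bytes, elapsed_us):
    """Bandwidth in GB/s (1 GB = 1e9 bytes) for a transfer timed in microseconds."""
    # bytes per microsecond equals MB/s; dividing by 1e3 gives GB/s
    return num_bytes / elapsed_us / 1e3


# Assumption: decimal sizes, chosen because they match the posted BW columns.
SIZES_BYTES = {"8MB": 8e6, "2GB": 2e9}

# Transfer times in microseconds, copied from the table above.
TIMES_US = {
    "Regular store": {"8MB": 62, "2GB": 28912},
    "Vector store": {"8MB": 147, "2GB": 48228},
    "Vector NT store": {"8MB": 105, "2GB": 33625},
}

for store_type, times in TIMES_US.items():
    for label, num_bytes in SIZES_BYTES.items():
        bw = bandwidth_gb_per_s(num_bytes, times[label])
        print(f"{store_type:16s} {label}: {bw:.2f} GB/s")
```

The same function can be used to compare store types directly, for example `bandwidth_gb_per_s(2e9, 33625) / bandwidth_gb_per_s(2e9, 28912)` for the ratio of vector NT to regular store throughput at 2GB.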
I think some serious work could be done in the area of prefetching and core architecture. This may only apply (at first) to high-end systems.
Compare the complications of implementing TSX and HLE with what I suggest next, and you will not think it out of the realm of possibility. At a compiler-determined point in the code (and/or via a #pragma or intrinsic), a specialty prefetcher thread, invisible to the O/S and the user, is activated by the processor. Each hardware thread has its own specialty prefetcher thread. From the point of activation, it executes in parallel with the code that activated it and runs ahead of the normal thread, but with diminished capacity. It can see and decode all the instructions; however, apart from the instructions needed to compute addresses, every instruction is no-oped except for its cache line fetches. At a closure point in the code, the prefetcher is shut down to conserve power and resources.
This won't necessarily be easy, given the page faults that may occur. The nice part is that prefetching would be performed regardless of TLB misses and similar events that interfere with an actual memory read. Intel engineers could simulate this to investigate its worthiness (though NIH syndrome may produce some resistance).
Jim Dempsey
John,
On a different thread on the IDZ forums I suggested that someone experiment with TSX (I do not have such a system).
The idea would be for the shepherd thread to enter a TSX region, perform a memmove of a block to be prefetched that fits in the transaction buffer, and then move it back. It then exits the transaction and waits for the next request.
Note that RAM will be read and cached but not written to RAM. The transaction system will (should) undo (elide) the writes.
Do you have a system with TSX?
Jim Dempsey
BKMs: Are there any BKMs (best known methods) from this discussion that can be useful to the community? Casually reading, it seems as if there are: DRAM access & collisions & # of threads executed per core; the relationship between array elements per OpenMP threads; etc.
I encourage you to create a blog that outlines these BKMs in a more concise way. Also, it makes promoting it to the community easier.
DISCUSSION: Great! I've really enjoyed it even as a passive follower.
ASIDE: Jim, your "Chronicles" series is one of the more popular reads on software.intel.com (i.e. it is broadly read).
Taylor,
>>ASIDE: Jim, your "Chronicles" series is one of the more popular reads on software.intel.com (i.e. it is broadly read).
Thanks for the feedback. The IDZ blogs page has no mechanism for the poster to see traffic on the articles, nor for the community to rank them. As such, it is difficult for me (or, I imagine, other posters) to determine whether we are doing a good job. It took a lot of effort to put together that 5-part series, so it would be nice to know whether it is being read and appreciated. Some sites do include ratings. Could you try to influence the blogs site manager to add a ranking system?
Regards,
Jim Dempsey
Hi Jim,
My thoughts exactly. I discussed this briefly with the person in charge of software.intel.com marketing for the MIC community. He agrees that at the very least, we need to have some way of acknowledging the impact of contributions like yours.
I'll continue pursuing this since it is important. I can't promise that anything will happen soon.
Regards
--
Taylor
This thread has come up with interesting information, not necessarily all related to the original question.
If Jim's blogs are getting significant viewing in spite of the difficulty of navigation on that site, I'm impressed; there must be motivated searchers. I found the first 3 parts of Jim's announced 5-part series. I too would be interested to know what topics engage people.
I've been waiting (too long) to see whether anything would come of my efforts on queuing up for approval to post there (prior to my retirement from Intel), with annual revisions in some cases. It didn't occur to me to ask whether my retirement would remove obstacles. I had in the back of my mind the thought that the site has been overhauled without notice every couple of years, so alternate (non-Intel) sites seem more reliable.
Tim,
Sheesh. Do you still have them (MIC or otherwise)? Send them to me and I'll get them through the system and out on the proper forum. I'd always wondered why I saw so few articles from you.
--
Taylor