I have just read about Streaming SIMD Extensions (SSE) on Intel CPUs.
http://www.intel.com/cd/ids/developer/asmo-na/eng/segments/games/resources/optimization/20415.htm
Are these usable within the CVF compiler?
Thanks, TimH
Message Edited by intel.software.network.support on 12-09-2005 10:10 AM
5 Replies
No. The only SSE instructions CVF uses are those for data prefetch.
Steve
What is data prefetch?
Pray tell.
Prefetch is a way for the compiler to tell the processor, "In a little while, I'm going to touch this particular memory address, so why don't you start loading it into the cache for me if it's not already there." This reduces the effect of memory latency and can give a 10-20% boost in performance for some applications.
The compiler looks at memory reference patterns, for example, stepping through an array, and automatically inserts prefetch instructions ahead of when the data will be used, improving the chance that the data will be in the cache when needed.
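To make the idea concrete, here is a hand-written C sketch of roughly what such a compiler-inserted prefetch amounts to when stepping through an array. It is an illustration only, not CVF output: the function name `sum_with_prefetch` and the distance `PREFETCH_DISTANCE` are made up for this example, and `__builtin_prefetch` is the GCC/Clang builtin (on other compilers the sketch simply skips the hint).

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead of the current
   one we ask the processor to start loading. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting the cache about elements needed soon. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read access, 1 = low temporal
               locality. This is only a hint; it never changes results. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        }
#endif
        total += a[i];
    }
    return total;
}
```

The right distance depends on the memory system of the target processor, which is why a model tuned for one CPU can hurt on another.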
This is a big help on Pentium III, but not so much on Pentium 4 where the processor itself tries to predict memory use patterns and does its own prefetching. We found, for example, that applying a Pentium III prefetch model to Pentium 4 actually made performance worse! CVF 6.6 uses a more appropriate memory system model for Pentium 4, resulting in fewer prefetch instructions issued, and better performance.
We were able to add this to CVF 6.5 because our optimizer already knew how to do prefetching for Alpha, so it was just a matter of tuning the memory model.
Steve
Thanks, that is a useful answer.
1. Are you saying that I don't need to do anything to utilize pre-fetch as long as the exe is run on a P4?
Do I need to, at least, turn on a P4 switch at compile time?
2. Do you anticipate the ability to utilize this feature on Intel Fxx anytime soon?
thanks,
Tim
In CVF, you have to compile with /arch:pn4 for Pentium 4, or /arch:host if you will run the program on the same computer you are compiling on.
Intel Fortran has /QX and /QAX switches to select architecture - the /QAX variant generates code that automatically detects the running CPU type and dispatches to the appropriate code set.
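For anyone curious what "detects the running CPU type and dispatches" means in practice, here is a rough C sketch of the general technique. It is not what the Intel compiler actually generates: the names `sum_generic`, `sum_tuned` and `select_sum` are invented for illustration, and the CPU check uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, falling back to the generic path elsewhere.

```c
#include <stddef.h>

typedef double (*sum_fn)(const double *, size_t);

/* Baseline code path: a plain loop that is safe on any CPU. */
double sum_generic(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

/* Stand-in for a CPU-specific code path. A real compiler would emit
   SSE instructions here; this sketch just uses two accumulators. */
double sum_tuned(const double *a, size_t n)
{
    double even = 0.0, odd = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += a[i];
        odd += a[i + 1];
    }
    if (i < n)
        even += a[i];   /* leftover element when n is odd */
    return even + odd;
}

/* Check the running CPU and return the matching code path. */
sum_fn select_sum(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return sum_tuned;
#endif
    return sum_generic;
}
```

A caller would do `sum_fn f = select_sum();` once and then call `f(a, n)`, so the same executable runs on older CPUs while using the tuned path where it is available.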
Steve