The presentation at https://www.ixpug.org/documents/1520529027IXPUG_prefetch_pres_mar_2018_2.pdf discusses the opt-prefetch options. I've been trying to figure out the implications of the different prefetch levels, and specifically of options such as "-mP2OPT_hlo_pref_multiple_pfes_strided_refs". Is there a document that describes the different options? I've searched the web and the Intel Developer Zone website and I'm not finding much, only high-level information such as https://software.intel.com/en-us/node/678050
I'm using Intel compiler version 18.0.
Rakesh's PowerPoint presentation covers a lot of ground and seems to go fairly far in the direction of your interest. An evident implication is that you may need to work seriously with VTune to find out what effect these choices have in your application. As indicated there, hardware prefetch takes care of typical linear, small-stride access automatically, so you normally start with the options that add prefetch instructions only for the cases it does not cover: indirect access, large strides, TLB misses ... If you want to try fully aggressive software prefetch with hardware prefetch disabled, you need privilege to do so, because changing the hardware prefetch setting affects all users of your hardware node.
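To make the indirect-access case concrete, here is a minimal hand-written sketch of the kind of loop where software prefetch can matter, because the hardware prefetcher cannot predict `data[idx[i]]` from the address stream. It is not what the compiler emits for any particular opt-prefetch level. The function name `gather_sum`, the `PREFETCH` macro, and the `dist` parameter are made up for illustration; only `_mm_prefetch` and `_MM_HINT_T0` from `<xmmintrin.h>` are real, and on non-x86 targets the macro simply compiles to nothing.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
/* Hypothetical helper: request the cache line holding *p into all cache levels. */
#define PREFETCH(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
/* No-op fallback so the sketch still builds on other architectures. */
#define PREFETCH(p) ((void)0)
#endif

/*
 * Sum data[idx[i]] for i in [0, n).
 * 'dist' is an illustrative prefetch distance in loop iterations;
 * a useful value has to be found by measurement (for example with VTune).
 */
double gather_sum(const double *data, const int *idx, size_t n, size_t dist)
{
    double sum = 0.0;

    assert(data != NULL && idx != NULL);

    for (size_t i = 0; i < n; ++i) {
        /* idx[] itself is read linearly, so hardware prefetch covers it;
           only the indirect target gets an explicit prefetch. */
        if (i + dist < n) {
            PREFETCH(&data[idx[i + dist]]);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same for any `dist`; what changes is timing, which is why measuring with and without it in your real application is the only way to tell whether it helps.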