I would like to run some comparative tests on an Intel Core 2 with the L2 cache prefetcher enabled or disabled. It seems the Intel C++ compiler supports such a function. My system is Linux, kernel version 2.6.20, and I haven't used the Intel compiler before. How should I implement such a function on my system?
Meanwhile, my system may need to know information such as how many prefetchers can operate at the same time, and how the prefetcher handles contention.
Could you give some suggestions or point me to literature I should refer to?
The compiler help file has a great deal of material on software prefetch options. Software prefetch is applied automatically to some extent by default at -O3. You can also request prefetching for a single loop, with or without specifying individual variables. The prefetch intrinsic, which issues a specific prefetch instruction, is most likely to help with indirect references, where there is no access pattern to engage hardware prefetch and there is unused bus capacity. What the documentation lacks is advice on when not to use these options (e.g. when software prefetch prevents hardware prefetch). The hint options for the various cache levels would be ignored on Core 2, where all prefetches work with L2.
Hardware prefetch isn't discussed in the compiler documentation, but that may be what you are referring to when you ask how many prefetchers there are. It is normally in effect; the exception is servers whose BIOS setup offers an option to control it. Contention is most likely to develop where bus utilization is high and speculative prefetch brings in too many references that go unused, or where the adjacent sector prefetcher or other prefetchers create a false sharing condition, in which data are prefetched in conflict with writes from another core.
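To make the indirect-reference case concrete, here is a minimal sketch of the prefetch intrinsic in C. It assumes an x86 compiler that provides `_mm_prefetch` in `<xmmintrin.h>` (the Intel compiler and GCC both do). The function name `sum_indirect`, the `PREFETCH_DISTANCE` value of 16, and the choice of `_MM_HINT_T0` are my own illustrative assumptions, not recommendations from the compiler documentation; as noted above, the cache-level hint may make no difference on Core 2.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical tuning value: how many iterations ahead to prefetch.
   The best distance depends on the workload and must be measured. */
#define PREFETCH_DISTANCE 16

/* Sum data[idx[i]] for i in [0, n). The access pattern into data[]
   is determined by idx[], so hardware prefetch may not detect it. */
double sum_indirect(const double *data, const int *idx, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; ++i) {
        /* Request the element needed PREFETCH_DISTANCE iterations from
           now. The bounds check keeps the idx[] read inside the array. */
        if (i + PREFETCH_DISTANCE < n) {
            _mm_prefetch((const char *)&data[idx[i + PREFETCH_DISTANCE]],
                         _MM_HINT_T0);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

Whether this helps at all depends on the conditions described above (no pattern for hardware prefetch to pick up, and spare bus capacity), so it is worth timing the loop with and without the `_mm_prefetch` call before keeping it.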