I started using VTune recently. For my application, the Intel Tuning Assistant gives me the following data:
Streaming SIMD Extensions (SSE) Input Assists Performance Impact: 119.67
1st Level Cache Load Miss Performance Impact: 9.12
Clockticks per Instructions Retired (CPI): 3.39
Trace Cache (TC) Miss Performance Impact: 2.03
(1) How can the SSE Input Assists Performance Impact have such a big value? From the definition of this parameter I was expecting a number below 5. The value for 1st Level Cache Load Miss Performance Impact also seems too high. Should I trust these numbers?
(2) I'm using the Intel C++ compiler in MS Visual Studio 2000. I already set "none" for "Floating Point Precision Improvement". Is this sufficient to enable the FTZ and DAZ modes? How can I further reduce the SSE Input Assists Performance Impact?
Even when sampling completes correctly, it is possible for performance impact estimates to be off by a factor of 2, so you are right to look for other means of verifying them. The estimates are only correlations produced from data which may not resemble your application. Also, underflow events are likely to hurt performance more on early P4 models than on more recent ones, and more on packed (parallel) SSE instructions than on scalar (serial) ones.
I don't know of any project settings or compiler options that alter the DAZ setting. It can be done with the SSE intrinsic defined in
The current 9.1 compilers, as I understand it, set FTZ with default options, and reset it with some of the -fp: options, such as -fp:precise. I would use the combination -fp:precise -Qftz when I want to avoid aggressive optimizations that alter results, but still get abrupt underflow.
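On the command line, that combination might look like the following (icl is the Intel C++ compiler driver on Windows; the source file name is hypothetical):

```shell
# value-safe floating-point semantics, but keep abrupt underflow (FTZ) on
icl -fp:precise -Qftz myapp.cpp
```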
I hesitated to try to answer you, as I am not familiar with any correctly built applications where SSE input assists are a serious problem.