
value for SSE Input Assists Performance Impact etc.



I started using VTune recently. For my application, the Intel Tuning Assistant gives me the following data:

Event Ratios
Streaming SIMD Extensions (SSE) Input Assists Performance Impact: 119.67
1st Level Cache Load Miss Performance Impact: 9.12
Clockticks per Instructions Retired (CPI): 3.39
Trace Cache (TC) Miss Performance Impact: 2.03

(1) How can the SSE Input Assists Performance Impact have such a big value? From reading the definition of this parameter, I was expecting a number below 5. The value for 1st Level Cache Load Miss Performance Impact also seems too high. Should I trust these numbers?

(2) I'm using the Intel C++ compiler in MS Visual Studio 2000. I have already set "Floating Point Precision Improvement" to "none". Is this sufficient to enable the FTZ and DAZ modes? How can I further reduce the SSE Input Assists Performance Impact?



1 Reply
I agree with your skepticism about a high impact estimate for input assists. Sometimes, nonsensical estimates can be produced by prematurely terminated sampling sessions, as they may depend on all of your sampling series completing with consistent runs.
Even when sampling completes correctly, it is possible for performance impact estimates to be off by a factor of 2, so you are correct to look for other means to verify them. The estimates are only correlations produced from data which may not resemble your application. Also, the underflow events are likely to be more serious for performance of early P4 models than for more recent ones, and more serious for execution of parallel than serial instructions.
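One independent check, offered only as a rough sketch, is to count the denormal values in the data your program reads, since those are the operands that can trigger SSE input assists. The function names below are hypothetical, and the code assumes 32-bit IEEE 754 single-precision data. It inspects the bit pattern directly, so the result does not depend on the current FTZ/DAZ state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if x is a denormal (subnormal) float.
   A denormal has an all-zero exponent field and a non-zero mantissa.
   The test is done on the raw bits so that DAZ cannot hide the value. */
static int is_denormal_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* safe type punning */
    return (bits & 0x7F800000u) == 0u &&     /* exponent field == 0 */
           (bits & 0x007FFFFFu) != 0u;       /* mantissa != 0       */
}

/* Hypothetical helper: counts denormals in a buffer of n floats,
   e.g. one that was just loaded from an external data file. */
static size_t count_denormals_f32(const float *data, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (is_denormal_f32(data[i])) {
            ++count;
        }
    }
    return count;
}
```

If a scan like this finds no denormals in the inputs or in intermediate buffers, a very large input-assist estimate is more likely to be a sampling artifact than a real cost.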
I don't know of any project settings or compiler options which alter the DAZ setting. It can be done with the SSE intrinsic defined in or equivalent headers, or by any other method of modifying the DAZ bit in MXCSR. When FTZ is set, the only data which could trigger SSE input assists would come from data files produced outside your program, or from x87 operations, if you mix SSE and non-SSE code. In the latter case you would expect x87 assists when storing the data, if SSE assists are to be incurred when reading them back.
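As a minimal sketch of the intrinsic approach, the following sets both FTZ and DAZ in MXCSR for the calling thread. The choice of `<xmmintrin.h>` and `<pmmintrin.h>` is my assumption about which headers to use (they are where GCC, Clang and the Intel compiler commonly provide these macros), and the function names `enable_ftz_daz` and `ftz_daz_enabled` are hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>  /* _mm_getcsr, _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */

/* Hypothetical helper: returns 1 if both the FTZ and DAZ bits are
   currently set in this thread's MXCSR register. */
static int ftz_daz_enabled(void)
{
    unsigned int csr = _mm_getcsr();
    return (csr & _MM_FLUSH_ZERO_MASK) == _MM_FLUSH_ZERO_ON &&
           (csr & _MM_DENORMALS_ZERO_MASK) == _MM_DENORMALS_ZERO_ON;
}

/* Hypothetical helper: enables abrupt underflow (FTZ) and treats
   denormal inputs as zero (DAZ).
   Assumption: the CPU supports the DAZ bit; very early SSE processors
   do not, and writing the bit there raises a fault. */
static void enable_ftz_daz(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    assert(ftz_daz_enabled());  /* sanity check that the bits took effect */
}
```

MXCSR is per-thread state, so a call like `enable_ftz_daz()` would need to be made at the start of every thread that runs SSE floating-point code, not just once in `main`.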
The current 9.1 compilers, as I understand, set FTZ with default options, and reset it with some of the -fp: options, such as -fp:precise. I would use the combination -fp:precise -Qftz when I want to avoid aggressive optimization which alters results, but still use abrupt underflow.
I hesitated to try to answer you, as I am not familiar with any correctly built applications where SSE input assists are a serious problem.