Hi, the optimization manual advises against using the inc/dec instructions because they write only part of the EFLAGS register, which creates a false dependency on earlier instructions. On Sandy/Ivy Bridge, inc/dec are listed as macro-fusable with jcc, so with macro-fusion is the above advice still recommended, or is it valid only for jumps on the CF flag?
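To make the partial-flag point concrete, here is a rough C model of just two EFLAGS bits. It is only a sketch: the names `flags2`, `model_sub1` and `model_dec` are made up for illustration, and it models the architectural flag results only, not timing or macro-fusion. It shows that sub writes both CF and ZF, while dec writes ZF and leaves CF as it was, so a later jump on CF after dec still depends on whichever earlier instruction last wrote CF.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-bit flag model: carry (CF) and zero (ZF) only. */
typedef struct {
    int cf;
    int zf;
} flags2;

/* Models "sub r32, 1": writes both CF (borrow) and ZF. */
static uint32_t model_sub1(uint32_t x, flags2 *f)
{
    uint32_t r = x - 1u;
    f->cf = (x < 1u);   /* borrow out when x == 0 */
    f->zf = (r == 0u);
    return r;
}

/* Models "dec r32": writes ZF but leaves CF untouched. */
static uint32_t model_dec(uint32_t x, flags2 *f)
{
    uint32_t r = x - 1u;
    f->zf = (r == 0u);  /* CF deliberately not modified */
    return r;
}

/* Small self-check of the difference between the two. */
static void check_flag_model(void)
{
    flags2 f = { 1, 0 };            /* CF was set by some earlier instruction */
    (void)model_dec(5u, &f);
    assert(f.cf == 1);              /* old CF survives dec */
    (void)model_sub1(5u, &f);
    assert(f.cf == 0);              /* sub overwrites CF */
}
```

A jz/jnz after dec only reads ZF, which dec itself produced, so in this model the stale CF only matters for conditions that test CF.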
If a pair like dec ecx; jz label is macro-fused into a single uop without false dependencies, the encoding is more compact than sub ecx, 1; jz label, so there could be a reason to switch back to the old method.
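For reference, the size difference can be tallied from the standard encodings. The sketch below simply stores the opcode bytes in C arrays and adds up their lengths; the array and function names are made up for illustration, the jz displacement byte is a placeholder, and a short (rel8) jump is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Machine-code bytes for each instruction (register form, ecx). */
static const unsigned char DEC_ECX_32BIT[] = { 0x49 };             /* dec ecx, 32-bit mode only */
static const unsigned char DEC_ECX_64BIT[] = { 0xFF, 0xC9 };       /* dec ecx, form usable in 64-bit mode */
static const unsigned char SUB_ECX_1[]     = { 0x83, 0xE9, 0x01 }; /* sub ecx, 1 (imm8 form) */
static const unsigned char JZ_SHORT[]      = { 0x74, 0x00 };       /* jz rel8, placeholder displacement */

/* Total bytes for each decrement-and-branch pair. */
static size_t dec_jz_bytes_32bit(void) { return sizeof DEC_ECX_32BIT + sizeof JZ_SHORT; }
static size_t dec_jz_bytes_64bit(void) { return sizeof DEC_ECX_64BIT + sizeof JZ_SHORT; }
static size_t sub_jz_bytes(void)       { return sizeof SUB_ECX_1 + sizeof JZ_SHORT; }

/* Sanity check: the dec pair is shorter in both modes. */
static void check_pair_sizes(void)
{
    assert(dec_jz_bytes_32bit() < sub_jz_bytes());
    assert(dec_jz_bytes_64bit() < sub_jz_bytes());
}
```

So the dec pair is 3 bytes in 32-bit code against 5 for the sub pair. In 64-bit code the one-byte 0x49 form is not available (that byte is a REX prefix there), so the saving shrinks to 4 bytes against 5.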
The advice to avoid inc/dec stands if you intend to run on earlier-model Intel CPUs. You may have to run your own tests to find out whether inc/dec help your application on Sandy Bridge. The architectural change, as you indicated, should eliminate the significant performance penalty seen on earlier models.
For example, on Sandy Bridge, the MSVC++ default /favor:blend is satisfactory in one of my benchmark suites, where earlier CPU models needed /favor:EM64T (alternate spelling /favor:INTEL64 for VS2012). So it looks like there is no point in setting the /favor option when using /arch:AVX in the VS2012 compiler. The current Intel compiler switches to inc/dec when compiling for AVX, in accordance with your suggestion.