Should asm listing look like this?!?

levicki · 11-13-2007

        movss     DWORD PTR [esp+140], xmm4                     ;351.16
        DB        141                                           ;352.16
        DB        116                                           ;352.16
        DB        38                                            ;352.16
        DB        0                                             ;352.16
        movss     xmm4, DWORD PTR _2il0floatpacket$17           ;352.16

JenniferJ · 11-13-2007

Those "DB" lines are padding, inserted to get better performance. Taken together, the four DB bytes form a single no-op instruction, "lea esi, [esi]".
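For anyone who wants to check this, here is a minimal Python sketch that turns the four decimal DB values from the listing into the hex opcode bytes a disassembler would show. The helper name `to_hex_bytes` is made up for illustration.

```python
# The four DB values from the listing, in the order they appear.
db_values = [141, 116, 38, 0]

def to_hex_bytes(values):
    """Format decimal byte values the way a disassembler prints opcode bytes."""
    return " ".join(f"{v:02X}" for v in values)

# The same bytes as a Python bytes object, e.g. for feeding to a disassembler.
encoded = bytes(db_values)

print(to_hex_bytes(db_values))  # 8D 74 26 00
```

0x8D is the LEA opcode; the remaining three bytes select the operands.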

One possible reason for the padding is to place the load that follows it at a different offset, modulo 256, from some other load in the loop for which IP-based (instruction-pointer-based) data prefetch looked important.
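As a rough illustration of the "offset modulo 256" idea, the sketch below checks whether two load instructions share the same low 8 bits of their address. The addresses are invented, `same_slot` is a hypothetical helper, and the assumption that only the address modulo 256 matters is taken from the explanation above, not from any hardware documentation.

```python
def same_slot(ip_a, ip_b, modulus=256):
    """True if two instruction addresses are equal modulo `modulus`."""
    return ip_a % modulus == ip_b % modulus

# Made-up addresses of two loads in the same loop.
important_load = 0x1140
other_load = 0x1040

print(same_slot(other_load, important_load))      # True: same offset modulo 256

# Four bytes of padding in front of the second load moves it to a new offset.
print(same_slot(other_load + 4, important_load))  # False
```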

So there's nothing wrong with the asm.

jimdempseyatthecove · 11-14-2007

Shouldn't the ASM generator/viewer show the text of the instruction sequence in lieu of the op-code byte sequence?

Jim

JenniferJ · 11-14-2007

No, because the asm output really is a sequence of DB directives.

There are a number of different possible ways to encode lea esi, [esi]. This method (using specific bytes) ensures the padding stays the same whether assembly or direct object generation is used.
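To make the "different possible ways" concrete, here is a small Python sketch listing several 32-bit encodings of an LEA that leaves ESI unchanged, with their lengths. The byte sequences are standard IA-32 encodings; the names `NOP_LEA_ENCODINGS` and `db_lines` are made up for this example, and which form an assembler would pick for a given mnemonic depends on the tool.

```python
# (description, machine code) pairs; every entry computes ESI = ESI + 0.
NOP_LEA_ENCODINGS = [
    ("no displacement",        bytes([0x8D, 0x36])),
    ("disp8 of zero",          bytes([0x8D, 0x76, 0x00])),
    ("SIB byte + disp8 zero",  bytes([0x8D, 0x74, 0x26, 0x00])),
    ("disp32 of zero",         bytes([0x8D, 0xB6, 0x00, 0x00, 0x00, 0x00])),
    ("SIB byte + disp32 zero", bytes([0x8D, 0xB4, 0x26, 0x00, 0x00, 0x00, 0x00])),
]

def db_lines(code):
    """Render machine code as one DB directive per byte, as in the listing."""
    return [f"DB {b}" for b in code]

lengths = [len(code) for _, code in NOP_LEA_ENCODINGS]
print(lengths)  # [2, 3, 4, 6, 7]

# The 4-byte form reproduces the DB values from the original listing.
print(db_lines(NOP_LEA_ENCODINGS[2][1]))  # ['DB 141', 'DB 116', 'DB 38', 'DB 0']
```

Since the length differs from form to form, emitting the bytes directly pins the padding to exactly four bytes.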

jimdempseyatthecove · 11-14-2007

OK,

But then your C++ -> ASM generator should insert a comment indicating

DB ... ;; pad with lea esi,[esi]

Otherwise you will continue to get questions as to what those DB...'s are doing in the code.

Jim

JenniferJ · 11-14-2007

This seems like a nice feature request. I'll get it into our tracker. Thanks for the suggestion!

levicki · 11-14-2007

Then why aren't they shown as instructions in disassembly with a comment on their purpose instead of making it look like the assembly code generator has gone haywire?

EDIT: Doh, I see Jim has beaten me to it :)

I would suggest showing the actual instruction though:

	lea	esi, [esi]	; PAD

Rationale: assembler listings can get rather long and hard to follow even without adding several lines each containing only one DB.

By the way, that is not LEA ESI, [ESI], it is LEA ESI, [ESI + 0]. LEA ESI, [ESI] is two bytes long (0x8D 0x36).
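The byte counts can be checked with a small hand-rolled decoder. This is only a sketch: `decode_lea` is a hypothetical helper that handles 32-bit-mode LEA with a plain base register (optionally via a SIB byte with no index) and nothing else.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_lea(code):
    """Decode a 32-bit-mode LEA (opcode 0x8D); returns (text, length in bytes).

    Limited sketch: base register only, no index register, no disp32-only form.
    """
    if code[0] != 0x8D:
        raise ValueError("not an LEA opcode")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3:
        raise ValueError("LEA needs a memory operand")
    pos = 2
    if rm == 4:
        # rm == 100b means a SIB byte follows the ModRM byte.
        sib = code[pos]
        pos += 1
        index, base = (sib >> 3) & 7, sib & 7
        if index != 4:
            raise ValueError("indexed forms are outside this sketch")
    else:
        base = rm
    if mod == 0 and base == 5:
        raise ValueError("disp32-only forms are outside this sketch")
    # mod selects the displacement size: none, 8-bit, or 32-bit.
    disp_size = {0: 0, 1: 1, 2: 4}[mod]
    disp = int.from_bytes(code[pos:pos + disp_size], "little", signed=True)
    pos += disp_size
    mem = REGS[base] if mod == 0 else f"{REGS[base]}{disp:+d}"
    return f"lea {REGS[reg]}, [{mem}]", pos

print(decode_lea(bytes([0x8D, 0x36])))        # ('lea esi, [esi]', 2)
print(decode_lea(bytes([141, 116, 38, 0])))   # ('lea esi, [esi+0]', 4)
```

In the four-byte form, ModRM 0x74 selects an 8-bit displacement plus a SIB byte, and SIB 0x26 selects ESI as base with no index.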

Moreover, I am not sure if this can really improve performance since it seems to affect decoding throughput.

TimP · 11-14-2007

Padding in the instruction sequence is used to get favorable alignment of the top of an inner loop body, and possibly of frequent jump targets within the loop. On early Core CPUs, to take advantage of the loop accelerator, the frequently executed instructions must fit within four 16-byte chunks of code (aligned, not necessarily contiguous). Penryn models accept a larger number of such chunks. In addition, hardware prefetch may be improved by making the top of the loop 32-byte aligned. The extended no-op above the loop is executed only once, before entering the loop, and no-ops at the beginning of an else segment would never be executed.
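A minimal Python sketch of the arithmetic involved, assuming a loop body that occupies one contiguous byte range. The addresses and the 60-byte loop size are invented, and `padding_to_align` and `chunks_touched` are hypothetical helpers, not anything the compiler exposes.

```python
def padding_to_align(addr, alignment):
    """Bytes of padding needed so that `addr` lands on an `alignment` boundary."""
    return (-addr) % alignment

def chunks_touched(start, size, chunk=16):
    """Number of aligned `chunk`-byte blocks overlapped by [start, start + size)."""
    first = start // chunk
    last = (start + size - 1) // chunk
    return last - first + 1

loop_top = 0x100A   # made-up address of the loop's first instruction
loop_size = 60      # made-up loop body size in bytes

print(chunks_touched(loop_top, loop_size))        # 5: one chunk too many

pad = padding_to_align(loop_top, 16)
print(pad)                                        # 6
print(chunks_touched(loop_top + pad, loop_size))  # 4: fits in four chunks
```

With these numbers, six bytes of padding above the loop are enough to bring the body from five 16-byte chunks down to four.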

levicki · 11-15-2007

I know about loop alignment, but what happens if this code itself is in an (outer) loop? How does the compiler judge the benefit of alignment vs. the code size of the innermost loop?