- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Some information for start:

We have 2d array, let's say like this:

unsigned char array[128][128];

and auxiliary array of pointers to the 'array's rows like this:

unsigned char *parray[128]; //initialized with '=&array[0....127][0];

Now... In C those 2 local pointers expressions are supposed to be absolutely equivalent:

unsigned char *a = &array

and

unsigned char*a = parray

Now for the reference to 'a[]'. Most compilers with /O2 optimization flag on will translate it as:

"a

to

BYTE PTR [edx+eax]

(registers in use are not important - could be edi instead an so on) where edx contains the address of 'x' and eax - the value 'y', i.e. dynamic displacement value.

What puzzles me is the translation of the intel x86 compiler, which gives different results of translation with the different initialization of the 'a' ptr:

when unsigned char*a = parray

then the reference a

BYTE PTR [edx+eax]

when unsigned char *a = &array

then the reference a

[_array + X + eax], where _array - start address and X - displacement.

In other words, in the first case the compiler uses Base + Index (indexed mode), in the second - Base + Index + Displacement (indexed + displacement).

This happens only with the intel x86 compiler and none of any other I've used. So my question is whether I should use the first mode with one register busy with the address (edx in that sample), or the second one that adds extra bytes to the reference.

Any suggestions are welcomed!

Thanks in advance!

Link Copied

5 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

unsigned char *a = &array

by the equivalent statement

unsigned char *a = array

followed by an expression in which a

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I will use an array of structures, because it's closer to my code and more visible due to the usage of Scaled addressing mode. The initialization of the array is just to avoid some listing issues and excluded unusable code.

The most significant index to the array (with value of 10 passed as a test_1 and test_2 parameter) is randomly picked. The conditional statements is to just show better the reference.

[bash]typedef unsigned char byte;And the listing for the test_1 and test_2 functions:

typedef struct

{ byte x;

byte y;

}ts;

ts array[128][128];

ts *parray[128];

void init()

{

int i,j;

for(i = 0; i < 128; i++)

{ parray= &array[0];

for(j = 0; j < 128; j++)

{ array.x = (byte)i;

array.y = (byte)j;

}

}

}

int test_2(int index)

{

int i;

ts *t = parray[index];

for(i = 0; i < 128; i++)

if(t.x & 2 || t.y & 4)

return 1;

return 0;

}

int test_1(int index)

{

int i;

ts *t = &array[index][0];

for(i = 0; i < 128; i++)

if(t.x & 2 || t.y & 4)

return 1;

return 0;

}

int main()

{

init();

test_1(10);

test_2(10);

return 0;

}

[/bash]

[bash]; -- Begin _test_2

; mark_begin;

ALIGN 16

PUBLIC _test_2

_test_2 PROC NEAR

; parameter 1: 4 + esp

.B2.1: ; Preds .B2.0

;;; {

;;; int i;

;;; ts *t = parray[index];

00000 8b 15 28 00 00 00 mov edx, DWORD PTR [_parray+40]

;;; for(i = 0; i < 128; i++)

00006 33 c0 xor eax, eax

; LOE eax edx ebx ebp esi edi

.B2.2: ; Preds .B2.4 .B2.1

;;; if(t.x & 2 || t.y & 4)

00008 f6 04 42 02 test BYTE PTR [edx+eax*2], 2

0000c 75 12 jne .B2.7 ; Prob 20%

; LOE eax edx ebx ebp esi edi

.B2.3: ; Preds .B2.2

0000e f6 44 42 01 04 test BYTE PTR [1+edx+eax*2], 4

00013 75 0b jne .B2.7 ; Prob 20%

; LOE eax edx ebx ebp esi edi

.B2.4: ; Preds .B2.3

00015 40 inc eax

00016 3d 80 00 00 00 cmp eax, 128

0001b 7c eb jl .B2.2 ; Prob 99%

; LOE eax edx ebx ebp esi edi

.B2.5: ; Preds .B2.4

;;; return 1;

;;; return 0;

0001d 33 c0 xor eax, eax

0001f c3 ret

; LOE

.B2.7: ; Preds .B2.2 .B2.3 ; Infreq

00020 b8 01 00 00 00 mov eax, 1

00025 c3 ret

00026 8d 76 00 8d bc

27 00 00 00 00 ALIGN 16

; LOE

; mark_end;

_test_2 ENDP

;_test_2 ENDS

; -- Begin _test_1

; mark_begin;

ALIGN 16

PUBLIC _test_1

_test_1 PROC NEAR

; parameter 1: 4 + esp

.B3.1: ; Preds .B3.0

;;; {

;;; int i;

;;; ts *t = &array[index][0];

;;; for(i = 0; i < 128; i++)

00000 33 c0 xor eax, eax

; LOE eax ebx ebp esi edi

.B3.2: ; Preds .B3.4 .B3.1

;;; if(t.x & 2 || t.y & 4)

00002 f6 04 45 00 0a

00 00 02 test BYTE PTR [_array+2560+eax*2], 2

0000a 75 15 jne .B3.7 ; Prob 20%

; LOE eax ebx ebp esi edi

.B3.3: ; Preds .B3.2

0000c f6 04 45 01 0a

00 00 04 test BYTE PTR [_array+2561+eax*2], 4

00014 75 0b jne .B3.7 ; Prob 20%

; LOE eax ebx ebp esi edi

.B3.4: ; Preds .B3.3

00016 40 inc eax

00017 3d 80 00 00 00 cmp eax, 128

0001c 7c e4 jl .B3.2 ; Prob 99%

; LOE eax ebx ebp esi edi

.B3.5: ; Preds .B3.4

;;; return 1;

;;; return 0;

0001e 33 c0 xor eax, eax

00020 c3 ret

; LOE

.B3.7: ; Preds .B3.2 .B3.3 ; Infreq

00021 b8 01 00 00 00 mov eax, 1

00026 c3 ret

00027 8b f6 8d bc 27

00 00 00 00 ALIGN 16

; LOE

; mark_end;

_test_1 ENDP

;_test_1 ENDS[/bash]

p.s. It seems that HTML code editor replaces the i < 128, j < 128 with "i<128" and "j <128", I apologize for that, just don't know how to post it properly I guess.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The performance difference will vary depending on processor archetecture.

Jim Dempsey

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

If your application references separate arrays of data, or uses the same array but with different values each time, then broaden your test application to include this behavior.

Jim

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Thank you!

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page