Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Problem with __declspec and short int array

jmsecilla
Beginner
Hi! I have the following code, but when I use the array vr I can't get correct results. I think the cause is data alignment, but I haven't been able to fix it. I hope someone can help me.


short int * prueba(short int *v)
{
int i, cont=(N-4)/4;
__declspec(align(16)) short int vr[64];


__asm {
movq mm0, v
psllq mm0, 32
add v,4
CW1D:
movq mm1,v
pxor mm2,mm2
por mm2,mm1
pxor mm7,mm7
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
movq vr, mm7 ; <-- here is the array vr
pxor mm0,mm0
por mm0,mm2
add v, 8
add vr,8
dec cont
cmp cont,0
ja CW1D
movq mm1,(v)-4
psrlq mm1,32
pxor mm7, mm7
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
movq vr,mm7
emms
}
for(i=0;i<64;i++)
printf("%d %d ", vr[i], v[i]);
}

The results are:
20 1
-402 1
-385 1
-386 1
12592 1
13106 1
13620 1
14134 1
14648 1
15162 1
15676 1
16190 1
24896 1
25442 1
25956 1
26470 1
26984 1
27498 1
1600 1
20 1
514 1
0 1
30068 1
30582 1
31096 1
23418 1
23900 1
24414 1
24928 1
25442 1
25956 1
26470 1
26984 1
27498 1
28012 1
28526 1
29040 1
29554 1
30068 1
30582 1
31096 1
31610 1
32124 1
32638 1
-32384 1
-31870 1
-1537 1
16363 1
0 1
0 1
2057 1
1 1
3876 1
0 1
28265 1
18789 1
29806 1
27749 1
25927 1
30062 1
2 1
0 1
-128 1
18 1

Thanks

Message Edited by jmsecilla@gmail.com on 07-15-2005 09:37 AM

Intel_C_Intel
Employee

Dear jmsecilla,

As a general remark, when you have questions like this, please also describe what exactly you are trying to achieve and give the full code (I had to guess the value of N and the initial value of 1 for v). Having said that, your add vr, 8 statement adds the value 8 to the contents of memory address vr; it does not advance the pointer by eight bytes, as you probably assumed. That is why you only see a change in the first four values of the vr array. To do what you want, simply use

lea eax, vr

before the loop, and

movq [eax], mm7
add eax, 8

inside the loop. In addition, did you try coding your algorithm in C first and using automatic vectorization instead? You can find a brief introduction online at

http://www.intel.com/cd/ids/developer/asmo-na/eng/65774.htm

If you find that automatic vectorization does not help, please let me know and I may be able to help you further with this. Automatic vectorization may save you a lot of engineering effort!
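To illustrate the kind of plain C loop an auto-vectorizing compiler can typically handle, here is a minimal sketch. The thread does not state what the MMX kernel computes, so the element-wise sum below is a hypothetical stand-in, and the name add_arrays and the restrict qualifiers are assumptions chosen for the example, not part of the original code.

```c
#define N 64  /* assumption: matches the N = 64 stated elsewhere in this thread */

/*
 * Hypothetical kernel: element-wise sum of two arrays of short.
 * The loop has a simple trip count, unit-stride accesses, and no
 * dependence between iterations; "restrict" tells the compiler the
 * arrays do not overlap. These properties make it a reasonable
 * candidate for automatic vectorization.
 */
void add_arrays(const short *restrict a, const short *restrict b,
                short *restrict out, int n)
{
    for (int i = 0; i < n; i++) {
        /* The sum is computed as int and truncated back to short. */
        out[i] = (short)(a[i] + b[i]);
    }
}
```

Calling add_arrays(a, b, out, N) on three arrays of N shorts fills out with the sums. Whether the loop is actually vectorized depends on the compiler and the optimization options in use, so it is worth checking the compiler's vectorization report rather than assuming it happened.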

Aart Bik
http://www.aartbik.com/

Message Edited by abik on 07-16-2005 02:39 PM

jmsecilla
Beginner
Thanks for your reply.

I did what you suggested, but the program crashes at the line "movq [ebx], mm7". The final code with the modifications is in the attached file.

P.S.: N is 64.
Intel_C_Intel
Employee

No you did not, since you also suddenly changed vr into a pointer variable:

short int *vr;
vr=(short int *)calloc(64, sizeof(short int));

so that, ironically, your original "add vr, 8" at least makes sense again. To get it right for this dynamic allocation, use mov ebx, vr instead of lea ebx, vr now (the same applies to v, which already was a pointer). Also, why did you not try automatic vectorization first, like I suggested?
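The difference between the two cases can be sketched in plain C. This is only a C-level analogy of what "add vr, 8" does in each situation, not what the compiler emits, and the function names are hypothetical.

```c
#include <stddef.h>

/*
 * Array case: vr names the storage itself, so an "add vr, 8" in inline
 * assembly modifies the data at the start of the array. The C analogue
 * is adding 8 to the first element; nothing advances.
 */
short add8_to_array_storage(void)
{
    short vr[64] = {0};
    vr[0] += 8;                 /* contents change, position does not */
    return vr[0];
}

/*
 * Pointer case: vr is a variable that holds an address, so "add vr, 8"
 * advances that address by 8 bytes. The function returns how many
 * short elements the pointer moved.
 */
ptrdiff_t add8_to_pointer_value(void)
{
    short buf[64] = {0};
    short *vr = buf;
    short *before = vr;
    vr = (short *)((char *)vr + 8);   /* advance by 8 bytes */
    return vr - before;               /* 8 / sizeof(short) elements */
}
```

With a 2-byte short, the pointer version moves four elements per step, which matches one 8-byte movq store. This is also why lea (take the address) fits the array and mov (load the stored address) fits the pointer.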

Message Edited by abik on 07-16-2005 02:57 PM

jmsecilla
Beginner
Hi Abik,

Thanks for everything; my program works correctly now. Next, I'm going to look into automatic vectorization. I will keep in touch to talk about it.

Thanks.
Intel_C_Intel
Employee

You are welcome and I appreciate the follow-up. Looking forward to the vectorization discussion.
Aart
