short int * prueba(short int *v)
{
int i, cont=(N-4)/4;
__declspec(align(16)) short int vr[64];
__asm {
movq mm0, v
psllq mm0, 32
add v,4
CW1D:
movq mm1,v
pxor mm2,mm2
por mm2,mm1
pxor mm7,mm7
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
movq vr, mm7 ; <-- here is array vr
pxor mm0,mm0
por mm0,mm2
add v, 8
add vr,8
dec cont
cmp cont,0
ja CW1D
movq mm1,(v)-4
psrlq mm1,32
pxor mm7, mm7
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
psrlq mm0,16
paddw mm7,mm0
psllq mm1,16
paddw mm7,mm1
movq vr,mm7
emms
}
for(i=0;i<64;i++)
printf("%d %d ", vr[i], v[i]);
}
The results are:
20 1
-402 1
-385 1
-386 1
12592 1
13106 1
13620 1
14134 1
14648 1
15162 1
15676 1
16190 1
24896 1
25442 1
25956 1
26470 1
26984 1
27498 1
1600 1
20 1
514 1
0 1
30068 1
30582 1
31096 1
23418 1
23900 1
24414 1
24928 1
25442 1
25956 1
26470 1
26984 1
27498 1
28012 1
28526 1
29040 1
29554 1
30068 1
30582 1
31096 1
31610 1
32124 1
32638 1
-32384 1
-31870 1
-1537 1
16363 1
0 1
0 1
2057 1
1 1
3876 1
0 1
28265 1
18789 1
29806 1
27749 1
25927 1
30062 1
2 1
0 1
-128 1
18 1
Thanks
Message Edited by jmsecilla@gmail.com on 07-15-2005 09:37 AM
Dear jmsecilla,
As a general remark, when you have questions like this, please also describe exactly what you are trying to achieve and give the full code (I had to guess the value of N and the initial value of 1 for v). Having said that, your add vr, 8 statement adds the value 8 to the contents of memory address vr; it does not advance the pointer by eight bytes, as you probably assumed. That is why you only see a change in the first four values of the vr array. To do what you want, simply use
lea eax, vr
before the loop, and
movq [eax], mm7
add eax, 8
inside the loop. In addition, did you try coding your algorithm in C first and using automatic vectorization instead? You can find a brief introduction online at
http://www.intel.com/cd/ids/developer/asmo-na/eng/65774.htm
If you find that automatic vectorization does not help, please let me know and I may be able to help you further with this. Automatic vectorization may save you a lot of engineering effort!
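As a rough illustration of the "code it in C first" suggestion, here is a minimal sketch of a loop written so that a vectorizing compiler has a fair chance of handling it. The thread never states the actual algorithm, so the three-element window sum and the name window_sum3 are hypothetical stand-ins, not the original poster's computation. Whether a given compiler really vectorizes it depends on the compiler and its options, so check its vectorization report.

```c
#include <stddef.h>

/* Hypothetical stand-in kernel (not the algorithm from the thread):
 * out[i] = in[i-1] + in[i] + in[i+1] for every interior element.
 * The first and last elements of out are left untouched. */
void window_sum3(const short *restrict in, short *restrict out, size_t n)
{
    if (n < 3) {
        return; /* no interior elements to compute */
    }
    /* Unit-stride accesses, no aliasing (restrict) and a simple trip count
     * are the properties an automatic vectorizer typically looks for. */
    for (size_t i = 1; i + 1 < n; i++) {
        out[i] = (short)(in[i - 1] + in[i] + in[i + 1]);
    }
}
```

The restrict qualifiers (C99) tell the compiler that in and out do not overlap, which is often what allows it to vectorize without extra runtime checks.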
Aart Bik
http://www.aartbik.com/
Message Edited by abik on 07-16-2005 02:39 PM
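The difference between "add vr, 8" and the lea / movq [eax] / add eax, 8 sequence in the reply above can be mirrored in plain C. This is only a sketch: the function names are made up for illustration, and it assumes short is 2 bytes, so that four shorts form one 8-byte block like an MMX register.

```c
#include <string.h>

/* What "add vr, 8" does when vr is an array: it modifies the data stored
 * at vr (here, the first element); it does not move any write position. */
static void add_to_contents(short *vr)
{
    vr[0] = (short)(vr[0] + 8);
}

/* What the suggested fix does: keep a separate cursor and advance it.
 * Assumes sizeof(short) == 2, so 4 shorts make one 8-byte block. */
static void store_blocks(short *vr, const short *src, int blocks)
{
    unsigned char *cursor = (unsigned char *)vr;       /* lea eax, vr     */
    for (int b = 0; b < blocks; b++) {
        memcpy(cursor, src + 4 * b, 4 * sizeof(short)); /* movq [eax], mm7 */
        cursor += 4 * sizeof(short);                    /* add eax, 8      */
    }
}
```

With the cursor version each block lands 8 bytes after the previous one, whereas the first function only ever changes the start of the array, which matches the symptom of seeing just the first four values change.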
I did what you suggested, but the program crashes at the line "movq [ebx], mm7". The final code with the modifications is:
PS: N is 64
I have attached the file.
No, you did not, since you also suddenly changed vr into a pointer variable:
short int *vr;
vr=(short int *)calloc(64, sizeof(short int));
so that, ironically, your original "add vr, 8" at least makes sense again. To get it right for this dynamic allocation, use mov ebx, vr instead of lea ebx, vr now (the same applies to v, which already was a pointer). Also, why did you not try automatic vectorization first, as I suggested?
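The lea versus mov distinction comes down to where the data lives relative to the symbol vr. A small C sketch of the same idea follows. The function names are hypothetical, and the comments map each case to the instruction recommended in the thread.

```c
#include <stdlib.h>

/* Array case: the symbol vr *is* the storage, so the address of vr is the
 * address of the data. In the asm block this corresponds to lea ebx, vr. */
static int array_name_is_storage(void)
{
    short vr[64];
    return (void *)vr == (void *)&vr; /* 1: same address */
}

/* Pointer case: the symbol vr is a variable that *holds* the address of
 * the data, so its value must be loaded: mov ebx, vr. Taking the address
 * of vr (lea) would yield the location of the pointer variable itself. */
static int pointer_holds_address(void)
{
    short *vr = calloc(64, sizeof(short));
    if (vr == NULL) {
        return -1; /* allocation failed */
    }
    int differs = (void *)vr != (void *)&vr; /* 1: heap block vs. local */
    free(vr);
    return differs;
}
```

Writing through the address of the pointer variable instead of its value would clobber the stack next to it, which is consistent with the crash reported at the movq [ebx], mm7 line.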
Message Edited by abik on 07-16-2005 02:57 PM
Thanks for everything; my program works correctly now. Next, I am going to look at automatic vectorization. I will keep in touch with you to talk about this.
Thanks.
You are welcome and I appreciate the follow-up. Looking forward to the vectorization discussion.
Aart