Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.
7956 Discussions

Got crash! some questions about icl and gcc library when use sse2 instruction

ipeak_cn
Beginner
851 Views
My development environment:
Microsoft Visual Studio 2008
Intel C++ Compiler 11.0.074
Windows Server 2003 SP2
MinGW
gcc 4.2.1 for MinGW

I downloaded Intel C++ Compiler 11.0.074 and want to compile some code. Because icl can't compile some source files that contain inline asm, I split that code out and compiled it with gcc in the MinGW environment to produce a static library. I link this library with the other object files created by icl, and the build goes fine.

But when I test the resulting exe file, it crashes! The crash address is in the SSE2 code compiled by gcc in MinGW.
Some people who have discussed this problem think the cause is memory alignment: SSE2 code requires 16-byte alignment, but ICL fails to provide it. I found no answer to this problem.

What can I do? I want to keep using the SSE code, which is why I compile it with gcc. I don't want to disable the SSE code to resolve this.

asm code:

00FAB910 sub esp,8Ch
00FAB916 mov edx,offset _ff_pw_9+0C0h (1238220h)
00FAB91B mov dword ptr [esp+84h],ebx
00FAB922 mov eax,offset _ff_pw_9+0D0h (1238230h)
00FAB927 mov ebx,dword ptr [esp+90h]
00FAB92E mov dword ptr [esp+88h],esi
00FAB935 mov esi,offset _ff_pw_9+90h (12381F0h)
00FAB93A movdqa xmm0,xmmword ptr [ebx+10h]
00FAB93F movdqa xmm1,xmmword ptr [ebx+60h]
00FAB944 movdqa xmm2,xmm0
00FAB948 movdqa xmm3,xmmword ptr [ebx+20h]
00FAB94D paddsw xmm0,xmm1
00FAB951 movdqa xmm4,xmmword ptr [ebx+50h]
00FAB956 psllw xmm0,3
00FAB95B movdqa xmm5,xmmword ptr [ebx]
00FAB95F paddsw xmm4,xmm3
00FAB963 paddsw xmm5,xmmword ptr [ebx+70h]
00FAB968 psllw xmm4,3
00FAB96D movdqa xmm6,xmm0
00FAB971 psubsw xmm2,xmm1
00FAB975 movdqa xmm1,xmmword ptr [esi+10h]
00FAB97A psubsw xmm0,xmm4
00FAB97E movdqa xmm7,xmmword ptr [ebx+30h]
00FAB983 pmulhw xmm1,xmm0
00FAB987 paddsw xmm7,xmmword ptr [ebx+40h]
00FAB98C psllw xmm5,3
00FAB991 paddsw xmm6,xmm4
00FAB995 psllw xmm7,3
00FAB99A movdqa xmm4,xmm5
00FAB99E psubsw xmm5,xmm7
00FAB9A2 paddsw xmm1,xmm5
00FAB9A6 paddsw xmm4,xmm7
00FAB9AA por xmm1,xmmword ptr [edx]
00FAB9AE psllw xmm2,4
00FAB9B3 pmulhw xmm5,xmmword ptr [esi+10h]
00FAB9B8 movdqa xmm7,xmm4
00FAB9BC psubsw xmm3,xmmword ptr [ebx+50h]
00FAB9C1 psubsw xmm4,xmm6
00FAB9C5 movdqa xmmword ptr [esp+20h],xmm1 ----- got crash here!
00FAB9CB paddsw xmm7,xmm6
00FAB9CF movdqa xmm1,xmmword ptr [ebx+30h]
00FAB9D4 psllw xmm3,4
00FAB9D9 psubsw xmm1,xmmword ptr [ebx+40h]
00FAB9DE movdqa xmm6,xmm2
00FAB9E2 movdqa xmmword ptr [esp+40h],xmm4

registers state:

EAX = 01238230 EBX = 028B60C0 ECX = 028B60C0 EDX = 01238220 ESI = 012381F0 EDI = 00000000 EIP = 00FAB9C5
ESP = 001248CC EBP = 00125334 EFL = 00010216

CS = 001B DS = 0023 ES = 0023 SS = 0023 FS = 003B GS = 0000

ST0 = 1#SNAN ST1 = 1#SNAN ST2 = 1#SNAN
ST3 = 1#SNAN ST4 = 1#SNAN ST5 = +1.0000000000000000e+0000
ST6 = +5.0000000000000000e-0001 ST7 = +3.3700000000000000e+0002 CTRL = 027F STAT = 0123 TAGS = FFFF
EIP = 00FB14E6 EDO = 0012EF40

MM0 = 0000000000000000 MM1 = 002C852000590A40 MM2 = 0001000000010000 MM3 = 0000800000008000
MM4 = 0001000000010000 MM5 = 8000000000000000 MM6 = 8000000000000000 MM7 = A880000000000000

XMM0 = 00000000000000000000000000000000 XMM1 = 00010001000100010001000100010001
XMM2 = 00000000000000000000000000000000 XMM3 = 00000000000000000000000000000000
XMM4 = 00000000000000000000000000000000 XMM5 = 00000000000000000000000000000000
XMM6 = 19A019A019A019A019A019A019A019A0 XMM7 = 19A019A019A019A019A019A019A019A0 XMM00 = +0.00000E+000
XMM01 = +0.00000E+000 XMM02 = +0.00000E+000 XMM03 = +0.00000E+000 XMM10 = +9.2E-041#DEN
XMM11 = +9.2E-041#DEN XMM12 = +9.2E-041#DEN XMM13 = +9.2E-041#DEN XMM20 = +0.00000E+000
XMM21 = +0.00000E+000 XMM22 = +0.00000E+000 XMM23 = +0.00000E+000 XMM30 = +0.00000E+000
XMM31 = +0.00000E+000 XMM32 = +0.00000E+000 XMM33 = +0.00000E+000 XMM40 = +0.00000E+000
XMM41 = +0.00000E+000 XMM42 = +0.00000E+000 XMM43 = +0.00000E+000 XMM50 = +0.00000E+000
XMM51 = +0.00000E+000 XMM52 = +0.00000E+000 XMM53 = +0.00000E+000 XMM60 = +1.65540E-023
XMM61 = +1.65540E-023 XMM62 = +1.65540E-023 XMM63 = +1.65540E-023 XMM70 = +1.65540E-023
XMM71 = +1.65540E-023 XMM72 = +1.65540E-023 XMM73 = +1.65540E-023 MXCSR = 00001FA1

XMM0DL = +0.00000000000000E+000 XMM0DH = +0.00000000000000E+000
XMM1DL = +1.3906923818E-309#DEN XMM1DH = +1.3906923818E-309#DEN
XMM2DL = +0.00000000000000E+000 XMM2DH = +0.00000000000000E+000
XMM3DL = +0.00000000000000E+000 XMM3DH = +0.00000000000000E+000
XMM4DL = +0.00000000000000E+000 XMM4DH = +0.00000000000000E+000
XMM5DL = +0.00000000000000E+000 XMM5DH = +0.00000000000000E+000
XMM6DL = +2.96020117590228E-185 XMM6DH = +2.96020117590228E-185
XMM7DL = +2.96020117590228E-185 XMM7DH = +2.96020117590228E-185

OV = 0 UP = 0 EI = 1 PL = 0 ZR = 0 AC = 1 PE = 1 CY = 0

001248EC = 00000000000000000000000000000000


5 Replies
TimP
Honored Contributor III
This is well known, even if it hasn't been discussed much recently. 32-bit ICL does not support the stack alignment requirements (greater than 4 bytes) of functions compiled by other compilers. One way to work around this is to use an intermediate wrapper function compiled by gcc, one that does not depend on the alignment at entry and is built with -mpreferred-stack-boundary=4 (the default, needed to support parallel SSE), so that the alignment is set when it exits to the SSE code. Just as you can't use parallel SSE in a 32-bit gcc main(), you must use parallel SSE only in a function called from a gcc function compiled with compatible options, or adjust the stack explicitly.
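A minimal sketch of the wrapper idea, assuming the library side is built with gcc. The gcc attribute force_align_arg_pointer (or the -mstackrealign option) asks the compiler to realign the stack at function entry, so the code it calls sees a 16-byte aligned frame whatever the caller did. The names kernel_entry and kernel_stub are hypothetical stand-ins, not functions from the original code, and the stub only imitates an SSE2 routine that keeps a 16-byte aligned temporary on the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* force_align_arg_pointer is an x86-specific gcc attribute; elsewhere the
   macro expands to nothing so the sketch still compiles. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Returns 1 when p lies on a 16-byte boundary, as movdqa requires. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical stand-in for the SSE2 routine: it keeps a 16-byte aligned
   temporary on the stack (C11 _Alignas) and scales eight 16-bit values by 8. */
static int kernel_stub(const short *block, short *out) {
    _Alignas(16) short spill[8];
    int i;
    memcpy(spill, block, sizeof spill);
    assert(is_aligned16(spill));
    for (i = 0; i < 8; i++) {
        out[i] = (short)(spill[i] * 8);
    }
    return is_aligned16(spill);
}

/* Entry point intended to be called from icl-compiled code: the stack is
   realigned here before the SSE2-style routine runs. */
REALIGN_STACK int kernel_entry(const short *block, short *out) {
    return kernel_stub(block, out);
}
```

The icl-compiled code would then call kernel_entry instead of calling the SSE2 routine directly.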
By the same token, you would have trouble with the plain Microsoft 32-bit malloc().
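For heap buffers, one common approach is to over-allocate and round the pointer up to a 16-byte boundary. The sketch below assumes nothing beyond standard malloc() and free(); alloc16 and free16 are illustrative names, not part of any library. The Microsoft C runtime also offers _aligned_malloc() and _aligned_free() for this purpose.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes on a 16-byte boundary using plain malloc().
   The original pointer is stored just before the aligned block so that
   free16() can release it. */
static void *alloc16(size_t size) {
    void *raw;
    uintptr_t start, aligned;

    raw = malloc(size + 15 + sizeof(void *));
    if (raw == NULL) {
        return NULL;
    }
    start = (uintptr_t)raw + sizeof(void *);      /* room for the saved pointer */
    aligned = (start + 15u) & ~(uintptr_t)15u;    /* round up to a multiple of 16 */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from alloc16(); NULL is ignored. */
static void free16(void *p) {
    if (p != NULL) {
        free(((void **)p)[-1]);
    }
}
```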
There is an effort to define a standard ABI for 32-bit compilers, but Windows is unlikely to be influenced by any such standard.
64-bit MinGW seems likely to improve, so that could be another option.
Lingfeng_C_Intel
Employee
Hello,
I read your code, and just from looking at the instructions I noticed that nothing is stored with movdqa to ptr [esp] or ptr [esp+10h] before the instruction at 00FAB9C5, movdqa xmmword ptr [esp+20h],xmm1, where the crash occurs.
That store will crash if the address is not aligned on a 16-byte boundary. I hope this helps.

Thanks,
Wise
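The register dump in the question makes the alignment point concrete. Below is a small sketch in C of the arithmetic, with helper names invented here for illustration. The faulting store targets ESP + 20h. With ESP = 001248CC that is 001248EC, which lies 12 bytes past a 16-byte boundary, while the EBX, EDX and ESI pointers used by the loads are all aligned.

```c
#include <stdint.h>

/* Effective address of a [base + displacement] operand in 32-bit code. */
static uint32_t effective_address(uint32_t base, uint32_t disp) {
    return base + disp;
}

/* Distance of addr past the previous 16-byte boundary (0 means aligned). */
static unsigned offset_in_16(uint32_t addr) {
    return (unsigned)(addr & 15u);
}

/* movdqa with a memory operand faults when the address is not a multiple of 16. */
static int movdqa_would_fault(uint32_t addr) {
    return offset_in_16(addr) != 0;
}
```

With the values from the dump, movdqa_would_fault(effective_address(0x001248CC, 0x20)) is true, and the same check on EBX = 028B60C0 is false.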

Lingfeng_C_Intel
Employee
Hi,

I went through your code and the user guide, and I read:
por
Packed logical inclusive OR
--------------------------------------------------------------------------------
Description
Performs a packed logical inclusive OR on the operands.
--------------------------------------------------------------------------------
Note: por is an MMX instruction.
--------------------------------------------------------------------------------

Operand Form(s)

memory, MMX register
MMX register, MMX register

So, for the por xmm1, xmmword ptr [edx] instruction, maybe you should change it to two instructions like this:
mov <one register name>, ptr [edx]
por xmm1, <that register name>

because the por instruction may not be able to take a memory address as its source operand.

Hope it can help you.

Thanks,
Wise
ipeak_cn
Beginner
I know what to do now! Thanks a lot, everyone!
Lingfeng_C_Intel
Employee
Great!