- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Compile
[fortran]program test double complex x double precision y read *, x, y print *, x / y print *, x * y end program [/fortran]with ifort 12.0.0.084 (-O3 -xSSE4.2) produces (objdump output)
[plain] print *, x / y 402d48: f2 0f 10 84 24 98 00 movsd 0x98(%rsp),%xmm0 402d4f: 00 00 402d51: 48 8d 3c 24 lea (%rsp),%rdi 402d55: 0f 29 44 24 70 movaps %xmm0,0x70(%rsp) 402d5a: f2 0f 11 44 24 60 movsd %xmm0,0x60(%rsp) 402d60: 0f 57 c9 xorps %xmm1,%xmm1 402d63: dd 44 24 60 fldl 0x60(%rsp) 402d67: f2 0f 11 4c 24 60 movsd %xmm1,0x60(%rsp) 402d6d: d9 c0 fld %st(0) 402d6f: d8 c9 fmul %st(1),%st 402d71: 4c 8d 44 24 50 lea 0x50(%rsp),%r8 402d76: d9 e8 fld1 402d78: 0f 10 54 24 40 movups 0x40(%rsp),%xmm2 402d7d: dd 44 24 60 fldl 0x60(%rsp) 402d81: d9 c0 fld %st(0) 402d83: 0f 28 da movaps %xmm2,%xmm3 402d86: d8 c9 fmul %st(1),%st 402d88: be ff ff ff ff mov $0xffffffff,%esi 402d8d: f2 0f 11 54 24 60 movsd %xmm2,0x60(%rsp) 402d93: 48 ba 00 ff 84 83 20 mov $0x208384ff00,%rdx 402d9a: 00 00 00 402d9d: 66 0f 15 da unpckhpd %xmm2,%xmm3 402da1: b9 c0 ce 47 00 mov $0x47cec0,%ecx 402da6: 33 c0 xor %eax,%eax 402da8: 48 c7 04 24 00 00 00 movq $0x0,(%rsp) 402daf: 00 402db0: de c3 faddp %st,%st(3) 402db2: d9 ca fxch %st(2) 402db4: de f9 fdivrp %st,%st(1) 402db6: dd 44 24 60 fldl 0x60(%rsp) 402dba: f2 0f 11 5c 24 60 movsd %xmm3,0x60(%rsp) 402dc0: d9 c0 fld %st(0) 402dc2: d8 cc fmul %st(4),%st 402dc4: d9 c9 fxch %st(1) 402dc6: d8 cb fmul %st(3),%st 402dc8: dd 44 24 60 fldl 0x60(%rsp) 402dcc: d9 c0 fld %st(0) 402dce: de cd fmulp %st,%st(5) 402dd0: 0f 29 94 24 80 00 00 movaps %xmm2,0x80(%rsp) 402dd7: 00 402dd8: d9 cc fxch %st(4) 402dda: de c2 faddp %st,%st(2) 402ddc: d9 c9 fxch %st(1) 402dde: d8 ca fmul %st(2),%st 402de0: dd 5c 24 60 fstpl 0x60(%rsp) 402de4: d9 cb fxch %st(3) 402de6: de ca fmulp %st,%st(2) 402de8: f2 0f 10 64 24 60 movsd 0x60(%rsp),%xmm4 402dee: d9 ca fxch %st(2) 402df0: de e9 fsubrp %st,%st(1) 402df2: de c9 fmulp %st,%st(1) 402df4: dd 5c 24 60 fstpl 0x60(%rsp) 402df8: 66 0f 16 64 24 60 movhpd 0x60(%rsp),%xmm4 402dfe: 0f 11 64 24 50 movups %xmm4,0x50(%rsp) 402e03: e8 48 2d 01 00 callq 415b50Shouldn't the division calculated simply using the divpd instruction similarly to the multiplication?402e08: 0f 28 94 24 80 00 00 movaps 0x80(%rsp),%xmm2 402e0f: 00 print *, x * y 402e10: 48 8d 3c 24 lea (%rsp),%rdi 402e14: f2 0f 12 44 24 70 movddup 0x70(%rsp),%xmm0 402e1a: 4c 8d 44 24 60 lea 0x60(%rsp),%r8 402e1f: 66 0f 59 c2 mulpd %xmm2,%xmm0 402e23: be ff ff ff ff mov $0xffffffff,%esi 402e28: 48 ba 00 ff 84 83 20 mov $0x208384ff00,%rdx 402e2f: 00 00 00 402e32: b9 c8 ce 47 00 mov $0x47cec8,%ecx 402e37: 33 c0 xor %eax,%eax 402e39: 48 c7 04 24 00 00 00 movq $0x0,(%rsp) 402e40: 00 402e41: 0f 11 44 24 60 movups %xmm0,0x60(%rsp) 402e46: e8 05 2d 01 00 callq 415b50 [/plain]
Link Copied
5 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What size is "double"? REAL(8) or REAL(16)?
The size of double is affected by a command line option.
When REAL(16), internally it uses the FPP TBYTE format (10 bytes) for computations.
Jim Dempsey
The size of double is affected by a command line option.
When REAL(16), internally it uses the FPP TBYTE format (10 bytes) for computations.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I agree, it's curious that SSE4.2 option produces such code. But, if you accept the limited range complex arithmetic, -fp-model fast=2 appears to do what you want, and it shouldn't expose the limited range problem in this case. The question would be more interesting if you posed a case with a chance of measurably different performance, preferably one which included operations between complex and complex.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jim, I don't know what you mean by "FPP TBYTE" format. In our Fortran, REAL(16) is a 16-byte IEEE-style format that has all computations done in a software library. Tim has the right idea here - the standard code is a bit more cautious regarding range issues.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The posted code does calculations in double-precision (64-bit) arithmetic. But that is not the issue. The generate code is acceptable if y is a double complex variable. But when it is a double precision variable, the division x/y can be computed as simply as by x/y = (a/y, b/y) assuming x = (a, b), just like the multiplication x*y is computed by x*y = (a*y, b*y). Converting y to double complex type and doing a full complex-complex division is not any safer in terms of range issues.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
No, the point is that the compiler defaults to a code generation mode which should support general complex arithmetic. If you wish the compiler to switch into the mode it uses with -fast=2 when it is possible to do so without loss of generality (supposing that is your meaning), you could make your case in a post on your premier.intel.com, but it would be worth while to have a case which shows a likely performance benefit.

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page