Those math functions depend on parallelization for full efficiency: either -xN vectorization on Xeon, or software pipelining (SWP) on Itanium. The considerations for inlining and optimizing them are much the same in C and Fortran.
Thanks for the very interesting information.
"but you are likely to need to specify explicitly the float or double versions of the functions"
By inspecting the number of clock cycles spent in math functions (using MS C++ 7.1), and by inspecting the assembly dump, it seems to me that:
1) the number of clock cycles is independent of whether the variable is a double or a float, and
2) the assembly call to the math function is not changed if the variable is changed from a double to a float.
Does the Intel version 8.0 of the compilers have math functions that are more efficient for single than double precision?
"C, by design, doesn't have the potential of Fortran to optimize exponentiation operations like Fortran a**b."
I have noticed that rewriting a**0.25 to sqrt(sqrt(a)) had a dramatic effect on the computational time.
How does Fortran optimize exponents? Is it possible to calculate the N'th root a**(1/N) as efficiently as sqrt(a)?
I am most grateful for the truly useful recommendations!
:-)
I would recommend that Intel implement the cube root in the next generation of Pentium 5 processors!
:-)
I will try the CubeRoot function in C and also check whether we can gain a little by writing it directly in inline assembly. This routine has 24 bits of accuracy. Would it be easy to extend it to 53-bit and 64-bit accuracy as well, by adding additional lines like this:
r = (double)(2.0/3.0) * r + (double)(1.0/3.0) * x / (r * r); /* 12 bits of precision */
r = (double)(2.0/3.0) * r + (double)(1.0/3.0) * x / (r * r); /* 24 bits of precision */
r = (double)(2.0/3.0) * r + (double)(1.0/3.0) * x / (r * r); /* 36 bits of precision? */
r = (double)(2.0/3.0) * r + (double)(1.0/3.0) * x / (r * r); /* 48 bits of precision? */
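A minimal sketch of how the repeated update line behaves, for anyone who wants to experiment. The update is a Newton step for r^3 = x, and once the estimate is close, each step roughly squares the relative error, so the number of correct bits about doubles per step (12, 24, then near 48, and so on) until the precision of the type becomes the limit. The helper names and the crude starting guess via `frexp`/`ldexp` are assumptions of this sketch; they are not the CubeRoot routine discussed in the thread, whose starting guess is better.

```c
#include <assert.h>
#include <math.h>

/* One Newton step for f(r) = r^3 - x; same update as the line quoted above. */
static double cbrt_newton_step(double r, double x)
{
    return (2.0 / 3.0) * r + (1.0 / 3.0) * x / (r * r);
}

/* Hypothetical helper: cube root of x > 0 using a fixed number of steps. */
double cbrt_newton(double x, int steps)
{
    int e;
    double r;

    assert(x > 0.0 && steps >= 0);
    (void)frexp(x, &e);        /* x = m * 2^e with 0.5 <= m < 1 */
    r = ldexp(1.0, e / 3);     /* crude guess 2^(e/3), within about a factor of 2 */
    for (int i = 0; i < steps; ++i) {
        r = cbrt_newton_step(r, x);
    }
    return r;
}
```

Because the starting guess here is rough, the first few steps are spent just getting close; about eight steps are enough for double precision with this guess, whereas a routine that starts with several correct bits needs far fewer.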
Please forgive me if the following question seems a little strange, but I am only a beginner in this field, and I am thus still a little confused about precision. Aren't all the following FPU instructions always calculated with the 10-byte floating-point representation?
- FADD: Addition
- FMUL: Multiplication
- FDIV: Division
- FDIVR: Reverse Division
- FSIN: Sine (uses radians)
- FCOS: Cosine (uses radians)
- FSQRT: Square Root
- FSUB: Subtraction
- FABS: Absolute Value
If we write, for example, A = COS(B) in FORTRAN 77, wouldn't that calculation always be carried out with the 10-byte representation internally in the FPU, whether the variable is single or double precision? (I know that the extended precision of the ST(0)..ST(7) registers will be lost when the variable is written to RAM.)
As it is not so easy to determine whether single or double precision is required for good enough accuracy in the implementation of our mathematical model, it would be nice to be able to switch between single, double and extended precision as easily as possible. Would the best procedure be to implement the model in double precision, but switch back and forth between single, double and extended precision using:
- _control87(0x00020000, 0x00030000) (24 bits)
- _control87(0x00010000, 0x00030000) (53 bits)
- _control87(0x00000000, 0x00030000) (64 bits)
The _control87 routine may be called once in the beginning of the program to set the accuracy of the FPU. Will this affect the accuracy and efficiency of all the FPU instructions listed above?
Best regards from Lars Petter Endresen
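A small sketch of the `_control87` calls listed above, using the named constants from the Microsoft `<float.h>` (`_PC_24`, `_PC_53`, `_PC_64`, `_MCW_PC`) instead of the raw hex values. The enum and the wrapper name `set_x87_precision` are hypothetical. The sketch assumes a Microsoft-compatible 32-bit x86 compiler; on anything else it compiles to a no-op and reports that nothing was changed. As far as I know from the Intel architecture manuals, the precision-control field affects add, subtract, multiply, divide and square root, but not transcendental instructions such as FSIN and FCOS.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical selector for the x87 significand width, in bits. */
typedef enum { PREC_24 = 24, PREC_53 = 53, PREC_64 = 64 } fpu_precision;

/* Returns 1 if the precision-control field was set,
 * 0 if this toolchain does not provide _control87 for x87. */
int set_x87_precision(fpu_precision p)
{
    assert(p == PREC_24 || p == PREC_53 || p == PREC_64);
#if defined(_MSC_VER) && defined(_M_IX86)
    unsigned int bits;
    switch (p) {
    case PREC_24: bits = _PC_24; break;   /* 0x00020000 */
    case PREC_53: bits = _PC_53; break;   /* 0x00010000 */
    default:      bits = _PC_64; break;   /* 0x00000000 */
    }
    _control87(bits, _MCW_PC);            /* mask 0x00030000 */
    return 1;
#else
    (void)p;                              /* nothing to set on this toolchain */
    return 0;
#endif
}
```

Calling `set_x87_precision(PREC_53)` once at program start would correspond to the second `_control87` line in the list. Code compiled to use SSE/SSE2 arithmetic rather than the x87 stack would not be affected by this setting.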
Apparently, cbrt() and cbrtf() are already supported in the libraries for the Linux C compilers, as well as in newlib for gcc on Windows x87.
http://www.intel.com/software/products/opensource/whats_new.htm
Can someone educate me on what SSE/SSE2 is?
Mike D.
I am very grateful for all the useful comments on my questions. I would like to wish you a happy new year!
:-)
(Note: this message was also posted to the Intel C++ forum!)
Thanks a lot for the help. I have also received some comments directly from Intel regarding the cbrt function in C++. Here is the FORTRAN interface to that function. This interface works for any calling convention: STDCALL, CDECL and FASTCALL. We have found that this function is at least four times faster than writing X**0.3333333333333333 in FORTRAN.
Question: Would this interface result in an inlined cbrt in FORTRAN? Or is it possible to write an interface which is more efficient? The interface should be invariant with calling convention. We are using Intel Visual Fortran 8.0.
      DOUBLE PRECISION FUNCTION CBRTC(X)
      DOUBLE PRECISION X
      INTERFACE
        DOUBLE PRECISION FUNCTION CBRT(Y)
        DOUBLE PRECISION Y
!DEC$ ATTRIBUTES C, ALIAS:'_cbrt' :: CBRT
        END FUNCTION CBRT
      END INTERFACE
      CBRTC = CBRT(%VAL(X))
      END