Those math functions depend on parallelization for full efficiency: either the -xN vectorization for Xeon, or software pipelining (SWP) for Itanium. The considerations for inlining and optimizing them are much the same in C and Fortran.
Thanks for the very interesting information.
"but you are likely to need to specify explicitly the float or double versions of the functions"
By inspecting the number of clock cycles spent in math functions (using MS C++ 7.1), and by inspecting the assembly dump, it seems to me that:
1) the number of clock cycles is independent of whether the variable is a double or a float, and
2) the assembly call to the math function is not changed if the variable is changed from a double to a float.
Do the Intel version 8.0 compilers have math functions that are more efficient for single than for double precision?
"C, by design, doesn't have the potential of Fortran to optimize exponentiation operations like Fortran a**b."
I have noticed that rewriting a**0.25 to sqrt(sqrt(a)) had a dramatic effect on the computational time.
How does Fortran optimize exponents? Is it possible to calculate the N'th root a**(1/N) as efficiently as sqrt(a)?
I am most grateful for the truly useful recommendations!
:-)
I would recommend that Intel implement the cube root in the next generation of Pentium 5 processors!
:-)
I will try the CubeRoot function in C and also check whether we can gain a little by writing it directly in inline assembly. This routine has 24 bits of accuracy. Would it be easy to extend it to 53 and 64 bits of accuracy as well, by adding additional lines like this:
r = (double)(2.0/3.0) * r + (double)(1.0/3.0) * x / (r * r); /* 12 bits of precision */
r = (double)(2.0/3.0) * r + (double)(1.0/3.0) * x / (r * r); /* 24 bits of precision */
r = (double)(2.0/3.0) * r + (double)(1.0/3.0) * x / (r * r); /* 36 bits of precision? */
r = (double)(2.0/3.0) * r + (double)(1.0/3.0) * x / (r * r); /* 48 bits of precision? */
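As a rough sketch of the idea: the line above is the Newton step for r^3 = x, and Newton's method normally converges quadratically, so the number of correct bits roughly doubles per step (about 12, 24, 48, ...) until rounding in the working precision takes over. The function below is not the CubeRoot routine from the thread (which is not shown here); the name, the starting guess via frexp/ldexp, and the iteration-count parameter are all assumptions made for illustration.

```c
#include <assert.h>
#include <math.h>

/* Cube root of a positive x by repeated Newton steps.
   Hypothetical helper: the crude power-of-two starting guess is an
   assumption of this sketch, not part of the original routine. */
double cube_root_newton(double x, int iterations)
{
    int e;
    double r;
    int i;

    assert(x > 0.0 && iterations >= 0);

    (void)frexp(x, &e);      /* x = m * 2^e with 0.5 <= m < 1 */
    r = ldexp(1.0, e / 3);   /* starting guess 2^(e/3), within about 2x */

    for (i = 0; i < iterations; ++i) {
        /* Same update as the posted line: r <- (2r + x/r^2) / 3 */
        r = (2.0 / 3.0) * r + (1.0 / 3.0) * x / (r * r);
    }
    return r;
}
```

With this coarse starting guess, around eight steps should be plenty for double precision; a better starting guess, such as a table or polynomial fit, would cut that down.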
Please forgive me if the following question seems a little strange, but I am only a beginner in this field, and I am thus still a little confused about precision. Aren't all the following FPU instructions always calculated with the 10-byte floating point representation?
- FADD: Addition
- FMUL: Multiplication
- FDIV: Division
- FDIVR: Reverse division
- FSIN: Sine (uses radians)
- FCOS: Cosine (uses radians)
- FSQRT: Square Root
- FSUB: Subtraction
- FABS: Absolute Value
If we write, for example, A = COS(B) in FORTRAN 77, wouldn't that calculation always be carried out with the 10-byte representation internally in the FPU, whether the variable is single or double precision? (I know that the extended precision of the ST(0)..ST(7) registers will be lost when the variable is written to RAM.)
As it is not so easy to determine whether single or double precision is required to have good enough accuracy in the implementation of our mathematical model, it would be nice to be able to switch between single, double and extended precision as easily as possible. Would the best procedure be to implement the model in double precision, but switch back and forth between single, double and extended precision using:
- _control87(0x00020000, 0x00030000) (24 bits)
- _control87(0x00010000, 0x00030000) (53 bits)
- _control87(0x00000000, 0x00030000) (64 bits)
The _control87 routine may be called once in the beginning of the program to set the accuracy of the FPU. Will this affect the accuracy and efficiency of all the FPU instructions listed above?
Best regards from Lars Petter Endresen
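A different way to get the same kind of switch, sketched here only as an alternative to the _control87 calls above, is to select the working type at compile time. Everything named below (REAL_PRECISION, real_t, REAL_MANT_DIG, sum_of_squares) is hypothetical and chosen for this example; note that this changes the storage type as well, whereas the x87 precision-control setting only affects how results are rounded inside the FPU.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical compile-time switch: define REAL_PRECISION as
   1 (float), 2 (double, the default) or 3 (long double). */
#ifndef REAL_PRECISION
#define REAL_PRECISION 2
#endif

#if REAL_PRECISION == 1
typedef float real_t;
#define REAL_MANT_DIG FLT_MANT_DIG
#elif REAL_PRECISION == 2
typedef double real_t;
#define REAL_MANT_DIG DBL_MANT_DIG
#else
typedef long double real_t;
#define REAL_MANT_DIG LDBL_MANT_DIG
#endif

/* Stand-in for a model kernel written once against real_t. */
real_t sum_of_squares(const real_t *v, int n)
{
    real_t s = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; ++i) {
        s += v[i] * v[i];
    }
    return s;
}
```

Rebuilding the same source with a different REAL_PRECISION then lets the results be compared across precisions without touching the model code.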
Apparently, cbrt() and cbrtf() are already supported in the libraries for the Linux C compilers, as well as in newlib for gcc on Windows x87.
http://www.intel.com/software/products/opensource/whats_new.htm
(Note: this message was also posted to the Intel C++ forum!)
Thanks a lot for the help. I have also received some comments directly from Intel regarding the cbrt function in C++. Here is the FORTRAN interface to that function. This interface works for any calling convention: STDCALL, CDECL and FASTCALL. We have found that this function is at least four times faster than writing X**0.3333333333333333 in FORTRAN.
Question: Would this interface result in an inlined cbrt in FORTRAN? Or is it possible to write an interface that is more efficient? The interface should be invariant with calling convention. We are using Intel Visual Fortran 8.0.
      DOUBLE PRECISION FUNCTION CBRTC(X)
      DOUBLE PRECISION X
      INTERFACE
        DOUBLE PRECISION FUNCTION CBRT(Y)
        DOUBLE PRECISION Y
!DEC$ ATTRIBUTES C, ALIAS:'_cbrt' :: CBRT
        END FUNCTION CBRT
      END INTERFACE
      CBRTC = CBRT(%VAL(X))
      END
