
Question about __fabs

gisli
My application contains a sparse linear system solver which relies
heavily on the fabs() function. One of the biggest surprises I
have gotten from VTune on Linux is how significant a fraction of
my runtime is spent in __fabs. In a recent profile run I found
that 1 billion calls were made to this function with a Self Time
of 126 million. This is about as much time as is being spent
doing other things in the linear solve, which I find outrageous.

I should point out that I have profiled this code on many other
platforms, e.g. SGI, and __fabs has never popped on my radar.
My only recollection is that a colleague once told me that he
had seen fabs show up on an NT, but I don't remember how bad it
was.

I am using GCC 3.2 and my app is C++.

Can anyone help me understand this? Why is __fabs even showing
up as a function call? Would it be unreasonable to assume that
this could be in-lined? If my NT recollection is correct, then
this raises the question of whether this is an x86 thing. Do MIPS
chips have some kind of fabs hardware which Intel lacks?

Thanks for any tips.

Gisli
3 Replies
TimP
If you read the gcc documentation (info gcc), you will see that fabs() is included among the built-in functions, and there are ways to enable or disable recognition of built-ins. __builtin_fabs is implemented both for x87 code and for SSE2 (gcc -march=pentium4 -mfpmath=sse). 'info gcc' doesn't cover the likely possibility that your include files are broken.
The usual glibc setup pulls in a file of in-line definitions under certain circumstances, depending among other things on the level of optimization. The usual location of this file is /usr/include/bits/mathinline.h. In there, you will find macros for fabs(). If invoked, those could override the recognition of built-ins. Many Linux distros carry incorrect versions, presumably for reasons of historical continuity.
Among the reasons for not in-lining fabs(), which you will see alluded to in the documentation, are that C90 was the first to treat fabs as a reserved identifier, and the (slight, in this case) possibility that a programmer might attempt checking.
So, in a long-winded way, I've given the usual advice: if all else fails, examine your pre-processed code and consult the documentation.
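
As a rough illustration of the built-in versus library distinction described above, here is a minimal sketch, assuming GCC or a compatible compiler that provides __builtin_fabs. The helper names (max_abs_builtin, max_abs_libm, check_agreement) are hypothetical and not taken from the original application. One function calls the built-in directly, the other goes through the header, so the two can be compared in the compiler's assembly output (for example with g++ -O2 -S) or in the pre-processed source (g++ -E).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: largest magnitude in a row, asking the compiler
// for the built-in explicitly (GCC-specific spelling, an assumption here).
double max_abs_builtin(const double* v, int n) {
    double m = 0.0;
    for (int i = 0; i < n; ++i) {
        double a = __builtin_fabs(v[i]);  // expected to expand in-line
        if (a > m) m = a;
    }
    return m;
}

// Same loop through the header. Whether this becomes an in-line
// instruction or a call depends on the headers and compiler flags in use.
double max_abs_libm(const double* v, int n) {
    double m = 0.0;
    for (int i = 0; i < n; ++i) {
        double a = std::fabs(v[i]);
        if (a > m) m = a;
    }
    return m;
}

// Both variants must return the same value; only the generated code differs.
void check_agreement() {
    const double row[] = {0.5, -3.25, 2.0, -0.0};
    assert(max_abs_builtin(row, 4) == 3.25);
    assert(max_abs_libm(row, 4) == 3.25);
}
```

If the assembly for the second function contains a call to fabs while the first does not, the headers or flags are likely preventing the built-in from being recognized.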
gisli
Hi. Thanks for the response.

Please help me understand some of the terminology. For maximum performance, is "built-in" a desirable thing? If I see __fabs in VTune, does this mean that I have *not* gotten the more desirable __builtin_fabs? I am compiling with -march=pentium4, on RedHat 9.

I am compiling with -ansi. Is that what is biting me?

In math.h I see two interesting defines: __NO_MATH_INLINES and __USE_EXTERN_INLINES. Could you give me an idea about the difference between math-inlines and extern-inlines? What is going to give me maximum performance? I am not concerned about errno, because according to the man page and common sense, no errors can occur in fabs.

Thanks in advance.

Gisli
gisli
I have discovered that if I replace calls to fabs() with
calls to std::abs() then abs vanishes as a symbol in my
libraries, suggesting a successful inlining of these calls.
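
For anyone trying the same change, here is a minimal sketch of what the replacement might look like. The function name one_norm and the self_check helper are hypothetical, not from my solver. It assumes <cmath> is included so that the double overload of std::abs is the one selected.

```cpp
#include <cassert>
#include <cmath>    // provides the floating-point overloads of std::abs
#include <cstddef>

// Hypothetical helper: sum of magnitudes of a vector (the 1-norm).
double one_norm(const double* v, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        s += std::abs(v[i]);  // was: fabs(v[i])
    }
    return s;
}

// Small sanity check with values that are exact in binary floating point.
void self_check() {
    const double v[] = {1.5, -2.5, 3.0};
    assert(one_norm(v, 3) == 7.0);
}
```

Whether the call is actually in-lined still depends on the compiler and flags, so it is worth confirming with nm or the profiler, as I did.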

A noticeable performance improvement has followed.

Thanks for the help.

Gisli