Support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

Question about __fabs

My application contains a sparse linear system solver which relies
heavily on the fabs() function. One of the biggest surprises I
have gotten from VTune on Linux is how significant a fraction of
my runtime is spent in __fabs. In a recent profile run I found
that 1 billion calls were made to this function with a Self Time
of 126 million. This is about as much time as is being spent
doing other things in the linear solve, which I find outrageous.

I should point out that I have profiled this code on many other
platforms, e.g. SGI, and __fabs has never popped up on my radar.
My only recollection is that a colleague once told me that he
had seen fabs show up on an NT machine, but I don't remember how bad it was.

I am using GCC 3.2 and my app is C++.

Can anyone help me understand this? Why is __fabs even showing
up as a function call? Would it be unreasonable to assume that
this could be inlined? If my NT recollection is correct, then
this raises the question of whether this is an x86 thing. Do MIPS
chips have some kind of fabs hardware which Intel lacks?
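For context on the hardware question: x86 does not lack fabs support. On IEEE 754 doubles, |x| is simply x with the sign bit cleared; the x87 FPU has a dedicated FABS instruction, and SSE2 code does the same job with one AND against a sign mask. So the cost showing up in the profile is most likely the out-of-line call, not the operation itself. The sketch below does the same bit manipulation in portable C++, assuming a 64-bit IEEE 754 double; fabs_by_mask and check_fabs_by_mask are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumption: double is a 64-bit IEEE 754 binary64 type (true on x86 and MIPS).
// |x| is x with its sign bit (bit 63) cleared; SSE2 code does this with one
// AND against a mask, and the x87 FPU has a FABS instruction for it.
double fabs_by_mask(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to view the bits
    bits &= ~(std::uint64_t{1} << 63);    // clear the sign bit only
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}

// Compares the mask version with the library fabs on a few edge cases.
void check_fabs_by_mask() {
    const double samples[] = {0.0, -0.0, 1.5, -1.5, -1e-308, -INFINITY};
    for (double v : samples) {
        assert(fabs_by_mask(v) == std::fabs(v));
        assert(!std::signbit(fabs_by_mask(v)));
    }
}
```

This is only meant to show how little work fabs involves; in real code the compiler's own inlined fabs is the better choice.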

Thanks for any tips.

If you read the gcc documentation (info gcc), you will see that fabs() is included among the built-in functions, and there are ways to enable or disable recognition of built-ins. __builtin_fabs is implemented both for x87 code and for SSE2 (gcc -march=pentium4 -mfpmath=sse). 'info gcc' doesn't cover the likely possibility that your include files are broken.
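A sketch of the built-in in use, assuming GCC or Clang: when optimizing on x86, GCC typically expands __builtin_fabs into a single inline instruction (FABS on x87, an AND with a sign mask under -mfpmath=sse) rather than a call. -fno-builtin, or -fno-builtin-fabs for this one function, stops GCC from treating a plain fabs() call as the built-in. The pivot_row function is a hypothetical stand-in for a solver's inner loop, not code from the original application.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hot loop: partial-pivoting search for the row with the
// largest magnitude in a column. __builtin_fabs is a GCC extension (Clang
// accepts it too); GCC normally expands it in place instead of emitting a call.
std::size_t pivot_row(const std::vector<double>& column) {
    assert(!column.empty());
    std::size_t best = 0;
    double best_mag = __builtin_fabs(column[0]);
    for (std::size_t i = 1; i < column.size(); ++i) {
        const double mag = __builtin_fabs(column[i]);
        if (mag > best_mag) {  // strict > keeps the first row on ties
            best = i;
            best_mag = mag;
        }
    }
    return best;
}
```

Disassembling such a function (for example with objdump -d) is a quick way to confirm that no call to fabs or __fabs remains.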
The usual glibc setup invokes a file of in-line definitions under certain circumstances, depending among other things on the optimization level. The usual location of this file is /usr/include/bits/mathinline.h. In there, you will find macros for fabs(). If invoked, those could override the recognition of built-ins. Many Linux distros carry incorrect versions, presumably for reasons of historical continuity.
Among the reasons for not inlining fabs(), which you will see alluded to in the documentation, are that C90 was the first standard to treat fabs as a reserved identifier, and the (slight, in this case) possibility that a programmer might attempt error checking.
So, in a long-winded way, I've given the usual advice: if all else fails, examine your preprocessed code and consult the documentation.

Hi. Thanks for the response.

Please help me understand some of the terminology. For maximum performance, is "built-in" a desirable thing? If I see __fabs in VTune, does this mean that I have *not* gotten the more desirable __builtin_fabs? I am compiling with -march=pentium4, on Red Hat 9.

I am compiling with -ansi. Is that what is biting me?

In math.h I see two interesting defines: __NO_MATH_INLINES and __USE_EXTERN_INLINES. Could you give me an idea about the difference between math-inlines and extern-inlines? What is going to give me maximum performance? I am not concerned about errno, because according to the man page and common sense, no errors can occur in fabs.
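One way to see which configuration a given compile actually gets, short of reading the full preprocessed output, is a small probe compiled with exactly the same flags as the application. This is a hypothetical helper, not part of glibc or GCC: __STRICT_ANSI__ and __OPTIMIZE__ are predefined by GCC (the former under -ansi, the latter at -O1 and above), while __USE_EXTERN_INLINES and __NO_MATH_INLINES are glibc-internal macros whose meaning may differ between glibc versions, so the output is a hint only.

```cpp
#include <cassert>
#include <cmath>   // on Linux this pulls in glibc's <features.h>
#include <string>

// Hypothetical probe: lists which of the macros discussed in this thread
// are defined in this translation unit. Compile it with the application's
// own flags (e.g. -ansi, -O2, -march=pentium4) for the result to mean anything.
std::string math_inline_config() {
    std::string s;
#ifdef __STRICT_ANSI__
    s += "__STRICT_ANSI__ ";
#endif
#ifdef __OPTIMIZE__
    s += "__OPTIMIZE__ ";
#endif
#ifdef __USE_EXTERN_INLINES
    s += "__USE_EXTERN_INLINES ";
#endif
#ifdef __NO_MATH_INLINES
    s += "__NO_MATH_INLINES ";
#endif
    return s.empty() ? "none" : s;
}
```

Printing the returned string from a throwaway main() is enough; comparing runs with and without -ansi shows directly whether that flag changes anything.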

Thanks in advance.

I have discovered that if I replace calls to fabs() with
calls to std::abs(), then abs vanishes as a symbol in my
libraries, suggesting that these calls are now being inlined.
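The change described above, sketched on a hypothetical max-norm helper of the kind a solver might use in its convergence test. std::abs(double) is declared in <cmath>; libstdc++ defines that overload inline in terms of __builtin_fabs, which would be consistent with the symbol disappearing, although the exact definition depends on the library version.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest |a[i] - b[i]|, e.g. for a convergence check.
// std::abs(double) from <cmath> replaces the original fabs() call; libstdc++
// defines it inline, so the compiler can expand it without an external call.
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        m = std::max(m, std::abs(a[i] - b[i]));
    return m;
}
```

Because the overload takes and returns double, this is a drop-in replacement for fabs() in C++ code, provided <cmath> is included and the call is qualified as std::abs.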

A noticeable performance improvement followed.

Thanks for the help.