Bernard

Black Belt

06-30-2012
11:13 PM

323 Views

Hardware acceleration of Special Functions.

I would like to ask the Intel employees on this forum: why have Intel CPU architects never implemented in hardware some of the more "popular" special functions, such as gamma, beta, and the various Bessel functions of integer order? All of these functions could have been exposed as x87 ISA instructions.

70 Replies

Bernard

Black Belt

07-02-2012
10:43 PM

213 Views

Is anyone interested in this question?

sirrida

Beginner

07-03-2012
02:52 AM

213 Views

To be honest: How often did you need such functions?

Bernard

Black Belt

07-03-2012
03:17 AM

213 Views

> How often did you need such functions?

I liked your answer :) I know that the trigonometric functions (fsin, fcos) are more useful than the special functions I mentioned, but there are various applications that could benefit from a hardware implementation of such functions.

For example, Bessel functions are used in signal processing and in wave propagation.

The gamma function is used in statistics, for example in the gamma distribution.

These functions can be approximated by a polynomial fit with precalculated coefficients, which is straightforward to implement with SSE when the required precision is below that of the 80-bit x87 format. I suppose that the CPU designers, being aware of such functions and of the possibility of approximating them accurately in software, simply decided not to implement them in hardware.
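To illustrate the idea of a polynomial with precalculated coefficients, here is a minimal sketch in Python. It is not the fit discussed in this thread. It assumes the truncated power series of the Bessel function J0 rather than a fitted or minimax polynomial, and it restricts the argument to |x| <= 2 so that 12 terms are enough. The names `J0_COEFFS`, `horner` and `bessel_j0_small` are made up for this example. An SSE version would run the same multiply-add loop on several values at once.

```python
import math

# Precalculated coefficients of the J0 power series, written as a polynomial
# in t = x*x:  J0(x) = sum_k (-1)^k / (4^k * (k!)^2) * t^k
# Assumption: a truncated series stands in for a properly fitted polynomial.
J0_COEFFS = [(-1.0) ** k / (4.0 ** k * math.factorial(k) ** 2) for k in range(12)]


def horner(coeffs, t):
    """Evaluate coeffs[0] + coeffs[1]*t + ... using Horner's scheme."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c  # one multiply-add per coefficient
    return acc


def bessel_j0_small(x):
    """Approximate J0(x) for |x| <= 2 (hypothetical helper for illustration)."""
    if abs(x) > 2.0:
        raise ValueError("this sketch only covers |x| <= 2")
    return horner(J0_COEFFS, x * x)
```

Outside the small interval a real implementation would switch to other coefficient sets or to an asymptotic form; that part is deliberately left out here.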

bronxzv

New Contributor II

07-03-2012
04:18 AM

213 Views

Bernard

Black Belt

07-03-2012
05:09 AM

213 Views

As our tests have shown, a highly optimized SSE-based sine() function is almost as fast as x87 fsin. But the comparison was made to fsin, which probably implements range reduction in hardware.

I think that Intel could have implemented transcendental functions in microcode with the help of SSE. I mean an SSE instruction, implemented in microcode, that takes single-precision or double-precision values as input and returns the sine of those values.

bronxzv

New Contributor II

07-03-2012
05:43 AM

213 Views

> But the comparison was made to fsin, which probably implements range reduction in hardware

obviously any complete implementation will include a proper range reduction; that's the case for the x87 instructions and for high-performance vectorized implementations such as the MKL Vector Mathematical Functions Library [1]: vsCos, vsSin, vsSinCos, vsAcos, etc.

[1] http://software.intel.com/sites/products/documentation/hpc/mkl/vml/vmldata.htm
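For readers unfamiliar with range reduction, the following Python sketch shows one common form of it. It is not how fsin or MKL actually do it, which this thread does not describe. It assumes a two-constant (Cody-Waite style) split of pi/2, using the constants that fdlibm calls pio2_1 and pio2_1t, and plain Taylor polynomials on the reduced interval [-pi/4, pi/4]. The names `reduce_arg` and `sin_reduced` are hypothetical, and the sketch is only meant for moderate finite arguments.

```python
import math

# pi/2 split into a high part with trailing zero bits and a low correction
# (values as used in fdlibm; assumed adequate only for moderate |x|).
PIO2_HI = 1.57079632673412561417e+00
PIO2_LO = 6.07710050650619224932e-11

# Taylor coefficients in t = r*r; enough terms for |r| <= pi/4 in double.
SIN_C = [(-1.0) ** k / math.factorial(2 * k + 1) for k in range(8)]
COS_C = [(-1.0) ** k / math.factorial(2 * k) for k in range(9)]


def _poly(coeffs, t):
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def reduce_arg(x):
    """Return (quadrant, r) with x ~= k*pi/2 + r, quadrant = k mod 4, |r| <= ~pi/4."""
    k = round(x * (2.0 / math.pi))
    r = (x - k * PIO2_HI) - k * PIO2_LO  # subtract pi/2 in two pieces
    return k & 3, r


def sin_reduced(x):
    """sin(x) via range reduction plus a polynomial kernel (illustrative only)."""
    if not math.isfinite(x) or abs(x) > 1.0e6:
        raise ValueError("sketch handles only finite |x| <= 1e6")
    q, r = reduce_arg(x)
    t = r * r
    s = r * _poly(SIN_C, t)  # sin(r)
    c = _poly(COS_C, t)      # cos(r)
    return (s, c, -s, -c)[q]
```

Very large arguments need a more elaborate reduction (for example Payne-Hanek), which is part of why a "complete" implementation costs more than the polynomial alone.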

sirrida

Beginner

07-03-2012
05:53 AM

213 Views

This is exactly the reason not to waste silicon, especially when such a function is needed only very seldom.

> I think that Intel could have implemented in microcode...

Of course they could have - but why?

Bernard

Black Belt

07-03-2012
05:57 AM

213 Views

> obviously any complete implementation will include a proper range reduction, that's the case for the x87 instructions and high performance vectorized implementations such as MKL vsCos, vsSin, vsSincos, vsAcos,

Yes, that's true. Did you test the MKL transcendentals?

Take, for example, a function like gamma, which is not periodic, although it grows very fast. I think Intel could have implemented it in microcode as an SSE or AVX instruction. It could be even faster when coded as a minimax polynomial approximation (eliminating the dependency on exp and pow).

Bernard

Black Belt

07-03-2012
06:09 AM

213 Views

> Exactly this is the reason for not wasting silicon, especially when such a function is only very seldom needed

> Of course they could have - but why

I have to agree with you :) You are speaking from a practical point of view. If Intel were designing a custom DSP processor tailored to applications of Bessel functions, the engineers would probably be required to implement them in hardware. But in the case of Intel CPUs, where such exotic functions can be approximated efficiently with simpler SSE/AVX instructions, they did not spend silicon on this.

bronxzv

New Contributor II

07-03-2012
06:11 AM

213 Views

> Did you test MKL transcendentals?

no, I didn't, but you can find very detailed performance data here:

http://software.intel.com/sites/products/documentation/hpc/mkl/vml/functions/_performanceall.html

as stated by sirrida, I don't think it would be a good idea to spend chip area and/or validation budget (read: potential delays) on such specialized things in hardware. The forthcoming FMA in Haswell will probably provide a strong boost to all polynomial/rational approximations, and gather instructions will help table-based methods. It's the right way forward IMHO: powerful new general-purpose instructions that let us speed up a lot of special cases

also, the more functions you add, the more you open the door to someone asking for yet another one. With a hardware-based solution you are talking about a 3+ year turnaround for just one new function; with software it's more like 3 weeks

sirrida

Beginner

07-03-2012
06:40 AM

213 Views

Talking of validation, we don't need another fdiv bug...

Bernard

Black Belt

07-03-2012
06:43 AM

213 Views

> no, I don't, but you can find very detailed performance data here:
> http://software.intel.com/sites/products/documentation/hpc/mkl/vml/functions/_performanceall.html

Thank you very much for posting this link. You made my day :)

As I have now seen, the VML gamma function is also slow, at 123 clocks per value. I suppose that they did not eliminate the dependency on library calls.

bronxzv

New Contributor II

07-03-2012
07:36 AM

213 Views

> Talking of validation, we don't need another fdiv bug...

indeed, I was actually thinking of it when mentioning validation. It would be really bad to delay or recall new CPUs because of a hard-to-catch microcode bug in an instruction used by 0.0001 % of the code base

Bernard

Black Belt

07-03-2012
08:14 AM

213 Views

There are many .lib files

bronxzv

New Contributor II

07-03-2012
08:55 AM

213 Views

"

3. LICENSE RESTRICTIONS:

[...]

B. You may NOT: [...] (v) reverse engineer, decompile, or disassemble the Materials;

"

Bernard

Black Belt

07-03-2012
10:25 AM

213 Views

> reverse engineer, decompile, or disassemble the Materials

Sorry, I did not know this.

Bernard

Black Belt

07-03-2012
10:51 AM

213 Views

The MKL tgamma result for 1000 randomly chosen double values is 123 cycles, very close to my own results. I wonder which approximation they used.

They were also able to achieve 0.5 ulp accuracy even on the problematic range [0.0001, 1.0]. Maybe they used the Lanczos approximation?
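For reference, this is roughly what a Lanczos approximation of gamma looks like in Python. Which method MKL uses is not known from this thread, and this sketch makes no claim of 0.5 ulp accuracy. It assumes the widely published coefficient set for g = 7 with 9 terms and handles real arguments only; the name `gamma_lanczos` is made up for the example. Note that it still depends on exp and pow, which is the dependency a pure polynomial approach would try to remove.

```python
import math

# Widely published Lanczos coefficients for g = 7, n = 9 (assumed here;
# not taken from MKL or from any post in this thread).
LANCZOS_G = 7.0
LANCZOS_P = [
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
]


def gamma_lanczos(x):
    """Approximate gamma(x) for real x (illustrative sketch, not production code)."""
    if x < 0.5:
        # Reflection formula: gamma(x) * gamma(1 - x) = pi / sin(pi * x)
        return math.pi / (math.sin(math.pi * x) * gamma_lanczos(1.0 - x))
    z = x - 1.0
    series = LANCZOS_P[0]
    for i in range(1, len(LANCZOS_P)):
        series += LANCZOS_P[i] / (z + i)
    t = z + LANCZOS_G + 0.5
    # sqrt(2*pi) * t^(z + 0.5) * e^(-t) * series
    return math.sqrt(2.0 * math.pi) * t ** (z + 0.5) * math.exp(-t) * series
```

The reflection branch is what covers small arguments such as the [0.0001, 1.0] range mentioned above; poles at zero and the negative integers are not handled specially in this sketch.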

bronxzv

New Contributor II

07-03-2012
12:34 PM

213 Views

the poster with the best knowledge of MKL here is TimP AFAIK

Bernard

Black Belt

07-04-2012
09:20 AM

213 Views

SergeyKostrov

Valued Contributor II

07-04-2012
04:26 PM

85 Views

Quoting iliyapolak

> in hardware some of the more "popular" Special Functions like 'GAMMA', 'BETA' and various 'BESSEL' functions of an integer order. All these functions could have been accessed by x87 ISA instructions.

I don't think that Intel will add such a set of instructions. These functions are "Special"; they are not "Fundamental".

Intel has made its position clear: "Use SSE or AVX to achieve the best possible performance..."

Usually, companies big or small need to balance:

What do markets demand?

What do competitors do?

What do some customers want or expect?

And what is going on now? There is a growing demand for more powerful and more energy-efficient CPUs to run bigger (!) versions of different "mobile" and "desktop" OSs.

Iliya, you mentioned a couple of times that some function calculates a result in ~120 clock cycles. Let's put that on the LEFT side of some "Magic Scale". Let's assume that some big company added hardware support for that special function to its CPU, and that it produces the same result in ~60 clock cycles. We put that on the RIGHT side of our "Magic Scale". But that is not everything: a cost, something like $500,000,000 USD, needs to be added on the RIGHT side as well. This is because the company needs to complete R&D, testing, verification, and various production-related tasks, and salaries must be paid during all that time.

Personally, I would be glad to see hardware-accelerated matrix multiplication for matrices with sizes up to 1,024x1,024.

Best regards,

Sergey
