Hardware acceleration of Special Functions

Bernard · ‎06-30-2012

Hi!
I would like to ask Intel's employees on this forum: why have Intel CPU architects never implemented in hardware some of the more "popular" special functions, such as gamma, beta, and the various Bessel functions of integer order? All of these functions could have been exposed as x87 ISA instructions.

Bernard · ‎07-04-2012

I don't think that Intel will add such a set of instructions. These functions are "Special" and they are not "Fundamental".
Intel clearly made a statement: "Use SSE or AVX to achieve the best possible performance."

Sergey
After reading your answer and the answers from the other posters, I have come to the same conclusion as you and the other people answering in this thread. Various special functions can be represented very accurately by Taylor series, which in turn can be implemented at the machine-code level as a sequence of additions and multiplications with pre-calculated coefficients. That removes the need for a microcode-level hardware implementation of special functions.
SSE/AVX instructions are perfect fundamental building blocks for such implementations.
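
To make the idea concrete, here is a minimal sketch in Python (standing in for the machine code) of the Maclaurin series of the Bessel function J0, evaluated with Horner's rule so that each term costs one multiplication and one addition. The names `j0_series_coeffs` and `j0` and the choice of 20 terms are my own assumptions for illustration. A plain power series like this is only reasonable for small |x|; a real library would switch to other approximations for large arguments.

```python
def j0_series_coeffs(n_terms=20):
    """Pre-calculate c_k = (-1)^k / (k!)^2, so that J0(x) = sum_k c_k * (x^2/4)^k."""
    coeffs = []
    c = 1.0
    for k in range(n_terms):
        coeffs.append(c)
        c = -c / ((k + 1) ** 2)  # c_{k+1} = -c_k / (k+1)^2
    return coeffs


# Computed once, like a constant table stored next to the routine.
J0_COEFFS = j0_series_coeffs()


def j0(x, coeffs=J0_COEFFS):
    """Evaluate the truncated series with Horner's rule (multiply, then add, per term)."""
    t = 0.25 * x * x
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc
```

The inner loop is exactly the multiply-add pattern that maps onto SSE/AVX arithmetic, and several arguments could be processed in parallel lanes with the same coefficient table.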

I think that it does make sense to implement, for example, Bessel functions in hardware, but it would have to be some kind of chip or controller used to control FM modulation/demodulation circuits.

SergeyKostrov · ‎07-05-2012

Hi Iliya,

Quoting iliyapolak

...I think that it does make sense to implement in hardware for example Bessel functions, but it must be some kind of chip or controller used to control modulation/demodulation FM circuits.

I won't be surprised if specialized CPUs for DSP have it already. I would personally put Intel CPUs for personal computers into the category "General Purpose CPUs".

Best regards,
Sergey

Bernard · ‎07-06-2012

I did a quick search on the web but was not able to find any DSP which implements some of the special functions in hardware. It is probably easier to use ISA instructions such as Intel x87 or SSE/AVX.

SergeyKostrov · ‎07-07-2012

Quoting iliyapolak

Did quick search on the web but was not able to find any DSP which implements in hardware some of the special functions...

Did you try to look at websites of these companies:

- AMD (I remember AMD had a great set of RISC microcontrollers, like Am29200, Am29205, etc.)
- Texas Instruments
- Motorola
- Marvell

You could also try emailing these companies.

Best regards,
Sergey

Bernard · ‎07-08-2012

AMD (I remember AMD had a great set of RISC microcontrollers, like Am29200, Am29205)

I have read the AMD manual and found the information about the microcontroller's ISA; it clearly stated that this controller does not even have an FPU.
I also checked TI's DSPs, but these too have only a basic floating-point and integer ISA.
Probably some specialized gear, such as FM modulators and Bessel filters, can use Bessel functions; the question is how they were implemented in such complex gear. I suppose that a general-purpose TI DSP does not have dedicated hardware-accelerated Bessel functions but implements a software approximation instead.

SergeyKostrov · ‎07-08-2012

Quoting iliyapolak

AMD (I remember AMD had a great set of RISC microcontrollers, like Am29200, Am29205)
I have read Amd manual and found that info regarding microcontroller's ISA and it was clearly written that this controller does not have even FPU...

You could call AMD to verify that the Am29200 and Am29205 RISC microcontrollers have floating-point single-precision (32-bit) and double-precision (64-bit) instructions (18 instructions in total).

There is a book, "29K Family" (AMD, User Manual, 1994), and it has lots of technical details about these RISC microcontrollers.

Bernard · ‎07-08-2012

You could call to AMD to verify that Am29200 & Am29205 RISC microcontrollers have Floating Point single-precision (32-bit) and double-precision (64-bit) instructions (18 instructions in total).

I was probably not reading the right manual.

Even if those microcontrollers have floating-point instructions, it is hard to believe that AMD engineers implemented some of the special functions in hardware.
It is easier to provide simple FP arithmetic instructions and use polynomial or rational approximations to calculate special function values.
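
As a sketch of the rational-approximation route, here is the gamma function computed with the Lanczos approximation. This is my own choice for illustration; nothing in the AMD material says their library used it. The coefficients are the commonly published set for g = 7 with 9 terms, and the name `gamma_lanczos` is made up here. Apart from the final power and exponential, which a runtime library would itself approximate, the work is a short sum of divisions and additions.

```python
import math

# Widely circulated Lanczos coefficients for g = 7, n = 9
# (an assumption for this sketch, not taken from any vendor library).
_G = 7.0
_LANCZOS = [
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
]


def gamma_lanczos(x):
    """Approximate gamma(x) for real x; poles at 0, -1, -2, ... are not handled."""
    if x < 0.5:
        # Reflection formula: gamma(x) * gamma(1 - x) = pi / sin(pi * x)
        return math.pi / (math.sin(math.pi * x) * gamma_lanczos(1.0 - x))
    x -= 1.0
    # Rational part: a constant plus a sum of simple fractions in x.
    a = _LANCZOS[0]
    for i in range(1, len(_LANCZOS)):
        a += _LANCZOS[i] / (x + i)
    t = x + _G + 0.5
    return math.sqrt(2.0 * math.pi) * t ** (x + 0.5) * math.exp(-t) * a
```

For example, `gamma_lanczos(5.0)` should come out very close to 4! = 24.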

SergeyKostrov · ‎07-09-2012

Quoting Sergey Kostrov

...Am29200 & Am29205 RISC microcontrollers have Floating Point single-precision (32-bit) and double-precision (64-bit) instructions (18 instructions in total)...

Single-precision instructions:

FADD
FSUB
FMUL
FDIV
FEQ
FGE
FGT

Double-precision instructions:

DADD
DSUB
FDMUL
DMUL
DDIV
DEQ
DGE
DGT

Other:

SQRT
CONVERT
CLASS

SergeyKostrov · ‎07-09-2012

Quoting iliyapolak

...it is hard to believe that Amd engineers implemented in hardware some of the special functions...

That is correct, and they provided a CRT library of standard functions.

bronxzv · ‎07-10-2012

Single-precision instructions:
FADD
FSUB
FMUL
FDIV
FEQ
FGE
FGT
Double-precision instructions:
DADD
DSUB
FDMUL
DMUL
DDIV
DEQ
DGE
DGT
Other:
SQRT
CONVERT
CLASS

yes, but all implemented as slow software emulation (software traps, much like x87 code on the 386 and 486SX), aren't they?

Bernard · ‎07-10-2012

yes but all implemented as slow software emulation (software traps much like x87 code on 386 and 486SX), isn't it?

IIRC the AMD microcontroller specification which I read clearly stated that floating-point instructions are executed by trap handlers.
Here is an excerpt from the official AMD document:

"Am29200 and Am29205 RISC Microcontrollers"

(copied from the pdf document):

The floating point instructions are not executed directly, but are emulated by trap handlers

bronxzv · ‎07-10-2012

yes, in other words no FPU

Bernard · ‎07-10-2012

yes, in other words no FPU

When you think about it, there is no FPU, so the basic floating-point arithmetic instructions have to be emulated in software. On top of this, the various more complicated approximations (sine, cosine, atan...) are implemented by a software library, which in turn calls into the floating-point trap handlers to execute primitive FP instructions like fadd and fmul.
That is not very efficient for real-time applications that rely heavily on FP instructions.
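
A rough way to picture the layering: the sketch below models each primitive FP instruction as one "trap" and counts how many a single polynomial sine evaluation needs. `TrapCounter` and `soft_sin` are hypothetical names of mine, the 5-term Taylor polynomial is only an assumption about what such a library might do, and the real cost of a trap on the Am29200 is not modelled at all.

```python
class TrapCounter:
    """Hypothetical stand-in for FP trap handlers: every primitive op is one trap."""

    def __init__(self):
        self.traps = 0

    def fadd(self, a, b):
        self.traps += 1
        return a + b

    def fmul(self, a, b):
        self.traps += 1
        return a * b


# Taylor coefficients of sin(x)/x in powers of x^2: 1, -1/3!, 1/5!, -1/7!, 1/9!
SIN_COEFFS = [1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0, 1.0 / 362880.0]


def soft_sin(x, fpu):
    """Small-argument sine built only from the emulated fadd/fmul primitives."""
    x2 = fpu.fmul(x, x)
    acc = SIN_COEFFS[-1]
    for c in reversed(SIN_COEFFS[:-1]):
        acc = fpu.fadd(fpu.fmul(acc, x2), c)  # one Horner step = 2 traps
    return fpu.fmul(acc, x)
```

With these five coefficients one call goes through 10 emulated instructions (1 + 4 x 2 + 1), and each of those would itself be a whole software routine.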

bronxzv · ‎07-10-2012

When you think about this, there is no FPU and basic arithmetic floating-point instructions have to be emulated by the software

in those ancient times even hardware support for integer multiplication wasn't always a given, so the multiplication of the mantissas was a critical part of your floating-point emulation routines
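
to give an idea of what that looks like, here is a simplified sketch of a single-precision multiply where the 24-bit mantissas are multiplied with shifts and adds only. The names `mul_shift_add` and `soft_fmul` are made up for this example, and the routine is deliberately incomplete: it assumes normal inputs, truncates instead of rounding, and ignores zeros, subnormals, infinities, NaNs and exponent overflow/underflow.

```python
import struct


def mul_shift_add(a, b):
    """Multiply two non-negative integers using only shifts and adds."""
    product = 0
    while b:
        if b & 1:
            product += a
        a <<= 1
        b >>= 1
    return product


def f32_bits(x):
    """Bit pattern of x as an IEEE 754 binary32 value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(u):
    """Float value of a binary32 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def soft_fmul(x, y):
    """Multiply two normal binary32 values (truncating, no special cases)."""
    a, b = f32_bits(x), f32_bits(y)
    sign = (a ^ b) & 0x80000000
    exp = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - 127  # remove one bias
    ma = (a & 0x7FFFFF) | 0x800000  # restore the implicit leading 1
    mb = (b & 0x7FFFFF) | 0x800000
    prod = mul_shift_add(ma, mb)    # leading 1 ends up at bit 46 or 47
    if prod & (1 << 47):            # mantissa product in [2, 4): renormalize
        prod >>= 1
        exp += 1
    mant = (prod >> 23) & 0x7FFFFF  # drop low bits and the leading 1
    return bits_f32(sign | (exp << 23) | mant)
```

the loop in `mul_shift_add` runs up to 24 times per multiply for single precision, which is why this step dominated the emulation cost on parts without a hardware multiplier.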

Bernard · ‎07-10-2012

in these ancient times

This reminds me of a book on computer graphics written by Foley (I do not remember the title) where the author describes some 2D video hardware engine.

bronxzv · ‎07-10-2012

This reminds me a book on computer graphics written by Foley (did not remember the title) where the author describes some 2D video hardware engine.

I still have Foley et al. Computer Graphics: Principles and Practice 2nd Edition, Addison Wesley 1990, there is an overview of the Silicon Graphics Power IRIS 4D/240GTX architecture which is 3D already

IIRC the most used chip for general purpose 3D in the early 90s was the Intel i860 "Cray on a chip", used on a lot of 3D accelerators, notably the #9 and Dupont Pixel offerings

the usage of the Am292xx series discussed here (like the Intel i960) for graphics was mostly for 2D raster applications like laser printers, scanners, etc.

SergeyKostrov · ‎07-10-2012

Quoting bronxzv

...yes but all implemented as slow software emulation (software traps much like x87 code on 386 and 486SX), isn't it?

I'll need to verify it.

Bernard · ‎07-10-2012

I have the second edition of the Foley book. This book is outdated but has a very good intro to computer graphics theory. The third edition is expected in 1/2013. The video adapter described in the second edition is mostly used to create 2D raster images on the screen.

bronxzv · ‎07-10-2012

The video adapter described in second edition is mostly used to create 2D raster images on the screen.

I was referring to chapter 18 "Advanced Raster Graphics Architecture", which is mostly about hardware architectures for 3D rendering based on a standard graphics pipeline; it looks like you have another chapter in mind, probably chapter 4 "Graphics Hardware"

Bernard · ‎07-10-2012

it looks like you have another chapter in mind, probably chapter 4 "Graphics Hardware"

Yes, I was referring to chapter 4, "Graphics Hardware".

The Intel i860 was very advanced in those days.
Here are a few of the benchmarks published in Foley's book:
13 MFLOPS of double-precision instructions. Today one processing core can exceed this speed by 1000x.
50,000 Gouraud-shaded 100-pixel triangles per second. What would the average rate be, measured in Gouraud-shaded 100-pixel triangles per second, when executed on an Intel Sandy Bridge CPU?

bronxzv · ‎07-10-2012

50,000 Gouraud-shaded 100-pixel triangles per second. What could be an average speed measured in Gouraud-shaded 100-pixel triangles when executed on Intel Sandy Bridge CPU?

on a quad-core Sandy Bridge it will be something like 30M-60M 100-sample polygons per second (i.e. 1000x more) with a dumb Z-buffer algorithm (CPU only, using all cores and fully vectorized AVX code)

here is an example with 49M polygons with per-sample normal interpolation and reflection mapping (more costly than Gouraud shading) that runs at 20+ fps (~1G polygons/second apparent) on a quad-core Sandy Bridge:
http://www.inartis.com/Company/Lab/KribiBenchmark/KB_Robots.aspx

it's possible thanks to scene graph traversal optimizations such as occlusion culling where most polygons are not actually drawn
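
for reference, the back-of-the-envelope arithmetic behind these figures can be sketched as follows. All inputs are the numbers quoted in this thread; the variable names are mine, and the "apparent" rate counts every polygon in the scene once per frame even though culling means most are never drawn.

```python
# Figures quoted in this thread (i860 numbers from Foley's book).
I860_TRIANGLES_PER_S = 50_000
PIXELS_PER_TRIANGLE = 100

# Pixel fill rate implied by the i860 benchmark.
i860_pixels_per_s = I860_TRIANGLES_PER_S * PIXELS_PER_TRIANGLE

# "1000x more" than the i860 figure, to compare with the 30M-60M estimate.
scaled_triangles_per_s = 1000 * I860_TRIANGLES_PER_S

# Robots scene: 49M polygons at 20 frames per second.
SCENE_POLYGONS = 49_000_000
FRAMES_PER_S = 20
apparent_polygons_per_s = SCENE_POLYGONS * FRAMES_PER_S

print(i860_pixels_per_s)        # 5000000
print(scaled_triangles_per_s)   # 50000000
print(apparent_polygons_per_s)  # 980000000
```

so 1000x the i860 rate is 50M triangles per second, inside the 30M-60M estimate, and 49M polygons at 20 fps is 980M per second, i.e. roughly 1G apparent.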