Intel® ISA Extensions

Hardware acceleration of Special Functions.

Bernard
Hi!
I would like to ask Intel's employees on this forum: why have Intel CPU architects never implemented in hardware some of the more "popular" Special Functions, like 'GAMMA', 'BETA' and the various 'BESSEL' functions of integer order? All of these functions could have been exposed as x87 ISA instructions.
70 Replies
Bernard

Quoting SergeyKostrov
I don't think that Intel will add such a set of instructions. These functions are "Special" and they are not "Fundamental".
Intel clearly made a statement: "Use SSE or AVX to achieve the best possible performance."

Sergey,
After reading your answer and the answers from the other posters, I have come to the same conclusion as you and the other people answering in this thread. Various special functions can be represented very accurately by Taylor series, which in turn can be implemented at the machine-code level as a sequence of additions and multiplications. With the coefficients pre-calculated, this removes the burden of a microcode-level hardware implementation of the special functions.
SSE/AVX instructions are perfect fundamental building blocks for such implementations.
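
To illustrate the idea of a Taylor series with pre-calculated coefficients, here is a minimal sketch in Python for the Bessel function J0. The function name `bessel_j0_taylor` and the term count `N_TERMS` are assumptions made for this example, not something from the thread, and a plain truncated series like this is only reasonable for small |x|. Each loop iteration is one multiplication followed by one addition, which is the kind of work that maps directly onto scalar or packed SSE/AVX arithmetic.

```python
import math

# Pre-calculated Taylor coefficients of J0 in powers of t = x*x:
#   J0(x) = sum_k (-1)^k / (4^k * (k!)^2) * t^k
N_TERMS = 20  # assumed term count; adequate only for small |x|
J0_COEFFS = [(-1.0) ** k / (4.0 ** k * math.factorial(k) ** 2)
             for k in range(N_TERMS)]


def bessel_j0_taylor(x):
    """Evaluate the truncated series with Horner's rule.

    Every step is a single multiply followed by a single add.
    """
    t = x * x
    acc = 0.0
    for c in reversed(J0_COEFFS):
        acc = acc * t + c
    return acc


if __name__ == "__main__":
    print(bessel_j0_taylor(1.0))
```

A production library would normally switch to a different approximation for large arguments, since the alternating series loses accuracy through cancellation as |x| grows.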

I think that it does make sense to implement Bessel functions, for example, in hardware, but it would have to be some kind of chip or controller used to control FM modulation/demodulation circuits.

SergeyKostrov
Hi Iliya,

Quoting iliyapolak
...I think that it does make sense to implement in hardware for example Bessel functions, but it must be some kind of chip or controller used to control modulation/demodulation FM circuits.

I won't be surprised if specialized CPUs for DSP have it already. I would personally put Intel CPUs for personal computers into the category "General Purpose CPUs".

Best regards,
Sergey


Bernard
I did a quick search on the web but was not able to find any DSP that implements some of the special functions in hardware. It is probably easier to use custom ISA instructions like Intel x87 or SSE/AVX.

SergeyKostrov
Quoting iliyapolak
Did quick search on the web but was not able to find any DSP which implements in hardware some of the special functions...

Did you try looking at the websites of these companies:

- AMD (I remember AMD had a great set of RISC microcontrollers, like Am29200, Am29205, etc.)
- Texas Instruments
- Motorola
- Marvell

You could also try emailing these companies.

Best regards,
Sergey

Bernard
Quoting SergeyKostrov
AMD (I remember AMD had a great set of RISC microcontrollers, like Am29200, Am29205

I have read the AMD manual and found the information on the microcontroller's ISA; it was clearly written that this controller does not even have an FPU.
I also checked TI's DSPs, but these too have only a basic floating-point and integer ISA.
Some special gear, such as FM modulators and Bessel filters, can probably make use of Bessel functions; the question is how that was implemented in such complex gear. I suppose that a general-purpose TI DSP does not have dedicated hardware-accelerated Bessel functions but implements a software approximation instead.

SergeyKostrov
Quoting iliyapolak

AMD (I remember AMD had a great set of RISC microcontrollers, like Am29200, Am29205

I have read Amd manual and found that info regarding microcontroller's ISA and it was clearly written that this controller does not have even FPU...

You could call AMD to verify that the Am29200 & Am29205 RISC microcontrollers have floating-point single-precision (32-bit) and double-precision (64-bit) instructions (18 instructions in total).

There is a book, "29K Family" (AMD, User Manual, 1994), and it has lots of technical details about these RISC microcontrollers.

Bernard
Quoting SergeyKostrov
You could call AMD to verify that the Am29200 & Am29205 RISC microcontrollers have floating-point single-precision (32-bit) and double-precision (64-bit) instructions (18 instructions in total).

I was probably not reading the right manual.

Even if those microcontrollers have floating-point instructions, it is hard to believe that AMD engineers implemented some of the special functions in hardware.
It is easier to provide simple FP arithmetic instructions and use a polynomial or rational approximation to calculate special function values.
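
As a hedged illustration of building a special function from ordinary arithmetic, here is a sketch of the gamma function using the Lanczos approximation. This is a technique swapped in for the example, not something named in the thread. The coefficient set (g = 7, nine terms) is a commonly published one, and the name `gamma_lanczos` is hypothetical. Note that the sketch still relies on `exp`, `sin`, `sqrt` and a power operation, which on a real target would themselves be library approximations built from adds and multiplies.

```python
import math

# Commonly published Lanczos coefficients for g = 7 (assumption: not from the thread).
G = 7.0
LANCZOS = [
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
]


def gamma_lanczos(x):
    """Approximate Gamma(x) for real x that is not zero or a negative integer."""
    if x < 0.5:
        # Reflection formula: Gamma(x) * Gamma(1 - x) = pi / sin(pi * x)
        return math.pi / (math.sin(math.pi * x) * gamma_lanczos(1.0 - x))
    x -= 1.0
    # Partial-fraction sum: one divide and one add per coefficient.
    s = LANCZOS[0]
    for i in range(1, len(LANCZOS)):
        s += LANCZOS[i] / (x + i)
    t = x + G + 0.5
    return math.sqrt(2.0 * math.pi) * t ** (x + 0.5) * math.exp(-t) * s


if __name__ == "__main__":
    print(gamma_lanczos(5.0), math.gamma(5.0))
```

The result can be compared against `math.gamma` from the Python standard library to check the approximation for a given argument.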

SergeyKostrov
...Am29200 & Am29205 RISC microcontrollers have floating-point single-precision (32-bit) and double-precision (64-bit) instructions (18 instructions in total)...

Single-precision instructions:

FADD
FSUB
FMUL
FDIV
FEQ
FGE
FGT

Double-precision instructions:

DADD
DSUB
FDMUL
DMUL
DDIV
DEQ
DGE
DGT

Other:

SQRT
CONVERT
CLASS


SergeyKostrov
Quoting iliyapolak
...it is hard to believe that Amd engineers implemented in hardware some of the special functions...

That is correct, and they provided a CRT library of standard functions.

bronxzv
Quoting SergeyKostrov
Single-precision instructions: FADD, FSUB, FMUL, FDIV, FEQ, FGE, FGT
Double-precision instructions: DADD, DSUB, FDMUL, DMUL, DDIV, DEQ, DGE, DGT
Other: SQRT, CONVERT, CLASS

yes, but all implemented as slow software emulation (software traps, much like x87 code on the 386 and 486SX), isn't it?

Bernard
Quoting bronxzv
yes, but all implemented as slow software emulation (software traps, much like x87 code on the 386 and 486SX), isn't it?

IIRC, the AMD microcontroller specification that I read clearly stated that floating-point instructions are executed by trap handlers.
Here is an excerpt from the official AMD document "Am29200 and Am29205 RISC Microcontrollers" (copied from the PDF document):

"The floating point instructions are not executed directly, but are emulated by trap handlers"

bronxzv
yes, in other words no FPU

Bernard
Quoting bronxzv
yes, in other words no FPU

When you think about it: there is no FPU, so the basic floating-point arithmetic instructions have to be emulated in software. On top of this, the more complicated approximations (sine, cosine, atan, ...) are implemented by a software library, which in turn calls into the floating-point trap handlers to execute primitive FP instructions like fadd and fmul.
That is not very efficient for real-time applications that make heavy use of FP instructions.

bronxzv
Quoting iliyapolak
When you think about this , there is no FPU and basic arithmetic floating-point instructions have to be emulated by the software

in these ancient times even hardware support for integer multiplication wasn't always a given, so the multiplication of the mantissas was a critical part of your floating-point emulation routines
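
A rough sketch of what such an emulation routine has to do, written in Python for readability: split each operand into sign, exponent and an integer mantissa, multiply the mantissas using only shifts and adds (standing in for a CPU without a hardware multiplier), then reassemble the result. The names `shift_add_mul` and `soft_fmul` are hypothetical, the 24-bit mantissa width mirrors single precision, and the sketch truncates instead of rounding and ignores subnormals, infinities, NaNs and the sign of zero. It is not the Am29200 trap handler code.

```python
import math

MANT_BITS = 24  # single-precision significand width, including the implicit leading 1


def shift_add_mul(a, b):
    """Multiply two non-negative integers using only shifts and adds."""
    result = 0
    while b:
        if b & 1:
            result += a
        a <<= 1
        b >>= 1
    return result


def soft_fmul(x, y):
    """Multiply two finite floats through their integer mantissas.

    Assumptions: truncation instead of rounding; no subnormal, infinity,
    NaN or signed-zero handling.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    mx, ex = math.frexp(abs(x))  # abs(x) == mx * 2**ex, with 0.5 <= mx < 1
    my, ey = math.frexp(abs(y))
    ix = int(mx * (1 << MANT_BITS))  # 24-bit integer mantissa, truncated
    iy = int(my * (1 << MANT_BITS))
    prod = shift_add_mul(ix, iy)     # up to 48 bits wide
    return sign * math.ldexp(prod, ex + ey - 2 * MANT_BITS)


if __name__ == "__main__":
    print(soft_fmul(1.5, 2.0), soft_fmul(math.pi, math.e))
```

The shift-and-add loop runs once per mantissa bit, which is why the mantissa product dominated the cost of a software floating-point multiply on such hardware.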

Bernard
Quoting bronxzv
in these ancient times

This reminds me of a book on computer graphics written by Foley (I do not remember the title), in which the author describes a 2D video hardware engine.

bronxzv
Quoting iliyapolak
This reminds me a book on computer graphics written by Foley (did not remember the title) where the author describes some 2D video hardware engine.

I still have Foley et al., Computer Graphics: Principles and Practice, 2nd Edition, Addison-Wesley, 1990; there is an overview of the Silicon Graphics Power IRIS 4D/240GTX architecture, which is 3D already

IIRC the most used chip for general-purpose 3D in the early 90s was the Intel i860 "Cray on a chip", used on a lot of 3D accelerators, notably the #9 and Dupont Pixel offerings

the usage of the Am292xx series discussed here (like the Intel i960) for graphics was mostly for 2D raster applications like laser printers, scanners, etc.

SergeyKostrov
Quoting bronxzv
...yes, but all implemented as slow software emulation (software traps, much like x87 code on the 386 and 486SX), isn't it?

I'll need to verify it.


Bernard
I have the second edition of the Foley book. The book is outdated but has a very good introduction to computer graphics theory. The third edition is expected in 1/2013. The video adapter described in the second edition is mostly used to create 2D raster images on the screen.

bronxzv
Quoting iliyapolak
The video adapter described in second edition is used to mostly to create 2D raster images on the screen.

I was referring to chapter 18, "Advanced Raster Graphics Architecture", which is mostly about hardware architectures for 3D rendering based on a standard graphics pipeline; it looks like you have another chapter in mind, probably chapter 4, "Graphics Hardware"

Bernard
Quoting bronxzv
it looks like you have another chapter in mind, probably chapter 4 "Graphics Hardware"

Yes, I was referring to chapter 4, "Graphics Hardware".

The Intel i860 was very advanced in those days.
Here are a few of the benchmarks published in Foley's book:
13 MFLOPS of double-precision instructions. Today a single processing core can exceed this speed by 1000x.
50,000 Gouraud-shaded 100-pixel triangles per second. What would the average rate be, measured in Gouraud-shaded 100-pixel triangles per second, on an Intel Sandy Bridge CPU?

bronxzv
Quoting iliyapolak
50,000 Gouraud-shaded 100-pixel triangles per second. What could be an average speed measured in Gouraud-shaded 100-pixel triangles when executed on Intel Sandy Bridge CPU?

on a quad-core Sandy Bridge it will be something like 30M-60M 100-sample polygons per second (i.e. 1000x more) with a dumb Z-buffer algorithm (CPU only, using all cores and fully vectorized AVX code)

here is an example with 49M polygons with per-sample normal interpolation and reflection mapping (more costly than Gouraud shading) that runs at 20+ fps (~1G polygons/second apparent) on a quad-core Sandy Bridge:
http://www.inartis.com/Company/Lab/KribiBenchmark/KB_Robots.aspx

it's possible thanks to scene graph traversal optimizations such as occlusion culling, where most polygons are not actually drawn
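
The arithmetic behind these figures can be checked in a few lines. The sketch below uses only numbers quoted in this thread; the variable names are made up for the example, and the 1000x factor is the rough estimate from the posts above, not a measurement.

```python
# Figures quoted in the thread (i860 number as cited from Foley et al.).
i860_triangles_per_second = 50_000
rough_speedup = 1_000  # rough factor from the thread, not a measurement

# Scaling the i860 figure by the rough factor.
estimated_rate = i860_triangles_per_second * rough_speedup

# Apparent rate of the linked benchmark scene: polygons in the scene times frames per second.
scene_polygons = 49_000_000
frames_per_second = 20
apparent_rate = scene_polygons * frames_per_second

if __name__ == "__main__":
    print(estimated_rate)  # 50000000, inside the quoted 30M-60M range
    print(apparent_rate)   # 980000000, i.e. roughly 1G polygons per second
```

The second figure is "apparent" because, with occlusion culling, most of those polygons are never rasterized.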