I would like to ask Intel's employees on this forum: why have Intel CPU architects never implemented in hardware some of the more "popular" special functions, such as the gamma and beta functions and the various Bessel functions of integer order? All of these functions could have been exposed through x87 ISA instructions.
>>on a quad-core Sandy Bridge it will be something like 30M-60M 100-sample polygons per second (i.e. 1000x more) with a dumb Z-buffer algorithm (CPU only, using all cores and fully vectorized AVX code)
It is simply amazing how much CPU processing power has increased over a period of 20 years.
>>here is an example with 49M polygons with per sample normal interpolation and reflection mapping
Looking at the Robots surface, which interpolation was used to calculate the samples?
Was it bicubic interpolation? It is costly, but it can give a significantly smoother surface colour transition.
>>Looking at the Robots surface, which interpolation was used to calculate the samples?
>>Was it bicubic interpolation?
normals are bilinearly interpolated in world space and the reflection map is bilinearly interpolated in texture space; bicubic interpolation is useful mostly for texture magnification but would be overkill here IMHO
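The bilinear interpolation mentioned in this reply can be sketched as follows. This is only an illustration, not the demo's code: the names `lerp`, `bilinear` and `normalize`, and the use of plain tuples as vectors, are assumptions made for the example.

```python
import math

def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors a and b."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def bilinear(c00, c10, c01, c11, u, v):
    """Blend four corner vectors with weights taken from (u, v) in [0, 1]^2."""
    bottom = lerp(c00, c10, u)   # along u at v = 0
    top = lerp(c01, c11, u)      # along u at v = 1
    return lerp(bottom, top, v)

def normalize(n):
    """Rescale to unit length; a blend of unit normals is generally shorter than 1."""
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)

# Hypothetical corner normals, sampled at the middle of the patch.
n = normalize(bilinear((0, 0, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1), 0.5, 0.5))
```

The same `bilinear` helper works for texels of a reflection map, where the four corners are neighbouring colours instead of normals.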
>>normals are bilinearly interpolated in world space and the reflection map is bilinearly interpolated in texture space, bicubic interpolation is useful mostly for texture magnification but will be overkill here IMHO
Still, the demo programmers were able to achieve smooth transitions. Maybe the use of bilinear interpolation is compensated for by the high level of detail and the high-frequency sampling?
>>...yes, in other words no FPU
Not so efficient for real-time applications that rely heavily on FP instructions.
The 29K family microcontrollers were designed for embedded systems such as laser printers, scanners and X terminals, and they have no FPU in order to reduce the cost of system integration. Although FP instructions on these microcontrollers cause "lightweight" traps, only ~3 clock cycles are needed to complete a vector fetch.
>>Still the Demo programmers were able to achieve smooth transitions. Maybe usage of bilinear interpolation is compensated by the high level of details and high frequency sampling?
yes, bilinear interpolation is fine for reflection maps since there is generally no magnification but very high frequency sampling in texture space instead (due to the wild variation of normal directions); the sampling scheme is thus paramount for good quality: adaptive stochastic antialiasing in this example when you don't move the mouse
>>...integration. Even if FP-instructions on these microcontrollers cause "lightweight" Traps only ~3 clock cycles are needed to complete a vector fetch.
what was a "vector fetch" on such an ancient, purely scalar chip?
btw, do you know how many cycles were required for emulating basic FP instructions like FADD and FMUL? FMUL was particularly slow due to the lack of an integer multiplier AFAIK
>>adaptive stochastic antialiasing in this example when you don't move the mouse
Adaptive stochastic antialiasing is very good at minimizing computational cost and memory bandwidth, but at the price of the irregular sampling pattern introduced by the random (stochastic) sampling.
What is the sampling filter used in the Robots demo?
Is it a simple box filter or a sinc filter?
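For readers unfamiliar with the technique discussed here, adaptive stochastic sampling can be sketched as below. This is not the demo's implementation: the `shade` scene, the sample counts and the variance threshold are all hypothetical values chosen for illustration.

```python
import random

def shade(x, y):
    """Hypothetical scene: a hard vertical edge at x = 0.5."""
    return 1.0 if x >= 0.5 else 0.0

def sample_pixel(px, py, rng, base=4, extra=28, threshold=0.01):
    """Average randomly placed samples inside pixel (px, py).

    A few samples are taken first; more are added only when they disagree,
    which is what makes the scheme adaptive.
    """
    def take(n):
        return [shade(px + rng.random(), py + rng.random()) for _ in range(n)]

    samples = take(base)
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    if variance > threshold:          # high-contrast pixel: refine
        samples += take(extra)
    # Box reconstruction: every sample inside the pixel has equal weight.
    return sum(samples) / len(samples), len(samples)

rng = random.Random(1)
flat_value, flat_count = sample_pixel(2, 0, rng)   # uniform area: 4 samples suffice
edge_value, edge_count = sample_pixel(0, 0, rng)   # pixel crossed by the edge
```

Flat pixels cost only the base samples, which is where the savings in computation and bandwidth come from; the random positions are the source of the irregular pattern mentioned above.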
>>29K family microcontrollers designed for embedded systems, like laser printers, scaners, X terminals and...
For floating-point-intensive applications the better option is to use Analog Devices SHARC processors.
But even these DSPs do not have special functions implemented directly in hardware.
I think we can conclude that no general-purpose DSP implements such functions in hardware; microprocessors use software libraries instead.
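As an illustration of the software-library route, here is a minimal sketch in Python. `math.gamma` and `math.lgamma` are standard library functions; the `beta` and `bessel_j` helpers are hypothetical names written for this example and are far less robust than a real numerical library.

```python
import math

def beta(a, b):
    """Beta function for a, b > 0 via log-gamma, which avoids overflow."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def bessel_j(n, x, terms=30):
    """Bessel function of the first kind, integer order n >= 0, by power series.

    Adequate for small |x| only; production libraries switch to recurrences
    and asymptotic expansions for larger arguments.
    """
    half = x / 2.0
    return sum(
        (-1) ** k / (math.factorial(k) * math.factorial(k + n)) * half ** (2 * k + n)
        for k in range(terms)
    )

gamma_5 = math.gamma(5)      # 4! = 24
beta_2_3 = beta(2, 3)        # 1! * 2! / 4! = 1/12
j0_at_1 = bessel_j(0, 1.0)
```

Each of these needs many multiplies, divides and range-dependent algorithm choices, which hints at why such functions are left to libraries rather than wired into an instruction set.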
>>What is the sampling filter used in the Robots demo?
>>Is this simple box filter or sinc filter?
the reconstruction filter is a box in this example; it's generally the best filter for low-resolution raster images, since other filters such as Gaussian and raised cosine lead to too much blurring, and 2-3 lobe Lanczos to too much ringing (note that we have these alternate reconstruction filters available with a user-selectable filter radius)
NB: sinc is a theoretical filter, not something you can use in practice with a real-world FIR filter kernel
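The filters named in this reply can be compared with a small sketch. The kernel definitions below are the textbook ones; the default radius, sigma and lobe count are assumptions, not the values used in the demo.

```python
import math

def box(x, radius=0.5):
    """Box kernel: constant weight inside the radius, zero outside."""
    return 1.0 if abs(x) <= radius else 0.0

def gaussian(x, sigma=0.5):
    """Gaussian kernel (unnormalized): always positive, so it can only blur."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def sinc(x):
    """Normalized sinc; its support is infinite, so no finite kernel holds it."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, lobes=2):
    """Sinc windowed by a wider sinc and cut off at |x| = lobes."""
    return sinc(x) * sinc(x / lobes) if abs(x) < lobes else 0.0

def reconstruct(samples, center, kernel):
    """Weighted average of (position, value) samples around a pixel center."""
    weights = [kernel(pos - center) for pos, _ in samples]
    return sum(w * v for w, (_, v) in zip(weights, samples)) / sum(weights)

# The Lanczos kernel goes negative in its side lobe, the Gaussian never does.
side_lobe = lanczos(1.5)
tail = gaussian(1.5)
```

The negative side lobe is what produces ringing next to sharp edges, while the strictly positive Gaussian tail mixes in neighbouring samples and blurs.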
>>Gaussian filter in random sampling could lead to better results than a box filter.
at low sampling frequency yes (though I'd prefer raised cosine or Lanczos 2 over Gaussian), but with adaptive sampling you have a high sampling frequency in high-frequency signal areas, thus the local radius of your reconstruction filter must be very small to avoid excessive blurring, and in that case you will miss the samples farthest from the pixel center within the pixel area
pixels on your screen are rectangular areas after all, so a box reconstruction filter makes more sense in practice (with supersampling) than some theoretical texts may let you think when reasoning about only discrete reconstruction samples
anyway this is user selectable, and, since you have to ask, the default look is probably not that bad
>>@bronxzv Slightly off-topic question. In one of your posts you mentioned that the Kribi project has 500 timers based on the rdtsc instruction. Is it possible to post the proper usage of the rdtsc instruction in those timers? It could be posted as a code template. Best regards, Iliya
IIRC this was mentioned in a private post
unfortunately this is a closed-source framework so I'm not allowed to post source code from it
the key advantage is that it makes it easy to install nested stopwatches in our source code with a simple (single-line) notation; then after profile runs it reports the detailed number of cycles and % in a nicely formatted report, for example with indentation for inner timings
for the actual usage of RDTSC it simply follows the advice from the paper I posted the other day, nothing special or innovative there
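Since the framework's source is not available, the following is only a guess at what a nested stopwatch with a single-line notation and an indented report might look like. It is not Kribi code: the `Profiler` class and its methods are hypothetical, and `time.perf_counter_ns` is used as a stand-in for an RDTSC read, which Python cannot issue directly.

```python
import time
from contextlib import contextmanager

class Profiler:
    """Collects nested timings and formats them with indentation."""

    def __init__(self, clock=time.perf_counter_ns):
        self.clock = clock      # stand-in for reading the time-stamp counter
        self.depth = 0
        self.records = []       # (depth, name, elapsed) in start order

    @contextmanager
    def stopwatch(self, name):
        index = len(self.records)
        self.records.append((self.depth, name, 0))   # reserve the slot
        self.depth += 1
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            self.depth -= 1
            self.records[index] = (self.depth, name, elapsed)

    def report(self):
        # Percentages are relative to the sum of the outermost timings.
        total = sum(e for d, _, e in self.records if d == 0) or 1
        return "\n".join(
            f"{'  ' * d}{name}: {e} ticks ({100.0 * e / total:.1f}%)"
            for d, name, e in self.records
        )

prof = Profiler()
with prof.stopwatch("frame"):          # one line per stopwatch
    with prof.stopwatch("raster"):
        sum(range(1000))
print(prof.report())
```

Injecting the clock as a parameter keeps the bookkeeping separate from the time source, so a cycle counter could be swapped in where one is available.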
>>...integration. Even if FP-instructions on these microcontrollers cause "lightweight" Traps only ~3 clock cycles are needed to complete a vector fetch.
>>what was a "vector fetch" on such an ancient purely scalar chip?..
In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained from a 256-entry vector table.
>>...do you know how many cycles were required for emulating basic fp instructions like FADD and FMUL?..
No. I just checked a 29K family User's Manual and I have not found any technical details regarding "number of cycles to execute an instruction".
Best regards,
Sergey
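The lookup described in this post can be modelled in a few lines. This is a software analogy rather than 29K code: the handler functions and the vector number chosen for the floating-point trap are hypothetical.

```python
VECTOR_COUNT = 256

def unhandled(vector):
    """Default entry for vectors with no installed handler."""
    return f"unhandled trap {vector}"

# One entry per vector number; each entry is a single handler "address".
vector_table = [unhandled] * VECTOR_COUNT

def install(vector, handler):
    """Store a handler in the table slot for the given vector number."""
    vector_table[vector] = handler

def raise_trap(vector):
    """The "vector fetch": index the table, then branch to the routine found."""
    handler = vector_table[vector]
    return handler(vector)

# Hypothetical vector number for a floating-point multiply trap.
FMUL_VECTOR = 42
install(FMUL_VECTOR, lambda vector: "emulate FMUL in software")
result = raise_trap(FMUL_VECTOR)
```

Only the table lookup corresponds to the ~3 cycles quoted earlier; the time spent inside the handler is a separate cost.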
>>In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained from a 256-entry vector table
Why is it called a "vector table"? It is simply a data structure that holds scalar values, not vector values.
>>In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained from a 256-entry vector table.
so it's only (part of?) the time to branch to the trap handler, it tells us nothing about the speed of the actual routine
>>No. I just checked a 29K family User's Manual and I have not found any technical details regarding "number of cycles to execute an instruction".
it was probably something like 50-100 cycles for FP32 FMUL and FADD (based on my past experience writing FP emulation routines)
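To give a feel for where those cycles go, here is a simplified sketch of an FP32 multiply done with integer operations only. It is not a 29K emulation routine: the names are hypothetical, and only normal, finite operands with a normal result are handled.

```python
import struct

def f32_bits(x):
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def soft_fmul(a_bits, b_bits):
    """Multiply two FP32 bit patterns using integer operations only.

    Simplified: zeros, denormals, infinities and NaNs are not handled.
    """
    sign = (a_bits ^ b_bits) & 0x80000000
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    assert 0 < exp_a < 255 and 0 < exp_b < 255, "normal operands only"
    # Restore the implicit leading 1 to get 24-bit significands.
    sig_a = (a_bits & 0x7FFFFF) | 0x800000
    sig_b = (b_bits & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b              # 48-bit integer multiply
    exponent = exp_a + exp_b - 127       # remove one copy of the bias
    shift = 23
    if product >> 47:                    # significand product in [2, 4)
        shift, exponent = 24, exponent + 1
    sig = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and sig & 1):
        sig += 1
        if sig >> 24:                    # rounding carried out of 24 bits
            sig, exponent = sig >> 1, exponent + 1
    assert 0 < exponent < 255, "result must be a normal number"
    return sign | (exponent << 23) | (sig & 0x7FFFFF)

three = soft_fmul(f32_bits(1.5), f32_bits(2.0))   # bit pattern of 3.0
```

On a chip without a hardware integer multiplier, the single `sig_a * sig_b` line would itself expand into a shift-and-add loop, which is consistent with FMUL being the slow case.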
Yes, I know this.
My question was slightly different. Members of the IDT (IVT in DOS) are addresses, i.e. a single binary number representing an address in memory. Judging by the definition of a vector, each IDT entry should have been composed of several values (addresses), but this is not the case.
I do not know why Intel decided to call it a vector.
>>My question was slightly different. Members of IDT (IVT in DOS) are addresses, i.e. a single binary number representing an address in memory. Judging by the definition of a vector, each IDT entry should have been composed of several values (addresses), but this is not the case.
ah I see what you mean, I have no idea why it's called a vector; I'd consider the whole table a vector but not each individual address as you said, unlike the common usage
Every IDT "vector" points to an 8-byte descriptor, which itself could be represented as a vector composed of various fields.
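The fields of such an 8-byte descriptor can be shown with a short sketch, assuming the 32-bit protected-mode gate layout (offset bits 15:0, segment selector, a reserved byte, a type/attribute byte, offset bits 31:16). The `Gate` tuple, the helper names and the example values are hypothetical.

```python
import struct
from collections import namedtuple

# Fields recovered from an 8-byte gate descriptor in 32-bit protected mode.
Gate = namedtuple("Gate", "offset selector type_attr")

def pack_gate(offset, selector, type_attr):
    """Build the 8 descriptor bytes from a handler offset, selector and flags."""
    return struct.pack("<HHBBH", offset & 0xFFFF, selector, 0, type_attr, offset >> 16)

def unpack_gate(raw):
    """Split 8 descriptor bytes back into their fields."""
    low, selector, _reserved, type_attr, high = struct.unpack("<HHBBH", raw)
    return Gate(offset=(high << 16) | low, selector=selector, type_attr=type_attr)

# Hypothetical values: handler at 0x00123456, code selector 0x08,
# 0x8E = present, DPL 0, 32-bit interrupt gate.
raw = pack_gate(0x00123456, 0x08, 0x8E)
gate = unpack_gate(raw)
```

The handler address is split into two halves around the selector and attribute bytes, so each entry really is a small record of several fields rather than one plain address.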
>>Why it is called "vector table"...In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained from a 256-entry vector table
I think AMD uses the term "Vector Table" because Intel calls a similar structure an "Interrupt Descriptor Table".
It looks like this is a "War of Terms", and the same applies to Oracle and Informix, etc.