I would like to ask Intel's employees on this forum: why have Intel CPU architects never implemented in hardware some of the more "popular" special functions, such as gamma, beta and various Bessel functions of integer order? All these functions could have been accessed through x87 ISA instructions.
>>on a quad-core Sandy Bridge it will be something like 30M-60M 100-sample polygons per second (i.e. 1000x more) with a dumb Z-buffer algorithm (CPU only, using all cores and fully vectorized AVX code)
It is simply amazing how much CPU processing power has increased over a period of 20 years.
>>here is an example with 49M polygons with per sample normal interpolation and reflection mapping
Looking at the surface of the Robots, which interpolation was used to calculate the samples?
Was it bicubic interpolation? It is costly, but it can give a significantly smoother colour transition across the surface.
>>Looking at the surface of the Robots, which interpolation was used to calculate the samples?
>>Was it bicubic interpolation
normals are bilinearly interpolated in world space and the reflection map is bilinearly interpolated in texture space; bicubic interpolation is useful mostly for texture magnification but would be overkill here IMHO
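For readers who want to see what bilinear interpolation amounts to, here is a minimal sketch in Python. It is not code from the demo: the `lerp` and `bilinear_sample` names, the list-of-rows texture layout and the clamp-to-edge behaviour are all assumptions made for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def bilinear_sample(texture, u, v):
    """Sample a 2D grid of floats at fractional texel coordinates (u, v).

    texture is a list of rows (hypothetical layout); coordinates outside
    the grid are clamped to the edge.
    """
    height = len(texture)
    width = len(texture[0])
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    x0, y0 = int(u), int(v)
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    # Two horizontal lerps followed by one vertical lerp.
    top = lerp(texture[y0][x0], texture[y0][x1], fx)
    bottom = lerp(texture[y1][x0], texture[y1][x1], fx)
    return lerp(top, bottom, fy)


texture = [[0.0, 1.0],
           [2.0, 3.0]]
center = bilinear_sample(texture, 0.5, 0.5)  # 1.5, the mean of the four texels
```

Only the four nearest texels are read per sample, which is why it is so much cheaper than bicubic interpolation with its 4x4 footprint.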
>>normals are bilinearly interpolated in world space and the reflection map is bilinearly interpolated in texture space; bicubic interpolation is useful mostly for texture magnification but would be overkill here IMHO
Still, the demo programmers were able to achieve smooth transitions. Maybe the use of bilinear interpolation is compensated by the high level of detail and high-frequency sampling?
>>...yes, in other words no FPU
Not so efficient for real-time applications based on heavy usage of FP instructions.
29K family microcontrollers designed for embedded systems, like laser printers, scanners, X terminals and
these microcontrollers don't have an FPU in order to reduce the cost of system integration. Even if FP instructions
on these microcontrollers cause "lightweight" traps, only ~3 clock cycles are needed to complete a vector fetch.
>>Still, the demo programmers were able to achieve smooth transitions. Maybe the use of bilinear interpolation is compensated by the high level of detail and high-frequency sampling?
yes, bilinear interpolation is fine for reflection maps since there is generally no magnification but a very high frequency sampling in texture space instead (due to the wild variation of normal directions); the sampling scheme is thus paramount for good quality: adaptive stochastic antialiasing in this example, when you don't move the mouse
>>integration. Even if FP instructions on these microcontrollers cause "lightweight" traps, only ~3 clock cycles are needed to complete a vector fetch.
what was a "vector fetch" on such an ancient, purely scalar chip?
btw, do you know how many cycles were required for emulating basic FP instructions like FADD and FMUL? FMUL was particularly slow due to the lack of an integer multiplier AFAIK
>>adaptive stochastic antialiasing in this example, when you don't move the mouse
Adaptive stochastic antialiasing is very good at minimizing computational cost and memory bandwidth, but at the cost of the irregular sampling pattern introduced by the random (stochastic) sampling.
What is the sampling filter used in the Robots demo?
Is it a simple box filter or a sinc filter?
>>29K family microcontrollers designed for embedded systems, like laser printers, scanners, X terminals and
For intensive floating-point applications the better option is to use Analog Devices SHARC processors.
But even these DSPs do not have special functions implemented directly in hardware.
I think we can conclude that no general-purpose DSP implements such functions in hardware, and that microprocessors use software libraries instead.
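To make the "software library" point concrete, here is a small Python sketch of how such functions are typically evaluated in software. `math.gamma` is the standard library routine; the `beta` and `bessel_j` helpers are hypothetical names written for this illustration, and the plain power series used for the Bessel function is only reasonable for moderate |x| (production libraries switch to asymptotic or recurrence methods for large arguments).

```python
import math


def beta(a, b):
    """Beta function via the identity B(a, b) = Γ(a)Γ(b) / Γ(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def bessel_j(n, x, terms=30):
    """Bessel function of the first kind, integer order n >= 0.

    Uses the power series
        J_n(x) = sum_k (-1)^k / (k! (n+k)!) * (x/2)^(2k+n),
    truncated after `terms` terms (an arbitrary choice for this sketch).
    """
    total = 0.0
    for k in range(terms):
        coefficient = (-1) ** k / (math.factorial(k) * math.factorial(n + k))
        total += coefficient * (x / 2.0) ** (2 * k + n)
    return total


gamma_5 = math.gamma(5.0)    # 4! = 24
beta_2_3 = beta(2.0, 3.0)    # 1/12
j0_at_1 = bessel_j(0, 1.0)
j1_at_1 = bessel_j(1, 1.0)
```

Each call is a few dozen multiply-adds, which is the kind of work a library routine does instead of a dedicated instruction.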
>>What is the sampling filter used in the Robots demo?
>>Is it a simple box filter or a sinc filter?
the reconstruction filter is a box in this example; it's generally the best filter for low resolution raster images since other filters such as Gaussian and raised cosine lead to too much blurring, and 2-3 lobe Lanczos to too much ringing (note that we have these alternate reconstruction filters available with a user-selectable filter radius)
NB: sinc is a theoretical filter, not something you can use in practice with a real-world FIR filter kernel
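As an illustration of what a reconstruction filter does with stochastic samples, here is a minimal Python sketch. It is not the demo's sampler: the function names, the 8x8 jittered grid, the fixed seed and the Gaussian sigma are all assumptions, and the adaptive refinement step is left out.

```python
import math
import random


def box_weight(dx, dy, radius=0.5):
    """Box filter: every sample inside the pixel square counts equally."""
    return 1.0 if abs(dx) <= radius and abs(dy) <= radius else 0.0


def gaussian_weight(dx, dy, sigma=0.5):
    """Gaussian filter: samples near the pixel center count more."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def jittered_samples(n, rng):
    """n x n stratified (jittered) sample offsets inside one pixel,
    expressed relative to the pixel center, in [-0.5, 0.5)."""
    cell = 1.0 / n
    return [(-0.5 + (i + rng.random()) * cell,
             -0.5 + (j + rng.random()) * cell)
            for j in range(n) for i in range(n)]


def reconstruct(signal, samples, weight):
    """Weighted average of the signal over the sample set."""
    total = 0.0
    weight_sum = 0.0
    for dx, dy in samples:
        w = weight(dx, dy)
        total += w * signal(dx, dy)
        weight_sum += w
    return total / weight_sum


def edge(dx, dy):
    """A vertical edge through the pixel center: white left, black right."""
    return 1.0 if dx < 0.0 else 0.0


rng = random.Random(1)
samples = jittered_samples(8, rng)
box_value = reconstruct(edge, samples, box_weight)        # pixel coverage
gauss_value = reconstruct(edge, samples, gaussian_weight)
```

With the box filter the result is simply the fraction of the pixel area covered by the white side, which matches the "pixels are rectangular areas" argument.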
>>Gaussian filter in random sampling could lead to better results than a box filter.
at low sampling frequency yes (though I'd prefer raised cosine or Lanczos 2 over Gaussian), but with adaptive sampling you have a high sampling frequency in high-frequency signal areas, thus the local radius of your reconstruction filter must be very small to avoid excessive blurring, and in this case you will miss the samples within the pixel area that are farthest from the pixel center
pixels on your screen are rectangular areas after all, so a box reconstruction filter makes more sense in practice (with supersampling) than some theoretical texts may let you think when reasoning about only discrete reconstruction samples
anyway this is user selectable, and, since you have to ask, the default look is probably not that bad
>>@bronxzv Slightly off-topic question. In one of your posts you mentioned that the Kribi project has 500 timers based on the rdtsc instruction. Is it possible to post the proper usage of the rdtsc instruction in those timers? It could be posted as a code template. Best regards, Iliya
IIRC this was mentioned in a private post
unfortunately this is a closed source framework so I'm not allowed to post source code from it
the key advantage is that it makes it easy to install nested stopwatches in our source code with a simple (single line) notation; then after profile runs it reports detailed numbers of cycles and % in a nicely formatted report, for example with indentation for inner timings
for the actual usage of RDTSC it simply follows the advice from the paper I posted the other day, nothing special or innovative there
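Since the real framework cannot be shown, here is only a rough Python sketch of the nested-stopwatch idea described above. It is not the Kribi code and it does not use RDTSC: `time.perf_counter_ns` is swapped in as the tick source because Python cannot issue RDTSC directly, and the `Profiler` class, its `stopwatch` method and the report layout are hypothetical.

```python
import time
from contextlib import contextmanager


class Profiler:
    """Collects nested timings and prints them with indentation."""

    def __init__(self, clock=time.perf_counter_ns):
        self.clock = clock      # stand-in for a cycle counter such as RDTSC
        self.records = []       # [depth, name, elapsed] in start order
        self.depth = 0

    @contextmanager
    def stopwatch(self, name):
        index = len(self.records)
        self.records.append([self.depth, name, 0])
        self.depth += 1
        start = self.clock()
        try:
            yield
        finally:
            self.records[index][2] = self.clock() - start
            self.depth -= 1

    def report(self):
        # Percentages are relative to the sum of the outermost timings.
        total = sum(e for d, _, e in self.records if d == 0) or 1
        return "\n".join(
            f"{'  ' * depth}{name}: {elapsed} ticks "
            f"({100.0 * elapsed / total:.1f}%)"
            for depth, name, elapsed in self.records)


profiler = Profiler()
with profiler.stopwatch("frame"):              # single-line notation
    with profiler.stopwatch("transform"):
        sum(i * i for i in range(10000))
    with profiler.stopwatch("shade"):
        sum(i * i for i in range(20000))
report = profiler.report()
```

In native code the same structure would read the time-stamp counter at the start and end of each scope; the serialization and overhead-calibration details are exactly what the referenced paper covers and are not reproduced here.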
>>integration. Even if FP instructions on these microcontrollers cause "lightweight" traps, only ~3 clock cycles are needed to complete a vector fetch.
>>what was a "vector fetch" on such an ancient, purely scalar chip?..
In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained
from a 256-entry vector table.
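A toy model of that lookup, in Python, may help. It only assumes what is stated above (a 256-entry table of routine addresses indexed by trap number); the handler functions and the vector number 42 are hypothetical and are not taken from the 29K manual.

```python
VECTOR_COUNT = 256


def unhandled(vector):
    """Default routine installed in every slot."""
    return f"unhandled trap {vector}"


def fmul_trap(vector):
    """Stand-in for a software routine that emulates an FP multiply."""
    return "emulate FMUL in software"


# The vector table: one routine "address" per trap or interrupt number.
vector_table = [unhandled] * VECTOR_COUNT

FMUL_VECTOR = 42  # hypothetical number, chosen only for this example
vector_table[FMUL_VECTOR] = fmul_trap


def raise_trap(vector):
    """Fetch the routine for this vector number and branch to it."""
    handler = vector_table[vector]  # this indexed read is the "vector fetch"
    return handler(vector)


result = raise_trap(FMUL_VECTOR)
```

The few cycles quoted for the fetch cover only the table read; the time spent inside the routine itself comes on top of that.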
>>...do you know how many cycles were required for emulating basic FP instructions like FADD and FMUL?..
No. I just checked a 29K family User's Manual and I have not found any technical details regarding "number of cycles to execute an instruction".
Best regards,
Sergey
>>In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained from a 256-entry vector table
Why is it called a "vector table"? It is simply a data structure which holds scalar values, not vector values.
>>In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained from a 256-entry vector table.
so it's only (part of?) the time to branch to the trap handler; it tells us nothing about the speed of the actual routine
>>No. I just checked a 29K family User's Manual and I have not found any technical details regarding "number of cycles to execute an instruction".
it was probably something like 50-100 cycles for FP32 FMUL and FADD (based on my past experience writing FP emulation routines)
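To show where those cycles go, here is a simplified Python sketch of an FP32 multiply done with integer operations only, including a shift-and-add loop in place of a hardware integer multiplier. This is not the 29K trap routine: all function names are hypothetical, and the sketch assumes normal, finite inputs whose product neither overflows nor underflows (no zeros, denormals, infinities or NaNs).

```python
import struct


def float_to_bits(x):
    """IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def shift_add_multiply(a, b):
    """Integer multiply without a multiplier: one shift/add per bit of b."""
    product = 0
    while b:
        if b & 1:
            product += a
        a <<= 1
        b >>= 1
    return product


def soft_fmul(a_bits, b_bits):
    """Multiply two normal FP32 values given as bit patterns."""
    sign = ((a_bits ^ b_bits) >> 31) & 1
    exponent = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - 127
    # Restore the implicit leading 1 of each 24-bit significand.
    mant_a = (a_bits & 0x7FFFFF) | 0x800000
    mant_b = (b_bits & 0x7FFFFF) | 0x800000
    product = shift_add_multiply(mant_a, mant_b)  # 47 or 48 bits wide
    shift = 23
    if product & (1 << 47):       # product >= 2.0, renormalize
        shift = 24
        exponent += 1
    mantissa = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and (mantissa & 1)):
        mantissa += 1
        if mantissa == 1 << 24:   # rounding carried out of the significand
            mantissa >>= 1
            exponent += 1
    return (sign << 31) | (exponent << 23) | (mantissa & 0x7FFFFF)


result = bits_to_float(soft_fmul(float_to_bits(1.5), float_to_bits(2.5)))  # 3.75
```

The 24-iteration shift-and-add loop dominates the work, which fits the remark earlier in the thread that FMUL suffers most on a chip without an integer multiplier.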
Yes, I know this.
My question was slightly different. The members of the IDT (IVT in DOS) are addresses, i.e. each is a single binary number representing an address in memory. Judging by the definition of a vector, each IDT entry should have been composed of several values (addresses), but this is not the case.
I do not know why Intel decided to call it a vector.
>>My question was slightly different. The members of the IDT (IVT in DOS) are addresses, i.e. each is a single binary number representing an address in memory. Judging by the definition of a vector, each IDT entry should have been composed of several values (addresses), but this is not the case.
ah I see what you mean, I have no idea why it's called a vector; I'd consider the whole table as a vector but not each individual address as you said, unlike the common usage
Every IDT "vector" points to an 8-byte descriptor, which itself could be represented as a vector composed of various fields.
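As a sketch of those fields, the following Python snippet packs and unpacks one 8-byte gate descriptor using the 32-bit protected-mode layout (offset bits 15:0, segment selector, a reserved byte, a type/attribute byte, offset bits 31:16). The `GateDescriptor`, `encode_gate` and `decode_gate` names and the sample values are made up for this example.

```python
import struct
from collections import namedtuple

GateDescriptor = namedtuple(
    "GateDescriptor", "offset selector gate_type dpl present")


def encode_gate(offset, selector, attributes):
    """Pack a 32-bit protected-mode gate descriptor into 8 bytes."""
    return struct.pack("<HHBBH",
                       offset & 0xFFFF,   # handler offset, bits 15:0
                       selector,          # code segment selector
                       0,                 # reserved byte
                       attributes,        # P, DPL and gate type
                       offset >> 16)      # handler offset, bits 31:16


def decode_gate(raw):
    """Split the 8 bytes back into their fields."""
    offset_low, selector, _reserved, attributes, offset_high = \
        struct.unpack("<HHBBH", raw)
    return GateDescriptor(
        offset=(offset_high << 16) | offset_low,
        selector=selector,
        gate_type=attributes & 0x0F,     # 0xE is a 32-bit interrupt gate
        dpl=(attributes >> 5) & 0x3,     # descriptor privilege level
        present=bool(attributes & 0x80),
    )


raw = encode_gate(0x00123456, 0x0008, 0x8E)  # sample values only
gate = decode_gate(raw)
```

So a single entry really is a small record of several values, with the handler address split across two of them.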
>>Why is it called a "vector table"...
>>In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained from a 256-entry vector table
I think AMD uses the term "Vector Table" because Intel calls a similar structure an "Interrupt Descriptor Table".
It looks like this is a "War of Terms", and the same applies to Oracle and Informix, etc.
