Intel® ISA Extensions

C++ implementation of the Larrabee new instructions

Thomas_W_Intel
Employee
Hello,

You might be interested to know that there is a C++ implementation of the Larrabee new instructions available at http://software.intel.com/en-us/articles/prototype-primitives-guide/.

It allows you to implement prototypes using the Larrabee new instructions without the need for special compilers or hardware.
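To make the idea of emulating the instructions in plain C++ more concrete, here is a minimal sketch of how a 16-lane vector compare and a write-masked add might be modeled in software. The names `Vec16f`, `Mask16`, `mask_lt`, and `mask_add` are hypothetical and are not the types or functions defined by the Prototype Primitives header; the sketch assumes merge masking, where lanes whose mask bit is clear keep their previous value.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 512-bit vector register: 16 float lanes.
// Not the type defined by the Prototype Primitives header.
struct Vec16f {
    std::array<float, 16> lane{};
};

// Hypothetical stand-in for a 16-bit mask register: bit i controls lane i.
using Mask16 = std::uint16_t;

// Builds a mask from a per-lane "a < b" test, as a vector compare would.
inline Mask16 mask_lt(const Vec16f& a, const Vec16f& b) {
    Mask16 k = 0;
    for (unsigned i = 0; i < 16; ++i) {
        if (a.lane[i] < b.lane[i]) {
            k = static_cast<Mask16>(k | (1u << i));
        }
    }
    return k;
}

// Emulated masked add with merge semantics (assumed): lanes whose mask bit is
// set receive a + b; all other lanes keep their value from `src`.
inline Vec16f mask_add(const Vec16f& src, Mask16 k, const Vec16f& a, const Vec16f& b) {
    Vec16f result = src;
    for (unsigned i = 0; i < 16; ++i) {
        if ((k >> i) & 1u) {
            result.lane[i] = a.lane[i] + b.lane[i];
        }
    }
    return result;
}
```

A compare produces the mask, and the masked operation then updates only the selected lanes, which is a common way to handle divergent branches on wide SIMD hardware.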

Have fun
Thomas
capens__nicolas
New Contributor I
Quoting - Thomas_W_Intel

You might be interested to know that there is a C++ implementation of the Larrabee new instructions available at http://software.intel.com/en-us/articles/prototype-primitives-guide/.

It allows you to implement prototypes using the Larrabee new instructions without the need for special compilers or hardware.

Awesome! I have a few questions though...

Does the precision of special functions (log, exp, sqrt, etc.) of the hardware match that of the C++ implementation, or should we take some approximation into consideration?

Also, how can the texture samplers be accessed? Or are these considered not useful for generic programming and only accessible to the graphics driver developers?
kevin-bray
Beginner
Quoting - c0d1f1ed

Awesome! I have a few questions though...

Does the precision of special functions (log, exp, sqrt, etc.) of the hardware match that of the C++ implementation, or should we take some approximation into consideration?

Also, how can the texture samplers be accessed? Or are these considered not useful for generic programming and only accessible to the graphics driver developers?
Looking over the primitives, I noticed a few instructions that could serve as "texture access" instructions:

To me, GATHERD and GATHERFP stand out as texture addressing functions. From there, it looks like you would do the filtering yourself, and you would also need to compute the fetch addresses from the UVs. Texture swizzling (to make better use of bandwidth) would apparently be done manually too, using the BITINTERLEAVE11_PI instruction.
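The addressing path described above can be sketched in scalar C++. This is an illustration under stated assumptions, not the Prototype Primitives API: `morton2d` is a hypothetical scalar stand-in for the kind of bit interleave BITINTERLEAVE11_PI appears to provide (its exact operand order is not confirmed here), and `gather16` simply loops over 16 lanes instead of calling GATHERD. The texture is assumed to be square, power-of-two sized, and stored in Morton (Z) order.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Spreads the low 16 bits of x so that bit i lands in bit 2*i.
inline std::uint32_t spread_bits(std::uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Morton (Z-order) index: x bits go to even positions, y bits to odd positions.
// Hypothetical scalar stand-in for a hardware bit-interleave instruction.
inline std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return spread_bits(x) | (spread_bits(y) << 1);
}

// Maps normalized (u, v) to the Morton index of the texel containing it, with
// wrap addressing. Assumes a square texture whose side `size` is a power of two.
inline std::uint32_t texel_index(float u, float v, std::uint32_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    const std::uint32_t mask = size - 1;
    // floor() handles negative coordinates; masking wraps them into [0, size).
    const auto x = static_cast<std::uint32_t>(static_cast<std::int32_t>(std::floor(u * size))) & mask;
    const auto y = static_cast<std::uint32_t>(static_cast<std::int32_t>(std::floor(v * size))) & mask;
    return morton2d(x, y);
}

// Emulated 16-lane gather: lane i loads texels[index[i]].
inline std::array<std::uint32_t, 16> gather16(const std::vector<std::uint32_t>& texels,
                                              const std::array<std::uint32_t, 16>& index) {
    std::array<std::uint32_t, 16> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = texels.at(index[i]);  // at() bounds-checks each lane
    }
    return out;
}
```

Storing texels in Z-order keeps a 2x2 filter footprint close together in memory, which is the usual motivation for swizzling.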

By the way, Intel, this *seriously* rocks. =)

Kevin B

capens__nicolas
New Contributor I
Quoting - Kevin Bray
To me, GATHERD and GATHERFP stand out as texture addressing functions.

Diagrams of Larrabee show dedicated texture sampling units, so I doubt they use gather for it. It's not just filtering but also mipmap LOD computation, texture decompression, anisotropic filtering, format conversion, etc., none of which is very efficient in software. To have any chance of competing with other GPUs, I believe fixed-function texture sampling still makes sense. That might change as other GPUs become more flexible and texture filtering becomes a minor task compared to arithmetic work, bandwidth, and so on.

Anyway, the slides seem to suggest that the hardware hasn't been fully designed yet, so they can still make changes to the texture samplers, but they have already published the specifications of the arithmetic instructions for generic computing.
kevin-bray
Beginner
Quoting - c0d1f1ed

Diagrams of Larrabee show dedicated texture sampling units, so I doubt they use gather for it. It's not just filtering but also mipmap LOD computation, texture decompression, anisotropic filtering, format conversion, etc., none of which is very efficient in software. To have any chance of competing with other GPUs, I believe fixed-function texture sampling still makes sense. That might change as other GPUs become more flexible and texture filtering becomes a minor task compared to arithmetic work, bandwidth, and so on.

Anyway, the slides seem to suggest that the hardware hasn't been fully designed yet, so they can still make changes to the texture samplers, but they have already published the specifications of the arithmetic instructions for generic computing.

Yeah, you're probably right about having dedicated sampling units and such. I can see how to do simple bilinear and trilinear filtering with these instructions, but it doesn't seem like DXT decompression or anisotropic filtering could be achieved quickly enough with these instructions alone. Besides, dedicated hardware for stuff like that can only be a good thing. =)
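Simple bilinear filtering is indeed straightforward once the texels have been fetched. Below is a minimal scalar sketch, assuming a single-channel float texture that is square, power-of-two sized, stored row-major (not swizzled), with wrap addressing; `Texture` and `sample_bilinear` are hypothetical names. In a 16-wide version, each of the four fetches would become one gather and the blend arithmetic would run on all lanes at once.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-channel texture: square, power-of-two side, row-major.
struct Texture {
    std::uint32_t size;
    std::vector<float> texels;  // size * size values

    // Point fetch with wrap addressing (masking works for power-of-two sizes).
    float fetch(std::int32_t x, std::int32_t y) const {
        const std::uint32_t m = size - 1;
        const std::uint32_t xi = static_cast<std::uint32_t>(x) & m;
        const std::uint32_t yi = static_cast<std::uint32_t>(y) & m;
        return texels[yi * size + xi];
    }
};

// Bilinear sample at normalized (u, v); texel centers sit at (i + 0.5) / size.
inline float sample_bilinear(const Texture& t, float u, float v) {
    assert(t.size != 0 && (t.size & (t.size - 1)) == 0);
    assert(t.texels.size() == static_cast<std::size_t>(t.size) * t.size);
    const float fx = u * t.size - 0.5f;
    const float fy = v * t.size - 0.5f;
    const float x0f = std::floor(fx);
    const float y0f = std::floor(fy);
    const float ax = fx - x0f;  // horizontal blend weight
    const float ay = fy - y0f;  // vertical blend weight
    const auto x0 = static_cast<std::int32_t>(x0f);
    const auto y0 = static_cast<std::int32_t>(y0f);
    // Four point fetches; a 16-wide version would issue one gather per fetch.
    const float t00 = t.fetch(x0, y0),     t10 = t.fetch(x0 + 1, y0);
    const float t01 = t.fetch(x0, y0 + 1), t11 = t.fetch(x0 + 1, y0 + 1);
    const float top = t00 + ax * (t10 - t00);
    const float bottom = t01 + ax * (t11 - t01);
    return top + ay * (bottom - top);
}
```

Trilinear filtering would sample two adjacent mip levels this way and blend the two results using the fractional part of the LOD.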

The log2 instruction would handle mip-level determination, so I don't see that as a problem. Also, the gather instruction can perform several kinds of format conversion (see _MM_FULLUP32_ENUM). Another thing to note is that the performance cost of these operations is pretty much unknown at this point. Whether various operations (filtering, DXT decompression) will be fast or slow in software depends on the throughput of these instructions. Also, given that these instructions operate on really wide registers and that one of these chips will have a lot of cores, there might be enough raw throughput that dedicated filtering/decoding hardware wouldn't matter (although something like that would probably require custom cache management). At this point, I'm really interested to see what we end up with.
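For the mip-level part, a common formulation (the one used in the OpenGL specification, for example) takes log2 of the longer of the two screen-space footprint axes, measured in texels. Below is a minimal scalar sketch, assuming the UV derivatives are already available; `mip_level` is a hypothetical name and is not taken from the Larrabee material.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Mip level (LOD) from screen-space UV derivatives given in normalized units.
// lod = log2(rho), where rho is the longer of the two footprint axes in texels.
inline float mip_level(float dudx, float dvdx, float dudy, float dvdy,
                       float tex_width, float tex_height, float max_level) {
    assert(tex_width > 0.0f && tex_height > 0.0f && max_level >= 0.0f);
    // Scale the derivatives into texel units.
    const float ux = dudx * tex_width, vx = dvdx * tex_height;
    const float uy = dudy * tex_width, vy = dvdy * tex_height;
    // Squared lengths of the footprint along screen x and screen y.
    const float rho_sq = std::max(ux * ux + vx * vx, uy * uy + vy * vy);
    // log2(sqrt(r)) == 0.5f * log2(r), so no square root is needed.
    const float lod = 0.5f * std::log2(rho_sq);
    // Clamp to the available mip chain; rho_sq == 0 gives -inf, which clamps to 0.
    return std::min(std::max(lod, 0.0f), max_level);
}
```

Working with squared lengths means a vector implementation needs only multiplies, adds, a max, and one log2 per lane.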

Also, as far as competing with other GPUs is concerned, it appears to me that they're competing on flexibility. Even if it's not competitive with GPUs in terms of performance, it might still be fast enough to provide reasonable frame rates while also giving access to algorithms that are currently impossible on modern GPUs. That could open the door to novel real-time rendering techniques that could potentially provide higher quality than their brute-force counterparts.

Kevin B
Thomas_W_Intel
Employee
Hello,

In addition, there are slides available from the Game Developers Conference, and Dr. Dobb's has published an article about the new instructions.

Kind regards
Thomas