Hi,
We would like to thank you for your suggestions.
Some of these suggestions have been raised internally as well, and are being considered for the next versions of the SDK.
About the specific proposal to allow asm functions to be used directly inside kernels, I do not believe that we will want to go in this direction. Intel's direction is to promote the cross-device approach of OpenCL, and this proposal goes against it. The preferred direction is improving the compiler, making sure that the mapping to assembler instructions is efficient. I believe that the additions made to the OCL C language improve the compiler's ability to reach this goal.
However, the direction of adding new built-in functions, which map well to SSE instructions, is interesting. We do see cases where a code sequence can be efficiently replaced by a single SSE instruction, and the method that we prefer is to expose it as a built-in function. This preserves the approach of the C language, and is also forward compatible: on a future ISA, the JIT compiler can map the same built-in to a new instruction.
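To illustrate the built-in approach, here is a minimal sketch in plain C rather than OpenCL C. The name `builtin_min4` is hypothetical and is not part of any SDK; it only shows how a short code sequence can sit behind a function that maps to one SSE instruction where available, while the caller's source stays portable.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_min_ps, _mm_storeu_ps */
#endif

/* Hypothetical built-in (name is an assumption for illustration only):
 * element-wise minimum of two 4-float vectors. */
static void builtin_min4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    /* With SSE available, the whole operation maps to a single MINPS. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_min_ps(va, vb));
#else
    /* Portable fallback: the scalar code sequence the instruction replaces. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = (a[i] < b[i]) ? a[i] : b[i];
#endif
}
```

The caller only ever sees `builtin_min4(a, b, out)`. Which instruction ends up behind it is decided by the compiler, which is the property that inline asm in a kernel would give up.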
Another one:
* Allow an asm("") function that can insert x86 assembly code in kernels
CUDA allows asm statements containing PTX code inside CUDA device functions.
