Solved: Some suggestions!

rtfss1gmail_com · ‎11-23-2010

Hi,

Some suggestions for improving this good release:

For completeness:

*Add cl_khr_3d_image_writes support, since the AMD GPU backend supports it and I have a demo that uses it.

*Add D3D10 interop (cl_khr_d3d10_sharing), similar to the OpenGL interop, so that some Nvidia/AMD samples work too.

Compared to AMD:

*Add cl_ext_device_fission so we can expose multiple concurrent kernels, etc.
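
To see which of these extensions a given platform already reports, the device's extension string (what clGetDeviceInfo returns for CL_DEVICE_EXTENSIONS) can be searched. It is a space-separated list, so the match has to be on whole tokens. Below is a minimal sketch in plain C with no OpenCL headers; has_extension is a hypothetical helper name, and the caller is assumed to have already fetched the string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of any SDK): returns true if `name`
   appears as a whole space-separated token in `ext_list`. */
static bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the hit only if it is delimited on both sides. */
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return true;
        p += 1; /* keep searching after a partial match */
    }
    return false;
}
```

For example, has_extension(exts, "cl_khr_3d_image_writes") would tell a demo whether to take the 3D image write path. The whole-token check matters because a plain strstr would also match a longer name that merely begins with the one being searched for.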

More ambitious:

Add next-gen computing features (as featured in CUDA 3.x):

*Support non-inlined functions with a stack, which brings function pointers and recursion. Believe it or not, the Nvidia OCL GPU backend supports at least recursion, and function pointers fail only at build time. The GPU Ocelot CPU backend (PTX->LLVM) also supports this right now!

*Similar to printf, expose malloc and free (new in CUDA 3.2).

It also seems AMD is working on some C++ support (templated kernels).

What do you think?

Thanks.

rtfss1gmail_com · ‎11-23-2010

Another one:

*Allow an asm("") function so that x86 assembly code can be inserted in kernels.

CUDA allows an asm function containing PTX code inside CUDA device functions.

Ofer_Rosenberg__Inte · ‎11-25-2010

Hi,
We would like to thank you for your suggestions.
Some of these suggestions have been raised internally as well, and are considered for the next versions of the SDK.

About the specific proposal to allow asm functions accessed directly inside kernels, I do not believe that we will want to go in this direction. Intel's direction is to promote the cross-device approach of OpenCL, and this proposal goes against it. The preferred direction is improving the compiler, making sure that the mapping to assembler instructions is efficient. I believe that the additions made to the OCL C language improve the compiler's ability to reach this goal.
However, the direction of adding new built-in functions, which map well to SSE instructions, is interesting. We do see cases where a code sequence can be efficiently replaced by a call to an SSE instruction, and the method that we prefer is to expose it as a built-in function. This preserves the approach of the C language and is also forward compatible: on a future ISA, this built-in can be replaced by the JIT compiler with a new instruction.
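
As an illustration that does not come from this thread: OpenCL C already has an add_sat integer built-in, and SSE2 has a saturating byte add (PADDUSB). The plain C function below is the kind of short code sequence such a built-in stands in for. The name add_sat_u8 is hypothetical, and whether any particular compiler maps add_sat to that instruction is an assumption, not something stated here.

```c
#include <stdint.h>

/* Portable reference for an unsigned 8-bit saturating add.
   Hypothetical name; shown only to illustrate a code sequence
   that a built-in function could replace. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b; /* widen so the add cannot wrap */
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}
```

Kernel code that calls a built-in instead of spelling out the widen, compare, and select keeps the same source on every device and leaves the choice of instruction to the compiler.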

rtfss1gmail_com · ‎12-01-2010

Hi,

Thanks for your insight. As you say, perhaps asm("") is not a good approach, but I think the others are still interesting! Really looking forward to seeing how this excellent SDK evolves!

Thanks.