I have several compiler bugs to report. They are strongly related and may trace back to a single issue in the compiler design:
(1) The compiler emits different kernel code for the functions sqrt, cos, and sin (which I will call X from here on) when they are written as X() or std::X() than it does for cl::sycl::X() or sycl::X(). This is true even if “using namespace sycl” is used. I believe that compiling a kernel with -fsycl (or with the dpcpp driver) should cause the SYCL standard to be observed.
(2) The Intel GPU driver crashes during JIT compilation when the kernel code contains a sqrt, cos, or sin that was given to the compiler as X() or std::X() instead of sycl::X() or cl::sycl::X(). The HOST driver handles them properly.
(3) When using -ffast-math - the llvm backend works for sqrt but CRASHES (in the compilation pass) for cos and sin.
All the files needed to replicate this are in a ZIP file, attached.
Xinmin is aware of this issue, and may have some commits for beta07 to address it - but wanted me to document this complete report so that all the bugs could be considered.
That said, the SYCL builtins for FP32 and FP64 need to "just work" for users of DPC++. Ideally, the entire matrix of options (sycl:: and std::) in the attached ZIP file would work without the user doing anything other than compiling.
$ uname -a
Linux s001-n176 4.15.18 #1 SMP Fri Oct 18 11:54:23 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux
$ clang++ -v
Intel(R) oneAPI DPC++ Compiler 2021.1-beta06 (2020.4.0.0415)
Thread model: posix
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.4.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
Thanks for providing the documentation for various bugs that you have encountered.
If I were to make guesses for some of the anomalies that you have encountered, it would be as follows:
This is true even if “using namespace sycl” is used.
The scope of the using namespace sycl; statement is valid only in the host part of the code. Since host code and device code are processed as separate compilation units, the namespace statement does not carry over to the device part. Hence, there could be unresolved dependencies in the device part of the code.
The Intel GPU driver crashes when it gets a sqrt, cos, or sin in the kernel code, to JIT, when the compiler was fed X() or std::X() instead of sycl::x() or cl::sycl::X().
DPC++ offers support for some functions within the "std" namespace inside the kernel.
As per the documentation, these "std" functions have been tested on GCC* 7.5.0. std::cos and std::sqrt are derived from GCC's libstdc++ library (tested on GCC 7.5.0). I can see that your GCC version is 7.4.0. Let me confirm with the team whether GCC 7.4.0's libstdc++ is supported inside the SYCL kernel.
The HOST driver handles them properly.
As per the SYCL 1.2.1 specification:
The SYCL host device is a native C++ implementation of a device. It does not have an OpenCL cl_device_id and it will only appear in the available SYCL devices, as it is not an OpenCL device. It has full SYCL capabilities and reports them through the SYCL information retrieval interface. The SYCL host device is mandatory for every SYCL implementation and is always available, but may not achieve the same performance as an OpenCL CPU device. Any C++ application debugger can be used for debugging SYCL kernels executing on a SYCL host device.
When running on the SYCL host device, the device code runs as a native C++ implementation of the device. DPC++'s current implementation of the host device uses TBB as its backend.
When using -ffast-math - the llvm backend works for sqrt but CRASHES (in the compilation pass) for cos and sin
This part is extremely intriguing. With the -ffast-math flag, the code builds and runs successfully for the sqrt function inside the kernel (even without sycl::sqrt!), whereas it fails for the sin and cos functions. I'm not sure what the -ffast-math flag is doing in the backend. I will check this with the team and get back to you at the earliest.
using namespace does apply within kernels; it is NOT limited to the host code.
The std:: functions you refer to are known to me - I put a table of them together for Chapter 17 of your upcoming book on SYCL and DPC++.
I've outlined a number of issues, and they are quite real.
Let's work to make sure they all get investigated.
Let me know if I can help.
That was just my assumption regarding the namespace (I did not find any reference in the SYCL 1.2.1 specification).
The issues that you have encountered are reproducible at my end too. I was in the process of verifying your claims, hence the delay.
I will escalate these issues to the concerned team.
Thanks for the thorough documentation.
Regarding the use of the sycl namespace for math functions:
We have previously discussed using the sycl namespace before a call to a math function on the device.
1. On the GPU, you cannot call an arbitrary function that wasn't compiled for the device, so an external call to exp is unresolved.
2. A "using namespace" directive just helps the compiler find a function; it does not force the lookup to use that exact scope (otherwise "using namespace cl::sycl;" would break usage of std::cout).
It is nicely explained using this example -
https://godbolt.org/z/8ntX-9 . Here a class "Example" was added in the global scope and in a namespace "A". It was then used in the main() function after "using namespace A;". The result is a compilation failure because the reference to "Example" is ambiguous.