Solved: Fat binary with PTX backend

Viet-Duc · ‎06-04-2021

Hi,

To compile SYCL code for a specific NVIDIA device, I've used the following:

clang++ \
  -fsycl \
  -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
  -fsycl-unnamed-lambda \
  -Xsycl-target-backend "--cuda-gpu-arch=sm_35" \
  test.cpp

Is there a way to generate a fat binary containing code for several sm_* compute capabilities?

I have gone through the manual to no avail.

Thanks

RahulV_intel · ‎06-07-2021

Hi,

Could you try the following command and let us know?

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice sample.cpp -o sample

IMO, the above command should work on any SM architecture.

Thanks,

Rahul

Viet-Duc · ‎06-07-2021

Thanks for the suggestion.

But if you remove '-Xsycl-target-backend', the program fails at runtime, for instance on a Tesla K40:

PI CUDA ERROR:
        Value:           209
        Name:            CUDA_ERROR_NO_BINARY_FOR_GPU
        Description:     no kernel image is available for execution on the device
        Function:        build_program
        Source Location: .../apps/src/llvm/unstable/sycl/plugins/cuda/pi_cuda.cpp:516


PI CUDA ERROR:
        Value:           400
        Name:            CUDA_ERROR_INVALID_HANDLE
        Description:     invalid resource handle
        Function:        cuda_piProgramRelease
        Source Location: .../apps/src/llvm/unstable/sycl/plugins/cuda/pi_cuda.cpp:2938

The program was built for 1 devices
Build program log for 'Tesla K40m':
 -999 (Unknown OpenCL error code)

If I force sm_35, the binary will not work on other GPUs such as the V100.

Since the manual is quite terse, I asked the question in case I had overlooked something.
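
The thread does not settle whether a single fat binary is possible, so the following is only a workaround sketch: it prints one build command per architecture, reusing the flags from the original post. The architecture list (sm_35 for the K40, sm_70 for the V100), the `build_cmd` helper, and the output naming scheme are assumptions made for illustration, not something DPC++ prescribes.

```shell
#!/bin/sh
# Sketch: emit one clang++ command per target architecture.
# Flags are copied from the original post; everything else is hypothetical.

# build_cmd ARCH SRC: print the build command for a single sm_* target.
build_cmd() {
  arch=$1
  src=$2
  out="${src%.cpp}_${arch}"   # assumed naming scheme, e.g. test_sm_35
  printf '%s\n' "clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice -fsycl-unnamed-lambda -Xsycl-target-backend \"--cuda-gpu-arch=${arch}\" ${src} -o ${out}"
}

# Dry run: only print the commands for the two GPUs mentioned in the thread.
for arch in sm_35 sm_70; do
  build_cmd "$arch" test.cpp
done
```

This produces a separate executable per GPU generation rather than a fat binary, so the matching executable still has to be chosen on each machine.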

RahulV_intel · ‎06-08-2021

Hi,

According to the documentation, the CUDA backend supports architectures SM 50 and above.

https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md

Please note that only the GitHub version of DPC++ supports the CUDA backend. If you have further queries, please raise a new issue at the link below:

https://github.com/intel/llvm/issues

Thanks,

Rahul

Viet-Duc · ‎06-09-2021

Hi,

You are right.

For CUDA-related questions, I should have asked the intel-llvm developers instead.

I will raise the issue on the GitHub page. Nevertheless, thanks for your time.

Regards.

RahulV_intel · ‎06-14-2021

Intel will no longer monitor this thread. Further discussions on this thread will be considered community only.