
dpct crashing while converting cuda code

Hi,

I am trying to convert the CUDA code for convnet (https://github.com/TorontoDeepLearning/convnet) with dpct. The tool crashes while converting cudamat_conv_imgacts.cu and cudamat_conv_weightacts.cu, and it reports SIGSEGV for cudamat.cu.

Crash dump for cudamat_conv_imgacts.cu:

Stack dump:
0.      Program arguments: dpct --cuda-include-path=C:\nvcc\include --out-root=dpct cudamat_conv_imgacts.cu
0x00007FFCED44A799 (0x0000002EC318F830 0x0000006300000000 0x0000000000000000 0x0000013E12030CC0), RaiseException() + 0x69 bytes(s)
0x00007FFCDBBA485D (0x00007FFCCBB40000 0x0000013E1723FF30 0x0000013E1224BE20 0x0000013E00000000), _CxxThrowException() + 0xAD bytes(s)
0x00007FFCCBB764D2 (0x0000000000000013 0x00007FFCED062596 0x0000000000000002 0x0000000000000020), ?_Xout_of_range@std@@YAXPEBD@Z() + 0x22 bytes(s)
0x00007FF6E66D8CF0 (0x0000013E1723FEF0 0x00007FF6E75664E3 0x0000000000000020 0x00007FF6E794CEB8)
0x00007FF6E66E884E (0x0000002EC318F960 0x0000013E173317B0 0x0000000000000000 0x00007FFCED062596)
0x00007FF6E682886E (0x0000013E173317B0 0x0000013E173317E0 0x0000002EC318F960 0x0000013E1540D8C0)
0x00007FF6E6705CAD (0x0000013E1747CB20 0x0000013E1747C160 0x0000013E15BDDE30 0x0000013E1747C160)
0x00007FF6E6702F2F (0x0000013E15BDDE58 0x00007FF6E7610E10 0x00008C027062629E 0x0000013E175B48E0)
0x00007FF6E670308A (0x0000013E17390BF0 0x0000013E13E4B270 0x0000013E13E4B270 0x0000013E15BDDDE0)
0x00007FF6E66F91F9 (0x0000013E15BDDDE0 0x0000013E17485720 0x0000013E12139BC0 0x00007FF6E794B8F0)
0x00007FF6E66FC61D (0x00007FF6E794B8F0 0x0000013E12243E40 0x0000002EC318FC80 0x00007FF6E794CC40)
0x00007FF6E66DED2E (0x0000000000000000 0x0000000000000000 0x00007FFCED13B590 0x0000000000000000)
0x00007FF6E7566440 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000)
0x00007FFCEE047BD4 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), BaseThreadInitThunk() + 0x14 bytes(s)
0x00007FFCEF6ACE51 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), RtlUserThreadStart() + 0x21 bytes(s)

Crash dump for cudamat_conv_weightacts.cu:

Stack dump:
0.      Program arguments: dpct --cuda-include-path=C:\nvcc\include --out-root=dpct cudamat_conv_weightacts.cu
0x00007FFCED44A799 (0x00000071B038F160 0x0000001800000000 0x0000000000000000 0x000001C285D30CC0), RaiseException() + 0x69 bytes(s)
0x00007FFCDBBA485D (0x00007FFCCBB40000 0x000001C28B215FF0 0x000001C287C787B0 0x000001C200000000), _CxxThrowException() + 0xAD bytes(s)
0x00007FFCCBB764D2 (0x0000000000000015 0x00007FFCED062596 0x0000000000000002 0x0000000000000020), ?_Xout_of_range@std@@YAXPEBD@Z() + 0x22 bytes(s)
0x00007FF6E66D8CF0 (0x000001C28B215D40 0x00007FF6E75664E3 0x0000000000000020 0x00007FF6E794CEB8)
0x00007FF6E66E884E (0x00000071B038F290 0x000001C28B16F490 0x0000000000000000 0x00007FFCED062596)
0x00007FF6E682886E (0x000001C28B16F490 0x000001C28B16F4C0 0x00000071B038F290 0x000001C28912FDF0)
0x00007FF6E6705CAD (0x000001C28AE1A7B0 0x000001C28AE1A370 0x000001C289B5D2C0 0x000001C28AE1A370)
0x00007FF6E6702F2F (0x000001C289B5D2E8 0x00007FF6E7610E10 0x00002AD1096D18BC 0x000001C28B1CE460)
0x00007FF6E670308A (0x000001C28B130EB0 0x000001C289CE5140 0x000001C289CE5140 0x000001C289B5D270)
0x00007FF6E66F91F9 (0x000001C289B5D270 0x000001C28B1B3970 0x000001C285F891A0 0x00007FF6E794B8F0)
0x00007FF6E66FC61D (0x00007FF6E794B8F0 0x000001C287C6ED40 0x00000071B038F5B0 0x00007FF6E794CC40)
0x00007FF6E66DED2E (0x0000000000000000 0x0000000000000000 0x00007FFCED13B590 0x0000000000000000)
0x00007FF6E7566440 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000)
0x00007FFCEE047BD4 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), BaseThreadInitThunk() + 0x14 bytes(s)
0x00007FFCEF6ACE51 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), RtlUserThreadStart() + 0x21 bytes(s)
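
A note on reading the two stack dumps above: the frame `?_Xout_of_range@std@@YAXPEBD@Z` is the MSVC-mangled name of `std::_Xout_of_range(const char*)`, the internal helper the MSVC standard library uses to throw `std::out_of_range`. Both traces therefore suggest an uncaught `std::out_of_range` inside dpct. The frames below it are not symbolized, so the trace does not say which call raised it. The sketch below is plain standard C++ and is not taken from dpct's source; it only shows two common bounds-checked calls that throw this exception type. The names `throws_out_of_range`, `vector_at_past_end`, `substr_past_end`, and `check_examples` are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: runs a callable and reports whether it threw
// std::out_of_range.
template <typename F>
bool throws_out_of_range(F&& f) {
    try {
        f();
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// std::vector::at is bounds-checked: index 3 is past the end of a
// three-element vector, so it throws std::out_of_range.
bool vector_at_past_end() {
    std::vector<int> v{1, 2, 3};
    return throws_out_of_range([&] { (void)v.at(3); });
}

// std::string::substr throws std::out_of_range when the start position
// is greater than the string length (4 > 3 here).
bool substr_past_end() {
    std::string s = "abc";
    return throws_out_of_range([&] { (void)s.substr(4); });
}

// Small self-check that can be called from main().
void check_examples() {
    assert(vector_at_past_end());
    assert(substr_past_end());
}
```

If such an exception is never caught, the process terminates, which is consistent with the `RaiseException` and `_CxxThrowException` frames at the top of each dump.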

For cudamat.cu: 

Meet signal:SIGSEGV
Intel(R) DPC++ Compatibility Tool tries to give analysis reports and terminates...

dpct version: 

Intel(R) DPC++ Compatibility Tool Version: 2021.1-beta04 codebase:(401d3826c9c3a0224459b0b189bd9bb9a0421912)

System:

Windows 10 Pro (64-bit)

Intel i5-6200U CPU @2.30GHz 2.40GHz

When I try to upload the relevant files I get an error, so here are the URLs of the files:

https://github.com/TorontoDeepLearning/convnet/blob/master/cudamat/cudamat.cu

https://github.com/TorontoDeepLearning/convnet/blob/master/cudamat/cudamat_conv_imgacts.cu

https://github.com/TorontoDeepLearning/convnet/blob/master/cudamat/cudamat_conv_weightacts.cu

I am able to convert a few other CUDA files successfully, but the conversion fails for these three files.

 

Regards,

Gagan

5 Replies
AbhishekD_Intel
Moderator

Hi Gagandeep,

Please try migrating the application with the Intel oneAPI Beta-06 toolkit. Many bugs have been fixed in this release.

You can download the toolkit from the link below:

https://software.intel.com/content/www/us/en/develop/tools/oneapi.html

 

Warm Regards,

Abhishek

AbhishekD_Intel
Moderator

Hi Gagandeep,

Could you please give us an update on your problem?

 

 

172 Views

After updating the compiler, I am able to build most of my files, but the conversion of cudamat.cu still fails with this error:

Meet signal:SIGSEGV
Intel(R) DPC++ Compatibility Tool tries to give analysis reports and terminates...

Complete output is:

C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:40:27: warning: DPCT1027:0: The call to cublasGetError was replaced with 0, because this call is redundant in DPC++.
    cublasStatus status = cublasGetError();
                          ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:44:23: warning: DPCT1010:1: SYCL uses exceptions to report errors and does not use the error codes. The call was replaced with 0. You need to rewrite this code.
    cudaError_t err = cudaGetLastError();
                      ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:47:24: warning: DPCT1009:2: SYCL uses exceptions to report errors and does not use the error codes. The original code was commented out and a warning string was inserted. You need to rewrite this code.
        printf("%s\n", cudaGetErrorString( err));
                       ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:52:23: warning: DPCT1010:3: SYCL uses exceptions to report errors and does not use the error codes. The call was replaced with 0. You need to rewrite this code.
    cudaError_t err = cudaGetLastError();
                      ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:54:12: warning: DPCT1009:4: SYCL uses exceptions to report errors and does not use the error codes. The original code was commented out and a warning string was inserted. You need to rewrite this code.
    return cudaGetErrorString( err);
           ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:58:5: warning: DPCT1026:5: The call to cublasInit was removed, because this call is redundant in DPC++.
    cublasInit();
    ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:66:5: warning: DPCT1026:6: The call to cublasShutdown was removed, because this call is redundant in DPC++.
    cublasShutdown();
    ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:74:21: warning: DPCT1012:7: Detected kernel execution time measurement pattern and generated an initial code for time measurements in SYCL. You can change the way time is measured depending on your goals.
  cudaError_t err = cudaEventRecord(*t, 0);
                    ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:74:21: warning: DPCT1024:8: The original code returned the error code that was further consumed by the program logic. This original code was replaced with 0. You may need to rewrite the program logic consuming the error code.
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:76:20: warning: DPCT1009:9: SYCL uses exceptions to report errors and does not use the error codes. The original code was commented out and a warning string was inserted. You need to rewrite this code.
    printf("%s\n", cudaGetErrorString( err));
                   ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:83:21: warning: DPCT1003:10: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
  cudaError_t err = cudaStreamWaitEvent(NULL, *t, 0);
                    ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:85:20: warning: DPCT1009:11: SYCL uses exceptions to report errors and does not use the error codes. The original code was commented out and a warning string was inserted. You need to rewrite this code.
    printf("%s\n", cudaGetErrorString( err));
                   ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:92:21: warning: DPCT1027:12: The call to cudaEventCreate was replaced with 0, because this call is redundant in DPC++.
  cudaError_t err = cudaEventCreate(t);
                    ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:94:20: warning: DPCT1009:13: SYCL uses exceptions to report errors and does not use the error codes. The original code was commented out and a warning string was inserted. You need to rewrite this code.
    printf("%s\n", cudaGetErrorString( err));
                   ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:113:10: warning: DPCT1005:14: The device version is different. You need to rewrite this code.
  return prop.major >= 2;
         ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:121:3: warning: DPCT1031:15: DPC++ currently does not support memory access across peer devices. The output parameter(s) are set to 0.
  cudaDeviceCanAccessPeer(&access2from1, gpu1, gpu2);
  ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:122:3: warning: DPCT1031:16: DPC++ currently does not support memory access across peer devices. The output parameter(s) are set to 0.
  cudaDeviceCanAccessPeer(&access1from2, gpu2, gpu1);
  ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:132:5: warning: DPCT1026:17: The call to cudaDeviceEnablePeerAccess was removed, because DPC++ currently does not support memory access across peer devices.
    cudaDeviceEnablePeerAccess(gpu2, 0); //second argument is flags
    ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:134:5: warning: DPCT1026:18: The call to cudaDeviceEnablePeerAccess was removed, because DPC++ currently does not support memory access across peer devices.
    cudaDeviceEnablePeerAccess(gpu1, 0); //second argument is flags
    ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:219:12: warning: DPCT1003:19: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
    stat = cublasAlloc(len, sizeof(mat->data_device[0]), (void**)&mat->data_device);
           ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:236:12: warning: DPCT1003:20: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
    stat = cublasAlloc(size, sizeof(int), (void**)&mat->data_device.seg);
           ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:241:12: warning: DPCT1003:21: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
    stat = cublasAlloc(numboxes, sizeof(int), (void**)&mat->data_device.labels);
           ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:246:12: warning: DPCT1003:22: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
    stat = cublasAlloc(4 * numboxes, sizeof(int), (void**)&mat->data_device.boxes);
           ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:261:12: warning: DPCT1003:23: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
    stat = cublasAlloc(nnz, sizeof(mat->data_device.data[0]), (void**)&mat->data_device.data);
           ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:267:12: warning: DPCT1003:24: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
    stat = cublasAlloc(nnz, sizeof(mat->data_device.indices[0]), (void**)&mat->data_device.indices);
           ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:273:12: warning: DPCT1003:25: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
    stat = cublasAlloc(rows + 1, sizeof(mat->data_device.indptr[0]), (void**)&mat->data_device.indptr);
           ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:451:5: warning: DPCT1007:26: Migration of this CUDA API is not supported by the Intel(R) DPC++ Compatibility Tool.
    cudaMemcpyPeerAsync(dst->data_device, dst_dev, src->data_device, src_dev, len * sizeof(float));
    ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:543:16: warning: DPCT1003:27: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
        stat = cublasFree(mat->data_device);
               ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:557:16: warning: DPCT1003:28: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
        stat = cublasFree(mat->data_device.seg);
               ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:558:16: warning: DPCT1003:29: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
        stat = cublasFree(mat->data_device.labels);
               ^
C:\Users\intel\Downloads\cuda code\convnet-master\cudamat\cudamat.cu:559:16: warning: DPCT1003:30: Migrated API does not return error code. (*, 0) is inserted. You may need to rewrite this code.
        stat = cublasFree(mat->data_device.boxes);
               ^

Meet signal:SIGSEGV
Intel(R) DPC++ Compatibility Tool tries to give analysis reports and terminates...
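
Separately from the crash, most of the warnings above (DPCT1003, DPCT1009, DPCT1010, DPCT1024) come from one difference: the CUDA calls return status codes, while SYCL reports errors by throwing exceptions. The sketch below illustrates the manual rewrite those warnings ask for. It is plain C++ so that it compiles without a SYCL toolchain, and every function name in it (`fake_device_alloc`, `fake_device_free`, `alloc_comma_style`, `alloc_checked`) is a hypothetical stand-in. The `(call, 0)` shape is inferred from the DPCT1003 message text, since the migrated file itself is not shown in this thread. In real migrated code the throwing call would be a SYCL API, and in SYCL 2020 `sycl::exception` derives from `std::exception`, so the same `catch` shape would apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <exception>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for a migrated allocation call: it returns
// nothing and signals failure by throwing.
void fake_device_alloc(std::size_t bytes, void** out) {
    assert(out != nullptr);
    if (bytes == 0) {
        throw std::invalid_argument("allocation size must be non-zero");
    }
    *out = ::operator new(bytes);
}

// Hypothetical stand-in for the matching free call.
void fake_device_free(void* p) { ::operator delete(p); }

// The pattern the DPCT1003 message describes: the call is wrapped as
// (call, 0), so the comma operator makes the expression evaluate to 0.
// The status is 0 whenever the call returns; a failure escapes as an
// exception instead of reaching the caller's status check.
int alloc_comma_style(std::size_t bytes, void** out) {
    int stat = (fake_device_alloc(bytes, out), 0);
    return stat;
}

// One possible manual rewrite: catch the exception and translate it
// into the status value the surrounding code still expects.
int alloc_checked(std::size_t bytes, void** out) {
    try {
        fake_device_alloc(bytes, out);
        return 0;
    } catch (const std::exception& e) {
        std::fprintf(stderr, "allocation failed: %s\n", e.what());
        return -1;
    }
}
```

Translating exceptions back into a status code keeps the existing `stat` checks in cudamat.cu meaningful. Letting the exception propagate and removing the status checks is the other common choice.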

Regards,
Gagan

 

AbhishekD_Intel
Moderator

Hi,

We are also seeing this error on our end, so we are raising the issue with the concerned team.

 

Warm Regards,

Abhishek

Sravani_K_Intel
Employee

Hi Gagandeep,

The issue has been root-caused and fixed by Engineering, and the fix should be available in the next release of the tool.
