The results for a fixed set of input data vary depending on whether I run the inference on the CPU or GPU. I have observed this on a range of different networks, but as an example I created a very simple network with 3 layers (convolution, pooling and deconvolution). I initialised the weights to Gaussians for the conv and deconv layers and then converted the network to Intel IR (from Caffe).
I added some basic diagnostics to the segmentation sample to print out the min/max of each output channel. Below is the output from running the sample with -d "GPU" and -d "CPU" (all other parameters remain the same). As you can see, the channel min/max values are quite different, and visually the outputs of the two runs are far from similar.
From some other tests with other networks it appears that the CPU version is closer to the truth than the GPU version. I also noted that if I remove the deconvolution and maxpool layers, leaving just a convolution layer, then the channel min/max results are identical. Perhaps there is some issue with the GPU-accelerated deconvolution?
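To quantify how far apart the two backends are (rather than eyeballing the min/max lines), a divergence check along these lines can be used. This is a minimal illustrative sketch, assuming both output blobs have been dumped to flat float arrays; it is not part of the sample itself:

```python
import numpy as np

def max_abs_diff(cpu_out, gpu_out):
    """Worst-case element-wise divergence between two output blobs."""
    cpu = np.asarray(cpu_out, dtype=np.float32)
    gpu = np.asarray(gpu_out, dtype=np.float32)
    return float(np.max(np.abs(cpu - gpu)))
```

Small rounding-level differences between backends are expected; differences on the order of 0.1 against values of magnitude 0.1 to 0.6, as seen below, are not.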
See output below - Intel IR network attached.
```
.\segmentation_sample.exe -i .\img.png -m convolution.xml -d "GPU"
InferenceEngine:
        API version ............ 1.0
        Build .................. 5852
[ INFO ] Parsing input parameters
[ INFO ] No extensions provided
[ INFO ] Loading plugin
        API version ............ 0.1
        Build .................. prod-02709
        Description ....... clDNNPlugin
[ INFO ] Loading network files
[ INFO ] Preparing input blobs
[ INFO ] Batch size is 1
input item.first = input
input channels = 3
width: 100
height: 100
[ INFO ] Preparing output blobs
output item.first = deconv
[ INFO ] Loading model to the plugin
[ INFO ] Start inference (1 iterations)
Average running time of one iteration: 1.17634 ms
[ INFO ] Processing output blobs
Output: W:101 H:101 C:3 N:1
channel: 0 min: -0.152521 max: 0.245878
channel: 1 min: -0.193329 max: 0.589803
channel: 2 min: -0.126119 max: 0.349479
[ INFO ] Output file : out_dat.bmp was created
[ INFO ] Execution successfull
```
```
.\segmentation_sample.exe -i .\img.png -m convolution.xml -d "CPU"
InferenceEngine:
        API version ............ 1.0
        Build .................. 5852
[ INFO ] Parsing input parameters
[ INFO ] No extensions provided
[ INFO ] Loading plugin
        API version ............ 1.0
        Build .................. win_2018.0.20170425
        Description ....... MKLDnnPlugin
[ INFO ] Loading network files
[ INFO ] Preparing input blobs
[ INFO ] Batch size is 1
input item.first = input
input channels = 3
width: 100
height: 100
[ INFO ] Preparing output blobs
output item.first = deconv
[ INFO ] Loading model to the plugin
[ INFO ] Start inference (1 iterations)
Average running time of one iteration: 0.384 ms
[ INFO ] Processing output blobs
Output: W:101 H:101 C:3 N:1
channel: 0 min: -0.267345 max: 0.402487
channel: 1 min: -0.121432 max: 0.610033
channel: 2 min: -0.263773 max: 0.426556
[ INFO ] Output file : out_dat.bmp was created
[ INFO ] Execution successfull
```
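The min/max lines in the logs above come from a diagnostic along these lines. This is a sketch in Python/NumPy for illustration (the actual modification was made to the C++ segmentation sample); it assumes the output blob is read as a flat float buffer in CHW layout:

```python
import numpy as np

def channel_min_max(blob, channels):
    """Per-channel (min, max) of a flat float output buffer in CHW layout."""
    per_channel = np.asarray(blob, dtype=np.float32).reshape(channels, -1)
    return [(float(c.min()), float(c.max())) for c in per_channel]
```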
I am running the following OpenCL devices/platforms:
```
PS D:\Google-Drive\clients\splitmedialabs\code\opencl-info\x64\Release> .\opencl-info.exe
1. Device: Ellesmere
 1.1 Hardware version: OpenCL 2.0 AMD-APP (2442.9)
 1.2 Software version: 2442.9
 1.3 OpenCL C version: OpenCL C 2.0
 1.4 Parallel compute units: 36
2. Device: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
 2.1 Hardware version: OpenCL 1.2 AMD-APP (2442.9)
 2.2 Software version: 2442.9 (sse2,avx)
 2.3 OpenCL C version: OpenCL C 1.2
 2.4 Parallel compute units: 4
1. Device: Intel(R) HD Graphics 630
 1.1 Hardware version: OpenCL 2.1
 1.2 Software version: 23.20.16.4901
 1.3 OpenCL C version: OpenCL C 2.0
 1.4 Parallel compute units: 24
2. Device: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
 2.1 Hardware version: OpenCL 2.1 (Build 2)
 2.2 Software version: 7.5.0.2
 2.3 OpenCL C version: OpenCL C 2.0
 2.4 Parallel compute units: 4
1. Platform
 1.1 Name : AMD Accelerated Parallel Processing
 1.2 Vendor : Advanced Micro Devices, Inc.
 1.3 Version : OpenCL 2.0 AMD-APP (2442.9)
 1.4 Profile : FULL_PROFILE
 1.5 Extensions : cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
2. Platform
 2.1 Name : Intel(R) OpenCL
 2.2 Vendor : Intel(R) Corporation
 2.3 Version : OpenCL 2.1
 2.4 Profile : FULL_PROFILE
 2.5 Extensions : cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_fp64 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
```
Any help appreciated.
Hi Julien,
Could you please also share the modified sample?
Best wishes,
Anna
Hi Anna,
Were you able to replicate the issue?
Thanks
Julien.
Hi Julien,
I was able to reproduce this issue with CV SDK R3. I have good news for you: the bug has already been fixed, and the fix will be included in the next release.
Best wishes,
Anna
Hi Anna,
Thanks - that is good news. When is the next release due?
Is there a way to get access to the fix now, e.g. by using the master branch of the clDNN GitHub project and recompiling?
--
Julien.