Dear Community Members!
I have a program that performs some calculations using an OpenCL kernel running on the CPU. I also perform the same calculations without OpenCL and compare the results to check for errors.
The problem is that I get errors from time to time when I use online compilation of the OpenCL kernel. I don't get errors when I use offline compilation.
Could anyone please help me and explain why this happens?
You can find files here:
https://github.com/NataAb/opencl_pr1
For offline compilation I use:
ioc64 -cmd=build -input=prog1.cl -device=cpu -ir=prog1.bc
Platform: Intel(R) OpenCL Vendor: Intel(R) Corporation Version: OpenCL 1.2 LINUX
CPU: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
OS: Ubuntu 12.04.5 LTS
Hi Natalia,
I tried your program on Windows 8.1 with the latest OpenCL driver, and it fails consistently with both offline and online compilation:
Online compilation:
TEST started
16 started
i is 0 j is 2 x_1 is -0.010763 ar1 1.60594e+032
16 Compare results failed at step = 5, errors = 1
TEST started
16 started
i is 12 j is 7 x_1 is -0.0522771 ar1 -0.042029
16 Compare results failed at step = 956, errors = 1
TEST started
16 started
i is 12 j is 7 x_1 is -0.0513037 ar1 -0.0410118
16 Compare results failed at step = 958, errors = 1
Offline compilation:
TEST started
allowed workgroup size is 4096
16 started
i is 12 j is 7 x_1 is -0.0522771 ar1 -0.0421079
16 Compare results failed at step = 956, errors = 1
TEST started
allowed workgroup size is 4096
16 started
i is 12 j is 7 x_1 is -0.0513037 ar1 -0.0410526
16 Compare results failed at step = 958, errors = 1
TEST started
allowed workgroup size is 4096
16 started
i is 0 j is 2 x_1 is -0.000729849 ar1 1.60594e+032
16 Compare results failed at step = 0, errors = 1
So I guess the problem is with the code itself.
Dear Robert,
Thank you for your quick answer!
As you can see, the test fails at a seemingly random step, which means that the kernel works correctly in some cases. Could incorrect use of
barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
cause such strange behaviour?
Dear Natalia,
From a cursory look at your kernel, the barriers appear to be in the right places. Unfortunately, with code this complex the only option is to add printf calls throughout and carefully compare the intermediate results with what you expect. The errors don't look all that random to me: they frequently occur at i = 12, j = 7 and at i = 0, j = 2. You also get two types of discrepancies: one looks like an overflow and always occurs at i = 0, j = 2; the other looks like accumulated error. Note too that the issues are at i = 0 and i = 12, which are boundaries, so I would carefully check the bounds of your local arrays and make sure that you are initializing everything properly.
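To tell the two kinds of discrepancy apart automatically, something like the following host-side check might help. It is only a sketch in plain C, not code from the linked repository: the names `classify` and `compare_grids`, the enum values, and the thresholds (`rel_tol`, `blowup_factor`) are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical categories for a host-vs-kernel mismatch. */
typedef enum { MATCH, DRIFT, BLOWUP } mismatch_kind;

/* Absolute value helper, written out so the sketch does not need -lm. */
static float absf(float x) { return x < 0.0f ? -x : x; }

/* Classify one reference/kernel value pair.
 * BLOWUP: kernel value is non-finite or far larger than the reference
 *         (looks like garbage or overflow).
 * DRIFT:  values have similar magnitude but differ by more than rel_tol.
 * Both thresholds are illustrative assumptions. */
static mismatch_kind classify(float ref, float got,
                              float rel_tol, float blowup_factor)
{
    assert(rel_tol > 0.0f && blowup_factor > 1.0f);
    if (!isfinite(got)) return BLOWUP;

    float scale = absf(ref) > 1e-6f ? absf(ref) : 1e-6f;
    if (absf(got) > blowup_factor * scale) return BLOWUP;
    if (absf(got - ref) > rel_tol * scale) return DRIFT;
    return MATCH;
}

/* Scan a rows x cols grid stored row by row. Returns the number of
 * mismatches and stores the indices of the first one in *first_i, *first_j. */
static size_t compare_grids(const float *ref, const float *got,
                            size_t rows, size_t cols, float rel_tol,
                            size_t *first_i, size_t *first_j)
{
    size_t errors = 0;
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            size_t k = i * cols + j;
            if (classify(ref[k], got[k], rel_tol, 1e6f) != MATCH) {
                if (errors == 0) { *first_i = i; *first_j = j; }
                ++errors;
            }
        }
    }
    return errors;
}
```

With these assumed thresholds, the pair reported at i = 0, j = 2 (x_1 = -0.010763, ar1 = 1.60594e+032) would fall under BLOWUP, while the pair at i = 12, j = 7 (x_1 = -0.0522771, ar1 = -0.042029) would fall under DRIFT.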
Dear Robert, I think I have found the answer. The error occurred because I did not initialize part of the local memory with zeroes.
Thank you for your help.
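The thread does not show the actual fix, so the following is only a host-side emulation in plain C of how uninitialized local memory can produce this kind of boundary error. Everything in it is an assumption made for illustration: the 3-point stencil, the names `stencil_tile`, `TILE` and `HALO`, and the `scratch` buffer that stands in for a `__local` array whose contents are left over from earlier use.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* hypothetical work-group size */
#define HALO 1  /* one ghost cell on each side of the tile */

/* 3-point sum over one tile. scratch has TILE + 2*HALO elements and plays
 * the role of __local memory: it is reused between calls and nothing
 * clears it automatically. zero_halo selects whether the ghost cells are
 * explicitly zeroed before use. */
static void stencil_tile(const float *in, float *out, size_t n,
                         size_t tile_start, float *scratch, int zero_halo)
{
    assert(tile_start + TILE <= n);

    if (zero_halo) {
        scratch[0] = 0.0f;            /* left ghost cell  */
        scratch[TILE + HALO] = 0.0f;  /* right ghost cell */
    }

    /* Interior cells are always written. */
    for (size_t l = 0; l < TILE; ++l)
        scratch[l + HALO] = in[tile_start + l];

    /* Ghost cells are only written when a neighbour exists, so at the
     * domain boundary they keep whatever was there before. */
    if (tile_start > 0)
        scratch[0] = in[tile_start - 1];
    if (tile_start + TILE < n)
        scratch[TILE + HALO] = in[tile_start + TILE];

    /* In a kernel, a barrier(CLK_LOCAL_MEM_FENCE) would go here. */

    for (size_t l = 0; l < TILE; ++l)
        out[tile_start + l] = scratch[l] + scratch[l + 1] + scratch[l + 2];
}
```

If `scratch` holds stale values and `zero_halo` is 0, only the outputs at the domain boundary are corrupted, which would be consistent with errors that show up at boundary indices and only on some runs. In a real kernel each work-item would typically write its own element of the local array, with a `barrier(CLK_LOCAL_MEM_FENCE)` between filling the array and reading it.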