- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
A question from newbie.
I am trying to use a reduction algorithm from NVidia SDK. It works correctly on Nvidia Discrete GPU, Intel HD Graphics 4400, but don't work on Intel CPU (Haswell i5).
Reduction method source:
inline void WarpReductionToFirstElement( __local float *partialDotProduct) { #define WARP_SIZE 32 // Thread local ID within a warp uint id = get_local_id(0) & (WARP_SIZE - 1); // Each warp reduces 64 (default) consecutive elements float warpResult = 0.0f; if (get_local_id(0) < get_local_size(0)/2 ) { volatile __local float* p = partialDotProduct + 2 * get_local_id(0) - id; p[0] += p[32]; p[0] += p[16]; p[0] += p[8]; p[0] += p[4]; p[0] += p[2]; p[0] += p[1]; warpResult = p[0]; } // Synchronize to make sure each warp is done reading // partialDotProduct before it is overwritten in the next step barrier(CLK_LOCAL_MEM_FENCE); // The first thread of each warp stores the result of the reduction // at the beginning of partialDotProduct if (id == 0) partialDotProduct[get_local_id(0) / WARP_SIZE] = warpResult; // Synchronize to make sure each warp is done writing to // partialDotProduct before it is read in the next step barrier(CLK_LOCAL_MEM_FENCE); // Number of remaining elements after the first reduction uint size = get_local_size(0) / (2 * WARP_SIZE); // get_local_size(0) is less or equal to 512 on NVIDIA GPUs, so // only a single warp is needed for the following last reduction // step if (get_local_id(0) < size / 2) { volatile __local float* p = partialDotProduct + get_local_id(0); if (size >= 8) p[0] += p[4]; if (size >= 4) p[0] += p[2]; if (size >= 2) p[0] += p[1]; } }
Kernel source:
__kernel void TestReductionKernel( __global float * gdata, __local float * ldata) { ldata[get_local_id(0)] = gdata[get_local_id(0)]; barrier(CLK_LOCAL_MEM_FENCE); WarpReductionToFirstElement(ldata); barrier(CLK_LOCAL_MEM_FENCE); if(get_local_id(0) == 0) { gdata[0] = ldata[0]; } barrier(CLK_LOCAL_MEM_FENCE); }
Unit test (C# + OpenCL.NET):
[TestMethod] public void IntelCPUWarpReductionTest() { var deviceChooser = new IntelCPUDeviceChooser(); TestWarpReduction(deviceChooser); } private static void TestWarpReduction( IDeviceChooser deviceChooser) { if (deviceChooser == null) { throw new ArgumentNullException("deviceChooser"); } var seed = DateTime.Now.Millisecond; var randomizer = new DefaultRandomizer(ref seed); for (var size = 2; size < 64; size++) { ConsoleAmbientContext.Console.WriteLine("size = {0}", size); using (var clProvider = new CLProvider(deviceChooser, true)) { var m = clProvider.CreateFloatMem( size, MemFlags.CopyHostPtr | MemFlags.ReadWrite); for (var cc = 0; cc < m.Array.Length; cc++) { m.Array[cc] = randomizer.Next(256)/256f; } m.Write(BlockModeEnum.Blocking); var cpuSum = m.Array.Sum(); var k = @" __kernel void TestReductionKernel( __global float * gdata, __local float * ldata) { ldata[get_local_id(0)] = gdata[get_local_id(0)]; barrier(CLK_LOCAL_MEM_FENCE); WarpReductionToFirstElement(ldata); barrier(CLK_LOCAL_MEM_FENCE); if(get_local_id(0) == 0) { gdata[0] = ldata[0]; } barrier(CLK_LOCAL_MEM_FENCE); } "; var kernel = clProvider.CreateKernel(k, "TestReductionKernel"); kernel .SetKernelArgMem(0, m) .SetKernelArgLocalMem(1, size) .EnqueueNDRangeKernel( new[] {size}, new int[] {size}); m.Read(BlockModeEnum.Blocking); var gpuSum = m.Array[0]; Assert.AreEqual(cpuSum, gpuSum); } } }
So, unit test reach full success on Nvidia GPU and Intel HD Graphics but fails on Intel CPU with size = 4.
Log:
size = 2
Choosed vendor: INTEL(R) CORPORATION
size = 3
Choosed vendor: INTEL(R) CORPORATION
size = 4
Choosed vendor: INTEL(R) CORPORATIONОшибка в Assert.AreEqual. Ожидается: <2,1875>. Фактически: <1,695313>.
в MyNN.Tests.MLP2.Reduction.WarpReductionFixture.TestWarpReduction(IDeviceChooser deviceChooser) в WarpReductionFixture.cs: line 113
в MyNN.Tests.MLP2.Reduction.WarpReductionFixture.IntelCPUWarpReductionTest() в WarpReductionFixture.cs: line 38
The difference in behavior between Intel CPU and Intel GPU on SAME machine looks strange for me.
What is the reason to this problem? How can I fix the problem?
ps. if it needs I can attach a VS 2013 sample project.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Can you post your code using the <code> tags? Also what is the graphics driver version? If you can attach the whole project that'll be even better.
Thanks,
Raghu
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Raghu.
Driver version is 3.0.1.10878. Today I've updated video driver and Intel OpenCL SDK, but nothing changes. VS
2012 Solution and screenshot with tests results has attached. File WarpReduction.cl contains NVidia Reduction algorithm.
File WarpReductionFixture.cs contains unit tests and test kernel.
My laptop is Lenovo u430p with Intel Haswell i5 + Nvidia GeForce 730m.
Thanks!
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page