OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU

Same OpenCL kernel gets different results on CPU and IGP

chunhong_zhang08
Hi,
I wrote a very simple program to test shared resources. My environment: i7-3770K, Windows 7 64-bit, Intel OpenCL SDK 2013. My project is attached. When the OpenCL kernel runs on the CPU device, the results are correct. However, when it runs on the GPU device, the results are wrong. Can anyone help me explain this problem?

Thanks in advance.

By the way, when I tried to debug the OpenCL source code, the debugger didn't work. On another PC, however, it works. I don't know why. Could a Windows update cause this issue? I also noticed that someone else ran into the same problem: http://redfort-software.intel.com/en-us/forums/showthread.php?t=101929

Sorry, I cannot attach my project, so I am posting the source code directly.

Main.cpp

#define __CL_ENABLE_EXCEPTIONS
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS

#include <CL/cl.hpp>
#include <iostream>
#include <fstream>

#define DATASIZE 512
#define WORKSIZE 32
#define LOCALSIZE 16

//#define CPU

using namespace std;

int main()
{
    vector<cl::Platform> platforms;
    vector<cl::Device> cpuDevices, gpuDevices, allDevices;
    cl_uint minAlign;

    try {
        // Query the first platform and its CPU and GPU devices.
        cl::Platform::get(&platforms);
        cout << "Platform number: " << platforms.size() << endl;
        cout << "Platform name: " << platforms[0].getInfo<CL_PLATFORM_NAME>() << endl << endl;

        platforms[0].getDevices(CL_DEVICE_TYPE_CPU, &cpuDevices);
        cout << "CPU device number: " << cpuDevices.size() << endl;
        cout << "Device CPU name: " << cpuDevices[0].getInfo<CL_DEVICE_NAME>() << endl;
        cout << "Compute Units: " << cpuDevices[0].getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>() << endl;
        cout << "Preferred Float Vector Width: " << cpuDevices[0].getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT>() << endl << endl;

        platforms[0].getDevices(CL_DEVICE_TYPE_GPU, &gpuDevices);
        cout << "GPU device number: " << gpuDevices.size() << endl;
        cout << "Device GPU name: " << gpuDevices[0].getInfo<CL_DEVICE_NAME>() << endl;
        cout << "Compute Units: " << gpuDevices[0].getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>() << endl;
        cout << "Preferred Float Vector Width: " << gpuDevices[0].getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT>() << endl << endl;

        platforms[0].getDevices(CL_DEVICE_TYPE_ALL, &allDevices);

        size_t time_resolution = cpuDevices[0].getInfo<CL_DEVICE_PROFILING_TIMER_RESOLUTION>();
        cout << "cpu device profiling resolution: " << time_resolution << endl;
        time_resolution = gpuDevices[0].getInfo<CL_DEVICE_PROFILING_TIMER_RESOLUTION>();
        cout << "gpu device profiling resolution: " << time_resolution << endl;

        minAlign = cpuDevices[0].getInfo<CL_DEVICE_MEM_BASE_ADDR_ALIGN>();
        cout << "CPU device memory align: " << minAlign << endl;
        minAlign = gpuDevices[0].getInfo<CL_DEVICE_MEM_BASE_ADDR_ALIGN>();
        cout << "GPU device memory align: " << minAlign << endl;

        // Host allocations that back the CL_MEM_USE_HOST_PTR buffers.
        cl_float* g_pfInput = (cl_float*) _aligned_malloc(DATASIZE * sizeof(cl_float), minAlign);
        cl_float* g_pfOutput = (cl_float*) _aligned_malloc(DATASIZE * sizeof(cl_float), minAlign);

        for(int i = 0; i < DATASIZE; i++)
        {
            g_pfInput[i] = i;
            g_pfOutput[i] = -1;
        }

#ifdef CPU
        cl::Context context(cpuDevices);
#else
        cl::Context context(gpuDevices);
#endif

        // Load the kernel source from disk.
        std::ifstream programFile("oclWriteBuffer.cl");
        std::string programString(std::istreambuf_iterator<char>(programFile), (std::istreambuf_iterator<char>()));
        cl::Program::Sources source(1, std::make_pair(programString.c_str(), programString.length() + 1));
        cl::Program program(context, source);

        try {
#ifdef CPU
            program.build(cpuDevices, "-g -s \"D:\\Nick\\OpenCL\\Tutorial1\\oclShareMemoryTest\\oclShareMemoryTest\\oclWriteBuffer.cl\"");
#else
            program.build(gpuDevices, "-g -s \"D:\\Nick\\OpenCL\\Tutorial1\\oclShareMemoryTest\\oclShareMemoryTest\\oclWriteBuffer.cl\"");
#endif
        }
        catch(cl::Error e)
        {
            cout << "WriteBuffer: Build Error" << endl;
            cout << e.what() << ": Error code " << e.err() << endl << endl;
            string log;
#ifdef CPU
            log = program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(cpuDevices[0]);
#else
            log = program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(gpuDevices[0]);
#endif
            cout << log << endl << endl;
        }

        cl::Kernel writeKernel(program, "WriteBuffer");

        cl::Buffer inputBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, DATASIZE, g_pfInput);
        cl::Buffer outputBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, DATASIZE, g_pfOutput);

        writeKernel.setArg(0, inputBuffer);
        writeKernel.setArg(1, outputBuffer);
        writeKernel.setArg(2, DATASIZE);

#ifdef CPU
        cl::CommandQueue cmdQueue(context, cpuDevices[0]);
#else
        cl::CommandQueue cmdQueue(context, gpuDevices[0]);
#endif

        cmdQueue.enqueueNDRangeKernel(writeKernel, 0, WORKSIZE, LOCALSIZE);
        cmdQueue.finish();
    }
    catch (cl::Error e) {
        cout << e.what() << ": Error code " << e.err() << endl;
    }

    return 0;
}


 

 


oclWriteBuffer.cl file

// TODO: Add OpenCL kernel code here.
__kernel
void WriteBuffer(__global float* pfInput, __global float* pfOutput, int nLength)
{
    size_t nOffset = get_global_size(0);
    size_t nGID = get_global_id(0);

    // Each work-item copies every nOffset-th element, starting at its global ID.
    for (int i = 0; i < nLength / nOffset; i++)
    {
        pfOutput[i * nOffset + nGID] = pfInput[i * nOffset + nGID];
    }
}

3 Replies
Raghupathi_M_Intel

Hi,

Let me take a look and get back to you.

Thanks,
Raghu

Raghupathi_M_Intel
I took a look at your code, built a 64-bit version, and ran it on both the CPU and the GPU. I cannot see any difference in the output. Moreover, you are executing the kernel but never reading the output back, so I really don't know what problem you are facing. Can you give more details?

Thanks,
Raghu
chunhong_zhang08
I just used debug mode to watch the memory block for g_pfOutput. When the kernel ran on the CPU device, the results were right. However, when it ran on the GPU device, the results were wrong. I think I don't need to read the output back and can use the results directly, since g_pfOutput is a shared resource. Is my understanding right?
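
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the posted Main.cpp passes DATASIZE (512) as the size of both cl::Buffer objects, and that argument is a size in bytes, while the host arrays hold DATASIZE floats. The sketch below is plain host-side C++ with no OpenCL calls. It replays the index pattern of the WriteBuffer kernel to show how many bytes the work-items touch. The helper names elementsTouched and bytesTouched are made up for this illustration, and a 4-byte float is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: replays the WriteBuffer kernel's indexing on the host.
// Returns one past the largest element index that any work-item accesses.
std::size_t elementsTouched(std::size_t globalSize, std::size_t length) {
    assert(globalSize > 0);
    std::size_t maxIndex = 0;
    bool touchedAny = false;
    for (std::size_t gid = 0; gid < globalSize; ++gid) {
        // Same loop bound as the kernel: nLength / nOffset iterations.
        for (std::size_t i = 0; i < length / globalSize; ++i) {
            const std::size_t idx = i * globalSize + gid;
            if (idx > maxIndex) maxIndex = idx;
            touchedAny = true;
        }
    }
    return touchedAny ? maxIndex + 1 : 0;
}

// Hypothetical helper: the same extent expressed in bytes (float elements).
std::size_t bytesTouched(std::size_t globalSize, std::size_t length) {
    return elementsTouched(globalSize, length) * sizeof(float);
}
```

With the values from the post (WORKSIZE 32, DATASIZE 512), elementsTouched(32, 512) covers all 512 elements, so bytesTouched(32, 512) is 512 * sizeof(float), which is larger than the 512 bytes requested for each buffer. If that is the cause, passing DATASIZE * sizeof(cl_float) as the buffer size would be the change to try. Separately, the OpenCL specification allows an implementation to cache the contents of a CL_MEM_USE_HOST_PTR buffer in device memory, so mapping the buffer (or reading it back) before inspecting g_pfOutput is the portable way to be sure the host copy is current.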