I have the following custom kernel function:
kernel void fooBar
(
    const __global float* input,
    int num_dims_in,
    const __global int* dims_in,
    const __global int* stride_in,
    float offset_x,
    __global float* output,
    int num_dims_out,
    const __global int* dims_out,
    const __global int* stride_out
)
{
    int w = dims_in[0];
    int h = dims_in[1];
    int x = get_global_id(0);
    int y = get_global_id(1);
    int z = get_global_id(2);
    int index = x + y * w + z * w * h;
    output[index] = (input[index] > 0) ? input[index] : input[index] * offset_x; // Here
    return;
}
This kernel works fine with the optimized FP32 model format.
But when I convert the model to FP16 (using the Model Optimizer, MO), all of my final layer outputs are NaN.
I think the error comes from the kernel function above.
If I write
output[index] = (input[index] > 0) ? input[index] : input[index] * 0;
instead of
output[index] = (input[index] > 0) ? input[index] : input[index] * offset_x;
I get non-NaN values.
Can somebody help me with this?
Hi,
I was able to fix this by using vstore_half() and vload_half(), and by declaring the buffers as half instead of float.
Previously, the multiplication by offset_x was converting the data into 32-bit floating point format.
Regards,
Deepak Chembakassery Rajendran
I have modified my function as follows:
kernel void foobar (
    const __global half* input,
    int num_dims_in,
    const __global int* dims_in,
    const __global int* stride_in,
    float offset_x,
    __global half* output,
    int num_dims_out,
    const __global int* dims_out,
    const __global int* stride_out
)
{
    int w = dims_in[0];
    int h = dims_in[1];
    int d = dims_in[2];
    int x = get_global_id(0);
    int y = get_global_id(1);
    int z = get_global_id(2);
    int index = x + y * w + z * w * h;
    // Load the half value once as a float, apply the operation, and store it back as half.
    float value = vload_half(index, input);
    vstore_half((value > 0) ? value : value * offset_x, index, output);
    return;
}
This works fine; I am getting the correct output.
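For anyone who wants to see what the half-typed version does differently, here is a minimal host-side sketch in plain C. It rests on an assumption that is not confirmed in this thread: that with an FP16 model the kernel receives buffers of 16-bit values. The helper names (half_bits_to_float, leaky_relu_from_half, two_halves_read_as_float) are hypothetical and exist only for this illustration. The first one mimics conceptually what vload_half() returns, and the last one shows how two adjacent 16-bit values look when they are read as one 32-bit float, as a float* pointer would do.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without violating aliasing rules. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Decode one IEEE 754 binary16 bit pattern to float.
 * Hypothetical helper: roughly what vload_half() yields for one element. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t exp  = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t man  = (uint32_t)h & 0x3FFu;

    if (exp == 0) {
        /* Zero or subnormal: magnitude is man * 2^-24, exact in float. */
        float mag = (float)man * (1.0f / 16777216.0f);
        return sign ? -mag : mag;
    }
    if (exp == 31) {
        /* Infinity or NaN: all-ones exponent, mantissa carried over. */
        return bits_to_float(sign | 0x7F800000u | (man << 13));
    }
    /* Normal number: rebias the exponent from 15 to 127. */
    return bits_to_float(sign | ((exp + 112u) << 23) | (man << 13));
}

/* Same per-element operation as the kernel, applied to a decoded half. */
static float leaky_relu_from_half(uint16_t h, float offset_x)
{
    float value = half_bits_to_float(h);
    return (value > 0.0f) ? value : value * offset_x;
}

/* What a float-typed read sees when the buffer actually holds halves:
 * two neighbouring 16-bit elements fused into one 32-bit pattern
 * (little-endian element order assumed). */
static float two_halves_read_as_float(uint16_t first, uint16_t second)
{
    return bits_to_float((uint32_t)first | ((uint32_t)second << 16));
}
```

With these helpers, two neighbouring halves that both encode 1.0 (bit pattern 0x3C00) decode to 1.0 individually, but read together as a single float they give a small unrelated value. A float-typed index would also step through the buffer twice as fast as a half-typed one. Whether that is the exact source of the NaN values in the original kernel is not something this sketch proves.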