I have the following custom kernel function.
kernel void fooBar(
    const __global float* input,
    int num_dims_in,
    const __global int* dims_in,
    const __global int* stride_in,
    float offset_x,
    __global float* output,
    int num_dims_out,
    const __global int* dims_out,
    const __global int* stride_out)
{
    int w = dims_in[0];
    int h = dims_in[1];
    int x = get_global_id(0);
    int y = get_global_id(1);
    int z = get_global_id(2);
    int index = x + y * w + z * w * h;
    output[index] = (input[index] > 0) ? input[index] : input[index] * offset_x; // Here
    return;
}
This kernel works fine when the model is optimized in FP32 format.
But when I convert the model to FP16 format (using MO), all of my final layer outputs are NaN.
I think the error comes from the kernel function given above, on the line marked "Here".
If I write
output[index] = (input[index] > 0) ? input[index] : input[index] * 0;
instead of
output[index] = (input[index] > 0) ? input[index] : input[index] * offset_x ;
I get non-NaN values.
Can somebody help me with this?
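For anyone trying to narrow down a problem like this, a host-side reference can help separate the arithmetic from the data format. The two variants above differ only in the slope applied to non-positive inputs: multiplying by 0 gives a plain ReLU, multiplying by offset_x gives a leaky ReLU. Below is a minimal C sketch of that computation using the same flat index as the kernel. The function names `leaky_relu` and `leaky_relu_ref` are hypothetical and not part of any toolkit API, and the sketch assumes the data is laid out as w x h x d floats.

```c
#include <stddef.h>

/* Leaky ReLU for one element: positive values pass through unchanged,
   all other values are scaled by `slope` (offset_x in the kernel). */
float leaky_relu(float v, float slope)
{
    return (v > 0.0f) ? v : v * slope;
}

/* Host-side reference over a w x h x d volume (hypothetical helper).
   Uses the same flat index as the kernel:
   index = x + y * w + z * w * h. */
void leaky_relu_ref(const float *in, float *out,
                    int w, int h, int d, float slope)
{
    for (int z = 0; z < d; ++z) {
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                size_t index = (size_t)x + (size_t)y * w
                             + (size_t)z * w * h;
                out[index] = leaky_relu(in[index], slope);
            }
        }
    }
}
```

Comparing the device output against such a reference in FP32 first, and then again after switching to FP16, is one way to check whether the formula or the buffer format is at fault.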
2 Replies
Hi,
I was able to fix this by declaring the buffers with data type half instead of float and accessing them with vload_half() and vstore_half().
Earlier, the multiplication by offset_x was converting the data into 32-bit floating-point format.
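To illustrate why the element type matters, here is a small C sketch, under the assumption that an FP16 model hands the kernel buffers of 16-bit halves. `half_bits_to_float` approximates what vload_half() does for one element (a value conversion), while `two_halves_as_float` models reading the same memory through a float pointer (a bit reinterpretation, little-endian layout assumed). Both function names are hypothetical and exist only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Convert IEEE 754 binary16 bits to the float with the same value. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                       /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            int shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                ++shift;
            }
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);   /* infinity or NaN */
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Model a float-typed read of an FP16 buffer: two adjacent halves
   are taken as the raw bits of one 32-bit float. */
float two_halves_as_float(uint16_t first, uint16_t second)
{
    uint32_t bits = (uint32_t)first | ((uint32_t)second << 16);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, the halves 0x3C00 (1.0) and 0x4000 (2.0) convert to 1.0 and 2.0 individually, but read together as one float they give a value slightly above 2.0 that matches neither input. Arithmetic on such a value, written back as 32 bits, leaves two unrelated half bit patterns in the output buffer.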
Regards,
Deepak Chembakassery Rajendran
I have modified my function as follows:
kernel void foobar(
    const __global half* input,
    int num_dims_in,
    const __global int* dims_in,
    const __global int* stride_in,
    float offset_x,
    __global half* output,
    int num_dims_out,
    const __global int* dims_out,
    const __global int* stride_out)
{
    int w = dims_in[0];
    int h = dims_in[1];
    int d = dims_in[2];
    int x = get_global_id(0);
    int y = get_global_id(1);
    int z = get_global_id(2);
    int index = x + y * w + z * w * h;
    // Load the half element once as a float, then store the result back as half.
    float v = vload_half(index, input);
    vstore_half((v > 0) ? v : v * offset_x, index, output);
    return;
}
This works fine, and I am getting the proper output.