Hello Altera Forum Geniuses ~!~!
I have a question! Today I wrote a matrix multiplication kernel (.cl) in three versions, shown below.

No SIMD:

    __kernel void simpleMultiply(
        __global float* A,
        __global float* B,
        __global float* C,
        int wA, int wB)
    {
        int row = get_global_id(0);
        int col = get_global_id(1);
        float sum = 0.0f;
        for (int i = 0; i < wA; i++) {
            sum += A[row*wA + i] * B[i*wB + col];
        }
        C[row*wB + col] = sum;
    }

SIMD:

    __attribute__((num_simd_work_items(2)))
    __attribute__((reqd_work_group_size(2,1,1)))
    __kernel void simpleMultiply(
        __global float* A,
        __global float* B,
        __global float* C,
        int wA, int wB)
    {
        int row = get_global_id(0);
        int col = get_global_id(1);
        float sum = 0.0f;
        for (int i = 0; i < wA; i++) {
            sum += A[row*wA + i] * B[i*wB + col];
        }
        C[row*wB + col] = sum;
    }

SIMD plus 2 compute units:

    __attribute__((num_compute_units(2)))
    __attribute__((num_simd_work_items(2)))
    __attribute__((reqd_work_group_size(2,1,1)))
    __kernel void simpleMultiply(
        __global float* A,
        __global float* B,
        __global float* C,
        int wA, int wB)
    {
        int row = get_global_id(0);
        int col = get_global_id(1);
        float sum = 0.0f;
        for (int i = 0; i < wA; i++) {
            sum += A[row*wA + i] * B[i*wB + col];
        }
        C[row*wB + col] = sum;
    }

I ran the matrix multiplication with each of these kernels and found something strange: all three take the same time.

Mat_A = (576 x 26)
Mat_B = (26 x 6)
Mat_C = Mat_A x Mat_B

This operation takes 1440~1550 us with every version. Why doesn't the execution time change?

Also, someone told me: "If you use only global memory, the memory access overhead is large, so if you want better performance, use local memory." But I use only global memory (because I don't understand how to use local memory in a 2D multiplication). Even so, my SoC board performs better than using only the CPU, which is the DE1-SoC's ARM.

Now my questions.

First! Why do all three kernels take the same time? Is the SIMD version actually running as SIMD? Are the two compute units actually running? If they are not, how can I make them work?

Second! How can I get better performance?

Third! Could you recommend a document that is better than the OpenCL Best Practices Guide?

I look forward to your answers. Thank you so much! Have a nice day~!~!
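
For anyone trying to reproduce the measurement, a plain C reference of the same computation may be useful for checking the kernel output and for seeing how small the problem is. This is only a sketch: the function names `simple_multiply_ref` and `mac_count` and the extra `hA` (rows of A) parameter are made up for illustration and are not part of the post, and it assumes the row-major layout that the kernel's indexing (`row*wA + i`, `i*wB + col`) implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference for the simpleMultiply kernel.
 * Computes C = A x B, where A is hA x wA and B is wA x wB,
 * both stored row-major (assumed from the kernel's indexing). */
void simple_multiply_ref(const float *A, const float *B, float *C,
                         int hA, int wA, int wB)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (int row = 0; row < hA; row++) {        /* get_global_id(0) in the kernel */
        for (int col = 0; col < wB; col++) {    /* get_global_id(1) in the kernel */
            float sum = 0.0f;
            for (int i = 0; i < wA; i++) {
                sum += A[row * wA + i] * B[i * wB + col];
            }
            C[row * wB + col] = sum;
        }
    }
}

/* Total multiply-accumulate operations for an (hA x wA) * (wA x wB) product. */
long mac_count(int hA, int wA, int wB)
{
    return (long)hA * wA * wB;
}
```

For the sizes in the post, `mac_count(576, 26, 6)` gives 89856 multiply-accumulates spread over 576 x 6 = 3456 output elements, assuming one work-item per element of C.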