Hi,
I have two questions.

First: the OpenCL standard provides the -cl-fast-relaxed-math build option, which trades some accuracy for speed. I tested my OpenCL code with this flag on Intel, NVIDIA and AMD platforms, and it gave a speedup of roughly 1x. However, when I pass -cl-fast-relaxed-math to the AOCL compiler while compiling the OpenCL kernel code, I do not see any performance gain. Does AOCL not support this flag yet?

Second: I wrote an OpenCL program that calls the EnqueueNDRange API many times (it enqueues repeatedly inside a for loop). The host does nothing except make these API calls and read/write buffers. Even so, between the host calling EnqueueNDRange or a buffer read/write and the FPGA receiving the command and starting the kernel, about 10~100 ms is lost as overhead. I have no profiling tool that shows in detail where this time goes. Could anyone help with this problem?

SDK: 14.1
Platform: DE5

Thanks
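For the second question, one possible way to see where the 10~100 ms goes is standard OpenCL event profiling: create the queue with CL_QUEUE_PROFILING_ENABLE, pass an event to the enqueue call, and read the four timestamps (CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START, _END) with clGetEventProfilingInfo. Whether the AOCL 14.1 runtime reports all four values is an assumption that has not been checked here. The sketch below covers only the arithmetic on those timestamps; `launch_breakdown` and `breakdown_from_timestamps` are hypothetical names, not part of OpenCL or the Altera SDK.

```c
#include <stdint.h>

/* Hypothetical helper, not part of OpenCL or the Altera SDK.
 * Splits one command's lifetime into three phases, in milliseconds. */
typedef struct {
    double queue_to_submit_ms;  /* time spent waiting in the host-side queue */
    double submit_to_start_ms;  /* time from submission until the device starts */
    double execution_ms;        /* time the command actually ran on the device */
} launch_breakdown;

/* Inputs are the four event timestamps in nanoseconds, as they would be
 * returned by clGetEventProfilingInfo for the QUEUED, SUBMIT, START and
 * END counters. The function assumes queued <= submit <= start <= end. */
static launch_breakdown breakdown_from_timestamps(uint64_t queued_ns,
                                                  uint64_t submit_ns,
                                                  uint64_t start_ns,
                                                  uint64_t end_ns)
{
    launch_breakdown b;
    b.queue_to_submit_ms = (double)(submit_ns - queued_ns) / 1e6;
    b.submit_to_start_ms = (double)(start_ns - submit_ns) / 1e6;
    b.execution_ms       = (double)(end_ns - start_ns) / 1e6;
    return b;
}
```

If queue_to_submit_ms or submit_to_start_ms dominates for each iteration of the loop, the cost is in the launch path rather than in the kernel itself, and batching more work into each enqueue may be worth trying.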