Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Profiling autorun kernel without --profile

Altera_Forum
Honored Contributor I

Hi, I have a multi-kernel design structured as follows:

 

Reader -> (autorun) CU_0 -> (autorun) CU_1 -> (autorun) CU_2 -> Writer (where CU_0, CU_1 and CU_2 are the same). 

 

I'm trying to measure the execution time of every kernel in the design. I have already used --profile, but I need the design to run at full frequency. A snippet is below. I wanted to check that using time_start2 - time_end1 is correct, as I haven't found any examples of using events across multiple command queues, and the fluctuation in the results appears rather large.

 

Cheers Sam 

 

#################### Averages# ####################
Reader Execution Time
min 6812, avg 7308, max, 11493
Processing Execution Time
min 9135, avg 36094, max, 94796
Writer Execution Time
min 6614, avg 7063, max, 9013
Total Execution Time
min 22657, avg 50466, max, 110813
###################################################

 

...
/* One slot per measurement: [0] reader, [1] processing, [2] writer, [3] total. */
uint64_t min[4] = {0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF};
uint64_t avg[4] = {0};
uint64_t max[4] = {0};
#define N 200
for (uint i = 0; i < N; i++) {
    /* Check each launch separately so the first status is not overwritten. */
    status = clEnqueueTask(queue1, kernel1, 0, NULL, &kernel_event1);
    checkError(status, "Failed to launch kernel");
    status = clEnqueueTask(queue2, kernel2, 0, NULL, &kernel_event2);
    checkError(status, "Failed to launch kernel");
    clFinish(queue1);
    clFinish(queue2);

    uint64_t time_start1, time_end1, time_start2, time_end2;
    /* Profiling timestamps are reported in nanoseconds, despite the _ms suffix. */
    uint64_t reader_time_ms, processing_time_ms, writer_time_ms, total_time_ms;
    clGetEventProfilingInfo(kernel_event1, CL_PROFILING_COMMAND_START, sizeof(uint64_t), &time_start1, NULL);
    clGetEventProfilingInfo(kernel_event1, CL_PROFILING_COMMAND_END, sizeof(uint64_t), &time_end1, NULL);
    clGetEventProfilingInfo(kernel_event2, CL_PROFILING_COMMAND_START, sizeof(uint64_t), &time_start2, NULL);
    clGetEventProfilingInfo(kernel_event2, CL_PROFILING_COMMAND_END, sizeof(uint64_t), &time_end2, NULL);
    /* Release the events once their timestamps have been read. */
    clReleaseEvent(kernel_event1);
    clReleaseEvent(kernel_event2);

    /* Skip the first 5 iterations as warm-up, leaving N-5 samples. */
    if (i >= 5) {
        reader_time_ms = (time_end1 - time_start1);
        avg[0] += reader_time_ms;
        if (reader_time_ms > max[0]) max[0] = reader_time_ms;
        if (reader_time_ms < min[0]) min[0] = reader_time_ms;

        /* Unsigned subtraction: wraps around if kernel2 starts before kernel1 ends. */
        processing_time_ms = (time_start2 - time_end1);
        avg[1] += processing_time_ms;
        if (processing_time_ms > max[1]) max[1] = processing_time_ms;
        if (processing_time_ms < min[1]) min[1] = processing_time_ms;

        writer_time_ms = (time_end2 - time_start2);
        avg[2] += writer_time_ms;
        if (writer_time_ms > max[2]) max[2] = writer_time_ms;
        if (writer_time_ms < min[2]) min[2] = writer_time_ms;

        total_time_ms = (time_end2 - time_start1);
        avg[3] += total_time_ms;
        if (total_time_ms > max[3]) max[3] = total_time_ms;
        if (total_time_ms < min[3]) min[3] = total_time_ms;
    }
}
printf("#################### Averages# ####################\n");
printf("Reader Execution Time\n");
printf("min %" PRIu64 ", avg %" PRIu64 ", max, %" PRIu64 "\n", min[0], avg[0] / (N-5), max[0]);
printf("Processing Execution Time\n");
printf("min %" PRIu64 ", avg %" PRIu64 ", max, %" PRIu64 "\n", min[1], avg[1] / (N-5), max[1]);
printf("Writer Execution Time\n");
printf("min %" PRIu64 ", avg %" PRIu64 ", max, %" PRIu64 "\n", min[2], avg[2] / (N-5), max[2]);
printf("Total Execution Time\n");
printf("min %" PRIu64 ", avg %" PRIu64 ", max, %" PRIu64 "\n", min[3], avg[3] / (N-5), max[3]);
printf("###################################################\n\n");
...
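
The min/avg/max bookkeeping in the snippet is repeated once per measurement, so it could be factored into a small accumulator. The following is only a sketch: stat_t, stat_init, stat_add and stat_avg are hypothetical names and not part of OpenCL or the Intel SDK. Dividing by the number of samples actually added keeps the average independent of how many warm-up iterations the loop skips.

```c
#include <stdint.h>

/* Hypothetical running statistics for one measurement, in nanoseconds. */
typedef struct {
    uint64_t min;
    uint64_t max;
    uint64_t sum;
    uint64_t count;
} stat_t;

/* Reset the accumulator so that the first sample sets both min and max. */
void stat_init(stat_t *s) {
    s->min = UINT64_MAX;
    s->max = 0;
    s->sum = 0;
    s->count = 0;
}

/* Fold one sample into the running min, max and sum. */
void stat_add(stat_t *s, uint64_t sample_ns) {
    if (sample_ns < s->min) s->min = sample_ns;
    if (sample_ns > s->max) s->max = sample_ns;
    s->sum += sample_ns;
    s->count++;
}

/* Integer average over the samples added so far (0 if there are none). */
uint64_t stat_avg(const stat_t *s) {
    return s->count ? s->sum / s->count : 0;
}
```

In the loop, each measurement would then call stat_add on its own stat_t (for example one each for reader, processing, writer and total) once the warm-up iterations have passed.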
Altera_Forum
Honored Contributor I

If data is being "streamed" from the reader through the compute kernels and written back by the writer kernel, I would expect the kernels to more or less start and end at the same time. You cannot separately determine the run time of each kernel when the kernels are running in parallel, since each kernel will start executing as soon as it receives the first data through its incoming channel and hence, kernel run times will largely overlap. 
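
To make the overlap concrete, here is a minimal sketch in plain C that takes the start and end timestamps of two profiled commands and reports the signed gap between them and how long both were active at once. The interval_t type and the gap_ns and overlap_ns helpers are hypothetical names for illustration, not OpenCL API. A negative gap means the second command started before the first one finished, which is the case where an unsigned time_start2 - time_end1 would wrap around to a huge value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for one command's profiling timestamps, in nanoseconds. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} interval_t;

/* Signed gap from the end of a to the start of b.
   Negative means b started before a finished, i.e. the two overlap. */
int64_t gap_ns(interval_t a, interval_t b) {
    assert(a.end_ns >= a.start_ns);
    if (b.start_ns >= a.end_ns)
        return (int64_t)(b.start_ns - a.end_ns);
    return -(int64_t)(a.end_ns - b.start_ns);
}

/* Length of time during which both intervals were active. */
uint64_t overlap_ns(interval_t a, interval_t b) {
    assert(a.end_ns >= a.start_ns && b.end_ns >= b.start_ns);
    uint64_t lo = a.start_ns > b.start_ns ? a.start_ns : b.start_ns;
    uint64_t hi = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;
    return hi > lo ? hi - lo : 0;
}
```

With made-up timestamps such as a reader running from 1000 to 8000 and a writer running from 1500 to 9000, gap_ns gives -6500 and overlap_ns gives 6500, so the "processing time" between the two is not a meaningful quantity in that case.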

 

Your method will only work if the kernels are running fully sequentially, either in the same queue or in multiple queues but forced to run sequentially using events.
Altera_Forum
Honored Contributor I

Hi, 

 

Full support for profiling autorun kernels is expected in SDK 17.1, which should be able to capture more accurate results.

 

Best Regards, 

CloseCL 

(This message was posted on behalf of Intel Corporation)