There are some facts about the AOCL compiler that I find astonishing.

Setup: aocl version 16.0.2.222 (Quartus the same), 10AX115N3F40E2SG on a Nallatech pci385a board with its BSP.

1) Compiling the vector_add example produces a design that uses no DSP blocks, and performance is in fact terrible:

; Resource                               + Usage                     ;
+----------------------------------------+---------------------------+
; Logic utilization                      ; 10%                       ;
; ALUTs                                  ; 4%                        ;
; Dedicated logic registers              ; 6%                        ;
; Memory blocks                          ; 10%                       ;
; DSP blocks                             ; 0%                        ;

Why is that?

2) Compiling this simple vector copy fails with a compiler error:

---------------------- riinout.cl ---------------------------------------
__kernel void riinout( __global const float *x,
                       __global float *restrict y)
{
    // get index of the work item
    int index = get_global_id(0);
    y[index] = x[index];
}
---------------------------------------------------------------------------

$ aoc device/riinout.cl -o bin/riinout.aocx
/media/sda1/home/nallatech/aocl/examples/riinout/riinout/device/riinout.cl:3:46: warning: declaring kernel argument with no 'restrict' may lead to low kernel performance
__kernel void riinout( __global const float *x,
                                             ^
1 warning generated.
Error: Compiler Error, not able to generate hardware

---------------------------- quartus_sh_compile.log---------------------------
Info: Initializing Spectra-Q Synthesis...
Info: Project = "top"
Info: Revision = "top_synth"
Warning (125092): Tcl Script File board/board.qip not found
Info (125063): set_global_assignment -name QIP_FILE board/board.qip
Info: qis_default_flow_script.tcl version:# 1
Info: Initializing Spectra-Q Synthesis...
Info: Project = "top"
Info: Revision = "top_synth"
Info (16303): High Performance Effort optimization mode selected -- timing performance will be prioritized at the potential cost of increased compilation time
Info (16303): High Performance Effort optimization mode selected -- timing performance will be prioritized at the potential cost of increased compilation time
*** Fatal Error: Segment Violation at (nil)
Module: quartus_syn
Stack Trace:
    0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8)
    0x35e7d: __cxa_finalize + 0x9d (c.so.6)
    0x35ae2: exit + 0xe2 (c.so.6)
    0x1ed24: __libc_start_main + 0x104 (c.so.6)
End-trace
Error (114016): Out of memory in module quartus_syn (1671 megabytes used)
*** Fatal Error: Segment Violation at (nil)
Module: quartus_syn
Stack Trace:
    0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8)
    0x35e7d: __cxa_finalize + 0x9d (c.so.6)
    0x35ae2: exit + 0xe2 (c.so.6)
    0x1ed24: __libc_start_main + 0x104 (c.so.6)
End-trace
Error (114016): Out of memory in module quartus_syn (1677 megabytes used)
*** Fatal Error: Segment Violation at (nil)
Module: quartus_syn
Stack Trace:
    0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8)
    0x35e7d: __cxa_finalize + 0x9d (c.so.6)
    0x35ae2: exit + 0xe2 (c.so.6)
    0x1ed24: __libc_start_main + 0x104 (c.so.6)
End-trace
Error (114016): Out of memory in module quartus_syn (1688 megabytes used)
Error: Failed to synthesize partition
Info: Saving post-synthesis snapshots for 1 partition(s)
*** Fatal Error: Segment Violation at (nil)
Module: quartus_syn
Stack Trace:
    0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8)
    0x35e7d: __cxa_finalize + 0x9d (c.so.6)
    0x35ae2: exit + 0xe2 (c.so.6)
    0x1ed24: __libc_start_main + 0x104 (c.so.6)
End-trace
--------------------------------------------------------------------------------------

Any answer?

Thanks,
roberto
6 Replies
1) Only a tiny percentage of the chip is being used here, and the percentages shown are whole numbers, so you may need a dozen or so DSPs before the 0% becomes 1%. If you want to scale up the size and performance, try vectorizing the kernel (the num_simd_work_items attribute) or using more compute units (the num_compute_units attribute).
2) The aoc compiler uses Quartus under the hood, and Quartus needs a lot of memory; 64 GB is a good size. The error message you're getting (below) definitely points to a system with too little memory:

Error (114016): Out of memory in module quartus_syn (1688 megabytes used)
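Both points in this reply can be sanity-checked with a little arithmetic. The sketch below is only illustrative: it assumes the 10AX115 device has 1518 DSP blocks and that the summary table truncates to a whole percent (neither is stated in the thread), and it treats the 64 GB figure as the suggestion made above rather than an official requirement. All function names are made up for this example.

```python
import math

# Assumption: DSP block count for the 10AX115 device (not stated in the thread);
# take the real figure from your own fitter report.
TOTAL_DSP_BLOCKS = 1518

# Figure suggested in the reply above, not an official minimum.
SUGGESTED_RAM_GIB = 64


def reported_percent(used, total=TOTAL_DSP_BLOCKS):
    """Usage as a whole-number percentage, assuming the report truncates."""
    return math.floor(100 * used / total)


def blocks_for_one_percent(total=TOTAL_DSP_BLOCKS):
    """Smallest block count that would be shown as 1% under that assumption."""
    return math.ceil(total / 100)


def mem_total_gib(meminfo_text):
    """Parse MemTotal (reported in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemTotal line not found")


def enough_memory(meminfo_text, needed_gib=SUGGESTED_RAM_GIB):
    """True if the machine has at least the suggested amount of RAM."""
    return mem_total_gib(meminfo_text) >= needed_gib
```

Under these assumptions a design would need 16 DSP blocks before the table shows 1%, which is in line with the "dozen or so" estimate. On a Linux host, passing the contents of /proc/meminfo to `enough_memory` gives a quick check before starting a long aoc run.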
Hi,
the Nallatech 385A BSP version r001.004.0001 is strictly limited to OpenCL SDK / Quartus Prime Pro version 16.0.0 (no updates). I would suggest installing 16.0.0 and trying to go through the compilation process again.

Thanks,
G
I went back to Quartus 16.0.0 as suggested.
The compile still fails from time to time. My impression is that something gets corrupted, though I don't know what, because the compiler works again after I reboot the host.

Thanks
The Nallatech 385A BSP version r001.004.0002 for OpenCL SDK / Quartus Prime Pro version 16.0.2 is now released.
There are 2 versions of the Nallatech 385A BSP:
1. HPC BSP (2 x 40Gbps board-to-board IO channels)
2. MAC BSP (2 x 10GbE MAC cores IO channels)

Thanks,
G
--- Quote Start ---
The Nallatech 385A BSP version r001.004.0002 for OpenCL SDK / Quartus Prime Pro version 16.0.2 is now released.
There are 2 versions of the Nallatech 385A BSP:
1. HPC BSP (2 x 40Gbps board-to-board IO channels)
2. MAC BSP (2 x 10GbE MAC cores IO channels)
Thanks, G
--- Quote End ---

Thanks for the info. I think I will try it, although I'm a bit scared: last time, when moving from the beta to the official release, Nallatech decided to change the name of the board, which made unusable binaries that had taken hundreds of hours of CPU time to build.

Thanks
--- Quote Start ---
Thanks for the info. I think I will try it, although I'm a bit scared: last time, when moving from the beta to the official release, Nallatech decided to change the name of the board, which made unusable binaries that had taken hundreds of hours of CPU time to build. Thanks
--- Quote End ---

Now I'm definitely scared to update the BSP. A first result from a precompiled fft1d binary is that it performs about 30% worse.

With the new BSP R001.004.0002:

Using AOCX: fft1d.aocx
Reprogramming device with handle 1
Launching FFT transform for 2000 iterations
FFT kernel initialization is complete.
Processing time = 5.7592ms
Throughput = 1.4224 Gpoints / sec (85.3449 Gflops)
Signal to noise ratio on output sample: 137.677661 --> PASSED
Launching inverse FFT transform for 2000 iterations
Inverse FFT kernel initialization is complete.
Processing time = 5.7347ms
Throughput = 1.4285 Gpoints / sec (85.7101 Gflops)
Signal to noise ratio on output sample: 137.041007 --> PASSED

With the old BSP (Quartus 16.0.0):

Using AOCX: fft1d.aocx
Launching FFT transform for 2000 iterations
FFT kernel initialization is complete.
Processing time = 4.1108ms
Throughput = 1.9928 Gpoints / sec (119.5692 Gflops)
Signal to noise ratio on output sample: 137.677661 --> PASSED
Launching inverse FFT transform for 2000 iterations
Inverse FFT kernel initialization is complete.
Processing time = 4.0927ms
Throughput = 2.0016 Gpoints / sec (120.0977 Gflops)
Signal to noise ratio on output sample: 137.041007 --> PASSED
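To put a number on the slowdown, here is a small sketch that compares the forward FFT figures from the two runs above. The dictionary and function names are only illustrative; the values are copied from the posted output.

```python
# Figures copied from the fft1d runs posted above (forward transform).
OLD_BSP = {"time_ms": 4.1108, "gpoints_per_s": 1.9928}
NEW_BSP = {"time_ms": 5.7592, "gpoints_per_s": 1.4224}


def relative_change(new, old):
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old


throughput_change = relative_change(
    NEW_BSP["gpoints_per_s"], OLD_BSP["gpoints_per_s"]
)
time_change = relative_change(NEW_BSP["time_ms"], OLD_BSP["time_ms"])

print(f"throughput change: {throughput_change:+.1%}")
print(f"processing time change: {time_change:+.1%}")
```

With these numbers the throughput is about 29% lower and the processing time about 40% longer, so "30% worse" describes the throughput figure; the inverse transform gives nearly the same ratios.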