Valued Contributor III
1,091 Views

astonishing facts about aocl compiler : vector_add -> No DSP used, y[i]=x[i] comp er

There are some facts about the AOCL compiler that I find astonishing.

Setup: aocl version 16.0.2.222, Quartus at the same version, device 10AX115N3F40E2SG on a Nallatech pci385a board with its BSP.

 

1) Compiling the vector_add example uses no DSP blocks at all, and performance is in fact terrible:

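+----------------------------------------+---------------------------+
; Resource                               ; Usage                     ;
+----------------------------------------+---------------------------+
; Logic utilization                      ; 10%                       ;
; ALUTs                                  ; 4%                        ;
; Dedicated logic registers              ; 6%                        ;
; Memory blocks                          ; 10%                       ;
; DSP blocks                             ; 0%                        ;
+----------------------------------------+---------------------------+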

Why is that?

 

2) Compiling this simple vector copy fails with a compiler error:

---------------------- riinout.cl --------------------------------------- 

__kernel void riinout( __global const float *x,
                       __global float *restrict y)
{
    // get index of the work item
    int index = get_global_id(0);

    // copy one element per work item
    y[index] = x[index];
}

--------------------------------------------------------------------------- 

$ aoc device/riinout.cl -o bin/riinout.aocx 

/media/sda1/home/nallatech/aocl/examples/riinout/riinout/device/riinout.cl:3:46: warning: declaring kernel argument with no 'restrict' may lead to low kernel performance 

__kernel void riinout( __global const float *x,  

1 warning generated. 

Error: Compiler Error, not able to generate hardware 

---------------------------- quartus_sh_compile.log--------------------------- 

Info: Initializing Spectra-Q Synthesis... 

Info: Project = "top" 

Info: Revision = "top_synth" 

Warning (125092): Tcl Script File board/board.qip not found 

Info (125063): set_global_assignment -name QIP_FILE board/board.qip 

Info: qis_default_flow_script.tcl version:# 1 

Info: Initializing Spectra-Q Synthesis... 

Info: Project = "top" 

Info: Revision = "top_synth" 

Info (16303): High Performance Effort optimization mode selected -- timing performance will be prioritized at the potential cost of increased compilation time

Info (16303): High Performance Effort optimization mode selected -- timing performance will be prioritized at the potential cost of increased compilation time

 

*** Fatal Error: Segment Violation at (nil) 

Module: quartus_syn 

Stack Trace: 

0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8) 

0x35e7d: __cxa_finalize + 0x9d (c.so.6) 

 

 

0x35ae2: exit + 0xe2 (c.so.6) 

0x1ed24: __libc_start_main + 0x104 (c.so.6) 

 

 

End-trace 

 

Error (114016): Out of memory in module quartus_syn (1671 megabytes used) 

 

*** Fatal Error: Segment Violation at (nil) 

Module: quartus_syn 

Stack Trace: 

0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8) 

0x35e7d: __cxa_finalize + 0x9d (c.so.6) 

 

 

0x35ae2: exit + 0xe2 (c.so.6) 

0x1ed24: __libc_start_main + 0x104 (c.so.6) 

 

 

End-trace 

 

Error (114016): Out of memory in module quartus_syn (1677 megabytes used) 

 

*** Fatal Error: Segment Violation at (nil) 

Module: quartus_syn 

Stack Trace: 

0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8) 

0x35e7d: __cxa_finalize + 0x9d (c.so.6) 

 

 

0x35ae2: exit + 0xe2 (c.so.6) 

0x1ed24: __libc_start_main + 0x104 (c.so.6) 

 

 

End-trace 

 

Error (114016): Out of memory in module quartus_syn (1688 megabytes used) 

Error: Failed to synthesize partition 

Info: Saving post-synthesis snapshots for 1 partition(s) 

 

*** Fatal Error: Segment Violation at (nil) 

Module: quartus_syn 

Stack Trace: 

0x60a43: google::protobuf::FileDescriptorTables::~FileDescriptorTables() + 0x33 (protobuf.so.8) 

0x35e7d: __cxa_finalize + 0x9d (c.so.6) 

 

 

0x35ae2: exit + 0xe2 (c.so.6) 

0x1ed24: __libc_start_main + 0x104 (c.so.6) 

 

 

End-trace 

-------------------------------------------------------------------------------------- 

Any answers?

 

thanks  

roberto
6 Replies
Valued Contributor III
11 Views

1) Only a tiny percentage of the chip is being used here, and the percentages shown are rounded to integers, so you may need about a dozen DSPs before the 0% changes to 1%. If you want to scale up the size and performance, try vectorizing the kernel or increasing the number of compute units.
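For readers wondering what "vectorizing or num compute units" might look like in practice, here is a minimal sketch of a vector_add-style kernel using the SDK's kernel attributes. The kernel name vector_add_simd, the work-group size of 64, the SIMD width of 4 and the two compute units are illustrative assumptions, not values from this thread. The host-side branch (emu_global_id, the stubbed get_global_id, the empty __kernel/__global macros) is a hypothetical shim whose only purpose is to let the kernel body be sanity-checked with a plain C compiler; it assumes the device compiler predefines __OPENCL_VERSION__ as the OpenCL C specification requires.

```c
#ifdef __OPENCL_VERSION__
/* Device build (aoc): attributes from the Altera SDK for OpenCL.
   The values 64, 4 and 2 are illustrative assumptions. */
#define FPGA_KERNEL_ATTRS                             \
    __attribute__((reqd_work_group_size(64, 1, 1)))   \
    __attribute__((num_simd_work_items(4)))           \
    __attribute__((num_compute_units(2)))
#else
/* Host build: hypothetical shim so the kernel body compiles as plain C. */
#include <assert.h>
#include <stddef.h>
#define FPGA_KERNEL_ATTRS
#define __kernel
#define __global
static size_t emu_global_id;              /* emulated work-item index */
static size_t get_global_id(unsigned int dim)
{
    assert(dim == 0);                     /* only dimension 0 is emulated */
    return emu_global_id;
}
#endif

FPGA_KERNEL_ATTRS
__kernel void vector_add_simd(__global const float *restrict x,
                              __global const float *restrict y,
                              __global float *restrict z)
{
    /* one output element per work item */
    size_t index = get_global_id(0);
    z[index] = x[index] + y[index];
}
```

num_simd_work_items needs a reqd_work_group_size whose first dimension is divisible by the SIMD width, which is why the two attributes appear together. Whether either attribute helps a memory-bound kernel like this one depends on the board's memory bandwidth, so treat the numbers as a starting point for experiments rather than a recommendation.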

 

2) The aoc compiler uses Quartus under the hood, and Quartus needs a lot of memory; 64 GB is a good size. The error message you're getting (below) definitely points to a system with too little memory.

 

Error (114016): Out of memory in module quartus_syn (1688 megabytes used)
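If it helps, the host's installed RAM can be checked before starting a long compile. The sketch below is Linux-only and assumes /proc/meminfo contains a "MemTotal: <n> kB" line; the function names and the threshold argument are hypothetical, chosen just for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns MemTotal in kB from /proc/meminfo-style text, or -1 if not found. */
long parse_mem_total_kb(const char *meminfo_text)
{
    const char *line;
    long kb;

    assert(meminfo_text != NULL);
    line = strstr(meminfo_text, "MemTotal:");
    if (line != NULL && sscanf(line, "MemTotal: %ld kB", &kb) == 1)
        return kb;
    return -1;
}

/* Reads the real /proc/meminfo; returns -1 if it cannot be opened or parsed. */
long read_mem_total_kb(void)
{
    char buf[4096];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_mem_total_kb(buf);
}

/* Nonzero when mem_total_kb is at least min_gib GiB. */
int has_enough_ram(long mem_total_kb, long min_gib)
{
    return mem_total_kb >= min_gib * 1024L * 1024L;
}
```

Note that MemTotal is usually a little below the installed amount (a 64 GB machine reports somewhat less than 64 GiB), so pick the threshold with some margin, for example has_enough_ram(read_mem_total_kb(), 60).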
Valued Contributor III
11 Views

Hi, 

 

the Nallatech 385A BSP version r001.004.0001 is strictly limited to OpenCL SDK / Quartus Prime Pro version 16.0.0 (no updates). 

 

I would suggest installing 16.0.0 and trying to go through the compilation process again. 

 

Thanks, 

G
Valued Contributor III
11 Views

I went back to Quartus 16.0.0 as suggested.

I still get compile failures from time to time.

My impression is that something gets corrupted, though I don't know what, because if I reboot the host the compiler works again.

 

 

thanks
Valued Contributor III
11 Views

The Nallatech 385A BSP version r001.004.0002 for OpenCL SDK / Quartus Prime Pro version 16.0.2 is now released.

 

There are 2 versions of the Nallatech 385A BSP:  

1. HPC BSP (2 x 40Gbps board-to-board IO channels) and  

2. MAC BSP (2 x 10GbE MAC cores IO channels) 

 

Thanks 

G
Valued Contributor III
11 Views

 

--- Quote Start ---  

The Nallatech 385A BSP version r001.004.0002 for OpenCL SDK / Quartus Prime pro version 16.0.2 is now released. 

 

There are 2 versions of the Nallatech 385A BSP:  

1. HPC BSP (2 x 40Gbps board-to-board IO channels) and  

2. MAC BSP (2 x 10GbE MAC cores IO channels) 

 

Thanks 

--- Quote End ---  

 

Thanks for the info.

I think I will try it, even if I'm a bit scared: the last time, when moving from the beta to the official release, Nallatech changed the name of the board, which made unusable binaries that had taken hundreds of hours of CPU time to build.

Thanks
Valued Contributor III
11 Views

 

--- Quote Start ---  

Thanks for the info. 

I think i will try it. 

Even if I'm a bit scared because the last time passing from the beta to the official release  

Nallatech had the idea of changing the name of the board making unusable  

binaries that required hundreds of hours of cpu time to build them up. 

 

grazie 

--- Quote End ---  

 

 

Now I'm definitely scared to update the BSP.

A first result from a precompiled fft1d binary is that throughput is about 29% lower (processing time is about 40% longer):

With the new BSP R001.004.0002:

Using AOCX: fft1d.aocx 

 

Reprogramming device with handle 1 

Launching FFT transform for 2000 iterations 

FFT kernel initialization is complete. 

Processing time = 5.7592ms 

Throughput = 1.4224 Gpoints / sec (85.3449 Gflops) 

Signal to noise ratio on output sample: 137.677661 --> PASSED 

 

Launching inverse FFT transform for 2000 iterations 

Inverse FFT kernel initialization is complete. 

Processing time = 5.7347ms 

Throughput = 1.4285 Gpoints / sec (85.7101 Gflops) 

Signal to noise ratio on output sample: 137.041007 --> PASSED 

 

With the old BSP (Quartus 16.0.0):

Using AOCX: fft1d.aocx 

 

Launching FFT transform for 2000 iterations 

FFT kernel initialization is complete. 

Processing time = 4.1108ms 

Throughput = 1.9928 Gpoints / sec (119.5692 Gflops) 

Signal to noise ratio on output sample: 137.677661 --> PASSED 

 

Launching inverse FFT transform for 2000 iterations 

Inverse FFT kernel initialization is complete. 

Processing time = 4.0927ms 

Throughput = 2.0016 Gpoints / sec (120.0977 Gflops) 

Signal to noise ratio on output sample: 137.041007 --> PASSED
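For anyone who wants to reproduce the comparison from their own fft1d logs, here is a small sketch that pulls the Gpoints/sec figure out of a "Throughput = ..." line and computes the relative drop. The function names are hypothetical, and the parsing assumes the line format shown in the two runs above.

```c
#include <assert.h>
#include <stdio.h>

/* Extracts the Gpoints/sec value from an fft1d "Throughput = ..." line.
   Returns -1.0 when the line does not match that format. */
double parse_throughput(const char *line)
{
    double gpoints;

    assert(line != NULL);
    if (sscanf(line, " Throughput = %lf Gpoints", &gpoints) == 1)
        return gpoints;
    return -1.0;
}

/* Relative drop of new_value against old_value, in percent. */
double percent_drop(double old_value, double new_value)
{
    assert(old_value > 0.0);
    return 100.0 * (old_value - new_value) / old_value;
}
```

Feeding in the forward-FFT lines (1.9928 Gpoints/sec with the old BSP, 1.4224 with the new one) gives a drop of roughly 28.6%; the inverse-FFT lines give nearly the same figure.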