Intel® High Level Design
Support for Intel® High Level Synthesis Compiler, DSP Builder, OneAPI for Intel® FPGAs, Intel® FPGA SDK for OpenCL™

Stratix 10 MX OpenCL issue: multiple constant drivers for net "const_mem_0.avm_fill_waitrequest"

TLeng2
Beginner

Dear Intel Community,

It's me again.

This time I have a weird issue.

The OpenCL BSP for the Stratix 10 MX is the one from Intel, which exposes each HBM memory as a separate memory interface, not as one common global memory. I have 32 global memories.

I tried to implement address-based bank switching to go from HBM1 to HBM31 and so on.

HBM0 is used for transmission storage.

It hasn't worked so far. The boardtest compiles perfectly.

My kernel compiles and generates a const_mem_0 block, then errors out with:

Error (13264): Can't resolve multiple constant drivers for net "const_mem_0.avm_fill_waitrequest" at kernelwithsearch_sys.v(5016) File: c:/intelFPGASync/stratix_mx/generator/project/project_sys.v Line: 5016

Details from project_sys.v:

 

generate
   begin:const_mem_0
      logic snoop_clk;
      logic snoop_write;
      logic snoop_overflow;
      logic [22:0] snoop_addr;
      logic [4:0] snoop_burst;
      logic [22:0] cc_addr;
      logic cc_read;
      logic cc_waitrequest;
      logic cc_readdatavalid;
      logic [255:0] cc_readdata;
      logic avm_fill_enable;
      logic avm_fill_read;
      logic avm_fill_write;
      logic [27:0] avm_fill_address;
      logic [255:0] avm_fill_writedata;
      logic [31:0] avm_fill_byteenable;
      logic avm_fill_waitrequest;
      logic [255:0] avm_fill_readdata;
      logic avm_fill_readdatavalid;
      logic [4:0] avm_fill_burstcount;
      logic avm_fill_writeack;
      logic [22:0] avm_fill_address_word;

      // INST const_cache of acl_const_cache
      acl_const_cache
      #(
         .NUMPORTS(1),
         .LOG2SIZE(14),
         .LOG2WIDTH(8),
         .AWIDTH(23),
         .MWIDTH(256),
         .BURSTWIDTH(5),
         .FAMILY("Stratix 10"),
         .ASYNC_RESET(0),
         .SYNCHRONIZE_RESET(1),
         .FORCE1XCLK(1),
         .AVM_READ_DATA_LATENESS(0)
      )
      const_cache
      (
         .clk(clock),
         .clk2x(clock2x),
         .resetn(resetn),
         .fill_addr(avm_fill_address_word),
         .fill_read(avm_fill_read),
         .fill_waitrequest(avm_fill_waitrequest),
         .fill_readdatavalid(avm_fill_readdatavalid),
         .fill_readdata(avm_fill_readdata),
         .flush_cache(SearchDAG_finish),
         .snoop_clk(snoop_clk),
         .snoop_overflow(snoop_overflow),
         .snoop_addr(snoop_addr),
         .snoop_burst(snoop_burst),
         .snoop_write(snoop_write),
         .snoop_ready(),
         .rdport_addr(cc_addr),
         .rdport_read(cc_read),
         .rdport_waitrequest(cc_waitrequest),
         .rdport_readdatavalid(cc_readdatavalid),
         .rdport_readdata(cc_readdata)
      );

      assign snoop_clk = 1'b0;
      assign snoop_write = 1'b0;
      assign snoop_overflow = 1'b0;
      assign snoop_addr = '0;
      assign snoop_burst = '0;
      assign cc_addr = {const_avm_0_address[0][27:5]};
      assign cc_read = {const_avm_0_read[0]};
      assign {const_avm_0_waitrequest[0]} = cc_waitrequest;
      assign {const_avm_0_readdatavalid[0]} = cc_readdatavalid;
      assign {const_avm_0_readdata[0]} = cc_readdata;
      assign avm_fill_address = avm_fill_address_word << 5;
      assign avm_fill_write = 1'b0;
      assign avm_fill_byteenable = '1;
      assign avm_fill_burstcount = 5'b00001;
      assign avm_fill_enable = 1'b1;
   end
   endgenerate
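
For readers following the listing, the address arithmetic in this block can be sketched in C. This is only an illustration of what the two address `assign` statements compute. The function and macro names are hypothetical, and the widths are read off the listing: a 256-bit word is 32 bytes, hence the shift by 5, and a 23-bit word address plus 5 offset bits gives the 28-bit byte address.

```c
#include <assert.h>
#include <stdint.h>

/* Widths read off the generated const_mem_0 block (names are illustrative). */
#define CC_WORD_SHIFT     5u   /* 256-bit word = 32 bytes = 2^5 */
#define CC_WORD_ADDR_BITS 23u  /* width of cc_addr and avm_fill_address_word */
#define CC_BYTE_ADDR_BITS 28u  /* width of avm_fill_address */

/* Mirrors: assign cc_addr = {const_avm_0_address[0][27:5]}; */
static inline uint32_t cc_byte_to_word(uint32_t byte_addr)
{
    assert(byte_addr < (1u << CC_BYTE_ADDR_BITS));
    return byte_addr >> CC_WORD_SHIFT;
}

/* Mirrors: assign avm_fill_address = avm_fill_address_word << 5; */
static inline uint32_t cc_word_to_byte(uint32_t word_addr)
{
    assert(word_addr < (1u << CC_WORD_ADDR_BITS));
    return word_addr << CC_WORD_SHIFT;
}
```

A 28-bit byte address covers 2^28 bytes (256 MiB), so the cache fill port as generated can only reach one window of that size. The sketch does not explain the multiple-driver error itself.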

 

 

I don't know why this is generated. In the boardtest it is not generated.

Maybe the compiler is trying to cache the memory accesses. I can understand that, but I am not that experienced in OpenCL.

If anyone has some information for me, that would be awesome.

I tried to modify the BSP to present one unified, contiguous memory like on the Stratix 10 SX cards, but I had no luck.

Any information on this would be very helpful.

 

Best regards,

 

Thomas

7 Replies
EBERLAZARE_I_Intel

Hi,

Please check the link below to see whether you have multiple concurrent assignments to the same net, which may be causing the issue:

https://www.intel.com/content/www/us/en/programmable/quartushelp/13.0/mergedProjects/msgs/msgs/evrfx_vdb_net_multiple_drivers.htm

TLeng2
Beginner

Dear EBERLAZARE,

 

Thank you for the information.

I understand the problem of multiple drivers. The issue is that I don't create the const_mem_0 module myself.

I am using the Intel FPGA SDK for OpenCL on this board, and the module is created automatically.

How can I prevent this?

EBERLAZARE_I_Intel

Hi Thomas,

This problem may come from the fact that the signal is technically multi-driven: it is assigned multiple times because the process is inside a generate block.

Looking at your source code, you could work around the problem by moving some of the processes out of the generate statement.

 

TLeng2
Beginner

The best solution would be to change the BSP from 32 individual memory interfaces to one single global memory.

In the BSP for the Stratix 10 SX I have one global memory that maps to two DDR4 banks:

<!-- DDR4-2666 -->
<global_mem name="DDR" max_bandwidth="42656" interleaved_bytes="1024" config_addr="0x018">
<interface name="board" port="kernel_mem2" type="slave" width="512" maxburst="16" address="0x000000000" size="0x200000000" latency="240" waitrequest_allowance="6"/>
<interface name="board" port="kernel_mem3" type="slave" width="512" maxburst="16" address="0x200000000" size="0x200000000" latency="240" waitrequest_allowance="6"/>
</global_mem>

 

 

In the Stratix 10 MX BSP I have 32 global memories, each with its own memory interface:

// board_spec.xml

 

<global_mem name="HBM0" max_bandwidth="16000" interleaved_bytes="512" config_addr="0x018">
<interface name="board" port="kernel_slave_0" type="slave" width="256" maxburst="16" address="0x0" size="0x10000000" latency="240" waitrequest_allowance="6" bsp_avmm_write_ack="1"/>
</global_mem>
<global_mem name="HBM1" max_bandwidth="16000" interleaved_bytes="512" config_addr="0x100">
<interface name="board" port="kernel_slave_1" type="slave" width="256" maxburst="16" address="0x10000000" size="0x10000000" latency="240" waitrequest_allowance="6" bsp_avmm_write_ack="1"/>
</global_mem>
<global_mem name="HBM2" max_bandwidth="16000" interleaved_bytes="512" config_addr="0x104">
<interface name="board" port="kernel_slave_2" type="slave" width="256" maxburst="16" address="0x20000000" size="0x10000000" latency="240" waitrequest_allowance="6" bsp_avmm_write_ack="1"/>
</global_mem>
<global_mem name="HBM3" max_bandwidth="16000" interleaved_bytes="512" config_addr="0x108">
<interface name="board" port="kernel_slave_3" type="slave" width="256" maxburst="16" address="0x30000000" size="0x10000000" latency="240" waitrequest_allowance="6" bsp_avmm_write_ack="1"/>
</global_mem>

 

......

 

<global_mem name="HBM31" max_bandwidth="16000" interleaved_bytes="512" config_addr="0x178">
<interface name="board" port="kernel_slave_31" type="slave" width="256" maxburst="16" address="0x1f0000000" size="0x10000000" latency="240" waitrequest_allowance="6" bsp_avmm_write_ack="1"/>
</global_mem>

 

This is fine for many solutions, but it is not useful for mine.
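
For illustration, address-based bank switching over this layout amounts to the decode sketched below in C. It assumes the address map from board_spec.xml above: 32 banks of 0x10000000 bytes each, with bank i starting at i * 0x10000000. The macro and function names are made up for this sketch and are not part of the BSP or the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Layout assumed from board_spec.xml: HBM0..HBM31, 0x10000000 bytes each. */
#define HBM_NUM_BANKS  32u
#define HBM_BANK_SHIFT 28u                       /* 0x10000000 = 2^28 */
#define HBM_BANK_SIZE  (1ull << HBM_BANK_SHIFT)

/* Index of the HBM bank (0..31) that a flat byte address falls into. */
static inline uint32_t hbm_bank_index(uint64_t flat_addr)
{
    assert(flat_addr < HBM_NUM_BANKS * HBM_BANK_SIZE);
    return (uint32_t)(flat_addr >> HBM_BANK_SHIFT);
}

/* Byte offset of a flat address inside its bank. */
static inline uint32_t hbm_bank_offset(uint64_t flat_addr)
{
    return (uint32_t)(flat_addr & (HBM_BANK_SIZE - 1u));
}

/* Base address of a bank, matching the address attribute in board_spec.xml. */
static inline uint64_t hbm_bank_base(uint32_t bank)
{
    assert(bank < HBM_NUM_BANKS);
    return (uint64_t)bank << HBM_BANK_SHIFT;
}
```

For example, flat address 0x1F0000040 decodes to bank 31 at offset 0x40. The sketch only models the arithmetic. How a kernel would route an access to the selected bank depends on the BSP and is not shown here.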

 

I tried to change it in board_spec.xml, but without luck.

I know I need to adjust the BSP in Quartus too, but I wasn't able to:

<global_mem name="HBM0" max_bandwidth="512000" interleaved_bytes="128" config_addr="0x018">
<interface name="board" port="kernel_slave_0" type="slave" width="256" maxburst="16" address="0x0" size="0x10000000" latency="240" waitrequest_allowance="6" bsp_avmm_write_ack="1"/>
<interface name="board" port="kernel_slave_1" type="slave" width="256" maxburst="16" address="0x10000000" size="0x10000000" latency="240" waitrequest_allowance="6" bsp_avmm_write_ack="1"/>
<interface name="board" port="kernel_slave_2" type="slave" width="256" maxburst="16" address="0x20000000" size="0x10000000" latency="240" waitrequest_allowance="6" bsp_avmm_write_ack="1"/>

......

<interface name="board" port="kernel_slave_31" type="slave" width="256" maxburst="16" address="0x1f0000000" size="0x10000000" latency="240" waitrequest_allowance="6" bsp_avmm_write_ack="1"/>
</global_mem>

 

I know this BSP already exists somewhere.

 

DongWang-BJTU
New Contributor I

Where did you get this BSP?

TLeng2
Beginner

I got the BSP from Intel in January.

We bought the Stratix 10 MX card with it.

At that point, OpenCL was still on Intel's roadmap.

Now it isn't anymore, and it seems that we spent $50K on development kits for nothing.

jomarm10
Beginner

Hi, 

I am facing the same issue. We have the same S10MX board and the same BSP... we need bigger buffers.

We made some failed attempts to group them.

All we get is

Error (13224): Verilog HDL or VHDL error at lsu_token_ring.sv(207): $fatal : lsu_ic_top ring interconnect: Multiple write rings are not supported when using BSP_AVMM_WRITE_ACK
Error (16185): Can't elaborate user hierarchy "freeze_wrapper_inst|pr_region_inst"
Error (16186): Can't elaborate top-level user hierarchy

 

Did you find any solution to this?

 

Best Regards.
