Intel® Quartus® Prime Software

Quartus II Synthesis - System Memory Issues for Large Stratix 10 Design

CADAMS

Hello,

I have a Stratix 10 design that is based around an IP core generated using Intel's HLS. The core does some simple floating-point operations and by itself uses very few resources (1 DSP, a few hundred flops, etc.).

This core sits inside a generate statement like this:

genvar i;
generate
    for (i = 0; i < SOMEBIGNUMBER; i = i + 1) begin : gen_core
        myhlscore u0 (inputs, outputs);
    end
...


The design works and is proven in simulation and in hardware.

The problem comes when I try to increase the value of SOMEBIGNUMBER. Despite there being adequate resources, using values above 200 or so makes the synthesis tool run out of memory.

I cannot alleviate this easily by adding more memory. I already tried synthesizing on a computer with 256GB of memory and 200GB of swap space, and Quartus ate it all up before dying.

I'm using a .ip file from HLS right now. I'm wondering if there is some way to pre-synthesize the module and keep the results, or whether I need to write the generate statement differently so that it caches less. Perhaps there are some synthesis settings I can change?

 

We tried using a design partition, but the elaboration stage still exceeds the 120 GB of memory.

Thanks,
C

BoonBengT_Intel

Hi @CADAMS,

 

Thank you for posting in the Intel community forum; I hope all is well.
If I understand the situation correctly, the looping might be the cause here.

I would therefore recommend looking into pipelining the loops, which enables parallelism.
Here is some explanation of the concept; I would also recommend the guide on writing loops in HLS.
Hope that clarifies.

 

Best Wishes
BB

CADAMS

No, that's not the issue. The loop in my original post is a generate statement (RTL) wrapping the HLS core. I don't think the guides you sent are relevant.

Cheers,

C

BoonBengT_Intel

Hi @CADAMS,

 

Apologies for the confusion. If I understand correctly, what has been implemented is a for loop that triggers the IP cores performing the simple floating-point operations, and when the loop count is increased to a large number, memory is exhausted.

May I ask what value of SOMEBIGNUMBER is causing the issue?
Also, are you able to share how the floating-point operation is written?
Hope to hear from you soon.

 

Best Wishes
BB

BoonBengT_Intel

Hi @CADAMS,

 

Good day, just following up on the previous clarification.
Did you by any chance manage to look into it?

 

Best Wishes
BB

CADAMS

@BoonBengT_Intel ,

 

Thanks for getting back to me. Sorry for the slow reply.

 

I'm not sure what you mean by 'trigger'. The Verilog generate-for loop instantiates parallel instances of the IP core. If the for loop is large, then memory usage during compilation becomes untenable.

 

SOMEBIGNUMBER is approximately 200. I cannot share the exact HLS code, but it is essentially a cumulative sum across 128 inputs. Here is some pseudo code:

 

float myHlsCore(const short stream_in[128])   // 16-bit integer input samples
{
    static float runSum = 0;                   // persists across calls

    for (int c = 0; c < 128; c++)
        runSum += stream_in[c];

    return runSum;
}
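
To make the structure concrete, here is a minimal C++ software model of what the generate loop builds: many independent copies of the accumulator, each with its own running sum. This is only a sketch and not the actual HLS source. The names CoreModel, runAll and kFrameLen are made up for illustration, and the input is assumed to arrive as a fixed frame of 128 16-bit samples.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame length: 128 samples per call, as described in the post.
constexpr std::size_t kFrameLen = 128;

// Illustrative software model of one core instance (not the real HLS component).
struct CoreModel {
    float runSum = 0.0f;  // per-instance state, like the static in the pseudo code

    // Accumulate one frame of 16-bit samples and return the running total.
    float process(const std::array<std::int16_t, kFrameLen>& frame) {
        for (std::int16_t sample : frame) {
            runSum += static_cast<float>(sample);
        }
        return runSum;
    }
};

// Mirrors the generate-for loop: one independent model per generated instance.
std::vector<float> runAll(std::vector<CoreModel>& cores,
                          const std::array<std::int16_t, kFrameLen>& frame) {
    std::vector<float> out;
    out.reserve(cores.size());
    for (CoreModel& core : cores) {
        out.push_back(core.process(frame));
    }
    return out;
}
```

Constructing std::vector<CoreModel>(200) and calling runAll corresponds to SOMEBIGNUMBER = 200. Every element keeps separate state, which is why the tool has to elaborate each instance rather than share one.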

 

We have partially solved the issue by creating a very large swap file on the system (~500GB), but this is not a realistic solution as memory access is extremely slow on the swap.

 

Now the compilation process fails, saying the design cannot be routed, despite resource usage being less than 60%. It is a slightly different issue, but I think they are related, and I do not think that the swap file is a good solution to the original problem either.

 

Thanks,

Chris

BoonBengT_Intel

Hi @CADAMS,

Greetings, and apologies for the delayed response.
We tried testing a simple floating-point matrix design in HLS together with Quartus compilation.

However, we did not observe the increase in resources.
Our guess is therefore that there might be some resource usage elsewhere in the Quartus design, so we suggest sharing more about your Qsys design.
Hope to hear from you soon.

 

Best Wishes
BB

BoonBengT_Intel

Hi @CADAMS,

Good day, just checking in to see if there are any further questions regarding this matter.
Hope we have clarified your doubts.

Best Wishes
BB

BoonBengT_Intel

Hi @CADAMS,

Greetings. As we have not received any further clarification on what was provided, we will assume the issue is resolved. For new queries, please feel free to open a new thread and we will be right with you, or let us know if the issue is still open and we will get back to you as soon as possible. It was a pleasure having you here.

Best Wishes
BB
