Quartus complaint: My LOOP is too big? But the code is from the book, what's wrong?

Altera_Forum · ‎01-30-2016

Hello friends. I'm trying to implement floating-point addition and subtraction. My guide is the book "Computer Arithmetic and Verilog HDL Fundamentals" by Cavanagh. Inside the module he uses the following code to align the exponents:

always @ (oper_1 or oper_2)
begin
    exp_a = oper_1 [31:24];
    exp_b = oper_2 [31:24];
    fract_a = oper_1 [23:0];
    fract_b = oper_2 [23:0];

    // bias exponents
    exp_a_bias = exp_a + 8'b0111_1111;
    exp_b_bias = exp_b + 8'b0111_1111;

    // align fractions
    if (exp_a_bias < exp_b_bias)
        ctrl_align = exp_b_bias - exp_a_bias;
    while (ctrl_align)
    begin
        fract_a = fract_a >> 1;
        exp_a_bias = exp_a_bias + 1;
        ctrl_align = ctrl_align - 1;
    end

    if (exp_b_bias < exp_a_bias)
        ctrl_align = exp_a_bias - exp_b_bias;
    while (ctrl_align)
    begin
        fract_b = fract_b >> 1;
        exp_b_bias = exp_b_bias + 1;
        ctrl_align = ctrl_align - 1;
    end

Quartus II gives me the following error:

Error (10119): Verilog HDL Loop Statement error at ADD_SUB_FLO.v(40): loop with non-constant loop condition must terminate within 250 iterations

I searched, and it seems that since Quartus can't determine at compile time how many iterations "ctrl_align" will require, it won't synthesize the loop. The Quartus site says I can edit the .qsf file, but I couldn't find any line like this in my file:

set_global_assignment -name VERILOG_NON_CONSTANT_LOOP_LIMIT 300

Moreover, I'm rather surprised that the book teaches this step with code that is not synthesizable. I'd also like to actually fix the problem rather than work around it. What can I do?

Altera_Forum · ‎01-30-2016

You realise there is a big difference between HDL and procedural languages right?

Loops make little sense in hardware - it doesn't run as a loop, the synthesizer breaks it all up into hardware - so if you did a loop that added 1 four times, it would probably make a chain of four adders each adding 1.

In your case, how does the synthesizer know when to stop? If ctrl_align was say 10, it would require 10 copies of the hardware. If it was 20, it would require 20 copies of the hardware. So how does it know how many to put?
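
To make that concrete, here is a minimal sketch (not the book's code) of the same shifting step with a loop whose bound is a compile-time constant. The module name `align_loop_sketch` and its ports are made up for illustration; only the 24-bit fraction width comes from the code in the first post. Because the bound is fixed, the synthesizer can always unroll it into exactly 24 conditional shift stages:

```verilog
// Hypothetical illustration: module and port names are assumptions.
module align_loop_sketch (
    input  wire [23:0] fract_in,   // fraction to be aligned
    input  wire [7:0]  shift_amt,  // exponent difference
    output reg  [23:0] fract_out   // fraction shifted right by shift_amt
);
    integer i;

    always @(*) begin
        fract_out = fract_in;
        // Constant bound: always unrolls into 24 stages.
        // Stage i shifts by one bit only while i < shift_amt.
        for (i = 0; i < 24; i = i + 1) begin
            if (i < shift_amt)
                fract_out = fract_out >> 1;
        end
    end
endmodule
```

After 24 shifts a 24-bit fraction is all zeros, so larger shift amounts give the same result as the original loop. This still describes a chain of 24 stages; a single variable shift (`fract_in >> shift_amt`) describes the same function more directly.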

Altera_Forum · ‎01-30-2016

--- Quote Start ---

You realise there is a big difference between HDL and procedural languages right?

Loops make little sense in hardware - it doesn't run as a loop, the synthesizer breaks it all up into hardware - so if you did a loop that added 1 four times, it would probably make a chain of four adders each adding 1.

In your case, how does the synthesizer know when to stop? If ctrl_align was say 10, it would require 10 copies of the hardware. If it was 20, it would require 20 copies of the hardware. So how does it know how many to put?

--- Quote End ---

Yes sir, I have been reading more and I understand what you say. However, I'd like to point out that this code is from the book itself. I started learning Verilog only recently, and the literature is very confusing and contradictory for a beginner. I was trying to implement floating-point operations, and Cavanagh's book takes this behavioral path. So I wonder: if it makes no sense to implement such operations this way, why write a whole book around this concept when it should really be implemented differently? This is not a general book; it is about arithmetic in Verilog. My first impression was that it would be OK to use this model and that there was no better way.

What approach would you personally suggest, considering this is a hypothetical real-life implementation?

One more thing: it seems it is possible to set a different limit in the .qsf file. Quartus complains that 250 is the maximum, and my ctrl_align is 8 bits, which allows up to 255 iterations. So do I just need to change that setting in the file?

Altera_Forum · ‎01-30-2016

The code from the book looks like it was written for use in a simulator. This code clearly cannot be synthesized.

Are you studying how to develop hardware for doing floating point? Unless that is the case, using floating point in an FPGA is not the best solution. If your goal is to do a parallel calculation, you should convert your problem domain to integer or fixed point. Nearly every computation using physical world data can be done with integers or fixed point. Study numerical methods to see how this is done. Once you've converted your problem space then you can start your FPGA work.

Altera_Forum · ‎01-30-2016

Just because you can write HDL doesn't mean it makes sense in hardware. HDL is also used to write behavioral code that aids simulation but is not meant for use in a real design.
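
As an illustration of that simulation-only use, here is a small testbench sketch. The module name and the test values are assumptions chosen for the example; the loop body is the same shifting step as in the first post. A data-dependent while loop is fine here, because the simulator just executes it step by step and no hardware is generated:

```verilog
// Simulation-only sketch: module name and test values are assumptions.
`timescale 1ns/1ps
module while_loop_tb;
    reg [23:0] fract;
    reg [7:0]  ctrl_align;

    initial begin
        fract      = 24'h800000;
        ctrl_align = 8'd4;
        // The simulator runs this loop four times, then falls through.
        while (ctrl_align) begin
            fract      = fract >> 1;
            ctrl_align = ctrl_align - 1;
        end
        // fract now holds 24'h080000 (the start value shifted right by 4).
        $display("aligned fraction = %h", fract);
        $finish;
    end
endmodule
```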

Altera_Forum · ‎01-30-2016

I want to ask: is the while keyword among the synthesizable constructs? Is it possible to use a generic in this code, or even lpm_constant? Thanks.

Altera_Forum · ‎01-31-2016

--- Quote Start ---

The code from the book looks like it was written for use in a simulator. This code clearly cannot be synthesized.

Are you studying how to develop hardware for doing floating point? Unless that is the case, using floating point in an FPGA is not the best solution. If your goal is to do a parallel calculation, you should convert your problem domain to integer or fixed point. Nearly every computation using physical world data can be done with integers or fixed point. Study numerical methods to see how this is done. Once you've converted your problem space then you can start your FPGA work.

--- Quote End ---

I'm not sure what parallel calculation means, but isn't that exactly what the above code is trying to do? That is, it separates the exponents and fractions and treats them as integers, evaluating them separately afterwards. Yet it needs these awkward loops. If it is not, what is the numerical method I should search for?

Now I understand that if I follow the book's method, Quartus will end up generating at least 255 hardware units (if I manage to change the configuration files), which may or may not be used depending on the value reached by "ctrl_align". I wonder, is this completely wrong if my goal is not only to simulate but to actually fabricate a chip?

I will briefly explain what I'm trying to do; you may remember some of it from my previous posts.

I want to design a chip that calculates an equation known as "FCM" (fuzzy C-means). The equation itself includes iterative additions, subtractions, multiplications and divisions. Because it performs iterative divisions, fractional numbers are necessary; otherwise it will give me wrong results. From my point of view, I should write logic in Verilog that can do this, using Quartus synthesis and simulation of what should later become a chip. So, is this the way to do it, or am I wrong?

I've searched the internet and there is a lot of material on floating-point operations in Verilog. They are said to be efficient, but efficient enough for a real chip? I don't know.

Altera_Forum · ‎01-31-2016

--- Quote Start ---

Just because you can write HDL doesn't mean it makes sense in hardware. HDL is also used to write behavioral code that aids simulation but is not meant for use in a real design.

--- Quote End ---

I was under the impression that HDL is for designing chips or hardware itself, so what's the point of writing a book with non-synthesizable HDL code in the first place? I can understand it for test benches, because they will never become a chip.

Updating:

If direct Verilog isn't right for this kind of operation, would it be better to design a kind of processor, with registers, stacks, and simple adds and subtracts? I know I could easily implement this equation in assembly.

Altera_Forum · ‎01-31-2016

--- Quote Start ---

I was under the impression that HDL is for designing chips or hardware itself, so what's the point of writing a book with non-synthesizable HDL code in the first place? I can understand it for test benches, because they will never become a chip.

Updating:

If direct Verilog isn't right for this kind of operation, would it be better to design a kind of processor, with registers, stacks, and simple adds and subtracts? I know I could easily implement this equation in assembly.

--- Quote End ---

You're entering the hardware world. So forget everything you know about programming and assembler. Their concepts and coding style will give you poor results in HDL (if they work at all).

You need to start from scratch - learn about basic components (RAMs, registers, gates etc) and DRAW out your intended circuit on a piece of paper. Only then can you transcribe it into HDL. Verilog has a problem because it looks like C, so many people think it must be written in a similar fashion. This is a dangerous trap (as you're finding). As you're starting from scratch, you may be better off learning VHDL instead (as it is nothing like C).

I have no knowledge of the book you're talking about - but even if it did work it would give very poor performance - loops like the one above unroll into a long chain of logic stages in series, which will give a terrible Fmax (maybe 20 MHz if you're lucky). Also, floating point is rubbish in FPGAs as it requires large amounts of logic and lots of pipelining (latency). FPGAs are built to suit integer (fixed point) arithmetic. Is there any reason why you want floating and not fixed?

"Direct Verilog", as you put it, would be perfectly suited to your application - just not with the code you've used above. I suggest using the floating point IP provided by Altera.
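
For the alignment step itself, a rough sketch of a loop-free version might look like the following. This is not the book's code and not the Altera IP: the module name, port names and the `exp_common` output are assumptions for illustration, and only the field layout (exponent in [31:24], fraction in [23:0]) is taken from the first post. The bias step is left out as a simplification, on the assumption that the two exponents can be compared directly as unsigned numbers:

```verilog
// Sketch only: module and port names are assumptions for illustration.
module align_fractions (
    input  wire [31:0] oper_1,
    input  wire [31:0] oper_2,
    output reg  [23:0] fract_a,
    output reg  [23:0] fract_b,
    output reg  [7:0]  exp_common   // shared exponent after alignment
);
    wire [7:0] exp_a = oper_1[31:24];
    wire [7:0] exp_b = oper_2[31:24];

    always @(*) begin
        // Defaults cover the case where the exponents are already equal.
        fract_a    = oper_1[23:0];
        fract_b    = oper_2[23:0];
        exp_common = exp_a;

        if (exp_a < exp_b) begin
            // Shift the smaller operand right by the exponent difference.
            fract_a    = oper_1[23:0] >> (exp_b - exp_a);
            exp_common = exp_b;
        end else if (exp_b < exp_a) begin
            fract_b    = oper_2[23:0] >> (exp_a - exp_b);
            exp_common = exp_a;
        end
    end
endmodule
```

A shift by a variable amount is something a synthesizer can map to a fixed-size shifter, so there is no iteration count for it to guess. Assigning defaults at the top of the block also avoids the case in the original where `ctrl_align` is not assigned on every path.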

Altera_Forum · ‎02-01-2016

I'd advise forgetting about floating point. FPGAs aren't designed for it. Sure, Altera provides floating point IP but you end up needing huge amounts of FPGA resources to use it unless you really know what you are doing. As others have said, you need to be thinking of designing hardware, not writing code. First analyze your problem. What are the min/max values of the inputs, outputs, and intermediate results? What precision do you need to keep in order to obtain sufficiently accurate results? What accuracy is required? How many bits are needed to represent these in binary? Then you come up with how many bits are before and after the decimal point. Prescale any inputs and intermediate constants so that no exponents are needed.

Only after all this (see a numerical methods or scientific programming book for details) will you know your fixed point format. Then do your calculations on fixed point integers. Draw out on paper where you need adders, multipliers. Learn about pipeline computing. Research parallel processing and determine if your calculation can be done in parallel. Is this a vector algorithm? If so FPGAs are particularly good at this.
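
As a small sketch of what "calculations on fixed point integers" can look like once a format is chosen: the example below assumes an unsigned format with 8 integer bits and 8 fraction bits (often written Q8.8). That format and all the names are assumptions picked for illustration; the real widths have to come from the range and precision analysis described above.

```verilog
// Sketch only: the Q8.8 format and all names are illustrative assumptions.
module fixed_q8_8 (
    input  wire [15:0] a,      // unsigned Q8.8: 8 integer bits, 8 fraction bits
    input  wire [15:0] b,      // unsigned Q8.8
    output wire [16:0] sum,    // Q9.8: one extra bit so the add cannot overflow
    output wire [15:0] prod    // Q8.8: product rescaled, upper 8 bits dropped
);
    // Addition needs no alignment: both binary points are in the same place.
    assign sum = {1'b0, a} + {1'b0, b};

    // Q8.8 * Q8.8 gives Q16.16 (32 bits). Bits [23:8] bring it back to Q8.8.
    wire [31:0] full = a * b;
    assign prod = full[23:8];
endmodule
```

For example, 2.5 is 16'h0280 and 1.5 is 16'h0180 in this format; the full product is 32'h0003C000 and bits [23:8] give 16'h03C0, which is 3.75. Note that `prod` silently drops the upper integer bits, so the analysis has to guarantee the result fits.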

Unfortunately, in spite of the marketing hype, you can't just take arbitrary algorithms and expect good results in an FPGA without learning the underlying techniques. I personally know of the existence of some of them but haven't actually implemented more than a few of these techniques.

Altera_Forum · ‎02-01-2016

--- Quote Start ---

I'd advise forgetting about floating point. FPGAs aren't designed for it. Sure, Altera provides floating point IP but you end up needing huge amounts of FPGA resources to use it unless you really know what you are doing. As others have said, you need to be thinking of designing hardware, not writing code. First analyze your problem.

--- Quote End ---

I think floating point is mandatory in this case. I need iterative divisions, and divisions produce fractional numbers of different sizes, for example 2.9 - 2.1899. The exponents must be aligned before any operation, so they have to vary according to the difference in size of the inputs, and this is exactly what floating point is about.

Altera_Forum · ‎02-01-2016

--- Quote Start ---

I think floating point is mandatory in this case. I need iterative divisions, and divisions produce fractional numbers of different sizes, for example 2.9 - 2.1899. The exponents must be aligned before any operation, so they have to vary according to the difference in size of the inputs, and this is exactly what floating point is about.

--- Quote End ---

In that case - stick with the Altera floating point IP cores rather than trying to write your own floating point arithmetic.

This is, of course, assuming there is no chance you can redesign the algorithm to fit an FPGA. If you want lots of floating point (and especially division - which is particularly expensive in an FPGA), why not use a DSP?

Altera_Forum · ‎02-01-2016

--- Quote Start ---

In that case - stick with the Altera floating point IP cores rather than trying to write your own floating point arithmetic.

This is, of course, assuming there is no chance you can redesign the algorithm to fit an FPGA. If you want lots of floating point (and especially division - which is particularly expensive in an FPGA), why not use a DSP?

--- Quote End ---

Tricky, sorry for the question: is the DSP you mentioned an accessory chip on the development board, or is it the NIOS-II?

In theory, I would need about 57,600 processing elements (240 x 240 pixels from a black-and-white picture). Each processing element calculates an equation like this:

http://www.alteraforum.com/forum/attachment.php?attachmentid=11768&stc=1

where A is a positive integer from 0 to 255 and 1 pixel = 1 processing element. (B, X and Y start as random numbers that are updated by adding the results of the other pixels' PEs, but that's a different story.) I expected to implement this in hardware somehow.

In fact, since this is a school project, the requirements are pretty flexible, so I am free to use any technology (but I will have to justify the choice later).

Altera_Forum · ‎02-01-2016

A DSP chip is a processor designed for digital signal processing. You can get ones that work in fixed or floating point: https://en.wikipedia.org/wiki/digital_signal_processor

If you try to make 57600 PEs in a single FPGA, you're doomed to failure. For a start, your image (240x240) will not arrive in parallel. You will get the data as a streaming set of pixels. Then you need to calculate the results in series.

Also, the algorithm looks about ready for some redesign, and I don't understand why it cannot be fixed point. If you know the width of the inputs, then you can work out the max width of U.

Altera_Forum · ‎02-01-2016

--- Quote Start ---

A DSP chip is a processor designed for digital signal processing. You can get ones that work in fixed or floating point: https://en.wikipedia.org/wiki/digital_signal_processor

If you try to make 57600 PEs in a single FPGA, you're doomed to failure. For a start, your image (240x240) will not arrive in parallel. You will get the data as a streaming set of pixels. Then you need to calculate the results in series.

Also, the algorithm looks about ready for some redesign, and I don't understand why it cannot be fixed point. If you know the width of the inputs, then you can work out the max width of U.

--- Quote End ---

Tricky, thanks as always for your support. 57,600 PEs might be an exaggeration (I could actually implement only 1,000 pixels for the sake of simplicity), but the way the data reaches the PEs is not actually relevant (according to the specs of my project). The processing must start once every data item is in "position". On the other hand, I'm thinking of reformulating my project to use the resources of my DE1-SOC board. Do you think I could use NIOS-II with C or assembly for the equation above on (let's say) 1,000 independent, parallel cores (in a way that makes sense)?

Altera_Forum · ‎02-13-2016

I don't think that you need a DSP or other processor to perform the calculation. You can also write HDL code that processes data sequentially, using block RAM to hold the array elements. You have to set up a state machine that controls the calculation flow.
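
A rough sketch of that structure is below. Everything in it is an assumption made for illustration: the module name, the port list, the 1024-element depth, and in particular the operation, which is just a running sum standing in for the real per-pixel calculation. The memory is written in a style that synthesis tools commonly infer as block RAM, and the state machine walks through it one element at a time:

```verilog
// Sketch only: names, widths and the "sum all elements" operation are
// assumptions; substitute the actual per-element calculation.
module seq_accumulate #(
    parameter ADDR_W = 10,            // 2**10 = 1024 elements
    parameter DATA_W = 8
) (
    input  wire                     clk,
    input  wire                     rst,
    input  wire                     start,
    input  wire                     wr_en,    // write port to load the array
    input  wire [ADDR_W-1:0]        wr_addr,
    input  wire [DATA_W-1:0]        wr_data,
    output reg  [DATA_W+ADDR_W-1:0] result,   // wide enough for the full sum
    output reg                      done
);
    localparam IDLE = 2'd0, READ = 2'd1, ACCUM = 2'd2;

    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];
    reg [DATA_W-1:0] rd_data;
    reg [ADDR_W-1:0] rd_addr;
    reg [1:0]        state;

    // Memory with synchronous write and synchronous read.
    always @(posedge clk) begin
        if (wr_en)
            mem[wr_addr] <= wr_data;
        rd_data <= mem[rd_addr];
    end

    // Control: one element is read and accumulated every two clocks.
    always @(posedge clk) begin
        if (rst) begin
            state <= IDLE; done <= 1'b0; result <= 0; rd_addr <= 0;
        end else begin
            case (state)
                IDLE: if (start) begin
                    done <= 1'b0; result <= 0; rd_addr <= 0;
                    state <= READ;
                end
                READ: state <= ACCUM;   // wait one clock for rd_data
                ACCUM: begin
                    result <= result + rd_data;
                    if (rd_addr == {ADDR_W{1'b1}}) begin
                        done  <= 1'b1;  // last address processed
                        state <= IDLE;
                    end else begin
                        rd_addr <= rd_addr + 1'b1;
                        state   <= READ;
                    end
                end
                default: state <= IDLE;
            endcase
        end
    end
endmodule
```

The point of the design is that one arithmetic unit is reused for every element, so the logic cost stays the same whether the array holds 1,000 or 57,600 entries; only the memory depth and the run time grow.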

I presume that the parallel cores idea doesn't fit any FPGA hardware available to you.