Me and my simple questions: Can I do this in FPGA?

Altera_Forum · ‎12-05-2015

Hi dear Altera friends. Sorry for the simple questions, I'm a beginner. My FPGA board is a DE1-SoC University board, and I'm learning how to use it.

My question today is the following one:

Until now, with the help of Altera's student material, I have learned how to implement simple circuits, ALUs and registers. Now, is it possible to implement complicated equations in hardware? Like 1 divided by a big number, or square roots; I mean, to work with real numbers like 0.0031416 and so on?

I know how to do this in both C and assembler, but I don't know if it is possible in an FPGA with VHDL.

A senior friend told me this is not possible in VHDL. He said I have to make use of the ARM chip in my FPGA and program it in C, since an FPGA and VHDL alone are not able to perform such calculations.

What can you say about this?

Example:

http://www.alteraforum.com/forum/attachment.php?attachmentid=11555&stc=1

Altera_Forum · ‎12-05-2015

Your friend doesn't know much. Altera provides floating point cores to do floating point arithmetic: https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/ug/ug_altfp_mfug.pdf

While you wouldn't be using much VHDL, you are still designing a circuit. The VHDL would just glue the cores together.

But floating point in an FPGA does have its issues: it has high resource usage and high latency. If you have a constant data stream, then the FPGA can process far more than a processor could handle (it can do it in real time; you basically build a custom co-processor). But if it's just a few calculations, then it's probably easier to let the ARM do the work.

So the answer is "Yes, it can do it". But you will need to think seriously about how to implement it.

Altera_Forum · ‎12-05-2015

Tricky is right. You can.

But your equation can be optimized further for parallel computation, and it should be rewritten according to how you compute the power. It depends on how the data stream arrives.

You have to use floating point because you need exponent and logarithm functions, or you can build your own for an integer or fixed-point data type.

Altera_Forum · ‎12-05-2015

y**x = 2**(x*log2 y) if y > 0.

Avoid division by rewriting it as a power: 1/x = x**(-1) = 2**(-log2 x), x > 0.

You can rewrite your equation entirely in terms of the 2**x and log2 x functions.

If you need to compute a sum, the accuracy problem still exists. Keep that in mind if you add floating-point values in unsorted order. Does the summation take place in the FPGA?
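A quick way to check these identities before committing them to hardware is a small software model. The Python sketch below is only an illustration (the function names are made up here, and it uses the PC's double-precision floats, not any FPGA core). It also shows the summation-order issue: adding small values to a large one, one at a time, can lose them.

```python
import math

def power_via_log2(y: float, x: float) -> float:
    """Compute y**x as 2**(x * log2(y)); only valid for y > 0."""
    if y <= 0:
        raise ValueError("y must be positive")
    return 2.0 ** (x * math.log2(y))

def reciprocal_via_log2(x: float) -> float:
    """Compute 1/x as 2**(-log2(x)); only valid for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return 2.0 ** (-math.log2(x))

# Compare the rewritten forms against the direct computation.
print(power_via_log2(3.0, 2.5), 3.0 ** 2.5)
print(reciprocal_via_log2(8.0), 1.0 / 8.0)

# Order of floating-point additions matters: near 1e16 a double can only
# represent every second integer, so a lone 1.0 added to 1e16 is rounded away.
big_first = (1e16 + 1.0) + 1.0
small_first = 1e16 + (1.0 + 1.0)
print(big_first == small_first)
```

In hardware, 2**x and log2 x would each be a core or a lookup table with its own error, so the rewritten form is not automatically more accurate than a divider; it just trades one kind of block for another.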

Altera_Forum · ‎12-05-2015

When the proper units are chosen, very few calculations in the physical world cannot be done with 64-bit (or wider) integers. DSPs were strictly integer for a long time, and many of the techniques developed then apply to FPGAs. Look into these techniques and take another look at how your problem might be solved with them.

I don't recommend using the floating point FPGA blocks that are available. They waste FPGA resources and aren't needed with some careful analysis of the problem. The same goes for floating point in OpenCL and Vivado HLS.

Altera_Forum · ‎12-06-2015

Oh. If Galfonz recommends using DSP techniques, I suggest you find documentation on FFT-based transforms for arithmetic operations on big integers.

It is not obvious. Even googling will not give a quick answer.

Altera_Forum · ‎12-10-2015

Thanks so much guys, I'll carefully study all your answers. My professor says I don't need a floating point unit for this; he says designing a simple adder-multiplier-accumulator would be enough (maybe that is what Galfonz is referring to). I will implement this on my FPGA, but what I intend is to simulate a chip that can perform this equation.

I have another question about this floating point operation module on the FPGA: can it serve several PEs at the same time? Or is it a single module that serves one PE at a time? I ask because I may need to evaluate this equation in several PEs in parallel at the same moment.

Altera_Forum · ‎12-10-2015

It has a single input and a single output.

But if the data rate is half the clock rate, you can mux the input between two sources.
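To picture the multiplexing idea, here is a small behavioural model in Python (not HDL). Everything in it is an assumption for illustration: the name shared_unit, the 3-clock latency and the squaring operation are made up, and a real core's latency depends on how it is configured.

```python
from collections import deque

def shared_unit(stream_a, stream_b, op, latency=3):
    """Interleave two half-rate streams into one pipelined unit.

    Even clocks take a sample from A, odd clocks from B. Results leave the
    pipeline `latency` clocks later, tagged with the source they belong to.
    """
    pipeline = deque([None] * latency)      # one slot per pipeline stage
    out_a, out_b = [], []
    a, b = iter(stream_a), iter(stream_b)
    total_clocks = 2 * max(len(stream_a), len(stream_b)) + latency
    for clk in range(total_clocks):
        # Input mux: select the source by clock parity.
        src, source_iter = ("A", a) if clk % 2 == 0 else ("B", b)
        value = next(source_iter, None)
        pipeline.append(None if value is None else (src, op(value)))
        # Output demux: route the finished result back to its source.
        done = pipeline.popleft()
        if done is not None:
            (out_a if done[0] == "A" else out_b).append(done[1])
    return out_a, out_b

print(shared_unit([1, 2, 3], [10, 20, 30], lambda v: v * v))
```

The same idea extends to N sources if each one only needs a result every N clocks; beyond that you replicate the unit.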

Altera_Forum · ‎12-20-2015

--- Quote Start ---

Thanks so much guys, I'll carefully study all your answers. My professor says I don't need a floating point unit for this; he says designing a simple adder-multiplier-accumulator would be enough (maybe that is what Galfonz is referring to). I will implement this on my FPGA, but what I intend is to simulate a chip that can perform this equation.

--- Quote End ---

Hi. Before building a MAC unit, which you can find ready to use or make from scratch if that is your course goal, you need to analyze the math underneath the equation you posted: where do Xi, m, Ci and Cj come from, how long is the summation, and where is the data stored? If you need to retrieve it from main memory, then maybe the DE1's ARM core really can do better.

The MAC unit is the basis of DSP processing. A better understanding of the data handling is the starting point for evaluating a faster and more precise solution.

--- Quote Start ---

I have another question about this floating point operation module on the FPGA: can it serve several PEs at the same time? Or is it a single module that serves one PE at a time? I ask because I may need to evaluate this equation in several PEs in parallel at the same moment.

--- Quote End ---

To answer this question, it is again necessary to know how the indexes and parameters are handled. You can build a block and replicate it on the FPGA to compute more than one result, as in a SIMD or MIMD architecture, but this requires that the input data not be overwritten by the results. So:

where are the uij results to be stored, where are Xi, Cj, Ck, m and C taken from, and are the input and output vectors/matrices the same?

Altera_Forum · ‎12-28-2015

Hey rromano001, I have been doing some analysis. Everything I say comes from my ignorance, so you guys are free to correct me. First I must say that I won't be able to do this with the nice resources of my DE1, because my intention is to design a system that can handle not 2 or 3 PEs but hundreds of PEs communicating with each other in a torus network. Each of them has to be able to add, subtract, multiply and divide in order to evaluate the proposed equation.

Regarding the floating point unit, I don't think it is necessary to build one, since I don't require that much accuracy. Instead, I will use a very exact fixed point method, using a group of registers filled with BCD digits, divided like this:

88 . 88888888 (base-10) = 1000 1000 . 1000 1000 1000 1000 1000 1000 1000 1000 (BCD bit pattern)

I'm pretty sure this would be enough. I did a couple of calculations with example values and they never exceed these limits, and when they do, it doesn't matter for my purpose.

Input data can come directly in BCD or be converted from base-10, it doesn't matter.

Then I will build separate modules for each of the operations: add, subtract, multiply and divide. Something I've found is that VHDL is a REALLY low-level language. I suspect I cannot directly operate on two decimal numbers; instead, like in assembler, I have to convert them to BCD and implement the addition, complements (for subtraction), carry, etc., in order to add decimal numbers. Am I right??

Being honest, my professor doesn't seem to have a single IDEA of what I'm doing. He is more of a high-level language person, and he seems to think arithmetic operations are as simple as starting to code and calling it a day. :(

Altera_Forum · ‎12-28-2015

Why do you want to use BCD? That makes things very complicated and uses a lot of resources. Why not just work in binary like every other language? You can easily do arithmetic in VHDL with the numeric_std package from the IEEE library:

A <= b + (c*d);

Altera_Forum · ‎12-29-2015

--- Quote Start ---

Why do you want to use BCD? That makes things very complicated and uses a lot of resources. Why not just work in binary like every other language? You can easily do arithmetic in VHDL with the numeric_std package from the IEEE library:

A <= b + (c*d);

--- Quote End ---

Very interesting. So you suggest that I convert the input from decimal entirely to binary, process everything internally, and then convert back to decimal?

Altera_Forum · ‎12-29-2015

Why is the input decimal? Where is the input coming from? Usually numbers are binary. Very very odd to work in decimal.

Altera_Forum · ‎12-30-2015

Tricky, now that you mention it, I'm not so sure that the input should be decimal, but I'm quite sure that in the end I have to show the results in decimal somehow (for display purposes).

This is for a project in my course. I'm working with a hypothetical FPGA system that reads a black and white picture in which every pixel has an intensity ranging from 0 to 99. Those values are loaded into a list, which I have to cluster into only 3 groups according to the "membership value" of every pixel.

Each pixel should be processed by 1 PE to calculate its "membership" as shown in the equation in my OP. (Then the PEs will communicate with each other so they can "update" the membership, but that's another story.)

This equation includes addition, subtraction, multiplication and division. One big issue is that fractional values (example: 0.0002) are generated in the process (it also generates repeating decimals like 0.1416.........., but I will limit them to a certain fixed number of digits).

My work doesn't include how to capture the pixel intensities; I will just take the values from a list. So I think you are right: maybe I can consider the list to be in binary from the start, and in the end cluster the values into groups in binary too, without ever needing decimal.

Do you think doing divisions and multiplications of fractional values is easy or feasible? My intention is to make this as simple as possible. Thanks so much!

Altera_Forum · ‎12-30-2015

--- Quote Start ---

Tricky, now that you mention it, I'm not so sure that the input should be decimal, but I'm quite sure that in the end I have to show the results in decimal somehow (for display purposes).

--- Quote End ---

If this is an image, it is better to display it as gray levels or color shades than to print numbers, unless the image is very limited in resolution.

--- Quote Start ---

Do you think doing divisions and multiplications of fractional values is easy or feasible? My intention is to make this as simple as possible. Thanks so much!

--- Quote End ---

IQ math (fixed-point arithmetic) is possible in FPGA logic, just as it is on an integer processor without an FPU. A floating point unit can also be built on the FPGA.

Again, the equation you posted has a lot of coefficients and indexes. If you want some help, please tell us about the numbers and their ranges.

The first equation can be reworked using a negative exponent, but this doesn't free you from checking for negative values or for division by zero. The exponent is fractional too, so care must be taken with the m coefficient, since the exponent can also grow to infinity.

The board you own has a lot of power in both the FPGA and HPS sections. Both can handle that equation, but for either one you must have a clear scope.

From the questions you are posting, I fear you need to get a grasp of VHDL before doing this job, is that right? If so, some basic exercises can clear up most of your doubts.

VHDL is a complex hardware description language, but also a programming and simulation language. It is not simple, but not so difficult either; it just needs some hints and patience, and you can do everything you can think of in hardware/software.

Altera_Forum · ‎12-30-2015

http://www.alteraforum.com/forum/attachment.php?attachmentid=11659&stc=1

Sorry, I can't make it smaller (weird picture upload system).

This is the equation, further simplified. The exponents are only squares; there is no need for fractional exponents. About ranges: the maximum integer part should be 99 (2 digits) and the fractional part .999999 (6 digits). There are no negative numbers because the subtraction is an absolute value.

I will consider the input and output to be binary so I can work more easily. I have been reading about fractional operations in VHDL, and it seems there is already a library for this. Do you know about it?

Altera_Forum · ‎12-30-2015

You don't need to consider binary at all. It doesn't matter what base the number is in; the equation is still the same. All computers, maths programs and everything else work in binary; decimal is just a convenient representation for the human brain to understand. Just imagine your inputs are decimal. It will only make a difference to the number of bits required for the operation.

If the input range is 0-99, then you need 7 integer bits (0-127). You cannot get exact resolution to make .999999; you need to decide how many fractional bits you need, where n fractional bits give a resolution of 2^-n. 6 fractional bits give you precision to the nearest 0.015625. 12 bits give 0.000244140625.

But usually you can work backwards: what precision is required at the output? This can give you the precision needed at the input.
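The bit-width arithmetic is easy to check with a few lines of Python. This is just a helper sketch (the function names are made up here), assuming unsigned values and a resolution target of one binary step no larger than the required decimal step:

```python
import math

def integer_bits(max_value: int) -> int:
    """Bits needed to hold unsigned integers 0..max_value."""
    return max(1, max_value.bit_length())

def fractional_bits(resolution: float) -> int:
    """Smallest n such that 2**-n <= resolution."""
    return math.ceil(-math.log2(resolution))

print(integer_bits(99))        # integer part for the range 0-99
print(2.0 ** -6)               # step size with 6 fractional bits
print(2.0 ** -12)              # step size with 12 fractional bits
print(fractional_bits(1e-6))   # bits needed for steps of one millionth or finer
```

So a 0-99 value with roughly 6 decimal digits after the point needs about 7 + 20 = 27 bits in binary fixed point, and even then .999999 is only approximated, not exact.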

Altera_Forum · ‎12-30-2015

Hey, I have found this library for fixed point operations: “fixed_float_types.vhdl”, “fixed_generic_pkg.vhdl”, “fixed_generic_pkg-body.vhdl”, and “fixed_pkg.vhdl”.

I might just declare values directly in binary, in integer.fraction format, and perform the operations normally with + and *.

My question, however, is whether these kinds of "libraries" are synthesizable and whether Quartus II can work with them without problems. I know, for example, that the "/" division can be used in Quartus II, but it won't synthesize in the FPGA, correct?

Altera_Forum · ‎12-30-2015

The fixed point package is part of the VHDL 2008 language spec. Quartus does not fully support 2008 yet, but David Bishop wrote a '93 compatible version of the fixed_pkg that compiles well with Quartus (at least it worked just fine about 6 years ago and I don't see why it would stop working now; I inferred RAMs and multipliers with it just fine). You can download it from here: http://www.vhdl.org/fphdl/

This package doesn't really do anything other than integer arithmetic; it just holds the numbers in an easier to understand (and modify) format. There is nothing you can do with this package that you cannot do with integers (though that takes a little more careful thought). The logic created is identical (as fixed point is simply integer arithmetic with an offset).
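To make "integer arithmetic with an offset" concrete, here is a minimal software model in Python (not VHDL, and not the fixed_pkg API). The 12 fractional bits and the helper names are assumptions chosen for illustration. It evaluates the b + (c*d) expression from earlier in the thread using nothing but integers.

```python
FRAC_BITS = 12                 # assumed format: 12 fractional bits
SCALE = 1 << FRAC_BITS         # a stored integer q represents q / SCALE

def to_fixed(x: float) -> int:
    """Scale a real value to its integer representation."""
    return round(x * SCALE)

def to_float(q: int) -> float:
    """Interpret a stored integer as a real value (for display only)."""
    return q / SCALE

def fx_add(a: int, b: int) -> int:
    # Same scale on both operands, so addition is a plain integer add.
    return a + b

def fx_mul(a: int, b: int) -> int:
    # The raw product carries 2*FRAC_BITS fractional bits; shift to rescale.
    return (a * b) >> FRAC_BITS

b, c, d = to_fixed(1.5), to_fixed(2.25), to_fixed(0.5)
a = fx_add(b, fx_mul(c, d))    # a = b + (c*d)
print(a, to_float(a))
```

The only "fixed point" parts are the scaling on the way in, the shift after the multiply, and the scaling on the way out. Everything in between is the integer adder and multiplier the FPGA already has.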

The "/" function can be used perfectly happily from any library in the FPGA, but it won't be pipelined, so it will have a really slow fmax. You'll need to generate an lpm_divide megafunction to get any decent speed out of your design.
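For intuition on why an unpipelined divide is slow, here is a Python model of textbook restoring (shift-and-subtract) division. This is a generic algorithm shown under my own assumptions; it is not a description of what Quartus infers for "/" or of how lpm_divide is built internally. The point is that each quotient bit depends on the remainder left by the previous step, so an N-bit divide is N compare-subtract steps in a chain.

```python
def restoring_divide(dividend: int, divisor: int, width: int):
    """Unsigned restoring division, producing one quotient bit per step."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")
    remainder, quotient = 0, 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Trial subtraction: keep it only if the result is non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

print(restoring_divide(99, 7, width=7))
```

Done combinationally, all those steps sit in one long path between registers, which limits fmax. Putting a register after each step (or each few steps) gives a pipelined divider with higher fmax at the cost of latency.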

PS. in VHDL an integer type has no binary representation directly - it needs to be converted to some binary type (but it will synthesise just fine).

i.e.

signal int : integer := 16;

You cannot access its individual bits. But you can convert it to an unsigned type:

signal my_uns : unsigned(6 downto 0);

my_uns <= to_unsigned(int, my_uns'length); -- signals are assigned with <=, not :=

some_bit <= my_uns(3); -- bit 3 of the my_uns signal

But just so you understand that an integer is just a number, you can also assign it from base 16, base 2, etc.:

int <= 16#ABCDE#; -- base 16 literal

int <= 2#110100101001#; -- base 2 literal

Altera_Forum · ‎12-30-2015

--- Quote Start ---

This is the equation, further simplified. The exponents are only squares; there is no need for fractional exponents. About ranges: the maximum integer part should be 99 (2 digits) and the fractional part .999999 (6 digits). There are no negative numbers because the subtraction is an absolute value.

I will consider the input and output to be binary so I can work more easily. I have been reading about fractional operations in VHDL, and it seems there is already a library for this. Do you know about it?

--- Quote End ---

The sign causes no trouble in this equation, because the exponents are all even numbers. It can simply be rewritten with the numerator and denominator raised to the 4th power, which is again even, so there is still no trouble with the sign.

There is still trouble with the denominator of the summation, though: division by zero is not prevented.

Assuming this is one module, the i and k parameters are constant across the computation, so they are of no interest in the formula.

The calculation is not optimized from a VHDL perspective, so you have to apply strength reduction before synthesizing.

But now, how large is the image, and how do uki, xi, vk and vl interact with each other? And where do the inputs come from and where do the outputs go?

Hint: start by writing a program in a computer language to get familiar with fractional numbers and with scaling fractions so that you can use integers. Then, as a first step, manipulate integer numbers on the FPGA. I think you will get a better result than by trying to build an impossible project from scratch.
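As a starting point for that hint, here is a small integer-only Python model. Big caveat: the equation itself is only in the attachment, so this ASSUMES it is the usual fuzzy c-means membership with squared distances, uki = 1 / sum over l of ((xi - vk)^2 / (xi - vl)^2). The 12 fractional bits, the function names and the zero-distance handling are also my own choices, not something taken from the posted equation.

```python
FRAC_BITS = 12                 # assumed fixed-point format
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    return round(x * SCALE)

def membership(x: int, centroids, k: int) -> int:
    """Membership of pixel x in cluster k; all values are scaled integers."""
    d_k_sq = (x - centroids[k]) ** 2          # squaring removes the sign
    if d_k_sq == 0:
        return SCALE                          # pixel sits on centroid k: membership 1.0
    total = 0
    for v in centroids:
        d_l_sq = (x - v) ** 2
        if d_l_sq == 0:
            return 0                          # pixel sits on another centroid
        # Ratio of two equally scaled values, kept with FRAC_BITS fractional bits.
        total += (d_k_sq << FRAC_BITS) // d_l_sq
    # total always includes the l == k term (exactly SCALE), so it is never zero.
    return (SCALE * SCALE) // total           # 1 / total, FRAC_BITS fractional bits

centroids = [to_fixed(20), to_fixed(50), to_fixed(80)]
x = to_fixed(40)
u = [membership(x, centroids, k) for k in range(3)]
print(u, [q / SCALE for q in u])
```

The two early returns are the division-by-zero guard mentioned above. Note that the three memberships add up to slightly less than SCALE because every integer division truncates; that is the kind of precision effect worth measuring in software before choosing bit widths for the hardware.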

Altera_Forum · ‎12-31-2015

I wanted to ask you guys one more thing about the Nios II soft core: would it be easier to use this CPU for my purpose by programming custom logic? In that case I'd use C, right? So in the end it would be easier?

rromano001, yes, I will start by writing a program as you said. Regarding your questions:

How large is the image?

There is no actual image. I want to use a list of numbers, where every (binary) number indicates a pixel intensity. I'm reducing everything to clustering a list of numbers, say 1000 numbers, to be grouped into 3 clusters by calculating their membership values and the centroids.

This is managed by loading only 1 number into each PE to evaluate the equation above. It will produce the first Uki and Vk (centroid); by communicating with the other cloned PEs it will update Vk and calculate Uki again, and so on.

So the difficult part here is to make the PE perform subtraction, addition, multiplication and division of fractional numbers.

How do uki, xi, vk and vl interact with each other? As in the equation.

Where do the inputs come from and where do the outputs go?

I think I will use the memory of the development board to load a table with the 1000 numbers to be distributed to the PEs. The output goes to the neighboring PEs for the update, and when the clustering finishes, it will load the results into memory, I guess.