
How to implement a System like this easily using Altera Tools of my DE1-SOC?

Altera_Forum

Hi, 

 

I want to implement a system as follows: four cores (processing elements) read a black-and-white image of 240x240 pixels = 57,600 pixels in total (each pixel has an integer intensity value from 0 to 255).

 

Each core handles 14,400 of the pixels. It reads one pixel at a time, evaluates an equation and updates a register, then reads the next pixel and does the same until it has finished its 14,400 pixels.

 

Finally, each of the four cores sums its result with the results of the other three and evaluates a different equation, producing a value "V". This value "V" is used as input, along with the (static and constant) pixel values, to run the whole process again, until "V" satisfies a condition (less than a predefined value).
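
[Editor's sketch] The control flow described above can be modelled in software before any Verilog is written. The following Python is only a behavioural sketch: the per-pixel equation and the combining equation are not given in this post, so `placeholder_per_pixel` and `placeholder_combine` are made-up stand-ins, and all function names are hypothetical.

```python
WIDTH, HEIGHT, NUM_PES = 240, 240, 4
PIXELS_PER_PE = WIDTH * HEIGHT // NUM_PES  # 57,600 / 4 = 14,400


def run_pe(pixels, v, per_pixel):
    """One processing element: read one pixel at a time, update an accumulator."""
    acc = 0.0
    for p in pixels:
        acc += per_pixel(p, v)
    return acc


def iterate(image, v0, per_pixel, combine, threshold, max_iters=100):
    """Split the image over NUM_PES elements and repeat until V < threshold."""
    chunk = len(image) // NUM_PES
    chunks = [image[i * chunk:(i + 1) * chunk] for i in range(NUM_PES)]
    v = v0
    for _ in range(max_iters):
        # In hardware the four PEs would run in parallel; here they run in turn.
        partials = [run_pe(c, v, per_pixel) for c in chunks]
        v = combine(sum(partials))
        if v < threshold:
            break
    return v


# Placeholder equations (assumptions, NOT the poster's algorithm).
def placeholder_per_pixel(p, v):
    return p * v


def placeholder_combine(total):
    return total / (2 * 255 * WIDTH * HEIGHT)
```

A model like this gives reference values to compare against the HDL simulation once the real equations are substituted.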

 

I have more or less an idea of how to implement everything in Verilog (state machines, ROM, task instantiation). However, I'd like to take advantage of the resources of my DE1-SoC board for this project, especially the software tools (Qsys? Nios II?).

 

For example, I would like NOT to rely on my own code to read the picture or to convert it to integers, so I will probably have to use ROM on the board. My intention is to use the board and the Altera tools as much as possible, for ease and to save time. Maybe I could code the cores and implement them in Nios II?

 

There are no speed or power requirements, as long as the Altera tools can handle those automatically.

 

I'd like to know your advice and general guidance. 

 

Thank you
Altera_Forum

Do you want to write software or HDL? 

 

If you want to write HDL, then create a new Qsys component IP block and instantiate it four times. Give it however many Avalon-ST ports necessary to make your connections. Use e.g. Modular SGDMA to read the frame data out of RAM and supply it to the components, and have a NIOS with software to control the flow and manage the DMA. 

 

If you want to write software, something as simple as a large shared on-chip RAM with four Nios II processors connected directly to it would work. Qsys would automatically take care of arbitration/contention; that costs some speed, but since you don't have a speed requirement this is fine.
Altera_Forum

 


 

 

Thank you. By writing software or HDL, do you mean whether to use C or Verilog? I'd like to use Verilog HDL as much as possible, so I guess the first option is the right one.

 

Excuse my inexperience: when you say to create the Qsys component IP, would this depend on specific hardware, i.e. on the Nios II? I'd like to run the cores on the FPGA fabric itself, not on the Nios processor; the board resources would only play an "auxiliary" role (reading and supplying the image from RAM, etc.).
Altera_Forum

You sound like you are still very confused.

If you are writing your own HDL, then you probably won't need a Nios. To generate a Nios you need to generate it in Qsys (Qsys is the Quartus system builder application). Without Qsys, you just write your HDL and away you go.

 

From your description, it sounds like having 4 processing elements would be a waste of resources. With such a small image, you could easily read the whole image in via a single stream and process it with a single logic block. But you haven't really given us the algorithm you are using.

 

There are also many questions that need to be answered before you continue:

Why would you only process 1 pixel at a time?

Why can't you just stream in the picture?

Are you processing a single image or video?

Where does the image come from and how does it get loaded?

What external interfaces are you going to use?

What clock speed are you going to use to get the data throughput you need?

 

Without specifying data rates and interfaces, it's pointless trying to design the system as they can affect the entire design.
Altera_Forum

Hi tricky, the reason I want to use 4 cores is to experiment later with different pixel/PE ratios. I am assuming that if I divide the picture over 4 cores evaluating the same equation, the system will finish much sooner, although it uses more hardware. But it is true that one PE would be enough.

 

1. The reason I want to process 1 pixel per core at a time (per clock?) is that this is how I have visualized the algorithm: I need to calculate the membership value of every pixel to update a centroid equation.

 

2. Why can't you just stream in the picture? I'm not sure what streaming in the picture means. I still have to figure this out. But the easier the better!

 

3. Are you processing a single image or video? It is just a single image: static, constant, black and white, 240x240.

 

4. Where does the image come from and how does it get loaded? The image can come from any source, such as a computer connected to the board or a USB device. I was guessing there is a way to load the image from my PC into the RAM of the DE1-SoC, filling the memory with the binary representation of the intensity of each pixel.

 

5. What external interfaces are you going to use? The external interface can be the USB port of the development board. The output should also be a memory filled with the final calculated membership values and the centroids; this information is sent back to the PC to construct a simple graph.

 

6. What clock speed are you going to use to get the data throughput you need? I'm not sure; I guess the default or the most common one, but it should be fast enough to finish in a couple of seconds at most.
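
[Editor's sketch] For point 4, one common way to fill an on-chip memory with pixel intensities is a Memory Initialization File (.mif), which Quartus can use to initialize ROM/RAM blocks at compile time. The sketch below only formats a list of 8-bit values that you already have; decoding the picture file into that list (for example with an imaging library on the PC) is assumed to happen elsewhere, and the function name `pixels_to_mif` is hypothetical.

```python
def pixels_to_mif(pixels, width_bits=8):
    """Format pixel intensities as the text of a Quartus .mif file."""
    lines = [
        f"WIDTH={width_bits};",
        f"DEPTH={len(pixels)};",
        "ADDRESS_RADIX=HEX;",
        "DATA_RADIX=HEX;",
        "CONTENT BEGIN",
    ]
    for addr, value in enumerate(pixels):
        # Reject values that do not fit in the memory word.
        if not 0 <= value < (1 << width_bits):
            raise ValueError(f"pixel {addr} out of range: {value}")
        lines.append(f"  {addr:X} : {value:02X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"
```

For the 240x240 image this would produce a file with DEPTH=57600, one address per pixel in row-major order (the ordering is a design choice, not something the format requires).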
Altera_Forum

As for the algorithm, I have this. 

 

http://postimg.org/image/p7kitagrj/ 

 

 

http://s11.postimg.org/p7kitagrj/algor.jpg (http://postimg.org/image/p7kitagrj/)

 

I'm just looking for the easiest way to do it.
Altera_Forum

I would suggest staying away from USB for now. It is a very complicated protocol and I don't think there are any free cores out there. I think you can get Ethernet for free, but RS-232 is the easiest to implement yourself. With it you can easily load an image into the on-board RAM.

 

Streaming means data is streamed into the design. This would usually mean delivering the pixels one at a time while your design keeps track of which pixel is being processed. This means you can build a pipeline, which is much more efficient than read/modify/write operations.
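
[Editor's sketch] As a software analogy for the streaming idea above: the source hands over one pixel at a time, the consumer keeps a counter to know which pixel it is looking at, and only running accumulators are stored, never the image itself. The names `pixel_stream` and `streaming_stats`, and the statistics chosen, are illustrative assumptions.

```python
def pixel_stream(image, width):
    """Deliver pixels one at a time, tracking position with a counter."""
    for index, value in enumerate(image):
        yield index // width, index % width, value


def streaming_stats(stream):
    """Consume a stream once, keeping only running accumulators."""
    count = 0
    total = 0
    brightest = (-1, 0, 0)  # (value, row, col)
    for row, col, value in stream:
        count += 1
        total += value
        if value > brightest[0]:
            brightest = (value, row, col)
    return count, total, brightest
```

In hardware the counter becomes a register incremented on each valid pixel, and each accumulator becomes a pipeline stage, so nothing has to be read back from memory and rewritten.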

 

As for clock speed, this is up to you to decide. There is no "standard". Clock speed is dictated by your interfaces or data rate requirements.
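
[Editor's sketch] A rough back-of-the-envelope check shows how the interface, rather than the logic, tends to set the timing here. The 115200 baud rate, the 8N1 framing (10 bits on the wire per byte) and the 50 MHz clock are assumptions chosen for illustration, not values given in this thread.

```python
def uart_transfer_seconds(num_bytes, baud, bits_per_byte=10):
    """Time to move num_bytes over a serial link (8N1 = 10 bits per byte)."""
    return num_bytes * bits_per_byte / baud


def pass_seconds(num_pixels, clock_hz, pixels_per_clock=1):
    """Time for one pass over the image at a given pixel rate."""
    return num_pixels / (clock_hz * pixels_per_clock)


NUM_PIXELS = 240 * 240  # 57,600 one-byte pixels

load_time = uart_transfer_seconds(NUM_PIXELS, 115200)  # 5.0 seconds
one_pass = pass_seconds(NUM_PIXELS, 50_000_000)        # about 1.15 milliseconds
```

Under these assumptions, loading the image over the serial link takes thousands of times longer than one processing pass at one pixel per clock, so a "couple of seconds" budget is mostly a question of how the image gets onto the board.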
Altera_Forum

It sounds like you are constructing a prototype and maybe this is your first time doing any FPGA development.  

 

In that case, I would suggest just doing as much as you can on the ARM Linux side (transferring the image, converting it if necessary, etc) and then write the image to the FPGA side. On the FPGA side, just implement a single Qsys IP component with an Avalon-MM Slave interface, and then inside there write your Verilog to receive the pixels one at a time and produce the results one at a time. 
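
[Editor's sketch] The software side of that arrangement could look roughly like the following. Everything about the register map is hypothetical (the offsets `PIXEL_REG`, `CONTROL_REG`, `RESULT_REG` and the meaning of the control bit are invented for illustration); the real layout is whatever you define in your own Avalon-MM slave. Here `regs` is any writable buffer; on the board it would be a memory mapping of the bridge region where Qsys placed the component, and the exact address depends on your Qsys address map.

```python
import struct

# Hypothetical register offsets inside the custom Avalon-MM slave.
PIXEL_REG = 0x00    # write: next pixel value
CONTROL_REG = 0x04  # write 1: clear accumulators and start (assumed meaning)
RESULT_REG = 0x08   # read: current result
REG_SPAN = 0x10     # size of the register window in bytes


def write_reg(regs, offset, value):
    """Store a 32-bit little-endian word at a register offset."""
    struct.pack_into("<I", regs, offset, value & 0xFFFFFFFF)


def read_reg(regs, offset):
    """Load a 32-bit little-endian word from a register offset."""
    return struct.unpack_from("<I", regs, offset)[0]


def feed_image(regs, pixels):
    """Start the component, push pixels one at a time, return the result."""
    write_reg(regs, CONTROL_REG, 1)
    for p in pixels:
        write_reg(regs, PIXEL_REG, p)
    return read_reg(regs, RESULT_REG)
```

On a desktop this can be exercised against a plain `bytearray(REG_SPAN)` to check the offsets. On real hardware, access width and ordering matter and this sketch does not guarantee them, so treat it as a starting point only.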

 

There are several appnotes and training videos you could/should run through to get an idea of what is involved. Here is one example: http://wl.altera.com/education/training/courses/osoc1000 

 

After you have the algorithm running OK, you'll likely find that software moving the image data is the bottleneck. Adding a DMA would be a small incremental change on top of that.
Altera_Forum

You can just pay for a solution; there are a lot of small companies out there that are willing to do it for you.

Altera_Forum

 


 

 

Not useful.
Altera_Forum

 

--- Quote Start ---  

It sounds like you are constructing a prototype and maybe this is your first time doing any FPGA development.  

 

In that case, I would suggest just doing as much as you can on the ARM Linux side (transferring the image, converting it if necessary, etc) and then write the image to the FPGA side. On the FPGA side, just implement a single Qsys IP component with an Avalon-MM Slave interface, and then inside there write your Verilog to receive the pixels one at a time and produce the results one at a time. 

 

There are several appnotes and training videos you could/should run through to get an idea of what is involved. Here is one example: http://wl.altera.com/education/training/courses/osoc1000 

 

After you have the algorithm running OK, you'll likely find that software moving the image data is the bottleneck. Adding a DMA would be a small incremental change on top of that. 

--- Quote End ---  

 

 

Thanks for this. I have indeed only been studying FPGAs for 3 months, and because I'm learning entirely by myself and from the internet, my concepts are a bit confused. Besides, Altera does not have a proper book that explains this, only a ton of scattered guides, which is a shame. I found books like this one hidden in a library: http://www.amazon.com/fpga-based-embedded-system-design-chinese/dp/7560634516/ref=sr_1_1?ie=utf8&qid=1454822983&sr=8-1&keywords=fpga+soc+altera which are very comprehensive for a beginner. Sadly it is written entirely in Chinese; I'd kill for an English version.