SMats22

Beginner

03-13-2020
01:52 PM

How to accelerate DDR memory access for an affine transformation

Hello,

I am trying to use an OpenCL kernel to apply an affine transformation to an input image and write the result as the output image.

My first approach walks through the output image coordinates sequentially, applies the inverse affine transformation to each one, and then reads the corresponding pixel of the input image from DDR memory. The resulting reads are effectively random.
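For reference, the per-pixel inverse mapping described above can be sketched in plain C. This is a host-side model, not the actual kernel: the function name `warp_inverse_nn`, the coefficient layout `m = {a, b, c, d, e, f}`, the 8-bit single-channel pixels, the nearest-neighbour sampling, and the `fill` value for out-of-range pixels are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Round to the nearest integer, halves away from zero (no libm needed). */
int nearest(float t) { return (int)(t >= 0.0f ? t + 0.5f : t - 0.5f); }

/* Inverse-mapped warp with nearest-neighbour sampling.
 * m = {a, b, c, d, e, f} is the INVERSE affine map (hypothetical layout):
 *   u = a*x + b*y + c
 *   v = d*x + e*y + f
 * Output pixels (x, y) are visited in raster order, so the writes are
 * sequential. The reads from src are a gather whose addresses depend on
 * the transform, which is the "random" access pattern. */
void warp_inverse_nn(const uint8_t *src, int src_w, int src_h,
                     uint8_t *dst, int dst_w, int dst_h,
                     const float m[6], uint8_t fill)
{
    for (int y = 0; y < dst_h; ++y) {
        for (int x = 0; x < dst_w; ++x) {
            int u = nearest(m[0] * x + m[1] * y + m[2]);
            int v = nearest(m[3] * x + m[4] * y + m[5]);
            int inside = (u >= 0 && u < src_w && v >= 0 && v < src_h);
            /* Pixels that map outside the input get the fill value. */
            dst[(size_t)y * dst_w + x] =
                inside ? src[(size_t)v * src_w + u] : fill;
        }
    }
}
```

With an identity matrix the reads are as sequential as the writes; the more the transform rotates or shears, the further apart consecutive read addresses become.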

In this case, the offline compiler that generates the “aocx” file automatically creates a private cache in the FPGA, and this private cache seems to be used when the input image (DDR) is accessed randomly.

After trying several approaches, this implementation is the fastest I have so far, but I would like to make it much faster.

My new approach splits the output image into a mesh of rectangular areas and maps them through the inverse affine transformation. The corresponding part of the input image is then stored temporarily inside the FPGA (see the attached image).

The new approach is as follows:

1. Divide the output image into a mesh of rectangular areas.

2. Transform the center point of one rectangular area with the inverse affine transformation.

3. Use the transformed center point to determine which rectangular area of the input image corresponds to it.

4. Copy that rectangular area of the input image into a temporary input rectangular area.

5. Reserve a temporary rectangular area for the output image.

6. For each pixel of the temporary output rectangular area, apply the inverse affine transformation to its coordinates and read the corresponding pixel value from the temporary input rectangular area.

7. Repeat this for all pixels in the temporary output rectangular area.

8. Write the temporary output rectangular area into the corresponding area of the output image.

9. Repeat the same procedure for all rectangular areas of the output image.
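The steps above can be sketched as a plain C host-side model. It is only an illustration under several assumptions: `TILE`, `BUF_W`, `BUF_H`, `warp_tile`, and `warp_tiled` are hypothetical names and sizes, pixels are 8-bit single-channel, sampling is nearest-neighbour, and `m = {a, b, c, d, e, f}` is the inverse map `u = a*x + b*y + c`, `v = d*x + e*y + f`. One deliberate difference from step 3: instead of using only the center point, the sketch takes the bounding box of the four transformed corners, so that the temporary input area covers every pixel the rectangle can request.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TILE = 4, BUF_W = 16, BUF_H = 16 };  /* hypothetical sizes */

int tile_nearest(float t) { return (int)(t >= 0.0f ? t + 0.5f : t - 0.5f); }
int imin(int a, int b) { return a < b ? a : b; }
int imax(int a, int b) { return a > b ? a : b; }

/* Warp the output rectangle with top-left (tx, ty). Returns 0 if its
 * source footprint does not fit the temporary buffer, 1 otherwise. */
int warp_tile(const uint8_t *src, int src_w, int src_h,
              uint8_t *dst, int dst_w, int dst_h,
              const float m[6], int tx, int ty, uint8_t fill)
{
    int x1 = imin(tx + TILE, dst_w) - 1;  /* last column of this rectangle */
    int y1 = imin(ty + TILE, dst_h) - 1;  /* last row of this rectangle */
    int cx[4] = {tx, x1, tx, x1}, cy[4] = {ty, ty, y1, y1};
    int u0 = 0, u1 = 0, v0 = 0, v1 = 0;

    /* Steps 2-3 (modified): bounding box of the transformed corners. */
    for (int i = 0; i < 4; ++i) {
        int u = tile_nearest(m[0] * cx[i] + m[1] * cy[i] + m[2]);
        int v = tile_nearest(m[3] * cx[i] + m[4] * cy[i] + m[5]);
        if (i == 0) { u0 = u1 = u; v0 = v1 = v; }
        u0 = imin(u0, u); u1 = imax(u1, u);
        v0 = imin(v0, v); v1 = imax(v1, v);
    }
    /* Clip to the input image; an empty box means "fully outside". */
    u0 = imax(u0, 0); v0 = imax(v0, 0);
    u1 = imin(u1, src_w - 1); v1 = imin(v1, src_h - 1);
    int bw = u1 - u0 + 1, bh = v1 - v0 + 1;
    if (bw > BUF_W || bh > BUF_H) return 0;

    /* Step 4: copy the footprint row by row (contiguous reads). */
    uint8_t buf[BUF_H][BUF_W];
    if (bw > 0)
        for (int r = 0; r < bh; ++r)
            memcpy(buf[r], src + (size_t)(v0 + r) * src_w + u0, (size_t)bw);

    /* Steps 5-7: fill the temporary output area from the buffer. */
    uint8_t out[TILE][TILE];
    for (int y = ty; y <= y1; ++y)
        for (int x = tx; x <= x1; ++x) {
            int u = tile_nearest(m[0] * x + m[1] * y + m[2]);
            int v = tile_nearest(m[3] * x + m[4] * y + m[5]);
            int hit = (u >= u0 && u <= u1 && v >= v0 && v <= v1);
            out[y - ty][x - tx] = hit ? buf[v - v0][u - u0] : fill;
        }

    /* Step 8: write the temporary output area back, one row at a time. */
    for (int y = ty; y <= y1; ++y)
        memcpy(dst + (size_t)y * dst_w + tx, out[y - ty],
               (size_t)(x1 - tx + 1));
    return 1;
}

/* Step 9: repeat for every rectangle of the output image. */
int warp_tiled(const uint8_t *src, int src_w, int src_h,
               uint8_t *dst, int dst_w, int dst_h,
               const float m[6], uint8_t fill)
{
    for (int ty = 0; ty < dst_h; ty += TILE)
        for (int tx = 0; tx < dst_w; tx += TILE)
            if (!warp_tile(src, src_w, src_h, dst, dst_w, dst_h,
                           m, tx, ty, fill))
                return 0;
    return 1;
}
```

In this model each rectangle performs its copy-in, compute, and write-back phases strictly one after another. Whether an OpenCL kernel built this way overlaps those phases depends on how it is written and compiled, which the sketch does not attempt to show.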

I expected this method to be faster, because the per-pixel reads come from the temporary rectangular area, which can be filled with burst reads, rather than from the input image (DDR memory). Unfortunately, it turns out to be slower than the private cache method.
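One factor that may be worth checking, offered as an assumption rather than a diagnosis, is how much input data each rectangle has to buffer. If the temporary input area is an axis-aligned box around the transformed rectangle, its size grows with rotation and shear. The hypothetical helper below estimates the number of buffered input pixels per output pixel for a `w` by `h` rectangle, using the same assumed coefficient layout `m = {a, b, c, d, e, f}` with `u = a*x + b*y + c`, `v = d*x + e*y + f`.

```c
/* Absolute value without pulling in libm. */
float abs_f(float t) { return t < 0.0f ? -t : t; }

/* Approximate ratio of buffered input pixels to output pixels for one
 * w-by-h output rectangle, when the input footprint is taken as the
 * axis-aligned bounding box of the transformed rectangle.
 * Rounding at the rectangle edges is ignored, so the estimate is most
 * accurate for large rectangles. */
float buffered_per_output_pixel(const float m[6], float w, float h)
{
    float box_w = abs_f(m[0]) * w + abs_f(m[1]) * h;  /* extent in u */
    float box_h = abs_f(m[3]) * w + abs_f(m[4]) * h;  /* extent in v */
    return (box_w * box_h) / (w * h);
}
```

For an identity transform the ratio is 1. For a pure 45 degree rotation of a square rectangle it is (0.7071 + 0.7071)^2, which is about 2, so roughly twice as many input pixels are copied as output pixels are produced.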

Does anyone have ideas for accelerating the affine transformation?

Thanks,

0 Replies
