Hi all,
I'm trying to accelerate a 2D visual tracker using ARBB. My application needs to evaluate multiple regions of interest (ROIs) in each frame. These ROIs are independent, so they can be evaluated in parallel.
To do that, I developed a simple program using ARBB. I created a map function that extracts each ROI using the ARBB section function. Then, a sum reduction is applied to compute its weight. The code:
[cpp]void evaluateROIMap (const dense...[/cpp]
where ROIX and ROIY represent the upper-left corner of each ROI, and ROIWidth and ROIHeight represent the ROI size.
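For reference, the per-ROI computation described above (extract a window, then sum it) can be sketched in plain serial C++. This is only an illustration, not the original ARBB listing: the `ROI` struct, `roiWeight`, and `evaluateROIsSerial` are hypothetical names, and the frame is assumed to be a row-major array of floats with every ROI fully inside it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical ROI descriptor; x and y play the role of ROIX and ROIY.
struct ROI {
    std::size_t x;  // column of the upper-left corner
    std::size_t y;  // row of the upper-left corner
};

// Sum all pixels in a roiWidth x roiHeight window of a row-major frame.
// Assumes the window lies entirely inside the frame.
float roiWeight(const std::vector<float>& frame, std::size_t frameWidth,
                const ROI& roi, std::size_t roiWidth, std::size_t roiHeight) {
    float sum = 0.0f;
    for (std::size_t row = 0; row < roiHeight; ++row) {
        // Pointer to the first pixel of this ROI row.
        const float* line = frame.data() + (roi.y + row) * frameWidth + roi.x;
        for (std::size_t col = 0; col < roiWidth; ++col) {
            sum += line[col];
        }
    }
    return sum;
}

// Evaluate every ROI one after another (a serial baseline).
std::vector<float> evaluateROIsSerial(const std::vector<float>& frame,
                                      std::size_t frameWidth,
                                      const std::vector<ROI>& rois,
                                      std::size_t roiWidth,
                                      std::size_t roiHeight) {
    std::vector<float> weights;
    weights.reserve(rois.size());
    for (const ROI& roi : rois) {
        weights.push_back(roiWeight(frame, frameWidth, roi, roiWidth, roiHeight));
    }
    return weights;
}
```

With the configuration below (4096 ROIs of 32x32), this touches about 4 million pixels per frame, which helps put the serial timing in context.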
I'm running this code on an Intel Core i7-2620 CPU at 2.70 GHz (Windows 7, 64-bit), but the ARBB version takes more time than the non-optimized serial version. The launch configuration is:
- Number of ROIs: 4096
- ROIWidth: 32
- ROIHeight: 32
- Frame size: 512x512
Serial version elapsed time after 50 executions: 0.0048 seconds
ARBB version elapsed time after 50 executions: 3.32 seconds
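A measurement like the ones above could be taken with a small harness such as the following. This is a sketch under assumptions: it treats the reported figures as the total time over all 50 executions, and `elapsedSeconds` is a hypothetical helper, not part of ARBB.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: call `work` `runs` times in a row and return the
// total elapsed wall-clock time in seconds.
double elapsedSeconds(const std::function<void()>& work, int runs) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (int i = 0; i < runs; ++i) {
        work();
    }
    const std::chrono::duration<double> total = clock::now() - start;
    return total.count();
}
```

Timing the first execution separately from the remaining ones would show whether the cost is a one-time setup or is paid on every call.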
Am I doing something wrong? Any help would be appreciated.
Cheers.
Also, the gap between the serial version (0.0048 seconds) and the parallel version (3.32 seconds) is large: a ratio of about 691:1.
Running VTune may indicate where your program is spending its time.
I suspect your code is
instantiating thread teams 4096/8 times (each ROI)
as opposed to
instantiating a thread team once, with each thread examining 4096/8 (512) separate ROIs.
IOW, excessive thread team build/takedown.
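The "one team, many ROIs per thread" pattern can be sketched in standard C++ as follows. This uses `std::thread` rather than ARBB, so it only illustrates the idea, not how ARBB schedules work internally; `forEachChunk` and `evaluateROIsOneTeam` are hypothetical names, and the frame is assumed to be row-major floats with every ROI fully inside it.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: create the worker threads once and hand each one a
// contiguous block of item indices [begin, end).
void forEachChunk(std::size_t itemCount, std::size_t numThreads,
                  const std::function<void(std::size_t, std::size_t)>& body) {
    numThreads = std::max<std::size_t>(1, std::min(numThreads, itemCount));
    const std::size_t chunk = (itemCount + numThreads - 1) / numThreads;
    std::vector<std::thread> team;
    team.reserve(numThreads);
    for (std::size_t t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(itemCount, t * chunk);
        const std::size_t end = std::min(itemCount, begin + chunk);
        team.emplace_back([&body, begin, end] { body(begin, end); });
    }
    for (std::thread& worker : team) {
        worker.join();  // the team is torn down exactly once
    }
}

// Sum each ROI window; roiX/roiY hold the upper-left corners (ROIX, ROIY).
std::vector<float> evaluateROIsOneTeam(const std::vector<float>& frame,
                                       std::size_t frameWidth,
                                       const std::vector<std::size_t>& roiX,
                                       const std::vector<std::size_t>& roiY,
                                       std::size_t roiWidth,
                                       std::size_t roiHeight,
                                       std::size_t numThreads) {
    std::vector<float> weights(roiX.size(), 0.0f);
    forEachChunk(roiX.size(), numThreads,
                 [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            float sum = 0.0f;
            for (std::size_t row = 0; row < roiHeight; ++row) {
                for (std::size_t col = 0; col < roiWidth; ++col) {
                    sum += frame[(roiY[i] + row) * frameWidth + roiX[i] + col];
                }
            }
            weights[i] = sum;  // each thread writes only its own slots
        }
    });
    return weights;
}
```

With 4096 ROIs and 8 threads, each worker receives 512 ROIs, and thread creation and teardown happen once per call instead of once per ROI.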
Jim Dempsey
