Who else here has an interest and experience in that area?
Rdwells, any pressing threading questions you'd like to ask? Experiences to share?
--D
rdwells wrote:
"I've been working with multithreaded software for about 5 years now (so I'm a relative newcomer around here, it seems!), mainly to do pipelined image processing."
Hello, I hope you (or someone similar) are still around.
I'm prepping to create a 2D image processing DLL which uses classic erosion/dilation algorithms, BUT on several million pictures in a kind of batch mode (64-bit with 32-bit compatibility).
I'm somewhat new to the Intel toolset, so first I need advice on which tools I really need. I've already ordered the compiler, IPP and VTune; I suspect I need the MKL and Thread Checker. Your thoughts?
Also, I could really benefit from any advice, white papers, sample code etc. that deal with things like erosion, dilation and related functions.
Thanks in advance for any advice!
--Rob.
The questions might better be dealt with on the Visual Computing forum.
MKL, TBB, and IPP are included in the Intel Professional C++ compiler.
VTune is an excellent tool for performance tuning, particularly for batch mode runs taking upwards of several seconds. Even without VTune, the OpenMP profiling library is useful for analysis of OpenMP, /Qparallel, and MKL threaded regions (I don't know about IPP).
Parallel Studio Inspector has been advocated pending a new product capable of running on Windows 7 as a successor to Thread Checker. You might try evaluations when you are ready. You don't need them unless you thread explicitly, by OpenMP or thread library calls.
Rob,
You might wish to read a blog of mine here: http://www.drdobbs.com/go-parallel/ titled "Two Variations on Parallel Pipelines".
This article illustrates the benefit of using a well-crafted parallel_pipeline for image processing. You can email me if you have questions not pertinent to this forum.
Jim Dempsey
Greetings gents,
I was going to create another thread, but this sounded similar to what I wanted to do.
I have JPEG images on disk that are around 1.9 MB each. At the moment I am reading them in sequence, one by one, then doing some image processing from image to image.
Would anyone please suggest an efficient way to read those images from disk? I have identified that as the slowest part of the application.
I was thinking of using TBB to thread the read, but some people mentioned that might not help.
Is there anything in IPP that would be useful? Or should I do something like image slicing, perhaps, to improve on this?
Sincerely
Dan
Erosion/dilation on millions of pictures... why not create a large canvas (background) and build a collage: giant pictures, each containing a batch of these pictures? This is especially easy if they all have the same dimensions; if not, it is easy to make them so by adding pixels. The problem is then well suited to the GPU. Some problems are more suitable for the CPU, but I believe this one can be dealt with more efficiently on the GPU.
Kind Regards,
Alexander Agathos.
Perhaps you are referring to a movie; most probably it is a movie that produces so many pictures, in which case the frames do indeed have the same dimensions. I do not know the application you have in mind, but I think you could build a collage of the frames into a giant frame on the CPU in parallel, feed it to the GPU, then return the output and let the CPU dismantle the giant frame in parallel again. That way you can achieve real-time results, with some overhead for building and dismantling the collage. This works because the SE (structuring element) operates on a local area and the operation needs only the original pixel values, which makes the work on each pixel completely independent. So the threads on the GPU can blissfully work on each pixel without caring what the other threads are doing.
All the best in your project.
Kind Regards,
Alexander Agathos.
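For readers following along, here is a minimal CPU-side sketch of why the per-pixel work is independent. It assumes an 8-bit grayscale image and a fixed 3x3 square structuring element; the `Image` struct and the function names are made up for illustration and are not IPP or GPU code. Every output pixel is computed only from the original input pixels and written to a separate buffer, so the pixels could be processed in any order or by different threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: 8-bit grayscale image, row-major in a flat vector.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;  // width * height values
};

// 3x3 square structuring element. pickMax = true gives dilation,
// pickMax = false gives erosion. Neighbours outside the image are ignored.
Image morph3x3(const Image& src, bool pickMax) {
    assert(src.pixels.size() == src.width * src.height);
    // Separate output buffer: the input is only ever read.
    Image dst{src.width, src.height,
              std::vector<std::uint8_t>(src.pixels.size())};
    for (std::size_t y = 0; y < src.height; ++y) {
        for (std::size_t x = 0; x < src.width; ++x) {
            const std::size_t y0 = (y == 0) ? 0 : y - 1;
            const std::size_t y1 = std::min(y + 1, src.height - 1);
            const std::size_t x0 = (x == 0) ? 0 : x - 1;
            const std::size_t x1 = std::min(x + 1, src.width - 1);
            std::uint8_t best = src.pixels[y * src.width + x];
            for (std::size_t ny = y0; ny <= y1; ++ny) {
                for (std::size_t nx = x0; nx <= x1; ++nx) {
                    const std::uint8_t v = src.pixels[ny * src.width + nx];
                    best = pickMax ? std::max(best, v) : std::min(best, v);
                }
            }
            dst.pixels[y * src.width + x] = best;
        }
    }
    return dst;
}

Image erode3x3(const Image& src) { return morph3x3(src, false); }
Image dilate3x3(const Image& src) { return morph3x3(src, true); }
```

Because `src` is never modified, the two loops over `y` and `x` can be split across threads without any locking.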
Maybe you want to take a look at my multicore framework "Fiber Pool" (http://www.thinkmeta.de/en/fiberpool_overview.html).
It has a special File I/O Scheduler which uses a technique called "Parallel File Processing" for maximum CPU performance.
Great framework. I also plan to fully implement this collage idea and present it. This framework can come in very handy.
Cheers,
Alexander.
Best,
Alexander.
Threading and concurrency issues are in the next phase of this project's development. I'm now focusing on algorithms and the like.
I will read up on Rdwells as I'm absorbing all the information I can get.
All the best!
Rob.
Appreciate the tips on the new evals, I'll certainly be trying those out!
--Rob.
I didn't know Dr. Dobb's had this info just for parallel programming. Many thanks for the info!
--Rob.
In my case, I must perform mathematical morphology (dilations/erosions etc.) on several thousand images per minute (read image from disk, process it, save to disk).
First I acquired a good machine with dual 10K rpm SATA drives, 16 GB of RAM, and a quad-core processor.
Now I'm developing the best image algorithms I can using IPP/MKL; once I'm confident I have the better algorithms, I'll build a threading/concurrency model to squeeze out as much CPU utilization as possible.
Tim and Alex have some good thoughts and I'm sure I'll implement some variation of these.
IPP has some dilation and MM type functions and I'm just now looking at integrating these.
I plan to share my experiences of this project. I'm going to jump into OpenMP/CnC and do some experiments there also. TBB may be all I need, but only experimentation will tell.
All the best!
--Rob.
When the image files are not interrelated (they can be processed independently), it is recommended to implement coarse-grained parallelization by moving the parallelization to the outermost layer (file by file). For these situations a parallel pipeline works exceptionally well.
If this application is a production system that performs this task many times, I suggest you determine whether the bottleneck is due to I/O, processing, or both. I/O can be resolved by using more disks (6) and RAID 10.
If the performance issue is determined to be processing (I assume it is a blend of I/O and processing), then maybe your motherboard could accommodate a higher-performing processor.
Jim Dempsey
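As a rough illustration of the file-by-file (outermost layer) approach, here is a minimal sketch using only standard C++ threads. It is not TBB's parallel_pipeline, just a simpler stand-in: worker threads claim the next unprocessed file from a shared atomic counter. The function name `processFilesInParallel` and the `work` callback are assumptions made for this example; the callback would wrap your existing serial read/process/write code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Runs `work` exactly once per file, sharing the list among `workerCount`
// threads. Each thread claims the next unprocessed index from an atomic
// counter, so a free core always takes the next waiting file.
void processFilesInParallel(
    const std::vector<std::string>& files,
    unsigned workerCount,
    const std::function<void(std::size_t, const std::string&)>& work) {
    assert(workerCount > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < workerCount; ++t) {
        workers.emplace_back([&] {
            for (;;) {
                const std::size_t i = next.fetch_add(1);
                if (i >= files.size()) break;  // nothing left to claim
                work(i, files[i]);             // serial per-file code
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

A call might look like `processFilesInParallel(files, std::thread::hardware_concurrency(), work)`, keeping in mind that `hardware_concurrency()` may return 0 on some platforms. Whether this helps depends on the I/O versus processing balance discussed above: if the disks are the bottleneck, more threads alone will not fix it.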
My application must analyse still pictures from radar images. The "regular" cpp code I have is good, but not good enough. We need a throughput similar to a video processing app, which I think is attainable. I can see the assembly instructions being generated by the compiler are compact, not too much overhead at all.
So, bottom line is that this is going to require some experimentation on my part. Got the right tools, just need to figure out how best to use them.
One thought I had, which I think would be a good start, would be to actually USE the processing capability of my 4 core CPU. Like many commercial apps, it is using one core, at about 20-50% utilization; not so good.
So bottom line, I think using IPP and parallel (on Native code) is the best way to head. (?)
This is going to be very interesting.
Thanks again! --Rob.
Are frames n and n+1 being processed separately or compared with each other?
When processed separately, the parallelization can be coarse grained (the next available core processes the next waiting frame). Each frame is processed in parallel with the others, essentially using your serial code. Your serial code is mostly untouched, but some changes may be required to move static (global) state variables into separate dynamic (per-frame or per-thread) areas. Except for the start and end of the application, the coding changes for this type of application are quite easy to make.
When adjacent images are compared with each other (e.g. compressing, or computing the trajectory of an object in view), you have to determine how best to separate the functional tasks. One way might be
thread 0 working on differences between frame n and n+1
thread 1 working on differences between frame n+1 and n+2
...
As long as you are not writing annotations into the images, this too may require relatively little code change.
The above two instances would be considered coarse-grained parallelization.
Now then, when you spend a long time on each frame and frames are interrelated, you may need to focus on parallelizing the code that processes each frame. This will require careful analysis of your program in order to make the frame processing thread safe and efficient.
Jim Dempsey
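The adjacent-frame case above can be sketched as follows, again with plain standard C++ threads. The `Frame` alias, the sum-of-absolute-differences metric, and the function names are assumptions chosen for illustration, not anything from the original application. The key points are that the frames are only read (no annotations written into them) and each thread writes to its own slots of the result vector, so no locking is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <thread>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // hypothetical flat 8-bit frame

// Sum of absolute pixel differences between two equally sized frames.
std::uint64_t frameDifference(const Frame& a, const Frame& b) {
    assert(a.size() == b.size());
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += static_cast<std::uint64_t>(std::abs(int(a[i]) - int(b[i])));
    }
    return sum;
}

// result[n] holds the difference between frame n and frame n+1.
// Pairs are dealt out round-robin: thread t handles n = t, t + threadCount, ...
std::vector<std::uint64_t> adjacentDifferences(const std::vector<Frame>& frames,
                                               unsigned threadCount) {
    assert(threadCount > 0);
    const std::size_t pairs = frames.size() < 2 ? 0 : frames.size() - 1;
    std::vector<std::uint64_t> result(pairs, 0);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&, t] {
            for (std::size_t n = t; n < pairs; n += threadCount) {
                // Frames are read-only here; each n is written by one thread.
                result[n] = frameDifference(frames[n], frames[n + 1]);
            }
        });
    }
    for (auto& th : threads) th.join();
    return result;
}
```

Note that frame n+1 is read by two threads at once, which is safe only because nothing modifies the frames while the comparison runs.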