fast removal of duplicate IppiPoints

Kadir_Kirtac · ‎04-29-2011

Hi, I want to remove the duplicate IppiPoints from an array as fast as possible.

[bash]IppiPoint* P = new IppiPoint;[/bash]

The array size N is 1520 but approximately first 400 index of it is modified at each iteration. The total number of iterations is image width x image height.. Thus I need a fast solution..

Namely, any duplicate of point1 which satisfies

[bash](point1.x = point2.x) && (point1.y = point2.y)[/bash]

should be removed..

The trivial solution is to check for each point if it matches with the next one..but the running time complexity would then be O(N^2)..

Any idea is welcome..

Hollow_V_ · ‎03-11-2013

Hello,

Does anyone has a solution for this problem?

In general the question would be how to remove duplicates in an array, in a multi-threaded manner.

Thank you very much.

SergeyKostrov · ‎03-11-2013

>>...The trivial solution is to check for each point if it matches with the next one..but the running time complexity would >>then be O(N^2).. Any idea is welcome... In the solution you're considering a sorting step ( by x and y ) needs to be added. Also, if you really want to learn how to do it properly please take a look at how Relational Database Management Systems ( DBMSs ) do a GROUP BY of a data set. Note: Even if it is Not related to IPP please take a look how simple it could be with an SQL query: SELECT count(*), ptX, ptY FROM SomeTable GROUP BY ptX, ptY HAVING count(*)=1 ORDER BY ptX, ptY ASC and it will create a data set with All Duplicates removed Ordered by X and Y values in Ascending order.

SergeyKostrov · ‎03-11-2013

Do you use IPP library on a Linux or on Windows platform?

SergeyKostrov · ‎03-11-2013

>>...In general the question would be how to remove duplicates in an array... If the array is simple, that is a set of un-ordered values ( not pairs of un-ordered values like x and y ) then an algorithm based on a 1st part of the Pigeonhole sorting algorithm will remove all duplicates. A 2nd part ( reconstruction ) of the Pigeonhole sorting algorithm doesn't need to be used and it needs to be replaced with a compression. However, that algorithm is very sensitive to a range of values and ideally the range needs to be from 0 to 65536. Another limitation is that the algorithm can not be used with floating-point data types. So, take a look at the Pigeonhole sorting algorithm.

SergeyKostrov · ‎03-16-2013

>>...The trivial solution is to check for each point if it matches with the next one..but the running time complexity would >>then be O(N^2).. Any idea is welcome... STL's / Boost's map or MFC's CMapXxx classes could be used to detect and delete all duplicates in a data set. Here is an example with MFC's class CMapStringToString ( however, it is Not the fastest one because it is a string based ): ... CMapStringToString mS2S; ... if( mS2S.Lookup( csKey, csKeyValue ) == FALSE ) // Not in a Result Set { ... mS2S.SetAt( csKey, csKeyValue ); ... } else // Duplicate detected / Already in a Result Set { ... }