Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

String Functions and efficiency in SSE4.2


I'm a student and doing my College Project about pattern searching. Hence String Functions are useful to achieve the goal.

The point is that Find function just returns the first match position in a text, e.g:

Text: Surfing n Turfing
pos: 0123456789abcdef0
pattern to search: urfi

Find function would return position '1' as index, but actually there is a match in position 'b' too. I know that the most logical solution would be search again in the text from position '2', but maybe it would not be the most efficient solution.

As far as I know (*1), processors with SSE4.2 technology have 16-byte registers and they can work in parallel in the pattern matching process, as shown below:

pattern to search: urfi
<-- 16 bytes -->
Text: Surfing n Turfin
result: 0100000000010000

That is, Find function returns just first position but actually the processor knows all matched positions in 16-byte length.

So, I think it could be more efficient if there would be any function that would return all matched positions.

My questions are the next:
  1. Does anyone know if there is any function that works on that way?
  2. If not (and maybe is outside the scope of this forum), how could I create my own function to achieve that goal? Is there any Intel tool to create custom functions? Would either FASM or NASM be a good option?
0 Kudos
1 Reply
Valued Contributor II

Here is a description of an algorithm for a "Hash" like approach for finding all Sub-strings in a String:

For example,

Input String ( IS ) : "000123000123000123000"
Search Sub-string ( SS ): "123"

0. If len(SS) > len(IS) there are no Sub-strings and go to 11

1. Allocate an Array for Positions of 1st characters of all Found Sub-strings ( APFS )
( Estimated length is: ( len(IS) / len(SS) ) + some number of guard bytes )

2. Calculate a Hash for the Search Sub-string "123" ( HSS ): HSS = '1'+'2'+'3' = 49+50+51 = 150

3. Calculate a Hash for a Current Sub-string ( HCS ): HCS = '0'+'0'+'0' = 48+48+48 = 144

4. Compare HSS and HCS

5. If equal, save in APFS a position of the 1st character of the found Sub-string in the IS and advance for len(SS) characters

6. End of the Input String ( IS )? If Yes, go to step 9. If Not, continue processing

7. Calculate a Hash for a new Current Sub-string: HCS = HCS - < CS[0] > + < IS[next-char] > = 144-'0'+'1'= 144-48+49 = 145

8. Go to step 4

9. Do all the rest processing with found Sub-strings ( use APFS )

10. Release memory of APFS

11. Done


- This is almost a Classic or a Book like approach
- Could be implemented with SSE
- Could be done in parallel for very long input strings
- Since this is your College Project I can't provide the source codes for the algorithm, but
I could help you with a review, tune-ups, etc. You could switch to aprivate posting if you wish.

0 Kudos