Here is a description of an algorithm for a "Hash" like approach for finding all Sub-strings in a String:
For example,
Input String ( IS ) : "000123000123000123000"
Search Sub-string ( SS ): "123"
0. If len(SS) > len(IS) there are no Sub-strings and go to 11
1. Allocate an Array for Positions of 1st characters of all Found Sub-strings ( APFS )
( Estimated length is: ( len(IS) / len(SS) ) + some number of guard bytes )
2. Calculate a Hash for the Search Sub-string "123" ( HSS ): HSS = '1'+'2'+'3' = 49+50+51 = 150
3. Calculate a Hash for a Current Sub-string ( HCS ): HCS = '0'+'0'+'0' = 48+48+48 = 144
4. Compare HSS and HCS
5. If equal, save in APFS a position of the 1st character of the found Sub-string in the IS and advance for len(SS) characters
6. End of the Input String ( IS )? If Yes, go to step 9. If Not, continue processing
7. Calculate a Hash for a new Current Sub-string: HCS = HCS - < CS[0] > + < IS[next-char] > = 144-'0'+'1'= 144-48+49 = 145
8. Go to step 4
9. Do all the rest processing with found Sub-strings ( use APFS )
10. Release memory of APFS
11. Done
Notes:
- This is almost a Classic or a Book like approach
- Could be implemented with SSE
- Could be done in parallel for very long input strings
- Since this is your College Project I can't provide the source codes for the algorithm, but
I could help you with a review, tune-ups, etc. You could switch to aprivate posting if you wish.
For more complete information about compiler optimizations, see our Optimization Notice.