Andrei in the office

lordandrei

Andrei's Universe

One man's journey from infinity to nothingness


Previous Entry Share Next Entry
Andrei in the office
lordandrei

Algorithm note

I saw a post on one of my coding boards talking about a solution for an 'interview-style' problem.

The issue was to find the longest common substring.

What this means is: Given the word "abracadabra" you are looking for the longest repeated set of characters.

The longest repeated sub-string is "abra" which appears at positions 0 and 7 (counting from 0).

The solution the person presented involved putting the strings from smallest to largest into a dictionary (hash table) with counts of iterations.
a-4
b-2
ab-2

And then walking the tree to find the largest result. This solution seems odd to me.

My solution is to make a dynamic array that can be queried for contents. Further, doing it from longest to shortest:

Edit gwenix points out a string can't be repeated until it's no more than half the length. So start from longest string (len /2) rounded down.
End Edit

Solution:
Get longest string: abracadabra
Is string in array? YES? got answer and bail; No --> add to array
*Reduce look string by 1 character
Move to start of target string:
**Get look string: abracadabr
Is string in array? YES? got answer and bail; No --> add to array
Is look string at end of target string? Yes --> loop back to *; No --> move forward one character and loop back to **

This is massively not optimised for time or processing. But it's a first stab.
Also, just pseudo code... loop counters would be my next pass.

Comments? Improvements?

  • 1
Seems to me that a string can't be repeated until the halfway point, so why not reduce by half to start?

You're hired. Well, if was hiring. Good addition.

Re: Oooh, good point.

Hey, you and I did work well together before!

Re: Oooh, good point.

Agreed!

Maybe I don't understand, but ...

Why can't a string be repeated until the halfway point? How about a string that is repeated within the first half? Example: insinuate.

Re: Maybe I don't understand, but ...

This takes into account counting from the largest string to the shortest string.
So, going back to the example of "abracadabra"

We can't look for duplications of "abraca" which is 6 of 10 characters long because we'd need to move to the next position to compare it. However, there are only 4 characters left, so we won't find that string again.

As a result, we start from the largest duplicatable string, which in 10 characters would be 5.

Now theoretically if you had a string like "AAAAAA" and looked for "AAAAA" that does exist twice. So I'd have to throw that back to gwenix; if that means I should go back to my initial solution ;)

Edited at 2009-03-11 03:19 pm (UTC)

Re: Maybe I don't understand, but ...

Oh, good point. I obviously didn't think of overlap.

Re: Maybe I don't understand, but ...

I didn't mean that the repeat would be halfway, but without overlap, the longest string that can repeat can be no more than half the number of characters. "Insinuate" has a total of 9 characters, but the repeating string totals 2 characters -- less than half the number of characters.

But as pointed out later, overlap means that my contribution is shot.

  • 1
?

Log in

No account? Create an account