9 Replies Latest reply on Jun 16, 2010 8:16 PM by 843853

# String matching algorithm

Hi,
I was wondering if there was a string matching algorithm which I could use to do the following. I have a ArrayList of strings that I would like to be parsed. I would like to pass a string to the algorithm which would return either the single string that matches exactly the value passed in OR a list of strings that best matches it. I feel like I am asking a lot here...

Hope you can help,
Paul.
• ###### 1. Re: String matching algorithm
You could use the [Levenshtein algorithm|http://en.wikipedia.org/wiki/Levenshtein_distance] to rank the words. Easy to implement.
• ###### 2. Re: String matching algorithm
Hi,
Thanks for that. I did some searching on the net and found this which people might find interesting and useful. It also has the java implementation for it too.

http://en.wikipedia.org/wiki/Levenshtein_distance

Initially the alogrithm seemed perfect : ) Then I started to put in strange test string in(which are real user inputs) and it started to fail. My string can contain spaces as well.

Maybe if I split the strings up and recursively go through the combination, to end up with best fit would be idea...

Cheers,
Paul
• ###### 3. Re: String matching algorithm
paul_b wrote:
Initially the alogrithm seemed perfect : ) Then I started to put in strange test string in(which are real user inputs) and it started to fail.
Fail? No, that just means that your criteria for "close" are different than Levenshtein's criteria.
• ###### 4. Re: String matching algorithm
paul_b wrote:
Hi,
Thanks for that. I did some searching on the net and found this which people might find interesting and useful. It also has the java implementation for it too.

http://en.wikipedia.org/wiki/Levenshtein_distance
Err ... that is the link I gave you!

>
Initially the alogrithm seemed perfect : ) Then I started to put in strange test string in(which are real user inputs) and it started to fail.
Then you need to provide a metric.
My string can contain spaces as well.
So what?

>
Maybe if I split the strings up and recursively go through the combination, to end up with best fit would be idea...

Cheers,
Paul
• ###### 5. Re: String matching algorithm
Yes. You are correct. This should have been the link:

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
• ###### 6. Re: String matching algorithm
Hi,
Just coming back to this and wondering if there are any better algorithms that would be more suited to my needs?
• ###### 7. Re: String matching algorithm
paul_b wrote:
Hi,
Just coming back to this and wondering if there are any better algorithms that would be more suited to my needs?
That depends on what you mean by "are". If you are asking whether somebody has designed any algorithm which suits your undocumented needs in a better way than the Levenshtein algorithm, then probably not. On the other hand I don't know what your needs are so I could be wrong.

Of course you could always design your own algorithm, after which it would automatically be a better algorithm which would be more suited to your needs. However that assumes you have a precise specification of what your needs are. I suspect you just have a gut feeling about what's a good string match and what's a bad string match. And if that's the case you would have to try to document exactly what's a good match before you can start asking about which algorithm is better for you.

Looking at existing algorithms (e.g. Levenshtein) is a perfectly good way of clarifying your requirements, but you do have to sit down and say "No, Algorithm X says that's a good match but it isn't a good match BECAUSE..."
• ###### 8. This Thread is now moved
Note: This thread was originally posted in the [Collections: Lists, Sets, and Maps|http://forums.sun.com/forum.jspa?forumID=24] forum, but moved to this forum for closer topic alignment.
• ###### 9. Re: This Thread is now moved
could you not take a look at the boyer moore algorithm

http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

and adapt it so that when you create the tables you also asign a counter, meaning for every potential comparison, if you dont get a full match you can define the comparison a weighting, which you can then use for some sort of use for close but not perfect matches

just throwing an idea out there, it may be useless!