9 Replies Latest reply: Jun 16, 2010 3:16 PM by 843853 RSS

    String matching algorithm

    843853
      Hi,
      I was wondering if there was a string matching algorithm which I could use to do the following. I have a ArrayList of strings that I would like to be parsed. I would like to pass a string to the algorithm which would return either the single string that matches exactly the value passed in OR a list of strings that best matches it. I feel like I am asking a lot here...

      Hope you can help,
      Paul.
        • 1. Re: String matching algorithm
          843853
          You could use the [Levenshtein algorithm|http://en.wikipedia.org/wiki/Levenshtein_distance] to rank the words. Easy to implement.
          • 2. Re: String matching algorithm
            843853
            Hi,
            Thanks for that. I did some searching on the net and found this which people might find interesting and useful. It also has the java implementation for it too.

            http://en.wikipedia.org/wiki/Levenshtein_distance

            Initially the alogrithm seemed perfect : ) Then I started to put in strange test string in(which are real user inputs) and it started to fail. My string can contain spaces as well.

            Maybe if I split the strings up and recursively go through the combination, to end up with best fit would be idea...

            Cheers,
            Paul
            • 3. Re: String matching algorithm
              DrClap
              paul_b wrote:
              Initially the alogrithm seemed perfect : ) Then I started to put in strange test string in(which are real user inputs) and it started to fail.
              Fail? No, that just means that your criteria for "close" are different than Levenshtein's criteria.
              • 4. Re: String matching algorithm
                843853
                paul_b wrote:
                Hi,
                Thanks for that. I did some searching on the net and found this which people might find interesting and useful. It also has the java implementation for it too.

                http://en.wikipedia.org/wiki/Levenshtein_distance
                Err ... that is the link I gave you!

                >
                Initially the alogrithm seemed perfect : ) Then I started to put in strange test string in(which are real user inputs) and it started to fail.
                Then you need to provide a metric.
                My string can contain spaces as well.
                So what?

                >
                Maybe if I split the strings up and recursively go through the combination, to end up with best fit would be idea...

                Cheers,
                Paul
                • 5. Re: String matching algorithm
                  843853
                  Yes. You are correct. This should have been the link:

                  http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
                  • 6. Re: String matching algorithm
                    843853
                    Hi,
                    Just coming back to this and wondering if there are any better algorithms that would be more suited to my needs?
                    • 7. Re: String matching algorithm
                      DrClap
                      paul_b wrote:
                      Hi,
                      Just coming back to this and wondering if there are any better algorithms that would be more suited to my needs?
                      That depends on what you mean by "are". If you are asking whether somebody has designed any algorithm which suits your undocumented needs in a better way than the Levenshtein algorithm, then probably not. On the other hand I don't know what your needs are so I could be wrong.

                      Of course you could always design your own algorithm, after which it would automatically be a better algorithm which would be more suited to your needs. However that assumes you have a precise specification of what your needs are. I suspect you just have a gut feeling about what's a good string match and what's a bad string match. And if that's the case you would have to try to document exactly what's a good match before you can start asking about which algorithm is better for you.

                      Looking at existing algorithms (e.g. Levenshtein) is a perfectly good way of clarifying your requirements, but you do have to sit down and say "No, Algorithm X says that's a good match but it isn't a good match BECAUSE..."
                      • 8. This Thread is now moved
                        791266
                        Note: This thread was originally posted in the [Collections: Lists, Sets, and Maps|http://forums.sun.com/forum.jspa?forumID=24] forum, but moved to this forum for closer topic alignment.
                        • 9. Re: This Thread is now moved
                          843853
                          could you not take a look at the boyer moore algorithm

                          http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

                          and adapt it so that when you create the tables you also asign a counter, meaning for every potential comparison, if you dont get a full match you can define the comparison a weighting, which you can then use for some sort of use for close but not perfect matches

                          just throwing an idea out there, it may be useless!