1 2 Previous Next 16 Replies Latest reply: Dec 7, 2009 10:23 AM by 807580 RSS

    Implementing an efficent sort for a collection

    807580
      Hello eveyone.

      I have a list of rank strings i am trying to sort (of the form {NAME, RANK, INFO}), I have a class that implements the Comparator interface which i pass into a Collections.sort(myArrayList, myComparator) but its currently taking longer than it should. Ill describe the problem below and if anyone can suggest how i can make it better that would be great! :)

      I have a list of 'ranks' of the form nx, where x is any integer from 1-10 and x is a single alpha charecter, say a or b. The lowest rank is a 10a, moving up to 1a, then 1b with 10b being the strongest rank. I want to be able to sort this list in order of strongest to weakest using my custom comparator. I currently am using the below

      class StringArrayComparator implements Comparator<String[]>
           {
                //for sorting
                Pattern numPattern = Pattern.compile("(\\d+)");
                Matcher matcher;
      
                StringArrayComparator(int index)
                {
                     this.index = index;            
                }
      
                private int compareRanks(String left, String right){
                     if(left.equals(right)){
                          return 0;
                     }else if(left.contains("a") && (right.contains("b") ){
                          return -1;
                     }else if(right.contains("a") && (left.contains("b") ){
                          return 1;
                     }else{
                          //both the same a or b
      
                          matcher = numPattern.matcher(left);
                          matcher.find();
                          String leftStr = matcher.group(1);
                          int leftInt = Integer.parseInt(leftStr); 
      
                          matcher = numPattern.matcher(right);
                          matcher.find();
                          String rightStr = matcher.group(1);
                          int rightInt = Integer.parseInt(rightStr); 
      
                          if(left.contains("a")){
                               if(leftInt < rightInt){
                                    return 1;
                               }else if(leftInt > rightInt){
                                    return -1;
                               }else{
                                    return 0;
                               } 
                          }else{
                               if(leftInt < rightInt){
                                    return -1;
                               }else if(leftInt > rightInt){
                                    return 1;
                               }else{
                                    return 0;
                               }
                          }
      
                     }
                }
      
                @Override
                public int compare(String[] left, String[] right)
                {
                     
                     return compareRanks(left[RANK], right[RANK]);                 
                     
                }
                private final int index;
           }
      I can see that this methods creates thousands of objects (or a lot of GC work anyway) where my simple sorting by name method returns instantaniously. Any advice for improvement would be great!

      Dori
        • 1. Re: Implementing an efficent sort for a collection
          791266
          Why are you using regexp? I don't see a need of it.
          • 2. Re: Implementing an efficent sort for a collection
            791266
            This part does also look inefficient
            }else if(left.contains("a") && (right.contains("b") ){
            return -1;
            }else if(right.contains("a") && (left.contains("b") ){
            What does your data look like?
            • 3. Re: Implementing an efficent sort for a collection
              807580
              Hello,

              1) the data is of the form "10a" or "9b" (described in first post) where the number is between say 0-20 and the letter is a or b, sometimes an extra symbol is appended (say "10a&" which i ignore)

              2) I am using regex to extract the number portion, if the number was between 0-9 i would just take the first char but unfortunatly thats not the case. if there is a better or more efficent way then i would be very grateful to know! :)
              • 4. Re: Implementing an efficent sort for a collection
                791266
                Sir_Dori wrote:
                Hello,

                1) the data is of the form "10a" or "9b" (described in first post) where the number is between say 0-20 and the letter is a or b, sometimes an extra symbol is appended (say "10a&" which i ignore)
                So you know that either a or b will be the last or next to last character in the string? Is this always true?

                .. and you never mentioned that you could have e.g '&' at the end, so what you described in your initial post didn't describe the complete problem. Please give examples of what you data can look like, since we can't give a correct answer otherwise.
                2) I am using regex to extract the number portion, if the number was between 0-9 i would just take the first char but unfortunatly thats not the case. if there is a better or more efficent way then i would be very grateful to know! :)
                See above.
                • 5. Re: Implementing an efficent sort for a collection
                  807580
                  Apolagies for not describing the problem correctly.

                  This is always true that the alpha will always be the last or next to last char.

                  Dori
                  • 6. Re: Implementing an efficent sort for a collection
                    791266
                    Do something like this:
                    private static int compareRanks(String left, String right){
                        boolean leftIsA = isOfType(left, 'a');
                        boolean rightIsA = isOfType(right, 'a');
                        
                        //TODO:
                        //Your checks if left is of same type as b, and return if they aren't.
                        
                        //They were of same type
                        int leftValue = parseInt(left);
                        int rightValue = parseInt(right);
                        
                        //TODO:
                        //Check the values and return
                    }
                    
                    
                    public static boolean isOfType(String value, char type) {
                        int length = value.length() -1;
                        return value.charAt(length) == type || value.charAt(length-1) == type; 
                    }
                    
                    //Will only work if the number is in integer range,
                    public static int parseInt(String s) {
                        int result = 0;
                        boolean positive = true;
                        int startIndex = 0;
                        char firstChar = s.charAt(0);
                        if (firstChar == '-') {
                            positive = false;
                            startIndex++;
                        } else if (firstChar == '+') {
                            startIndex++;
                        }
                        for (int i=startIndex; i<s.length(); i++) {
                            int digit = s.charAt(i) - '0';
                            if (digit < 0 || digit > 9) {
                                break;
                            }
                            result *= 10;
                            result += digit;
                        }
                        return positive ? result : result * -1;
                    }
                    You do of course need to add some error handling, like adding checks for null and strings that are only 1 character long.

                    Kaj
                    • 7. Re: Implementing an efficent sort for a collection
                      807580
                      Excellent, thanks for that, ill put it in and compare how it does. I havent seen this done before
                      for (int i=startIndex; i<s.length(); i++) {
                              int digit = s.charAt(i) - '0';
                              if (digit < 0 || digit > 9) {
                                  break;
                              }
                              result *= 10;
                              result += digit;
                          }
                      but i understand what its doing.

                      Thanks for the code, will let you know if it improves the performance!
                      • 8. Re: Implementing an efficent sort for a collection
                        791266
                        Sir_Dori wrote:
                        Thanks for the code, will let you know if it improves the performance!
                        It should be less expensive.
                        • 9. Re: Implementing an efficent sort for a collection
                          807580
                          Yup, your right, a 100 times better, thanks a lot for the help. Are reg-ex's things to be avoided if speed is an issue?
                          • 10. Re: Implementing an efficent sort for a collection
                            807580
                            if the answer to the above is 'yes' then is there also a way i can improved performance on something like this
                            data.trim().split(" {1,}");
                            which im using to 'tokenize' a string of the form
                            "      word       word     num     symbol     etc"
                            in which the 'tokens' are seperated by x amount of whitespace chars, to give a String[] of the tokenised data.
                            ?

                            Thanks for the help!
                            • 11. Re: Implementing an efficent sort for a collection
                              791266
                              Sir_Dori wrote:
                              Yup, your right, a 100 times better, thanks a lot for the help. Are reg-ex's things to be avoided if speed is an issue?
                              No, regexp can be superfast and efficient for certain things, usually when you want to check or parse complex data. They might however be slow if you want to do something trivial like getting the digits from a string, where you already know what the string looks like.

                              Kaj
                              • 12. Re: Implementing an efficent sort for a collection
                                807580
                                BTW fastest way to sort a collection these days should include concurrency code from oswego.edu
                                • 13. Re: Implementing an efficent sort for a collection
                                  796440
                                  kajbj wrote:
                                  Sir_Dori wrote:
                                  Yup, your right, a 100 times better, thanks a lot for the help. Are reg-ex's things to be avoided if speed is an issue?
                                  No, regexp can be superfast and efficient for certain things, usually when you want to check or parse complex data. They might however be slow if you want to do something trivial like getting the digits from a string, where you already know what the string looks like.

                                  Kaj
                                  I'd bet that this sort can be plenty fast with a regex--especially a precompiled one.
                                  • 14. Re: Implementing an efficent sort for a collection
                                    791266
                                    jverd wrote:
                                    kajbj wrote:
                                    Sir_Dori wrote:
                                    Yup, your right, a 100 times better, thanks a lot for the help. Are reg-ex's things to be avoided if speed is an issue?
                                    No, regexp can be superfast and efficient for certain things, usually when you want to check or parse complex data. They might however be slow if you want to do something trivial like getting the digits from a string, where you already know what the string looks like.

                                    Kaj
                                    I'd bet that this sort can be plenty fast with a regex--especially a precompiled one.
                                    His regexp was precompiled. :)
                                    1 2 Previous Next