This discussion is archived
5 Replies Latest reply: Mar 30, 2012 4:24 PM by 796440 RSS

Generating all substrings of a string

892826 Newbie
Currently Being Moderated
To generate all substrings of a string the simplest thing which i came to my mind is traversing the entire string using two for loops which can generate all the substrings. Following is the code which i am using-
          ArrayList<String> list = new ArrayList<String>();
          String str = "abcd";
          int len = str.length();
          for (int i = 0; i < len; i++) {
               for (int j = 0; j < len - i; j++) {
                    String ss = str.substring(i, i + j + 1);
                    if (!list.contains(ss))
                         list.add(ss);
               }
          }
As far as, this seems ok for small strings but what if the input string is too large. Above approach takes o(n^2) time to complete. Google suggests me to use suffix tree in such cases. A suffix tree plays with all the suffix of a string (not substrings). So how can a suffix tree generate all substrings?
I just want to know any approach which can generate substrings in less than o(n^2) time.
  • 1. Re: Generating all substrings of a string
    892826 Newbie
    Currently Being Moderated
    no one? am i going wrong?
  • 2. Re: Generating all substrings of a string
    DrClap Expert
    Currently Being Moderated
    Well, if your requirement is to actually generate them all and add them to a list, then since the number of substrings is O(n^2) then the list.add(substring) method must be called O(n^2) times.

    But if you're allowed to reinterpret the requirement in such a way that "generating all substrings" doesn't involve generating things, or involves generating things other than substrings, then sure, you might be able to produce a faster algorithm, I suppose.
  • 3. Re: Generating all substrings of a string
    892826 Newbie
    Currently Being Moderated
    i need to know a substring which is in lexicographically i^th^ position. that is why, i am firstly adding all unique substring into arraylist, then performing lexicographical sorting but if i could get that substring without processsing two for loops then it will definitely saves my time and improves the program too.
  • 4. Re: Generating all substrings of a string
    796440 Guru
    Currently Being Moderated
    Ravi wrote:
    i need to know a substring which is in lexicographically i^th^ position. that is why, i am firstly adding all unique substring into arraylist, then performing lexicographical sorting but if i could get that substring without processsing two for loops then it will definitely saves my time and improves the program too.
    I'm not a math guy, so I may be way off here, but if nothing else, maybe this will get you thinking about some possible approaches:

    The lexicographically first substring will consist of exactly one character--the lexicographically first character in the string.

    The next one will be that character followed by one more character--the next character after that one, if there are any, otherwise it will be a single character--the lexi'ly second one in the string.

    and so on...

    So, start by building a list of indices in lexi' order of the chars. For instance if you have the string "blue", your list would be [0, 3, 1, 2], since the 0th character ('b') is the lexi'ly first, and so on.

    You keep building 1-char-longer substrings from the lexi'ly first char, until you either reach the i'th one or have hit the end of the string. Then move on to the next char, and so on
    b
    bl
    blu
    blue
    (next char is at index 3)
    e
    (no more chars, next is at 1)
    l
    lu
    I don't know if that's any less than O(n^2), and I can't be bothered to figure it out.

    As a further refinement, rather than actually generating each substring, you can compute them. Given [0, 3, 1, 2], we know there are str.len substrings starting at position 0, str.len - 3 starting at position 1 (so str.len + str.len - 3 so far), and so on. So if we're looking for i=6, then 4-0=1 < 6, so add 4-3=1 for a total of 5<6 so add 4-1=3 for a total of 8>6, so we know now exactly where to find lexi'ly 6th substring

    This assumes each char appears only once. If a char appears more than once, the logic gets somewhat more complicated, but not terribly so, I don't think.

    Personally, you probably should go investigate suffix trees a bit more. I don't know what they are or how/if they fit here, but it might be worth looking into.

    Edited by: jverd on Mar 30, 2012 6:45 PM
  • 5. Re: Generating all substrings of a string
    796440 Guru
    Currently Being Moderated
    Ravi wrote:
    i need to know a substring which is in lexicographically i^th^ position. that is why,
    In the future, please ask about your actual problem, and not about your attempted solution to it. If you have an idea for a solution in mind, it's fine to mention that too, but your path to your final goal will be more productive if you start by telling about the problem itself.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points