6 Replies. Latest reply: Nov 28, 2012 6:13 PM by jschellSomeoneStoleMyAlias

    Java Regex Pipe Delimited ?

    801904
      Hello

      I am trying to split a pipe-delimited string. I am new to regex and new to Java.

      My Java/regex line to split it is:

      listColumns = aLine.split("\\|"); // two backslash characters plus one pipe character

      My input string has 3 leading and 4 trailing pipe characters.

      My output from split (the 3 leading empty strings work, but the 4 trailing pipe delimiters don't):

      SplitStrings2:[]
      SplitStrings2:[]
      SplitStrings2:[]
      SplitStrings2:[col1]
      SplitStrings2:[col3]
      SplitStrings2:[col4]

      I get 3 empty strings for the 3 leading pipes, but no empty strings for any of the 4 trailing pipe characters.

      What do I need to change in the code so that repeated pipes always produce the same number of empty strings from the split method, at the end of the line as well as at the start?
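A minimal sketch reproducing the reported output, assuming the input line was `|||col1|col3|col4||||` (the exact line is not shown in the post, so this input is hypothetical). The one-argument `split(regex)` behaves like `split(regex, 0)`, and a limit of 0 discards trailing empty strings, which is why only the leading ones survive.

```java
// Hypothetical reconstruction of the input: 3 leading pipes, 3 values, 4 trailing pipes.
class ReproduceSplit {
    static String[] columns(String line) {
        // Same as line.split("\\|", 0): trailing empty strings are removed from the result.
        return line.split("\\|");
    }

    public static void main(String[] args) {
        String aLine = "|||col1|col3|col4||||";
        for (String col : columns(aLine)) {
            System.out.println("SplitStrings2:[" + col + "]");
        }
    }
}
```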


      thanks
      YuriB

      Edited by: yurib on Nov 28, 2012 12:25 PM

      Edited by: yurib on Nov 28, 2012 12:29 PM
        • 1. Re: Java Regex Pipe Delimited ?
          jschellSomeoneStoleMyAlias
          1. The pipe is a metacharacter, so escape it.
          2. split() drops trailing empty strings unless you tell it otherwise by passing a nonzero limit.
          String s = "|||A|B|C||||";
          // A positive limit keeps trailing empty strings; all 10 fields fit within a limit of 10.
          String[] array = s.split("[|]", 10);
          for (int i = 0; i < array.length; i++)
               System.out.println(i + ": " + array[i]);
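To see why the pipe has to be escaped: an unescaped `|` is regex alternation between two empty patterns, so it matches the empty string at every position and splits between characters. A backslash escape (written `"\\|"` in Java source) and a character class `"[|]"` both match a literal pipe. The commented result for the unescaped case assumes Java 8 or later; earlier versions also return a leading empty string.

```java
import java.util.Arrays;

class EscapeDemo {
    public static void main(String[] args) {
        String s = "a|b";
        // Unescaped: alternation of two empty patterns, so it splits between every character.
        System.out.println(Arrays.toString(s.split("|")));   // [a, |, b] on Java 8+
        // Escaped with a backslash, or wrapped in a character class: both match a literal pipe.
        System.out.println(Arrays.toString(s.split("\\|"))); // [a, b]
        System.out.println(Arrays.toString(s.split("[|]"))); // [a, b]
    }
}
```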
          • 2. Re: Java Regex Pipe Delimited ?
            801904
            Thanks, I was missing the limit parameter. I incorrectly assumed that the default limit was reasonable, i.e. that it would process the whole line with no limit or with some large number.
            I was wrong.

            So thanks for the correct and quick answer! My program works now.


            Given I don't know how many pipes/delimiters will be present on any one line/record, I used -1 as a limit.

            I could have used some arbitrarily large number, like 10000, as limit.

            Which is a better option, assuming performance is important?

            The files I will be processing will be many GB in size and will have hundreds of millions of records/rows.

            Or perhaps the String.split() method is too slow for such large files and something else needs to be used instead?

            Many thanks again
            Yuri
            • 3. Re: Java Regex Pipe Delimited ?
              Kayaman
              yurib wrote:
              Thanks, I was missing the limit parameter. I incorrectly assumed that the default limit was reasonable, i.e. that it would process the whole line with no limit or with some large number.
              I was wrong.
              No, you weren't. Without the limit parameter the whole line is processed, but trailing empty elements are discarded, which the version with a nonzero limit parameter doesn't do (a limit of 0 behaves the same as no limit parameter at all).
              Given I don't know how many pipes/delimiters will be present on any one line/record, I used -1 as a limit.
              I could have used some arbitrarily large number, like 10000, as limit.
              Which is a better option, assuming performance is important?
              That's not a performance issue.
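On the -1 versus 10000 question, a small sketch of the documented `limit` semantics: a negative limit applies the pattern as many times as possible and keeps trailing empty strings; a positive limit n also keeps them but caps the result at n elements, with the last element holding the rest of the line unsplit. So -1 says "no limit" directly, while a large positive number only behaves the same as long as no line has more fields than that. The class and method names are hypothetical.

```java
class LimitDemo {
    // Hypothetical helper: split a line on literal pipes with the given limit.
    static String[] fields(String line, int limit) {
        return line.split("\\|", limit);
    }

    public static void main(String[] args) {
        String line = "|||A|B|C||||";
        System.out.println(fields(line, -1).length);    // 10: every field, trailing empties kept
        System.out.println(fields(line, 10000).length); // 10: same result while fields <= 10000
        String[] capped = fields(line, 5);              // at most 5 elements
        System.out.println(capped.length);              // 5
        System.out.println(capped[4]);                  // B|C|||| (unsplit remainder)
    }
}
```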
              The files I will be processing will be many GB in size and will have hundreds of millions of records/rows.

              Or perhaps the String.split() method is too slow for such large files and something else needs to be used instead?
              Well, it depends: are you trying to read the whole file into memory and then split it?
              If split() doesn't perform well enough, you can always try parsing it by hand; it shouldn't be too hard and would perform better than using regexes.
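Parsing by hand could look like the sketch below: a loop over `String.indexOf` that cuts the line at each pipe and, like `split("\\|", -1)`, keeps every empty field, leading or trailing. The class and method names are hypothetical, and whether this is measurably faster than `split()` would have to be checked against the real data.

```java
import java.util.ArrayList;
import java.util.List;

class PipeParser {
    // Hypothetical helper: split on '|' without regex, keeping leading and trailing empty fields.
    static List<String> splitPipes(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int pipe;
        while ((pipe = line.indexOf('|', start)) != -1) {
            fields.add(line.substring(start, pipe)); // text before this pipe (may be empty)
            start = pipe + 1;
        }
        fields.add(line.substring(start)); // text after the last pipe (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitPipes("|||A|B|C||||").size()); // 10
    }
}
```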
              • 4. Re: Java Regex Pipe Delimited ?
                801904
                thanks,

                re this: "Well it depends, are you trying to read the whole file in the memory and then split it?"

                I am planning to use a BufferedReader and its readLine() method, split each input line into multiple String objects, and pass these on to the database (using JDBC) for further processing once the delimited line is split into separate columns.
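A minimal sketch of that reading loop, assuming one record per line and using `split("\\|", -1)` so trailing empty columns survive. A `StringReader` stands in for the real file (with a file, `Files.newBufferedReader(path)` would supply the reader), and the JDBC step is only a placeholder comment because the table and statement are not described in the thread. `LineSplitter` and `readRows` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineSplitter {
    // Reads one line at a time with readLine() instead of loading the whole input into memory.
    static List<String[]> readRows(Reader source) throws IOException {
        // Rows are collected here only so the demo has something to return;
        // a real loader would hand each row to the database instead of keeping it.
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String aLine;
            while ((aLine = reader.readLine()) != null) {
                String[] listColumns = aLine.split("\\|", -1); // keep trailing empty columns
                // Hypothetical: bind listColumns to a JDBC PreparedStatement here.
                rows.add(listColumns);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        for (String[] row : readRows(new StringReader("a|b||\n|c|d|\n"))) {
            System.out.println(row.length); // prints 4 for each sample line
        }
    }
}
```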

                If split() with the simple regex delimiter [|] turns out to be slow, it's very easy to hand-craft a replacement using char arrays (or Strings/StringBuilders or an ArrayList<String>). But I prefer to reuse the standard library when possible, as long as the performance hit is moderate.
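Staying with the standard library, one option is to compile the pattern once with `java.util.regex.Pattern` and reuse it for every line; `Pattern.split(input, limit)` follows the same limit rules as `String.split`. As an implementation detail, OpenJDK's `String.split` already bypasses the regex engine for a single escaped character such as `"\\|"` (though not for `"[|]"`), so any gain should be measured rather than assumed. `CompiledSplitter` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CompiledSplitter {
    // Compiled once and reused; Pattern instances are immutable and safe to share between threads.
    private static final Pattern PIPE = Pattern.compile("\\|");

    static String[] columns(String line) {
        return PIPE.split(line, -1); // negative limit: keep trailing empty columns
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(columns("|||A|B|C||||")));
    }
}
```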

                thanks
                • 5. Re: Java Regex Pipe Delimited ?
                  DrClap
                  yurib wrote:
                  If the Split() using Regex using simple pipe delimiter [|] is slow then it's very easy to hand-craft it
                  However it isn't going to be so slow compared to hand-crafted alternatives that the difference will even be noticeable, given that you're reading data from a file and writing data to a database.
                  • 6. Re: Java Regex Pipe Delimited ?
                    jschellSomeoneStoleMyAlias
                    I am planning to use a BufferedReader and its readLine() method, split each input line into multiple String objects, and pass these on to the database (using JDBC) for further processing once the delimited line is split into separate columns.
                    The data import tools that come with the database will probably do exactly that, and they would be faster too.