yurib wrote: No, you weren't. Without the limit parameter the whole line is processed, but trailing empty elements are discarded (which the overload with a non-zero limit parameter doesn't do).
Thanks, I was missing the limit parameter. I incorrectly assumed that the default limit was reasonable, i.e. that it would process the whole line with no limit or with some large number.
I was wrong.
Given I don't know how many pipes/delimiters will be present on any one line/record, I used -1 as the limit.
That's not a performance issue.
I could have used some arbitrarily large number, like 10000, as the limit.
Which is a better option, assuming performance is important?
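To make the difference between the limits concrete, here is a small sketch. The class name SplitLimitDemo and the sample line are made up for illustration. It only shows what String.split() returns for each limit; it says nothing about speed.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // The pipe is a regex metacharacter, so it is escaped here.
    static String[] splitDefault(String line) {
        return line.split("\\|");        // same as limit 0: trailing empty strings dropped
    }

    static String[] splitWithLimit(String line, int limit) {
        return line.split("\\|", limit); // negative limit: no cap, trailing empties kept
    }

    public static void main(String[] args) {
        String line = "a|b||"; // made-up record with two empty trailing columns
        System.out.println(Arrays.toString(splitDefault(line)));          // [a, b]
        System.out.println(Arrays.toString(splitWithLimit(line, -1)));    // [a, b, , ]
        System.out.println(Arrays.toString(splitWithLimit(line, 10000))); // [a, b, , ]
    }
}
```

As long as a line has fewer fields than the large limit, -1 and 10000 give the same array. With -1 there is simply no cap to outgrow, so it states the intent more directly.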
The files I will be processing will be many GB in size and will have hundreds of millions of records/rows.
Well, it depends: are you trying to read the whole file into memory and then split it?
Or perhaps the String.split() method is too slow for such large files and something else needs to be used instead?
yurib wrote: However, it isn't going to be so slow compared to hand-crafted alternatives that the difference will even be noticeable, given that you're reading data from a file and writing data to a database.
If split() with a regex for a simple pipe delimiter, [|], turns out to be slow, then it's very easy to hand-craft a replacement.
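A hand-crafted version could look something like the sketch below. PipeSplitter and its split method are hypothetical names, not an existing API. It scans with indexOf() and keeps empty fields, including trailing ones, which matches split() with a limit of -1. Whether it is actually faster would have to be measured on the real data.

```java
import java.util.ArrayList;
import java.util.List;

public class PipeSplitter {
    // Splits on a single delimiter character without using a regex.
    // Empty fields (including trailing ones) are kept.
    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int pos;
        while ((pos = line.indexOf(delimiter, start)) >= 0) {
            fields.add(line.substring(start, pos));
            start = pos + 1; // continue just past the delimiter
        }
        fields.add(line.substring(start)); // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a|b||", '|')); // [a, b, , ]
    }
}
```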
I am planning to use a BufferedReader object and its readLine() method, then split each input line into multiple String objects and pass these on to the database (using JDBC) for further processing once the delimited line is split into separate columns.
The data import tools that come with the database will probably do exactly that.
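For reference, the readLine() / split / JDBC plan described above might look roughly like this. It is only a sketch under assumptions: the table name records, its three columns, the batch size of 1000 and the class name PipeFileLoader are all invented for illustration, and error handling and transaction control are left out. It reads one line at a time, so the whole file never has to fit in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PipeFileLoader {
    // Hypothetical table and column count; adjust to the real schema.
    private static final String INSERT_SQL =
            "INSERT INTO records (c1, c2, c3) VALUES (?, ?, ?)";
    private static final int COLUMNS = 3;
    private static final int BATCH_SIZE = 1000; // arbitrary; tune by measuring

    // Limit -1 keeps trailing empty columns.
    static String[] parseLine(String line) {
        return line.split("\\|", -1);
    }

    // Streams the source line by line and inserts rows in batches.
    // Closing the BufferedReader also closes the source Reader.
    static long load(Reader source, Connection connection)
            throws IOException, SQLException {
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(source);
             PreparedStatement insert = connection.prepareStatement(INSERT_SQL)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = parseLine(line);
                for (int i = 0; i < COLUMNS; i++) {
                    // Missing columns are bound as NULL in this sketch.
                    insert.setString(i + 1, i < fields.length ? fields[i] : null);
                }
                insert.addBatch();
                if (++rows % BATCH_SIZE == 0) {
                    insert.executeBatch();
                }
            }
            if (rows % BATCH_SIZE != 0) {
                insert.executeBatch(); // flush the final partial batch
            }
        }
        return rows;
    }
}
```

Batching the inserts is a design choice rather than a requirement. With hundreds of millions of rows, the database round trips are likely to cost far more than the splitting does.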