6 Replies. Latest reply: Nov 3, 2009 5:06 PM by 843789

large arraylist running out of memory, what are my options?

843789 Newbie
Hi

I have a program that needs to load a very large csv file (a 2 MB file with 192,718 items). I have a loop that reads through the file and adds each item to an ArrayList.

while ((line = bufRdr.readLine()) != null) {
     StringTokenizer st = new StringTokenizer(line, "\n");
     while (st.hasMoreTokens()) {
          // get next token and store it in the array
          String word = st.nextToken().toUpperCase();
          boolean containsApostrophe = false;
          if(word.indexOf('\'') != -1)
               containsApostrophe = true;
          if (word.length() > 3 && !containsApostrophe)
          {
               char[] wordCh = word.toCharArray();
               Word newWord = new Word(wordCh);
               words.add(newWord);
               System.out.println(words.size());
          }
     }
}
After about 17 thousand items, the console (in Eclipse) starts displaying memory addresses and nothing else, e.g.:

com.package.example.Word@1e41969, com.package.example.Word@407d11

etc.

I would like to read all of these in at once, and then save them (as an ArrayList) to a file so I don't have to do it again. What are my options?
  • 1. Re: large arraylist running out of memory, what are my options?
    796447 Newbie
    edzillion wrote:
    I would like to read all of these in at once, and then save them (as an ArrayList) to a file so I don't have to do it again. What are my options?
    Don't store the whole enchilada in memory at the same time unless you really need to. I doubt you really need to; this is probably just your current approach and the only way you've thought of.
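    For example, something along these lines handles each word as soon as it's read, so nothing piles up in a list. Just a sketch: bufRdr is your existing reader, and processWord is a placeholder for whatever you actually need to do with each word.

    while ((line = bufRdr.readLine()) != null) {
        for (String token : line.split(",")) {
            processWord(token.toUpperCase());   // deal with each word right away
        }
    }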
  • 2. Re: large arraylist running out of memory, what are my options?
    DrClap Expert
    while ((line = bufRdr.readLine()) != null) {
      StringTokenizer st = new StringTokenizer(line, "\n");
    The first line of code says you're reading from the file one line at a time. So your "line" variable will contain one line of the file, and it can't contain a new-line character. That makes the StringTokenizer pointless: it will only ever produce one token per line.

    Edit: you said you were reading a "csv" file. That suggests you have tokens delimited by commas. So maybe you should have used a comma as your delimiter and not a new-line character?
    boolean containsApostrophe = false;
    if(word.indexOf('\'') != -1)
      containsApostrophe = true;
    It would be shorter and simpler to write those three lines of code as one line:
    boolean containsApostrophe = word.indexOf('\'') != -1;
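    Putting those two points together, the loop might look roughly like this (a sketch only; it keeps your existing words list and Word class, and assumes the values really are comma-separated):

    while ((line = bufRdr.readLine()) != null) {
        for (String token : line.split(",")) {
            String word = token.toUpperCase();
            boolean containsApostrophe = word.indexOf('\'') != -1;
            if (word.length() > 3 && !containsApostrophe) {
                words.add(new Word(word.toCharArray()));
            }
        }
    }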
  • 3. Re: large arraylist running out of memory, what are my options?
    843789 Newbie
    boolean containsApostrophe = word.indexOf('\'') != -1;
    Nice. I am not a very experienced coder, so I tend to write things out rather verbosely.

    I suppose I don't need to load them all into memory; but what if I did? 2 meg doesn't seem all that large... It is just a words file and I originally thought I would just have a mechanism to load multiple files into memory and select them for various tasks.
  • 4. Re: large arraylist running out of memory, what are my options?
    796447 Newbie
    edzillion wrote:
    I suppose I don't need to load them all into memory; but what if I did?
    Then you have to give the VM the maximum heap size you think it needs. Run
    java -?
    for details on the VM arguments you can provide it with (and java -X for the non-standard options, which include the heap size settings).
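    For example, something like this would run the program with a 256 MB maximum heap (the class name here is just a stand-in for your own main class):

    java -Xmx256m com.package.example.Main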

    But always consider the scalability of the program. If today you only process 2 MB files and give it enough memory to accommodate that, tomorrow you may need to process even larger files. You don't want to end up having to give all the available memory to the VM at best, and at worst not be able to give it enough without installing more memory, do you?

    Edit: Actually at worst, you could end up never being able to give it enough memory because there's an upper limit anyway, and you could "need" more than that. So then you'd be forced to re-design it. Might as well start now.
  • 5. Re: large arraylist running out of memory, what are my options?
    DrClap Expert
    edzillion wrote:
    I suppose I don't need to load them all into memory; but what if I did? 2 meg doesn't seem all that large...
    Well, it isn't. But you convert the 2 MB file into 2 million Unicode characters (which take two bytes each), so there's 4 MB. Then you're creating a list of Word objects, which are built from fragments of that file, so that's another 4 MB or so, probably more depending on what's in the Word class.

    People will tell you not to optimize until you have to. And I'm one of those people. But on the other hand there's absolutely no reason to read everything into memory and then write the cooked version out at the end, at least not in your case. So why do it?
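    For instance, you could write the cooked words out as you go, something like this (a rough sketch only; it assumes the usual java.io imports, the file names are made up, and it reuses the same filtering you already have):

    BufferedReader bufRdr = new BufferedReader(new FileReader("words.csv"));
    PrintWriter out = new PrintWriter(new FileWriter("words-cooked.txt"));
    String line;
    while ((line = bufRdr.readLine()) != null) {
        for (String token : line.split(",")) {
            String word = token.toUpperCase();
            if (word.length() > 3 && word.indexOf('\'') == -1) {
                out.println(word);   // written immediately, never held in a list
            }
        }
    }
    out.close();
    bufRdr.close();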
  • 6. Re: large arraylist running out of memory, what are my options?
    843789 Newbie
    edzillion wrote:
    After about 17 thousand items, the console (in Eclipse) starts displaying memory addresses and nothing else, e.g.:

    com.package.example.Word@1e41969, com.package.example.Word@407d11
    I've never seen the number of elements affect how an object's toString method works.

    Did you actually override Word.toString()?
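    If not, output like com.package.example.Word@1e41969 is just the default Object.toString() (class name plus hash code). Guessing at the shape of your Word class, an override along these lines would make the printout readable:

    public class Word {
        private final char[] chars;

        public Word(char[] chars) {
            this.chars = chars;
        }

        @Override
        public String toString() {
            return new String(chars);   // show the word itself, not Word@1e41969
        }
    }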