6 Replies Latest reply: Nov 3, 2009 5:06 PM by 843789

    large arraylist running out of memory, what are my options?

    843789
      Hi

      I have a program that needs to load a very large CSV file (a 2 MB file, 192718 items). I have a loop that goes through the file and adds each item to an ArrayList as a new element.

      while ((line = bufRdr.readLine()) != null) {
           StringTokenizer st = new StringTokenizer(line, "\n");
           while (st.hasMoreTokens()) {
                // get next token and store it in the array
                String word = st.nextToken().toUpperCase();
                boolean containsApostrophe = false;
                if(word.indexOf('\'') != -1)
                     containsApostrophe = true;
                if (word.length() > 3 && !containsApostrophe)
                {
                     char[] wordCh = word.toCharArray();
                     Word newWord = new Word(wordCh);
                     words.add(newWord);
                     System.out.println(words.size());
                }
           }
      }
      After about 17 thousand the console (in eclipse) starts displaying memory addresses and nothing else, eg:

      com.package.example.Word@1e41969, com.package.example.Word@407d11

      etc.

      I would like to take all these in once, and then save them (as an arraylist) to a file so I don't have to do it again. What are my options?
        • 1. Re: large arraylist running out of memory, what are my options?
          796447
          edzillion wrote:
          I would like to take all these in once, and then save them (as an arraylist) to a file so I don't have to do it again. What are my options?
          Don't store the whole enchilada in memory at the same time unless you really need to. I doubt you really need to; I suspect this is just your current approach and the only way you've thought of.
          • 2. Re: large arraylist running out of memory, what are my options?
            DrClap
            while ((line = bufRdr.readLine()) != null) {
              StringTokenizer st = new StringTokenizer(line, "\n");
            The first line of code says you're reading from the file one line at a time. So your "line" variable will contain one line of text from the file. It can't contain a new-line character, so that makes the StringTokenizer pointless. It will only produce one token for each line.

            Edit: you said you were reading a "csv" file. That suggests you have tokens delimited by commas. So maybe you should have used a comma as your delimiter and not a new-line character?
            boolean containsApostrophe = false;
            if(word.indexOf('\'') != -1)
              containsApostrophe = true;
            It would be shorter and simpler to write those three lines of code as one line:
            boolean containsApostrophe = word.indexOf('\'') != -1;
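To make the two suggestions above concrete, here is a minimal sketch that splits each line on commas and uses the one-line apostrophe check. It assumes a simple CSV with no quoted fields or embedded commas; the class name CsvWordFilter, the method keptWords, the trim() call and Locale.ROOT are illustrative additions, not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CsvWordFilter {
    // Splits one CSV line on commas and keeps the upper-cased words that are
    // longer than three characters and contain no apostrophe (the same filter
    // as in the question). Assumes no quoted fields or embedded commas.
    static List<String> keptWords(String line) {
        List<String> kept = new ArrayList<>();
        for (String token : line.split(",")) {
            String word = token.trim().toUpperCase(Locale.ROOT);
            boolean containsApostrophe = word.indexOf('\'') != -1;
            if (word.length() > 3 && !containsApostrophe) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "don't" is dropped for its apostrophe, "cat" for its length.
        System.out.println(keptWords("apple,don't,cat,banana"));
    }
}
```

If the file really holds one word per line, the split is unnecessary and the line itself can be used as the word.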
            • 3. Re: large arraylist running out of memory, what are my options?
              843789
              boolean containsApostrophe = word.indexOf('\'') != -1;
              nice. I am not a very experienced coder so I tend to write things out rather verbosely.

              I suppose I don't need to load them all into memory; but what if I did? 2 meg doesn't seem all that large... It is just a words file and I originally thought I would just have a mechanism to load multiple files into memory and select them for various tasks.
              • 4. Re: large arraylist running out of memory, what are my options?
                796447
                edzillion wrote:
                I suppose I don't need to load them all into memory; but what if I did?
                Then you have to give the VM the expected max heap size you think it needs.
                java -?
                for details on the VM arguments you can provide it with.
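As a rough illustration of the heap-size advice, the sketch below prints what the running VM reports through the standard Runtime API. The class name HeapReport and the helper toMiB are hypothetical; the -Xmx launcher option is the usual way to raise the maximum heap on HotSpot VMs, and "java -X" lists such non-standard options.

```java
final class HeapReport {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the VM will attempt to use.
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the VM.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Running it as, for example, "java -Xmx256m HeapReport" should show the first figure change with the option; the exact values depend on the VM and platform.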

                But always consider the scalability of the program. If today you only process 2MB size files and give it enough memory to accommodate that, tomorrow you may need to process even larger files. You don't want to end up having to give all the available memory to the VM at best, and at worst not be able to give it enough without installing more memory, do you?

                Edit: Actually at worst, you could end up never being able to give it enough memory because there's an upper limit anyway, and you could "need" more than that. So then you'd be forced to re-design it. Might as well start now.
                • 5. Re: large arraylist running out of memory, what are my options?
                  DrClap
                  edzillion wrote:
                  I suppose I don't need to load them all into memory; but what if I did? 2 meg doesn't seem all that large...
                  Well, it isn't. But you convert the 2 meg file into 2 million Unicode characters (which take two bytes each), so there's 4 megabytes. Then you're creating a list of Word objects, which are comprised of fragments of that file, so there's another 4 megabytes, or probably more depending on what's in the Word class.
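The arithmetic above can be written out as a small sketch. It assumes one character per byte of the file (a single-byte encoding) and counts only the raw character data in char arrays, ignoring per-object overhead, which varies by VM; the class and method names are illustrative.

```java
final class CharMemoryEstimate {
    // Lower-bound estimate of the bytes needed to hold charCount characters
    // as Java chars. Character.BYTES is 2 (a char is a 16-bit code unit).
    static long bytesForChars(long charCount) {
        return charCount * Character.BYTES;
    }

    public static void main(String[] args) {
        long fileChars = 2_000_000L; // assumption: about 2 MB of single-byte text

        // First copy: the text read from the file.
        long textBytes = bytesForChars(fileChars);
        // Second copy: toCharArray() duplicates the characters for each Word.
        long wordBytes = bytesForChars(fileChars);

        System.out.println("text:  " + textBytes + " bytes");
        System.out.println("words: " + wordBytes + " bytes");
        System.out.println("sum:   " + (textBytes + wordBytes) + " bytes");
    }
}
```

This gives roughly 4 MB for each copy and 8 MB in total before any object headers, list storage or fields in the Word class are counted.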

                  People will tell you not to optimize until you have to. And I'm one of those people. But on the other hand there's absolutely no reason to read everything into memory and then write the cooked version out at the end, at least not in your case. So why do it?
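One way to avoid holding everything in memory is to write each word out as soon as it has been read and filtered. The following is only a sketch of that idea: it assumes one word per input line, and the class name StreamingWordFilter and its filter method are hypothetical. In a real program the Reader and Writer would wrap files; in-memory strings are used here so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Locale;

final class StreamingWordFilter {
    // Reads one word per line from 'in', writes each qualifying word to 'out'
    // immediately, and returns how many were written. Only the current line
    // is held in memory; no list is built.
    static int filter(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        int kept = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String word = line.trim().toUpperCase(Locale.ROOT);
            if (word.length() > 3 && word.indexOf('\'') == -1) {
                writer.write(word);
                writer.newLine();
                kept++;
            }
        }
        writer.flush(); // push buffered output through to 'out'
        return kept;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        int kept = filter(new StringReader("apple\ncat\ndon't\nbanana\n"), out);
        System.out.println(kept + " words kept");
        System.out.print(out);
    }
}
```

Memory use then stays roughly constant regardless of how large the input file grows.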
                  • 6. Re: large arraylist running out of memory, what are my options?
                    843789
                    edzillion wrote:
                    After about 17 thousand the console (in eclipse) starts displaying memory addresses and nothing else, eg:

                    com.package.example.Word@1e41969, com.package.example.Word@407d11
                    I've never seen the number of elements affect how an object's toString method works.

                    Did you actually override Word.toString()?
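For reference, text such as Word@1e41969 is what the default Object.toString() produces: the class name, an @ sign, and the hash code in hexadecimal. The Word class from the question is not shown in the thread, so the version below is a hypothetical stand-in that wraps a char array and overrides toString().

```java
final class Word {
    private final char[] letters;

    Word(char[] letters) {
        // Defensive copy so later changes to the caller's array do not leak in.
        this.letters = letters.clone();
    }

    // Without this override, printing a Word falls back to Object.toString(),
    // which yields the class name followed by '@' and a hex hash code.
    @Override
    public String toString() {
        return new String(letters);
    }

    public static void main(String[] args) {
        Word w = new Word("HELLO".toCharArray());
        System.out.println(w); // prints HELLO
    }
}
```

With an override like this in place, printing a Word, or a list of Words, shows the text rather than the class name and hash.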