14 Replies Latest reply: Nov 7, 2008 9:40 AM by 807589

    Handling huge result from DB fetch

    807589
      Folks,

      My Java app fetches about 3 lakh (300,000) records in its result set.

      ResultSet rs = stmt.executeQuery();

      After that, I try to go to the last record (so as to find the exact total count of the records) by saying

      rs.last();

      This seems to throw a java.lang.OutOfMemoryError: Java heap space exception.

      One way to solve this, IMO, is to fetch the records from the DB in smaller chunks (say 1 lakh records per fetch, or even fewer). How can I do this (i.e. which API method should I use)? I see there's something called setFetchSize() in the ResultSet API, but that would fetch "in total" only that number of records. Kindly suggest how to do this, or if you think there's a better way to overcome this memory problem, do suggest it. I do not want to use JVM params like -Xmx because I read that that's not a good way to handle OutOfMemoryError.
        • 1. Re: Handling huge result from DB fetch
          807589
          To find the number of records? Personally, I would approach it thus:
          int count = 0;
          while (resultSet.next()) {
              count++;
              // ... process the row here ...
          }
          System.out.println("Number of records: " + count);
          • 2. Re: Handling huge result from DB fetch
            807589
            If you're only interested in counting elements, wouldn't it be possible to have the DB do it for you with the COUNT function?
            If you are also going to do some additional computations with the data, do you really need all that data at the same time?
            Perhaps you should consider putting stricter conditions on your SELECT queries so you only get the data you really need at any particular moment.

            PS. My hands-on experience with database access is very limited, so I might be missing something.
            • 3. Re: Handling huge result from DB fetch
              807589
              Thanks for your replies. It's not about getting the count of records. I want to fetch the records from the DB in small chunks instead of all at once. How do I do it using the JDBC API? i.e. say 5 lakh records are returned for a query. I do not want to fetch all 5 lakh records in one shot. Instead I want to fetch one lakh records at a time, process them, & then fetch the next lakh of records.

              Thanks.
              • 4. Re: Handling huge result from DB fetch
                807589
                I'm inclined to agree with __Kraken__, it's more about the SQL; you don't say what RDBMS you're working with, but in some cases there are built-in methods to handle subsets of data and pick up right where you left off with the next query.

                Here's what I would do. Perform a COUNT statement that returns the total number of records returned from your SELECT statement. Decide what breakpoint you want to use, whether it's 100 records or 1,000 records, etc. and alter your SELECT to use a LIMIT clause to return that subset of records at a time. Store the current state and refer back to it on the next action, up to that last record value. This is not unlike the 'pagination' you see at the bottom of most websites (including this one) where it breaks results up into different pages depending on how many you want to view at any given time.
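                A minimal sketch of that pagination idea. Assumptions: the `LIMIT ... OFFSET` syntax shown is MySQL/PostgreSQL style and is NOT valid on every database (Sybase, which the OP turns out to be using, needs TOP or SET ROWCOUNT instead); `pagedQuery` and `fetchAllPages` are hypothetical helper names, and the query should have an ORDER BY so pages don't overlap.

```java
import java.sql.*;

public class PagedFetch {
    // Builds one page of the query. LIMIT/OFFSET is MySQL/PostgreSQL
    // syntax; other databases use TOP, ROWNUM, etc.
    static String pagedQuery(String baseSelect, int pageSize, int offset) {
        return baseSelect + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    // Hypothetical driver loop: fetch one page at a time and stop when a
    // short page comes back, meaning the last page has been reached.
    static void fetchAllPages(Connection conn, String baseSelect, int pageSize)
            throws SQLException {
        int offset = 0;
        while (true) {
            int rowsInPage = 0;
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(pagedQuery(baseSelect, pageSize, offset))) {
                while (rs.next()) {
                    rowsInPage++;
                    // process the row here, then let it be garbage collected
                }
            }
            if (rowsInPage < pageSize) break; // short page => done
            offset += pageSize;
        }
    }
}
```

Only one page's worth of rows is ever held between round trips, which is exactly the pagination most websites use.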
                • 5. Re: Handling huge result from DB fetch
                  807589
                  GSP wrote:
                  Thanks for your replies. It's not about getting the count of records. I want to fetch the records from the DB in small chunks instead of all at once. How do I do it using the JDBC API? i.e. say 5 lakh records are returned for a query. I do not want to fetch all 5 lakh records in one shot. Instead I want to fetch one lakh records at a time, process them, & then fetch the next lakh of records.

                  Thanks.
                  I am really not sure what you want.

                  If you want the result set to only hold X records then you'll have to do the correct thing in SQL for your database (as mentioned). This will involve an ORDER BY with either some good WHERE clauses and/or some database specific syntax like LIMIT or TOP.

                  If you are only concerned in how many records you are fetching from the database at one time then setFetchSize is what you want. That's a hint to the driver to fetch rows in batches of X.

                  For example, if your result set will have 100 records and you setFetchSize to 10 (this number is VERY low; set it to a reasonably high number and test), then when you call next the first time, the first 10 records are fetched at once. That means the next nine calls to next don't require the ResultSet to communicate with the DB. When you call next the 11th time, it fetches the next 10 records. Etc.
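                  A sketch of that batching behaviour. The `stream` method shows where the hint goes (it assumes an already-open `conn` and a hypothetical `big_table`); `roundTrips` is just the arithmetic of how many driver round trips the example above implies.

```java
import java.sql.*;

public class FetchSizeDemo {
    // Number of driver round trips needed to stream totalRows rows in
    // batches of fetchSize — i.e. what setFetchSize hints at.
    static int roundTrips(int totalRows, int fetchSize) {
        return (totalRows + fetchSize - 1) / fetchSize; // ceiling division
    }

    // Sketch only: conn is an assumed open connection, big_table a
    // hypothetical table. setFetchSize is a hint; drivers may ignore it.
    static void stream(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(10); // deliberately tiny, per the example above
            try (ResultSet rs = st.executeQuery("SELECT id FROM big_table")) {
                while (rs.next()) {
                    // the driver refills its 10-row buffer transparently;
                    // the application code never sees the batching
                }
            }
        }
    }
}
```

So 100 rows with a fetch size of 10 means 10 round trips, while the application still just calls next() 100 times.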
                  • 6. Re: Handling huge result from DB fetch
                    807589
                    blangsocal wrote:
                    I'm inclined to agree with __Kraken__, it's more about the SQL; you don't say what RMDBS you're working with, but in some cases there are built-in methods to handle subsets of data and pick up right where you left off with the next query.

                    Here's what I would do. Perform a COUNT statement
                    For the batch processing the OP seems to be doing s/he doesn't need the count.
                    • 7. Re: Handling huge result from DB fetch
                      807589
                      Try JdbcRowSet or WebRowSet; one of those might work....
                      • 8. Re: Handling huge result from DB fetch
                        807589
                        GSP wrote:
                        Folks,

                        One way to solve this, IMO, is to fetch the records from the DB in smaller chunks (say 1 lakh records in each fetch or even smaller count). How can I do this (meaning what API method to use)? I see there's something called setFetchSize() in ResultSet API but that would fetch "in total" only that number of records. Kindly suggest how to do this or if you think there's a better way to overcome this memory space problem, do suggest. I do not want to use JVM params like -Xmx because I read that that's not a good way to handle OutOfMemory errors.
                        It's a common misunderstanding that a ResultSet stores all the rows from the query. Actually the ResultSet normally holds only one row at a time, plus the handle for the database cursor used to fetch more. setFetchSize hints that the rows should be retrieved in batches of a given size. It doesn't affect the way the program receives the data; it's just an optimisation.

                        Only when you start asking for more complex methods of moving through result sets may the implementation need to get whole tables back (for example if it has to simulate random row access when the database doesn't handle it). This happens if you use result set types other than the default FORWARD_ONLY.

                        If you can read each row, deal with it and discard it then you'll never run out of memory however many rows you read.

                        However, it's pointless to get the row count by going to the last row, or by counting the rows in the program. Use "SELECT COUNT(*) FROM ...". And if other programs can change the table, the count can't be relied on to be the same twice either way.
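                        A minimal sketch of asking the database for the count. `countQuery` and `countRows` are hypothetical helper names, and as noted above the count can already be stale by the time the real fetch runs if other sessions modify the table.

```java
import java.sql.*;

public class RowCount {
    // Builds the count query for an assumed table name.
    static String countQuery(String table) {
        return "SELECT COUNT(*) FROM " + table;
    }

    // Sketch: conn is an assumed open connection. COUNT(*) returns
    // exactly one row with one column, so next() is called once.
    static long countRows(Connection conn, String table) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(countQuery(table))) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```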
                        • 9. Re: Handling huge result from DB fetch
                          807589
                          Thanks to all for your replies.

                          I'm using a Sybase DB & calling a stored procedure which returns 3 lakh records. I have no control over the DB end & so cannot alter the way the DB is returning the records. So I need to do something from the Java end. I'm trying to use 'setFetchSize', but that's causing another exception.

                          Am doing this:

                          stmt = connection.prepareCall ( queryString );
                          stmt.setQueryTimeout(0);
                          stmt.setFetchSize(30000);
                          ResultSet rs = stmt.executeQuery();

                          Previously, I didn't have the 'setFetchSize' method call. At that time, I got the out-of-memory exception in the line 'executeQuery'. Now, on adding the call to 'setFetchSize', I'm getting the following exception in the statement 'executeQuery':

                          "Execute cursor is declared on a procedure which contains a non-SELECT or a SELECT with a COMPUTE clause. For the declaration of this cursor to be legal, it should have a single SELECT statement without a COMPUTE clause."

                          Thanks.
                          • 10. Re: Handling huge result from DB fetch
                            DrClap
                            Are we agreed that you don't need to know the number of rows that you're going to get when you read them? Okay, good. Then the next step you should take is to change your code to use a forward-only result set. If you do that then the driver only really needs to keep one row in memory at a time. If you don't, then the driver has to cache all rows in case you decide to go back and get them again.
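                            For reference, a sketch of making that choice explicit. TYPE_FORWARD_ONLY with CONCUR_READ_ONLY is already the default for createStatement(), but spelling it out documents the intent; `forwardOnlyStatement` is a hypothetical helper name.

```java
import java.sql.*;

public class ForwardOnly {
    // Explicitly request the cheapest result set type: the driver only
    // needs to buffer one batch of rows, never the whole result.
    static Statement forwardOnlyStatement(Connection conn) throws SQLException {
        return conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                    ResultSet.CONCUR_READ_ONLY);
    }
}
```

Any scrollable type (TYPE_SCROLL_INSENSITIVE, TYPE_SCROLL_SENSITIVE) invites the driver to cache every row it has seen, which is exactly what blows the heap here.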
                            • 11. Re: Handling huge result from DB fetch
                              807589
                              DrClap wrote:
                              Are we agreed that you don't need to know the number of rows that you're going to get when you read them? Okay, good. Then the next step you should take is to change your code to use a forward-only result set. If you do that then the driver only really needs to keep one row in memory at a time. If you don't, then the driver has to cache all rows in case you decide to go back and get them again.
                              A forward-only result set is the way to go. You will be able to read many crore (tens of millions of) records, and they will pour out of the ResultSet like water out of a fire hose.
                              • 12. Re: Handling huge result from DB fetch
                                807589
                                Thanks to all for your replies. I need to know the count of the rows returned because I need to initialize an array to populate with the data from the result set, & to initialize the array, I need to know the size. Previously, I was using a FORWARD_ONLY result set to fetch the data & populate a List object. Then I tried to convert the List to an array, but that failed because the JVM was unable to hold a List of 3 lakh objects & at the same time initialize an array of 3 lakh size, & so caused an OutOfMemoryError.
                                • 13. Re: Handling huge result from DB fetch
                                  807589
                                  GSP wrote:
                                  Thanks to all for your replies. I need to know the count of the rows returned because I need to initialize an array to populate with the data from the result set, & to initialize the array, I need to know the size. Previously, I was using a FORWARD_ONLY result set to fetch the data & populate a List object. Then I tried to convert the List to an array, but that failed because the JVM was unable to hold a List of 3 lakh objects & at the same time initialize an array of 3 lakh size, & so caused an OutOfMemoryError.
                                  If you're loading them into a collection of some sort then of course you're going to use lots of heap however you do it. About the best you're going to do is to do a count(*) first and create an array or ArrayList with a bit of extra space in case any rows have been inserted between the count and the actual retrieval. Remember an ArrayList has to allocate a second, larger array every time it grows anyway.

                                  But the first question you need to ask yourself is if you really need to hold all those rows at once. Far better if you can process each one as it comes in (even if this might mean more than one pass).
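                                  A sketch of the pre-sizing suggestion, assuming a COUNT(*) has already been run. `preSizedList` is a hypothetical helper and the slack formula is arbitrary headroom, per the "bit of extra space" advice above; pre-sizing avoids the repeated grow-and-copy an ArrayList does internally as it fills.

```java
import java.util.ArrayList;
import java.util.List;

public class PreSized {
    // expectedRows would come from SELECT COUNT(*); the slack covers rows
    // inserted between the count and the actual retrieval.
    static <T> List<T> preSizedList(int expectedRows) {
        int slack = expectedRows / 100 + 16; // arbitrary ~1% headroom
        return new ArrayList<>(expectedRows + slack);
    }
}
```

This only helps if the rows must all be held at once; processing each row as it arrives, as suggested above, avoids the big allocation entirely.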
                                  • 14. Re: Handling huge result from DB fetch
                                    807589
                                    malcolmmc wrote:
                                    But the first question you need to ask yourself is if you really need to hold all those rows at once. Far better if you can process each one as it comes in (even if this might mean more than one pass).
                                    I agree with that completely. And if I needed to hold all the data in memory I would use a collection. I hardly ever use arrays except for trivial things.