
Inserting huge volume of records in JDBC

User_19BPU
User_19BPU Member Posts: 1,086 Blue Ribbon

Hi,

   I am inserting a batch of records, say 100000 records, using PreparedStatement. Is it better to call executeUpdate() per row as below, or to go with a batch update by adding rows to a batch? Does a batch insert offer better performance than per-row executeUpdate()? Or should we insert the records iteratively using a small batch size?

  for (TestData testDataModel : testDataList) { // 100000 data
      PreparedStatement preStatement = null;
      try {
          preStatement = connectionObj.prepareStatement(CUSTOMER_QUERY);
          preStatement.setString(1, testDataModel.getCustomerId());
          preStatement.setDate(2, testDataModel.getCreationDate());
          preStatement.setString(3, testDataModel.getDisplayName());
          preStatement.setString(4, testDataModel.getAddress());
          preStatement.executeUpdate();
      } catch (SQLException sqlExp) {
          // error handling omitted
      } finally {
          if (preStatement != null) {
              try {
                  preStatement.close();
              } catch (SQLException sqlExp) {
                  // ignore close failures
              }
          }
      }
  }

Also, do we need to close the connection and prepared statement within the for loop or outside it, as there are many multiple connections based on the record count? Please clarify.

Thanks.


Answers

  • Joe Weinstein-Oracle
    Joe Weinstein-Oracle Member Posts: 516
    edited Sep 30, 2015 11:21AM

    Hi. Depending on the DBMS architecture, the indexes on the table, etc., it may be no faster with multiple concurrent clients inserting than it would be for one orderly inserter. And it might be faster, if the table starts small and grows large, to drop the indexes, do the mass insert, then recreate the indexes.

    Anything you do to decrease the client-DBMS roundtrips will also make this faster.

    Surely you have the time to test your own question, right? And don't remake the PreparedStatement in the loop. Make it once and re-use it. That's what they are for.
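    A minimal sketch of that reuse, hoisting the PreparedStatement out of the loop. The table name, column list, and the TestData stand-in below are assumptions made up to keep the sketch self-contained; the poster's actual CUSTOMER_QUERY and model class would replace them:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ReuseStatement {
    // Hypothetical INSERT matching the four setters in the question; adjust to your schema.
    static final String CUSTOMER_QUERY =
        "INSERT INTO customer (customer_id, creation_date, display_name, address) VALUES (?, ?, ?, ?)";

    // Minimal stand-in for the poster's model class.
    interface TestData {
        String getCustomerId();
        java.sql.Date getCreationDate();
        String getDisplayName();
        String getAddress();
    }

    // Prepare once, bind and execute per row, close once (try-with-resources).
    static void insertAll(Connection connectionObj, List<TestData> testDataList) throws SQLException {
        try (PreparedStatement preStatement = connectionObj.prepareStatement(CUSTOMER_QUERY)) {
            for (TestData testDataModel : testDataList) {
                preStatement.setString(1, testDataModel.getCustomerId());
                preStatement.setDate(2, testDataModel.getCreationDate());
                preStatement.setString(3, testDataModel.getDisplayName());
                preStatement.setString(4, testDataModel.getAddress());
                preStatement.executeUpdate(); // still one round trip per row; batching reduces this further
            }
        }
    }

    // Pure helper: count the '?' bind markers in a SQL string (must match the number of setters).
    static int countBindMarkers(String sql) {
        int n = 0;
        for (char ch : sql.toCharArray()) {
            if (ch == '?') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countBindMarkers(CUSTOMER_QUERY)); // 4 markers, matching the 4 setters
    }
}
```

    The statement is prepared exactly once, so the DBMS can parse the SQL once and reuse the plan for every row.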

    I will offer you another alternative to test, that may beat all:

    Statement s = c.createStatement();
    StringBuffer batchUpdate = new StringBuffer();
    for (TestData testDataModel : testDataList) { // 100000 data
        batchUpdate.append("insert into myTable values(" + testDataModel.getCustomerId() + ", "
                                                         + testDataModel.getCreationDate() + ", "
                                                         + testDataModel.getDisplayName() + ", "
                                                         + testDataModel.getAddress() + ")\n");
    }
    s.executeUpdate(batchUpdate.toString());

  • Joe Weinstein-Oracle
    Joe Weinstein-Oracle Member Posts: 516
    edited Sep 30, 2015 3:14PM

    And I edited the suggestion below, in case you're using the Oracle DBMS. The original SQL will work for some other DBMSes, such as SQL Server or Sybase...

    Statement s = c.createStatement();
    StringBuffer batchUpdate = new StringBuffer("BEGIN; ");
    for (TestData testDataModel : testDataList) { // 100000 data
        batchUpdate.append("insert into myTable values(" + testDataModel.getCustomerId() + ", "
                                                         + testDataModel.getCreationDate() + ", "
                                                         + testDataModel.getDisplayName() + ", "
                                                         + testDataModel.getAddress() + "); ");
    }
    batchUpdate.append("END;");
    s.executeUpdate(batchUpdate.toString());

  • Unknown
    edited Sep 30, 2015 5:24PM
    Anything you do to decrease the client-DBMS roundtrips will also make this faster.
    
     I will offer you another alternative to test, that may beat all: 
    

    Agree with #1 above.

    Disagree with #2 - certainly for Oracle you don't want to send individual INSERT statements. You want to use bind variables.


  • Unknown
    edited Sep 30, 2015 5:31PM
       I am inserting a batch of records say 100000 records using PreparedStatement, whether it is better use to executeUpdate() as below or go with a batch update by adding the batches?
    
    

    You should use batch updating with statements that use bind variables as the examples in the JDBC Developer Guide show.

    Performance Extensions

    1. Write your prepared statement, and specify input values for the first row, as follows:
      PreparedStatement ps = conn.prepareStatement ("INSERT INTO dept VALUES (?,?,?)");

    You do that BEFORE the loop.

    Then inside the loop you set each of the bind variables to a value and call executeUpdate to add the row to the batch.

     whether we need to close the connection and prepared statement with in the for loop or outside as there many multiple connections based on the record count?

    You close a statement or connection when you are done using them.

    What does 'many multiple connections based on the record count' mean?

    1. Create a connection

    2. Create a statement

    3. set the batch count

    4. set autocommit to FALSE

    5. execute your loop of data

    6. execute a COMMIT or ROLLBACK as appropriate

    7. close the statement if you are done using it

    8. close the connection if you are done using it

    The MOST IMPORTANT thing is to disable autocommit before you do anything else. Otherwise you will be committing each row and that will REALLY slow things down.

    Use batches of 100 or so. As the doc explains, that reduces the number of round trips to the database.
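    The eight steps above can be sketched with generic JDBC batching. This is a sketch under assumptions, not the poster's code: the table name, column list, and the TestData stand-in are made up, and any real run needs an actual Connection:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {
    // Hypothetical INSERT; adjust to your schema.
    static final String CUSTOMER_QUERY =
        "INSERT INTO customer (customer_id, creation_date, display_name, address) VALUES (?, ?, ?, ?)";

    // Minimal stand-in for the poster's model class.
    interface TestData {
        String getCustomerId();
        java.sql.Date getCreationDate();
        String getDisplayName();
        String getAddress();
    }

    /** Steps 2-8: one statement, autocommit off, flush every batchSize rows, one commit. */
    static void insertAll(Connection conn, List<TestData> rows, int batchSize) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // step 4: the MOST IMPORTANT part - commit once, not per row
        try (PreparedStatement ps = conn.prepareStatement(CUSTOMER_QUERY)) {
            int pending = 0;
            for (TestData row : rows) {
                ps.setString(1, row.getCustomerId());
                ps.setDate(2, row.getCreationDate());
                ps.setString(3, row.getDisplayName());
                ps.setString(4, row.getAddress());
                ps.addBatch();
                if (++pending == batchSize) { // flush only when a full batch is queued
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) ps.executeBatch(); // flush the final partial batch
            conn.commit();                      // step 6
        } catch (SQLException e) {
            conn.rollback();                    // step 6, failure path
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }

    /** Round trips needed for totalRows at a given batch size (ceiling division). */
    static int roundTrips(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // 100000 rows in batches of 100 -> 1000 executeBatch round trips instead of 100000
        System.out.println(roundTrips(100000, 100)); // prints 1000
    }
}
```

    Note that executeBatch() is called only when a full batch has been queued (and once at the end for the remainder), which is what keeps the round-trip count at totalRows / batchSize rather than one per row.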

  • User_19BPU
    User_19BPU Member Posts: 1,086 Blue Ribbon
    edited Sep 30, 2015 11:58PM

    Thanks. Do we need to use addBatch() and executeBatch() like the below, or do we also need to set a batch size so that only that many records are processed at a time?

    PreparedStatement preStatement = connectionObj.prepareStatement(CUSTOMER_QUERY);
    for (TestData testDataModel : testDataList) { // 100000 data
        try {
            preStatement.setString(1, testDataModel.getCustomerId());
            preStatement.setDate(2, testDataModel.getCreationDate());
            preStatement.setString(3, testDataModel.getDisplayName());
            preStatement.setString(4, testDataModel.getAddress());
            preStatement.addBatch();
            preStatement.executeBatch();
        } catch (SQLException sqlExp) {
        }
    }

  • Unknown
    edited Oct 1, 2015 11:19AM
    Thanks, whether we need to use addBatch() and executeBatch() like the below OR we need to set the batch size as well so that only that many number of records will be processed:-
    
    
    PreparedStatement preStatement = connectionObj.prepareStatement(CUSTOMER_QUERY);
    for (TestData testDataModel : testDataList) { // 100000 data
        try {
            preStatement.setString(1, testDataModel.getCustomerId());
            preStatement.setDate(2, testDataModel.getCreationDate());
            preStatement.setString(3, testDataModel.getDisplayName());
            preStatement.setString(4, testDataModel.getAddress());
            preStatement.addBatch();
            preStatement.executeBatch();
        } catch (SQLException sqlExp) {
        }
    }
    

    You should ALWAYS set the appropriate batch size. And it is recommended to use generic batching, not the Oracle extended batching, so that your code is NOT Oracle dependent.

    There is NOTHING wrong with the Oracle batching. In fact it is more convenient if you know you only need to work with Oracle DBs.

    But your code above is NOT what you should use.

    RTFM - I gave you a link to the specific section of the JDBC Dev Guide that talks about Oracle extensions. To use those extensions you MUST use the Oracle classes; e.g. OraclePreparedStatement.

    As that doc shows, the 'executeBatch' method operates DIFFERENTLY depending on whether you use generic batching or Oracle's extended batching.

    The generic Java batching requires YOU to keep track of the number of 'addBatch' calls you have made and then call 'executeBatch' to actually send them to the DB.

    For generic batching, every time you call 'executeBatch' (as in your code above) you send data to the DB REGARDLESS of the size of the current batch. So your code is sending EVERY row one at a time.

    The doc has examples. Read the doc. Try the examples. Then, if you still have questions, post your CORRECTED code and ask them.

This discussion has been closed.