This discussion is archived
4 Replies Latest reply: Oct 14, 2009 8:43 AM by 504035

Is JPA efficient enough for bulk inserts?

721988 Newbie
Hi,
I tried a bulk insert of 15,000 records in JPA, which takes 2 minutes and 30 seconds.
In plain JDBC it takes just 39 seconds.
I'm using Derby as the database.
Any ideas for tuning the performance in JPA?
  • 1. Re: Is JPA efficient enough for bulk inserts?
    JamesSutherland Pro
    Depending on how optimized your JDBC code is, there will probably be some unavoidable overhead.

    To optimize TopLink/EclipseLink JPA persistence, try:
    - enable batch writing, "eclipselink.jdbc.batch-writing"="JDBC"
    http://www.eclipse.org/eclipselink/api/1.1.1/org/eclipse/persistence/config/PersistenceUnitProperties.html#BATCH_WRITING

    - disable caching (it adds overhead on inserts), "eclipselink.cache.shared.default"="false"
    http://www.eclipse.org/eclipselink/api/1.1.1/org/eclipse/persistence/config/PersistenceUnitProperties.html#CACHE_SHARED_DEFAULT

    - don't use a single EntityManager for all 15,000 inserts; insert in batches of 1,000, and create a new EntityManager or call clear() between batches.

    - ensure you are using sequence number preallocation, and not using IDENTITY sequencing

    - ensure weaving is enabled

    - use field access

    - turn logging OFF

    - enable statement caching and parameter binding (you may also try dynamic statements; some databases do better with one or the other)

    - maybe use a different database such as Oracle

    - try,
    http://www.eclipse.org/eclipselink/api/1.1.1/org/eclipse/persistence/config/PersistenceUnitProperties.html#PERSISTENCE_CONTEXT_PERSIST_ON_COMMIT

    - try,
    http://www.eclipse.org/eclipselink/api/1.1.1/org/eclipse/persistence/config/PersistenceUnitProperties.html#PERSISTENCE_CONTEXT_CLOSE_ON_COMMIT

    - try latest EclipseLink 2.0 (or 1.2) build

    - try profiling the insert and see where the time is spent.
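As a rough illustration, the first few suggestions above might be combined as follows. This is a minimal sketch, not tested against the poster's setup: the persistence unit name "bulk-pu" and the Order entity are hypothetical, while the property keys come from the PersistenceUnitProperties documentation linked above.

```java
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class BulkInsertSketch {
    public static void main(String[] args) {
        // EclipseLink tuning properties discussed above.
        Map<String, String> props = new HashMap<String, String>();
        props.put("eclipselink.jdbc.batch-writing", "JDBC");    // enable JDBC batch writing
        props.put("eclipselink.cache.shared.default", "false"); // shared cache adds insert overhead
        props.put("eclipselink.logging.level", "OFF");          // turn logging off

        // "bulk-pu" is a hypothetical persistence unit; Order is a hypothetical entity.
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("bulk-pu", props);
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();
        for (int i = 0; i < 15000; i++) {
            em.persist(new Order(i));
            if (i % 1000 == 0) {
                em.flush(); // push the current batch of inserts to the database
                em.clear(); // detach managed entities so the persistence context stays small
            }
        }
        em.getTransaction().commit();
        em.close();
        emf.close();
    }
}
```

For the sequencing advice, the entity's id would use a sequence or table generator with a large allocationSize (for example @SequenceGenerator(allocationSize = 500)) rather than IDENTITY, so ids are preallocated in blocks instead of requiring a database round trip per insert.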


    ---
    James : http://www.eclipselink.org
  • 2. Re: Is JPA efficient enough for bulk inserts?
    504035 Explorer
    We have noticed that batch writing with parameter binding is very expensive: half of the commit time is spent creating arrays. We would expect most of the time to be spent in the database doing something more useful.

    How can I use binding for queries, but batch writing without binding, i.e. just send a batch of strings representing the insert/delete/update statements?
  • 3. Re: Is JPA efficient enough for bulk inserts?
    JamesSutherland Pro
    That is unexpected; at least on Oracle, parameterized batch writing is very efficient and significantly faster than the other mechanisms. Which database/driver are you using?
    Every database/driver has its own optimal configuration, so there is no general rule. For Oracle, parameterized batch writing is normally the most efficient, and dynamic batch writing is normally inefficient.

    You can enable dynamic batch writing by setting parameter binding to false, and batch writing to JDBC. You would then need to enable binding on specific queries if you wanted binding on in some cases.
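A sketch of that configuration, assuming the EclipseLink property and hint names documented for this version (the persistence unit name "bulk-pu" and the Order entity are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.Query;

public class DynamicBatchWritingSketch {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<String, String>();
        // Dynamic (non-bound) batch writing: binding off globally, batching on.
        props.put("eclipselink.jdbc.bind-parameters", "false");
        props.put("eclipselink.jdbc.batch-writing", "JDBC");

        EntityManagerFactory emf = Persistence.createEntityManagerFactory("bulk-pu", props);
        EntityManager em = emf.createEntityManager();

        // Re-enable binding on an individual query via a query hint.
        Query query = em.createQuery("select o from Order o where o.id = :id");
        query.setHint("eclipselink.jdbc.bind-parameters", "true");
        query.setParameter("id", 1);
        // ... execute the query as usual
    }
}
```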
  • 4. Re: Is JPA efficient enough for bulk inserts?
    504035 Explorer
    It seems natural that the options available for batch writing be extended to allow using batch writing without binding:
    - JDBC – use JDBC batch writing.
    - Buffered – do not use either JDBC batch writing or native platform batch writing.
    - Oracle-JDBC – use both JDBC batch writing and Oracle native platform batch writing.
    - None – do not use batch writing (turn it off).

    See http://wiki.eclipse.org/Using_EclipseLink_JPA_Extensions_(ELUG)

    You said: You can enable dynamic batch writing by setting parameter binding to false, and batch writing to JDBC. You would then need to enable binding on specific queries if you wanted binding on in some cases.

    It's not really working, because when a read query is set to use binding, the setting is not propagated to its relationship queries.

    Even if I set up the descriptor queries to use binding with the following code, it does not cover all the cases:

    private void setupBindingToAllQueryGeneratedFromMapping(ClassDescriptor descriptor) {
        for (DatabaseMapping databaseMapping : (Vector<DatabaseMapping>) descriptor.getMappings()) {
            if (databaseMapping instanceof ForeignReferenceMapping) {
                ForeignReferenceMapping foreignReferenceMapping = (ForeignReferenceMapping) databaseMapping;
                ReadQuery readQuery = foreignReferenceMapping.getSelectionQuery();
                if (readQuery != null) {
                    readQuery.setShouldBindAllParameters(true);
                }
            }
        }
    }

    An example of settings not being propagated: only a few attributes of the original query are copied to the new query below (extract of EclipseLink code):
    public ReadQuery prepareNestedBatchQuery(ReadAllQuery query) {
        ReadAllQuery batchQuery = new ReadAllQuery();
        batchQuery.setReferenceClass(getReferenceClass());
        batchQuery.setDescriptor(getReferenceDescriptor());
        batchQuery.setSession(query.getSession());

        // bug 3965568
        // we should not wrap the results as this is an internal query
        batchQuery.setShouldUseWrapperPolicy(false);

    And because the EclipseLink/TopLink API is closed, always hardcoding instantiation, I couldn't find any hook to fix the case above.

    Please consider again introducing some well-known patterns to allow customization and bug/limitation fixes:
    - a factory based on a static field, so the concrete class can be changed without having to derive from the factory
    - dependency injection

    You said: That is unexpected; at least on Oracle, parameterized batch writing is very efficient and significantly faster than other mechanisms

    Some parts of the Oracle JDBC driver seem to be written by summer interns.

    The creation of a bunch of arrays, and the resizing of them, shows up as very expensive.

    When you think about it, binding is about pushing work to the client side: the client prepares the data exactly as the Oracle DB would like to receive it. That is good for scalability, because the DB tier is the most expensive one. However, if the client-side work required to prepare the data is very high, and not optimized, that argument may not hold. Also, with batch writing using binding, many trips are made to the DB during the commit, because each distinct SQL statement has to be sent in a separate batch, so the DB cannot receive all the work in one batch, which would allow for some optimization. So there are enough reasons why batch writing with binding is unlikely to be the best mechanism.
