This discussion is archived
8 Replies Latest reply: Nov 17, 2012 5:23 PM by 947960 RSS

Distributed Query Overhead

947960 Newbie
Currently Being Moderated
Hi All,

I have a distributed deployment of two oracle instances where database A keeps a replication of a schema from database B.

I have A and B linked together, B sees A as a remote database, and my application sends queries to database B.

Let's say I have the following two queries:

The following is issued to B:

select * from magic.accountejb@A a where a.profile_userid = ( select userid from magic.accountprofileejb@A ap where ap.userid = 'uid:174')

and the following issued directly to A (which is basically the same query as above):

select * from accountejb a where a.profile_userid = ( select userid from accountprofileejb ap where ap.userid = 'uid:174')


when I measure the time through my Java application, the second query executes more than 3 times faster than the first query (23ms on A compared to 80ms on B). However, when I use the sqlplus client on B to issue the exact same query, the execution time reported by sqlplus is almost identical to the second one(20ms).

When I monitor the execution plan through *@UTLXPLAN*, it seems like the query sent to B is also fully executed remotely and on A. with a network latency of 11ms between A and B, I am not sure why I see such a long delay for the first query. Also playing with DRIVING_SITE did not have any perceived effect on improving performance.

I wonder if anybody has any explanation for the difference I see? is a distributed query really 3 times slower than a regular query even though both are pretty much handled by the same database engine? or is it so that I need some other sort of tuning?

Any thoughts or advice on how I can achieve comparable performance is highly appreciated.

thanks!

Edited by: 944957 on 16-Nov-2012 20:25

Edited by: 944957 on 16-Nov-2012 20:29
  • 1. Re: Distributed Query Overhead
    rp0428 Guru
    Currently Being Moderated
    Welcome to the forum!

    Whenever you post a JDBC question you need to provide:
    1. the 4 digit Oracle version (or other DB version)
    2. the JDK version
    3. the JDBC jar name and version
    4. the code you are using that shows an issue or problem.
    5. for performance related issues - the data volume being queried or processed

    Without seeing the code to see how you are measuring the timing it is hard to comment on what you posted.

    1. How much data does the query return?
    2. How was the timing computed in Java?
    3. How was the timing computed in sql*plus?
    4. Were the connections already created before the timing started? Or is the creation of the connection part of the timing result?
    5. Do the timings include the retrieval of ALL result set data? Or just the first set of results?

    Can you post the explain plans for the java and the sql*plus executions?
  • 2. Re: Distributed Query Overhead
    947960 Newbie
    Currently Being Moderated
    Thanks a lot for the quick response:
    rp0428 wrote:
    1. the 4 digit Oracle version (or other DB version)
    2. the JDK version
    3. the JDBC jar name and version
    I am using ojdbc14 with Oracle 11g XE and JDK 7.
    4. the code you are using that shows an issue or problem.
    The queries I am using is basically the two queries I provided earlier, and here is the exact Java code. I loop over the code below 20 times and discard the first two retrieved results for each query and calculate an average on the remaining 18 results collected.

    -----

    static Connection c1 = null, c2 = null;
    static Statement _session;

    public void getStats(){

    long start;

    for (int i = 0; i < 20; i++) {

    c1 = (c1 != null) ? c1 :
         DriverManager.getConnection("jdbc:oracle:thin:@//" + System.getProperty("host.1")+"/XE", "magic", "magic");
    _session = c1.createStatement();
    session.executeUpdate("ALTER SESSION SET CURRENTSCHEMA=magic");          
    start = System.currentTimeMillis();
    _session.executeUpdate(query);
    values[0] = System.currentTimeMillis() - start;     
    _session.close();


    c2 = (c2 != null) ? c2 :
         DriverManager.getConnection("jdbc:oracle:thin:@//" + System.getProperty("host.2")+"/XE", "magic", "magic");
    _session = c2.createStatement();
    _session.executeUpdate("ALTER SESSION SET CURRENT_SCHEMA=magic");          
    start = System.currentTimeMillis();
    _session.executeUpdate(distQuery);          
    values[1] = System.currentTimeMillis() - start;     
    _session.close();

    } // end for loop     

    } // end method

    -----
    5. for performance related issues - the data volume being queried or processed
    The data volume is rather small. I measure the data and it is roughly about 10K of data transfer.

    >
    Without seeing the code to see how you are measuring the timing it is hard to comment on what you posted.

    3. How was the timing computed in sql*plus?
    for sqlplus, I issue *set timing on* prior to executing the queries.
    4. Were the connections already created before the timing started? Or is the creation of the connection part of the timing result?
    As you see in the code, the connection is only created the first time I issue a query, and I discard the results of the first two queries using the connection as the timing is far off specially for the first query. I think the first query also download some metadata information that I don't consider in calculating the performance.
    5. Do the timings include the retrieval of ALL result set data? Or just the first set of results?
    the time only consists of executing the first set.
    Can you post the explain plans for the java and the sql*plus executions?
    Here is the results of the explain plan

    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 3819315806

    --------------------------------------------------------------------------------
    -----------------------------

    | Id | Operation          | Name          | Rows | Bytes | Cos
    t (%CPU)| Time     | Inst |

    --------------------------------------------------------------------------------
    -----------------------------


    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT REMOTE |               |     1 |     43 |
    2 (0)| 00:00:01 |     |

    | 1 | TABLE ACCESS BY INDEX ROWID| ACCOUNTEJB     |     1 |     43 |
    2 (0)| 00:00:01 | CORONA |

    |* 2 | INDEX RANGE SCAN     | ACCOUNT_USERID     |     1 |     |
    1 (0)| 00:00:01 | CORONA |

    |* 3 | INDEX UNIQUE SCAN     | PK_ACCOUNTPROFILEEJB |     1 |     9 |
    0 (0)| 00:00:01 | CORONA |

    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------

    --------------------------------------------------------------------------------
    -----------------------------


    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("A1"."PROFILE_USERID"= (SELECT "A2"."USERID" FROM "MAGIC"."A
    CCOUNTPROFILEEJB"


    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
         "A2" WHERE "A2"."USERID"='uid:174'))
    3 - access("A2"."USERID"='uid:174')

    Note
    -----
    - fully remote statement

    Edited by: 944957 on 16-Nov-2012 20:51

    Edited by: 944957 on 16-Nov-2012 20:53

    Edited by: 944957 on 16-Nov-2012 20:55

    Edited by: 944957 on 16-Nov-2012 20:56

    Edited by: 944957 on 16-Nov-2012 20:57

    Edited by: 944957 on 16-Nov-2012 20:59
  • 3. Re: Distributed Query Overhead
    rp0428 Guru
    Currently Being Moderated
    Thanks for posting the info. It will take some time to go through it all so won't be able to respond until tomorrow.

    But one thing for sure I suggest you try - because of this
    I am using ojdbc14 with Oracle 11g XE and JDK 7.
    Why? You should always use the latest JDBC driver available that supports your database and JDK versions.

    I suggest you redo your tests using the ODBC6.jar file, determine if there is still an issue and then go forward from there.

    Unless you have some overriding reason to still be using that ancient jar file you should abandon it immediately.
    http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-faq-090281.html#02_02

    Also, there is no need to set the set the schema since you are connecting to the schema you are using.

    And why are you using this code?
    _session.executeUpdate(query);
    The queries you posted return a result set and you are not using a method that returns a resultset.

    The executeUpdate is for
    >
    int executeUpdate(String sql)
    throws SQLExceptionExecutes the given SQL statement, which may be an INSERT, UPDATE, or DELETE statement or an SQL statement that returns nothing, such as an SQL DDL statement.
    Note:This method cannot be called on a PreparedStatement or CallableStatement.

    Parameters:
    sql - an SQL Data Manipulation Language (DML) statement, such as INSERT, UPDATE or DELETE; or an SQL statement that returns nothing, such as a DDL statement.
    Returns:
    either (1) the row count for SQL Data Manipulation Language (DML) statements or (2) 0 for SQL statements that return nothing
    >
    http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html
  • 4. Re: Distributed Query Overhead
    947960 Newbie
    Currently Being Moderated
    rp0428 wrote:
    Thanks for posting the info. It will take some time to go through it all so won't be able to respond until tomorrow.

    But one thing for sure I suggest you try - because of this
    I am using ojdbc14 with Oracle 11g XE and JDK 7.
    Why? You should always use the latest JDBC driver available that supports your database and JDK versions.

    I suggest you redo your tests using the ODBC6.jar file, determine if there is still an issue and then go forward from there.

    Unless you have some overriding reason to still be using that ancient jar file you should abandon it immediately.
    http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-faq-090281.html#02_02
    I couldn't find an equivalent of preparedStatement.getOriginalSql() in ojdbc6 and I need it to analyze the internals of the sql queries. is there any way in ojdbc6 to get the sql statement from the PreparedStatement in a Java application?

    Edited by: 944957 on 16-Nov-2012 21:17
  • 5. Re: Distributed Query Overhead
    rp0428 Guru
    Currently Being Moderated
    >
    I couldn't find an equivalent of preparedStatement.getOriginalSql() in ojdbc6 and I need it to analyze the internals of the sql queries. is there any way in ojdbc6 to get the sql statement from the PreparedStatement in a Java application?
    >
    Well now you are just wasting everyone's time.

    The code you posted and said you were using doesn't even have a preparedStatement in it; it uses a statement object and calls the wrong method for the SELECT queries you posted and also said you were using.

    "Lucy, you got some splainin' to do!" - Ricky Ricardo
  • 6. Re: Distributed Query Overhead
    947960 Newbie
    Currently Being Moderated
    sorry for the confusion I made.
    And why are you using this code?
    _session.executeUpdate(query);
    The queries you posted return a result set and you are not using a method that returns a resultset.
    The code I have provided above is the benchmark I used to test the performance of the queries made in my original application. In the actual application I use a PreparedStatement but in the benchmark code, I chose a fixed value uid:174 and pass it directly to the statement, so I don't need to make it a prepared statement. Also the ojdbc driver I use in my actual application is ojdbc14 since it provides an implementation for preparedStatement.getOriginalSql(). I use the same driver in my benchmark to get consistent results comparable with the actual application. And the numbers I measure in the benchmark implementation (the one I provided in this post) are exactly identical to the numbers I see in my actual application.

    as for why I use executeUpdate, in the benchmark tool, I am not interested in the results. I was trying different execute methods to see if there is a difference between their execution time. The measurements are exactly the same regardless of whether I use executeUpdate or execute or executeQuery.

    Hopefully that clarifies it.

    EDIT: Here is the updated benchmark which uses executeQuery and PreparedStatement. The results are exactly the same.

    static Connection c1 = null, c2 = null;
    static PreparedStatement _statement;
         
    public void getStats(){

    long start;

    for (int i = 0; i < 20; i++) {

         c1 = (c1 != null) ? c1 :
              DriverManager.getConnection("jdbc:oracle:thin:@//" + System.getProperty("host.1")+"/XE", "magic", "magic");
         _statement = c1.prepareStatement(query);
         start = System.currentTimeMillis();
         _statement.executeQuery();
         values[0] = System.currentTimeMillis() - start;     
         _statement.close();
         
         // The second query
         
         c2 = (c2 != null) ? c2 :
              DriverManager.getConnection("jdbc:oracle:thin:@//" + System.getProperty("host.2")+"/XE", "magic", "magic");
         _statement = c2.prepareStatement(distQuery);
         start = System.currentTimeMillis();
         _statement.executeQuery();          
         values[1] = System.currentTimeMillis() - start;          
         _statement.close();

    } // end for loop     

    } // end method

    Edited by: 944957 on 16-Nov-2012 21:48

    Edited by: 944957 on 16-Nov-2012 22:10

    Edited by: 944957 on 17-Nov-2012 17:22
  • 7. Re: Distributed Query Overhead
    EJP Guru
    Currently Being Moderated
    Why are you calling a PeeparedStatement a session? This sort of thing just confuses people. Not least yourself.
  • 8. Re: Distributed Query Overhead
    947960 Newbie
    Currently Being Moderated
    Just as an update, I tried the benchmark with ojdbc6 and the results are worse. There is a 5 times difference between a direct query to host A and a distributed query to A through B.

    I also updated the code and change the PreparedStatement variable to _statement;

    Edited by: 944957 on Nov 17, 2012 12:01 PM

    Edited by: 944957 on 17-Nov-2012 17:23

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points