Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

SQL Loader error -- Row not loaded due to data errors

mradul goyalSep 4 2017 — edited Sep 4 2017

Hi,

I am trying to load data using SQL Loader but getting below error.

Table to be loaded -

CREATE TABLE TEST_PIPE_SEP (FILE_NAME VARCHAR2(3000), KEY_COL VARCHAR2(4000), DESCR VARCHAR2(100), RUN_DATE DATE );

I am creating dynamic CTL file using Shell and SED command so that I can get the current processing file and pass it as default value for the FILE_NAME column

#!/bin/ksh

 

echo "starting script"

 

#cd data

 

for i in data/Key_Mismatch_Output_UAT*.csv

do

  #echo "$i"

  filename=`basename "${i}"`

  echo "$filename"

  #sed '1d' "$i" >> test.csv

  sed -e "s/#file_name#/file_name \"${filename}\",/g" test.ctl > new_test_3.ctl

 

  sqlldr ERRORS=100000 userid=$USER_CRED control=new_test_3.ctl data=data/$filename silent=all log=data/$filename".log" bad=data/$filename.bad skip=1

wait

done

 

echo "ending script"

The format of test.ctl is as below which I am editing on run time using SED to pass the current processing file name --

LOAD DATA

CHARACTERSET WE8ISO8859P1

APPEND

INTO TABLE TEST_PIPE_SEP

FIELDS TERMINATED BY ','

(

#file_name#

key_col "trim(:key_col)",

descr   "trim(:descr)",

run_date "SYSDATE"

)

The new file which will get created after the SED commend is as below --

LOAD DATA

CHARACTERSET WE8ISO8859P1

APPEND

INTO TABLE TEST_PIPE_SEP

FIELDS TERMINATED BY ','

(

file_name  "Key_Mismatch_Output_UAT.csv.20170804070448.1_LIVE.csv.20170804070448.2_20170804070448.csv",

key_col "trim(:key_col)",

descr   "trim(:descr)",

run_date "SYSDATE"

)

The CSV file and its contains --

Key Columns,Description

"C"|"G000053929"|"ABCD"|"G000053929"|""|""|"TTA"|""|""|""|""|""|"DRR"|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|"",Not Present in file1

"C"|"G000053621"|"HGHQ"|"G000053621"|""|""|"CBI"|""|""|""|""|""|"DRR"|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|"",Not Present in file2

"C"|"G000053929"|"HGHQ"|"G000053929"|""|""|"TTA"|""|""|""|""|""|"DRR"|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|""|"",Not Present in file2

and I am getting below error and not able to find the route cause tried changing the CTL file as much as possible but not able to load data

Table TEST_PIPE_SEP, loaded from every logical record.

Insert option in effect for this table: APPEND

 

   Column Name                  Position   Len  Term Encl Datatype

------------------------------ ---------- ----- ---- ---- ---------------------

FILE_NAME                           FIRST     *   ,       CHARACTER           

    SQL string for column : "TRIM('Key_20170804070448.csv')"

KEY_COL                              NEXT     *   ,       CHARACTER           

DESCR                                NEXT     *   ,       CHARACTER           

RUN_DATE                             NEXT     *   ,       CHARACTER           

    SQL string for column : "SYSDATE"

 

 

Table TEST_PIPE_SEP:

  0 Rows successfully loaded.

  1 Row not loaded due to data errors.

  0 Rows not loaded because all WHEN clauses were failed.

  0 Rows not loaded because all fields were null.

Please ask me if more information is required.

I am using

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

Comments

Paulzip

Can you paste the execution plan for this...

select /*+ driving_site(sd_res_id_case) */ *

from   (select *

        from   (select t.school_id,

                       t.school_name,

                       t.photo_path,

                       r.subject_id,

                       r.version_id,

                       r.year_mark,

                       r.textbook_code

                from   (select *

                        from   sd_res_id_case@res r2

                        where  :"SYS_B_0" = :"SYS_B_1" and

                               ROWID in

                                 (select   MIN(ROWID)

                                  from     sd_res_id_case@res r1

                                  where    r1.delete_flag = :"SYS_B_2" and

                                           r1.is_school_check = :"SYS_B_3" and

                                           r1.subject_id is not null and

                                           r1.version_id is not null and

                                           r1.textbook_code is not null

                                  group by r1.school_id

                                  having   COUNT(*) >= :"SYS_B_4")) r

                       left join base_school t on r.school_id = t.school_id

                where  :"SYS_B_5" = :"SYS_B_6" and t.county_id = :1) a

        where  ROWNUM < :"SYS_B_7")

where  ROWNUM <= :2

Quanwen Zhao

Hello,Paulzip

I've no idea about the value of colon behind,such as ':"SYS_B_6"' or ':2' and etc.

Best Regards,

Quanwen Zhao

Mustafa KALAYCI

small addition to what Paul said, what indexes do you have on sd_res_id_case table at remote? maybe an index that includes is_school_check and delete_flag could be increase your performance.

Paulzip

I can only go by what you give me. Those are parameters, you can still explain plan with parameters, or simply replace with typical values. I don't have your db tables, data so you need to make that decision.

Mustafa KALAYCI

also one more thing: ":"SYS_B_5" = :"SYS_B_6"" what is this? if there is a condition to run this sql check it in your applicaiton first like using an if statement then run the query.

Billy Verreynne

Quanwen Zhao wrote:

Hello,Paulzip

I've no idea about the value of colon behind,such as ':"SYS_B_6"' or ':2' and etc.

These are bind variables. Likely either string, or numeric.

You can (in SQL*Plus) use the VAR command to define these bind variables, prior to running the EXPLAIN PLAN command.

AndrewSayer

Mustafa KALAYCI wrote:

also one more thing: ":"SYS_B_5" = :"SYS_B_6"" what is this? if there is a condition to run this sql check it in your applicaiton first like using an if statement then run the query.

Oracle will use those as the names for bind variables it replaces literals with when shared_cursor is force/similar.

The presense of the t.county_id = :1 suggests that actually the application is able to use bind variables so there should be no reason for shared_cursor to be anything other than exact.

AndrewSayer

Quanwen Zhao wrote:

Hello,Paulzip

I've no idea about the value of colon behind,such as ':"SYS_B_6"' or ':2' and etc.

Best Regards,

Quanwen Zhao

Someone should know, probably the person who's running this silly SQL?

As I've already said, the SYS_B.. are from forced cursor sharing, this means:

SELECT  * 

FROM    sd_res_id_case@res r2 

WHERE   :"SYS_B_0" =:"SYS_B_1" 

Really means something like

select * from table where 1= 1

Or possibly where 1=0

As you can already see, your live monitor and your plan are different, that's because the plan is an explain plan and obviously can't be trusted.

AND rowid IN 

        SELECT  MIN(rowid)

MIN(rowid)?!?!?! that probably isn't what whoever wrote this silly SQL think it is.

And if it is, then that whole bit can probably be written as an analytic, something like (untested)

FROM 

        ( 

                SELECT  subject_id  

                       ,version_id  

                       ,year_mark   

                       ,textbook_code 

                       ,row_number() over (partition by school_id order by rowid) rown

                       ,count(*) over (partition by school_id) cnt

                FROM    sd_res_id_case@res r

                WHERE   :"SYS_B_0" =:"SYS_B_1" 

                AND     r.delete_flag        =:"SYS_B_2" 

                AND     r.is_school_check=:"SYS_B_3" 

                AND     r.subject_id    IS NOT NULL 

                AND     r.version_id    IS NOT NULL 

                AND     r.textbook_code IS NOT NULL 

        ) r where r.rown = 1 and r.cnt >= :"SYS_B_4"    

Will mean that you only goto that remote table once

I'm confident that those literal comparisons can be done better

I'm confident that cursor_sharing should be exact for this particular SQL.

Mustafa KALAYCI

Hi Andrew,

I know about bind variables what I tried to ask is why 2 bind variables equality is being checked on SQL. probably in sql there is a line

where ...

and :my_number = :my_Second_number

etc.

I am asking to OP why she/he doesn't check this on application instead adding this check to sql. checking two variables can be done by an if in application no need for sql.

AndrewSayer

Mustafa KALAYCI wrote:

Hi Andrew,

I know about bind variables what I tried to ask is why 2 bind variables equality is being checked on SQL. probably in sql there is a line

where ...

and :my_number = :my_Second_number

etc.

I am asking to OP why she/he doesn't check this on application instead adding this check to sql. checking two variables can be done by an if in application no need for sql.

Don't worry, I wouldn't doubt you knowing about bind variables. It just wasn't clear if the actual names of the bind variables had stuck out as being created by forced cursor sharing to replace literals that were actually submitted.

Quanwen Zhao

Andrew Sayer 撰写:

Quanwen Zhao wrote:

Hello,Paulzip

I've no idea about the value of colon behind,such as ':"SYS_B_6"' or ':2' and etc.

Best Regards,

Quanwen Zhao

Someone should know, probably the person who's running this silly SQL?

As I've already said, the SYS_B.. are from forced cursor sharing, this means:

SELECT *

FROM sd_res_id_case@res r2

WHERE :"SYS_B_0" =:"SYS_B_1"

Really means something like

select * from table where 1= 1

Or possibly where 1=0

As you can already see, your live monitor and your plan are different, that's because the plan is an explain plan and obviously can't be trusted.

AND rowid IN

(

SELECT MIN(rowid)

MIN(rowid)?!?!?! that probably isn't what whoever wrote this silly SQL think it is.

And if it is, then that whole bit can probably be written as an analytic, something like (untested)

FROM

(

SELECT subject_id

,version_id

,year_mark

,textbook_code

,row_number() over (partition by school_id order by rowid) rown

,count(*) over (partition by school_id) cnt

FROM sd_res_id_case@res r

WHERE :"SYS_B_0" =:"SYS_B_1"

AND r.delete_flag =:"SYS_B_2"

AND r.is_school_check=:"SYS_B_3"

AND r.subject_id IS NOT NULL

AND r.version_id IS NOT NULL

AND r.textbook_code IS NOT NULL

) r where r.rown = 1 and r.cnt >= :"SYS_B_4"

Will mean that you only goto that remote table once

I'm confident that those literal comparisons can be done better

I'm confident that cursor_sharing should be exact for this particular SQL.

Thank you very much,Andrew Sayer

Your advices is very nice!

Because I am a SQL beginner.Afterwards,i found the value of my oracle db parameter cursor_sharing is FORCE.

Therefore,there has existed this similar value (such as :"SYS_B_1") in SQL via my EMCC 12c.

Next,i changed parameter cursor_sharing to be EXACT.

After a period of time,i found a similar SQL statement same as above and all of this value (such as :"SYS_B_1") have become exact number(such as 0 or 1).

At the same time,its SQL_ID has become '4gd0562wb2n13', not original '83gn36c1fu9dw'.

Please see my SQL statement found below,

SELECT  *

FROM

        (

                SELECT  *

                FROM

                        (

                                SELECT  t.school_id  ,

                                        t.school_name,

                                        t.photo_path ,

                                        r.subject_id ,

                                        r.version_id ,

                                        r.year_mark  ,

                                        r.textbook_code

                                FROM

                                        (

                                                SELECT  *

                                                FROM    sd_res_id_case@res r2

                                                WHERE   1          =1

                                                        AND rowid IN

                                                        (

                                                                SELECT  MIN(rowid)

                                                                FROM    sd_res_id_case@res r1

                                                                WHERE   r1.delete_flag        =0

                                                                        AND r1.is_school_check=2

                                                                        AND r1.subject_id    IS NOT NULL

                                                                        AND r1.version_id    IS NOT NULL

                                                                        AND r1.textbook_code IS NOT NULL

                                                                GROUP BY r1.school_id

                                                                HAVING COUNT(*)>=1

                                                        )

                                        )

                                        r

                                        LEFT JOIN base_school t

                                        ON      r.school_id = t.school_id

                                WHERE   1                   =1

                                        AND t.county_id     =:1

                        )

                        a

                WHERE   rownum<6

        )

WHERE   rownum <= :2;

And its execution plan as follows,

select * from table(dbms_xplan.display_cursor(sql_id => '4gd0562wb2n13'));

SQL_ID  4gd0562wb2n13, child number 0

-------------------------------------

select * from ( select * from(  select

t.school_id,t.school_name,t.photo_path,r.subject_id,r.version_id,r.year_

mark,r.textbook_code   from   (  select * from sd_res_id_case@res r2  

where 1=1 and  rowid  in (   select min(rowid) from sd_res_id_case@res

r1    where r1.delete_flag=0        and r1.is_school_check=2       and

r1.subject_id is not null        and r1.version_id is not null      

and r1.textbook_code is not null        group by r1.school_id having

count(*)>=1 )   )  r  left join base_school t on r.school_id =

t.school_id   where  1=1           and t.county_id=:1      ) a where

rownum<6 ) where rownum <= :2

Plan hash value: 2069821418

----------------------------------------------------------------------------------------------------------------------

| Id  | Operation                       | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|

----------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                |                    |       |       |   682K(100)|          |        |      |

|*  1 |  COUNT STOPKEY                  |                    |       |       |            |          |        |      |

|   2 |   VIEW                          |                    |     5 |  1525 |   682K  (1)| 02:16:29 |        |      |

|*  3 |    COUNT STOPKEY                |                    |       |       |            |          |        |      |

|*  4 |     HASH JOIN                   |                    |  1001 |  9805K|   682K  (1)| 02:16:29 |        |      |

|   5 |      TABLE ACCESS BY INDEX ROWID| BASE_SCHOOL        |   193 | 20651 |   133   (0)| 00:00:02 |        |      |

|*  6 |       INDEX RANGE SCAN          | IDX_BASE_COUNTY_ID |   195 |       |     4   (0)| 00:00:01 |        |      |

|   7 |      NESTED LOOPS               |                    |  1156K|  5473M|   749K (10)| 02:30:00 |        |      |

|   8 |       VIEW                      | VW_NSO_1           | 11147 |    42M|    86   (2)| 00:00:02 |        |      |

|   9 |        REMOTE                   |                    |       |       |            |          |    RES | R->S |

|  10 |       REMOTE                    | SD_RES_ID_CASE     |   578K|   529M|   341K  (1)| 01:08:13 |    RES | R->S |

----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter(ROWNUM<=:2)

   3 - filter(ROWNUM<6)

   4 - access("R2"."SCHOOL_ID"="T"."SCHOOL_ID")

   6 - access("T"."COUNTY_ID"=:1)

Remote SQL Information (identified by operation id):

----------------------------------------------------

   9 - SELECT MIN("A1".ROWID) FROM "SD_RES_ID_CASE" "A1" WHERE "A1"."DELETE_FLAG"=0 AND

       "A1"."IS_SCHOOL_CHECK"=2 AND "A1"."SUBJECT_ID" IS NOT NULL AND "A1"."VERSION_ID" IS NOT NULL AND

       "A1"."TEXTBOOK_CODE" IS NOT NULL GROUP BY "A1"."SCHOOL_ID" HAVING COUNT(*)>=1 (accessing 'RES' )

  10 - SELECT ROWID,"SUBJECT_ID","TEXTBOOK_CODE","SCHOOL_ID","VERSION_ID","YEAR_MARK" FROM "SD_RES_ID_CASE"

        "R2" WHERE ROWID=:1 (accessing 'RES' )

Remote Oracle DB's execution plan as follows,

select * from table(dbms_xplan.display_cursor(sql_id => '7pbcm2rqbp7hs'));

SQL_ID  7pbcm2rqbp7hs, child number 0

-------------------------------------

SELECT MIN("A1".ROWID) FROM "SD_RES_ID_CASE" "A1" WHERE

"A1"."DELETE_FLAG"=0 AND "A1"."IS_SCHOOL_CHECK"=2 AND "A1"."SUBJECT_ID"

IS NOT NULL AND "A1"."VERSION_ID" IS NOT NULL AND "A1"."TEXTBOOK_CODE"

IS NOT NULL GROUP BY "A1"."SCHOOL_ID" HAVING COUNT(*)>=1

Plan hash value: 766909583

----------------------------------------------------------------------------------------------

| Id  | Operation           | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT    |                |       |       |       |   613K(100)|          |

|*  1 |  FILTER             |                |       |       |       |            |          |

|   2 |   HASH GROUP BY     |                | 11148 |  1284K|  1156M|   613K  (1)| 02:02:40 |

|*  3 |    TABLE ACCESS FULL| SD_RES_ID_CASE |  8880K|   999M|       |   530K  (1)| 01:46:05 |

----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter(COUNT(*)>=1)

   3 - filter(("A1"."IS_SCHOOL_CHECK"=2 AND "A1"."DELETE_FLAG"=0 AND

              "A1"."SUBJECT_ID" IS NOT NULL AND "A1"."VERSION_ID" IS NOT NULL AND

              "A1"."TEXTBOOK_CODE" IS NOT NULL))

Wait a moment,according to your rewrite SQL method(Analytic function),i will try it.

By the way,current SQL about execution plan using sql monitor to capture has been executed successfully and another about only post execution plan has not been executed successfully.Sorry,i had descripted unclearly last time.

Best Regards,

Quanwen Zhao

Quanwen Zhao

Paulzip 撰写:

Can you paste the execution plan for this...

select /*+ driving_site(sd_res_id_case) */ *

from (select *

from (select t.school_id,

t.school_name,

t.photo_path,

r.subject_id,

r.version_id,

r.year_mark,

r.textbook_code

from (select *

from sd_res_id_case@res r2

where :"SYS_B_0" = :"SYS_B_1" and

ROWID in

(select MIN(ROWID)

from sd_res_id_case@res r1

where r1.delete_flag = :"SYS_B_2" and

r1.is_school_check = :"SYS_B_3" and

r1.subject_id is not null and

r1.version_id is not null and

r1.textbook_code is not null

group by r1.school_id

having COUNT(*) >= :"SYS_B_4")) r

left join base_school t on r.school_id = t.school_id

where :"SYS_B_5" = :"SYS_B_6" and t.county_id = :1) a

where ROWNUM < :"SYS_B_7")

where ROWNUM <= :2

Thank you,Paulzip

I changed parameter cursor_sharing to be EXACT,from then on, this value (such as :"SYS_B_1") have not existed except real Bind Variable(such as :1 and :2).

Afterwards,i use SQLT to generate serial REPORT for SQL above(another SQL_ID is '4gd0562wb2n13' not originally),at the same time,i see some the value of bind variable ':1' and ':2'.

According to your advice,i rewrite an NEW SQL below,

select /*+ driving_site(sd_res_id_case) */ * from

( select * from(

select t.school_id,t.school_name,t.photo_path,r.subject_id,r.version_id,r.year_mark,r.textbook_code from

(

select * from sd_res_id_case@res r2

where 1=1 and rowid in (

select min(rowid) from sd_res_id_case@res r1

where r1.delete_flag=0

  and r1.is\_school\_check=2

  and r1.subject\_id is not null

  and r1.version\_id is not null

  and r1.textbook\_code is not null

  group by r1.school\_id having count(\*)>=1 )

) r left join base_school t on r.school_id = t.school_id

where 1=1

and t.county_id='36288003491df4b9014931007cb30b15') a where rownum<6 )

where rownum <= 1000;

It spent 128.804 seconds to execute successfully and explain plan as follows,

1.png

Next,i try to execute another SQL lonely(a part of full SQL above to remote db ) on remote db,

SELECT MIN("A1".ROWID)

FROM "SD_RES_ID_CASE" "A1"

WHERE "A1"."DELETE_FLAG" =0

    AND "A1"."IS\_SCHOOL\_CHECK"=2

    AND "A1"."SUBJECT\_ID"    IS NOT NULL

    AND "A1"."VERSION\_ID"    IS NOT NULL

    AND "A1"."TEXTBOOK\_CODE" IS NOT NULL

GROUP BY "A1"."SCHOOL_ID"

HAVING COUNT(*)>=1;

It spent 29.324 seconds to execute successfully and its explain plan as follows,

2.png

By the way,i executed (no hint) original SQL and spent 125.656 seconds to complete and its explain plan as follows(Comparison convenience).

3.png

Finally,i found that explain plan of adding HINT or not seemed to be same.Please Note.

Best Regards,

Quanwen Zhao

AndrewSayer

Explain plan <> execution plan

One tells you how Oracle executed the statement, the other is explain plan.

Quanwen Zhao

Andrew Sayer 撰写:

Explain plan <> execution plan

One tells you how Oracle executed the statement, the other is explain plan.

Hi,Andrew Sayer

According to your guide,my latest post to Paulzip should be explain plan(because i press F5 in my PL/SQL Developer after SQL statement executed successfully),

actually spending time execute that SQL is short and my explain plan show very long,it estimated not correctly?

Now, how to obtain real execution plan ? Thank you.

Best Regards,

Quanwen Zhao

Jonathan Lewis

There are a couple of problems that might make this a little difficult.

First - you've provided a list from the SQL Monitor screen that shows many copies of the query taking thousands of seconds, but the specific example you've picked to show the actual execution plan reports a run-time of 51 seconds.  So the first question is: was it the concurrent execution of about 30 copies of the query that resulted in massive (I/O and/or Network) contention at the remote database that caused a problem - this seems fairly unlikely but should be considered - or was it a different plan that resulted in the difference in performance.

Second - the optimizer has clearly run into internal consistency issues while optimising the query.  You'll notice that the VIEW VW_NSO_1 line of the SQL Monitor plan predicts 11K rows - which means the optimizer should allow something like that as the number of executions of the second REMOTE operation, which has a cost of 341K; but the cost of the nested loop is 750K rather than 3.75M  - which means that (at some code point) the optimizer thinks it's going to run the remote operation roughly 2.2 times. Then the cost of the hash join is less than the cost of the second of its inputs, and just happens to be twice the 341K reported for the second remote operation.  The upshot of all that is that you're probably going to have to tell the optimizer (through hints, outlines, patches or baselines) exactly how you want it to run the query because it may not be able to do a sensible cost calculation however appropirate the object statistics are)

Third - your LEFT JOIN is eliminated by the predicate "t.county_id = :1", but the presence of the predicate ":SYS_B_0 = :SYS_B_1" (which you've later noted came from the predicate "1 = 1" suggests that possibly this code is constructed in the front-end and the predicate on count_id may be something that appears in some cases and doesn't appear in others - perhaps the solution you get to for this specific query will be inappropriate for other varations so you should investigate how this query is generated.

Fourth - The optimizer doesn't pull histograms from remote databases when calculating cardinalities and you have a couple of remote predicates that might be candidates for histograms on the remote site. This means the local optimizer may expect the remote SQL to behave in one way while the remote optimizer sees its histograms and does something unexpected.

As far as optimisation is concerned - the 51 second plan you supplied shows that a large fraction of the time was the full scan and sort aggregate at the remote site, with another large fraction of the time being the repeated (nested loop) revisit to the remote site. It's not obvious that you will be able to do much about the aggregation but you should be able to eliminate the 39K roundtrips  -- Andrew's suggestion about turning that bit of the query into a single visit to the table with analytic call may help, alternatively PaulZip's driving_site() approach may be sufficient.

Regards

Jonathan Lewis

Cookiemonster76

Your outer-join isn't an outer-join because you've specified that the optional table (base_school) must have a particular value in the where clause.

If you really want an outer-join you need to move the "t.county_id='36288003491df4b9014931007cb30b15'" into the ON clause.

Otherwise make it an inner-join.

Quanwen Zhao

Quanwen Zhao 撰写:

Andrew Sayer 撰写:

Explain plan <> execution plan

One tells you how Oracle executed the statement, the other is explain plan.

Hi,Andrew Sayer

According to your guide,my latest post to Paulzip should be explain plan(because i press F5 in my PL/SQL Developer after SQL statement executed successfully),

actually spending time execute that SQL is short and my explain plan show very long,it estimated not correctly?

Now, how to obtain real execution plan ? Thank you.

Best Regards,

Quanwen Zhao

Hello,Andrew Sayer

I use this method aboutt 'DISPLAY_CURSOR()' and SQL SCRIPT 'awrsqrpt.sql' to generate execution plan of SQL adding HINT.

Please see my diagram below,

(1) DISPLAY_CURSOR(),

1.png

2.png

3.png

(2) awrsqrpt.sql,

4.png

5.png

Please Note.

Best Regards,

Quanwen Zhao

Jonathan Lewis

Regarding the driving_site hint - it should use the ALIAS of a table held at the site you want to handle the query; in your case this could be either r1 or r2.

Technically the alias should also include the query block name - but with your cascading inline views it's not immediately obvious what the initial alias name would be. If you can pull the execution plan from memory with the "outline" or "alias" sections you would find references to things like 'r2@sel$1", or "r1@sel$12cd3a87f" which would give you the fully qualified alias.

Regards

Jonathan Lewis

Sven W.

--- snip --- already mentioned.

Quanwen Zhao

Jonathan Lewis 撰写:

There are a couple of problems that might make this a little difficult.

First - you've provided a list from the SQL Monitor screen that shows many copies of the query taking thousands of seconds, but the specific example you've picked to show the actual execution plan reports a run-time of 51 seconds. So the first question is: was it the concurrent execution of about 30 copies of the query that resulted in massive (I/O and/or Network) contention at the remote database that caused a problem - this seems fairly unlikely but should be considered - or was it a different plan that resulted in the difference in performance.

Second - the optimizer has clearly run into internal consistency issues while optimising the query. You'll notice that the VIEW VW_NSO_1 line of the SQL Monitor plan predicts 11K rows - which means the optimizer should allow something like that as the number of executions of the second REMOTE operation, which has a cost of 341K; but the cost of the nested loop is 750K rather than 3.75M - which means that (at some code point) the optimizer thinks it's going to run the remote operation roughly 2.2 times. Then the cost of the hash join is less than the cost of the second of its inputs, and just happens to be twice the 341K reported for the second remote operation. The upshot of all that is that you're probably going to have to tell the optimizer (through hints, outlines, patches or baselines) exactly how you want it to run the query because it may not be able to do a sensible cost calculation however appropirate the object statistics are)

Third - your LEFT JOIN is eliminated by the predicate "t.county_id = :1", but the presence of the predicate ":SYS_B_0 = :SYS_B_1" (which you've later noted came from the predicate "1 = 1" suggests that possibly this code is constructed in the front-end and the predicate on count_id may be something that appears in some cases and doesn't appear in others - perhaps the solution you get to for this specific query will be inappropriate for other varations so you should investigate how this query is generated.

Fourth - The optimizer doesn't pull histograms from remote databases when calculating cardinalities and you have a couple of remote predicates that might be candidates for histograms on the remote site. This means the local optimizer may expect the remote SQL to behave in one way while the remote optimizer sees its histograms and does something unexpected.

As far as optimisation is concerned - the 51 second plan you supplied shows that a large fraction of the time was the full scan and sort aggregate at the remote site, with another large fraction of the time being the repeated (nested loop) revisit to the remote site. It's not obvious that you will be able to do much about the aggregation but you should be able to eliminate the 39K roundtrips -- Andrew's suggestion about turning that bit of the query into a single visit to the table with analytic call may help, alternatively PaulZip's driving_site() approach may be sufficient.

Regards

Jonathan Lewis

Hello,Jonathan Lewis

About Four points you have mentioned to my issue,i do interpretation at this moment.

First, My post SQL Monitor screen content is actually concurrent execution situation.

Second,Third and Fourch, About details of you analyzed my execution plan,i seemed to have understood a little.

Due to local oracle db server's optimizer has not accurately estimated COST from REMOTE DB,as a result,execution plan has not been right ?

Execution plan of REMOTE DB's SQL statement part has actually shown 'TABLE ACCESS FULL',spending most time is from this ? If it is,i need to create composite INDEX(because WHERE cause has delete_flag = 0 AND is_school_check =2 AND subject_id IS NOT NULL AND version_id IS NOT NULL AND textbook_code IS NOT NULL) of COLUMN 'delete_flag' and 'is_school_check' ?

Best Regards,

Quanwen Zhao

Quanwen Zhao

Jonathan Lewis 撰写:

Regarding the driving_site hint - it should use the ALIAS of a table held at the site you want to handle the query; in your case this could be either r1 or r2.

Technically the alias should also include the query block name - but with your cascading inline views it's not immediately obvious what the initial alias name would be. If you can pull the execution plan from memory with the "outline" or "alias" sections you would find references to things like 'r2@sel$1", or "r1@sel$12cd3a87f" which would give you the fully qualified alias.

Regards

Jonathan Lewis

Hi,Jonathan Lewis

According to your advices,i use HINT adding ALIAS 'r1' and 'r2' to query,but execution plan hasn't changed much than my post above.

Please see my snapshot below,

7.png

8.png

9.png

10.png

Best Regards,

Quanwen Zhao

Quanwen Zhao

I attach my full AWR report (that SQL using DBLINK and another similar SQL executed period),content as follows,

WORKLOAD REPOSITORY report for

DB NameDB IdInstanceInst numStartup TimeReleaseRAC
ORCL293701130036orcl29105-Jul-17 21:0711.2.0.4.0NO

Host NamePlatformCPUsCoresSocketsMemory (GB)
orcl29Linux x86 64-bit192968504.63

Snap IdSnap TimeSessionsCursors/Session
Begin Snap:1194127-Sep-17 07:00:342432.1
End Snap:1194227-Sep-17 08:00:522340.1
Elapsed: 60.31 (mins)
DB Time: 1,994.32 (mins)

Report Summary

Load Profile

Per SecondPer TransactionPer ExecPer Call
DB Time(s):33.188.10.140.20
DB CPU(s):7.419.60.030.04
Redo size (bytes):2,940.07,833.8
Logical read (blocks):39,265.6104,624.8
Block changes:4.712.6
Physical read (blocks):26.370.1
Physical write (blocks):1.12.9
Read IO requests:26.269.9
Write IO requests:0.71.9
Read IO (MB):0.20.6
Write IO (MB):0.00.0
User calls:166.7444.2
Parses (SQL):163.9436.6
Hard parses (SQL):0.00.1
SQL Work Area (MB):19.050.7
Logons:16.143.0
Executes (SQL):237.6633.2
Rollbacks:0.00.0
Transactions:0.4

Instance Efficiency Percentages (Target 100%)

Buffer Nowait %:100.00Redo NoWait %:100.00
Buffer Hit %:99.93In-memory Sort %:100.00
Library Hit %:99.98Soft Parse %:99.98
Execute to Parse %:31.04Latch Hit %:99.99
Parse CPU to Parse Elapsd %:91.31% Non-Parse CPU:99.82

Top 10 Foreground Events by Total Wait Time

Event
padders

> /* +driving_site(r2.sd_res_id_case) */

Your general hint syntax appears incorrect in your last example. You need to be precise or your hints will typically be ignored.

You seem (unless it is a formatting/copy-paste issue) to have a space between '/*' and '+', this is not valid hint syntax.

Also your driving site table reference looks incorrect, you have...

DRIVING_SITE (table_alias.table_name)

Query block hint syntax (as Jonathan recommended, get query block alias from explain plan) would be...

DRIVING_SITE (table_aliast@query_block_alias)

Global hint syntax (a possibly alternative assuming you explicitly alias in-line views) would be...

DRIVING_SITE (view_alias.view_alias.table_alias)

Jonathan Lewis

Quanwen Zhao wrote:

Hello,Jonathan Lewis

About Four points you have mentioned to my issue,i do interpretation at this moment.

First, My post SQL Monitor screen content is actually concurrent execution situation.

Second,Third and Fourch, About details of you analyzed my execution plan,i seemed to have understood a little.

Due to local oracle db server's optimizer has not accurately estimated COST from REMOTE DB,as a result,execution plan has not been right ?

Execution plan of REMOTE DB's SQL statement part has actually shown 'TABLE ACCESS FULL',spending most time is from this ? If it is,i need to create composite INDEX(because WHERE cause has delete_flag = 0 AND is_school_check =2 AND subject_id IS NOT NULL AND version_id IS NOT NULL AND textbook_code IS NOT NULL) of COLUMN 'delete_flag' and 'is_school_check' ?

Best Regards,

Quanwen Zhao

Quanwen Zhao,

First point - I was aware that your SQL Monitor screen was showing multiple executions, that's why I wrote: "So the first question is: was it the concurrent execution of about 30 copies of the query that resulted in massive (I/O and/or Network) contention at the remote database that caused a problem - this seems fairly unlikely but should be considered - or was it a different plan that resulted in the difference in performance."

The AWR report shows that my comment about "fairly unlikely" was wrong; a huge fraction of the time was spent on waits for: "SQL*Net message from dblink" so possibly running 30 concurrent copies of the query does result in each copy running about 100 times slower than the default time.  One short-term optimisation strategy, therefore, would be to limit the number of concurrent copies of the query that can run to (somewhere between 5 and 10).

Given the particular wait that is costing so much time the 3 other points I made aren't worth pursuing.  My comment about using the driving site hint is important - but you still weren't using it correctly; as a first attempt you could try /*+ driving_site(r2) */; if this doesn't change the plan then go to the query block where you select from sd_res_id_case@res aliased as r2 and insert immediately after the SELECT keyword in that query /*+ qb_name(inline_q1) */, then in the first select you can put the fully qualified alias:  /*+ driving_site(r2@inline_q1) */

If that still doesn't work use dbms_xplan.display_cursor({sql id},{child number},'alias outline')) to pull the plan from memory showing us the actual aliases finally used by the optimizer, then we can use those aliases to force the driving site.

There are other courses of action - mainly you need to get the two sets of access to sd_res_id_case happening at the same time in the remote database; better still if the small set of rows from base_school is sent to the remote site and the join result sent back - that's what we should see with the driving_site hint. If you follow Andrew's suggestion about rewriting part of the query using an analytic approach that will at least eliminate the thousands of single row selects that are responsible for most of the SQL*Net message from dblink.

I note from the AWR report that you have two other statement that are (probably) very similar to the one you sent us, and another pair that look as if they may be accessing the sd_res_id_case table just once, or are accessing it twice as a single remote query and not causing the same high volume of SQL*Net message from dblink.

Similar statements:  7ux4576yrws9u, 4gd0562wb2n13

Well-behaved statements: bqp4hjj3r5gqg, 8ja3a8b4m04xz

You may want to look at these as well.

Regards

Jonathan Lewis

Quanwen Zhao

Hi,Jonathan Lewis

According to your guide,i use HINT of three methods ordinally '/*+ driving_site(r2) */','/*+ driving_site(r2@sel$5) */' and '/*+ driving_site(r2@qb_name(inline_q1) */' to test that long SQL statement,but all of execution plans have also no changed much.

My snapshot as follows,

(1) USING /*+ driving_site(r2) */

1.png

(2) USING /*+ driving_site(r2@sel$5) */

SYS@orcl29> set pagesize 3000

SYS@orcl29> set linesize 3000

SYS@orcl29> select * from table(dbms_xplan.display_cursor(sql_id => '7wh2t143w87p7',format => 'alias outline'));

4.png

3.png

(3) USING /*+ driving_site(r2@qb_name(inline_q1) */

5.png

And all of my execution plans has all the same.

6.png

By the way,data size of two tables as follows,

SZD_BASE_V2@base> select count(*) from base_school;

COUNT(*)

----------------

700478

SZD_RESOURCE_V2@res> select count(*) from sd_res_id_case;

COUNT(*)

----------------

15003728

Please Note.

Best Regards,

Quanwen Zhao

Jonathan Lewis

Quanwen Zhao,

That's strange - two of the three versions you tried look correct:  /*+ driving_site(r2@sel$5) */ and /*+ driving_site(t2@inline_q1) */.

Another simple strategy to test - make the query block sel$5 (inline_q1) non-mergeable. We really want, as a first step at least, to stop the nested loop between the two copies of sd_res_id_case from being split into two separate remote operations. Putting the hint /*+ no_merge */ into that query block - or modifying the hinted form to read /*+ qb_name(inline_q1) no_merge */ - should do the job.

There are a couple of things that could stop the default path from making the nested loop a single remote operation - but they don't seem to be present in your query.  Are any of the columns in the sd_res_id_case table CLOBs, nested tables, object_type, or virtual; is there anything in the slightest bit "uncommon" about the definition of the table. Thinking along those lines anyway - your query as written has a "select *" in it the optimizer should be able to project out the columns it needs but it's worth trying to change "*" to the minimum list of explicitly needed columns.

Could you supply the rest of the "outline" section of the output just in case that supplies a clue.

Regards

Jonathan Lewis

[UPDATE:] one silly little thought, just in case but it's probably irrelevant: if you're logged in to the SYS account to do your testing the driving_site hint won't work.

Quanwen Zhao

Jonathan Lewis 撰写:

Quanwen Zhao,

That's strange - two of the three versions you tried look correct: /*+ driving_site(r2@sel$5) */ and /*+ driving_site(t2@inline_q1) */.

Another simple strategy to test - make the query block sel$5 (inline_q1) non-mergeable. We really want, as a first step at least, to stop the nested loop between the two copies of sd_res_id_case from being split into two separate remote operations. Putting the hint /*+ no_merge */ into that query block - or modifying the hinted form to read /*+ qb_name(inline_q1) no_merge */ - should do the job.

There are a couple of things that could stop the default path from making the nested loop a single remote operation - but they don't seem to be present in your query. Are any of the columns in the sd_res_id_case table CLOBs, nested tables, object_type, or virtual; is there anything in the slightest bit "uncommon" about the definition of the table. Thinking along those lines anyway - your query as written has a "select *" in it the optimizer should be able to project out the columns it needs but it's worth trying to change "*" to the minimum list of explicitly needed columns.

Could you supply the rest of the "outline" section of the output just in case that supplies a clue.

Regards

Jonathan Lewis

Oh,My full part of execution plan as follows,

SYS@orcl29> select * from table(dbms_xplan.display_cursor(sql_id => '7wh2t143w87p7',format => 'alias outline'));

7.png

Query Block Name / Object Alias (identified by operation id):

----------------------------------------------------------------

1 - SEL$1

2 - SEL$AB3BE8D5 / from$_subquery$_001@SEL$1

3 - SEL$AB3BE8D5

5 - SEL$AB3BE8D5 / T@SEL$3

6 - SEL$AB3BE8D5 / T@SEL$3

8 - SEL$CE1D94FA / VW_NSO_1@SEL$AB3BE8D5

9 - SEL$CE1D94FA

10 - SEL$AB3BE8D5 / R2@SEL$5

Outline Data

-------------

/*+

  BEGIN\_OUTLINE\_DATA

  IGNORE\_OPTIM\_EMBEDDED\_HINTS

  OPTIMIZER\_FEATURES\_ENABLE('11.2.0.4')

  DB\_VERSION('11.2.0.4')

  OPT\_PARAM('\_optimizer\_null\_aware\_antijoin' 'false')

  OPT\_PARAM('\_optimizer\_extended\_cursor\_sharing' 'none')

  OPT\_PARAM('\_optimizer\_extended\_cursor\_sharing\_rel' 'none')

  OPT\_PARAM('\_optimizer\_adaptive\_cursor\_sharing' 'false')

  OPT\_PARAM('\_optimizer\_use\_feedback' 'false')

  ALL\_ROWS

  OUTLINE\_LEAF(@"SEL$CE1D94FA")

  OUTLINE\_LEAF(@"SEL$AB3BE8D5")

  UNNEST(@"SEL$6" UNNEST\_NOSEMIJ\_NODISTINCTVIEW)

  OUTLINE\_LEAF(@"SEL$1")

  OUTLINE(@"SEL$6")

  OUTLINE(@"SEL$B871075F")

  MERGE(@"SEL$9C26C05F")

  OUTLINE(@"SEL$2")

  OUTLINE(@"SEL$9C26C05F")

  OUTER\_JOIN\_TO\_INNER(@"SEL$8D3EA582" "T"@"SEL$3")

  OUTLINE(@"SEL$8D3EA582")

  MERGE(@"SEL$8885838C")

  OUTLINE(@"SEL$7")

  OUTLINE(@"SEL$8885838C")

  MERGE(@"SEL$3")

  MERGE(@"SEL$5")

  OUTLINE(@"SEL$4")

  OUTLINE(@"SEL$3")

  OUTLINE(@"SEL$5")

  NO\_ACCESS(@"SEL$1" "from$\_subquery$\_001"@"SEL$1")

  NO\_ACCESS(@"SEL$AB3BE8D5" "VW\_NSO\_1"@"SEL$AB3BE8D5")

  FULL(@"SEL$AB3BE8D5" "R2"@"SEL$5")

  INDEX\_RS\_ASC(@"SEL$AB3BE8D5" "T"@"SEL$3" ("BASE\_SCHOOL"."COUNTY\_ID"))

  LEADING(@"SEL$AB3BE8D5" "VW\_NSO\_1"@"SEL$AB3BE8D5" "R2"@"SEL$5" "T"@"SEL$3")

  USE\_NL(@"SEL$AB3BE8D5" "R2"@"SEL$5")

  USE\_HASH(@"SEL$AB3BE8D5" "T"@"SEL$3")

  SWAP\_JOIN\_INPUTS(@"SEL$AB3BE8D5" "T"@"SEL$3")

  FULL(@"SEL$CE1D94FA" "R1"@"SEL$6")

  END\_OUTLINE\_DATA

*/

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter(ROWNUM<=1000)

3 - filter(ROWNUM<6)

4 - access("R2"."SCHOOL_ID"="T"."SCHOOL_ID")

6 - access("T"."COUNTY_ID"='36288003491df4b9014931007cb30b15')

Remote SQL Information (identified by operation id):

----------------------------------------------------

9 - SELECT MIN("A1".ROWID) FROM "SD_RES_ID_CASE" "A1" WHERE "A1"."DELETE_FLAG"=0 AND

   "A1"."IS\_SCHOOL\_CHECK"=2 AND "A1"."SUBJECT\_ID" IS NOT NULL AND "A1"."VERSION\_ID" IS NOT NULL AND

   "A1"."TEXTBOOK\_CODE" IS NOT NULL GROUP BY "A1"."SCHOOL\_ID" HAVING COUNT(\*)>=1 (accessing 'RES' )

10 - SELECT ROWID,"SUBJECT_ID","TEXTBOOK_CODE","SCHOOL_ID","VERSION_ID","YEAR_MARK" FROM "SD_RES_ID_CASE"

    "R2" WHERE ROWID=:1 (accessing 'RES' )

Best Regards,

Quanwen Zhao

Quanwen Zhao

Hi,Jonathan Lewis

Has my issue related to REMOTE DB's query ?Because i found that execution plan is FULL SCAN,therefore i try to create composite index.

SZD_RESOURCE_V2@res> create index idx_id_case_delete_school on sd_res_id_case(delete_flag,is_school_check) online;

Index created.

---------------------------------------

SELECT MIN(rowid)

FROM sd_res_id_case

WHERE delete_flag = 0

AND is_school_check = 2

AND subject_id IS NOT NULL

AND version_id IS NOT NULL

AND textbook_code IS NOT NULL

GROUP BY school_id

HAVING COUNT(*)>=1;

---------------------------------------

8.png

Two columns 'delete_flag' and 'is_school_check' have respective index,and i created composite index and rerun SQL query above,buy my execution plan has also FULL SCAN. Please Note.

Best Regards,

Quanwen Zhao

Jonathan Lewis

Quanwen Zhao,

I don't think the main problem is the full tablescan.

At some point you need to make the data access on the remote database as efficient as possible, but I think the biggest problem of scalability is the traffic between the two databases and that problem is caused by the way your two accesses to sd_res_id_case appear as two separate operations.

I got curious about the problem - the driving_site hint didn't do what it should have - so I created a model of your query and found that the defect could be related to your use of ROWNUM.  In your query you have an odd "rownum <= 1000" in one inline view followed immediately by "rownum <= 6" in the next - but that second rownum preidcate makes the first rownum predicate redundant.

In my model if I took out the "rownum <= 1000" (i.e. just the first rownum predicate) the driving_site hint worked as expected and I got a plan that executed the query block that we've labelled inline_q1 as a single remote operation. (The no_merge hint I mentioned in the previous post ought to do the same.)

Try the no_merge or eliminating the first rownum clause and see what the plan looks like. It should have only one remote operation. You will still be doing the tablescan and aggregation, so a serial test will probably not change much in its performance; but if you run up 30 sessions all executing the same query I hope you'll see that concurrent execution scales much better.

Regards

Jonathan Lewis

Quanwen Zhao

Jonathan Lewis 撰写:

Quanwen Zhao,

I don't think the main problem is the full tablescan.

At some point you need to make the data access on the remote database as efficient as possible, but I think the biggest problem of scalability is the traffic between the two databases and that problem is caused by the way your two accesses to sd_res_id_case appear as two separate operations.

I got curious about the problem - the driving_site hint didn't do what it should have - so I created a model of your query and found that the defect could be related to your use of ROWNUM. In your query you have an odd "rownum <= 1000" in one inline view followed immediately by "rownum <= 6" in the next - but that second rownum preidcate makes the first rownum predicate redundant.

In my model if I took out the "rownum <= 1000" (i.e. just the first rownum predicate) the driving_site hint worked as expected and I got a plan that executed the query block that we've labelled inline_q1 as a single remote operation. (The no_merge hint I mentioned in the previous post ought to do the same.)

Try the no_merge or eliminating the first rownum clause and see what the plan looks like. It should have only one remote operation. You will still be doing the tablescan and aggregation, so a serial test will probably not change much in its performance; but if you run up 30 sessions all executing the same query I hope you'll see that concurrent execution scales much better.

Regards

Jonathan Lewis

Hi,Jonathan Lewis

I add HINT 'no_merge' following your advice,this time my execution plan has changed than my previous post.

I guess that these HINT(the main thing is DRIVING_SITE) take effect finally(ORACLE have not captured data from REMOTE DB and only pulled out LOCAL DB's data to REMOTE and last return data to LOCAL).

It's so amazing!

Please see my snapshot below,

10.png

It spent only 25.785 seconds to query successfully.

11.png

Please Note.

Best Regards,

Quanwen Zhao

Jonathan Lewis
Answer

That's the plan I was hoping to see from the no_merge. The key feature is that it not longer does thousands for round-trips to join the two appearances of sd_res_id_case. Most of the time now would be in the tablescan and aggregation of sd_res_id_case at the remote database.

The driving_site hint is not having any effect - you can tell this from the fact that part of the plan that should be about the no_merge'd references to sd_res_id_case appears as "REMOTE"; if the query had been resolved by the remote database the plan would have been displayed from the perspective of the remote database so the part about table BASE_SCHOOL would be labelled REMOTE and you would have see the actual operation lines for the actions against sd_res_id_case.  You should also see this from the plan section "Remote SQL information" - it will be about sd_res_id_case.

I think the plan will change again to operate remotely if you eliminate the first of the two rownum predicates - but I doubt if it will make a lot of difference to the run time.

Regards

Jonathan Lewis

Marked as Answer by Quanwen Zhao · Sep 27 2020
Quanwen Zhao

Jonathan Lewis 撰写:

That's the plan I was hoping to see from the no_merge. The key feature is that it not longer does thousands for round-trips to join the two appearances of sd_res_id_case. Most of the time now would be in the tablescan and aggregation of sd_res_id_case at the remote database.

The driving_site hint is not having any effect - you can tell this from the fact that part of the plan that should be about the no_merge'd references to sd_res_id_case appears as "REMOTE"; if the query had been resolved by the remote database the plan would have been displayed from the perspective of the remote database so the part about table BASE_SCHOOL would be labelled REMOTE and you would have see the actual operation lines for the actions against sd_res_id_case. You should also see this from the plan section "Remote SQL information" - it will be about sd_res_id_case.

I think the plan will change again to operate remotely if you eliminate the first of the two rownum predicates - but I doubt if it will make a lot of difference to the run time.

Regards

Jonathan Lewis

How to find out REMOTE SQL information ?

Best Regards,

Quanwen Zhao

padders

> How to find out REMOTE SQL information

In the plan table where OPERATION = 'REMOTE' the remote statement can be found in the OTHER column.

Quanwen Zhao

padders 撰写:

> How to find out REMOTE SQL information

In the plan table where OPERATION = 'REMOTE' the remote statement can be found in the OTHER column.

Hi,padders

Thanks for your advice.

My check process as follows,

(1) Query from LOCAL DB.

1.png

2.png

3.png

(2) Query from REMOTE DB.

1.png

2.png

3.png

Although,i have solved about DBLINK's issue currently.

Now,how to solve REMOTE's FULL SCAN(composite index about COLUMN is_school_check and delete_flag i created has not been taken effect) ?

Best Regards,

Quanwen Zhao

Jonathan Lewis

The first thing to note is that the optimizer's estimate is 8.8M rows, aggregating to 11k rows.

Is this fairly accurate - especially the 8.8M rows bit.

select count(*) from sd_res_id_case where is_school_check = 2 and delete_flag = 0;

then

select count(*) from sd_res_id_case where ... all five predicates.

There are three possibilies:

a) the index is a good choice, but Oracle doesn't have sufficiently helpful statistics yet.

b) you need better indexing - perhaps a clever function-based index that captures all the data you need for the aggregate and join column

c) the volume of data is so large that the tablescan is the best option

Regards

Jonathan Lewis

Paulzip

What's the definition of the composite index?

I would create it as

IS_SCHOOL_CHECK, DELETE_FLAG, <the other columns>.

However, it just might not be selective enough as Jonathan is saying with his count(*) comments.

Quanwen Zhao

Jonathan Lewis 撰写:

The first thing to note is that the optimizer's estimate is 8.8M rows, aggregating to 11k rows.

Is this fairly accurate - especially the 8.8M rows bit.

select count(*) from sd_res_id_case where is_school_check = 2 and delete_flag = 0;

then

select count(*) from sd_res_id_case where ... all five predicates.

There are three possibilies:

a) the index is a good choice, but Oracle doesn't have sufficiently helpful statistics yet.

b) you need better indexing - perhaps a clever function-based index that captures all the data you need for the aggregate and join column

c) the volume of data is so large that the tablescan is the best option

Regards

Jonathan Lewis

Jonathan Lewis

Following your guidance,my query process as follows,

1.png

2.png

3.png

I found that COLUMN subject_id AND version_id have NULL value,

4.png

Best Regards,

Quanwen Zhao

Jonathan Lewis

So you really are selecting 10 million rows from 15 million - there's no point in using an indexed access path to get to the table data.

In theory you could create an index on ALL the columns in that subquery (which means including the school_id) and the optimizer could either use an "INDEX FAST FULL SCAN" instead of a tablescan before doing the hash aggregation, or (if the index starts with the school_id) an INDEX FULL SCAN followed by a SORT GROUP BY (NOSORT) operation. Either of these MIGHT be faster than the using the tablescan - but probably not by very much, and with an extra resource cost for DML on the table, and a risk that some queries might pick the index when they shouldn't

Regards

Jonathan Lewis

Quanwen Zhao

Jonathan Lewis 撰写:

So you really are selecting 10 million rows from 15 million - there's no point in using an indexed access path to get to the table data.

In theory you could create an index on ALL the columns in that subquery (which means including the school_id) and the optimizer could either use an "INDEX FAST FULL SCAN" instead of a tablescan before doing the hash aggregation, or (if the index starts with the school_id) an INDEX FULL SCAN followed by a SORT GROUP BY (NOSORT) operation. Either of these MIGHT be faster than the using the tablescan - but probably not by very much, and with an extra resource cost for DML on the table, and a risk that some queries might pick the index when they shouldn't

Regards

Jonathan Lewis

According to your analyze details previous post,

i haven' need to gather statistics about table SD_RES_IS_CASE(8.8M rows seem to equal 9969053 rows) ? Optimizer estimate accurately ?

Thus,no using INDEX maybe best option ?

Best Regards,

Quanwen Zhao

Jonathan Lewis

Quanwen Zhao wrote:

According to your analyze details previous post,

i haven' need to gather statistics about table SD_RES_IS_CASE(8.8M rows seem to equal 9969053 rows) ? Optimizer estimate accurately ?

Thus,no using INDEX maybe best option ?

Best Regards,

Quanwen Zhao

Yes, that's what it looks like to me.

Regards

Jonathan Lewis

Quanwen Zhao

Jonathan Lewis wrote:

Quanwen Zhao wrote:

According to your analyze details previous post,

i haven' need to gather statistics about table SD_RES_IS_CASE(8.8M rows seem to equal 9969053 rows) ? Optimizer estimate accurately ?

Thus,no using INDEX maybe best option ?

Best Regards,

Quanwen Zhao

Yes, that's what it looks like to me.

Regards

Jonathan Lewis

Thanks,Jonathan Lewis

Following your advice,i did a test about creating a composite INDEX of 6 COLUMNS(include SCHOOL_ID).

My process as follows,

SZD_RESOURCE_V2@res> set timing on;

SZD_RESOURCE_V2@res> create index idx_id_case_si_df_isc_si_vi_tc on sd_res_id_case(school_id,delete_flag,is_school_check,subject_id,version_id,textbook_code) online;

Index created.

Elapsed: 00:03:15.57

SZD_RESOURCE_V2@res> set linesize 200

SZD_RESOURCE_V2@res> set pagesize 200

SZD_RESOURCE_V2@res> col index_name for a35

SZD_RESOURCE_V2@res> col index_type for a10

SZD_RESOURCE_V2@res> col table_name for a30

SZD_RESOURCE_V2@res> col table_owner for a20

SZD_RESOURCE_V2@res> select index_name,index_type,table_name,table_owner from user_indexes where table_owner='SZD_RESOURCE_V2' and index_name like '%SI_DF_ISC_SI_VI_TC%';

INDEX_NAME                          INDEX_TYPE TABLE_NAME                     TABLE_OWNER

----------------------------------- ---------- ------------------------------ --------------------

IDX_ID_CASE_SI_DF_ISC_SI_VI_TC      NORMAL     SD_RES_ID_CASE                 SZD_RESOURCE_V2

SYS@orcl28> select * from table(dbms_xplan.display_cursor('26zufnu50huh4'));

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  26zufnu50huh4, child number 0

-------------------------------------

SELECT /*+ NO_MERGE QB_NAME ("INLINE_Q1") */

"A1"."SUBJECT_ID","A1"."TEXTBOOK_CODE","A1"."SCHOOL_ID","A1"."VERSION_ID

","A1"."YEAR_MARK" FROM  (SELECT MIN("A3".ROWID) "MIN(ROWID)" FROM

"SD_RES_ID_CASE" "A3" WHERE "A3"."DELETE_FLAG"=0 AND

"A3"."IS_SCHOOL_CHECK"=2 AND "A3"."SUBJECT_ID" IS NOT NULL AND

"A3"."VERSION_ID" IS NOT NULL AND "A3"."TEXTBOOK_CODE" IS NOT NULL

GROUP BY "A3"."SCHOOL_ID" HAVING COUNT(*)>=1) "A2","SD_RES_ID_CASE"

"A1" WHERE "A1".ROWID="A2"."MIN(ROWID)"

Plan hash value: 815553515

----------------------------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name                           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                                |       |       |       |   160K(100)|          |

|   1 |  NESTED LOOPS               |                                | 11148 |  1382K|       |   160K  (1)| 00:32:02 |

|   2 |   VIEW                      |                                | 11148 |   130K|       |   148K  (1)| 00:29:48 |

|*  3 |    FILTER                   |                                |       |       |       |            |          |

|   4 |     HASH GROUP BY           |                                | 11148 |  1284K|  1159M|   148K  (1)| 00:29:48 |

|*  5 |      INDEX FAST FULL SCAN   | IDX_ID_CASE_SI_DF_ISC_SI_VI_TC |  8902K|  1001M|       | 65840   (1)| 00:13:11 |

|   6 |   TABLE ACCESS BY USER ROWID| SD_RES_ID_CASE                 |     1 |   115 |       |     1   (0)| 00:00:01 |

----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   3 - filter(COUNT(*)>=1)

   5 - filter(("A3"."IS_SCHOOL_CHECK"=2 AND "A3"."DELETE_FLAG"=0 AND "A3"."SUBJECT_ID" IS NOT NULL AND

              "A3"."VERSION_ID" IS NOT NULL AND "A3"."TEXTBOOK_CODE" IS NOT NULL))

32 rows selected.

And execution plan actually change 'TABLE ACCESS FULL' to 'INDEX FAST FULL SCAN' this time.

Please Note.

Best Regards,

Quanwen Zhao

Quanwen Zhao

Andrew Sayer wrote:

Quanwen Zhao wrote:

Hello,Paulzip

I've no idea about the value of colon behind,such as ':"SYS_B_6"' or ':2' and etc.

Best Regards,

Quanwen Zhao

Someone should know, probably the person who's running this silly SQL?

As I've already said, the SYS_B.. are from forced cursor sharing, this means:

SELECT *

FROM sd_res_id_case@res r2

WHERE :"SYS_B_0" =:"SYS_B_1"

Really means something like

select * from table where 1= 1

Or possibly where 1=0

As you can already see, your live monitor and your plan are different, that's because the plan is an explain plan and obviously can't be trusted.

AND rowid IN

(

SELECT MIN(rowid)

MIN(rowid)?!?!?! that probably isn't what whoever wrote this silly SQL think it is.

And if it is, then that whole bit can probably be written as an analytic, something like (untested)

FROM

(

SELECT subject_id

,version_id

,year_mark

,textbook_code

,row_number() over (partition by school_id order by rowid) rown

,count(*) over (partition by school_id) cnt

FROM sd_res_id_case@res r

WHERE :"SYS_B_0" =:"SYS_B_1"

AND r.delete_flag =:"SYS_B_2"

AND r.is_school_check=:"SYS_B_3"

AND r.subject_id IS NOT NULL

AND r.version_id IS NOT NULL

AND r.textbook_code IS NOT NULL

) r where r.rown = 1 and r.cnt >= :"SYS_B_4"

Will mean that you only goto that remote table once

I'm confident that those literal comparisons can be done better

I'm confident that cursor_sharing should be exact for this particular SQL.

Hello,Andrew Sayer

Although Janathan Lewis gave me best suggestion about optimize this DBLINK's SQL,i wanna use this method about analytic function you mentioned(ONLY TEST),

and i have puzzled about what you said,

where does SQL you provided from 'FROM statement' to 'r.cnt >= :"SYS_B_4"' put my original SQL ?

please see my diagram as follows,

2.png

in first or second FROM statement ?

1.png

Thank you very much!

Best Regards,

Quanwen Zhao

1 - 42
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Oct 2 2017
Added on Sep 4 2017
4 comments
976 views