1 2 Previous Next 17 Replies Latest reply: Mar 1, 2012 2:47 AM by Jonathan Lewis Go to original post RSS
      • 15. Re: E-rows and A-rows dont match
        dm_ptldba
        Randolf,
        Thanks for your insight. Definitely some fresh thoughts that I can use. Following are my feedback to your 4 pointers -

        1. I will tyr with star transformation turned off and get back to you on results

        2. The exists clause is the security peice for the data warehouse, that needs to be a part of the queries. Do you think you can suggest a better way to approach it ?

        3. Can you please guide me the way to approach to identify the potential columns for histograms

        Thank you,
        Dm
        • 16. Re: E-rows and A-rows dont match
          Randolf Geist
          dm_ptldba wrote:
          2. The exists clause is the security peice for the data warehouse, that needs to be a part of the queries. Do you think you can suggest a better way to approach it ?
          Since I don't know your environment and the logic behind the security I can only base my recommendation on what I see from your query. The clause seems trying to emulate a procedural IF THEN ELSE logic using a bad design, because a particular value stored in the security table seems to indicate "no restriction", and in that case you basically want to see all rows from the fact table where the CHARGING_LOCATION_KEY IS NOT NULL (are you aware of this implicit filter and want it that way?) else you want to filter on the actual values stored in the security table.

          Depending on your application architecture I would recommend the following:

          1. Ideally the application constructing the query should know upfront that the user issuing the query doesn't require the restriction and simply omit the clause in such a case. Then your EXISTS clause can be simplified to a regular join predicate if a user actually requires restriction - and this simplified clause is eligible for rewrite / transformation.

          Of course you can achieve the same simplification by storing all valid location codes for the user in the "no restriction" case, but that has other implications, in terms of maintainability and potentially optimizer issues.

          2. If you really need to do that inside the database you could evaluate the Virtual Private Database (VPD) option. In the policy function of the VPD you could check that condition and depending on the result of the check add the EXISTS clause dynamically or leave the statement as it is. But VPD does have other side effects, in particular on the parse and library cache activity, so that is something that needs to be evaluated and tested carefully.

          3. If you want to execute something like that efficiently in SQL you would need rewrite this using an OR that gets transformed by a concatenation transformation into something more efficient:
          ... exists (select null from ptl_edw.s_location where s_location.ssoid = '600028988' and PTL_EDWPERF.F_JOB_OIL_COST.CHARGING_LOCATION_KEY = s_location.location_master_key) 
          or exists (select null from from ptl_edw.s_location where s_location.ssoid = '600028988' and s_location.location_master_key = -99999)
          Unfortunately the CBO at present isn't particularly clever at such OR EXISTS constructs and therefore doesn't transform as desired.

          Hence you would end up with a manual rewrite:
          <the remaining query goes here>
          ...
          and exists (select null from ptl_edw.s_location where s_location.ssoid = '600028988' and PTL_EDWPERF.F_JOB_OIL_COST.CHARGING_LOCATION_KEY = s_location.location_master_key) 
          and not exists (select null from from ptl_edw.s_location where s_location.ssoid = '600028988' and s_location.location_master_key = -99999)
          union all
          <the remaining query goes here>
          ... 
          and exists (select null from from ptl_edw.s_location where s_location.ssoid = '600028988' and s_location.location_master_key = -99999)
          This looks ugly but allows Oracle to:

          1. Transform the first EXISTS check into a semi-join or other potentially more efficient forms
          2. Evaluate the second NOT EXISTS early which basically short-cuts the execution of that branch of the plan
          3. Evaluate the third EXISTS early which basically short-cuts the execution of that branch of the plan

          Furthermore the two subqueries using the constant literal check will be executed at most once and therefore should be efficient.

          The whole point of this exercise is emulating a conditional execution of two branches of an execution plan depending on the existence of your "no restriction" entry in the security table:

          Here are two sample Rowsource Profiles of such an execution plan:

          1. Actual restriction applies:
          --------------------------------------------------------------------
          | Id  | Operation               | Name  | Starts | E-Rows | A-Rows |
          --------------------------------------------------------------------
          |   0 | SELECT STATEMENT        |       |      1 |        |      0 |
          |   1 |  COUNT                  |       |      1 |        |      0 |
          |*  2 |   FILTER                |       |      1 |        |      0 |
          |   3 |    VIEW                 |       |      1 |   2000 |   1000 |
          |   4 |     UNION-ALL           |       |      1 |        |   1000 |
          |*  5 |      FILTER             |       |      1 |        |   1000 |
          |   6 |       NESTED LOOPS SEMI |       |      1 |   1000 |   1000 |
          |   7 |        TABLE ACCESS FULL| T1    |      1 |   1000 |   1000 |
          |*  8 |        INDEX UNIQUE SCAN| T2_PK |   1000 |   1000 |   1000 |
          |*  9 |       INDEX UNIQUE SCAN | T2_PK |      1 |      1 |      0 |
          |* 10 |      FILTER             |       |      1 |        |      0 |
          |  11 |       TABLE ACCESS FULL | T1    |      0 |   1000 |      0 |
          |* 12 |       INDEX UNIQUE SCAN | T2_PK |      1 |      1 |      0 |
          --------------------------------------------------------------------
          2. No restriction case:
          --------------------------------------------------------------------
          | Id  | Operation               | Name  | Starts | E-Rows | A-Rows |
          --------------------------------------------------------------------
          |   0 | SELECT STATEMENT        |       |      1 |        |      0 |
          |   1 |  COUNT                  |       |      1 |        |      0 |
          |*  2 |   FILTER                |       |      1 |        |      0 |
          |   3 |    VIEW                 |       |      1 |   2000 |   1000 |
          |   4 |     UNION-ALL           |       |      1 |        |   1000 |
          |*  5 |      FILTER             |       |      1 |        |      0 |
          |   6 |       NESTED LOOPS SEMI |       |      0 |   1000 |      0 |
          |   7 |        TABLE ACCESS FULL| T1    |      0 |   1000 |      0 |
          |*  8 |        INDEX UNIQUE SCAN| T2_PK |      0 |   1000 |      0 |
          |*  9 |       INDEX UNIQUE SCAN | T2_PK |      1 |      1 |      1 |
          |* 10 |      FILTER             |       |      1 |        |   1000 |
          |  11 |       TABLE ACCESS FULL | T1    |      1 |   1000 |   1000 |
          |* 12 |       INDEX UNIQUE SCAN | T2_PK |      1 |      1 |      1 |
          --------------------------------------------------------------------
          Check the STARTS column carefully. The FILTER operators (ID 5 and 10) execute the second child (ID 9 and 12) exactly once, and depending on the result the first child is executed or not (STARTS = 0). It's by the way again an interesting exception to the general rule of FILTER operations where the first child is supposed to drive the execution of the remaining child operations. Here the second child is obviously evaluated first.
          3. Can you please guide me the way to approach to identify the potential columns for histograms
          There is no simple answer to that question. Instead of going with SKEWONLY you could at least return to the default of SIZE AUTO. This way you avoid some of the unnecessary histograms, but potentially still end up with some of them that are useless or dangerous.

          Ideally you need to know your data and application respectively queries and understand which of them benefit from a histogram without negatively affecting other queries. Also depending on your data you might need to craft a few histograms manually so that they are helpful.

          Hope this helps,
          Randolf

          Edited by: Randolf Geist on Mar 1, 2012 9:03 AM

          Forgot to describe the simple case of adding all location codes
          • 17. Re: E-rows and A-rows dont match
            Jonathan Lewis
            dm_ptldba wrote:
            Jonathan,
            Appreciate you response to my question. I have been a fan of your blogs. They are a grest source of knowledge.

            My db version is 11.2.0.1
            Regarding the aggregation of summary -
            to you give a perspective, I have the join of the partition key to the time key in order to have partition purning. Otherwise, my actual query is and the corrsponding plan is -
            As a general principle, it's best if you supply the query that's causing the problem rather than a query that's similar, but critically different.

            >
            AND  ( PTL_EDWPERF.F_JOB_OIL_COST.CHARGING_LOCATION_KEY=D_LOCATION_DOMICILE.LOCATION_MASTER_KEY  )
            AND  ( exists (select 1 from ptl_edw.s_location where s_location.ssoid = '600028988' and PTL_EDWPERF.F_JOB_OIL_COST.CHARGING_LOCATION_KEY = decode(s_location.location_master_key, -99999, PTL_EDWPERF.F_JOB_OIL_COST.CHARGING_LOCATION_KEY, s_location.location_master_key))  )
            As Randolf has pointed out one of the critical loads in this query is the number of times the existence subquery runs. This may be happening because Oracle CAN'T unnest the subquery - I'd have to test the effect of the decode to be sure - but it may be happening because the cardinality is so badly wrong. A quick test on that would be to add a cardinality hint to the query /*+ cardinality (F_JOB_OIL_COST 53000000) */ - check that the predicted Rows shows 53M to see if you've hinted correctly.

            This strategy not work, of course - and perhaps it doesn't need to because looking at your query I think you could apply the subquery to the D_LOCATION_DOMICILE table, column LOCATION_MASTER_KEY - which would eliminate data earlier, rather than later, and require the subquery to run far fewer times.

            |*  8 |     HASH JOIN                          |                                |   530K|    32M|    31M|   182K  (1)| 00:42:37 |       |       |
            |   9 |      PARTITION RANGE ALL               |                                |   530K|    25M|       |   127K  (1)| 00:29:50 |     1 |   133 |
            |  10 |       TABLE ACCESS BY LOCAL INDEX ROWID| F_JOB_OIL_COST                 |   530K|    25M|       |   127K  (1)| 00:29:50 |     1 |   133 |
            |  11 |        BITMAP CONVERSION TO ROWIDS     |                                |       |       |       |            |          |       |       |
            ....
            |* 41 |      TABLE ACCESS FULL                 | D_UNIT_MASTER                  |  1794K|    23M|       | 51485   (2)| 00:12:01 |       |       |
            As well as changing the optimizer's treatment of the subquery, the cardinality hint should also cause the hash join between F_JOB_OIL_COST and D_UNIT_MASTER to switch, and this is another possible time-waster on your query. If you don't see the hash join swap order, then you could extract the outline for the query, and adjust the hints you see to make the hash join run the other way round.

            Regards
            Jonathan Lewis
            http://jonathanlewis.wordpress.com
            Author: <b><em>Oracle Core</em></b>
            1 2 Previous Next