DevOps, CI/CD and Automation

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Problem with MD5 checksums of pre-built SOA & BPM virtual machines

993422Feb 26 2013

Hello,

I'm trying to download the lastest pre-built virtual machines for the SOA & BPM suite and I'm finding troubles to merge and extract the virtual machine. MD5 checksum of all files except 007 fail. I have tried downloading the files with different browsers and even different operating systems.

This is the url where I got the links from: http://www.oracle.com/technetwork/middleware/soasuite/learnmore/vmsoa-172279.html

Any idea what's going on ?

Thanks in advance

John Brady - UK

First, why do you think you know more about the "best join order" between your tables than Oracle does? That is precisely what the Optimizer does automatically for you - it considers all possible joins between the tables in the query (up to a limit) and uses the execution plan with the lowest total cost. Thus the Optimizer will automatically consider different join orders and join methods, and use those with the lowest cost.

The only time I'm really aware of when this doesn't work as expected is when the statistics on your tables don't agree with the actual data in the tables (out of date statistics), or you have heavily skewed data distributions and the same execution plan is used with different bind variable values.

Second, the "leading" hint tells Oracle to access those tables in that order within the execution plan. So any execution plan produced will always start with A, then do B, then C, then D, then the other tables. So your "leading" hint is stopping the Optimizer from considering joining table G until after the other 4 tables have been accessed in the execution plan.

The simplest solution is to remove the "leading" hint (and any others you might be considering) and let the Optimizer consider all the possible joins between the tables in the query. Then it will naturally pick the table access sequence and join methods that have the lowest cost. If joining A to G early on produces a lower cost execution plan, then the Optimizer should find it and use it.

I'd strongly recommend avoiding all such hints that force specific execution plans, unless you have very strong and overriding reasons for doing so. And even then I'd still try and avoid hints, because you have to know more than the Optimizer does to produce the "right" plan and to make sure that your hints ALWAYS produce the plan you want in all possible circumstances.

2 people found this helpful

JohnWatson2

You initial assumption is not usually correct,

start with the table and condition which has the least number of rows and keep the large tables for join at the end.

When calculating a join order you need to start not with the smallest table, and not with the table that has the least number of rows after applying filters. You should start with the table that has the lowest proportion of rows remaining after applying filters. This is known as "the filtered rows percentage method", brilliantly documented here,

https://www.amazon.com/Oracle-Performance-Tuning-Optimization-Cardinalities/dp/1501022695

The way to look at it is, for example, that if you have a dimension table of only four rows and a constant filter which cuts it down to 1 (25%), you will still (on average) have to look at one quarter of a huge fact table. So even if the filter on the huge fact table returns a thousand rows, it will be quicker to start with that if that one thousand is a smaller percentage of the total fact table.

JohnWatson2

Here's an example:

orclx>
orclx> set autot trace exp
orclx> select * from emp natural join dept where dname='SALES' and job='MANAGER';
Execution Plan
----------------------------------------------------------
Plan hash value: 3625962092
----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time    |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |        |    1 |    58 |    6  (0)| 00:00:01 |
|  1 |  NESTED LOOPS                |        |    1 |    58 |    6  (0)| 00:00:01 |
|  2 |  NESTED LOOPS              |        |    3 |    58 |    6  (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL        | EMP    |    3 |  114 |    3  (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN        | PK_DEPT |    1 |      |    0  (0)| 00:00:01 |
|*  5 |  TABLE ACCESS BY INDEX ROWID| DEPT    |    1 |    20 |    1  (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
  3 - filter("EMP"."JOB"='MANAGER')
  4 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
  5 - filter("DEPT"."DNAME"='SALES')
Note
-----
  - this is an adaptive plan
orclx>

Uncle Oracle is starting with emp, even though emp id the bigger table and it expects to get three rows after applying the filter. This is because 3 as a proportion of 14 is lower then the 1 out of 4 proportion that it expects to get from dept.

mtefft

> I'd strongly recommend avoiding all such hints that force specific execution plans, unless you have very strong and overriding reasons for doing so. And even then I'd still try and avoid hints, because you have to know more than the Optimizer does to produce the "right" plan and to make sure that your hints ALWAYS produce the plan you want in all possible circumstances.

Absolutely agree.

I discourage the developers I work with from trying to hint their way to 'a better plan'.

If you try to do this, you need to be able to imagine a better plan than Oracle is providing: this takes work. And then get Oracle to follow your ideas. And then verify that the result was really worth the effort and fragility that your hints introduced. Much better is to find an Oracle-produced plan (and ASH can help you verify its efficiency), and use a baseline or one of the other plan-stability methods to enforce it.

Too often, a developer thinks they can just throw a single hint into the mix and goodness happens.

Jonathan Lewis

The problem with that argument is that it's implicitly assuming the join is a nested loop.

In your case a nested loop from 1,000 rows of fact table to 4 rows of dimension may well be better than a nested loop from one row of the dimension table to 1/4 of the fact table with filter; but that means you can identify the 1,000 efficiently anyway, so a hash join that builds with the dimension and probes with the fact may be more efficient than either nested loop.

When you consider hash joins you have to be careful with the expression "starts with" - it becomes ambiguous.

Regards

Jonathan Lewis

I'm going to ignore the choice of order for A D C E in this question because it's aimed at the problem of adding one more table to an existing query - and there is clearly a very good reason why the OP should want to make Oracle do something other than its chosen strategy for that last step.

The 4 table join produces 483,000 rows; the single table access to G produces 94 rows. Although volume of data is more important than number of rows when considering a hash join I think it's safe to assume that the 94 rows acquired from 56 buffers will have a lower volume than 483,000 rows (which - according to the stats required 39MB to build in memory).

If the plan for the first 4 tables is good, then it makes sense for the OP to ask the question about how to change Oracle's choice on the fifth.

Regards

Jonathan Lewis

Answer

Like everyone else who has answered so far, I encourage people to avoid hinting. Mainly because they usually don't do it well. In your case, for example, you haven't specified any join methods so Oracle could follow your hints and still change the plan dramatically from the one you expect. If you think you've hinted the best plan then capture an SQL Plan Baseline for it so that you can see how many more hints you need to make it a stable plan.

To answer the detail of your question, though: you still want G to be the last table in the join order, but you want Oracle to use it as the probe build table in a hash join, so you need to add G at the end of your leading() hint, specifiy a hash join, and specity swap_join_inputs() the last hint will put G at the top of the plan while still joining to it as the last table.

/*+ leading(A D C E G) use_hash(G) swap_join_inputs(G) */

Regards

Jonathan Lewis

Update: I wrote "probe" when I should have said "build": the thing that appears first as you read down the plan from the "hash join " operator is the build table, the second is the probe table.

Marked as Answer by Sekar_BLUE4EVER · Sep 27 2020

Sekar_BLUE4EVER

Thanks for all the response. I get that it is better to leave the Join order and join type for the Oracle optimizer to select based on the statistics. The idea behind adding the hint to manually force the join order was that if we know the business behind the table data and what kind of data will be there in each table then isn't it better to manually force the join order and type ? Because I have ran into issues where sometimes oracle picks up the right plan (the one that runs faster ) but sometimes it takes too long . For example I know that a table will return a lot of rows after the filter and it was used at the start of a complex join query in Nested loops join. So in that case doesn't it make sense to manually "delay" the join operation as long as possible by moving it to the end of joins?

unknown-7404

I get that it is better to leave the Join order and join type for the Oracle optimizer to select based on the statistics.

Correct - but there is more to the process than that

1. initially you should let Oracle do its thing and not try to solve problems that don't exist - that is, if it ain't broke don't fix it

2. do your testing

3. if the results don't meet the required SLA (service level agreement) then start your troubleshooting.

For example I know that a table will return a lot of rows after the filter and it was used at the start of a complex join query in Nested loops join. So in that case doesn't it make sense to manually "delay" the join operation as long as possible by moving it to the end of joins?

And the 'issue' is that you do NOT know what Oracle is going to do until you let it do it and then examine the execution plan.

The standard troubleshooting steps are:

1. identify a problem/issue to be addressed

2. confirm that the problem/issue REALLY exists

3. identify potential solutions that will eliminate/mitigate the problem/issue

4. select a small number (2 or 3) of those 'solutions' for prototyping and testing

5. select the BEST solution from those tested.

Having an SLA is an important part of step #1. Almost any query/process can be made 'faster' - so 'make it faster' is NOT an appropriate goal. You need to have a SPECIFIC performance target.

That target will be based on things like: 1) number of rows expected, 2) response time needed for the FIRST rows (or maybe ALL rows) to be returned, 3) the number of times the query/process is executed (once a day? or every five minutes?), 4) the effect that 'tuning' this process might have on other queries/processes (adding an index might help this query but could cause problems with other queries that used to work ok).

So what everyone meant is that generally you should let Oracle do its thing and only get involved if there is a CONFIRMED problem with what Oracle did. Even then the problem/issue can often be resolved without using hints.

1 - 9

Locked Post

New comments cannot be posted to this locked post.

Locked on Mar 26 2013

Added on Feb 26 2013

#downloads-issues, #error

0 comments

1,066 views

DevOps, CI/CD and Automation

Problem with MD5 checksums of pre-built SOA & BPM virtual machines

Comments

Post Details