11 Replies Latest reply: Jan 15, 2013 6:46 AM by BluShadow RSS

    Hash Join

    Oracle Maniac
      Hi Friends,


      If i have a table T1 and a table T2. Table T1 is having 100 rows and table T2 is having 20 rows. When performing a Hash join ,which table should be used to make the hash table ,the larger one or the smaller one and why ?IF the data set is too small for considerations then please conser table T1 with 10 million of rows and table T2 with 1 million of rows.




      Thanks as always :)
        • 1. Re: Hash Join
          Stew Ashton
          Rahul  K wrote:
          ...When performing a Hash join ,which table should be used to make the hash table ,the larger one or the smaller one and why ?
          First of all, as a rule we should let the optimizer choose the join method and the best way to perform it, so your question is theoretical.

          I have read that the hash table is generally made from the smaller table, since it is meant to stay in memory. The rows from the larger table can then be read and either discarded or returned in the result set. See
          http://en.wikipedia.org/wiki/Hash_join
          • 2. Re: Hash Join
            damorgan
            If you are a developer: "The one the database optimizer chooses."

            If you are a student: "Read the docs ... we do not help people cheat on tests." http://tahiti.oracle.com
            • 3. Re: Hash Join
              Oracle Maniac
              Yes, its always left to the optimizer to chose. But when i went through the manual ,it said ,if you have some different requirement of data .i.e. for a better response time (we have to quicky display the top ten transactions of someone's bank statement) ,we can hint the optimizer to use Nested loop rather than hash join or sort merge.
              • 4. Re: Hash Join
                Stew Ashton
                I suggest using the FIRST_ROWS hint. It tells the optimizer what your objective is: to return the first rows quickly. It will then choose the solution itself, but keeping your objective in mind.
                • 5. Re: Hash Join
                  BluShadow
                  Rahul  K wrote:
                  Yes, its always left to the optimizer to chose. But when i went through the manual ,it said ,if you have some different requirement of data .i.e. for a better response time (we have to quicky display the top ten transactions of someone's bank statement) ,we can hint the optimizer to use Nested loop rather than hash join or sort merge.
                  You should never use joining or index hints in production code. That would be making the assumption that you as the developer know better than the oracle database how to get the data from data sets that change over time, and that your assumption will always be better in the future. That's ridiculous and is the cause of many people's 'performance issues' (which are usually fixed by simply removing such stupid hints). The hints are there to help analyse performance issues and identify where a database design is poor or where a query is written badly, but should most definitely not be considered a way to 'fix' performance issues.

                  Your original question is completely theoretical, and is lacking any useful substance to give an accurate answer.
                  The hash join could take place on the larger table, if the selectivity of data (due to conditions in your query) results in a smaller set of data than the other table.

                  The optimiser takes all things into account when determining the best way to join, and that decision may change in the future as the data changes on the tables.
                  • 6. Re: Hash Join
                    Oracle Maniac
                    So what I understand is :

                    1) Never use hints ...
                    2)If we don't have appropriate statistics on our object then the default values(rowlen,row number,distinct column values) chosen by the CBO can be inappropriate and hence the plan can be inappropriate.
                    3)Since COB transforms our query in a number of ways ,changes the join order,join method ,access paths etc so its quite logical that the optimizer would have checked for the strategy which we are asking the COB to use ,using hints .



                    Thanks :)
                    • 7. Re: Hash Join
                      BluShadow
                      Rahul  K wrote:
                      So what I understand is :

                      1) Never use hints ...
                      Never say never. There are some hints which are acceptable, such as FIRST_ROWS, APPEND etc. The ones that shouldn't be used in production (without damn good justification) are those that tell the optimizer how to join or access tables/data.
                      2)If we don't have appropriate statistics on our object then the default values(rowlen,row number,distinct column values) chosen by the CBO can be inappropriate and hence the plan can be inappropriate.
                      Yes, having stats up to date is important.
                      3)Since COB transforms our query in a number of ways ,changes the join order,join method ,access paths etc so its quite logical that the optimizer would have checked for the strategy which we are asking the COB to use ,using hints .
                      Not necessarily. The plan you create using hints may be soooo way off the mark, that the optimiser doesn't even evaluate it, based on the statistics on the tables etc. that it knows about.
                      • 8. Re: Hash Join
                        William Robertson
                        Well, the optimiser will ignore a hint if it's syntactically invalid or logically impossible. Otherwise it has to use it.
                        • 9. Re: Hash Join
                          BluShadow
                          William Robertson wrote:
                          Well, the optimiser will ignore a hint if it's syntactically invalid or logically impossible. Otherwise it has to use it.
                          Yes absolutely... but sorry I wasn't clear.

                          The OP was saying that he was considering the query he would have used with hints would have been one of the options the optimiser would evaluate when he provides the query without hints, and I was saying that no, that wouldn't necessarily be so. i.e. theoretically, the optimiser may evaluate, let's say, 20 different execution paths when you provide the query without hints, based on the statistics and other information it has. out of those 20 evaluated paths, 1 of them may, or may not be, the same path that would be evaluated if the query was provided with hints... but it depends how 'off the mark' the query with hints is from the information it has without hints.

                          That's what I was saying. ;)
                          • 10. Re: Hash Join
                            William Robertson
                            Ah, see what you meant. Agreed, if there are a million logically possible execution plans, the optimizer cannot be expected to evaluate every single one, and we would not generally want it to. Therefore a hint may make it consider something that would otherwise have been off its radar.
                            • 11. Re: Hash Join
                              BluShadow
                              William Robertson wrote:
                              Ah, see what you meant. Agreed, if there are a million logically possible execution plans, the optimizer cannot be expected to evaluate every single one, and we would not generally want it to. Therefore a hint may make it consider something that would otherwise have been off its radar.
                              That's a good way of phrasing it. :)