10 Replies Latest reply on May 4, 2008 7:01 PM by Jonathan Lewis

    Normal Table vs Clustered hash Tables

    Tomy3k_Bakr
      Hi All,
      I have a situation here and i need your help with me to correct my decision.
      I have one table that will be used for frequent DML (Insert and Update) and SELECT. I want to make a comparison between creating this table as a Normal Table and Cluster version of it.
      The Normal table structure is as follows:
      CREATE TABLE OMS_ORDER
      (SL_NO NUMBER NOT NULL,
      ORDER_SIDE VARCHAR2(1),
      MARKET_SL_NO VARCHAR2(15),
      ORDER_DATE DATE,
      NIN VARCHAR2(60),
      C_ACCOUNT VARCHAR2(60),
      COMPANY_CODE VARCHAR2(10),
      VOLUME NUMBER NOT NULL,
      ACTUAL_VOLUME NUMBER DEFAULT 0,
      REMAIN_VOLUME NUMBER DEFAULT null,
      EXECUTED_VOLUME NUMBER DEFAULT 0,
      ORDER_TYPE VARCHAR2(1) NOT NULL,
      ORDER_PRICE NUMBER,
      AVG_PRICE NUMBER DEFAULT 0,
      ACTUAL_PRICE NUMBER DEFAULT 0,
      ORDER_STATUS NUMBER,
      ORDER_SOURCE VARCHAR2(10),
      REF_SL_NO NUMBER,
      MARKET VARCHAR2(4) DEFAULT 'QA',
      VALIDITY VARCHAR2(1) NOT NULL,
      MAX_FIX_SEQ NUMBER DEFAULT 0)
      /


      -- Constraints for OMS_ORDER

      ALTER TABLE OMS_ORDER
      ADD CONSTRAINT ORDER_PRIM_SLNO PRIMARY KEY (SL_NO)
      USING INDEX
      /

      The Cluster version DDL is as follows:
      CREATE CLUSTER OMS_ORD_CLS (
      SL_NO NUMBER(22,0))
      SIZE 72
      SINGLE TABLE
      HASHKEYS 5000011
      /

      CREATE TABLE OMS_ORDER_CLS
      (SL_NO NUMBER(22,0),
      ORDER_SIDE VARCHAR2(1),
      MARKET_SL_NO VARCHAR2(15),
      ORDER_DATE DATE,
      NIN VARCHAR2(60),
      C_ACCOUNT VARCHAR2(60),
      COMPANY_CODE VARCHAR2(10),
      VOLUME NUMBER NOT NULL,
      ACTUAL_VOLUME NUMBER DEFAULT 0,
      REMAIN_VOLUME NUMBER DEFAULT null,
      EXECUTED_VOLUME NUMBER DEFAULT 0,
      ORDER_TYPE VARCHAR2(1) NOT NULL,
      ORDER_PRICE NUMBER,
      AVG_PRICE NUMBER DEFAULT 0,
      ACTUAL_PRICE NUMBER DEFAULT 0,
      ORDER_STATUS NUMBER,
      ORDER_SOURCE VARCHAR2(10),
      REF_SL_NO NUMBER,
      MARKET VARCHAR2(4) DEFAULT 'QA',
      VALIDITY VARCHAR2(1) NOT NULL,
      MAX_FIX_SEQ NUMBER DEFAULT 0)
      CLUSTER OMS_ORD_CLS(SL_NO)
      /


      Which will be better using a normal table or a cluster one?
      A little note The PK is SL_NO wich is a unique number and all the queries will be like this
      SELECT * FROM OMS_ORDER_XXX WHERE SL_NO = XXXXXX;

      Please Help Me.

      Thanks in advance
        • 1. Re: Normal Table vs Clustered hash Tables
          Jonathan Lewis
          At first sight, this looks as if you have a good case for a single table hash clusters, but to come to the best conclusion you're going to have to spend a lot more time discussing what you want to do.

          First of all, you will still need to create the primary key constraint on the table, so the single table hash cluster won't save you space.

          Secondly, your size of 72 looks suspect - your row definition allows your rows to be much longer than 72 bytes, and you need to add about 20 bytes per row for cluster key overheads. If you go the hash cluster route, you should also make sure that your number columns are strictly defined (i.e. number(p,s)) where possible so that you don't run the risk of getting rounding errors from front end code that end up storing numbers to 22 byte precision.

          Thirdly, an "orders" table looks like one that is supposed to grow indefinitely, whereas hash clusters are optimised for pre-defined data limits.

          Note - if your SL_NO is supposed to be (very close to) a sequence number, then you probably want to consider using the 'hash is sl_no' option in the cluster definition.

          The optimium access for a single table hash cluster that's clusterd on a declared primary key is: 'single table by hash' - which uses a special buffer get mechanism to access the block, and can access the correct block by "arithmetic" rather than by walking down the primary key index. So the benefits you are after are a reduction in logical I/O, the absence of any need to keep the index buffered for table reads (although you will need to access it for inserts and deletes), and the removal of the possible "root block hot spot".

          The things that can go wrong are that you could get hash collisions (two primary keys hashing to the same hashkey value) and hash overflow (a block fills before it's had all the "expected" rows inserted, and a row has to go to an overflow block).

          When dealing with the first problem, Oracle can get to the right block quickly, but has to scan the entire block checking the hashkey of every row in that block, then checking the primary key: this increases the CPU used to get the right row, and is one of the main reasons you see Oracle using multiple block sizes in the TPC-C benchmarks.

          When dealing with the second problem, Oracle has to do a second buffer get - which increases contention and CPU, and could increase the physical I/O count. This is why you have to be very careful when setting the SIZE parameter.

          There's quite a lot more that could be said, but that's probably enough to let you start thinking about the costs and benefits you might see from using a single table hash cluster.


          Regards
          Jonathan Lewis
          http://jonathanlewis.wordpress.com
          http://www.jlcomp.demon.co.uk
          • 2. Re: Normal Table vs Clustered hash Tables
            sgalaxy
            You should use Hash Clustered Tables in the following circumstances:
            1) You know beforehand the number of rows your table will have or the upper bound of it. Getting the size of the HASHKEYs and SIZE parameters right is crucial to avoid a rebuild.
            2) The inserts mainly , should be less/lighter to data retrieval.Updates do not provide much overhead unless the hashkey is being updated...
            3) You access constantly the data by the hashkey value.... From your notes , you cover this 'requirement'.

            You should however make a test case for both alternatives and compare them....

            Greetings....
            Sim
            • 3. Re: Normal Table vs Clustered hash Tables
              Tomy3k_Bakr
              i made a small test. i wrote a program that inserts 500,000 rows in each table. Normal Insert is more faster than Cluster Insert. The result was that with normal insert procedure that will be used in our business we inserted around 317 rows/s but in cluster insert we insert around 250.6 rows/s. SQL_TRACE was opened in both the session and i want to paste them for you.
              Normal Insert:

              -----------------------------------------------------------------------------------------------------------
              INSERT INTO OMS_ORDER(SL_NO, ORDER_SIDE, ORDER_DATE, NIN, C_ACCOUNT, COMPANY_CODE, VOLUME,
              ORDER_TYPE, ORDER_PRICE, VALIDITY, ORDER_SOURCE, REF_SL_NO, MARKET)
              VALUES(:b13, :b12, :b11, :b10, :b9, :b8, :b7,
              :b6, :b5, :b4, :b3, :b2, NVL(:b1, 'QA'))

              call count cpu elapsed disk query current rows
              ------- ------ -------- ---------- ---------- ---------- ---------- ----------
              Parse 1 0.00 0.00 0 0 0 0
              Execute 500000 537.68 537.81 0 3837 3365204 500000
              Fetch 0 0.00 0.00 0 0 0 0
              ------- ------ -------- ---------- ---------- ---------- ---------- ----------
              total 500001 537.68 537.81 0 3837 3365204 500000

              Misses in library cache during parse: 0
              Optimizer goal: CHOOSE
              Parsing user id: 539 (recursive depth: 1)
              ---------------------------------------------------------------------------------------------------------
              for Cluster Insert:

              INSERT INTO OMS_ORDER_CLS(SL_NO, ORDER_SIDE, ORDER_DATE, NIN, C_ACCOUNT, COMPANY_CODE, VOLUME,
              ORDER_TYPE, ORDER_PRICE, VALIDITY, ORDER_SOURCE, REF_SL_NO, MARKET)
              VALUES(:b13, :b12, :b11, :b10, :b9, :b8, :b7,
              :b6, :b5, :b4, :b3, :b2, NVL(:b1, 'QA'))

              call count cpu elapsed disk query current rows
              ------- ------ -------- ---------- ---------- ---------- ---------- ----------
              Parse 1 0.01 0.01 0 3 0 0
              Execute 500000 457.00 929.43 44643 30 2007108 500000
              Fetch 0 0.00 0.00 0 0 0 0
              ------- ------ -------- ---------- ---------- ---------- ---------- ----------
              total 500001 457.01 929.44 44643 33 2007108 500000

              Misses in library cache during parse: 1
              Optimizer goal: CHOOSE
              Parsing user id: 539 (SHAMAN) (recursive depth: 1)

              so in cluster insert it will make additional disk reads i think for populating the cluster information.

              So Normal Insert is better than Cluster Insert was my conclusion.

              Regards;
              • 4. Re: Normal Table vs Clustered hash Tables
                108476
                Hi Tommy,
                Which will be better using a normal table or a cluster one?
                That depends on "how" the rows are accessed.

                For example, assume that I have an app that retrieves all orders for a customer, and that the average is 20 per customer.

                - In a "normal" table, new order could go on any data block, and it's unlikely that they would be co-located on a single block. Twenty orders, twenty trip the the database.

                - Using a sorted hash cluster, rows are co-located, allowing the app to fetch all orders in a single logical I/O.

                I have some useful notes here:

                http://www.dba-oracle.com/t_sorted_hash_clusters.htm

                --------------------

                Hope this helps. . .

                Donald K. Burleson
                Oracle Press author
                Author of "Oracle Tuning: The Definitive Reference":
                http://www.dba-oracle.com/bp/s_oracle_tuning_book.htm
                • 5. Re: Normal Table vs Clustered hash Tables
                  Tomy3k_Bakr
                  Hi Donald,
                  first of all thankx a lot for you response. Second, i'm using your website as a refrence to me before implementing any oracle feature. For my case only i think Normal Table will be more faster than Cluster Table, am i right?


                  Regards;
                  • 6. Re: Normal Table vs Clustered hash Tables
                    Tomy3k_Bakr
                    Hi Donald,
                    Can you please reply to this thread,
                    Restrict User Logon From Certain IP(s)
                    I realy need it if you can help me.
                    • 7. Re: Normal Table vs Clustered hash Tables
                      sgalaxy
                      "Can you please reply to this thread"
                      Irrelevant to this current thread.....
                      • 8. Re: Normal Table vs Clustered hash Tables
                        108476
                        Hi Tommy,
                        For my case only i think Normal Table will be more faster than Cluster Table, am i right?
                        It depends!

                        Cluster tables are great whn the vast majority of queries use the same index to retrieve multiple rows, and data block adjacency is important to reduce I/O.

                        What percentage of your production SQL accesses the rows in index sorted order?
                        • 9. Re: Normal Table vs Clustered hash Tables
                          Jonathan Lewis
                          >
                          For example, assume that I have an app that retrieves
                          all orders for a customer, and that the average is 20
                          per customer.

                          - Using a sorted hash cluster, rows are co-located,
                          allowing the app to fetch all orders in a single
                          logical I/O.

                          I have some useful notes here:

                          http://www.dba-oracle.com/t_sorted_hash_clusters.htm
                          The OP said the following in his original post:
                          A little note The PK is SL_NO wich is a unique number and all the queries will be like this
                          SELECT * FROM OMS_ORDER_XXX WHERE SL_NO = XXXXXX;


                          This means sorted hash clusters are irrelevent.

                          Regards
                          Jonathan Lewis
                          http://jonathanlewis.wordpress.com
                          http://www.jlcomp.demon.co.uk
                          • 10. Re: Normal Table vs Clustered hash Tables
                            Jonathan Lewis
                            Tomy3k_Bakr

                            This is expected behaviour.

                            I guess that your inserts were either using a pl/sql loop counter or a sequence number. If this is so, the heap table inserts would have been inserting several consecutive rows into the "next new" block in the table, and doing the same with the primary key index.

                            The cluster inserts would be inserting each row into "the right block" which would have been scattering them all over the place. Also, when you create a hash cluster, Oracle formats all the blocks before making the cluster available, so the inserts will initially have to read the appropriate block from disk, rather than just "new"ing it in memory.

                            In a very active environment, one of the considerations of optimising the use of hash clusters is to ensure that they stay 100% cached - so that the most significant cost of a block visit is a single "consistent get - examination".

                            Regards
                            Jonathan Lewis
                            http://jonathanlewis.wordpress.com
                            http://www.jlcomp.demon.co.uk