Forum Stats

  • 3,728,131 Users
  • 2,245,555 Discussions
  • 7,853,351 Comments

Discussions

Normal Table vs Clustered hash Tables

Tomy3k_Bakr
Tomy3k_Bakr Member Posts: 110 Blue Ribbon
Hi All,
I have a situation here and i need your help with me to correct my decision.
I have one table that will be used for frequent DML (Insert and Update) and SELECT. I want to make a comparison between creating this table as a Normal Table and Cluster version of it.
The Normal table structure is as follows:
CREATE TABLE OMS_ORDER
(SL_NO NUMBER NOT NULL,
ORDER_SIDE VARCHAR2(1),
MARKET_SL_NO VARCHAR2(15),
ORDER_DATE DATE,
NIN VARCHAR2(60),
C_ACCOUNT VARCHAR2(60),
COMPANY_CODE VARCHAR2(10),
VOLUME NUMBER NOT NULL,
ACTUAL_VOLUME NUMBER DEFAULT 0,
REMAIN_VOLUME NUMBER DEFAULT null,
EXECUTED_VOLUME NUMBER DEFAULT 0,
ORDER_TYPE VARCHAR2(1) NOT NULL,
ORDER_PRICE NUMBER,
AVG_PRICE NUMBER DEFAULT 0,
ACTUAL_PRICE NUMBER DEFAULT 0,
ORDER_STATUS NUMBER,
ORDER_SOURCE VARCHAR2(10),
REF_SL_NO NUMBER,
MARKET VARCHAR2(4) DEFAULT 'QA',
VALIDITY VARCHAR2(1) NOT NULL,
MAX_FIX_SEQ NUMBER DEFAULT 0)
/


-- Constraints for OMS_ORDER

ALTER TABLE OMS_ORDER
ADD CONSTRAINT ORDER_PRIM_SLNO PRIMARY KEY (SL_NO)
USING INDEX
/

The Cluster version DDL is as follows:
CREATE CLUSTER OMS_ORD_CLS (
SL_NO NUMBER(22,0))
SIZE 72
SINGLE TABLE
HASHKEYS 5000011
/

CREATE TABLE OMS_ORDER_CLS
(SL_NO NUMBER(22,0),
ORDER_SIDE VARCHAR2(1),
MARKET_SL_NO VARCHAR2(15),
ORDER_DATE DATE,
NIN VARCHAR2(60),
C_ACCOUNT VARCHAR2(60),
COMPANY_CODE VARCHAR2(10),
VOLUME NUMBER NOT NULL,
ACTUAL_VOLUME NUMBER DEFAULT 0,
REMAIN_VOLUME NUMBER DEFAULT null,
EXECUTED_VOLUME NUMBER DEFAULT 0,
ORDER_TYPE VARCHAR2(1) NOT NULL,
ORDER_PRICE NUMBER,
AVG_PRICE NUMBER DEFAULT 0,
ACTUAL_PRICE NUMBER DEFAULT 0,
ORDER_STATUS NUMBER,
ORDER_SOURCE VARCHAR2(10),
REF_SL_NO NUMBER,
MARKET VARCHAR2(4) DEFAULT 'QA',
VALIDITY VARCHAR2(1) NOT NULL,
MAX_FIX_SEQ NUMBER DEFAULT 0)
CLUSTER OMS_ORD_CLS(SL_NO)
/


Which will be better using a normal table or a cluster one?
A little note The PK is SL_NO wich is a unique number and all the queries will be like this
SELECT * FROM OMS_ORDER_XXX WHERE SL_NO = XXXXXX;

Please Help Me.

Thanks in advance

Comments

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,579 Gold Crown
    edited May 2008
    At first sight, this looks as if you have a good case for a single table hash clusters, but to come to the best conclusion you're going to have to spend a lot more time discussing what you want to do.

    First of all, you will still need to create the primary key constraint on the table, so the single table hash cluster won't save you space.

    Secondly, your size of 72 looks suspect - your row definition allows your rows to be much longer than 72 bytes, and you need to add about 20 bytes per row for cluster key overheads. If you go the hash cluster route, you should also make sure that your number columns are strictly defined (i.e. number(p,s)) where possible so that you don't run the risk of getting rounding errors from front end code that end up storing numbers to 22 byte precision.

    Thirdly, an "orders" table looks like one that is supposed to grow indefinitely, whereas hash clusters are optimised for pre-defined data limits.

    Note - if your SL_NO is supposed to be (very close to) a sequence number, then you probably want to consider using the 'hash is sl_no' option in the cluster definition.

    The optimium access for a single table hash cluster that's clusterd on a declared primary key is: 'single table by hash' - which uses a special buffer get mechanism to access the block, and can access the correct block by "arithmetic" rather than by walking down the primary key index. So the benefits you are after are a reduction in logical I/O, the absence of any need to keep the index buffered for table reads (although you will need to access it for inserts and deletes), and the removal of the possible "root block hot spot".

    The things that can go wrong are that you could get hash collisions (two primary keys hashing to the same hashkey value) and hash overflow (a block fills before it's had all the "expected" rows inserted, and a row has to go to an overflow block).

    When dealing with the first problem, Oracle can get to the right block quickly, but has to scan the entire block checking the hashkey of every row in that block, then checking the primary key: this increases the CPU used to get the right row, and is one of the main reasons you see Oracle using multiple block sizes in the TPC-C benchmarks.

    When dealing with the second problem, Oracle has to do a second buffer get - which increases contention and CPU, and could increase the physical I/O count. This is why you have to be very careful when setting the SIZE parameter.

    There's quite a lot more that could be said, but that's probably enough to let you start thinking about the costs and benefits you might see from using a single table hash cluster.


    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk
  • sgalaxy
    sgalaxy Member Posts: 5,687 Bronze Trophy
    You should use Hash Clustered Tables in the following circumstances:
    1) You know beforehand the number of rows your table will have or the upper bound of it. Getting the size of the HASHKEYs and SIZE parameters right is crucial to avoid a rebuild.
    2) The inserts mainly , should be less/lighter to data retrieval.Updates do not provide much overhead unless the hashkey is being updated...
    3) You access constantly the data by the hashkey value.... From your notes , you cover this 'requirement'.

    You should however make a test case for both alternatives and compare them....

    Greetings....
    Sim
  • Tomy3k_Bakr
    Tomy3k_Bakr Member Posts: 110 Blue Ribbon
    i made a small test. i wrote a program that inserts 500,000 rows in each table. Normal Insert is more faster than Cluster Insert. The result was that with normal insert procedure that will be used in our business we inserted around 317 rows/s but in cluster insert we insert around 250.6 rows/s. SQL_TRACE was opened in both the session and i want to paste them for you.
    Normal Insert:

    -----------------------------------------------------------------------------------------------------------
    INSERT INTO OMS_ORDER(SL_NO, ORDER_SIDE, ORDER_DATE, NIN, C_ACCOUNT, COMPANY_CODE, VOLUME,
    ORDER_TYPE, ORDER_PRICE, VALIDITY, ORDER_SOURCE, REF_SL_NO, MARKET)
    VALUES(:b13, :b12, :b11, :b10, :b9, :b8, :b7,
    :b6, :b5, :b4, :b3, :b2, NVL(:b1, 'QA'))

    call count cpu elapsed disk query current rows
    ------- ------ -------- ---------- ---------- ---------- ---------- ----------
    Parse 1 0.00 0.00 0 0 0 0
    Execute 500000 537.68 537.81 0 3837 3365204 500000
    Fetch 0 0.00 0.00 0 0 0 0
    ------- ------ -------- ---------- ---------- ---------- ---------- ----------
    total 500001 537.68 537.81 0 3837 3365204 500000

    Misses in library cache during parse: 0
    Optimizer goal: CHOOSE
    Parsing user id: 539 (recursive depth: 1)
    ---------------------------------------------------------------------------------------------------------
    for Cluster Insert:

    INSERT INTO OMS_ORDER_CLS(SL_NO, ORDER_SIDE, ORDER_DATE, NIN, C_ACCOUNT, COMPANY_CODE, VOLUME,
    ORDER_TYPE, ORDER_PRICE, VALIDITY, ORDER_SOURCE, REF_SL_NO, MARKET)
    VALUES(:b13, :b12, :b11, :b10, :b9, :b8, :b7,
    :b6, :b5, :b4, :b3, :b2, NVL(:b1, 'QA'))

    call count cpu elapsed disk query current rows
    ------- ------ -------- ---------- ---------- ---------- ---------- ----------
    Parse 1 0.01 0.01 0 3 0 0
    Execute 500000 457.00 929.43 44643 30 2007108 500000
    Fetch 0 0.00 0.00 0 0 0 0
    ------- ------ -------- ---------- ---------- ---------- ---------- ----------
    total 500001 457.01 929.44 44643 33 2007108 500000

    Misses in library cache during parse: 1
    Optimizer goal: CHOOSE
    Parsing user id: 539 (SHAMAN) (recursive depth: 1)

    so in cluster insert it will make additional disk reads i think for populating the cluster information.

    So Normal Insert is better than Cluster Insert was my conclusion.

    Regards;
  • 108476
    108476 Member Posts: 2,184
    edited May 2008
    Hi Tommy,
    Which will be better using a normal table or a cluster one?
    That depends on "how" the rows are accessed.

    For example, assume that I have an app that retrieves all orders for a customer, and that the average is 20 per customer.

    - In a "normal" table, new order could go on any data block, and it's unlikely that they would be co-located on a single block. Twenty orders, twenty trip the the database.

    - Using a sorted hash cluster, rows are co-located, allowing the app to fetch all orders in a single logical I/O.

    I have some useful notes here:

    http://www.dba-oracle.com/t_sorted_hash_clusters.htm

    --------------------

    Hope this helps. . .

    Donald K. Burleson
    Oracle Press author
    Author of "Oracle Tuning: The Definitive Reference":
    http://www.dba-oracle.com/bp/s_oracle_tuning_book.htm
  • Tomy3k_Bakr
    Tomy3k_Bakr Member Posts: 110 Blue Ribbon
    Hi Donald,
    first of all thankx a lot for you response. Second, i'm using your website as a refrence to me before implementing any oracle feature. For my case only i think Normal Table will be more faster than Cluster Table, am i right?


    Regards;
  • Tomy3k_Bakr
    Tomy3k_Bakr Member Posts: 110 Blue Ribbon
    Hi Donald,
    Can you please reply to this thread,
    653208
    I realy need it if you can help me.
  • sgalaxy
    sgalaxy Member Posts: 5,687 Bronze Trophy
    "Can you please reply to this thread"
    Irrelevant to this current thread.....
  • 108476
    108476 Member Posts: 2,184
    Hi Tommy,
    For my case only i think Normal Table will be more faster than Cluster Table, am i right?
    It depends!

    Cluster tables are great whn the vast majority of queries use the same index to retrieve multiple rows, and data block adjacency is important to reduce I/O.

    What percentage of your production SQL accesses the rows in index sorted order?
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,579 Gold Crown
    >
    For example, assume that I have an app that retrieves
    all orders for a customer, and that the average is 20
    per customer.

    - Using a sorted hash cluster, rows are co-located,
    allowing the app to fetch all orders in a single
    logical I/O.

    I have some useful notes here:

    http://www.dba-oracle.com/t_sorted_hash_clusters.htm
    The OP said the following in his original post:
    A little note The PK is SL_NO wich is a unique number and all the queries will be like this
    SELECT * FROM OMS_ORDER_XXX WHERE SL_NO = XXXXXX;


    This means sorted hash clusters are irrelevent.

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,579 Gold Crown
    edited May 2008
    Tomy3k_Bakr

    This is expected behaviour.

    I guess that your inserts were either using a pl/sql loop counter or a sequence number. If this is so, the heap table inserts would have been inserting several consecutive rows into the "next new" block in the table, and doing the same with the primary key index.

    The cluster inserts would be inserting each row into "the right block" which would have been scattering them all over the place. Also, when you create a hash cluster, Oracle formats all the blocks before making the cluster available, so the inserts will initially have to read the appropriate block from disk, rather than just "new"ing it in memory.

    In a very active environment, one of the considerations of optimising the use of hash clusters is to ensure that they stay 100% cached - so that the most significant cost of a block visit is a single "consistent get - examination".

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk
This discussion has been closed.