This discussion is archived
7 Replies Latest reply: Jan 13, 2012 11:12 AM by Tubby

Change from merge to insert and create a procedure to update records after

654930 Newbie
Hi,

I have a table that contains about 50M rows per partition, and each day an application runs a MERGE statement against it that inserts about 100k records and performs about 300M updates.

Because I have both updates and inserts, the application does a MERGE.
I need to do at least 500M updates a day, and this solution doesn't have enough performance for that requirement.

I'm thinking about another solution that I'm going to try to implement now, and I'm posting this to get some help on the options I may have.

What I want to start implementing:
- The purpose is to update "last_record_date" using "KEY1", "KEY2" and "KEY3";
- Change the application to do only inserts, into a temporary table.
- Have two procedures: one that picks the new records from the temporary table and inserts them into the final table, and another that picks the records that already exist and performs the updates.
- These two procedures will pick the records and remove them from the temporary table (or mark them as updated/inserted).
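Roughly, the skeleton I have in mind looks like this (table and procedure names are just placeholders, and I haven't tested this yet):

```sql
-- Staging table: the application only inserts here (no indexes needed).
CREATE TABLE my_test_stage
    (KEY1              VARCHAR2(50 BYTE),
     KEY2              VARCHAR2(50 BYTE),
     KEY3              VARCHAR2(50 BYTE),
     last_record_date  VARCHAR2(50 BYTE))
/

-- One procedure that applies the staged rows to the final table in a
-- set-based pass, then purges the staging table.
CREATE OR REPLACE PROCEDURE apply_staged_rows AS
BEGIN
  -- update keys that already exist in the final table
  UPDATE my_test_table t
     SET t.last_record_date =
           (SELECT MAX(s.last_record_date)
              FROM my_test_stage s
             WHERE s.key1 = t.key1
               AND s.key2 = t.key2
               AND s.key3 = t.key3)
   WHERE EXISTS (SELECT 1
                   FROM my_test_stage s
                  WHERE s.key1 = t.key1
                    AND s.key2 = t.key2
                    AND s.key3 = t.key3);

  -- insert keys that do not exist yet
  INSERT INTO my_test_table (key1, key2, key3, last_record_date)
  SELECT s.key1, s.key2, s.key3, MAX(s.last_record_date)
    FROM my_test_stage s
   WHERE NOT EXISTS (SELECT 1
                       FROM my_test_table t
                      WHERE t.key1 = s.key1
                        AND t.key2 = s.key2
                        AND t.key3 = s.key3)
   GROUP BY s.key1, s.key2, s.key3;

  DELETE FROM my_test_stage;
  COMMIT;
END;
/
```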

The volume is very big, so performance is a challenge! That's why I'm posting this: finding a solution with enough performance for the intermediary step will be difficult!
My final table is partitioned by date, and the date is the field I need to update (instead of creating daily partitions I'm creating monthly partitions in order to reduce the frequency of row movement)!
This is currently my table definition:
CREATE TABLE MY_TEST_TABLE
    (KEY1              VARCHAR2(50 BYTE),
     KEY2              VARCHAR2(50 BYTE) NOT NULL,
     KEY3              VARCHAR2(50 BYTE),
     last_record_date  VARCHAR2(50 BYTE) NOT NULL)
  PCTFREE     10
  INITRANS    1
  MAXTRANS    255
   STORAGE   (
    BUFFER_POOL DEFAULT
  )
  NOCACHE
  MONITORING
  ENABLE ROW MOVEMENT
  PARTITION BY RANGE (last_record_date)
  (
  PARTITION MY_TEST_TABLE_201112 VALUES LESS THAN ('20120101')
  PCTFREE     10
  INITRANS    1
  MAXTRANS    255
  NOLOGGING,
  PARTITION MY_TEST_TABLE_201201 VALUES LESS THAN ('20120201')
  PCTFREE     10
  INITRANS    1
  MAXTRANS    255
  NOLOGGING,
  PARTITION MY_TEST_TABLE_201202 VALUES LESS THAN ('20120301')
  PCTFREE     10
  INITRANS    1
  MAXTRANS    255
  NOLOGGING,
  PARTITION def VALUES LESS THAN (MAXVALUE)
  PCTFREE     10
  INITRANS    1
  MAXTRANS    255
  NOLOGGING
  )
  NOPARALLEL
/



-- Constraints for MY_TEST_TABLE

ALTER TABLE MY_TEST_TABLE
ADD CHECK ("KEY1" IS NOT NULL)
/

ALTER TABLE MY_TEST_TABLE
ADD CHECK ("KEY2" IS NOT NULL)
/

ALTER TABLE MY_TEST_TABLE
ADD CHECK ("KEY3" IS NOT NULL)
/
ALTER TABLE MY_TEST_TABLE
ADD CONSTRAINT pk_MY_TEST_TABLE PRIMARY KEY (KEY1, KEY2, KEY3)
USING INDEX
  PCTFREE     10
  INITRANS    2
  MAXTRANS    255
/
Thanks,
Ricardo Tomás
  • 1. Re: Change from merge to insert and create a procedure to update records after
    Tubby Guru
    naoseionome wrote:
    last_record_date                    VARCHAR2(50 BYTE) NOT NULL)
    Are you really using a VARCHAR data type to store date information? That would be my #1 concern out of anything you've posted.

    How is the application currently performing the MERGE, and where is the data it merges coming from? Does it run very frequently for a small number of records, or infrequently for a large number? Are there many concurrent processes all firing off MERGE statements, or is a dedicated process responsible for the MERGE? Also, why doesn't the current solution provide enough performance for you: in what respect is it lacking (have you traced it to see where your time is being spent)?

    Also, have you provided the complete DDL for the table including all indexes?

    What version of Oracle are you using?
    select * from v$version;
  • 2. Re: Change from merge to insert and create a procedure to update records after
    654930 Newbie
    Hi,

    My version is:
    Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

    This is the DDL I used to create the table.

    The merge runs frequently throughout each day, covering about 10M records.

    There are several processes doing the update, but they run sequentially in order to avoid deadlocks (because I cannot control the source, and each process may update the same keys).

    The update uses "KEY1, KEY2, KEY3" to update the value of "LAST_RECORD_DATE". Does it matter whether that column is a DATE or a VARCHAR2?

    I'm still waiting to receive permissions so I can trace the issue, but in the meantime I'm trying to design this workaround so development can be sped up in case it is required! The only feedback I have for now is that the system is not able to process the total amount of records in the proper time (a backlog is being generated).

    Thanks,
    Ricardo
  • 3. Re: Change from merge to insert and create a procedure to update records after
    Tubby Guru
    naoseionome wrote:
    The update uses "KEY1, KEY2, KEY3" to update the value of "LAST_RECORD_DATE". Does it matter whether that column is a DATE or a VARCHAR2?
    Well, the first thing you should do is change the data type from a varchar to a date. The major impact being that a date is not a string, it's a date. If you have processes adding data in different date formats (possible when you use a character to store the information) you're going to be very sad (good luck converting it back to an actual date).

    An example of how this can easily happen.
    ME_TUBBZ?select to_char(sysdate) from dual;
     
    TO_CHAR(SYSDATE)
    -----------------------------
    12-jan-2012 10:55:33
     
    1 row selected.
     
    Elapsed: 00:00:00.01
    ME_TUBBZ?alter session set nls_date_format = 'yyyy-mon-dd';
     
    Session altered.
     
    Elapsed: 00:00:00.01
    ME_TUBBZ?select to_char(sysdate) from dual;
     
    TO_CHAR(SYSDATE)
    --------------------
    2012-jan-12
    
    1 row selected.
    
    Elapsed: 00:00:00.00
    Assuming the client passes the column as a DATE, Oracle will have to do an implicit conversion in order to store the data as a string, and that conversion relies on the current NLS settings for the session. This also makes it possible for someone to put completely non-date information into the column (super bad). It's just really bad form to use strings to store dates, numbers, or anything else that isn't naturally a string.
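    If you do eventually convert the column, one possible path (just a sketch, untested, and it won't work while the column is still the partition key) is to add a real DATE column, backfill it with an explicit format mask so nothing depends on NLS settings, and then swap the columns:

```sql
ALTER TABLE my_test_table ADD (last_record_dt DATE);

-- Explicit format mask: no dependency on session NLS settings.
-- Assumes the existing strings are all stored as 'yyyymmdd'.
UPDATE my_test_table
   SET last_record_dt = TO_DATE(last_record_date, 'yyyymmdd');

ALTER TABLE my_test_table DROP COLUMN last_record_date;
ALTER TABLE my_test_table RENAME COLUMN last_record_dt TO last_record_date;
```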

    Based on your problem description, I can't imagine how your partitioning scheme does anything other than require more work to process the updates (which are the bulk of your requirement). Partitioning doesn't equate to a performance gain. It's a tool that, like all tools, has its place, but I don't think it's warranted for what you are doing (certainly not the way you are currently using it).

    I'm not saying partitioning can't be useful for you, but I don't think partitioning on the column you are currently using is "good".

    Something I would personally investigate, were I you, is an Index Organized Table (this requires being able to compare the approach with your current one, so you need metrics for the comparison). It will require more time to perform INSERT operations, but those are a tiny fraction of the total work this process does. For the updates, it should eliminate a great deal of IO, assuming you are currently doing an index lookup followed by a table access by ROWID to perform each update. This recommendation assumes everything you've told us is accurate: you have 4 columns in your table, 3 of which are components of the primary key while the 4th is a date (note I didn't say string :) ), you primarily update rather than insert, and you have no secondary indexes on this entity (and none are needed).
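    For illustration, an IOT version of your table might look something like this (a sketch only; storage clauses omitted, and I've kept your VARCHAR2 date column even though I'd change it):

```sql
CREATE TABLE my_test_table_iot
    (KEY1              VARCHAR2(50 BYTE) NOT NULL,
     KEY2              VARCHAR2(50 BYTE) NOT NULL,
     KEY3              VARCHAR2(50 BYTE) NOT NULL,
     last_record_date  VARCHAR2(50 BYTE) NOT NULL,
     CONSTRAINT pk_my_test_table_iot PRIMARY KEY (KEY1, KEY2, KEY3))
  ORGANIZATION INDEX
/
```

    With an IOT the row is stored in the primary key index itself, so an update locates the key and the column in one structure instead of index plus table.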

    I would look into that approach before the temporary-table solution you are currently considering.

    Cheers,
  • 4. Re: Change from merge to insert and create a procedure to update records after
    654930 Newbie
    Thanks for the recommendations.

    I will check that. I started building the skeleton for the solution with the temporary table, and now that it's on paper it doesn't seem as good as I initially thought!

    I have partitions just for housekeeping!

    Otherwise I would have to delete millions of rows every month; with partitions I can just drop the previous month's partition...

    I will share the feedback as soon as I have some results.

    Thanks,
    Ricardo
  • 5. Re: Change from merge to insert and create a procedure to update records after
    Tubby Guru
    naoseionome wrote:
    I have partitions just because of housekeeping!

    I will have to delete millions of rows every months or I just drop the previous month partition...
    Well, you know the data better than I do, but I don't see how that could work. If the process is continually issuing updates to the date column based on the primary key, how can you know when it's "safe" to remove the data, since the date seems to be constantly changing? Is there something else in the data (perhaps the PK columns) that allows you to make this determination?
    naoseionome wrote:
    I will share the feedback as soon as I have some results.
    That'd be great. I'll put a watch on this thread so I don't forget to check back :)
  • 6. Re: Change from merge to insert and create a procedure to update records after
    654930 Newbie
    Hi,

    I had to remove the partitioning because the partition key was not part of the primary key (for an IOT, the partition key must be a subset of the primary key)!
    Dropping partitions had been safe because information older than one month is useless, so a whole partition could be removed safely!

    I have changed the table and the system is up and running.
    This solution fits my requirements.
    I don't have exact numbers to share because I'm not able to get them at the moment, but I think it performed about 1M inserts and 500M updates, and the system almost doubled its throughput!
    The system is currently able to process all the information, so I don't need to apply any extra workaround!

    I haven't changed the date field because I don't know the impact on everything that depends on this table, but that will be part of next week's work :)

    My DDL ended up like this:
    CREATE TABLE fms_r_imsi_imei_msc_hua
        (KEY1              VARCHAR2(50 BYTE),
         KEY2              VARCHAR2(50 BYTE),
         KEY3              VARCHAR2(50 BYTE),
         last_record_date  VARCHAR2(50 BYTE),
      CONSTRAINT PK_FMS_IMSI_IMEI_MSC_HUA
      PRIMARY KEY (KEY1, KEY2, KEY3))
      ORGANIZATION INDEX
      NOPARALLEL
      LOGGING
    /
    Thank you so much for the help; it helped me find the best way forward,
    Ricardo Tomás

    Edited by: naoseionome on Jan 13, 2012 11:44 PM
  • 7. Re: Change from merge to insert and create a procedure to update records after
    Tubby Guru
    Awesome, glad to hear I was able to help.

    One thing you'll have to work out is the monthly purge of information. I agree that a simple partition drop would be much easier, but if that doesn't support the daily needs of the application then it's really not an option :)

    I'm not sure how many rows you need to purge relative to the existing data, or whether you have any maintenance windows, but it might be easiest to have a monthly process that does a CTAS (CREATE TABLE ... AS SELECT) selecting the rows you need to retain (as opposed to those you need to delete), drops the old table, and renames the new table created by the CTAS (it would have to be created as an IOT as well). How best to proceed is for you to figure out, of course; that's just one possible option.
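    A very rough sketch of that monthly rebuild (table names and the retention predicate are assumptions, and this assumes the dates are stored as 'yyyymmdd' strings):

```sql
-- Keep only the rows from the last month, rebuilt as an IOT.
CREATE TABLE my_test_table_new
    (key1, key2, key3, last_record_date,
     CONSTRAINT pk_my_test_table_new PRIMARY KEY (key1, key2, key3))
  ORGANIZATION INDEX
AS
SELECT key1, key2, key3, last_record_date
  FROM my_test_table
 WHERE last_record_date >= TO_CHAR(ADD_MONTHS(SYSDATE, -1), 'yyyymmdd')
/

DROP TABLE my_test_table PURGE
/

RENAME my_test_table_new TO my_test_table
/
```

    You'd also need to recreate any grants on the table, and ideally run this in a maintenance window since the table briefly doesn't exist.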
