
daily batch job

970021 Newbie

If I have a daily batch job that needs to pick up the records that have been modified during the past 24 hours and do some processing, there is an issue of timing.

 

If I execute:

 

insert into TARGET_TABLE
select * from SOURCE_TABLE
where MODIFIED_DATE >= to_date(to_char(sysdate - 1, 'YYYYMMDD') || '070000', 'YYYYMMDDHH24MISS')
  and MODIFIED_DATE <  to_date(to_char(sysdate, 'YYYYMMDD') || '070000', 'YYYYMMDDHH24MISS')

 

If I have a MODIFIED_DATE column that gets updated by a trigger whenever there is a change, how can I make sure I pick up all the records? Because of read consistency, and because a transaction might take a few minutes to commit, a record can get its MODIFIED_DATE set within the range above yet only become visible after the statement above has executed, so it won't be picked up. How do I solve this read-consistency issue?

  • 1. Re: daily batch job
    Mike Kutz Expert

    hint:

    You can TRUNC a date by year ('YYYY'), month ('MM'), day ('DD'), hour ('HH'), or minute ('MI'). Note that 'DAY' truncates to the first day of the week, not to midnight.

    trunc( sysdate, 'DD' ) + 7/24 gives you TODAY @ 07:00
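
    Putting that together, a minimal sketch of the window boundaries (assuming the same 07:00-to-07:00 window as the original query):

    select trunc(sysdate, 'DD') - 1 + 7/24 as window_start  -- yesterday @ 07:00
         , trunc(sysdate, 'DD')     + 7/24 as window_end    -- today @ 07:00
    from dual;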

     

    One idea would be to run it at 08:00 instead of 07:00.

     

    If you still have concerns about lingering transactions, use Advanced Queuing.

    Modify the current trigger to add to the queue "Hey, the row with PK value ___ needs to be processed!"

     

    Now, your daily routine just picks up all the PKs that have been modified since the last time you ran it.
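
    A minimal sketch of that pattern using a plain table as the queue (PENDING_CHANGES, TRG_SOURCE_QUEUE, and the PK column ID are illustrative names; a full AQ solution would enqueue via DBMS_AQ.ENQUEUE instead):

    create table pending_changes (
      pk_value  number not null,
      queued_at date   default sysdate not null
    );

    create or replace trigger trg_source_queue
    after insert or update on source_table
    for each row
    begin
      -- this insert commits (or rolls back) together with the user's
      -- transaction, so the daily job only ever sees committed work
      insert into pending_changes (pk_value) values (:new.id);
    end;
    /

    The daily job then reads PENDING_CHANGES, processes those PKs, and deletes the rows it handled.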

  • 2. Re: daily batch job
    970021 Newbie

    Running it at 8 can work, but you will be picking up duplicate records from the previous day. Design-wise, it is not optimal. Although unlikely, a transaction could be stuck for over an hour, etc. Is there a systematic way of making sure I don't miss any records?

  • 3. Re: daily batch job
    rp0428 Guru

    970021 wrote:

     

    If I have a daily batch job that needs to pick up the records that have been modified during the past 24 hours and do some processing, there is an issue of timing.

     

    If I execute:

     

    insert into TARGET_TABLE
    select * from SOURCE_TABLE
    where MODIFIED_DATE >= to_date(to_char(sysdate - 1, 'YYYYMMDD') || '070000', 'YYYYMMDDHH24MISS')
      and MODIFIED_DATE <  to_date(to_char(sysdate, 'YYYYMMDD') || '070000', 'YYYYMMDDHH24MISS')

     

    If I have a MODIFIED_DATE column that gets updated by a trigger whenever there is a change, how can I make sure I pick up all the records? Because of read consistency, and because a transaction might take a few minutes to commit, a record can get its MODIFIED_DATE set within the range above yet only become visible after the statement above has executed, so it won't be picked up. How do I solve this read-consistency issue?

    The issue of 'read consistency' isn't your problem; Oracle has already solved that issue.

     

    The fact that 'a transaction might take a few minutes to commit' is YOUR problem. Your code is what is waiting a few minutes before issuing a commit. Once you issue a COMMIT, Oracle will usually execute it VERY quickly.

     

    A normal trigger does NOT contain transactional code. The transaction is controlled by the code that causes the trigger to fire. Depending on what that code is doing it might be seconds, minutes, hours or even days before a COMMIT is issued.

     

    You either need to accept the risk that you identified or eliminate it. There are several options for eliminating it. One is to make sure all users have completed their transactions before you do the data pull.

     

    Another option is to use a materialized view log to capture the changes, do your data pull based on the MV log contents, and then delete the log contents for the data you pulled.
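
    As a sketch, creating such a log could look like this (SOURCE_TABLE and MODIFIED_DATE are from the original post; by default the changes land in a table named MLOG$_SOURCE_TABLE):

    create materialized view log on source_table
      with primary key, sequence (modified_date)
      including new values;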

     

    Other options include using LogMiner to get the changes from the REDO log files.

  • 4. Re: daily batch job
    Mike Kutz Expert

    The only way to make sure you get all records is to record what records got changed somewhere in the database.

    i.e. you record what you want to work on, then (later) work on it.

    In this fashion, what you want is a queuing system!

    So, use one!!!

    Advanced Queuing is part of the database and is easy to set up.
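
    For example, a minimal AQ setup might look like this (the queue names are illustrative; the payload here is just a RAW holding the PK value):

    begin
      dbms_aqadm.create_queue_table(
        queue_table        => 'batch_pk_qt',
        queue_payload_type => 'RAW');
      dbms_aqadm.create_queue(
        queue_name  => 'batch_pk_q',
        queue_table => 'batch_pk_qt');
      dbms_aqadm.start_queue(
        queue_name => 'batch_pk_q');
    end;
    /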

     

    PS - Have you taken a look at GoldenGate?

    My understanding is that GoldenGate helps you automate the transfer of data from database A to database B.

    If I'm totally wrong about my assumption of what it does, please let me know.

  • 5. Re: daily batch job
    jihuyao Journeyer

    Without a successful commit, nothing should actually happen or be modified, although the attempt can be logged as performed but failed.

     

    But the commit on the logging process should be independent of the commit on the data process (no need to wait for the data process to complete; just update its status at the end as successful or failed).

  • 6. Re: daily batch job
    jihuyao Journeyer

    You can't assume a data process or transaction will be successful before it has completed and committed.

  • 7. Re: daily batch job
    970021 Newbie

    You either need to accept the risk that you identified or eliminate it. There are several options for eliminating it. One is to make sure all users have completed their transactions before you do the data pull.

     

    Another option is to use a materialized view log to capture the changes, do your data pull based on the MV log contents, and then delete the log contents for the data you pulled

     

    Can you expand on the two options above?

     

    In Oracle, is it possible to check which transactions are uncommitted, by table?

     

    How would using a materialized view solve the issue of timing? How would this solve the issue of a transaction that started, let's say, at 6:58 and got committed at 7:04 am the day before? If I refresh the materialized view at 7 today, the record wouldn't be captured because at 7 the transaction is not committed. Wouldn't I have the same problem? I'm sure I'm missing something, please clarify.

  • 8. Re: daily batch job
    rp0428 Guru

    Can you expand on the two options above?

     

    Sure

     

    1. accepting the risk

    Many orgs perform extracts in the early morning (e.g. 1 am to 3 am) when there would not normally BE any uncommitted transactions from the previous day. Their extract may also occur AFTER other maintenance operations that have already dealt with any uncommitted transactions by terminating the sessions that had them.

     

    In those situations there would be little risk of uncommitted DML that might cause an extract problem.

     

    For an example of a situation where there IS a problem using a MODIFIED_DATE value to extract data, see my last reply in this thread from April:

    https://forums.oracle.com/thread/2527467

     

    In Oracle, is it possible to check which transactions are uncommitted, by table?

    Query the DBA_DML_LOCKS view (and/or the DBA_DDL_LOCKS view):

    http://docs.oracle.com/cd/E11882_01/server.112/e25513/statviews_3157.htm

    select * from DBA_DML_LOCKS

     

    SESSION_ID  OWNER  NAME           MODE_HELD   MODE_REQUESTED  LAST_CONVERT  BLOCKING_OTHERS
            10  SCOTT  EMP_READ_ONLY  Row-X (SX)  None                    1163  Not Blocking
            69  SCOTT  EMP_READ_ONLY  Row-X (SX)  None                    1648  Not Blocking

    That shows that two sessions are holding row-exclusive locks on the EMP_READ_ONLY table. As a test, I inserted a row from two different sessions but did NOT commit yet.

     

    So when you don't want to accept the risk, you put the table into READ ONLY mode, perform the extract, and then put the table back into READ WRITE mode.

    ALTER TABLE EMP_READ_ONLY READ ONLY

    You will get an exception if there are outstanding DML operations. Then you can track them down and deal with them.
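
    A sketch of the whole sequence (assuming the ALTER raises ORA-00054 while uncommitted DML is outstanding):

    alter table emp_read_only read only;   -- errors out while uncommitted DML exists
    -- ... perform the extract ...
    alter table emp_read_only read write;  -- reopen the table for DML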

     

     

    How would using a materialized view solve the issue of timing? How would this solve the issue of a transaction that started, let's say, at 6:58 and got committed at 7:04 am the day before? If I refresh the materialized view at 7 today, the record wouldn't be captured because at 7 the transaction is not committed. Wouldn't I have the same problem? I'm sure I'm missing something, please clarify.

    I said use an MV log; I didn't say use a materialized view.

     

    You can create an MV log on the table so that Oracle will record ALL changes to the table in the log.
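
    As a hedged sketch, the daily pull against the log could look like this (assuming SOURCE_TABLE has a primary key column ID; MLOG$_SOURCE_TABLE is the default log name):

    select id, dmltype$$            -- I = insert, U = update, D = delete
    from mlog$_source_table;

    delete from mlog$_source_table; -- clear only what you have pulled
    commit;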

     

    See my detailed reply dated May 23, 2013 1:37 PM in this thread

    https://forums.oracle.com/forums/thread.jspa?messageID=11034366&#11034366

     

    My reply includes sample code and results that show the info captured in an MV log and how you can use it.

     

    If possible, you should use LogMiner to grab the changes directly from the REDO log files.

  • 9. Re: daily batch job
    970021 Newbie

    This is great information. Thank you so much. A few more questions:

     

    You will get an exception if there are outstanding DML operations. Then you can track them down and deal with them.

     

    Although we will do this when there is very little DML going on, what is the best way to loop the "alter table XYZ read only" in case there is uncommitted DML in flight? Something like looping every 10 seconds for 5 minutes until it can change the table to read only. Any best practice on this?

     

    Regarding the use of materialized view logs, is there any issue with two sessions updating the same record, e.g. getting an error or a rollback? How do situations like these get handled by the internal triggers?

     

    Also, the last comment on your post https://forums.oracle.com/forums/thread.jspa?messageID=11034366&#11034366

    somebody mentioned that the materialized view log wasn't designed for this and gave an example of when it would fail that I couldn't understand. I tried to do some research on LogMiner, materialized view logs, Streams, and Change Data Capture, and to compare them, but couldn't find a lot of useful information. I understand that the materialized view log is the only one that uses triggers, impacting performance more or less. The other options use the redo logs. What are the benefits/drawbacks/differences of:

     

    LogMiner

    Asynchronous CDC

    Streams

  • 10. Re: daily batch job
    Mike Kutz Expert

    970021 wrote:

    Although we will do this when there is very little DML going on, what is the best way to loop the "alter table XYZ read only" in case there is uncommitted DML in flight? Something like looping every 10 seconds for 5 minutes until it can change the table to read only. Any best practice on this?

     

     

    You want to stay away from doing DDL in 'day-to-day code'.

    Just lock the table in exclusive mode with a WAIT.

     

    declare
      -- variables for looping
      try_count int := 0;
      have_lock boolean := false;
      -- for catching "ORA-00054 resource busy"
      lock_timeout exception;
      pragma exception_init( lock_timeout, -54 );
    begin
      -- try 3 times with a timeout of 5 min per try (15 min total)
      while try_count < 3 and not have_lock
      loop
        try_count := try_count + 1;
        begin
          lock table t in exclusive mode wait 300; -- 5 min timeout
          have_lock := true;
        exception
          when lock_timeout then
            null; -- do nothing; the loop takes care of the retry
        end;
      end loop;
      -- reaching this point does not mean we have the lock
      if not have_lock
      then
        -- place LOG_ERROR() here
        -- place SEND_ERROR_ALERT_EMAIL() here
        raise lock_timeout;
      end if;
      -- if you just need to ensure everybody is 'out' of the table,
      -- release the lock by calling COMMIT;
      -- this way, everybody can 'get back to work'
      commit;
      -- place your actual code here
    end;
    /
    
  • 11. Re: daily batch job
    Kapil Newbie

    Can you use SELECT ... FOR UPDATE based on your criteria instead of locking the entire table? This won't lock the entire table.

  • 12. Re: daily batch job
    970021 Newbie

    I need to lock the whole table in this case.


    What advantages do I have by using LOCK TABLE instead of ALTER TABLE ... READ ONLY, other than not doing DDL?

  • 13. Re: daily batch job
    Kapil Newbie

    If your transaction is big and runs for hours, users can still do insert operations after your cut-off time.

    Ex.: The cut-off time is 7:00 pm for 24 hours of data. Batch start time: 7:00 pm. Process time: 2 hrs. If LOCK TABLE is used, it will lock the entire table from 7:00 pm to 9:00 pm, and users will have to wait until then. But with SELECT ... FOR UPDATE, only the data available up to 7:00 pm is locked, and users are free to insert new data, which will be captured/locked in the next day's run.

     

    This is based on experience, and you may have some specific requirement which I am not aware of.

  • 14. Re: daily batch job
    Mike Kutz Expert

    Dear dbms_random.value(0,100000)

     

    In both cases, SELECT is not blocked.

     

    ALTER TABLE .. READ ONLY

    All sessions (including yours) will receive an "ORA-12081" when they try to do an insert, update, or delete on that table.

    (What would your applications/end-users do if, all of a sudden, this error was thrown?)

    You need to run another DDL to make it read/write again.

    And the table remains in READ ONLY mode if the DB crashes before your process ends. This just complicates the DR (disaster recovery) process.

    In short: this is a bad idea.

     

    LOCK TABLE ... IN EXCLUSIVE MODE

    Your session can run all the inserts, updates, and deletes it wants.

    Meanwhile, all other sessions (that want to do an insert, update, or delete) just sit there and wait until the lock is released.

    This means that the LOCK TABLE statement can be used to force serialization (i.e. SLOW=TRUE), which, from the sound of it, is what you want to accomplish.

    You only need to commit/rollback your transaction to release the lock.

     

    SELECT .. FOR UPDATE

    Think of this as a slimmed-down version of LOCK TABLE.

    Instead of locking the entire table, it only locks the rows that you have selected.

    This is the preferred method as the LOCK TABLE can be more of a "sledgehammer to thumbtack" approach.
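
    A minimal sketch, reusing the window from the original post (add NOWAIT or WAIT n after FOR UPDATE to control how long you are willing to block):

    select *
    from source_table
    where modified_date >= trunc(sysdate, 'DD') - 1 + 7/24
      and modified_date <  trunc(sysdate, 'DD')     + 7/24
    for update;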

     

