
    daily batch job

    970021

      If I have a daily batch job that needs to pick up the records that were modified during the past 24 hours and do some processing, there is the issue of timing.

       

      If I execute:

       

      insert into TARGET_TABLE
      select * from SOURCE_TABLE
      where MODIFIED_DATE >= to_date(to_char(sysdate - 1, 'YYYYMMDD') || '070000', 'YYYYMMDDHH24MISS')
      and MODIFIED_DATE < to_date(to_char(sysdate, 'YYYYMMDD') || '070000', 'YYYYMMDDHH24MISS')
      

       

      If I have a MODIFIED_DATE column that gets updated by a trigger whenever there is a change, how can I make sure I pick up all the records? Because a transaction might take a few minutes to commit, a record can get its MODIFIED_DATE set inside the range above yet only become visible after the statement above has executed, so it won't be picked up. How do I solve this read-consistency issue?
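
      For reference, the trigger that maintains MODIFIED_DATE is essentially this (a minimal sketch; the trigger name is made up):

      create or replace trigger SOURCE_TABLE_biu
      before insert or update on SOURCE_TABLE
      for each row
      begin
        :new.MODIFIED_DATE := sysdate;  -- stamp every insert/update
      end;
      /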

        • 1. Re: daily batch job
          Mike Kutz

          hint:

          You can 'trunc' a date to year, month ('MM'), day ('DD'), hour ('HH'), or minute ('MI'). (Careful: 'DAY' truncates to the first day of the week, not to midnight.)

          trunc( sysdate, 'DD' ) + 7/24 gives you TODAY @ 07:00
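
          Applied to your original statement (same tables; just a sketch):

          insert into TARGET_TABLE
          select * from SOURCE_TABLE
          where MODIFIED_DATE >= trunc(sysdate, 'DD') - 1 + 7/24  -- yesterday @ 07:00
          and MODIFIED_DATE < trunc(sysdate, 'DD') + 7/24;        -- today @ 07:00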

           

          One idea would be to run it at 08:00 instead of 07:00.

           

          If you still have concerns about lingering transactions, use Advanced Queuing.

          Modify the current trigger to add to the queue "Hey, the row with PK value ___ needs to be processed!"

           

          Now, your daily routine just picks up all the PKs that have been modified since the last time you ran it.
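
          A minimal sketch of that setup (assuming SOURCE_TABLE has a numeric primary key ID; the type and queue names here are made up, and your schema needs the AQ privileges):

          -- one-time setup: a payload type, a queue table, and a queue
          create type change_msg_t as object ( pk_val number );
          /
          begin
            dbms_aqadm.create_queue_table(
                queue_table        => 'change_q_tab',
                queue_payload_type => 'change_msg_t' );
            dbms_aqadm.create_queue(
                queue_name  => 'change_q',
                queue_table => 'change_q_tab' );
            dbms_aqadm.start_queue( queue_name => 'change_q' );
          end;
          /

          -- in the trigger: enqueue the PK; the message only becomes visible
          -- to consumers when the triggering transaction COMMITs
          create or replace trigger SOURCE_TABLE_aiu
          after insert or update on SOURCE_TABLE
          for each row
          declare
            enq_opts  dbms_aq.enqueue_options_t;
            msg_props dbms_aq.message_properties_t;
            msg_id    raw(16);
          begin
            dbms_aq.enqueue(
                queue_name         => 'change_q',
                enqueue_options    => enq_opts,
                message_properties => msg_props,
                payload            => change_msg_t( :new.ID ),
                msgid              => msg_id );
          end;
          /

          -- the daily job drains the queue with NO_WAIT until it is empty
          declare
            deq_opts  dbms_aq.dequeue_options_t;
            msg_props dbms_aq.message_properties_t;
            msg       change_msg_t;
            msg_id    raw(16);
            queue_empty exception;
            pragma exception_init( queue_empty, -25228 ); -- end-of-fetch on dequeue
          begin
            deq_opts.wait := dbms_aq.no_wait;
            loop
              dbms_aq.dequeue(
                  queue_name         => 'change_q',
                  dequeue_options    => deq_opts,
                  message_properties => msg_props,
                  payload            => msg,
                  msgid              => msg_id );
              null; -- process the row whose PK is msg.pk_val here
            end loop;
          exception
            when queue_empty then
              commit; -- everything committed so far has been picked up
          end;
          /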

          • 2. Re: daily batch job
            970021

            Running it at 8 can work, but you will be picking up duplicate records from the previous day. Design-wise, it is not optimal. Although unlikely, a transaction could be stuck for over an hour, etc. Is there a systematic way of making sure I don't miss any records?

            • 3. Re: daily batch job
              rp0428

              970021 wrote:

               

              If I have a daily batch job that needs to pick up the records that were modified during the past 24 hours and do some processing, there is the issue of timing.

               

              If I execute:

               

              insert into TARGET_TABLE
              select * from SOURCE_TABLE
              where MODIFIED_DATE >= to_date(to_char(sysdate - 1, 'YYYYMMDD') || '070000', 'YYYYMMDDHH24MISS')
              and MODIFIED_DATE < to_date(to_char(sysdate, 'YYYYMMDD') || '070000', 'YYYYMMDDHH24MISS')

               

              If I have a MODIFIED_DATE column that gets updated by a trigger whenever there is a change, how can I make sure I pick up all the records? Because a transaction might take a few minutes to commit, a record can get its MODIFIED_DATE set inside the range above yet only become visible after the statement above has executed, so it won't be picked up. How do I solve this read-consistency issue?

              The issue of 'read consistency' isn't your problem; Oracle has already solved that issue.

               

              The fact that 'a transaction might take a few minutes to commit' is YOUR problem. Your code is what waits a few minutes before issuing a commit. Once you issue a COMMIT, Oracle will usually execute it VERY quickly.

               

              A normal trigger does NOT contain transactional code. The transaction is controlled by the code that causes the trigger to fire. Depending on what that code is doing it might be seconds, minutes, hours or even days before a COMMIT is issued.

               

              You either need to accept the risk that you identified or eliminate it. There are several options for eliminating it. One is to make sure all users have completed their transactions before you do the data pull.

               

              Another option is to use a materialized view log to capture the changes, then do your data pull based on the MV log contents, and then delete the log contents for the data you pulled.

               

              Other options include using LogMiner to get the changes from the REDO log files.

              • 4. Re: daily batch job
                Mike Kutz

                The only way to make sure you get all records is to record, somewhere in the database, which records got changed.

                i.e. you record what you want to work on now, then work on it later.

                In other words, what you want is a queuing system!

                So, use one!!!

                Advanced Queuing is part of the database and is easy to set up.

                 

                PS - Have you taken a look at GoldenGate?

                My understanding is that GoldenGate helps you automate the transfer of data from database A to database B.

                If I'm totally wrong about my assumption of what it does, please let me know.

                • 5. Re: daily batch job
                  jihuyao

                  Without a successful commit, nothing actually happens or gets modified, although the attempt can be logged as performed but failed.

                   

                  But the commit for the logging process should be independent of the commit for the data process (no need to wait for the data process to complete; just update its status at the end as successful or failed).

                  • 6. Re: daily batch job
                    jihuyao

                    You can't assume a data process or transaction will be successful before it is completed and committed.

                    • 7. Re: daily batch job
                      970021

                      You either need to accept the risk that you identified or eliminate it. There are several options for eliminating it. One is to make sure all users have completed their transactions before you do the data pull.

                       

                      Another option is to use a materialized view log to capture the changes, then do your data pull based on the MV log contents, and then delete the log contents for the data you pulled.

                       

                      Can you expand on the two options above?

                       

                      In Oracle, is it possible to check for uncommitted transactions by table?

                       

                      How would using a materialized view solve the issue of timing? How would this solve the issue of a transaction that started, let's say, at 6:58 and got committed at 7:04 am the day before? If I refresh the materialized view at 7 today, the record wouldn't be captured because at 7 the transaction is not committed. Wouldn't I have the same problem? I'm sure I'm missing something, please clarify.

                      • 8. Re: daily batch job
                        rp0428

                        Can you expand on the two options above?

                         

                        Sure

                         

                        1. accepting the risk

                        Many orgs perform extracts early in the morning (e.g. 1 am to 3 am) when there would not normally BE any uncommitted transactions from the previous day. Their extract may also occur AFTER other maintenance operations that have already dealt with any uncommitted transactions by terminating the sessions that had them.

                         

                        In those situations there would be little risk of uncommitted DML that might cause an extract problem.

                         

                        For an example of a situation where there IS a problem using a MODIFIED_DATE value to extract data, see my last reply in this thread from April:

                        https://forums.oracle.com/thread/2527467

                         

                        In Oracle, is it possible to check for uncommitted transactions by table?

                        Query the DBA_DML_LOCKS view (and/or the DBA_DDL_LOCKS view)

                        http://docs.oracle.com/cd/E11882_01/server.112/e25513/statviews_3157.htm

                        select * from DBA_DML_LOCKS

                         

                        SESSION_ID OWNER NAME           MODE_HELD   MODE_REQUESTED LAST_CONVERT BLOCKING_OTHERS
                        10         SCOTT EMP_READ_ONLY  Row-X (SX)  None           1163         Not Blocking
                        69         SCOTT EMP_READ_ONLY  Row-X (SX)  None           1648         Not Blocking

                        That shows that two sessions are holding exclusive row locks on the EMP_READ_ONLY table. As a test, I inserted a row from two different sessions but did NOT commit yet.

                         

                        So when you don't want to accept the risk, you put the table into READ ONLY mode, perform the extract, and then put the table back into READ WRITE mode.

                        ALTER TABLE EMP_READ_ONLY READ ONLY

                        You will get an exception if there are outstanding DML operations. Then you can track them down and deal with them.
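
                        In outline (a sketch; I am reusing the 07:00 window from the original post, and the extract assumes the table has the MODIFIED_DATE column):

                        alter table EMP_READ_ONLY read only;   -- raises "resource busy" if uncommitted DML exists

                        -- perform the extract; nobody can modify the table while it is read only
                        insert into TARGET_TABLE
                        select * from EMP_READ_ONLY
                        where MODIFIED_DATE >= trunc(sysdate, 'DD') - 1 + 7/24
                        and MODIFIED_DATE < trunc(sysdate, 'DD') + 7/24;
                        commit;

                        alter table EMP_READ_ONLY read write;  -- back to normal operation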

                         

                         

                        How would using a materialized view solve the issue of timing? How would this solve the issue of a transaction that started, let's say, at 6:58 and got committed at 7:04 am the day before? If I refresh the materialized view at 7 today, the record wouldn't be captured because at 7 the transaction is not committed. Wouldn't I have the same problem? I'm sure I'm missing something, please clarify.

                        I said use an MV log; I didn't say use a materialized view.

                         

                        You can create an MV log on the table so that Oracle will record ALL changes on the table to the LOG.

                         

                        See my detailed reply dated May 23, 2013 1:37 PM in this thread

                        https://forums.oracle.com/forums/thread.jspa?messageID=11034366&#11034366

                         

                        My reply includes sample code and results that show the info captured in an MV log and how you can use it.
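
                        The general shape is this (a sketch; it assumes SOURCE_TABLE has primary key ID, and MLOG$_SOURCE_TABLE is the log table name Oracle generates by default):

                        create materialized view log on SOURCE_TABLE
                          with primary key, sequence
                          including new values;

                        -- batch pull: copy the rows named in the log, then purge what was processed
                        insert into TARGET_TABLE
                        select s.* from SOURCE_TABLE s
                        where s.ID in (select ID from MLOG$_SOURCE_TABLE);

                        delete from MLOG$_SOURCE_TABLE; -- in a real job, delete only the entries you pulled
                        commit;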

                         

                        If possible you should use LogMiner to just grab the changes from the REDO log files.
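
                        For completeness, the LogMiner route looks roughly like this (a sketch for 11g-era releases; the needed redo/archived logs must be available, and the session needs the LogMiner privileges):

                        begin
                          dbms_logmnr.start_logmnr(
                              startTime => sysdate - 1,
                              endTime   => sysdate,
                              options   => dbms_logmnr.dict_from_online_catalog
                                         + dbms_logmnr.continuous_mine );
                        end;
                        /

                        -- the changed rows for the table, as redo SQL
                        select sql_redo
                        from v$logmnr_contents
                        where seg_name = 'SOURCE_TABLE'
                        and operation in ('INSERT', 'UPDATE');

                        exec dbms_logmnr.end_logmnr;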

                        • 9. Re: daily batch job
                          970021

                          This is great information. Thank you so much. A few more questions:

                           

                          You will get an exception if there are outstanding DML operations. Then you can track them down and deal with them.

                           

                          Although we will do this when there is very little DML going on, what is the best way to loop the "alter table XYZ read only" in case there is uncommitted DML going on? Something like looping every 10 seconds for 5 minutes until it can change the table to read only. Any best practice on this?

                           

                          Regarding the use of materialized view logs, is there any issue with two sessions updating the same record, e.g. getting an error or a rollback? How do situations like these get handled by the internal triggers?

                           

                          Also, in the last comment on your post https://forums.oracle.com/forums/thread.jspa?messageID=11034366&#11034366 somebody mentioned that the materialized view log wasn't designed for this and gave an example of when this would fail that I couldn't understand. I tried to do some research on LogMiner, materialized view logs, Streams, and Change Data Capture methods and to compare them, but couldn't find a lot of useful information. I understand that the materialized view log is the only one that uses triggers, impacting performance more or less. The other options use the redo logs. What are the benefits/drawbacks/differences of:

                           

                          LogMiner

                          Asynchronous CDC

                          Streams

                          • 10. Re: daily batch job
                            Mike Kutz

                            970021 wrote:

                            Although we will do this when there is very little DML going on, what is the best way to loop the "alter table XYZ read only" in case there is uncommitted DML going on? Something like looping every 10 seconds for 5 minutes until it can change the table to read only. Any best practice on this?

                             

                             

                            You want to stay away from doing DDL in 'day-to-day code'.

                            Just lock the table in exclusive mode with a WAIT.

                             

                            declare
                            -- variables for looping
                              try_count int := 0;
                              have_lock boolean := false;
                            -- catching "ORA-30006 resource busy; acquire with WAIT timeout expired"
                              lock_timeout exception;
                              pragma exception_init( lock_timeout, -30006 );
                            -- catching "ORA-54 resource busy" (the NOWAIT variant, caught for safety)
                              resource_busy exception;
                              pragma exception_init( resource_busy, -54 );
                            begin
                            -- try 3 times with a timeout of 5 min per try (15 min total)
                              while try_count < 3 and not have_lock
                              loop
                                try_count := try_count + 1;
                                begin
                                  lock table t in exclusive mode wait 300; -- 5 min timeout (WAIT takes an integer literal)
                                  have_lock := true; -- (I forgot this in the original post)
                                exception
                                  when lock_timeout or resource_busy then
                                    -- do nothing; the loop takes care of the retry
                                    null;
                                end;
                              end loop;
                            
                              -- just because we are here in the code
                              -- does not mean we have the lock
                              if not have_lock
                              then
                                -- place LOG_ERROR() here
                                -- place SEND_ERROR_ALERT_EMAIL() here
                                raise lock_timeout;
                              end if;
                            
                              -- if you just need to ensure everybody is 'out' of the table,
                              -- release the lock by calling COMMIT;
                              -- this way, everybody can 'get back to work'
                              commit;
                            
                              -- place your actual code here
                            end;
                            
                            
                            • 11. Re: daily batch job
                              Kapil

                              Can you use SELECT ... FOR UPDATE based on your criteria instead of locking the entire table? That way the whole table won't be locked.

                              • 12. Re: daily batch job
                                970021

                                I need to lock the whole table in this case.


                                What advantages do I have by using LOCK TABLE instead of ALTER TABLE ... READ ONLY, other than not doing DDL?

                                • 13. Re: daily batch job
                                  Kapil

                                  If your transaction is big and runs on for hours, users can still do insert operations after your cut-off time.

                                  Ex. The cut-off time is 7:00 pm for 24 hours of data. Batch start time: 7:00 pm. Processing time: 2 hrs. If LOCK TABLE is used, it will lock the entire table from 7:00 pm to 9:00 pm, and users will have to wait until then. But with SELECT FOR UPDATE, only the data available up to 7:00 pm is locked, and users are free to insert new data, which will be captured/locked in the next day's run.

                                   

                                  This is based on experience, and you may have some specific requirement which I am not aware of.

                                  • 14. Re: daily batch job
                                    Mike Kutz

                                    Dear dbms_random.value(0,100000)

                                     

                                    In both cases, SELECT is not blocked.

                                     

                                    ALTER TABLE .. READ ONLY

                                    all sessions (including yours) will receive an "ORA-12081" when they try to do an insert, update, or delete on that table.

                                    (What would your applications/end-users do if, all of a sudden, this error was thrown?)

                                    You need to run another DDL to make it read/write again.

                                    And the table remains in READ ONLY mode if the DB crashes before your process ends. This just complicates the DR (disaster recovery) process.

                                    In short - this is a bad idea.

                                     

                                    LOCK TABLE ... IN EXCLUSIVE MODE

                                    Your session can run all the inserts, updates, and deletes it wants.

                                    Meanwhile, all other sessions (that want to do insert, update, or delete) just sit there and wait until the lock is released.

                                    This means that the LOCK TABLE statement can be used to force serialization (i.e. SLOW=TRUE), which, from the sound of it, is what you want to accomplish.

                                    You only need to commit/rollback your transaction to release the lock.

                                     

                                    SELECT .. FOR UPDATE

                                    Think of this as a slimmed down version of LOCK TABLE.

                                    Instead of locking the entire table, it only locks the rows that you have selected.

                                    This is the preferred method, as LOCK TABLE can be more of a "sledgehammer to thumbtack" approach.
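
                                    A minimal sketch of the row-level approach (reusing SOURCE_TABLE / TARGET_TABLE from the original post; the 07:00 window is illustrative):

                                    declare
                                      cursor pending is
                                        select *
                                        from SOURCE_TABLE
                                        where MODIFIED_DATE >= trunc(sysdate, 'DD') - 1 + 7/24
                                        and MODIFIED_DATE < trunc(sysdate, 'DD') + 7/24
                                        for update;  -- locks only these rows; waits for any in-flight
                                                     -- transactions on them to commit or roll back
                                    begin
                                      for r in pending loop
                                        insert into TARGET_TABLE values r;  -- copy each locked row
                                      end loop;
                                      commit;  -- releases the row locks
                                    end;
                                    /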

                                     

                                    Message was edited by: MikeKutz Included Kapil's suggestion
