7 Replies Latest reply: Sep 29, 2013 8:11 PM by rp0428 RSS

    How to prevent duplication on a column with condition

    Ann586341

      Hello everyone,

       

      I need some advice here. At work, we have an Oracle APEX app that allow user to add new records with the automatic increment decision number based on year and group name.

       

      Says if they add the first record , group name AA, for year 2012, they get decision number AA 1 2013 as their displayed record casein the report page.

       

      The second record of AA in 2013 will be AA 2 2013.

       

      If they add about 20 records , it will be AA 20 2013.

       

       

       

      The first record for 2014 will be AA 1 2014.

       

      However, recently , we get a user complaint about two records from the same group name have the same decision number.

       

      When I looked into the history table, and find that the time gap between 2 record is just about 0.1 seconds.

       

       

       

      Besides, we have lookup table that allows admin user to update the Start Sequence number with the restraint that it has to be larger than the max number of the current group name of the current year.

       

      This Start sequence number and group name is stored together in a table.

       

      And in some other special case,user can add a duplicate decision number for related record. (this is a new function)

       

      The current procedure logic to add new record on the application are

       

      _Get max(decision_number) from record table with chosen Group Name and current year.

       

      _insert into the record table the new entered record with decision number + 1

       

      _ update sequence number to the just added decision number.

       

       

       

      So rather than utitlising APEX built-in automatic table modification process, I write a procedure that combine all the three process.

       

      I run some for loop to continuously execute this procedure, and it seems it can autotically generate new unique decision number with time gap about 0.1 second.

       

      However, when I increase the number of entry to 200, and let two users run 100 each.

       

      If the time gap is about 0.01 second, Duplicate decision numbers appear.

       

       

       

      What can I do to prevent the duplication ?

       

      I cannot just apply a unique constraint here even for all three columns with condition, as it can have duplicate value in some special condition. I don't know much about using lock and its impact.

       

      This is the content of my procedure

      create or replace

       

      PROCEDURE        add_new_case(

       

        --ID just use the trigger

       

        p_case_title IN varchar2,

       

        p_year IN varchar2,

       

        p_group_name IN VARCHAR2,

       

        --decisionnumber here

       

        p_case_file_number IN VARCHAR2,

       

        --active

       

        p_user IN VARCHAR2

       

        )

       

      AS

       

        default_value NUMBER;

       

          caseCount NUMBER;

       

        seqNumber NUMBER;

       

        previousDecisionNumber NUMBER;

       

      BEGIN

       

        --execute immediate q'[alter session set nls_date_format='dd/mm/yyyy']';

       

        SELECT count(*)

       

              INTO caseCount

       

              FROM CASE_RECORD

       

              WHERE GROUP_ABBR = p_group_name

       

              AND to_number(to_char(create_date, 'yyyy')) = to_number(to_char(date_utils.get_current_date, 'yyyy'));

       

       

              SELECT max(decision_number)

       

              INTO previousDecisionNumber

       

              FROM CASE_RECORD

       

              WHERE GROUP_ABBR = p_group_name

       

              AND to_number(to_char(create_date, 'yyyy')) = to_number(to_char(date_utils.get_current_date, 'yyyy'));

       

       

              IF p_group_name IS NULL

       

              THEN seqNumber := 0;

       

              ELSE   

       

              SELECT seq_number INTO seqNumber FROM GROUP_LOOKUP WHERE ABBREVATION = p_group_name;

       

              END IF;

       

       

          IF caseCount > 0 THEN

       

                 default_value := greatest(seqNumber, previousdecisionnumber)+1;

       

          ELSE

       

                 default_value := 1;

       

          END IF; 

       

        INSERT INTO CASE_RECORD(case_title, decision_year, GROUP_ABBR, decision_number, case_file_number, active_yn, created_by, create_date)

       

        VALUES(p_case_title, p_year, p_group_name, default_value, p_case_file_number, 'Y', p_user, sysdate );

       

        --Need to update sequence here also

       

        UPDATE GROUP_LOOKUP

       

        SET SEQ_NUMBER = default_value

       

        WHERE ABBREVATION = p_group_name;

       

        COMMIT;

       

       

       

      EXCEPTION

       

      WHEN OTHERS THEN

       

          logger.error(p_message_text => SQLERRM

       

                      ,p_message_code => SQLCODE

       

                      ,p_stack_trace  => dbms_utility.format_error_backtrace

       

                      );

       

          RAISE;

       

       

       

      END;

      Many thanks in advance,

      Ann

        • 1. Re: How to prevent duplication on a column with condition
          davidp 2

          This is easiest to solve for the case when p_group_name is not null. In that case you are going to update a GROUP_LOOKUP row, so you can select FOR UPDATE that row at the start, to prevent two cases for the same group being added at the same time. To do that, change the select from GROUP_LOOKUP to:

          SELECT seq_number INTO seqNumber FROM GROUP_LOOKUP WHERE ABBREVATION = p_group_name FOR UPDATE OF SEQ_NUMBER;

          and move this to being the first thing the procedure does - before it counts CASE_RECORD rows.

          In the case when p_group_name is null, you need some object to lock on. I think the best you can do is to lock the whole GROUP_LOOKUP table:

          lock table GROUP_LOOKUP in exclusive mode wait 100;

          The "wait 100" means it will wait up to 100 seconds before giving up and raising an error. in practice you'd expect to only wait a moment.

          Exclusive mode allows others to read but not update the table.

          Both the FOR UPDATE and LOCK TABLE will make updates from other sessions wait for this transaction to commit. Queries from other sessions are unaffected.

          The locks are released when you commit or roll back.

          • 2. Re: How to prevent duplication on a column with condition
            marcusafs

            What uniquely identifies a row?  If you can't define a unique identifier I would contend that this design will not allow you to prevent bad data from getting into the database without locking the table.  Locking the table can lead to a loss of productivity when someone is in the middle of a transaction and they leave for lunch or a weeks vacation and forget to commit.  Then someone has to unlock the table and the person who left may lose their work.

            • 3. Re: How to prevent duplication on a column with condition
              Etbin

              Why not using a sequence for populating the decision_number column ?

              Sequence values are guaranteed to be unique so there's no need to lock anything.

              You'll inevitably have gaps and no different groups will have the same decision_number in common.

              Having to deal with consecutive numbers fixations you can proceed as

               

              with

              case_record as

              (select 2012 decision_year,'AA' group_abbr,1 decision_number from dual union all

              select 2012,'BB',2 from dual union all

              select 2012,'AA',21 from dual union all

              select 2012,'AA',22 from dual union all

              select 2012,'BB',25 from dual union all

              select 2013,'CC',33 from dual union all

              select 2013,'CC',34 from dual union all

              select 2013,'CC',36 from dual union all

              select 2013,'BB',37 from dual union all

              select 2013,'AA',38 from dual union all

              select 2013,'AA',39 from dual union all

              select 2013,'BB',41 from dual union all

              select 2013,'AA',42 from dual union all

              select 2013,'AA',43 from dual union all

              select 2013,'BB',45 from dual

              )

              select decision_year,

                     group_abbr,

                     row_number() over (partition by decision_year,group_abbr order by decision_number) decision_number,

                     decision_number sequence_number -- not shown (noone needs to know you're using a sequence)

                from case_record

              order by decision_year,group_abbr,decision_number

               

              DECISION_YEARGROUP_ABBRDECISION_NUMBERSEQUENCE_NUMBER
              2012AA11
              2012AA221
              2012AA322
              2012BB12
              2012BB225
              2013AA138
              2013AA239
              2013AA342
              2013AA443
              2013BB137
              2013BB241
              2013BB345
              2013CC133
              2013CC234
              2013CC336

               

              for retrieval (assuming decision_year,group_abbr,decision_number as being the key):

               

              select decision_year,group_abbr,decision_number -- the rest of columns

                from (select decision_year,

                             group_abbr,

              -- the rest of columns

                             row_number() over (partition by decision_year,group_abbr order by decision_number) decision_number

                        from case_record

                       where decision_year = :decision_year

                         and group_abbr = :group_abbr

                     )

              where decision_number = :decision_number

               

              DECISION_YEARGROUP_ABBRDECISION_NUMBER
              2013AA4

               

              if that's acceptable

               

              Regards

               

              Etbin

              • 4. Re: How to prevent duplication on a column with condition
                Ann586341

                Thank you everyone. I may have to go with the lock solution. Using Sequence is also a great idea. But decision number can be

                • duplicate if user turn on allow_duplicate (new enhancement we just aleady deployed)
                • decision number may not be consecutive. e.g. the latest Decision number if year 2012 is 12, admin user can set the sequence_number is 15, the next decision number will be 16 not 13.

                About What uniquely identifies a row? we do have a primary key column. But this column is nothing to do with the combination of Year, Group Name, and Decision Number that the business people want.

                The system have run for a couple of years, and we just get one complaints regarding the duplicate issue.

                I will investigate to see how many users for this system, if there are not too many, hopefully we don't run into lock often. I may also create some validation to ensure there must have a group name, so we don't have to lock the whole table.

                • 5. Re: How to prevent duplication on a column with condition
                  davidp 2

                  If all the locking happens within a single procedure, with a COMMIT at the end of that procedure (and that is what I suggested) then the lock is never held outside that procedure. With the steps between locking and committing being short, the lock is only held for a few milliseconds, so you don't need to worry about people being significantly blocked in the way MarcusAFS described.

                  I have used this sort of locking to maintain integrity in triggers quite a few times without problems.

                  • 6. Re: How to prevent duplication on a column with condition
                    Ann586341

                    Thanks a lot David.

                    Ann

                    • 7. Re: How to prevent duplication on a column with condition
                      rp0428

                      Presumably you know that by using your own table like that for assigning values you are SERIALIZING the process. So that approach will not be scalable or performant for large numbers of assignments. Performance will be even worse if you are using RAC.

                       

                      And speaking of performance you are further degrading performance by preventing Oracle from using an index on the table to quickly find the rows you need.

                       

                      Why are you doing this?

                      AND to_number(to_char(create_date, 'yyyy')) = to_number(to_char(date_utils.get_current_date, 'yyyy'));

                      I'm assuming that 'CREATE_DATE' is a column of the table and that the datatype is DATE.

                       

                      The use of a function (e.g. TO_CHAR) on the column guarantees that Oracle can't use an index (except a functional index). If an index is not used then each of those queries that use that construct will be doing a full table scan.

                       

                      That can seriously degrade normal performance for queries on a table but when your queries are for a serialized process you are just introducing totally unnecessary performance issues.

                       

                      You only need to check that 'CREATE_DATE' is for the same year as what 'DATE_UTILS.GET_CURRENT_DATE' returns. So you should:

                       

                      1. Remove the functions on 'CREATE_DATE'

                       

                      2. Construct the proper year boundaries based on the 'GET_CURRENT_DATE' values

                       

                      3. Use a BETWEEN clause that specifies those boundaries

                       

                      4. Add an appropriate index on GROUP_ABBR, CREATE_DATE so that queries can use the index and improve the performance of your serialized process

                       

                      That predicate would then look something like:

                      v_first_day_of_current_year DATE;
                      v_last_day_of_current_year DATE;
                      v_first_day_of_current_year := ??? -- compute this by TRUNC (date_utils.get_current_date, 'yyyy');
                      v_last_day_of_current_year := ??? -- compute this by ADD_MONTHS(v_first_day_of_current_year, 12) - 1');
                      --
                      -- then use those values in the BETWEEN clause
                         AND create_date BETWEEN v_first_day_of_current_year AND v_last_day_of_current_year;

                      If you are using 11g you could also add a virtual column (e.g. CREATE_DATE_YEAR) that uses the year (e.g. EXTRACT(year FROM create_date)) and index on that virtual column and GROUP_ABBR.

                       

                      But even if you need to use portions of a DATE value at least use the functionality (i.e. EXTRACT) that Oracle provides rather than your own code:

                       

                      See EXTRACT in the SQL Language doc

                      http://docs.oracle.com/cd/B28359_01/server.111/b28286/functions052.htm