7 Replies Latest reply: Jan 23, 2013 8:57 AM by chris227 RSS

    REGEXP_REPLACE to conditionally remove vowels

    Another_user
      I am dynamically creating triggers on most of the tables in my schema. The general naming convention is <TABLE_NAME> || AUDIT.  Oracle obviously restricts us to 30 characters in the object name so some of my DDLs would fail for the name being too long.  It is easy to generate these when there are <=  24 characters, as 24 + (AUDIT) = 30 in length. So I am looking for a regular expression to dynamically remove vowels from the string, starting at the end, and up to minimum number of occurrences required to be <= 30 total characters.

      Thanks for any help you can provide here.

      CREATE TABLE T ( COL VARCHAR2(30));
      
      INSERT INTO T VALUES ('EMPLOYEES');
      
      INSERT INTO T VALUES ('ORDERS');
      
      INSERT INTO T VALUES ('SOME_REALLY_LONG_TABLE_NAME');
      
      INSERT INTO T VALUES ('SOME_REALLY_LONG_TABLE_NAME_4U');
      
      
      Expected output:
      
      EMPLOYEES,                          EMPLOYEES_AUDIT
      ORDERS,                             ORDERS_AUDIT
      SOME_REALLY_LONG_TABLE_NAME,        SOME_REALLY_LONG_TABL_NM_AUDIT
      SOME_REALLY_LONG_TABLE_NAME_4U,     SOME_REALLY_LNG_TBL_NM_4_AUDIT
        • 1. Re: REGEXP_REPLACE to conditionally remove vowels
          Frank Kulash
          Hi,
          Another_user wrote:
          I am dynamically creating triggers
          Make sure you really need to do this first. Dynamic anything is questionable, dynamically writing triggers is especially so.
          on most of the tables in my schema. The general naming convention is <TABLE_NAME> || AUDIT.  Oracle obviously restricts us to 30 characters in the object name so some of my DDLs would fail for the name being too long.  It is easy to generate these when there are <=  24 characters, as 24 + (AUDIT) = 30 in length. So I am looking for a regular expression to dynamically remove vowels from the string, starting at the end, and up to minimum number of occurrences required to be <= 30 total characters.
          Since you need PL/SQL anyway to create the trigger, I would use PL/SQL to strip off the vowels, starting at the end, until you get a string that's 24 characters or less. After removing all vowels you might still have names that are too long, in which case you can start removing consonants, too.
          • 2. Re: REGEXP_REPLACE to conditionally remove vowels
            Another_user
            Thanks for the insight Frank. By dynamically, I mean this is a one-time task against the data dictionary to generate the code.

            I realize this may just be easier to do via a little plsql function. I will go that route unless someone has a solution here.
            • 3. Re: REGEXP_REPLACE to conditionally remove vowels
              chris227
              hi,
              idont see a single regexp here
              with table_names as (
              select 'SOME_REALLY_LONG_TABLE_NAME' n from dual
              union all
              select 'SOME_REALLY_LONG_TABLE_NAME_4U' from dual
              --union all
              --select table_name from all_objects where ...
              )
              , r(n) as (
              select
              case
              when length(n) > 24
              then
              regexp_replace(n,'(.*)[AEIOU]([^AEIOU]*$)','\1\2')
              else n
              end n
              from table_names
              union all
              select
              case
              when length(n) > 24
              then
              regexp_replace(n,'(.*)[AEIOU]([^AEIOU]*$)','\1\2')
              else n
              end n
              from r
              where length(n) > 24
              )
              
              select
               n||'_AUDIT'
              from r
              where
              length(n) <= 24
              
              N||'_AUDIT'
              SOME_REALLY_LONG_TABL_NM_AUDIT
              SOME_REALLY_LNG_TBL_NM_4_AUDIT
              • 4. Re: REGEXP_REPLACE to conditionally remove vowels
                Frank Kulash
                Hi,
                Another_user wrote:
                Thanks for the insight Frank. By dynamically, I mean this is a one-time task against the data dictionary to generate the code.
                All the more reason to write a PL/SQL function for it.
                For example:
                CREATE OR REPLACE FUNCTION  abbrv
                (       in_txt          IN      VARCHAR2
                ,     in_target_len      IN      PLS_INTEGER     DEFAULT     24
                )
                RETURN     VARCHAR2
                DETERMINISTIC
                IS
                    actual_len     PLS_INTEGER     := LENGTH (in_txt);
                    pos          PLS_INTEGER     := actual_len;
                    return_txt     VARCHAR2 (4000)     := in_txt;
                BEGIN
                    WHILE actual_len > in_target_len
                    AND       pos          > 0
                    LOOP
                        IF  UPPER ( SUBSTR (return_txt, pos, 1)) IN ( 'A', 'E', 'I'
                                                                  , 'O', 'U', 'Y'
                                                  )
                     THEN
                          return_txt := SUBSTR (return_txt, 1      , pos - 1)
                                  || SUBSTR (return_txt, pos + 1);
                          actual_len := actual_len - 1;
                     END IF;
                
                     pos := pos - 1;
                    END LOOP;
                
                    IF  actual_len > in_target_len
                    THEN       -- still too long; just truncate it
                        return_txt := SUBSTR (return_txt, 1, in_target_len);
                    END IF;
                
                    RETURN  return_txt;
                END  abbrv;
                /
                You may want to make this a little more fool-proof. For example, somebody might pass a negative number for in_target_len.
                This query:
                SELECT       col
                ,       abbrv (col)     AS abbrv
                FROM       t
                ORDER BY  col
                ;
                shows that it works for your sample data:
                COL                            ABBRV
                ------------------------------ ------------------------
                EMPLOYEES                      EMPLOYEES
                ORDERS                         ORDERS
                SOME_REALLY_LONG_TABLE_NAME    SOME_REALLY_LONG_TABL_NM
                SOME_REALLY_LONG_TABLE_NAME_4U SOME_REALLY_LNG_TBL_NM_4
                I realize this may just be easier to do via a little plsql function. I will go that route unless someone has a solution here.
                If you really want to do this in pure SQL, what version do you want to use? REGEXP_COUNT (new in Oracle 11.1) and recursive WITH clauses (new in 11.2) could be handy.
                • 5. Re: REGEXP_REPLACE to conditionally remove vowels
                  Frank Kulash
                  Hi,

                  Here's one way using pure SQL:
                  WITH     got_num_to_cut     AS
                  (
                       SELECT     col
                       ,     GREATEST ( LENGTH (col) - 24
                                  , 0
                                  )     AS num_to_cut
                       FROM    t
                  )
                  SELECT       col
                  ,       SUBSTR ( REGEXP_REPLACE ( col
                                             , LPAD ( '$'
                                                , 1 + (20 * num_to_cut)
                                            , '[AEIOUY]([^AEIOUY]*)' -- 20 characters long
                                            )
                                        , SUBSTR ( '\1\2\3\4\5\6'
                                                  , 1
                                              , 2 * num_to_cut
                                              )
                                        )
                               , 1
                             , 24
                             )     AS abbrv
                  FROM       got_num_to_cut
                  ORDER BY  col
                  ;
                  This should work in any version of Oracle that has REGEXP_REPLACE (that is, 10.1 or higher).
                  As posted, this only removes upper-case vowels, but that's easy to fix if necessary.

                  I still don't recommend this for practical applications. A user-defined function will be much easier to debug and maintain, and probably more efficient.

                  Here's how it works:
                  Say you only wanted to remove the last vowel from the string. You could do it like this:
                  REGEXP_REPLACE ( col
                              , '[AEIOUY]([^AEIOUY]*)$'
                              , '\1'
                              )
                  If you wanted to remove 2 vowels, then you could repeat the pattern in the 2nd argument, and add '\2' to the 3rd argument, like this:
                  REGEXP_REPLACE ( col
                              , '[AEIOUY]([^AEIOUY]*)[AEIOUY]([^AEIOUY]*)$'
                              , '\1\2'
                              )
                  For any fixed number N of vowels to remove, you would just repeat the pattern in the 2nd argument (that is, the 20-character string '[AEIOUY]([^AEIOUY]*)')
                  N times, and have N backreferences in the 3rd argument.
                  However, N is not fixed; N is a variable between 0 and 6, so instead of hard-coding the 2nd are 3rd arguments, we have to derive them from N.
                  • 6. Re: REGEXP_REPLACE to conditionally remove vowels
                    Another_user
                    Thanks Chris and Frank. I did end up going with the plsql solution mostly for the reasons Frank mentioned - cleanliness, efficiency and maintainability.

                    Thanks!
                    • 7. Re: REGEXP_REPLACE to conditionally remove vowels
                      chris227
                      Even with pl/sql you may use recursion ;-)
                      CREATE OR REPLACE FUNCTION  abbrv
                      (       in_txt          IN      VARCHAR2
                      ,     in_target_len      IN      PLS_INTEGER     DEFAULT     24
                      )
                      RETURN     VARCHAR2
                      DETERMINISTIC
                      IS
                      BEGIN
                      case 
                      -- any vowels inside and too long
                      when regexp_count(in_txt,'[AEIOU]') > 0
                       -- if not 11.2 then
                       -- length(in_txt) - length(translate(in_txt,'XAEIOU','X')) != 0
                       and LENGTH(in_txt) > in_target_len
                      then
                       return abbrv (
                               REGEXP_REPLACE(
                                   in_txt
                                  ,'(.*)[AEIOU]([^AEIOU]*$)'
                                  ,'\1\2'), in_target_len
                               );
                      else
                        return in_txt;
                      end case;
                      END  abbrv;
                      
                      with table_names as (
                      select 'SOME_REALLY_LONG_TABLE_NAME' n from dual
                      union all
                      select 'SOME_REALLY_LONG_TABLE_NAME_4U' from dual
                      )
                      
                      select
                      abbrv(n,24)
                      from table_names
                      
                      ABBRV(N,24) 
                      SOME_REALLY_LONG_TABL_NM 
                      SOME_REALLY_LNG_TBL_NM_4