11 Replies Latest reply on Jan 17, 2011 3:58 PM by 4999

    Strange problem with REGEXP_LIKE

    4999
      Dear all,

      I am facing a strange problem with a query using the function REGEXP_LIKE.
      I want to filter a set of strings which can contain some special characters.

      See this:
      SELECT * FROM (
      SELECT 1 line_no, '01!' char_string FROM dual
      UNION
      SELECT 2 line_no, '02"' char_string FROM dual
      UNION
      SELECT 3 line_no, '03/' char_string FROM dual
      UNION
      SELECT 4 line_no, '04(' char_string FROM dual
      UNION
      SELECT 5 line_no, '05)' char_string FROM dual
      UNION
      SELECT 6 line_no, '06=' char_string FROM dual
      UNION
      SELECT 7 line_no, '07''' char_string FROM dual
      UNION
      SELECT 8 line_no, '08#' char_string FROM dual
      UNION
      SELECT 9 line_no, '09+' char_string FROM dual
      UNION
      SELECT 10 line_no, '10-' char_string FROM dual
      UNION
      SELECT 11 line_no, '11,' char_string FROM dual
      UNION
      SELECT 12 line_no, '12_' char_string FROM dual
      UNION
      SELECT 13 line_no, '13&' char_string FROM dual
      UNION
      SELECT 14 line_no, '14 ' char_string FROM dual
      UNION
      SELECT 15 line_no, '15.' char_string FROM dual
      UNION
      SELECT 16 line_no, '16,' char_string FROM dual
      UNION
      SELECT 17 line_no, '17@' char_string FROM dual
      UNION
      SELECT 18 line_no, '18:' char_string FROM dual
      UNION
      SELECT 19 line_no, '19;' char_string FROM dual
      UNION
      SELECT 20 line_no, '20[' char_string FROM dual
      UNION
      SELECT 21 line_no, '21]' char_string FROM dual
      UNION
      SELECT 22 line_no, '22\' char_string FROM dual
      UNION
      SELECT 23 line_no, '23<' char_string FROM dual
      UNION
      SELECT 24 line_no, '24>' char_string FROM dual
      UNION
      SELECT 25 line_no, '25?' char_string FROM dual
      UNION
      SELECT 26 line_no, '26*' char_string FROM dual
      UNION
      SELECT 27 line_no, '27$' char_string FROM dual
      )
      WHERE REGEXP_LIKE(char_string, '[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:][\!\"\/\(\)\=\''\#\+\-\,\_\&\%\ \.\,\@\:\;\[\]\\\<\>\?]]+')
      ;
      This query should return all lines up to no 25. Lines 26 and 27 should be filtered out.
      But this query doesn't return any lines.

      Some remarks :
      I do list all valid special characters explicitly because [A-Za-z] or [:alpha:] does not work properly for NLS-specific characters (which should be filtered out).
      I do not use char class [:punct:] because characters like "*" or "$" (which are part of class [:punct:]) should be filtered out as well (see lines 26 and 27).

      First I doubted the char "]" to end the last char class, but as this is escaped with char "\" this should work in the expected way.
      Query does not work either if I leave out the square brackets for the char class of special characters.
      I also tried to replace "\[\]" with the corresponding hex codes ("\x5B\x5D") but this returns all lines, including lines 26 and 27.
      Same result when I leave out "\[\]" .

      This seems kind of odd to me.

      Does any body have some ideas on this?

      Regards
      Kai
        • 1. Re: Strange problem with REGEXP_LIKE
          MichaelS
          First I doubted the char "]" to end the last char class, but as this is escaped with char "\" this should work in the expected way.
          check Re: REGEXP_REPLACE with ^ character !
          • 2. Re: Strange problem with REGEXP_LIKE
            BluShadow
            Square brackets at the start and minus sign at the end... oh, and you don't need to escape all the punctuation characters, just the "\" and obviously the single quote...
            SQL> ed
            Wrote file afiedt.buf
            
              1  SELECT * FROM (
              2  SELECT 1 line_no, '01!' char_string FROM dual UNION ALL
              3  SELECT 2 line_no, '02"' char_string FROM dual UNION ALL
              4  SELECT 3 line_no, '03/' char_string FROM dual UNION ALL
              5  SELECT 4 line_no, '04(' char_string FROM dual UNION ALL
              6  SELECT 5 line_no, '05)' char_string FROM dual UNION ALL
              7  SELECT 6 line_no, '06=' char_string FROM dual UNION ALL
              8  SELECT 7 line_no, '07''' char_string FROM dual UNION ALL
              9  SELECT 8 line_no, '08#' char_string FROM dual UNION ALL
             10  SELECT 9 line_no, '09+' char_string FROM dual UNION ALL
             11  SELECT 10 line_no, '10-' char_string FROM dual UNION ALL
             12  SELECT 11 line_no, '11,' char_string FROM dual UNION ALL
             13  SELECT 12 line_no, '12_' char_string FROM dual UNION ALL
             14  SELECT 13 line_no, '13&' char_string FROM dual UNION ALL
             15  SELECT 14 line_no, '14 ' char_string FROM dual UNION ALL
             16  SELECT 15 line_no, '15.' char_string FROM dual UNION ALL
             17  SELECT 16 line_no, '16,' char_string FROM dual UNION ALL
             18  SELECT 17 line_no, '17@' char_string FROM dual UNION ALL
             19  SELECT 18 line_no, '18:' char_string FROM dual UNION ALL
             20  SELECT 19 line_no, '19;' char_string FROM dual UNION ALL
             21  SELECT 20 line_no, '20[' char_string FROM dual UNION ALL
             22  SELECT 21 line_no, '21]' char_string FROM dual UNION ALL
             23  SELECT 22 line_no, '22\' char_string FROM dual UNION ALL
             24  SELECT 23 line_no, '23&lt;' char_string from dual UNION ALL
             25  SELECT 23 line_no, '24&gt;' char_string FROM dual UNION ALL
             26  SELECT 25 line_no, '25?' char_string FROM dual UNION ALL
             27  SELECT 26 line_no, '26*' char_string FROM dual UNION ALL
             28  SELECT 27 line_no, '27$' char_string FROM dual)
             29* WHERE REGEXP_LIKE(char_string, '[][ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:blank:]!"/()=''#+,_&.@:;\\&lt;&gt;?-]+')
            SQL> /
            
               LINE_NO CHA
            ---------- ---
                     1 01!
                     2 02"
                     3 03/
                     4 04(
                     5 05)
                     6 06=
                     7 07'
                     8 08#
                     9 09+
                    10 10-
                    11 11,
                    12 12_
                    13 13&
                    14 14
                    15 15.
                    16 16,
                    17 17@
                    18 18:
                    19 19;
                    20 20[
                    21 21]
                    22 22\
                    23 23<
                    23 24>
                    25 25?
            
            25 rows selected.
            {code}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
            • 3. Re: Strange problem with REGEXP_LIKE
              4999
              Thanks for your input, Michael.
              So experimented with using the special characters square brackets as the first characters and the minus sign as the last.

              But ...
              ... changing my WHERE-clause to
              WHERE REGEXP_LIKE(char_string, '[][ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:][\!\"\/\(\)\=\''\#\+\,\_\&\%\ \.\,\@\:\;\\\<\>\?\-]]+')
              {code}
              results in line 21 ("21]") only (which is not correct).
              
              ... changing it to
              {code}
              WHERE REGEXP_LIKE(char_string, '[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:][[]\!\"\/\(\)\=\''\#\+\,\_\&\%\ \.\,\@\:\;\\\<\>\?\-]]+')
              shows no results at all.

              ... changing it to
              WHERE REGEXP_LIKE(char_string, '[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:][\[\]\!\"\/\(\)\=\''\#\+\,\_\&\%\ \.\,\@\:\;\\\<\>\?\-]]+')
              {code}
              
              shows no results at all, too.
              
              So this is no improvement.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
              • 4. Re: Strange problem with REGEXP_LIKE
                MichaelS
                Probably you need the $ (-end) sign as well:
                SQL> select * from (
                    select 1 line_no, '01!' char_string from dual union
                    select 2 line_no, '02"' char_string from dual union
                    select 3 line_no, '03/' char_string from dual union
                    select 4 line_no, '04(' char_string from dual union
                    select 5 line_no, '05)' char_string from dual union
                    select 6 line_no, '06=' char_string from dual union
                    select 7 line_no, '07''' char_string from dual union
                    select 8 line_no, '08#' char_string from dual union
                    select 9 line_no, '09+' char_string from dual union
                    select 10 line_no, '10-' char_string from dual union
                    select 11 line_no, '11,' char_string from dual union
                    select 12 line_no, '12_' char_string from dual union
                    select 13 line_no, '13&' char_string from dual union
                    select 14 line_no, '14 ' char_string from dual union
                    select 15 line_no, '15.' char_string from dual union
                    select 16 line_no, '16,' char_string from dual union
                    select 17 line_no, '17@' char_string from dual union
                    select 18 line_no, '18:' char_string from dual union
                    select 19 line_no, '19;' char_string from dual union
                    select 20 line_no, '20[' char_string from dual union
                    select 21 line_no, '21]' char_string from dual union
                    select 22 line_no, '22' char_string from dual union
                    select 23 line_no, '23<''24>' char_string from dual union
                    select 25 line_no, '25?' char_string from dual union
                    select 26 line_no, '26*' char_string from dual union
                    select 27 line_no, '27$' char_string from dual
                )
                --
                --
                where regexp_like(char_string, '[][ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:]!"/()=''#+,_&% .,@:;<>?-]+$')
                /
                   LINE_NO CHAR_STRING
                ---------- -----------
                         1 01!        
                         2 02"        
                         3 03/        
                         4 04(        
                         5 05)        
                         6 06=        
                         7 07'        
                         8 08#        
                         9 09+        
                        10 10-        
                        11 11,        
                        12 12_        
                        13 13&        
                        14 14         
                        15 15.        
                        16 16,        
                        17 17@        
                        18 18:        
                        19 19;        
                        20 20[        
                        21 21]        
                        22 22         
                        23 23<'24>    
                        25 25?        
                
                24 rows selected.
                {code}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                • 5. Re: Strange problem with REGEXP_LIKE
                  4999
                  Thanks, Michael.

                  I did not update this thread during my "experiments", so I did not notice your last reply.

                  But now it works like I expected it.

                  Made my day
                  • 6. Re: Strange problem with REGEXP_LIKE
                    4999
                    Update

                    Tried your tip in the "Real" world.
                    I want to get a list of job titles which contain invalid characters.

                    ... like this
                    SELECT
                    to_char(rownum,'009') rownumber,
                    JOB_TITLE 
                    FROM PERSON
                    WHERE REGEXP_LIKE(JOB_TITLE, '[^[]ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:]!"/()=''#+,_&.@:;\\<>?-]+')
                    AND ROWNUM  <= 25
                    {code}
                    
                    ... and it returned no lines at all.
                    
                    Please note that I here use a negation of the allowed characters.
                    
                    To give you an impression of the expected output I removed the square brackets (which produces wrong output, too)
                    means
                    {code}
                    WHERE REGEXP_LIKE(JOB_TITLE, '[^ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:]!"/()=''#+,_&.@:;\\<>?-]+')
                    results in
                     001     DIV. MGR.
                     002     OFFICE MGR.
                     003     DRAY MGR.
                     004     DRAY MGR.
                     005     TERM. MGR.
                     006     TERM. MGR.
                     007     DIV. MGR.
                     008     LEITER ÜBERSEE
                     009     MANAGER LEAF LOG.
                     010     HANDLUNGSBEVOLLM.
                     011     ING.
                     012     BRANCH OFFICE MGR.
                     013     PROD. MGR SEAFREIGHT
                     014     ASSISTANT CHART. MGR
                     015     LTG. ZENTR. VERKAUF
                     016     LEITER DER GESCHÄFTS
                     017     STELLV. ABTEILUNGSLE
                     018     USA I.V.
                     019     EDI COMPETENCE CTR.
                     020     PURCHASING PROC.MANA
                     021     ING.
                     022     DR.
                     023     DIPL.-KAUFFRAU
                     024     IMPORT&EXPORT REP.
                     025     OPERATION MER.
                    Only the row numbers 008 and 016 are expected in the output.

                    So, not only that the square brackets as the first characters won't work even job titles with "." and "-" are listed which should not.

                    Any ideas what went wrong here?

                    Kai
                    • 7. Re: Strange problem with REGEXP_LIKE
                      MichaelS
                      note that I here use a negation of the allowed characters.
                      what if you simply use
                       where NOT regexp_like (.....)
                      ?
                      • 8. Re: Strange problem with REGEXP_LIKE
                        4999
                        This ...
                        WHERE NOT REGEXP_LIKE(JOB_TITLE, '[[]ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:]!"/()=''#+,_&.@:;\\<>?-]+')
                        {code}
                        ... returns all rows where JOB_TITLE is not null
                        
                        This ... (without square brackets)
                        {code}
                        WHERE NOT REGEXP_LIKE(JOB_TITLE, '[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:]!"/()=''#+,_&.@:;\\<>?-]+')
                        returns no rows.
                        • 9. Re: Strange problem with REGEXP_LIKE
                          Sven W.
                          As a new approach you can use the translate function. It should be faster than any regexp_ expression and seems perfectly suited for your task.

                          untested
                          select
                             to_char(rownum,'000') rownumber,
                             JOB_TITLE 
                          from PERSON
                          where trim(translate (job_title, 
                                            'Ä[]ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 !"/()=''#+,_&..@:;\\?-]+',
                                            'Ä')) is not null
                          and rownum <= 25;
                          {code}
                          
                          Translate will replace the specified chars. In this case it will remove all the chars from the list. Ä will be replaced with Ä (translate needs a not null third parameter). Any remaining chars are the special characters that you are looking for.
                          
                          Edited by: Sven W. on Jan 17, 2011 4:22 PM                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                          • 10. Re: Strange problem with REGEXP_LIKE
                            MichaelS
                            SQL> with t as (
                             select 001 id, 'DIV. MGR.' job_title from dual union all
                             select 002, 'OFFICE MGR.' from dual union all
                             select 003, 'DRAY MGR.' from dual union all
                             select 004, 'DRAY MGR.' from dual union all
                             select 005, 'TERM. MGR.' from dual union all
                             select 006, 'TERM. MGR.' from dual union all
                             select 007, 'DIV. MGR.' from dual union all
                             select 008, 'LEITER ÜBERSEE' from dual union all
                             select 009, 'MANAGER LEAF LOG.' from dual union all
                             select 010, 'HANDLUNGSBEVOLLM.' from dual union all
                             select 011, 'ING.' from dual union all
                             select 012, 'BRANCH OFFICE MGR.' from dual union all
                             select 013, 'PROD. MGR SEAFREIGHT' from dual union all
                             select 014, 'ASSISTANT CHART. MGR' from dual union all
                             select 015, 'LTG. ZENTR. VERKAUF' from dual union all
                             select 016, 'LEITER DER GESCHÄFTS' from dual union all
                             select 017, 'STELLV. ABTEILUNGSLE' from dual union all
                             select 018, 'USA I.V.' from dual union all
                             select 019, 'EDI COMPETENCE CTR.' from dual union all
                             select 020, 'PURCHASING PROC.MANA' from dual union all
                             select 021, 'ING.' from dual union all
                             select 022, 'DR.' from dual union all
                             select 023, 'DIPL.-KAUFFRAU' from dual union all
                             select 024, 'IMPORT&EXPORT REP.' from dual union all
                             select 025, 'OPERATION MER.' from dual
                            )
                            --
                            --
                            select * from t
                             where not regexp_like(job_title, '^[][ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[:digit:][:blank:]!\"/()=''#+,_&.@:;?-]+$')
                            /
                                    ID JOB_TITLE           
                            ---------- --------------------
                                     8 LEITER ÜBERSEE      
                                    16 LEITER DER GESCHÄFTS
                            
                            2 rows selected.
                            {code}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                            • 11. Re: Strange problem with REGEXP_LIKE
                              4999
                              Nice idea, really.

                              But, there seems to be a problem with the point char "."

                              I used this query with the translated_job_title as the second column to make it visible here.
                              SELECT
                                 to_char(ROWNUM,'000') rownumber,
                                 trim (translate (job_title, 
                                                'Ä[]ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 !"/()=''#+,_&.@:;\\?-',
                                                'Ä')) translated_job_title
                                 ,JOB_TITLE
                              FROM PERSON
                              WHERE trim (translate (job_title, 
                                                'Ä[]ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 !"/()=''#+,_&.@:;\\?-',
                                                'Ä')) IS NOT NULL
                              AND ROWNUM <= 25
                              ... which resulted in ...
                               001     ..     DIV. MGR.
                               002     .     OFFICE MGR.
                               003     .     DRAY MGR.
                               004     .     DRAY MGR.
                               005     ..     TERM. MGR.
                               006     ..     TERM. MGR.
                               007     ..     DIV. MGR.
                               008     Ü     LEITER ÜBERSEE
                               009     .     MANAGER LEAF LOG.
                               010     .     HANDLUNGSBEVOLLM.
                               011     .     ING.
                               012     .     BRANCH OFFICE MGR.
                               013     .     PROD. MGR SEAFREIGHT
                               014     .     ASSISTANT CHART. MGR
                               015     ..     LTG. ZENTR. VERKAUF
                               016     Ä     LEITER DER GESCHÄFTS
                               017     .     STELLV. ABTEILUNGSLE
                               018     ..     USA I.V.
                               019     .     EDI COMPETENCE CTR.
                               020     .     PURCHASING PROC.MANA
                               021     .     ING.
                               022     .     DR.
                               023     .     DIPL.-KAUFFRAU
                               024     .     IMPORT&EXPORT REP.
                               025     .     OPERATION MER.
                              So, besides the records with a point in the job_title column it works perfect.

                              I now tried to escape the point character like I would do in a regular expression ....
                              SELECT
                                 to_char(ROWNUM,'000') rownumber,
                                 trim (translate (job_title, 
                                                'Ä[]ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 !"/()=''#+,_&\.@:;\\?-',
                                                'Ä')) translated_job_title
                                 ,JOB_TITLE
                              FROM PERSON
                              WHERE trim (translate (job_title, 
                                                'Ä[]ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 !"/()=''#+,_&\.@:;\\?-',
                                                'Ä')) IS NOT NULL
                              AND ROWNUM <= 25
                              ... and it worked!

                              Result:
                               001     Ü     LEITER ÜBERSEE
                               002     Ä     LEITER DER GESCHÄFTS
                               003     ¿     SALES REPRESENTAT¿VE
                               004     Ç     ADMINISTRAÇAO VENDAS
                               005     ÇÕ     GERENTE OPERAÇÕES
                               006     Ñ     DUEÑA DE LAMAR
                               007     ÇÕ     GERENTE DE OPERAÇÕES
                               008     Í     COMPRAS E LOGÍSTICA
                               009     Í     LOGÍSTICA
                               010     Á     PROPRIETÁRIO
                               011     Á     ENGENHEIRO RESPONSÁV
                               012     Á     PROPRIETÁRIO
                               013     Á     PROPRIETÁRIO
                               014     Ó     SÓCIO
                               015     Çà    EXPEDIÇÃO
                               016     Ó     SÓCIO
                               017     ÇÕ     OPERAÇÕES
                               018     Ý     OPERATÝON MANAGER
                               019     Ó     SÓCIO
                               020     Á     PROPRIETÁRIO
                               021     Í     COMPRAS E LOGÍSTICA
                               022     Í     LOGÍSTICA
                               023     Á     PROPRIETÁRIO DUNORTE
                               024     ÄÜ     GESCHÄFTSFÜHRERIN
                               025     Ä     STELLV. GESCHÄFTSF.
                               026     Í     LOGÍSTICA
                               027     ÝÝ     MARKETÝNG EXECUTÝVE
                               028     Ý     EXPORT DÝRECTOR
                               029     Ý     MARKETÝNG & ACC MNGR
                               030     Ý     CHAÝRPERSON
                               031     ÝÝ     MANAGÝNG DÝRECTOR
                               032     Ý     NATÝONAL S/F MNGR
                               033     ÝÝ     VÝCE PRESÝDENT
                               034     Ñ     GER ING Y DISEÑO
                              which shows all records with invalid characters in column job_title.

                              Thank you Sven.
                              You put me on the right track.