1 2 Previous Next 17 Replies Latest reply: Apr 8, 2013 5:59 PM by user1007 RSS

    Remove special characters from string using regexp_replace

    pl/sql novice
      Hi,
      I'm writing a function to remove special characters and non-printable characters that users have accidentally entered into CSV files. I've looked at the ASCII character map, and basically, for every varchar2 field, I'd like to keep characters inside the range from chr(32) to chr(126), and convert every other character in the string to '', which is nothing.
      Below is my draft PL/SQL statement:

      .....(inside my function that takes an input string)
      processed_string := regexp_replace(input_string, '( [^[:graph:] | ^[:blank:]] )', '');
      ......(rest of function)

      I've tried this in SQLPlus (using select.... from dual) and using something like 'This is a test $%&^*&*' as input_string, and nothing is replaced. I also tried the following:

      select regexp_replace('This is a test $%&^*&*', '( [[:graph:] | [:blank:]] )', '$') from dual;

      And the whole string is turned into '$$$$$$$$...... So I'm guessing here (a BIG guess) the opposite MIGHT work too.

      Do you think this statement will work? Since I don't know how to specify a range using chr() in regular expression, that's why I'm using posix. However, I'm new to this construct, and I'm having doubts especially with the '|' and '^' operators in the expression. Can you shed some light on this?

      Many thanks.
        • 1. Re: Remove special characters from string using regexp_replace
          BEDE
          I believe regexp can't handle special chars, but only printable chars. So, if you have a ^G char, you can't eliminate it with regexp_replace.
          I would create a PL/SQL table containing non-printable chars and perform a classic replace for every element of the PL/SQL table. And put all that in a function in a package, in the package initialization having what it takes to populate the PL/SQL table of non-printable chars.
          • 2. Re: Remove special characters from string using regexp_replace
            113128
            Try this

            select regexp_replace('This is a test $%&^*&* ', '( *[[:punct:]])', '') from dual;

            :) not correct forget it

            Message was edited by:
            tehbc98
            • 3. Re: Remove special characters from string using regexp_replace
              BluShadow
              Try this

              select regexp_replace('This is a test $%&^*&* ', '(
              *[[:punct:]])', '') from dual;
              Doesn't quite cover the OP's requirements of keeping only the chrs in the range 32 to 126...
              SQL> ed
              Wrote file afiedt.buf
              
                1* select regexp_replace('This is a test $%&^*&*'||chr(10)||'*"ú%bsfd.', '( *[[:punct:]])', '') from dual
              SQL> /
              
              REGEXP_REPLACE('THI
              -------------------
              This is a test
              bsfd
              
              
              Elapsed: 00:00:00.00
              SQL>
              • 4. Re: Remove special characters from string using regexp_replace
                pl/sql novice
                Hi, I just tried the following test cases:

                select length('Str' || chr(10) || 'ing' || chr(13)) from dual;

                Result: 8

                select regexp_replace('Str' || chr(10) || 'ing' || chr(13), '([^[:graph:] | ^[:blank:]])', '$') from dual;

                Result: Str$ing$

                so I guess regexp_replace can identify special characters, but I'm not sure my expression can successfully shut out all special characters.
                • 5. Re: Remove special characters from string using regexp_replace
                  572471
                  so I guess regexp_replace can identify special
                  characters, but I'm not sure my expression can
                  successfully shut out all special characters.
                  You can check what characters [:graph:], [:blank:] and other classes are standing for by using the following query:
                  select CHR(level),
                         decode(chr(level), regexp_substr(chr(level), '[[:graph:]]'), 1) is_graph,
                         decode(chr(level), regexp_substr(chr(level), '[[:blank:]]'), 1) is_blank,
                         decode(chr(level), regexp_substr(chr(level), '[[:alnum:]]'), 1) is_alnum,
                         decode(chr(level), regexp_substr(chr(level), '[[:alpha:]]'), 1) is_alpha,
                         decode(chr(level), regexp_substr(chr(level), '[[:digit:]]'), 1) is_digit,
                         decode(chr(level), regexp_substr(chr(level), '[[:cntrl:]]'), 1) is_cntrl,
                         decode(chr(level), regexp_substr(chr(level), '[[:lower:]]'), 1) is_lower,
                         decode(chr(level), regexp_substr(chr(level), '[[:upper:]]'), 1) is_upper,
                         decode(chr(level), regexp_substr(chr(level), '[[:print:]]'), 1) is_print,
                         decode(chr(level), regexp_substr(chr(level), '[[:punct:]]'), 1) is_punct,
                         decode(chr(level), regexp_substr(chr(level), '[[:space:]]'), 1) is_space,
                         decode(chr(level), regexp_substr(chr(level), '[[:xdigit:]]'), 1) is_xdigit
                    from dual
                  connect by level <= 300
                  Well, actually if you want to leave characters from 32 up to 126 - you can form the pattern you need by yourself.
                  E.g.:
                  SQL> set serveroutput on;
                  SQL> 
                  SQL> declare
                    2      pattern varchar2(100) := '][';
                    3      str     varchar2(100) := 'Str' || chr(10) || 'ing' || chr(13);
                    4    begin
                    5      for i in 32 .. 126 loop
                    6        pattern := pattern || case when chr(i) = '''' then ''''''
                    7                                   when chr(i) in ('[', ']') then null
                    8                                   else chr(i)
                    9                              end;
                   10      end loop;
                   11      dbms_output.put_line('pattern: ''' || pattern||'''');
                   12      dbms_output.put_line('str_old: ' || str);
                   13      str := regexp_replace(str, '([' || pattern || '])|.', '\1', 1, 0, 'n');
                   14      dbms_output.put_line('str_new: ' || str);
                   15    end;
                   16  /
                  
                  pattern: '][ !"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~'
                  str_old: Str
                  ing
                  
                  str_new: String
                  
                  PL/SQL procedure successfully completed
                  
                  SQL> 
                  • 6. Re: Remove special characters from string using regexp_replace
                    pl/sql novice
                    Thanks so much for the suggestion, this function is neat! Well, I just discovered another special character (actually within the range chr(32)-chr(126)).
                    This character is chr(38), the '&' character.

                    Here is my test case:
                    in sqlplus:

                    SQL> select replace('OSG-1;OSG-1;DFWRW\DFWW&SDFWW;SDFS~1~^DFSF', chr(38), '') from dual;
                    Enter value for sdfww:
                    (here I just hit Enter)
                    old 1: select replace('OSG-1;OSG-1;DFWRW\DFWW&SDFWW;SDFS~1~^DFSF', chr(38), '') from dual;
                    new 2: select replace('OSG-1;OSG-1;DFWRW\DFWW;SDFS~1~^DFSF', chr(38), '') from dual;

                    REPLACE('OSG-1;OSG-1;DFWRW\DFWW;SDF
                    -----------------------------------
                    OSG-1;OSG-1;DFWRW\DFWW;SDFS~1~^DFSF

                    The answer is wrong since it got rid of extra characters.
                    Besides, I intend to put this statement inside a stored procedure (like your suggestion) invoked by a GUI and nobody will hit the enter key manually, or type "set define off" in the xterm. Do you know how to get rid of the '&' character inside a stored procedure? Many thanks.
                    • 7. Re: Remove special characters from string using regexp_replace
                      572471
                      Ampersands

                      or maybe it is possible for you to get the input string not as 'blabla&blabla', but as 'blabla'||'&'||'blabla' or 'blabla'||chr(38)||'blabla'?
                      • 8. Re: Remove special characters from string using regexp_replace
                        Sven W.
                        Hi,
                        I'm writing a function to remove special characters and non-printable characters that users have accidentally entered into CSV files.
                        ...

                        So you try to program some automat, that is more clever than your users? It will have to distinguish between accidentally entered special chars and not-accidentally entered special chars.

                        Good luck to you, explaining them, why their currency symbol suddenly is missing.
                        • 9. Re: Remove special characters from string using regexp_replace
                          cd_2
                          I've seen someone using the CONVERT function to check for special characters.

                          C.
                          • 10. Re: Remove special characters from string using regexp_replace
                            pl/sql novice
                            I just tried this command:
                            set escape \
                            which uses '\' as escape character.

                            After I ran this command, I tried this:
                            SQL> select replace('OSG-1;OSG-1;DFWRW\DFWW\&SDFWW;SDFS~1~^DFSF',chr(38),'') from dual;

                            REPLACE('OSG-1;OSG-1;DFWRWDFWW&SDFWW;SD
                            ---------------------------------------
                            OSG-1;OSG-1;DFWRWDFWWSDFWW;SDFS~1~^DFSF

                            So I get it works. But as Sven pointed out, users might not understand why some characters are reserved and not allowed (don't worry, I always get their agreement before changing specs). Another problem is, what if user enters the '\' as part of data string? Therefore I'm now requesting to reserve both '\' and '&' as special characters, not sure what else I need to reserve in future though.
                            • 11. Re: Remove special characters from string using regexp_replace
                              Yasu
                              Hi Volder,

                              It would great helpful for me if you can help me in below mentioned thread, where i used the function provided by you, but its not giving the desired output.

                              Re: Removing non-printable characters

                              Thanks,
                              Yasser
                              • 12. Re: Remove special characters from string using regexp_replace
                                817616
                                Hi,

                                This will remove all special charcters and spaces in the string.

                                select regexp_replace('This is a test .,_-$%&^& ', '[^0-9A-Za-z]', '') from dual;
                                • 13. Re: Remove special characters from string using regexp_replace
                                  714088
                                  I hate to dig up an old post but this might have saved me a couple of hours so maybe I can do the same for someone else.

                                  In response to the original question: Despite some misleading documentation about the internet, oracle does not support the \x1D hexidecimal switch (which would have allowed for a nice and neat regexp of "[^\x20-\x7E]". In fact according to the documentation below, no escape sequences at all are understood in a character set (or list as oracle calls them):

                                  The following regular expression operators are allowed within the character list, any other metacharacters included in a character list lose their special meaning (are treated as literals):

                                  Range operator '-'
                                  POSIX character class [::]
                                  POSIX collating sequence [. .]
                                  POSIX character equivalence class [= =]
                                  (http://download.oracle.com/docs/cd/B14117_01/appdev.101/b10795/adfns_re.htm#1007566)

                                  However, it does appear that the range operator does support ranges over the entire language (assuming your NLS_SORT is binary I think). This simple regex worked for me:

                                  select regexp_replace('This is a test '||chr(191), '[^ -~]', '$') from dual;

                                  In other words, all characters between space and tilde which meets the requirement of chars 32-127. I realize a few chars in that set were added on but there's no reason that the range couldn't be split into multiple ranges, excluding those couple of chars.
                                  • 14. Re: Remove special characters from string using regexp_replace
                                    714088
                                    By the way, in my case I wanted to allow the entire ascii set so (including tabs, cr, lf, etc):

                                    select regexp_replace('This is a test crlf'||chr(10)|| 'blah', '[^' || chr(1) ||'-~]', '$') from dual;
                                    1 2 Previous Next