This discussion is archived
1 2 Previous Next 17 Replies Latest reply: Apr 8, 2013 3:59 PM by user1007 RSS

Remove special characters from string using regexp_replace

pl/sql novice Newbie
Currently Being Moderated
Hi,
I'm writing a function to remove special characters and non-printable characters that users have accidentally entered into CSV files. I've looked at the ASCII character map, and basically, for every varchar2 field, I'd like to keep characters inside the range from chr(32) to chr(126), and convert every other character in the string to '', which is nothing.
Below is my draft PL/SQL statement:

.....(inside my function that takes an input string)
processed_string := regexp_replace(input_string, '( [^[:graph:] | ^[:blank:]] )', '');
......(rest of function)

I've tried this in SQLPlus (using select.... from dual) and using something like 'This is a test $%&^*&*' as input_string, and nothing is replaced. I also tried the following:

select regexp_replace('This is a test $%&^*&*', '( [[:graph:] | [:blank:]] )', '$') from dual;

And the whole string is turned into '$$$$$$$$...... So I'm guessing here (a BIG guess) the opposite MIGHT work too.

Do you think this statement will work? Since I don't know how to specify a range using chr() in regular expression, that's why I'm using posix. However, I'm new to this construct, and I'm having doubts especially with the '|' and '^' operators in the expression. Can you shed some light on this?

Many thanks.
  • 1. Re: Remove special characters from string using regexp_replace
    BEDE Explorer
    Currently Being Moderated
    I believe regexp can't handle special chars, but only printable chars. So, if you have a ^G char, you can't eliminate it with regexp_replace.
    I would create a PL/SQL table containing non-printable chars and perform a classic replace for every element of the PL/SQL table. And put all that in a function in a package, in the package initialization having what it takes to populate the PL/SQL table of non-printable chars.
  • 2. Re: Remove special characters from string using regexp_replace
    113128 Newbie
    Currently Being Moderated
    Try this

    select regexp_replace('This is a test $%&^*&* ', '( *[[:punct:]])', '') from dual;

    :) not correct forget it

    Message was edited by:
    tehbc98
  • 3. Re: Remove special characters from string using regexp_replace
    BluShadow Guru Moderator
    Currently Being Moderated
    Try this

    select regexp_replace('This is a test $%&^*&* ', '(
    *[[:punct:]])', '') from dual;
    Doesn't quite cover the OP's requirements of keeping only the chrs in the range 32 to 126...
    SQL> ed
    Wrote file afiedt.buf
    
      1* select regexp_replace('This is a test $%&^*&*'||chr(10)||'*"ú%bsfd.', '( *[[:punct:]])', '') from dual
    SQL> /
    
    REGEXP_REPLACE('THI
    -------------------
    This is a test
    bsfd
    
    
    Elapsed: 00:00:00.00
    SQL>
  • 4. Re: Remove special characters from string using regexp_replace
    pl/sql novice Newbie
    Currently Being Moderated
    Hi, I just tried the following test cases:

    select length('Str' || chr(10) || 'ing' || chr(13)) from dual;

    Result: 8

    select regexp_replace('Str' || chr(10) || 'ing' || chr(13), '([^[:graph:] | ^[:blank:]])', '$') from dual;

    Result: Str$ing$

    so I guess regexp_replace can identify special characters, but I'm not sure my expression can successfully shut out all special characters.
  • 5. Re: Remove special characters from string using regexp_replace
    572471 Newbie
    Currently Being Moderated
    so I guess regexp_replace can identify special
    characters, but I'm not sure my expression can
    successfully shut out all special characters.
    You can check what characters [:graph:], [:blank:] and other classes are standing for by using the following query:
    select CHR(level),
           decode(chr(level), regexp_substr(chr(level), '[[:graph:]]'), 1) is_graph,
           decode(chr(level), regexp_substr(chr(level), '[[:blank:]]'), 1) is_blank,
           decode(chr(level), regexp_substr(chr(level), '[[:alnum:]]'), 1) is_alnum,
           decode(chr(level), regexp_substr(chr(level), '[[:alpha:]]'), 1) is_alpha,
           decode(chr(level), regexp_substr(chr(level), '[[:digit:]]'), 1) is_digit,
           decode(chr(level), regexp_substr(chr(level), '[[:cntrl:]]'), 1) is_cntrl,
           decode(chr(level), regexp_substr(chr(level), '[[:lower:]]'), 1) is_lower,
           decode(chr(level), regexp_substr(chr(level), '[[:upper:]]'), 1) is_upper,
           decode(chr(level), regexp_substr(chr(level), '[[:print:]]'), 1) is_print,
           decode(chr(level), regexp_substr(chr(level), '[[:punct:]]'), 1) is_punct,
           decode(chr(level), regexp_substr(chr(level), '[[:space:]]'), 1) is_space,
           decode(chr(level), regexp_substr(chr(level), '[[:xdigit:]]'), 1) is_xdigit
      from dual
    connect by level <= 300
    Well, actually if you want to leave characters from 32 up to 126 - you can form the pattern you need by yourself.
    E.g.:
    SQL> set serveroutput on;
    SQL> 
    SQL> declare
      2      pattern varchar2(100) := '][';
      3      str     varchar2(100) := 'Str' || chr(10) || 'ing' || chr(13);
      4    begin
      5      for i in 32 .. 126 loop
      6        pattern := pattern || case when chr(i) = '''' then ''''''
      7                                   when chr(i) in ('[', ']') then null
      8                                   else chr(i)
      9                              end;
     10      end loop;
     11      dbms_output.put_line('pattern: ''' || pattern||'''');
     12      dbms_output.put_line('str_old: ' || str);
     13      str := regexp_replace(str, '([' || pattern || '])|.', '\1', 1, 0, 'n');
     14      dbms_output.put_line('str_new: ' || str);
     15    end;
     16  /
    
    pattern: '][ !"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~'
    str_old: Str
    ing
    
    str_new: String
    
    PL/SQL procedure successfully completed
    
    SQL> 
  • 6. Re: Remove special characters from string using regexp_replace
    pl/sql novice Newbie
    Currently Being Moderated
    Thanks so much for the suggestion, this function is neat! Well, I just discovered another special character (actually within the range chr(32)-chr(126)).
    This character is chr(38), the '&' character.

    Here is my test case:
    in sqlplus:

    SQL> select replace('OSG-1;OSG-1;DFWRW\DFWW&SDFWW;SDFS~1~^DFSF', chr(38), '') from dual;
    Enter value for sdfww:
    (here I just hit Enter)
    old 1: select replace('OSG-1;OSG-1;DFWRW\DFWW&SDFWW;SDFS~1~^DFSF', chr(38), '') from dual;
    new 2: select replace('OSG-1;OSG-1;DFWRW\DFWW;SDFS~1~^DFSF', chr(38), '') from dual;

    REPLACE('OSG-1;OSG-1;DFWRW\DFWW;SDF
    -----------------------------------
    OSG-1;OSG-1;DFWRW\DFWW;SDFS~1~^DFSF

    The answer is wrong since it got rid of extra characters.
    Besides, I intend to put this statement inside a stored procedure (like your suggestion) invoked by a GUI and nobody will hit the enter key manually, or type "set define off" in the xterm. Do you know how to get rid of the '&' character inside a stored procedure? Many thanks.
  • 7. Re: Remove special characters from string using regexp_replace
    572471 Newbie
    Currently Being Moderated
    Ampersands

    or maybe it is possible for you to get the input string not as 'blabla&blabla', but as 'blabla'||'&'||'blabla' or 'blabla'||chr(38)||'blabla'?
  • 8. Re: Remove special characters from string using regexp_replace
    Sven W. Guru
    Currently Being Moderated
    Hi,
    I'm writing a function to remove special characters and non-printable characters that users have accidentally entered into CSV files.
    ...

    So you try to program some automat, that is more clever than your users? It will have to distinguish between accidentally entered special chars and not-accidentally entered special chars.

    Good luck to you, explaining them, why their currency symbol suddenly is missing.
  • 9. Re: Remove special characters from string using regexp_replace
    60660 Journeyer
    Currently Being Moderated
    I've seen someone using the CONVERT function to check for special characters.

    C.
  • 10. Re: Remove special characters from string using regexp_replace
    pl/sql novice Newbie
    Currently Being Moderated
    I just tried this command:
    set escape \
    which uses '\' as escape character.

    After I ran this command, I tried this:
    SQL> select replace('OSG-1;OSG-1;DFWRW\DFWW\&SDFWW;SDFS~1~^DFSF',chr(38),'') from dual;

    REPLACE('OSG-1;OSG-1;DFWRWDFWW&SDFWW;SD
    ---------------------------------------
    OSG-1;OSG-1;DFWRWDFWWSDFWW;SDFS~1~^DFSF

    So I get it works. But as Sven pointed out, users might not understand why some characters are reserved and not allowed (don't worry, I always get their agreement before changing specs). Another problem is, what if user enters the '\' as part of data string? Therefore I'm now requesting to reserve both '\' and '&' as special characters, not sure what else I need to reserve in future though.
  • 11. Re: Remove special characters from string using regexp_replace
    Yasu Newbie
    Currently Being Moderated
    Hi Volder,

    It would great helpful for me if you can help me in below mentioned thread, where i used the function provided by you, but its not giving the desired output.

    Re: Removing non-printable characters

    Thanks,
    Yasser
  • 12. Re: Remove special characters from string using regexp_replace
    817616 Newbie
    Currently Being Moderated
    Hi,

    This will remove all special charcters and spaces in the string.

    select regexp_replace('This is a test .,_-$%&^& ', '[^0-9A-Za-z]', '') from dual;
  • 13. Re: Remove special characters from string using regexp_replace
    714088 Newbie
    Currently Being Moderated
    I hate to dig up an old post but this might have saved me a couple of hours so maybe I can do the same for someone else.

    In response to the original question: Despite some misleading documentation about the internet, oracle does not support the \x1D hexidecimal switch (which would have allowed for a nice and neat regexp of "[^\x20-\x7E]". In fact according to the documentation below, no escape sequences at all are understood in a character set (or list as oracle calls them):

    The following regular expression operators are allowed within the character list, any other metacharacters included in a character list lose their special meaning (are treated as literals):

    Range operator '-'
    POSIX character class [::]
    POSIX collating sequence [. .]
    POSIX character equivalence class [= =]
    (http://download.oracle.com/docs/cd/B14117_01/appdev.101/b10795/adfns_re.htm#1007566)

    However, it does appear that the range operator does support ranges over the entire language (assuming your NLS_SORT is binary I think). This simple regex worked for me:

    select regexp_replace('This is a test '||chr(191), '[^ -~]', '$') from dual;

    In other words, all characters between space and tilde which meets the requirement of chars 32-127. I realize a few chars in that set were added on but there's no reason that the range couldn't be split into multiple ranges, excluding those couple of chars.
  • 14. Re: Remove special characters from string using regexp_replace
    714088 Newbie
    Currently Being Moderated
    By the way, in my case I wanted to allow the entire ascii set so (including tabs, cr, lf, etc):

    select regexp_replace('This is a test crlf'||chr(10)|| 'blah', '[^' || chr(1) ||'-~]', '$') from dual;
1 2 Previous Next