regexp_replace and chr versus characterset

Jul 8, 2018 1:18PM

I'm moving from Oracle 11 to 12 and have hit on something I don't understand.

In my Oracle 11 database where NLS_CHARACTERSET=US7ASCII I use regexp_replace and chr to clean up bad user input, as illustred by this simple query.

SELECT temp, REGEXP_REPLACE (temp, '[' || CHR (1) || '-' || CHR (31) || CHR (127) || '-' || CHR (255) || ']', '') temp_norm

FROM (SELECT 'DRAFTING-STAN' || CHR (9) || 'DARD' temp FROM DUAL)

Running this in Oracle 11 shows how TEMP gets properly cleaned up to DRAFTING-STANDARD (tab character is removed)

When I run this in Oracle 12 where NLS_CHARACTERSET=AL32UTF8 the replace is more aggressive. The result for temp_norm is DRAFTINGSTANDARD (tab

SQL Language (MOSC)