This content has been marked as final. Show 6 replies
We are preparing to change our character set from us7ascii to al32utf8 on 18.104.22.168 HP-UX. The majority of the "lossy" rows identified contain accented and umlauted vowels. I have tried to change the accented and umlauted characters to their "plain" counterparts (eg, accented a to 'a'). I've tried several variations of the replace and translate functions with no succcess. One of the statements I was "sure" that it would work follows - what am I doing wrong?
update tbl1 set desc_fld = regexp_replace(desc_fld, '[=a=]', 'a');
<tt> [=a=] </tt> means "any variation on the letter a" only when it's inside square-brackets; otherwise, it looks like you want the set of characters consisting of '=', 'a' and '='.
update tbl1 set desc_fld = regexp_replace (desc_fld, '[[=a=]]', 'a');
Thanks for the reply - I was missing the "extra" brackets but I tried:
SQL> update je_ddet set debit_memo = regexp_replace(debit_memo, '[[=a=]]', 'a');
237273 rows updated.
SQL> select debit_memo from je_ddet where rowid = 'AACL6QACjAAAqgrAAZ';
recl trsf à BCV 08.04.09
It still has the accented a.
As far as i understand you have data corruption in your ascii7_DB. You stored 2-byte characters in a 1-byte characterset. This seems to work as the characters were dsiplayed correctyl, but know you are facing some problems.
I cant reproduce the issue it because i dont have a ascci7-db running.
So i might be that those nls-based functions dont really work on the corrupted data.
I would examine the dumps of the data (dump(debit_memo, 1010)) in both environments.
Perhaps something on raw-level is possible.
Since your current character set is US7ASCII, it may not recognize characters with codes above 127 as members of pre-defined set, such as [=a=].
Is it possible to type all of the accented characters? If so, you could do something like:
Don't test on your full table with 200,000 rows. Create a small test table and experiment on it, first.
... SET debit_memo = TRANSLATE ( debit_memo , 'âäàåáé...' , 'aaaaae...' )
Why replacing? Why not preserving the characters? Use the Database Migration Assistant for Unicode to perform the migration and to solve (cleanse) convertibility issues. You can use the Assumed Database Character Set property to declare the actual character set of your data (usually WE8MSWIN1252 for Western European languages) to use for migration.