This discussion is archived
6 Replies Latest reply: Sep 25, 2012 5:15 AM by Srini Chavali-Oracle RSS

Convert Chinese Character from US7ASCII to AL32UTF8 issue

927417 Newbie
Currently Being Moderated
Source Database: AMERICAN_AMERICA.US7ASCII
Target Database: AMERICAN_AMERICA.AL32UTF8

From the source database, the chinese characters are stored in some schema table. From the csscan result, there are convertiable, truncate, data lossy character. So, I have tried to use exp/imp for the conversion. However, all chinese characters are invalided and cannot be read anymore. How can I convert them from US7ASCCI to UTF8 database?

Also, I have tried build up another database with AMERICAN_AMERICA.ZHT16MSWIN950. The exp/imp is used for conversion again. The chinese characters are readable in AL32UTF8 database.

- source database (US7ASCII)
export NLS_LANG=AMERICAN_AMERICA.US7ASCII
export LANG=AMERICAN_AMERICA.US7ASCII
exp userid='/ as sysdba' file=export.dmp full=y

- target database (AL32UTF8)
export NLS_LANG=AMERICAN_AMERICA.US7ASCII
export LANG=AMERICAN_AMERICA.US7ASCII
imp userid='/ as sysdba' file=export.dmp full=y ignore=y


Result:
from US7ASCII to AL32UTF8:
the chinese characters cannot be read


from US7ASCII to ZHT16MSWIN950:
the chinese characters cannot be read


from ZHT16MSWIN950 to AL32UTF8:
the chinese characters can be read

How can I convert the chinese character from US7ASCCI to UTF8 database?
  • 1. Re: Convert Chinese Character from US7ASCII to AL32UTF8 issue
    Zoltan Kecskemethy Expert
    Currently Being Moderated
    You cannot store Chinese characters in US7ASCII. You need to fix your database character set first.
    I guess you have invalid data under US7ASCII. What is your database version?

    I had similar issue in a 10g database. Our US7ASCII db had WE8MSWIN1252 characters in it. I fixed it this way, (basically convert DB charset without actuall conversion so trick the tools)
    You need to use csscan with FROMCHAR=WE8MSWIN1252 and TOCHAR=WE8MSWIN1252 parameters and create clean csscan result tables set so you can use csalter.plb to correct database character set. Because csalter checks these result tables and won’t work until you reach a clean state. So here it is a full example how I used csscan:
    oracle@salamander:~ $ csscan FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=csscan_from_win1252_to_win1252_IRIS21 CAPTURE=Y ARRAY=1000000 PROCESS=2
    from ZHT16MSWIN950 to AL32UTF8:
    the Chinese characters can be read
    because this is a valid conversion for your case.

    Edited by: Zoltan Kecskemethy on Sep 17, 2012 4:43 AM
  • 2. Re: Convert Chinese Character from US7ASCII to AL32UTF8 issue
    Amar_Singh Newbie
    Currently Being Moderated
    Basically the correct method for performing charset upgrade is to use tools like csscans.

    Run csscan on the existing database and see which all data falls under truncation and loosy datatype. You need to fix those data before doing actual migration.

    Truncation data can be corrected by increasing the width of the existing table columns so as to accomodate the width increase in columns when going from single byte charset US7ASCII to multibyte charset like UTF8.

    However its the loosy data that causes a problem. Since its due to existing improper chatset like chinese characters which are being stored in US7ASCII and hence these loosy characters cannot be migrated.

    So there can be couple of options:
    1) Either to correct those loosy data manually or truncate those data and reconstruct it after going to UTF8 charset.
    2) Just import the data into UTF8 database with NLS_LANG set to US7ASCII and while trying to read the data from application there also set NLS_LANG to UTF8 or US7ASCII and see if it works.


    Thanks
    Amar
  • 3. Re: Convert Chinese Character from US7ASCII to AL32UTF8 issue
    Sergiusz Wolicki (Oracle) Expert
    Currently Being Moderated
    Upgrade to 11.2.0.3, if you are not already running on this version, and use the Database Migration Assistant for Unicode (http://www.oracle.com/technetwork/products/globalization/dmu/overview/index.html) to do the migration. Review the documentation of the tool (http://docs.oracle.com/cd/E15847_01/welcome.html), especially the chapter on cleansing. You will be able to use the Assumed Database Character Set or the Assumed Column Character Set properties to declare your US7ASCII data as being in a Chinese character set, which you have to identify first.


    -- Sergiusz
  • 4. Re: Convert Chinese Character from US7ASCII to AL32UTF8 issue
    927417 Newbie
    Currently Being Moderated
    Thanks for all adviced.

    The current database is 10.2.0.4. The client does not want to upgrade their database verison yet.

    The chinsese character is stored in US7ASCCI database correctly. All applications can show the chinese word from databases. It does not have any problem in it. Basically, I have run the csscan already. There were convertible characters which should not delete or modify due to client information. This methods cannot be worked in my case.

    In this moment, I have found another solution to exp / imp from US7ASCII to AL32UTF8. I have changed the character set of INTERNAL_USE to ZHT16MSWIN950 and set NLS_LANG to AMERICAN_AMERICA.ZHT16MSWIN950 before export. The import is successful and all chinese characters are store correctly. We can view all chinese in SQL Developers under UTF8 database. We are testing the application now. Please correct me the below method if there is any issue.


    source database (US7ASCII)
    SQL> shutdown immediate
    SQL> startup restrict;
    SQL> alter database character set internal_use ZHT16MSWIN950;
    SQL> exit
    export NLS_LANG=AMERICAN_AMERICA.ZHT16MSWIN950
    exp userid='/ as sysdba' file=export.dmp full=y statistics=n
    SQL> shutdown immediate
    SQL> startup restrict;
    SQL> alter database character set internal_use US7ASCII;
    SQL> shutdown immediate
    SQL> startup

    target database (AL32UTF8):
    export NLS_LANG=AMERICAN_AMERICA.ZHT16MSWIN950
    imp userid='/ as sysdba' file=export.dmp full=y ignore=y
  • 5. Re: Convert Chinese Character from US7ASCII to AL32UTF8 issue
    Sergiusz Wolicki (Oracle) Expert
    Currently Being Moderated
    SQL> alter database character set internal_use ZHT16MSWIN950;
    You have just corrupted all CLOB values in your US7ASCII database.


    -- Sergiusz
  • 6. Re: Convert Chinese Character from US7ASCII to AL32UTF8 issue
    Srini Chavali-Oracle Oracle ACE Director
    Currently Being Moderated
    Alex Kwok wrote:
    Thanks for all adviced.

    The current database is 10.2.0.4. The client does not want to upgrade their database verison yet.

    The chinsese character is stored in US7ASCCI database correctly. All applications can show the chinese word from databases. It does not have any problem in it. Basically, I have run the csscan already. There were convertible characters which should not delete or modify due to client information. This methods cannot be worked in my case.

    In this moment, I have found another solution to exp / imp from US7ASCII to AL32UTF8. I have changed the character set of INTERNAL_USE to ZHT16MSWIN950 and set NLS_LANG to AMERICAN_AMERICA.ZHT16MSWIN950 before export. The import is successful and all chinese characters are store correctly. We can view all chinese in SQL Developers under UTF8 database. We are testing the application now. Please correct me the below method if there is any issue.


    source database (US7ASCII)
    SQL> shutdown immediate
    SQL> startup restrict;
    SQL> alter database character set internal_use ZHT16MSWIN950;
    SQL> exit
    export NLS_LANG=AMERICAN_AMERICA.ZHT16MSWIN950
    exp userid='/ as sysdba' file=export.dmp full=y statistics=n
    SQL> shutdown immediate
    SQL> startup restrict;
    SQL> alter database character set internal_use US7ASCII;
    SQL> shutdown immediate
    SQL> startup

    target database (AL32UTF8):
    export NLS_LANG=AMERICAN_AMERICA.ZHT16MSWIN950
    imp userid='/ as sysdba' file=export.dmp full=y ignore=y
    See my advice in this related thread - i have a problem with arabic language

    You will need to restore the database from a good backup taken before this command was issued and follow the official documented steps required to change the characterset of the database.

    HTH
    Srini

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points