9 Replies Latest reply: Aug 9, 2012 4:35 AM by Sergiusz Wolicki-Oracle RSS

    Storing Chinese characters with WE8ISO8859P1 character set?

    766913
      Hi all,

      I have a customer which is having problems with fetching and displaying Chinese characters in the client application which is running on a Citrix server. Instead of "蹕庌煉蛁扞撙1g 1" the "ÂÞÊÏ·Ò×¢Éä¼Á1g 1" is returned. Microsoft Oracle Data Provider is used (.NET 2.0) for fetching data.
      I am trying to reproduce the issue on my machine but I am not successful. The problem is that I cannot store the Chinese characters into my db. The data type of the column is VARCHAR2(240), same as in the customer's db.

      I set up the following environment:
      I am running Oracle 9.2.0.8 on Windows Server 2003, my client machine is Windows XP. The regional settings on WinXP is set to Chinese (PRC).

      HKLM\SOFTWARE\ORACLE\HOME0 registry key value = AMERICAN_AMERICA.WE8MSWIN1252

      select * from nls_database_parameters

      NLS_NCHAR_CHARACTERSET     AL16UTF16
      NLS_LANGUAGE     AMERICAN
      NLS_TERRITORY     AMERICA
      NLS_CHARACTERSET     WE8ISO8859P1

      All these settings are the same as the customer's settings.

      Now if i try to store "蹕庌煉蛁扞撙1g 1" to the db, only "¿¿¿¿¿¿1g1" is stored. Running select dump(column_name), column_name from table_name; returns Typ=1 Len=9: 191,191,191,191,191,191,49,103,49. So only the last three characters are actually correctly inserted. So how can I store Chinese characters into field with WE8ISO8859P1 charset? How could the customer store the Chinese characters???

      And I must not forget to mention - the old application written in VB6 and using RDO connection which fetches the same data displays the Chinese characters OK.

      Any hep would be appreciated. If I need to provide any other information let me know. Thank you!

      Edited by: Vklop on Aug 6, 2012 11:37 PM
        • 1. Re: Storing Chinese characters with WE8ISO8859P1 character set?
          952768
          It depends on your client NLS_LANG setting.

          Link: http://docs.oracle.com/cd/E11882_01/server.112/e10729/ch11charsetmig.htm

          'Invalid Data That Results from Setting the Client's NLS_LANG Parameter Incorrectly

          If the database character set is WE8ISO8859P1 and the NLS_LANG setting of the Chinese Windows NT client is SIMPLIFIED CHINESE_CHINA.WE8ISO8859P1, then all multibyte Chinese data (from the ZHS16GBK character set) is stored as multiples of single-byte WE8ISO8859P1 data. This means that Oracle treats these characters as single-byte WE8ISO8859P1 characters. Hence all SQL string manipulation functions such as SUBSTR or LENGTH are based on bytes rather than characters. All bytes constituting ZHS16GBK data are legal WE8ISO8859P1 codes.'
          • 2. Re: Storing Chinese characters with WE8ISO8859P1 character set?
            Srini Chavali-Oracle
            >
            ...
            NLS_CHARACTERSET WE8ISO8859P1
            ...
            >

            This database characterset is not capable of storing Chinese characters - the client process is somehow managing to store the characters in an unsupported manner

            HTH
            Srini
            • 3. Re: Storing Chinese characters with WE8ISO8859P1 character set?
              766913
              Thanks for the responses guys.

              @SpecDev
              The NLS_LANG is set to AMERICAN_AMERICA.WE8MSWIN1252 so the conversion does occur.

              I also tried the procedure you mentioned - set the nls_characterset and nls_lang to WE8ISO8859P1 so that no conversion would be done and each multibyte chinese character would be saved as single byte WE8ISO8859P1 character, but I still had no success.

              @Srini Chavali
              If I understand correctly, they could be stored if nls_lang and characterset is the same so no conversion is made. And if you query such a value with TOAD it would display it in Chinese because the Regional Settings for non-Unicode programs is set to Chinese (PRC) - correct me if I am wrong.
              It could be that the application which inserted the values is doing some additional logic, but I am not so sure about this. I will try to check this. But I still don't understand why the old VB6 application with RDO fetches Chinese characters without a problem and it's doing just SELECT FROM... and no additional logic.
              • 4. Re: Storing Chinese characters with WE8ISO8859P1 character set?
                952768
                I think ODP.net will not use the NLS_LANG setting from the registry.

                Is it possible to do this in your code of the application?

                Something like:

                Environment.SetEnvironmentVariable("NLS_LANG",
                "CHINESE_CHINA.WE8ISO8859P1",
                EnvironmentVariableTarget.Process)

                Or maybe setting NLS_LANG as default environment variable.
                • 5. Re: Storing Chinese characters with WE8ISO8859P1 character set?
                  Sergiusz Wolicki-Oracle
                  I do not know which method Microsoft Oracle Data Provider uses but if it works similarly to ODP.NET, then it uses OCI UTF-16 mode for connection. In this mode, NLS_LANG character set is irrelevant and conversion to/from UTF-16 always takes place. Anyway, unless the customer stores the data in NVARCHAR2 columns, then a WE8ISO8859P1 database is not supposed to store Chinese and you do not have to worry too much about supporting such database. The customer should migrate the database to AL32UTF8.


                  -- Sergiusz
                  • 6. Re: Storing Chinese characters with WE8ISO8859P1 character set?
                    766913
                    I'll first try the option that specdev suggested and I'll see if it makes a difference. I hope it works, because I don't think that the customer would be happy with migration with to AL32UTF8 :)
                    • 7. Re: Storing Chinese characters with WE8ISO8859P1 character set?
                      orafad
                      Vklop wrote:
                      ... I don't think that the customer would be happy with migration with to AL32UTF8 :)
                      Would they be happy with having their data corrupted (stored in illegal manner); not being able to use the data from any tool, not being able to do proper string manipulation, etc.?

                      Besides steering clear of the 'gi-go' scenario, moving to a Unicode database would be a way to prepare for future requirements (and next versions of the Database, where AL32UTF8 likely will be the natural/common choice).
                      • 8. Re: Storing Chinese characters with WE8ISO8859P1 character set?
                        952768
                        There is no common solution in this cases. Most of the time you see that the nls_lang was set the same as the database nls settings. There was no remapping/conversion of characters. The same characters come back the way they were stored (without NLS conversion). In such a situation a standard migration on database level will corrupt your data. So there is a lot of research needed to migrate to AL32UTF8. It could and should be done but it is not easy in such a case.
                        • 9. Re: Storing Chinese characters with WE8ISO8859P1 character set?
                          Sergiusz Wolicki-Oracle
                          For 9.2.0.8, there is no good solution, indeed. But if the database is upgraded to 11.2.0.3, the Database Migration Assistant for Unicode can also convert from a garbage-in garbage-out configuration, provided multiple character sets are not mixed in a single column.


                          -- Sergiusz