2 Replies Latest reply: Sep 28, 2012 12:53 PM by Sergiusz Wolicki-Oracle RSS

    Western European Characterset to Turkish in sql

    962612
      Hello,

      I am having a serious issue with character encoding. I know I asked a similar question some days back but the requirement is different here.

      To give some background:

      1. I have turkish business users who enter some data on Unix screens in Turkish language.
      2. My database NLS parameter is set to AMERICAN, WE8ISO8859P1 and Unix NLS_LANG to AMERICAN_AMERICA.WE8ISO8859P1.
      3. Turkey business is able to see all the turkish characters on UNIX screens and TOAD while I'm not. I can only see them in Western European Character set.

      At business end: ÖZER İNŞAAT TAAHHÜT VE
      At our end : ÖZER ÝNÞAAT TAAHHÜT VE

      If you notice the turkish characters İ and Ş are getting converted to ISO 8859-1 character set. However, all the settings(NLS paramaters in db and unix) are same at both end- ISO8859-1(Western European)

      With some study, I can understand - Turkish machines can display turkish data by doing conversion in real-time(DB NLS settings are overridden by local NLS settings).

      Now, I have a interface running in my db- have some PL/SQL scripts(run through shell script) that extracts some data from database and spool them to a .csv file on a unix path. Then that .csv file is transferred to an external system via MFT(Managed File transfer).

      The problem is- Exract never conains any turkish character. Every turkish character is getting converted into Western European Characterset and goes like this to the external system which is treated as a case of data conversion/loss and my business is really unhappy.

      Could anyone tell me - How could I retain all the turkish characters?

      P.S. : External System's characterset could be set to ISP8859-9 charcterset.

      Many thanks in advance.
        • 1. Re: Western European Characterset to Turkish in sql
          orafad
          Database version?
          pps7sep wrote:
          2. My database NLS parameter is set to AMERICAN, WE8ISO8859P1 ...
          If you notice the turkish characters İ and Ş are getting converted to ISO 8859-1 character set. However, all the settings(NLS paramaters in db and unix) are same at both end- ISO8859-1(Western European)
          The characters capital/small letter I with dot above, S with cedilla and G with breve, which are typical for turkish alphabet, does not exist in the character repertoire of 8859-1.

          So basically the chosen database character set is not suitable for the language/alphabet requirements or expectations.

          http://czyborra.com/charsets/iso8859.html
          http://en.wikipedia.org/wiki/ISO/IEC_8859-9


          Use Oracle SQL Developer for querying character data, because it is somewhat immune to character conversion issues, as compared to e.g. Toad.

          Use something like the following to verify what character data is actually stored -- the key here is dump() function:
          select column, dump(column, 1016) from table where suitable_condition ... ; 
          With some study, I can understand - Turkish machines can display turkish data by doing conversion in real-time(DB NLS settings are overridden by local NLS settings).
          Very likely, what you see is a "side effect" of the pass-through configuration (aka garbage in, garbage out).

          An explanation if this type of incorrect setup is given in the following link, or if you search this forum for "pass-through".

          http://www.oracle.com/technetwork/products/globalization/nls-lang-099431.html#_Toc110410550 (An example of a wrong setup)

          The problem is- Exract never conains any turkish character. Every turkish character is getting converted into Western European Characterset and goes like this to the external system which is treated as a case of data conversion/loss and my business is really unhappy.
          Your shell script connects to database, how? I'm guessing the client part of NLS_LANG for shell environment is either set/defaults to US7ASCII or is set to WE8ISO8859P9

          However, the basic issue is that application does not legally store any turkish (specific) characters in the database, per character set in use.

          Could anyone tell me - How could I retain all the turkish characters?
          Quick "fix": continue with pass-through setup, ignoring character set definition.

          Solution:
          - Change database character set to AL32UTF8, which supports most common languages -- but such migration might not be doable in the short term.
          - Change to ISO 8859-9 corresponding database character set WE8ISO8859P9 (or win-1254 corresponding TR8MSWIN1254) -- also takes a migration effort and some required other language characters may be missing from char set repertoire e.g. for non-Turkish users.

          http://docs.oracle.com/cd/E11882_01/server.112/e10729/applocaledata.htm#NLSPG585

          Edited by: orafad on Sep 27, 2012 10:13 PM
          • 2. Re: Western European Characterset to Turkish in sql
            Sergiusz Wolicki-Oracle
            ## 2. My database NLS parameter is set to AMERICAN, WE8ISO8859P1 and Unix NLS_LANG to AMERICAN_AMERICA.WE8ISO8859P1.

            This is a pass-through configuration. WE8ISO8859P1 does not support Turkish.

            ## 3. Turkey business is able to see all the turkish characters on UNIX screens and TOAD while I'm not. I can only see them in Western European Character set.

            This is because their terminal/terminal emulator is configured for Turkish, most probably iso8859-9 (aka WE8ISO8859P9), while your terminal is not. But the data in the database is incorrect anyway as it is declared as WE8ISO8859P1.

            ## With some study, I can understand - Turkish machines can display turkish data by doing conversion in real-time(DB NLS settings are overridden by local NLS settings).

            Turkish machines just display the same codes as your machine but they show the codes in a Turkish font and your machine shows it in a Western European font.

            ## The problem is- Exract never conains any turkish character. Every turkish character is getting converted into Western European

            If you extract the data with NLS_LANG=.WE8ISO8859P1, recreating the pass-through configuration, the text files does contain proper data. But you need a Turkish system to view this data.

            ## Could anyone tell me - How could I retain all the turkish characters?

            The simplest solution is to view the generated files on a Turkish system, but the recommended solution would be to migrate the customer's database to Unicode (AL32UTF8) using the Database Migration Assistant for Unicode (which can solve the problem of WE8ISO8859P9 data declared incorrectly as WE8ISO8859P1). Or at least, CSSCAN+CSALTER.PLB should be used to repair the DB character set declaration by changing it from WE8ISO8859P1 to WE8ISO8859P9 without touching the data itself. Turkish users could then configure their NLS_LANG as expected by their terminals (I assume WE8ISO8859P9). You can spool your data in UTF-8 or UTF-16 (Unicode) and then view it with Notepad or other Unicode-capable editor/viewer, even on a US system.

            -- Sergiusz