This discussion is archived
8 Replies Latest reply: Apr 6, 2009 10:40 AM by Sergiusz Wolicki (Oracle) Branched to a new discussion. RSS

Hebrew Characters...Chars display as junk after import from 8i to 10g

694311 Newbie
Currently Being Moderated
Gurus,
I have a problem with a customer upgrade...

Background of the issue... the customer is an Agile PLM customer of version 8.5 (Agile PLM version 8.5). The database was hosted on oracle 8i. He is intending to upgrade from Agile 8.5 to Agile 9.2.2.4. During this process he has upgraded he db from Agile 8.5 to Agile 9.2.2.4, and has also shifted the DB platform from 8i to 10g.

Problem: There were hebrew characters entered in Varchar2 columns (Oracle 8i), which after upgrade are not displaying correctly. Newly entered hebrew characters after upgrade display correctly in UI...

Customer DB Parameters : The nls parameters on the source db before upgrade(8i) are American_America.WE8ISO8859P1, and the destination db parameters are American_America.UTF8.

What i have done to deal with the issue: i have tried exporting the db using UTF8 and importing the db to 10g on UTF8, but still the characters show as garble characters..., have tried various options of exporting/importing using the combinations of WE8ISO8859P1 char set as well as IW8ISO8859P8 charsets, as i have learnt during my research abt the charsets that Hebrew Characters are supported in IW8ISO8859P8 charset and not WE8ISO8859P1. My suspicion here is that the problem is with the export and import from 8i to 10g, and the Char conversion which is happening during this process..(this is my guess and i might be wrong tooo...)

Currently this is a hot issue with the customer, and needs an immediate fix (to display the Hebrew characters properly after upgrade)

I am a layman on the NLS Settings and couldnt figure out what else to do....I would request all the Gurus out there to help us figure out the problem and try resolve it...

Thanks for your Help in Advance
Regards,
Raja.
  • 1. Re: Hebrew Characters...Chars display as junk after import from 8i to 10g
    Justin Cave Oracle ACE
    Currently Being Moderated
    user774542 wrote:
    Customer DB Parameters : The nls parameters on the source db before upgrade(8i) are American_America.WE8ISO8859P1, and the destination db parameters are American_America.UTF8.
    It looks like you are posting the client NLS_LANG. What is the database character set of the source and destination
    SELECT *
      FROM v$nls_parameters
     WHERE parameter like '%CHARACTERSET'
    What i have done to deal with the issue: i have tried exporting the db using UTF8 and importing the db to 10g on UTF8, but still the characters show as garble characters..., have tried various options of exporting/importing using the combinations of WE8ISO8859P1 char set as well as IW8ISO8859P8 charsets, as i have learnt during my research abt the charsets that Hebrew Characters are supported in IW8ISO8859P8 charset and not WE8ISO8859P1. My suspicion here is that the problem is with the export and import from 8i to 10g, and the Char conversion which is happening during this process..(this is my guess and i might be wrong tooo...)
    If the original database is using ISO-8859-1 as the database character set, you've got a serious problem because that character set doesn't support Hebrew. Since Hebrew characters are not valid ISO-8859-1 characters, any conversion to any other character set is going to fail.

    Justin
  • 2. Re: Hebrew Characters...Chars display as junk after import from 8i to 10g
    Sergiusz Wolicki (Oracle) Expert
    Currently Being Moderated
    In case the source database character set is WE8ISO8859P1, you should identify the actual character set of data. If the clients used to enter data are on the Windows platform, then the character set is most probably IW8MSWIN1255. After this check, you should set NLS_LANG to the database character set (like NLS_LANG=.WE8ISO8859P1) and spool your data into text files. Then, you should load the files with SQL*Loader, specifying the real character set (e.g. IW8MSWIN1255) in the control file.


    -- Sergiusz
  • 3. Re: Hebrew Characters...Chars display as junk after import from 8i to 10g
    694311 Newbie
    Currently Being Moderated
    Justin,

    The following are the database paramters for Source
    NLS_CHARACTERSET = WE8ISO8859P1
    NLS_NCHAR_CHARACTERSET = WE8ISO8859P1

    and at the destination the following are the parameters.
    NLS_CHARACTERSET = UTF8
    NLS_NCHAR_CHARACTERSET = UTF8

    You have mentioned that if the original database is using ISO-8859-1 as the characterset then there is a serious issue. But with the DB settings on the original database, it seems like it is using the ISO-8859-1 character set only. In such a case, is there any way where we can try to import the data into UTF8.

    Btw if i create a new database using the database settings of the source(original 8i) by setting the NLS_CHARACTERSET to WE8ISO8859P1, and then import the dmp file to the new instance on 10g, would it help retain the characters..

    Regards,
    Raja.
  • 4. Re: Hebrew Characters...Chars display as junk after import from 8i to 10g
    694311 Newbie
    Currently Being Moderated
    Serguisz,
    The data was entered using client(written in c++) from a windows machine.. You have mentioned that i should try to get the client characterset, if you can please provide me with the steps to get the details of Client Characterset, then it would be helpful. In the mean time i will try to follow the suggestion provided by you and see if it helps.

    Thanks & Regards,
    Raja.
  • 5. Re: Hebrew Characters...Chars display as junk after import from 8i to 10g
    Justin Cave Oracle ACE
    Currently Being Moderated
    Hebrew characters aren't supported using the ISO 8859-1 character set. In the original system, what must be happening is that the NLS_LANG on the client matches the database character set, which tells the Oracle client not to do character set conversion. This causes Oracle to treat character data just as a binary data stream and to store whatever bits the client sends. So long as the client is really sending ISO 8859-8 data, telling Oracle it is ISO 8859-1, and on the return side asking Oracle for ISO 8859-1 and treating it internally as ISO 8859-8 while Oracle is doing no character set conversions, this will appear to work. But the data in the database is fundamentally wrong. You can configure things so that you can, apparently, store Chinese data in a US7ASCII database using this sort of approach, but you end up with screwed up data.

    This sort of problem usually comes to light when you start using tools (like export) that don't know how to mis-identify the data they are sending and retrieving like your application is doing or when character set conversion is necessary. Since the data in the database isn't valid ISO 8859-1, Oracle has no idea how to translate it to UTF8 properly.

    As previously suggested, the safest option is to move the data with a solution that replicates the behavior of the application. So
    - Set the client NLS_LANG to match the database character set (WE8ISO8859P1)
    - Extract the data to a flat file using SQL*Plus or a C/C++ application
    - This data file will, presumably, really be ISO 8859-8 encoded
    - Use SQL*Loader to load the data into the UTF8 database, specifying the actual character set (ISO 8859-8) in the control file.

    If you're rather more adventurous and working with Oracle Support, it is potentially possible to change the character set of the source database from ISO 8859-1 to ISO 8859-8 and then export and import will work properly. But this would require some undocumented options that shouldn't be used without Oracle Support being involved and introduces a number of risks that the flat file option avoids.

    Justin
  • 6. Re: Hebrew Characters...Chars display as junk after import from 8i to 10g
    Sergiusz Wolicki (Oracle) Expert
    Currently Being Moderated
    Raja,


    Justin described the situation that you have in your database. My only correction is that the data is most probably not in ISO 8859-8 (IW8ISO8859P8) but in Windows Code Page 1255 (IW8MSWIN1255). Except for two codes, ISO 8859-8 is a subset of Windows Code Page 1255.

    With a C++ application on Windows, Hebrew data in your WE8ISO8859P1 database may be theoretically encoded in IW8MSWIN1255, AL32UTF8, or AL16UTF16LE. The probability is >95% that the real character set is IW8MSWIN1255, <4% that it is AL32UTF8 and <1% that it is AL16UTF16LE (these numbers are just my assumption).

    To verify the real character set, select a random value that you see correctly on the screen -- one that preferably but not necessarily contains a Sheqel (₪) or a Euro (€) sign -- locate this value in the database and select it using SELECT DUMP(column,1016) FROM table WHERE ... Post the result here.

    Note, if your data is in IW8MSWIN1255, it will expand during loading into an AL32UTF8 or UTF8 database. This is because Hebrew characters occupy two bytes each in UTF-8. You need to make sure that the target columns are large enough to accept the longer values.


    -- Sergiusz
  • 7. Re: Hebrew Characters...Chars display as junk after import from 8i to 10g
    694311 Newbie
    Currently Being Moderated
    Justin/Sergiusz,
    Thanks for your posts... The explanation you have provided is clear and in detail... that was helpful...

    Regarding the Agile Product, Agile PLM is an oracle Product, and i work for Oracle development in the upgrade and schema management of Agile PLM. This issue has been escalated to me as a Bug in the upgrade, which now looks like a wrong db settings of the customer which lead to this scenario.

    In reply to Justins suggestion, exporting the data to flat files and then reimporting them to the db with the correct db setttings.... i will give it a try. Tried to replicate the customer environment back here in house, but couldnt successfully import the characters(Problem is with the export dmp file, and all the hebrew characters are lost during the translation.) They have exported the db in UTF8 and currently i have requested them to export the db using the NLS Settings of the customer db. Will try to import the db inhouse and try out the suggestions give by you folks..

    In the mean time, i have a few questions... I have been reading about the CSSCAN and CSALTR routines... do you folks think that this would be helpful to us in the current scenario....

    As suggested by Justin, i have requested the Support team to escalate a bug to the DB Support team... but in the mean time i would like to try out different options... and see if we can resolve the defect. Your guidence would help me a lot in this matter and the customer too....

    At the end of the day we can see a happy customer....

    Justin/Sergiusz, Do you know of any contact in the DB NLS Team(Dev/Support) who can help me out to do a webex/conference with the customer and help us on this issue...

    Thanks & Regards,
    Raja
  • 8. Re: Hebrew Characters...Chars display as junk after import from 8i to 10g
    Sergiusz Wolicki (Oracle) Expert
    Currently Being Moderated
    Raja,


    I am in Oracle NLS Development consulting, DB technical lead. Please, contact me through e-mail with the SR and bug details (you can find me in Aria).


    -- Sergiusz

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points