I have several Oracle 8i databases on a HP-UX platform.
I would like to upgrade to 11g, running on RHEL6. I will probably use exp/imp for this.
Obviously there are many things to consider when planning this, but at the moment I am at the very start of my testing, pre-export, and I am trying to get my head around character sets.
Currently, I have the following character sets on my 8i DBs.
LS_CHARACTERSET = WE8ISO8859P1
NLS_NCHAR_CHARACTERSET = US7ASCII
I've used CSSCAN to try and identify data that could be troublesome when completing the export/import. The 8i version of CSSCAN doesn't know about AL32UTF8, so I attempted to use tochar=UTF8 and that resulted in some lossy conversions, which I think is because UTF8 is not a superset of WE8ISO8859P1 (Please correct me if I am wrong).
What I don't understand is if I use tochar=WE8ISO8859P1, there is still the same amount of lossy data. If the DB is in WE8ISO8859P1, and tochar is set to WE8ISO8859P1, surely there can be no lossy data? Clearly I am mistaken somehow?
I found a presentation (http://nyoug.org/Presentations/2009/Li_Unicode.pdf), which suggests using "fromchar=WE8MSWIN1252 tochar=WE8MSWIN1252", which sure enough, results in no lossy conversions.
Is this a good idea? If I'm not using WE8MSWIN1252 in my 8i DB, is using "fromchar=WE8MSWIN1252" not going to falsely interpret the data? I believe WE8MSWIN1252 is a superset of WE8ISO8859P1, so maybe not? But I don't understand why this seems to work yet using "fromchar=WE8ISO8859P1 tochar=WE8ISO8859P1" does not, even though the database is using the WE8ISO8859P1 character set.
Can anyone shed some light on this, or let me know the best way foward?
Thanks for any help on this,
Section C in metalink article 444701.1, in combination with the discussion at http://www.freelists.org/post/oracle-l/Replacement-of-US7ASCII-character-set-in-11g , helped me understand the problems I am having a lot more. ie. Why "fromchar=WE8ISO8859P1 tochar=WE8ISO8859P1" was causing lossy conversion errors even though that was the character set I was using. Probably whoever/whatever entered this data was using a different client that the DB is in.
I'm not to happy about the massive amount of work to correct such a problem, but at least I think I understand.
I think I've answered my own questions from my original post. But if anyone has any personal experience of any bright ideas I can use for my testing, I'd still be interested in hearing it.
I'm not sure why you are stressed out about this - export/import utilities automagically take care of any characterset conversions. Are you planning to use a different characterset for your 11gR2 database ?
It may be lack of understanding on my behalf. I thought there may be data I needed to manually change as it's not recored in the current character set correctly, I may be wrong.
Originally, I wanted the 11g DB to use the AL32UTF8 or a similar more 'advanced' character set than WE8ISO8859P1 so there would not be user-inputted characters that the character set didn't know.
I have found by opening the .err file from CSSCAN (from when I used "fromchar=WE8ISO8859P1 tochar=WE8ISO8859P1") in Notepad++ (using Windows-1252 char set) that 95% of the 'lossy conversions' are euro signs. So I believe using WE8MSWIN1252 as a character set would suffice.
I'm sure you know more than me about this, but what you said seems to contradict section C.2) of 444701.1. It reads:
Please let me know if I misunderstand.
So, correct me if I am wrong, I think the correct course of action is as follows?:
1)Create new 11gR2 DB with WE8MSWIN1252 char set. (Or AL32UTF8 might work too?)
2)Change the 8i database using csalter/Alter database character set.
3)Set NLS_LANG to WE8MSWIN1252 on exporting server, & export data using exp.
4)Set NLS_LANG to WE8MSWIN1252 on importing server, & import data using imp.
Unless this is wrong, I will test this within the next couple days to see where I get.
Thanks for your help,
Pl edit your reply to remove the contents of the MOS Doc - posting such content is a violation of your Support agreement :-)
I would suggest you try this -
1. Set the client NLS_LANG on the source database to match the source database characterset, then perform the export.
2. Set the client NLS_LANG on the target database to match the target database characterset, then perform the import -- and all will be well.
This process assumes that the target database characterset is the same, or a superset, of the source database characterset.
Oracle recommends the use of AL32UTF8 characerset - About Creating an Oracle Database