Well, I'll answer myself, looking for confirmation of my ideas (or not).
I'm thinking of the following approach; please advise whether, in your opinion, it is correct:
1. Create a new AL32UTF8 database (fresh installation)
2. Run csscan on the WE8MSWIN1252 database, but only on the users I need in the new database
3. Check (and resolve) lossy/truncation data
4. Re-run csscan on the WE8MSWIN1252 database to verify there is no truncation/lossy data left
5. Export the users I need, setting the client NLS_LANG to the character set of the old (WE8MSWIN1252) database
6. Import the exported users into the new (AL32UTF8) database using the same NLS_LANG as for the export operation
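The truncation risk checked in steps 3 and 4 comes from byte expansion: characters that occupy one byte in WE8MSWIN1252 can take two or more bytes in AL32UTF8, so a value that fits a byte-sized VARCHAR2 column today may exceed it after conversion. A minimal Python illustration (using cp1252 as the closest standard stand-in for WE8MSWIN1252):

```python
# Western European text with accented characters.
s = "résumé"

# One byte per character in the single-byte character set...
print(len(s.encode("cp1252")))  # 6 bytes

# ...but each accented character takes two bytes in UTF-8.
print(len(s.encode("utf-8")))   # 8 bytes

# This expansion is what csscan flags as "truncation": the
# post-conversion byte length would exceed the column's byte limit.
```

This is only a sketch of the encoding behavior, not of csscan itself; the actual scan is run with the csscan utility against the source database.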
Does it make sense?
Yes, it makes perfect sense. If you just need a subset of the schemas in a new Unicode database, then this is the simplest approach.
A different approach would be required if you wanted the whole WE8MSWIN1252 database in Unicode but only a few schemas had data needing conversion (with the rest unchanged).
I accidentally created a database with the default character set and thought I would just migrate it to AL32UTF8 before importing my data. The utilities don't even work with a newly created database. The utilities are a JOKE. Fire the people who wrote them and burn the books. It was a total F$%^ing waste of my time to try to migrate the database. Just put in your manuals a picture of someone bending over with a big red thing going you know where in the migrating to another character set chapter.
We have two utilities to assist in migrating the database character set: the csscan utility, to look for data problems, and the csalter.plb script, to perform the actual character set change. I understand you are referring to these tools. Their goal is to assist with the migration; they do not support "one-click" migration at this stage, and the migration process is manual. We are trying to develop something more convenient.
The csscan tool does have some bugs, but I cannot say it does not work at all. If you would like us to help you with the problem, or to fix bugs in the tool, please provide some technical details of the problem.
Due to resource constraints we will not be able to utilize your suggestion regarding improvements to the documentation ;-)
How did you resolve point 3, "Check (and resolve) lossy/truncation data"?
Do I need to correct the data itself?
I have a WE8MSWIN1252 database on 10g and an AL32UTF8 database on Oracle 11gR2. I only have truncation data, no lossy data anywhere.
I followed the steps below and succeeded, but developers are complaining that there is junk data coming into the AL32UTF8 DB.
1. Create a new AL32UTF8 database (fresh installation).
2. Run csscan on the WE8MSWIN1252 database, but only on the tables I need in the new database.
3. Find the truncation columns from the csscan results.
4. Generate a table-creation script from the WE8MSWIN1252 database.
5. Modify the script by finding the columns identified by csscan and changing them from VARCHAR2(xxx) to VARCHAR2(xxx CHAR). Now I have empty tables ready in the new database that can accept data measured in characters, not bytes.
6. Create the tables in the AL32UTF8 DB using the same modified script.
7. Connect to the WE8MSWIN1252 database with NLS_LANG set to WE8MSWIN1252.
8. Create a DB link on the WE8MSWIN1252 database to connect to the AL32UTF8 DB.
9. INSERT INTO table_name@db_link (residing in AL32UTF8) SELECT * FROM table_name (residing in WE8MSWIN1252).
10. Generate the index-creation scripts from the WE8MSWIN1252 database and execute them in the AL32UTF8 DB.
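The point of step 5's switch from VARCHAR2(xxx) to VARCHAR2(xxx CHAR) is that it changes the column's length semantics from bytes to characters, so strings that expand under UTF-8 still fit. A hedged Python sketch of the two ways of measuring the same string:

```python
text = "Straße"                   # 6 characters of German text
utf8_bytes = text.encode("utf-8")

# Character-length semantics: this is what VARCHAR2(6 CHAR) counts,
# so the value would still fit after migration.
print(len(text))        # 6

# Byte-length semantics: this is what plain VARCHAR2(6) counts in an
# AL32UTF8 database; ß encodes as two bytes, so the value no longer fits.
print(len(utf8_bytes))  # 7
```

This is an illustration of the length semantics only; in Oracle the same effect can also be obtained session- or instance-wide via NLS_LENGTH_SEMANTICS.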
Does it make sense? Developers are complaining that there is junk data coming into the AL32UTF8 DB, and they are unable to create a copy of those tables migrated from the WE8MSWIN1252 database.
I'd appreciate your prompt response on this.
You could use the dump() function to verify the characters stored in the table_name columns.
E.g. select col, dump(col, 1016) from table_name where suitable_condition...;
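For readers without SQL access handy, the kind of output dump(col, 1016) produces (declared character set plus the hex value of each stored byte) can be approximated in Python. A rough, illustrative analogue, not Oracle's exact format:

```python
def dump_utf8(s: str) -> str:
    """Rough analogue of Oracle's dump(col, 1016) for an AL32UTF8 column:
    report the UTF-8 byte length and each byte as a hex code unit."""
    b = s.encode("utf-8")
    return f"Len={len(b)} CharacterSet=AL32UTF8: " + ",".join(f"{x:x}" for x in b)

print(dump_utf8("é"))  # Len=2 CharacterSet=AL32UTF8: c3,a9
```

Inspecting the raw code units this way is what distinguishes correctly stored multibyte data from bytes that merely display wrong on a given client.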
It may be advisable to open a new thread for your specific issue, instead of reopening a 3-year-old one.
Hi OraFad, thanks for your reply. I am interested in verifying my steps with the participants here on this thread, especially Paolo, who has already done this. I may not be able to find him on a new thread, so I thought of extending this one. My apologies. I already tried dump(); it shows AL32UTF8 for all the records, even the junk data.
830638 wrote: I already tried dump(); it is showing AL32UTF8 for all the records, even junk data.
That's only half of the story: you'd need to check the code units as well (the hex code representation of the characters) for the supposed "junk data".
My thought was that, hopefully, the characters are stored correctly and you could get back to the developers with that.
The hex code units of the junk data look like the output below, but how can we differentiate it from correct data that looks the same? I would like to know what the next step, or the way forward, should be to resolve this. Thank you for your great assistance in this regard.
Typ=1 Len=6662 CharacterSet=AL32UTF8: xxxxxxxx
Edited by: S. Wolicki, Oracle on Jun 1, 2011 3:23 PM
Removed most of the output to protect possibly sensitive data.
(This is why I suggested opening a separate thread :-) )
You may check yourself, using http://rishida.net/tools/conversion/
But the beginning of that line seems to be Arabic letters: "...بعد الإطلاع على" (roughly, "...after reviewing"). The first (uncomposed) letter is Beh.
However, that won't tell you whether there is corruption somewhere among the remaining thousands of bytes of character data (since I only checked the first few words or so).
The DUMP result you pasted is valid Arabic text concerning some legal (law) matters. Google Translate produced a reasonable translation into English, so it is certainly not random garbage.
You need to ask your developers what exactly they call "junk". A precise description of the "junk" can frequently reveal what the actual issue is.
Also, as the text is Arabic, it cannot have come from the WE8MSWIN1252 database. It is some new data.
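One way to separate genuinely corrupted values from merely unfamiliar (e.g. Arabic) ones is to check whether the stored bytes are well-formed UTF-8 at all: valid Arabic passes, while bytes mangled by a wrong client character set often do not. A hedged Python sketch of such a check (in practice you would run it over bytes fetched from the column, e.g. via a client library):

```python
def is_valid_utf8(raw: bytes) -> bool:
    """True if the byte string decodes as well-formed UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Correctly stored Arabic is perfectly valid UTF-8:
print(is_valid_utf8("بعد الإطلاع على".encode("utf-8")))  # True

# A lead byte 0xC3 not followed by a continuation byte is not:
print(is_valid_utf8(b"\xc3\x28"))  # False
```

If the bytes pass this check (as the dump output above suggests), the data is sound at the storage level and the "junk" appearance is more likely a client display or font issue.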
You may want to edit that post (removing the dump output) due to the nature of the information it contains.
Thanks, Orafad and Sergiusz.
Actually, we are unable to see the actual Arabic writing in the result set for a record containing this kind of data when executing the query. The data comes out as invalid combinations of Arabic characters, and even that is only possible when selecting all the columns. Selecting a single column for such a record is not possible for us, as we receive the error "End of File on Communication Channel". By my calculation, 95% of the data was migrated correctly by following the steps above. The remaining 5% the developers cannot use in their procedures; they receive the EOF error in their cursors during execution. They are also unable to create a copy of the tables containing this kind of data using a CREATE TABLE AS ... statement.
My question is: how can I fix this kind of data in one go, when one of the tables has 38 lakh (3.8 million) records?
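If the "junk" turns out to be UTF-8 bytes that were at some point misread as a single-byte character set and stored that way (a common cause of unreadable Arabic after a migration), the damage is often mechanically reversible. A purely illustrative Python sketch of that repair, assuming Latin-1-style mojibake; for millions of rows the actual fix would be a bulk UPDATE driven by equivalent conversion logic, and the failure mode should first be verified on sample rows:

```python
original = "بعد الإطلاع على"  # correct Arabic text

# Simulate the corruption: UTF-8 bytes misinterpreted as Latin-1.
mojibake = original.encode("utf-8").decode("latin-1")

def repair(s: str) -> str:
    """Undo one round of UTF-8-read-as-Latin-1 mojibake."""
    return s.encode("latin-1").decode("utf-8")

assert repair(mojibake) == original
print("repaired:", repair(mojibake))
```

Note this only applies if the bytes really were double-mangled; if the stored data is already valid UTF-8 Arabic (as the earlier dump output suggests), the problem is on the client side and no data rewrite is needed.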