This content has been marked as final. Show 9 replies
It depends on your client NLS_LANG setting.
'Invalid Data That Results from Setting the Client's NLS_LANG Parameter Incorrectly
If the database character set is WE8ISO8859P1 and the NLS_LANG setting of the Chinese Windows NT client is SIMPLIFIED CHINESE_CHINA.WE8ISO8859P1, then all multibyte Chinese data (from the ZHS16GBK character set) is stored as multiples of single-byte WE8ISO8859P1 data. This means that Oracle treats these characters as single-byte WE8ISO8859P1 characters. Hence all SQL string manipulation functions such as SUBSTR or LENGTH are based on bytes rather than characters. All bytes constituting ZHS16GBK data are legal WE8ISO8859P1 codes.'
Thanks for the responses guys.
The NLS_LANG is set to AMERICAN_AMERICA.WE8MSWIN1252 so the conversion does occur.
I also tried the procedure you mentioned - set the nls_characterset and nls_lang to WE8ISO8859P1 so that no conversion would be done and each multibyte chinese character would be saved as single byte WE8ISO8859P1 character, but I still had no success.
If I understand correctly, they could be stored if nls_lang and characterset is the same so no conversion is made. And if you query such a value with TOAD it would display it in Chinese because the Regional Settings for non-Unicode programs is set to Chinese (PRC) - correct me if I am wrong.
It could be that the application which inserted the values is doing some additional logic, but I am not so sure about this. I will try to check this. But I still don't understand why the old VB6 application with RDO fetches Chinese characters without a problem and it's doing just SELECT FROM... and no additional logic.
I think ODP.net will not use the NLS_LANG setting from the registry.
Is it possible to do this in your code of the application?
Or maybe setting NLS_LANG as default environment variable.
I do not know which method Microsoft Oracle Data Provider uses but if it works similarly to ODP.NET, then it uses OCI UTF-16 mode for connection. In this mode, NLS_LANG character set is irrelevant and conversion to/from UTF-16 always takes place. Anyway, unless the customer stores the data in NVARCHAR2 columns, then a WE8ISO8859P1 database is not supposed to store Chinese and you do not have to worry too much about supporting such database. The customer should migrate the database to AL32UTF8.
Vklop wrote:Would they be happy with having their data corrupted (stored in illegal manner); not being able to use the data from any tool, not being able to do proper string manipulation, etc.?
... I don't think that the customer would be happy with migration with to AL32UTF8 :)
Besides steering clear of the 'gi-go' scenario, moving to a Unicode database would be a way to prepare for future requirements (and next versions of the Database, where AL32UTF8 likely will be the natural/common choice).
There is no common solution in this cases. Most of the time you see that the nls_lang was set the same as the database nls settings. There was no remapping/conversion of characters. The same characters come back the way they were stored (without NLS conversion). In such a situation a standard migration on database level will corrupt your data. So there is a lot of research needed to migrate to AL32UTF8. It could and should be done but it is not easy in such a case.