Thank you for the quick reply, orafad.
We are inserting the broken pipe (￤) character into a VARCHAR2 column in Oracle (10g, version 10.1.0.5.0) using
java.sql.Statement (Java version: 22.214.171.124, JDBC driver: ojdbc14.2.jar). But once it is inserted into the database, it is changed into the normal pipe character (|).
Following are the NLS settings:
PARAMETER : VALUE
NLS_LANGUAGE : JAPANESE
NLS_TERRITORY : JAPAN
NLS_CURRENCY : Y
NLS_ISO_CURRENCY : JAPAN
NLS_NUMERIC_CHARACTERS : .,
NLS_CHARACTERSET : JA16SJIS
NLS_CALENDAR : GREGORIAN
NLS_DATE_FORMAT : DD-MON-RRRR
NLS_DATE_LANGUAGE : AMERICAN
NLS_SORT : BINARY
NLS_TIME_FORMAT : HH24:MI:SSXFF
NLS_TIMESTAMP_FORMAT : RR-MM-DD HH24:MI:SSXFF
NLS_TIME_TZ_FORMAT : HH24:MI:SSXFF TZR
NLS_TIMESTAMP_TZ_FORMAT : RR-MM-DD HH24:MI:SSXFF TZR
NLS_DUAL_CURRENCY : \
NLS_COMP : BINARY
NLS_LENGTH_SEMANTICS : BYTE
NLS_NCHAR_CONV_EXCP : FALSE
NLS_NCHAR_CHARACTERSET : AL16UTF16
NLS_RDBMS_VERSION : 10.1.0.5.0
Please post the code fragment you use to insert this character. By the "broken pipe" character, I guess you mean "U+FFE4 Full Width Broken Bar", which has code 0xFA55 in JA16SJIS.
A probable cause of the issue is that you use a text literal but you do not specify the proper encoding when compiling the Java source. Instead of using the character directly, try its Unicode escape "\uffe4".
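To illustrate the Unicode escape suggestion, here is a minimal sketch, assuming Java 8 or later. The class name BrokenBarLiteral and the helper codePoints are made up for this example. Because the constant is written as an escape, the compiled value does not depend on the encoding passed to javac, and the helper prints the code points so you can check what is really in the string before it goes to JDBC.

```java
public class BrokenBarLiteral {
    // U+FFE4 FULLWIDTH BROKEN BAR written as a Unicode escape, so the
    // compiled constant does not depend on the source file encoding.
    static final String FULLWIDTH_BROKEN_BAR = "\uffe4";

    // Hypothetical helper: formats every code point of s as U+XXXX.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Prints: U+0041 U+FFE4 U+007C
        System.out.println(codePoints("A" + FULLWIDTH_BROKEN_BAR + "|"));
    }
}
```

If the log shows U+007C or U+00A6 where you expect U+FFE4, the character was already changed on the Java side, before the database saw it.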
I understand your explanation.
It is not possible to post the actual code; it is complex and coupled with other things.
I would like to share the scenario where it is failing:
1. The broken pipe character (double-byte, 0xFA 0x55) is put along with other long Japanese text in a *.txt file.
2. A Java program reads the above *.txt file and inserts the file text into a table column in Oracle.
The .txt file is saved in Shift_JIS encoding.
In the Java program, the same *.txt file is read using the 'CP943C' charset and written into the database.
So, as per your suggestion, I have to compare against the literal first, and then, if I find the broken bar in the file, replace it with the specified Unicode escape.
Is that so?
As far as I can tell, if you read the Shift_JIS file using the CP943C encoding, then the full width broken bar character will become the normal (half) width broken bar character U+00A6. But as this character is not represented in JA16SJIS, it gets converted to the replacement character U+007C Vertical Line aka "pipe character". To read the Shift_JIS file preserving the full width broken bar U+FFE4, use the MS932 encoding instead of CP943C.
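You can check the difference between the two encodings without touching the database. The following is only a sketch: the class name DecodeCompare and its helper are made up, and it assumes your JRE ships the extended charsets (MS932 and Cp943C are not guaranteed to be present in every runtime). It decodes the two byte pairs discussed in this thread and prints the resulting code points.

```java
import java.nio.charset.Charset;

public class DecodeCompare {
    // Hypothetical helper: decodes raw bytes with the named charset and
    // returns the resulting code points formatted as U+XXXX.
    static String decodeToCodePoints(byte[] bytes, String charsetName) {
        String text = new String(bytes, Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] brokenBar = {(byte) 0xFA, (byte) 0x55}; // broken bar bytes from the file
        byte[] tilde = {(byte) 0x81, (byte) 0x60};     // tilde bytes from the file
        for (String name : new String[] {"MS932", "Cp943C"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JRE");
                continue;
            }
            System.out.println(name + ": 0xFA55 -> " + decodeToCodePoints(brokenBar, name)
                    + ", 0x8160 -> " + decodeToCodePoints(tilde, name));
        }
    }
}
```

With MS932 the two byte pairs should come out as U+FFE4 and U+FF5E. For Cp943C, compare the printed values with the U+00A6 described above.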
Earlier, we used the same MS932 encoding for reading and writing files and for writing data into the database.
But in that case the full-width tilde character (～) was changed into a question mark (?), and to solve this we changed the encoding from MS932 to CP943C.
Now, if we revert, we will be back where we started.
Anyway, only a common encoding that supports all of these characters together can help us with Japanese.
Thanks for the valuable comments.
For the correct support of the tilde character 0x8160 (U+FF5E), use JA16SJISTILDE for the database character set. It differs from JA16SJIS exactly by this single character. So, use MS932 and JA16SJISTILDE.
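If you want to catch such characters before they are silently replaced, a pre-insert check along these lines may help. It is a sketch only: the class EncodableCheck and the method unencodable are made up, and Java's MS932 charset is used here as a stand-in for JA16SJISTILDE, which is an assumption on my part. Oracle performs its own conversion and the two mappings may not agree on every character.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class EncodableCheck {
    // Hypothetical helper: returns the code points of text (as U+XXXX)
    // that the named Java charset cannot encode.
    static List<String> unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        List<String> bad = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                bad.add(String.format("U+%04X", cp));
            }
        });
        return bad;
    }

    public static void main(String[] args) {
        // Full width tilde and full width broken bar, written as escapes.
        String sample = "\uff5e\uffe4";
        // MS932 stands in for the database character set (assumption).
        System.out.println("Not encodable in MS932: " + unencodable(sample, "MS932"));
    }
}
```

An empty list for your test data is a good sign, but the final check is still to read the value back from the database and compare code points.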
You can also go the modern way and use AL32UTF8 for the database character set. Then you can have a single database for all languages, not just for Japanese. This is the recommended character set for all new deployments, even though it uses more space for Japanese than the dedicated JA16SJIS(TILDE).
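To get a feeling for the extra space, you can compare encoded byte lengths in Java. This is a rough sketch with made-up names (ByteCost, byteLength). It assumes Java's UTF-8 and MS932 charsets are reasonable stand-ins for AL32UTF8 and JA16SJISTILDE when it comes to byte counts. The size matters here because your NLS_LENGTH_SEMANTICS is BYTE, so VARCHAR2 column limits are counted in bytes.

```java
import java.nio.charset.Charset;

public class ByteCost {
    // Hypothetical helper: number of bytes text occupies in the named charset.
    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        // Three kanji followed by the full width tilde, written as escapes
        // so the sample does not depend on the source file encoding.
        String sample = "\u65e5\u672c\u8a9e\uff5e";
        System.out.println("UTF-8 bytes: " + byteLength(sample, "UTF-8"));
        if (Charset.isSupported("MS932")) {
            System.out.println("MS932 bytes: " + byteLength(sample, "MS932"));
        }
    }
}
```

For this sample, each character takes 3 bytes in UTF-8 and 2 bytes in MS932, so columns sized in bytes may need to be enlarged (or switched to CHAR semantics) when moving to AL32UTF8.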