This discussion is archived
1 2 Previous Next 15 Replies Latest reply: Aug 14, 2012 4:10 PM by 950015 RSS

Oracle 10g R2 - Unicode is converted to unknown ?����

950015 Newbie
Currently Being Moderated
Ok, I've read a thousand posts. Tried everything still stuck.

I am using Oracle 10g : 10.2.0.5.0
I'm on a Mac, using SQL Developer 3.1.07

When I enter "ė€£¥©", the insertion into the database converts it to: ?����

This is straight from a SQL INSERT command, in SQLDeveloper, no java code or anything.

I tried adding a file called SQLDeveloper.app/Resources/sqldeveloper/sqldeveloper/sqldeveloper.conf and added these lines:
-Doracle.jdbc.defaultNChar=true
-Doracle.jdbc.convertNcharLiterals=true

Didn't help.

I tried created a test column and made the datatype NVARCHAR2. same results. VARCHAR column does the same corruption but that's expected.

I tried N'©' - doesn't work

I tried INSERT INTO testchar (column1) VALUES ( unistr('©') );
results is still : �


I check NSL_NVAR_CHARACTERSET and it's UTF8
NLS_LANGUAGE     AMERICAN
NLS_TERRITORY     AMERICA
NLS_CURRENCY     $
NLS_ISO_CURRENCY     AMERICA
NLS_NUMERIC_CHARACTERS     .,
NLS_CHARACTERSET     US7ASCII
NLS_CALENDAR     GREGORIAN
NLS_DATE_FORMAT     DD-MON-RR
NLS_DATE_LANGUAGE     AMERICAN
NLS_SORT     BINARY
NLS_TIME_FORMAT     HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT     DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT     HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT     DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY     $
NLS_COMP     BINARY
NLS_LENGTH_SEMANTICS     BYTE
NLS_NCHAR_CONV_EXCP     FALSE
NLS_NCHAR_CHARACTERSET     UTF8
NLS_RDBMS_VERSION     10.2.0.5.0
I don't know what else to do. I am not a DBA. I'm just trying to fix a bug or at least tell a customer what's wrong. Entering special characters is required. Can anyone help? thank you!

I have read that the insertion into the database does its own conversion but I don't know if that applies to me since I'm in 10g R2

Edited by: 947012 on Aug 14, 2012 11:43 AM


My mac's locales are also :
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
Edited by: 947012 on Aug 14, 2012 11:45 AM

Edited by: 947012 on Aug 14, 2012 11:49 AM
  • 1. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    sb92075 Guru
    Currently Being Moderated
    947012 wrote:
    Ok, I've read a thousand posts. Tried everything still stuck.

    I am using 10g : 10.2.0.5.0

    When I enter "ė€£¥©", the insertion into the database converts it to: ?����

    This is straight from a SQL INSERT command, in SQLDeveloper, no java code or anything.
    do you have data storage problem or data presentation problem?

    SELECT ASCIISTR(funky_col) from test_table where ID = 1;

    post results from SQL above
  • 2. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    950015 Newbie
    Currently Being Moderated
    There was an error in that query (I don't know much about SQL) from SQL Devleoper
    select asciistr(column2) from testchar where ID=1;
    
    
    ORA-00904: "ID": invalid identifier
    00904. 00000 -  "%s: invalid identifier"
    *Cause:    
    *Action:
    But, you might be asking for the raw data, which I've check and forgot to mention. The raw values aren't correct. The raw hex values for the insertion is :
    insert into testvar (column1, column2) values ('£¥©', '£¥©');
    select dump (column2, 1016) from testchar
    
    
    Typ=1 Len=9 CharacterSet=UTF8: ef,bf,bd,ef,bf,bd,ef,bf,bd
    But it should be : \u00a3 \u00a5 \u00a9
    I think

    or \xa3\xa5\xa9

    thank you

    btw - notice that CharacterSet=UTF8. I thought that was a good sign......
  • 3. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    Srini Chavali-Oracle Oracle ACE Director
    Currently Being Moderated
    Your database characterset is US7ASCII ...

    >
    ...
    NLS_CHARACTERSET     US7ASCII
    ...
    >

    .. which is incapable of storing or representing the characters "ė€£¥©" if tables are using standard columns

    You will need switch to AL32UTF8 characterset. If this is a new database, simple drop and re-create the database with AL32UTF8, else you will need to migrate this database to the AL32UTF8 characterset

    http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch11charsetmig.htm#sthref1441

    The NLS_NCHAR_CHARACTERSET is used for columns using the NCHAR data type

    HTH
    Srini
  • 4. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    950015 Newbie
    Currently Being Moderated
    Yes, my column IS NVARCHAR2

    EDIT: Oh, so NLS_CHARACTERSET is for the whole database, not just for VARCHAR columns? I didn't realize that I couldn't override that by using NVARCHAR2. Is that a correct statement? thank you

    Edited by: 947012 on Aug 14, 2012 12:09 PM
  • 5. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    950015 Newbie
    Currently Being Moderated
    but my column is NVARCHAR2
  • 6. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    950015 Newbie
    Currently Being Moderated
    AND, is there a difference between my NVAR UTF8 and the AL32UTF8 Encoding ? Does it have to be changed? thank you!
  • 7. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    Srini Chavali-Oracle Oracle ACE Director
    Currently Being Moderated
    947012 wrote:
    but my column is NVARCHAR2
    Pl post a description of the table you are inserting into, and the complete insert statement.

    Correct - NLS_CHARACTERSET applies to standard CHAR and VARCHAR2 columns, while NLS_NCHAR_CHARACTERSET applies to only NCHAR and NVARCHAR2 columns

    http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch2charset.htm#i1006750

    HTH
    Srini
  • 8. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    950015 Newbie
    Currently Being Moderated
    OH, so NVARCHAR2 should work. Darn. I was hoping that it just isn't supported by the way the database was set up with US7ASCII.


    Here is the SQL code for the table I created this morning :
    CREATE TABLE "ALFRESCO"."TESTCHAR"
      (
        "COLUMN1" VARCHAR2(1024 CHAR),
        "COLUMN2" NVARCHAR2(1024)
      )
      PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE
      (
        INITIAL 131072 NEXT 131072 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT
      )
      TABLESPACE "ALFRESCO_DATA" ;
    ^^^^ is that a description? Please let me know if it's something different.


    I was just inserting by hand in SQL Developer :
    insert into testchar (column2) values ('£¥©');
    (note: i made column1 regular VARCHAR just for testing) Thank you

    Edited by: 947012 on Aug 14, 2012 12:34 PM
  • 9. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    Srini Chavali-Oracle Oracle ACE Director
    Currently Being Moderated
    Pl see http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:3349711400346955195

    Try
    insert into testchar (column2) values (n'£¥©');
    HTH
    Srini
  • 10. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    950015 Newbie
    Currently Being Moderated
    I tried that and I don't get the correct raw data

    same error :

    ���

    Typ=1 Len=9 CharacterSet=UTF8: ef,bf,bd,ef,bf,bd,ef,bf,bd




    What does all this mean? Does the NLS_NVAR_CHARACTERSET have to be chaged to AL32UTF8? and not UTF8? Should I upgrade jdbc driver? I don't know what to do.

    Edited by: 947012 on Aug 14, 2012 1:53 PM
  • 11. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    Sergiusz Wolicki (Oracle) Expert
    Currently Being Moderated
    This two things together should work:

    1. Adding the letter "n" in front of the character literal.
    2. Defining the property oracle.jdbc.convertNcharLiterals=true in sqldeveloper.conf

    The line added to the configuration file should be:

    AddVMOption -Doracle.jdbc.convertNcharLiterals=true


    To make sure that the option was actually consumed, go to SQL Developer's Help menu, click About, select Properties tab and look for oracle.jdbc.convertNcharLiterals.


    -- Sergiusz
  • 12. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    950015 Newbie
    Currently Being Moderated
    I'm going to faint. It works! I thought I already did that but I put the flags in a local conf file so when I double checked what you mentioned; and corrected it and it is showing up in the database correctly.

    Is there a way to specify that flag from the Java code ? When I do the same thing from Java the characters are corrupted. thank you!

    Edited by: 947012 on Aug 14, 2012 3:39 PM
  • 13. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    Sergiusz Wolicki (Oracle) Expert
    Currently Being Moderated
    NLS_NCHAR_CHARACTERSET specifies the encoding of the NVARCHAR2 variables but not the SQL statements. SQL statements, when they arrive to the database server, are always encoded in the database character set (NLS_CHARACTERSET). This is true for the whole statement, including any character literals. Therefore, even if a literal is valid in SQL Developer (which works in Unicode), when the literal arrives to the database, it is already stripped of all characters that the database character set does not support. The characters are not actually removed but they are converted to a replacement character. In case of the US7ASCII database character set, anything non-ASCII is lost.

    The trick to avoid this problem is to:
    - mark the literals that need to be preserved for NVARCHAR2 columns as NVARCHAR2 literals by prefixing them with "n".
    - set the mentioned property to activate the client side encoding mechanism with basically does a rough parsing of the statement and replaces all N-literals with U-literals. Undocumented U literals are similar to UNISTR calls. They encode non-ASCII characters with Unicode escape sequences. For example, n'€' (U+20AC) is encoded as u'\20AC'. As u'\20AC' contains only ASCII characters, it arrives to the database unchanged. The SQL parser recognizes the U-literals and converts them (unescapes them) to NVARCHAR2 constants before further SQL processing, such as INSERT.


    -- Sergiusz
  • 14. Re: Oracle 10g R2 - Unicode is converted to unknown ?����
    Sergiusz Wolicki (Oracle) Expert
    Currently Being Moderated
    Yes, you should be able to do this from the Java code. For example:
    import java.util.Properties;
    import java.sql.*;
    
        Properties p = new Properties();
        p.setProperty("user", "scott");
        p.setProperty("password", "manager");
        p.setProperty("oracle.jdbc.convertNcharLiterals", "true");
        Connection c = DriverManager.getConnection( "jdbc:oracle:thin:@localhost:1522:ora11203", p);
     
    -- Sergiusz
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points