9 Replies Latest reply: Oct 12, 2012 6:01 PM by orafad RSS

    sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output

    sbroeckmann
      hi,

      we have an unicode database with cyrillic content and I wanted to select some data in (windows) sqlplus (sounds easy...).

      These are the environments

      Oracle DB
      select * from v$version;
      
      BANNER
      ----------------------------------------------------------------
      Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
      PL/SQL Release 10.2.0.4.0 - Production
      CORE    10.2.0.4.0      Production
      TNS for HPUX: Version 10.2.0.4.0 - Production
      NLSRTL Version 10.2.0.4.0 - Production
      sqlplus
      SQL*Plus: Release 11.2.0.1.0 Production on Thu Oct 11 11:20:47 2012
      
      Copyright (c) 1982, 2010, Oracle.  All rights reserved.
      
      show all :
      
      appinfo is OFF and set to "SQL*Plus"
      arraysize 5000
      autocommit OFF
      autoprint OFF
      autorecovery OFF
      autotrace OFF
      blockterminator "." (hex 2e)
      btitle OFF and is the first few characters of the next SELECT statement
      cmdsep OFF
      colsep " "
      compatibility version NATIVE
      concat "." (hex 2e)
      copycommit 0
      COPYTYPECHECK is ON
      define "&" (hex 26)
      describe DEPTH 1 LINENUM OFF INDENT ON
      echo ON
      editfile "D:\tmp\afiedt.buf"
      embedded OFF
      escape OFF
      escchar OFF
      exitcommit ON
      FEEDBACK ON for 1 or more rows
      flagger OFF
      flush OFF
      heading ON
      headsep "|" (hex 7c)
      instance "local"
      linesize 4000
      lno 4
      loboffset 1
      logsource ""
      long 999999
      longchunksize 80
      markup HTML OFF HEAD "<style type='text/css'> body {font:10pt Arial,Helvetica,sans-serif; color:black; background:White;} p {font:10pt Arial,Helvetica,sans-serif; color:black; background:White;} table,tr,td {font:10pt Arial,Helvetica,sans-serif; color:Black; background:#f7f7e7; padding:0px 0px 0px 0px; margin:0px 0px 0px 0px;} th {font:bold 10pt Arial,Helvetica,sans-serif; color:#336699; background:#cccc99; padding:0px 0px 0px 0px;} h1 {font:16pt Arial,Helvetica,Geneva,sans-serif; color:#336699; background-color:White; border-bottom:1px solid #cccc99; margin-top:0pt; margin-bottom:0pt; padding:0px 0px 0px 0px;-
      } h2 {font:bold 10pt Arial,Helvetica,Geneva,sans-serif; color:#336699; background-color:White; margin-top:4pt; margin-bottom:0pt;} a {font:9pt Arial,Helvetica,sans-serif; color:#663300; background:#ffffff; margin-top:0pt; margin-bottom:0pt; vertical-align:top;}</style><title>SQL*Plus Report</title>" BODY "" TABLE "border='1' width='90%' align='center' summary='Script output'" SPOOL OFF ENTMAP ON PREFORMAT OFF
      newpage 1
      null ""
      numformat ""
      numwidth 10
      pagesize 999
      PAUSE is OFF
      pno 1
      recsep WRAP
      recsepchar " " (hex 20)
      release 1002000400
      repfooter OFF and is NULL
      repheader OFF and is NULL
      securedcol is OFF
      serveroutput ON SIZE UNLIMITED FORMAT WORD_WRAPPED
      shiftinout INVISIBLE
      showmode OFF
      spool OFF
      sqlblanklines OFF
      sqlcase MIXED
      sqlcode 0
      sqlcontinue "> "
      sqlnumber OFF
      sqlpluscompatibility 11.2.0
      sqlprefix "#" (hex 23)
      sqlprompt "> "
      sqlterminator ";" (hex 3b)
      suffix "sql"
      tab OFF
      termout ON
      timing OFF
      trimout ON
      trimspool ON
      ttitle OFF and is the first few characters of the next SELECT statement
      underline "-" (hex 2d)
      USER is "nyuser"
      verify ON
      wrap : lines will be wrapped
      errorlogging is OFF
      Windows settings (cmd)
      chcp 65001
      set NLS_LANG=.AL32UTF8
      Create table and insert one row:
      create table forumexample (longdescr varchar2(128));
      
      insert into forumexample (longdescr) values (unistr( '\0420\0443\0441\0441\043A\0438\0439' ));
      now lets select some data

      First shot (naive) :
      select longdescr from forumexample;
      
      LONGDESCR
      ------------------------
      ��усский
      
      1 row selected.
      hmm...2 goofy "question mark somethings"....the rest looks ok...

      Suspect: There is no unicode in the DB....something went wrong ...

      Second shot (selfcritical) :
      select longdescr,dump(longdescr) dump_longdescr from forumexample;
      
      LONGDESCR                DUMP_LONGDESCR
      ------------------------ -------------------------------------------------------------------------
      ��усский                  Typ=1 Len=14: 208,160,209,131,209,129,209,129,208,186,208,184,208,185
      
      1 row selected.
      so the first char is 208 160 which is something like a big "P" (like in Painful experience :-)):
      U+0420     Р     208 160
      looks good (=it is unicode)

      Note (see output of "second shot") :

      The output of the second column is shifted on digit to the right...?!
      LONGDESCR                DUMP_LONGDESCR
      ------------------------ -------------------------------------------------------------------------
      ��усский                  Typ=1 Len=14: 208,160,209,131,209,129,209,129,208,186,208,184,208,185
                                x
                                x
                                x
      Third shot (desparation):

      Put a blank in front of the first column :
      select ' '||longdescr longdescr,dump(longdescr) dump_longdescr from forumexample;
      
      LONGDESCR                DUMP_LONGDESCR
      ------------------------ -----------------------------------------------------------------------
       Русский                 Typ=1 Len=14: 208,160,209,131,209,129,209,129,208,186,208,184,208,185
      
      1 row selected.
      => Now the 208-161 character is shown...without problems....( so the windows font is able to show it !!)

      Fourth shot (curiosity):

      Select cyrillic attribute twice:
      
      select longdescr longdescr_1,longdescr longdescr_2,dump(longdescr) dump_longdescr from forumexample;
      
      LONGDESCR_1              LONGDESCR_2              DUMP_LONGDESCR
      ------------------------ ------------------------ ------------------------------------------------------------------------
      ��усский                  Русский                  Typ=1 Len=14: 208,160,209,131,209,129,209,129,208,186,208,184,208,185
      
      1 row selected.
      
      Select 2 rows:
      
      select longdescr longdescr_1,dump(longdescr) dump_longdescr from forumexample union all
      select longdescr longdescr_1,dump(longdescr) dump_longdescr from forumexample ;
      
      LONGDESCR_1              DUMP_LONGDESCR
      ------------------------ -------------------------------------------------------------------------
      ��усский                  Typ=1 Len=14: 208,160,209,131,209,129,209,129,208,186,208,184,208,185
      ��усский                  Typ=1 Len=14: 208,160,209,131,209,129,209,129,208,186,208,184,208,185
      
      2 rows selected.
      
      etc....
      Conclusion :

      I have did some more testing and the basic issue seems to be :

      If the first char in the first attribute of a select is a multi byte character : sqlplus puts in 2 "funny signs" and shifts the complete row,
      one digit to the right...

      Additional remarks:

      - Suspect is, that one of sqlplus parameter should be changed to change this behaviour, but I could not identify which one...

      - Creating the table differently
      create table forumexample2 (longdescr nvarchar2(128));
      (with NVARCHAR2) has not different result...


      Maybe one of you has a clue about this one...

      thx in advance

      Bye

      Stefan

      Edited by: sbroeckmann on 11.10.2012 06:21
        • 1. Re: sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output
          Paul  Horth
          Sql*Plus on windows does not have any real capability to show international characters.

          Use SQL Developer or other more modern client.


          In SQL Developer:
          select *
          from forumexample;
          
          Русский
          • 2. Re: sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output
            Sven W.
            >
            The behaviour as described sounds like a bug to me. You might want to consider raising it in oracle support.
            Windows settings (cmd)
            chcp 65001
            set NLS_LANG=.AL32UTF8
            This is somewhat unusual. AL32utf8 is the oracle database character set. The client character set should be differnt.
            More like "WE8ISO8859P5" (windows).

            Furthermore you are using a 11g sql plus client on a 10g database. I'm not aware of any problems with that, however for such strange problems you might consider testing it with an older client.
            Or set the compatibility to a differnt value
            SET SQLPLUSCOMPAT 10.2
            Edited by: Sven W. on Oct 11, 2012 6:10 PM
            • 3. Re: sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output
              orafad
              sbroeckmann wrote:
              we have an unicode database with cyrillic content and I wanted to select some data in (windows) sqlplus (sounds easy...).
              Use Oracle SQL Developer instead.

              No-install unzip (Downloads tab):
              http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index.html


              >
              chcp 65001
              set NLS_LANG=.AL32UTF8
              Why?


              Use SQL Developer as suggested, or try:
              C:\Users\>chcp 1251
              Active code page: 1251
              
              C:\Users\>set nls_lang=.cl8mswin1251
              
              C:\Users\>sqlplus test
              
              SQL*Plus: Release 11.2.0.2.0 Production on Fri Oct 12 00:40:16 2012
              
              Copyright (c) 1982, 2010, Oracle.  All rights reserved.
              
              Enter password:
              ERROR:
              ORA-28002: the password will expire within 7 days
              
              
              
              Connected to:
              Oracle Database 11g Express Edition Release 11.2.0.2.0 - Production
              
              SQL> create table forumexample (longdescr varchar2(128));
              
              Table created.
              
              SQL>
              SQL> insert into forumexample (longdescr) values (unistr( '\0420\0443\0441\0441\043A\0438\0439' ));
              
              1 row created.
              
              SQL> commit;
              
              Commit complete.
              
              SQL> select longdescr,dump(longdescr,1016) dump from forumexample;
              
              LONGDESCR
              --------------------------------------------------------------------------------
              DUMP
              --------------------------------------------------------------------------------
              Русский
              Typ=1 Len=14 CharacterSet=AL32UTF8: d0,a0,d1,83,d1,81,d1,81,d0,ba,d0,b8,d0,b9
              (Cyrillic letter ER code units are 0xd0 a0, ... and so on)

              Edited by: orafad on Oct 12, 2012 12:47 AM
              • 4. Re: sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output
                sbroeckmann
                Sql*Plus on windows does not have any real capability to show international characters. 
                ...but only the first character does not have this capability...?!
                Use SQL Developer or other more modern client.
                I will have a look at it...
                • 5. Re: sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output
                  sbroeckmann
                  Sven W. wrote
                  This is somewhat unusual. AL32utf8 is the oracle database character set. The client character set should be differnt.
                  More like "WE8ISO8859P5" (windows).
                  I found these settings somewhere as "the settings for utf-8 support of sqlplus"...and as you can see : It works for everything except the first char....

                  I'm not aware of any problems with that, however for such strange problems you might consider testing it with an older client.
                  UNfortunately I cannot use different (Oracle 10) installation, as I am limited to the the standard environment, that we get from our DBAs....

                  Or set the compatibility to a differnt value
                  SET SQLPLUSCOMPAT 10.2
                  tried but didn't work.
                  • 6. Re: sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output
                    sbroeckmann
                    orafad wrote: 
                    Use Oracle SQL Developer instead.
                    I will have a look at it...
                    chcp 65001
                    set NLS_LANG=.AL32UTF8
                    
                    
                    Why?
                    I found somewhere, that these are the settings for unicode support of sqlplus and except for the first char, this seems to be "ok"...
                    Use SQL Developer as suggested, or try:
                    
                    C:\Users\>chcp 1251
                    Active code page: 1251
                     
                    C:\Users\>set nls_lang=.cl8mswin1251
                    etc...
                    That is true, this works for cyrillic, but as we also other multi byte data in the DB, I wanted to have one "unicode" - environment,where I can handle everything without switching to different environments...

                    But OK, seems to be more tricky than expected... so, I will setup various settings...

                    Thx to all of you
                    • 7. Re: sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output
                      orafad
                      sbroeckmann wrote:
                      That is true, this works for cyrillic, but as we also other multi byte data in the DB, I wanted to have one "unicode" - environment,where I can handle everything without switching to different environments...
                      If you want to work with character data of multiple languages, instead of trying to "hack" sqlplus and use it in a manner it was not designed to be used,
                      please use a Unicode enabled client such as - mentioned now for the fourth time or so ta da! -- Oracle SQL Developer.

                      Edited by: orafad on Oct 12, 2012 10:39 AM
                      • 8. Re: sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output
                        Sergiusz Wolicki-Oracle
                        This is actually new to me that Command Prompt can be configured for UTF-8 (code page 65001). I recall having problems with this configuration some time ago. I wonder if it changed just in recent Windows versions. It does seem to work on my Win7 x64.

                        The SQL*Plus behavior is not reasonable here, even though we can argue the setup is unexpected/unsupported (no official policy on this exists, as far as I can tell). If you spool the content, it will look correct. There must be some issue with the terminal output. It is worth investigating.


                        -- Sergiusz
                        • 9. Re: sqlplus on windows : Select cyrillic ( multi byte ) data. Strange output
                          orafad
                          Maybe I'm pushing away from sqlplus a bit too hard and part of this may be related to the old "sqlplusw" gui version on Windows...

                          Anyway, in your testcase, if you spool output and open in e.g notepad, characters should be shown correctly.

                          And as you said, if you add a (ascii or single byte) character as the first character, console output works.
                          SQL> select longdescr,dump(longdescr,1016) dump from forumexample;
                          
                          LONGDESCR
                          -----------------------------------------------------------------------------
                          DUMP
                          -----------------------------------------------------------------------------
                          ��усский
                          Typ=1 Len=14 CharacterSet=AL32UTF8: d0,a0,d1,83,d1,81,d1,81,d0,ba,d0,b8,d0,b9
                          
                          
                          SQL> select '.'||longdescr dump from forumexample;
                          
                          DUMP
                          -----------------------------------------------------------------------------
                          .Русский
                          As stated in the following thread, there seem to be a quirk in the Win C runtime that relates to noted behaviour:
                          how to use sqlplus with utf8 on windows command line