5 Replies Latest reply: Jul 11, 2014 2:39 PM by Sergiusz Wolicki-Oracle RSS

    Pro*C unicode to retrieve and print

    user10101989

      Hi,

      I need to retrieve unicode values from the NVARCHAR column of the table and print on the putty terminal (final intention is to write into a file in utf8 or utf16 format ).

      I tried the sample program provided in the developer guide but the wprintf or printf is not printing any characters at all on the terminal. When I do a sample C only program using unicode character in the printf is it displaying properly on the putty terminal.

      I have tried search all thr the web but couldnt find any material or sample for retrieving and displaying unicode.

       

       

      #include <sqlca.h>

      #include <sqlucs2.h>

       

      main()

      {

        ...

        /* Change to STRING datatype: */

        EXEC ORACLE OPTION (CHAR_MAP=STRING) ;

        utext ename[20] ; /* unsigned short type */

      uvarchar address[50] ; /* Pro*C/C++ uvarchar type */

       

        EXEC SQL SELECT ename, address INTO :ename, :address FROM emp;

        /* ename is NULL-terminated */

      wprintf(L"ENAME = %s, ADDRESS = %.*s\n", ename, address.len,

      address.arr);

      ...

      }

       

       

      My Environment details:

      Database: Oracle 11.1g

      OS: SunOS 5.10

       

      locale

      LANG=

      LC_CTYPE="C"

      LC_NUMERIC="C"

      LC_TIME="C"

      LC_COLLATE="C"

      LC_MONETARY="C"

      LC_MESSAGES="C"

      LC_ALL=

       

       

      Database NLS parameters:

      NLS_NCHAR_CHARACTERSET AL16UTF16

      NLS_CHARACTERSET US7ASCI

      NLS_LANGUAGE AMERICAN

       

       

      The sample C program which worked is:

      #include <stdio.h>

        #include <locale.h>

       

        int main()

        {

        if (!setlocale(LC_CTYPE, "")) {

        printf("Can't set the specified locale! \n");

        return 1;

        }

        printf("%ls\n", L"Schöne Grüße");

        return 0;

        }

      Note: Above I used Chinese character for test.

        • 1. Re: Pro*C unicode to retrieve and print
          Sergiusz Wolicki-Oracle

          You try to output Unicode in UTF-16 format. I do not see an option in Putty to process UTF-16. You should output UTF-8.

           

          I suggest you set and export the NLS_LANG variable to .AL32UTF8 before running your program. In your program, use normal character variables (not utext/uvarchar) but mark them as NCHAR using syntax shown here at http://docs.oracle.com/cd/E11882_01/appdev.112/e10825/pc_04dat.htm#i22704 (CHARACTER SET [IS] NCHAR_CS).

           

           

          Thanks,

          Sergiusz

          • 2. Re: Pro*C unicode to retrieve and print
            user10101989

            Hi Sergiusz, thanks very much for reply. I m able to print and write the UTF8 chars into a binary file. (But this is using static SQL).

             

            My application design is based on Dynamic SQL Method 4. We store all the sql statements in a table, which we retrieve & execute at run time. We format the output before writing into file.

            So, I have added the new column (NVARCHAR type) to one of the existing sql statement and declared a host variable in the code as (CHARACTER SET [IS] NCHAR_CS) and tried print/write the value to a file which didnt work. I tried copying the value from the select_des->V variable to the host variable and tried printing/writing which didnt gave any output.

            Can you suggest how I can make this work in Dynamic SQL too.

             

             

             

            BTW here is was working code using static SQL.

            (sorry indentation is lost)

            <code>

            #include <stdio.h>

            #include <string.h>

            #include <sqlca.h>

            #include <sqlucs2.h>

            #include <wchar.h>

            #include <locale.h>

             

            int main()

            {

            char username[10] = "test";

            char password[15] = "test";

            int cstm_id=0;

            //EXEC ORACLE OPTION (CHAR_MAP=STRING) ;

            char character set is nchar_cs my_utf8_str[100];

            char str2[100];

            FILE *fp;

            EXEC SQL CONNECT :username IDENTIFIED BY :password;

             

            EXEC SQL SELECT my_id, my_utf8_tx

            INTO :my_pk_id, :my_utf8_str -- output host variables

            FROM test_utf8_tb

            WHERE my_id = 948045;

             

            int a=strlen(my_utf8_str);

            printf("str is %s\n",my_utf8_str);

            printf("a=%d c=%zu y=%zu\n",a,sizeof(my_utf8_str),sizeof(my_utf8_str[0]));

             

            //str2[60]='\0';

            strcpy(str2,my_utf8_str);

            printf("str2 is %s\n",str2);

            int x=strlen(str2);

            printf("x=%d c=%zu\n",x,sizeof(str2));

             

            fp=fopen("file_output.dat","w");

            fwrite(my_utf8_str,sizeof(my_utf8_str[0]),sizeof(my_utf8_str)/sizeof(my_utf8_str[0]),fp);

            fclose(fp);

             

            char cpstr[200];

            char mystr[]="My output string is ";

            sprintf(cpstr,"%s",mystr);

            memcpy(cpstr+strlen(mystr),my_utf8_str,strlen(my_utf8_str));

            memcpy(cpstr+strlen(cpstr)+1,"\n",1);

            printf("cpstr=%s and is of length=%d\n",cpstr,strlen(cpstr));

             

            fp=fopen("file_output2.dat","w");

            fwrite(cpstr,sizeof(cpstr[0]),strlen(cpstr)+2,fp);

            fclose(fp);

             

             

            return(0);

            };

            </code>

            • 3. Re: Pro*C unicode to retrieve and print
              Sergiusz Wolicki-Oracle

              I am not an expert in Pro*C dynamic SQL processing and the documentation is very poor in this area. However, the result of my research suggests that:

               

              • Oracle Dynamic SQL Method 4 does not support NCHAR variables -- I see no way to get NCHAR content without it being converted to US7ASCII
              • You should be able to use ANSI Dynamic SQL with NCHAR (Chapter 14 in Pro*C/C++ Programmer's Guide 11.2)
              • Alternatively, give up Pro*C and use OCI. OCI is the most powerful Oracle Database access API.

               

              Check the demo program ansidyn1.pc and see if it can output UTF-8 just by setting NLS_LANG=.AL32UTF8. If you do not get satisfactory output from NVARCHAR2 for non-ASCII characters, try to modify the example as follows:

               

              int process_output()

              {

                 [...]

                 for (i = 0; i < output_count; i++)

                  {

                      occurs = i + 1;

                      EXEC SQL GET DESCRIPTOR 'output_descriptor' VALUE :occurs

                               :name = NAME;

                      printf("%-*.*s ", 9,9, name);

                      EXEC SQL SET DESCRIPTOR 'output_descriptor' VALUE :occurs

                               TYPE = :type, LENGTH = :len, NATIONAL_CHARACTER = 2;

                  }

                  printf("\n");

                [...]

               

              }

               

               

              Make sure to test non-ASCII characters! Seeing English text correctly is not a confirmation that everything is fine.

               

              Thanks,

              Sergiusz

              • 4. Re: Pro*C unicode to retrieve and print
                user10101989

                Hi Sergiusz,

                Thanks for your suggestion, I was able to retrieve the unicode character in the UTF-8 format using ANSI dynamic SQL as below.

                 

                int coltype = 5;//for string datatype

                int nchar_typeset=2;

                char putcharset[10]="AL32UTF8";

                char output[MAX_SELECT_LIST_NUM][100][3001];

                for(i=0;i<col_count;i++)

                {

                     EXEC SQL SET DESCRIPTOR 'select_des' VALUE :occurs TYPE = :coltype, LENGTH = :len, NATIONAL_CHARACTER = :nchar_typeset, CHARACTER_SET_NAME=:putcharset;

                     EXEC SQL FOR :nFetchArraySize SET DESCRIPTOR 'select_des' VALUE :occurs REF DATA = :output[i];

                }

                And I m using fwrite to write the data into a file by specifying the number of bytes. (using strlen function to find the byte size).

                 

                 

                Can you help me in few other things:

                1)

                My requirement is to create a fixed record length file (no. of characters).

                Eg: The NVARCHAR column should be 250 char length in the output file. If the retrieved value from the db has 10 unicode characters then I will concatenate 240 space filler to make it to 250 chars.

                To implement this I have included length(mynvarchar_column) in the select list and using it to find the difference of chars that I need to concatenate.

                I m not able to find the no. of characters in the unicode string in C as the strlen gives the results in bytes which I cannot use in my calculation to find no. of spaces I need to append.

                My approach above (length(mynvarchar_column) in the select list)  is working as intended,  do you think I need to find a better way or whatever I m doing is ok.

                 

                2) I read in some websites that NULL character is valid character in a unicode string. The nvarchar fields in my database will be populated by my interfacing applications and the fields I have is for Name, Address, Description etc. What is possibility of getting NULL characters in these kind of fields, do I need to take special care of it and if so how can I implement it.

                • 5. Re: Pro*C unicode to retrieve and print
                  Sergiusz Wolicki-Oracle

                  1) The approach with LENGTH is acceptable if you want to save some coding in C. However, as your input is NVARCHAR2 with UTF-16 encoding, you should use LENGTH4 in place of LENGTH, so that supplementary characters are handled correctly. Supplementary characters use two character codes (a surrogate pair) in UTF-16 but one character code in UTF-8. LENGTH4 should calculate the number of codes in UTF-8.

                   

                   

                  Alternatively, you can implement a character counting function for UTF-8 in the application:

                   

                   

                  unsigned int countUTF8Characters(unsigned char *inputStringPtr, unsigned stringSize /*incl.NUL*/)

                  {                             /* ^^^ 'unsigned' important here! */

                    unsigned charCount = 0;

                    unsigned advance;

                   

                    if (inputStringPtr == (unsigned char *)NULL || stringSize == 0)

                      return 0; /* no input */

                   

                    do {

                      if (*inputStringPtr == 0x00)

                        break; /* end of string */


                      ++charCount;


                      if (*inputStringPtr < 0xc0)

                        /* one byte character or a continuation byte;

                           standalone continuation bytes are illegal and should

                           not come up in a string but if they come, we will count

                           each as one character */

                        advance = 1;

                      else

                      if (*inputStringPtr < 0xe0)

                        /* two bytes character; we will not check if a continuation

                           byte follows; we will just assume this */

                        advance = 2;

                      else

                      if (*inputStringPtr < 0xf0)  /* three bytes character */

                        advance = 3;

                   

                      else

                      if (*inputStringPtr < 0xf8)  /* four bytes character */

                        advance = 4;

                      else

                        /* illegal leading byte, which should not come up

                           in the string; we will count it as one character */

                        advance = 1;

                     

                      if (stringSize <= advance)

                        break; /* no more bytes in the string */


                      stringSize -= advance;

                      inputStringPtr += advance;


                    } while(1);


                    return charCount;

                  }

                   

                   

                  (Code not tested, not even compiled. Review and test before using. Use on your own risk.)

                   

                   

                  The semantics of this function for malformed UTF-8 code sequences is one of a few possible and not necessarily optimal for your case. However, as the database converts from AL16UTF8 and does not provide raw database content, such sequences should not appear in output.

                   

                   

                  2) The NUL character is theoretically valid in Unicode but also in many other character sets supported by Oracle Database. However, its handling is not well defined nor tested and its use conflicts with the use as the C string terminator. In general, the NUL character should not show up in normal text data entered from a keyboard, because there is generally no way to key it in. However, data loaded from a file may contain any rubbish, depending on what the application producing the file emitted. You can also use SQL to enter the NUL character into a column. If you cannot control data entered into the database, you should be prepared to handle the NUL character. You should use the VARCHAR external data type (type=9) in place of STRING type (type=5) to read the data. The external VARCHAR data type (not to be confused with internal VARCHAR/VARCHAR2) relies on separate length information and does not treat NUL as string terminator. You must verify the application requirements with your business and decide if you want to output NUL as any other character, flag it as a fatal error, remove it silently from output string, or replace it with the space character (0x20). Any of the approaches is possible, based on the business requirements. To handle some of these cases, you can modify the above function countUTF8Characters(). For example, to treat NUL as normal character, remove the two lines:


                      if (*inputStringPtr == 0x00)

                        break; /* end of string */


                  To emit space in place of NUL, change the two lines to:


                      if (*inputStringPtr == 0x00)

                        *inputStringPtr = 0x20;


                  etc. Rename the function to something like countAndCleanupUTF8Characters. You can also use modify the function to report malformed UTF-8 output. Some cases of malformed output (non-shortest form, surrogate codes) are not recognized by the function, so it would need to be rewritten to handle full UTF-8 validation.

                   

                   

                   

                   

                  Thanks,

                  Sergiusz