6 Replies Latest reply: Feb 11, 2013 3:51 AM by 834033 RSS

    Unicode Issues

      Hi All,

      I have a requirement where i need to get content from a file and then loading it into a table.

      The file is containing Some Unicode Characters.

      The content is inserted into a CLOB Column in a table.
      I placed the file in a directory located in DB Server and loaded into table using below proc.

      -- (1) Insert a new row into html_documents with an empty CLOB, and
      -- (2) Retrieve the empty CLOB into a variable with RETURNING..INTO
      INSERT INTO clob_docs(id,code,
      VALUES( id, Code,
      RETURNING html_document INTO theCLob;

      -- (3) Get a BFile handle to the external file
      theBFile := BFileName(dir,file);

      -- (4) Open the file

      -- (5) Copy the contents of the BFile into the empty CLOB
      dbms_lob.loadCLOBFromFile(dest_lob => theCLob,
      src_bfile => theBFile,
      amount => dbms_lob.getLength(theBFile),
      DEST_OFFSET => dst_offset,
      SRC_OFFSET => src_offset,
      LANG_CONTEXT => lang_ctx,
      WARNING => warning);

      -- (6) Close the file and commit

      The file is loaded fine, but inverted commas are loaded in the place of unicode characters.

      Example: *¿acc¿¿¿*

      Characterset in database is NLS_CHARACTERSET -->AL32UTF8


      Please help me out to avoid this situation and all the characters to be properly loaded.

        • 1. Re: Unicode Issues
          What character set is used to encode the file? UTF-8? UTF-16? UTF-32? UCS-2? Something else?

          • 2. Re: Unicode Issues
            Hi Justin,

            File is just a text file and it contains UTF-16 characters also.

            • 3. Re: Unicode Issues
              Just to be clear, are you saying that the text file is encoded using the UTF-16 character set? Otherwise, I'm not sure what "UTF-16 character" means to you.

              • 4. Re: Unicode Issues
                Hi Justin,

                Text in the file is copied from a web page or a word document. There is no encoding takes place. It is just a copy and paste in the file and file is saved in db location. After this, using the above mentioned procedure, content in the file is loaded into a CLOB column of a table.

                This content after loading into table contains inverted question marks.

                • 5. Re: Unicode Issues
                  All text files are, by definition, encoded using some character set. You have to know what character set you are using to know how to translate the series of 1's and 0's in the file into a particular character. The same character will have very different binary representations in different character sets.

                  We need to know what character set your file is encoded using to know how to load it into the database properly. If you don't specify, Oracle assumes that the file is encoded using the database character set (AL32UTF8). If that is not the actual character set used to encode the file, you'll get character set translation errors which would result in replacement characters getting stored (those are the question marks that you're seeing).

                  • 6. Re: Unicode Issues
                    Hi Justin,

                    Thanks for your help.
                    We have not been trying to use any encoding feature while inserting the CLOB to the database.
                    As far as our application is concerned we are using UTF-8.
                    Please suggest if we need to set encoding before we insert the data in to the CLOB .
                    Is it something that can be set at database object level (say for example for a table) or at the database level.

                    Would be helpful if you could share the steps for the same.