17 Replies. Latest reply on Oct 21, 2005 3:56 PM by 119014
      • 15. Re: Problem with XmlSerialize on Clob
        What is the underlying character encoding of the CLOB in the database? Is it UTF-8? I would like to encode to that, but doing so gives a Unicode error.

        Is OracleClob always Unicode? I don't need my text to be Unicode.
        • 16. Re: Problem with XmlSerialize on Clob
          Here's a short excerpt from Chapter 6 (Working with Large Objects) of my book "Pro .NET Oracle Programming" (Apress, 2004):

          The CLOB Large Object Type
          You use the CLOB large object type to store text (or string) data, typically using the database’s
          character set as the encoding character set. The CLOB data type is an internal data type, meaning
          the data is stored in the database rather than on the filesystem as is the case with a BFILE.
          The CLOB data is stored in the database in a fixed-width format. If the database character set is
          a multi byte character set (a requirement for a varying-width character set), then CLOB data is
          stored internally in a format known as UCS-2.
          The UCS-2 format is a 16-bit, fixed-width format. If the database character set is a single byte
          character set (i.e., non-varying width) then the data is stored using that fixed-width
          character set. Typically, you’ll use the CLOB data type when the length of the stored data
          exceeds 4,000 bytes—the limit of the varchar2 data type. The CLOB data type is a read-write
          data type that may also participate in database transactions.
          Here's the sample code we've been working with using Encoding.ASCII:
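          using System;
          using System.Xml;
          using System.Xml.Serialization;
          using System.Text;
          using System.IO;
          using System.Data;
          using Oracle.DataAccess.Client;
          using Oracle.DataAccess.Types;
          namespace OracleClobTest1
          {
            public class Class1
            {
              /// <summary>
              /// The main entry point for the application.
              /// </summary>
              static void Main(string[] args)
              {
                OracleConnection con = new OracleConnection("User Id=/");
                try
                {
                  // A temporary CLOB needs an open connection.
                  con.Open();
                  OracleClob clob = new OracleClob(con);
                  MemoryStream ms = new MemoryStream();
                  XmlSerializer xs = new XmlSerializer(typeof(Class1));
                  Class1 a = new Class1(1, 2, 3.4d);
                  // Serialize to memory as ASCII, so no byte order marker is written.
                  XmlTextWriter xtwMs = new XmlTextWriter(ms, Encoding.ASCII);
                  xs.Serialize(xtwMs, a);
                  xtwMs.Flush();
                  // Rewind, then decode the bytes back into characters.
                  ms.Position = 0;
                  StreamReader sr = new StreamReader(ms, Encoding.ASCII);
                  char[] charArray = sr.ReadToEnd().ToCharArray();
                  // The char[] overload hands the CLOB characters, not raw bytes.
                  clob.Write(charArray, 0, charArray.Length);
                }
                catch (OracleException oe) { Console.WriteLine(oe.Message); }
                catch (Exception ex) { Console.WriteLine(ex.Message); }
                finally { con.Dispose(); }
              }
              public Class1() { }
              public Class1(int X, int Y, double Z) { _x = X; _y = Y; _z = Z; }
              int _x;
              int _y;
              double _z;
              // XmlSerializer only emits public properties that have both get and set.
              public int X
              {
                get { return _x; }
                set { _x = value; }
              }
              public int Y
              {
                get { return _y; }
                set { _y = value; }
              }
              public double Z
              {
                get { return _z; }
                set { _z = value; }
              }
            }
          }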
          Using the ASCII encoding and examining the "ms" object in the locals window you can see that the byte order marker is now gone and the data is stored using single-byte semantics.
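To see the difference without a database, here is a minimal sketch (not from the original thread) that writes the same tiny XML document to a MemoryStream with two encodings and prints the first bytes. The class name BomDemo and the helper WriteXml are made up for illustration; only standard .NET calls are used.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class BomDemo
{
    // Writes a tiny XML document to a MemoryStream using the given
    // encoding and returns the raw bytes that ended up in the stream.
    public static byte[] WriteXml(Encoding encoding)
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter writer = new XmlTextWriter(ms, encoding);
        writer.WriteStartDocument();
        writer.WriteElementString("a", "1");
        writer.WriteEndDocument();
        writer.Flush();
        return ms.ToArray();
    }

    public static void Main()
    {
        byte[] unicodeBytes = WriteXml(Encoding.Unicode);
        byte[] asciiBytes = WriteXml(Encoding.ASCII);

        // Encoding.Unicode has a two-byte preamble (the byte order marker);
        // Encoding.ASCII has none, so its output starts directly with '<'.
        Console.WriteLine("Unicode preamble length: " + Encoding.Unicode.GetPreamble().Length);
        Console.WriteLine("ASCII preamble length:   " + Encoding.ASCII.GetPreamble().Length);
        Console.WriteLine("Unicode starts with: " + BitConverter.ToString(unicodeBytes, 0, 4));
        Console.WriteLine("ASCII starts with:   " + BitConverter.ToString(asciiBytes, 0, 4));
    }
}
```

The Unicode output should begin with the FF-FE marker followed by two bytes per character, while the ASCII output begins with the single byte 3C for "<".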

          So the question is why, when we were using the "xs.Serialize(xtwClob,a);" technique, the byte order marker showed up in the CLOB rather than being interpreted as metadata.

          The OracleClob.Write method is overloaded. One version accepts a byte array and the other version accepts a character array. When "xs.Serialize(xtwClob,a);" executes, it invokes the version of OracleClob.Write that takes a byte array as a parameter. That is why the byte order marker is injected into the stream "raw", or uninterpreted.
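The overload behavior can be reproduced without ODP.NET. The sketch below uses a hypothetical stand-in class, TwoOverloadStream (my own invention, not an Oracle type), that like OracleClob is a Stream with an extra char[] overload of Write. It counts which overload XmlTextWriter ends up calling.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical stand-in for OracleClob: a Stream with an extra char[] overload.
public class TwoOverloadStream : MemoryStream
{
    public int ByteWrites;
    public int CharWrites;

    // The overload every Stream consumer knows about.
    public override void Write(byte[] buffer, int offset, int count)
    {
        ByteWrites++;
        base.Write(buffer, offset, count);
    }

    // Not part of the Stream contract, so code that only sees a Stream
    // (such as XmlTextWriter) has no way to call it.
    public void Write(char[] buffer, int offset, int count)
    {
        CharWrites++;
        byte[] bytes = Encoding.Unicode.GetBytes(buffer, offset, count);
        base.Write(bytes, 0, bytes.Length);
    }
}

public class OverloadDemo
{
    // Writes one element through an XmlTextWriter and returns the stream
    // so the caller can inspect the counters.
    public static TwoOverloadStream WriteThroughXmlWriter()
    {
        TwoOverloadStream target = new TwoOverloadStream();
        XmlTextWriter writer = new XmlTextWriter(target, Encoding.Unicode);
        writer.WriteElementString("a", "1");
        writer.Flush();
        return target;
    }

    public static void Main()
    {
        TwoOverloadStream s = WriteThroughXmlWriter();
        Console.WriteLine("byte[] writes: " + s.ByteWrites);
        Console.WriteLine("char[] writes: " + s.CharWrites);
    }
}
```

The writer encodes the characters itself (preamble included) and pushes the resulting bytes at the stream, so only the byte[] counter moves.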

          Internally the data in the CLOB is stored using Unicode. Even when you use the ASCII encoding, reading the "raw" bytes from the CLOB using code like this
          clob.Position = 0;
          byte[] b = new byte[2];
          clob.Read(b, 0, 2);
          shows it is stored using Unicode semantics.
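For comparison, this small sketch (no Oracle involved; RawBytesDemo and Hex are illustrative names of my own) shows what the first two characters of an XML declaration look like as bytes in .NET's Unicode encoding versus ASCII. I am assuming here that the bytes handed back by the CLOB follow the same two-bytes-per-character layout as Encoding.Unicode.

```csharp
using System;
using System.Text;

public class RawBytesDemo
{
    // Encodes the text with the given encoding and formats the bytes as hex.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // Two bytes per character in UTF-16 (little-endian in .NET).
        Console.WriteLine(Hex(Encoding.Unicode, "<?")); // 3C-00-3F-00
        // One byte per character in ASCII.
        Console.WriteLine(Hex(Encoding.ASCII, "<?"));   // 3C-3F
    }
}
```

So a two-byte read that comes back as 3C-00 is a single "<" character, not two ASCII characters.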

          I think there is a lot of potential for confusion when digging through all this in the trenches. Plus, this is my interpretation, which could very well be wrong. If anyone finds any errors, please feel free to correct them!

          But, the way I see it, in order to avoid the byte order marker being injected into the stream underlying the CLOB, you need to use the OracleClob.Write method with a character array rather than a byte array. Maybe there is a way to utilize the "xs.Serialize(xtwClob,a);" technique and force the invocation of the OracleClob.Write method that accepts a character array, but I have not discovered a way to do it.
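One untested idea for getting the serializer to drive a char[] overload directly is to hand it a TextWriter instead of a Stream, since XmlSerializer.Serialize also accepts a TextWriter. The sketch below is only an assumption about how that might look: CharSink, CharSinkWriter, Sample and AdapterDemo are all hypothetical names, and the demo collects characters in a StringBuilder rather than a real CLOB.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical delegate with the same shape as a Write(char[], int, int) method.
public delegate void CharSink(char[] buffer, int offset, int count);

// Hypothetical adapter: a TextWriter that forwards characters to a CharSink.
public class CharSinkWriter : TextWriter
{
    private readonly CharSink _sink;
    private readonly Encoding _declared;

    public CharSinkWriter(CharSink sink, Encoding declared)
    {
        _sink = sink;
        _declared = declared;
    }

    // Only supplies the encoding name for the XML declaration;
    // this writer never converts characters to bytes.
    public override Encoding Encoding { get { return _declared; } }

    public override void Write(char value)
    {
        _sink(new char[] { value }, 0, 1);
    }

    public override void Write(char[] buffer, int index, int count)
    {
        _sink(buffer, index, count);
    }
}

// Simple type to serialize (illustrative only).
public class Sample
{
    public int X;
    public int Y;
}

public class AdapterDemo
{
    // Serializes the value through the adapter and returns the characters received.
    public static string SerializeToChars(object value)
    {
        StringBuilder collected = new StringBuilder();
        // Assumption: with ODP.NET the sink could be "new CharSink(clob.Write)".
        CharSink sink = delegate(char[] b, int o, int c) { collected.Append(b, o, c); };
        XmlSerializer xs = new XmlSerializer(value.GetType());
        using (CharSinkWriter writer = new CharSinkWriter(sink, Encoding.ASCII))
        {
            xs.Serialize(writer, value);
        }
        return collected.ToString();
    }

    public static void Main()
    {
        Sample s = new Sample();
        s.X = 1;
        s.Y = 2;
        Console.WriteLine(SerializeToChars(s));
    }
}
```

Because everything stays in characters, no byte order marker is produced. The catch is that the serializer issues many small writes, so against a real CLOB some buffering inside the adapter would probably be needed to keep it from being slower than the MemoryStream approach.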

          - Mark
          • 17. Re: Problem with XmlSerialize on Clob
            I think that nails it exactly.

            Using this technique requires an extra memory stream and an extra character array, which can make a difference for large objects or for objects that are serialized frequently.

            Thanks for helping track this down.