1 Reply Latest reply: Jun 25, 2013 8:56 AM by odie_63

    UTF-8 Encoding errors during nightly batch runs

    akasemeyer

      My boss recently tasked me with researching (and hopefully resolving) why our XML frequently has UTF-8 encoding errors.

       

      I've been in the IS world for less than a year now so please bear with me when it comes to terms, data flow, etc.

       

      Overview:
      Our Oracle DB spits out XML for the nightly batch runs into a file location, let's say C:\xPression\CustomerData\Certificate.xml. The XML is in Courier New font, but some characters that aren't supported make their way into it. The big one is the elongated ' - ' character. Just one instance of this and the entire XML fails.

       

      When the batch job is run, sometimes there are encoding errors (¿, ¡, -, etc.), and every morning I have to come in, find the invalid character, fix it, and have the job re-run.

       

      I want to know if there's a way so that the XML that comes out is always in the Courier New font, or is there a way to convert it.

        • 1. Re: UTF-8 Encoding errors during nightly batch runs
          odie_63

          I want to know if there's a way so that the XML that comes out is always in the Courier New font, or is there a way to convert it.

          First things first: an XML file is a text file. It doesn't have a "font", it has an encoding.

          The font is the graphical representation of characters and it is related to whatever client tool you're using to view the content, not to the content itself.

          That being said, many fonts do not support the full range of Unicode characters, so you may get replacement characters in some cases.
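
          To illustrate the difference between a character and its encoding, here is a minimal Python sketch. It assumes the "elongated dash" is an em dash (U+2014) and that the other encoding involved is Windows-1252; neither is confirmed in this thread, and is_valid_utf8 is just a helper name made up for the example. The same character becomes different bytes depending on the encoding, and the single-byte form is not valid UTF-8 on its own.

```python
# Assumption: the problem character is an em dash (U+2014) and the
# competing encoding is Windows-1252. The thread confirms neither.
dash = "\u2014"

utf8_bytes = dash.encode("utf-8")      # multi-byte sequence in UTF-8
cp1252_bytes = dash.encode("cp1252")   # a single byte in Windows-1252


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Same character, two byte representations; only one is valid UTF-8.
print(utf8_bytes, is_valid_utf8(utf8_bytes))
print(cp1252_bytes, is_valid_utf8(cp1252_bytes))
```

          If a file declared as UTF-8 contains the single-byte form, an XML parser will typically reject it, regardless of which font is used to display it.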

           

          We're missing some information needed to provide an answer:

           

          - what's the database version?

          - what's the character set of the database?

          - how are you generating and writing the XML to the file? UTL_FILE, dbms_xslprocessor, dbms_xmldom?

           

          If the file is generated using UTF-8 encoding, then the issue might just be that you're not using a UTF-8-enabled editor.
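
          One way to check whether the file on disk really is valid UTF-8, independent of any editor, is to decode the raw bytes and report where decoding fails. The following is only a rough Python sketch; find_invalid_utf8 and check_file are hypothetical helper names, not part of any Oracle or xPression tooling.

```python
from pathlib import Path


def find_invalid_utf8(data: bytes):
    """Return a list of (byte_offset, line_number, offending_bytes) tuples."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward is valid UTF-8
        except UnicodeDecodeError as err:
            start = pos + err.start
            end = pos + err.end
            # Line numbers are 1-based, counted by newlines before the bad byte.
            line = data.count(b"\n", 0, start) + 1
            problems.append((start, line, data[start:end]))
            pos = end  # resume scanning after the bad sequence
    return problems


def check_file(path):
    """Read the file as raw bytes and list its invalid UTF-8 sequences."""
    return find_invalid_utf8(Path(path).read_bytes())
```

          Calling check_file(r"C:\xPression\CustomerData\Certificate.xml") would return an empty list for a clean file, or the offset, line number and raw bytes of each invalid sequence otherwise. The raw bytes are a hint as to which encoding the data was actually written in.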