4 Replies Latest reply: Jul 4, 2013 4:04 AM by user13734057 RSS

    Loading Data into a table from XML files

    user13734057

      Hi ,

       

      I was trying to load data from XML files to an Oracle database table.

      I followed these below steps to load that file data into a table.

       

      Created XML_DIR1 as oracle directory where i have kept all XML files.

       

      Create table import_rpt_xml of xmltype

      xmltype store as binary xml;

       

      insert into import_rpt_xml

      values (

      xmltype (

      bfilename('XML_DIR1','I-Yamanouchi-20040525-501.xml'),

      nls_charset_id('AL32UTF8')

      )

      );

       

      This insert shows below error:

       

      Error starting at line 80 in command:

      insert into import_rpt_xml

      values(

      xmltype(

      bfilename('XML_DIR1', 'I-Yamanouchi-20040525-501.SGM')

      , nls_charset_id('AL32UTF8')

      )

      )

      Error report:

      SQL Error: ORA-31061: XDB error: XML event error

      ORA-19202: Error occurred in XML processing

      In line 69 of orastream:

      LPX-00217: invalid character 142 (U+008E)

       

      I tried to look into my XML and got that it has some Japanese characters in it.

       

      Could anybody help me on this to deal with japanese characters in XML. I don't want to miss those characters.

       

      My databse NLS_CHARACTERSET is 'AL32UTF8'.

       

      My sample XML file looks like this.

       

       

       

       

      <ichicsr lang="ja">

          <ichicsrmessageheader>

            <messagetype>ichicsr</messagetype>

            <messageformatversion>2.1</messageformatversion>

            <messageformatrelease>2.0</messageformatrelease>

            <messagenumb>US-Yamanouchi-W2004050033-4</messagenumb>

            <messagesenderidentifier>Yamanouchi</messagesenderidentifier>

            <messagereceiveridentifier>PMDA</messagereceiveridentifier>

            <messagedateformat>204</messagedateformat>

            <messagedate>20040525100000</messagedate>

          </ichicsrmessageheader>

          <safetyreport>

            <safetyreportversion>4</safetyreportversion>

            <safetyreportid>US-Yamanouchi-2004003040Lip</safetyreportid>

            <primarysourcecountry>US</primarysourcecountry>

            <occurcountry>US</occurcountry>

            <transmissiondateformat>102</transmissiondateformat>

            <transmissiondate>20040525</transmissiondate>

            <reporttype>1</reporttype>

            <serious>1</serious>

            <seriousnessdeath></seriousnessdeath>

            <seriousnesslifethreatening></seriousnesslifethreatening>

            <seriousnesshospitalization>1</seriousnesshospitalization>

            <seriousnessdisabling></seriousnessdisabling>

            <seriousnesscongenitalanomali></seriousnesscongenitalanomali>

            <seriousnessother>1</seriousnessother>

            <receivedateformat>102</receivedateformat>

            <receivedate>20040311</receivedate>

            <receiptdateformat>102</receiptdateformat>

            <receiptdate>20040506</receiptdate>

            <additionaldocument>2</additionaldocument>

            <documentlist></documentlist>

            <fulfillexpeditecriteria>1</fulfillexpeditecriteria>

            <authoritynumb></authoritynumb>

            <companynumb>US-Yamanouchi-2004003040</companynumb>

            <duplicate></duplicate>

            <casenullification></casenullification>

            <nullificationreason></nullificationreason>

            <medicallyconfirm></medicallyconfirm>

            <reportduplicate>

              <duplicatesource></duplicatesource>

              <duplicatenumb></duplicatenumb>

            </reportduplicate>

            <linkedreport>

              <linkreportnumb></linkreportnumb>

            </linkedreport>

            <primarysource>

              <reportertitle></reportertitle>

              <reportergivename></reportergivename>

              <reportermiddlename></reportermiddlename>

              <reporterfamilyname></reporterfamilyname>

              <reporterorganization></reporterorganization>

              <reporterdepartment></reporterdepartment>

              <reporterstreet></reporterstreet>

              <reportercity></reportercity>

              <reporterstate></reporterstate>

              <reporterpostcode></reporterpostcode>

              <reportercountry>US</reportercountry>

              <qualification>1</qualification>

              <literaturereference></literaturereference>

              <studyname></studyname>

              <sponsorstudynumb></sponsorstudynumb>

              <observestudytype></observestudytype>

            </primarysource>

            <sender>

              <sendertype>1</sendertype>

              <senderorganization>Yamanouchi</senderorganization>

              <senderdepartment></senderdepartment>

              <sendertitle>Žæ’÷–ðŽÐ’·</sendertitle>

              <sendergivename>“oˆê</sendergivename>

              <sendermiddlename></sendermiddlename>

              <senderfamilyname>’|’†</senderfamilyname>

              <senderstreetaddress>“ú–{‹´–{’¬2-3-11</senderstreetaddress>

              <sendercity>’†‰›‹æ</sendercity>

              <senderstate>“Œ‹ž“s</senderstate>

              <senderpostcode></senderpostcode>

              <sendercountrycode>JP</sendercountrycode>

              <sendertel>359165493</sendertel>

              <sendertelextension></sendertelextension>

              <sendertelcountrycode>81</sendertelcountrycode>

              <senderfax>359165608</senderfax>

              <senderfaxextension></senderfaxextension>

              <senderfaxcountrycode>81</senderfaxcountrycode>

              <senderemailaddress>sfid_e2b@yamanouchi.co.jp</senderemailaddress>

            </sender>

            <receiver>

              <receivertype>2</receivertype>

              <receiverorganization>“Æ—§&#141;s&#144;–@&#144;lˆã–ò•iˆã—Ë@Ší‘&#141;&#141;‡‹@&#141;\</receiverorganization>

              <receiverdepartment></receiverdepartment>

              <receivertitle>—&#157;Ž–’·</receivertitle>

              <receivergivename>&#143;²</receivergivename>

              <receivermiddlename></receivermiddlename>

              <receiverfamilyname>‹{“‡</receiverfamilyname>

              <receiverstreetaddress>‰à‚ªŠÖ3-3-2</receiverstreetaddress>

              <receivercity>&#144;ç‘ã“c‹æ</receivercity>

              <receiverstate>“Œ‹ž“s</receiverstate>

      ........and so on.

       

      Please help me to crack this problem.

       

      Regards ,

      Vinayak

        • 1. Re: Loading Data into a table from XML files
          odie_63

          Since your database character set is AL32UTF8, you should be able to load any XML-valid character (u008E being one of them).

           

          So I guess the problem is with the real encoding of the input file.

          Could you check that?

          • 2. Re: Loading Data into a table from XML files
            user13734057

            Thanks for this response.

            Yes, i checked my source files which are not UTF-8. When i do keep these files on Japanese system, it displays Japanese characters but on any other system it displays garbage.

            My Oracle Database is on normal windows system hence it reads it as a garbage.

             

            Could you please suggest any method to change these files as UTF-8. I have 30,000 XML files for this loading.

             

            Is there any way to handle this issue in Oracle itself.

             

            Please suggest.

            • 3. Re: Loading Data into a table from XML files
              odie_63

              Yes, i checked my source files which are not UTF-8.

              Ok, so what is this encoding then?

              If you're able to determine that, then you can specify it in the XMLType constructor, instead of 'AL32UTF8'. Oracle will take care of the conversion while loading the file.

              • 4. Re: Loading Data into a table from XML files
                user13734057

                A few files are in ANSI and a few files are in ASCII encoded.Some are Shift-JIS as well.

                 

                I tried to look for a few files only but i doubt that there may be any other encoding as well.

                 

                Is there any option which can handle any source encoding but UTF8 target encoding.

                 

                Is there any option to change all files encoding?

                 

                Please suggest.