6 Replies Latest reply: May 18, 2007 8:55 AM by 807606

    Can't parse XML data

    807606
      I'm receiving an XML document over a TCP socket. I then create a DOMParser and attempt to parse the data. Here's the exception:

      org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x0) was found in markup after the end of the element content.
           at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1213)
           at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:588)
           at org.apache.xerces.framework.XMLDocumentScanner$TrailingMiscDispatcher.dispatch(XMLDocumentScanner.java:1461)
           at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
           at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1098)

      Here's the relevant snippet of source (there is also code that writes to the socket, but I didn't include it):
      Socket socket = new Socket( hostIP, hostPort );
      OutputStream outStream = socket.getOutputStream();
      InputStream inStream = socket.getInputStream();
      DataInputStream dataInStream = new DataInputStream( inStream );
      byte[] inByteArray  = new byte[ 2048 ];
      int length =  dataInStream.read( inByteArray );
      InputStream byteData  =  new ByteArrayInputStream( inByteArray );
      
      try 
      {
             DOMParser dp = new DOMParser();
             dp.parse( new InputSource( byteData ));
             Document doc = dp.getDocument();
      }
      catch ( Exception e )
      {
              e.printStackTrace();
              System.exit( 1 );
      }
      Is there a different way to do this? I even tried creating a new String based on the length read from the socket, minus one. The parser then saw that the final angle bracket of my root element was missing, so it doesn't seem to be an encoding issue.

      Any help would be appreciated.
      Jeff
        • 1. Re: Can't parse XML data
          jschellSomeoneStoleMyAlias
          > An invalid XML character (Unicode: 0x0)
          The XML is corrupt.
          > ...saw that the final angle bracket of my root element was missing,
          After you subtracted one from the length - that would suggest that not subtracting one would be a good idea.

          I would suppose that the real problem here has nothing to do with XML or DOM but rather that you are not correctly retrieving the data from the socket.
          • 2. Re: Can't parse XML data
            807606
            Did you try creating a new String based on the length read from the socket, without subtracting one?
            • 3. Re: Can't parse XML data
              807606
              > An invalid XML character (Unicode: 0x0)
              > The XML is corrupt.
              > ...saw that the final angle bracket of my root element was missing,
              > After you subtracted one from the length - that would suggest that not subtracting one would be a good idea.
              Yep. That was a test to verify that the data was intact and not null-terminated. To double-check, I took an Ethereal trace of the wire, and there was no null. If I convert the byte[] to a String, everything is fine. Unfortunately, there wasn't a parser method that took the XML document as a String.

              > I would suppose that the real problem here has nothing to do with XML or DOM but rather that you are not correctly retrieving the data from the socket.
              Data is fine from the socket. I just had to implement my own getAttribute() and getElementContent() and use brute force. I just thought using an already-written parser made more sense.

              Thanks for the reply.
              • 4. Re: Can't parse XML data
                807606
                > Did you try creating a new String based on the length read from the socket, without subtracting one?
                Yes. I still have to convert from a String to an InputSource. It must be something in the conversion?
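
On the String-to-InputSource conversion: org.xml.sax.InputSource has a constructor that takes a java.io.Reader, so a String can be wrapped in a StringReader and handed to the parser. A minimal sketch, assuming the JDK's built-in JAXP DocumentBuilder in place of the Xerces DOMParser from the question (so it runs without extra jars). The class name StringParseSketch and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StringParseSketch {
    // Parse XML held in a String by wrapping it in a StringReader.
    // Assumption: JAXP DocumentBuilder stands in for the Xerces DOMParser.
    static Document parse(String xml) throws Exception {
        InputSource source = new InputSource(new StringReader(xml));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Illustrative document, not the poster's real data.
        Document doc = parse("<msg id=\"1\">hello</msg>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Because the Reader already delivers characters, the parser does not have to guess the byte encoding. The String must still be built from only the bytes that were actually read.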
                • 5. Re: Can't parse XML data
                  807606
                  I suspect the DataInputStream is reading less than 2048 bytes, leaving the unused array elements at their default value, zero. Then the ByteArrayInputStream is passing those zeroes along to the parser. What I'm wondering is why you need either of those things? Why not construct the InputSource directly from the socket's InputStream?
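
The diagnosis above can be sketched as follows: pass the count returned by read() to the three-argument ByteArrayInputStream constructor so the parser never sees the unused zero bytes. This is a minimal sketch, assuming the JDK's JAXP DocumentBuilder in place of the Xerces DOMParser from the question. The class name BoundedBufferSketch and the sample XML are hypothetical, and the buffer is filled by hand rather than from a real socket.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class BoundedBufferSketch {
    // Parse only the first `length` bytes; anything after that is unused padding.
    static Document parse(byte[] buffer, int length) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(buffer, 0, length));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<msg>hello</msg>".getBytes(StandardCharsets.UTF_8);
        byte[] buffer = new byte[2048];          // same size as in the question
        System.arraycopy(xml, 0, buffer, 0, xml.length);
        int length = xml.length;                 // stands in for the value returned by read()

        // Bounded view: only the bytes that were actually received.
        System.out.println(parse(buffer, length).getDocumentElement().getTagName());

        // Whole array: the trailing zero bytes reach the parser and it fails.
        try {
            parse(buffer, buffer.length);
        } catch (SAXException e) {
            System.out.println("padded buffer rejected: " + e.getMessage());
        }
    }
}
```

In the original snippet the equivalent change would be `new ByteArrayInputStream( inByteArray, 0, length )`, after checking that `length` is not -1 (end of stream).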
                  • 6. Re: Can't parse XML data
                    807606
                    > I suspect the DataInputStream is reading less than 2048 bytes, leaving the unused array elements at their default value, zero. Then the ByteArrayInputStream is passing those zeroes along to the parser.
                    Thanks, I didn't consider that.

                    > What I'm wondering is why you need either of those things? Why not construct the InputSource directly from the socket's InputStream?
                    I did look at that, but there's a binary application-level transport protocol wrapping the XML data, and I also need to handle other binary transactions.
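
When a binary protocol wraps the XML, one more thing is worth keeping in mind: a single InputStream.read(byte[]) call may return only part of a message, so the payload is usually read with DataInputStream.readFully() into an array of exactly the right size. The thread never describes the real transport protocol, so the sketch below assumes a purely hypothetical framing of a 4-byte big-endian length followed by that many payload bytes. The class name FramedReadSketch and the method readFrame are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FramedReadSketch {
    // HYPOTHETICAL framing: a 4-byte big-endian length, then that many payload bytes.
    // The real protocol in the thread is unknown; adapt the header parsing to it.
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        if (length < 0) {
            throw new IOException("negative frame length: " + length);
        }
        byte[] payload = new byte[length];   // sized exactly, so there is no zero padding
        data.readFully(payload);             // blocks until all `length` bytes have arrived
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Build one frame in memory to stand in for the socket's InputStream.
        byte[] xml = "<msg>hello</msg>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        out.writeInt(xml.length);
        out.write(xml);

        byte[] payload = readFrame(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(new String(payload, StandardCharsets.UTF_8));
    }
}
```

The returned array holds exactly one XML document, so it can be wrapped in a ByteArrayInputStream and passed to the parser as is, while other binary transactions on the same socket are handled by their own code.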

                    Thanks for the input and responses.