8 Replies Latest reply: Mar 22, 2010 4:02 PM by JoachimSauer

    Parsing XML: getElementsByTagName trouble

    807580
      Hi, I'm trying to parse XML. However, I have some trouble with getElementsByTagName because it flattens my structure.

      For instance, say I have the following XML structure:
      <Expression>
         <Operand>
            <Operand>3</Operand>
            <Operator>plus</Operator>
            <Operand>5</Operand>
         </Operand>
         <Operator>times</Operator>
         <Operand>2</Operand>
      </Expression>
      If I do getElementsByTagName("Operand") on the Element that contains "Expression", it will return the first Operand (the one that contains the "3+5" sub-expression) and the Operand with the value 2, but also the Operand with the value 3 and the one with the value 5. Is there any way to get only the direct children? As it is, I'm basically losing my whole hierarchy unless I jump through some hoops. (I'd have to open the big operand, compare the operands in there with the operands I got previously, and remove the duplicates.)
        • 1. Re: Parsing XML: getElementsByTagName trouble
          JoachimSauer
          1.) Are you actually talking about Java or are you talking about JavaScript? If you're talking about JavaScript, then I have to inform you that it's an entirely different thing than Java and that you're in the wrong forum
          2.) What code did you use? Could you post it? What API are you using?
          • 2. Re: Parsing XML: getElementsByTagName trouble
            DrClap
            PhilipeB wrote:
            Hi, I'm trying to parse XML. However, I have some trouble with getElementsByTagName because it flattens my structure.
            Yes, it ignores your structure. So if you don't want to do that, don't use it.
            If I do getElementsByTagName("Operand") on the Element that contains "Expression", it will return the first Operand (the one that contains the "3+5" sub-expression) and the Operand with the value 2, but also the Operand with the value 3 and the one with the value 5. Is there any way to get only the direct children?
            Yes... the getChildNodes() method. Did you not notice that in the API documentation?

            (I'm assuming here that you are talking about the org.w3c.dom package, but if this is a JavaScript question or a question about some other API, they are all going to have a method or function which returns the children of a node. I guarantee you the designers wouldn't have left that out.)
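To make the getChildNodes() suggestion concrete, here is a minimal sketch, assuming the org.w3c.dom API and the XML from the first post. The class name `DirectChildren` and the helpers `childElements` and `parse` are made up for illustration and are not part of any library. The helper walks only the direct children and keeps the ones that are elements with the requested tag name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DirectChildren {

    // Hypothetical helper: returns only the direct child elements named tagName.
    static List<Element> childElements(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text nodes (including whitespace), comments and so on.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && tagName.equals(child.getNodeName())) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Hypothetical helper: parses a string and returns the document element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse(
                "<Expression>\n"
              + "   <Operand>\n"
              + "      <Operand>3</Operand>\n"
              + "      <Operator>plus</Operator>\n"
              + "      <Operand>5</Operand>\n"
              + "   </Operand>\n"
              + "   <Operator>times</Operator>\n"
              + "   <Operand>2</Operand>\n"
              + "</Expression>");

        // All descendants named Operand: prints 4
        System.out.println(root.getElementsByTagName("Operand").getLength());
        // Direct children named Operand only: prints 2
        System.out.println(childElements(root, "Operand").size());
    }
}
```

Calling `childElements` again on the first returned Operand gives its own two direct Operand children, so the hierarchy can be walked level by level.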
            • 3. Re: Parsing XML: getElementsByTagName trouble
              807580
              Yes, I am talking about Java. Sorry I didn't make this clear. (I would have expected posting this in a Java forum would have been enough of a clue)
              DrClap wrote:
              Yes... the getChildNodes() method. Did you not notice that in the API documentation?

              (I'm assuming here that you are talking about the org.w3c.dom package, but if this is a JavaScript question or a question about some other API, they are all going to have a method or function which returns the children of a node. I guarantee you the designers wouldn't have left that out.)
              Yes, I am talking about the org.w3c.dom package... that I really should have made clear. I'm sorry for that mistake.

              Yes, I did notice the getChildNodes() method... however, when I try to cast it, I get an error and I don't seem to be able to figure out how to make it work.

              For instance, I can do:
              public void parse(Element root)
              {
                 NodeList elements = root.getElementsByTagName("*");
                 for(int i = 0; i < elements.getLength(); ++i)
                 {
                     System.out.println("tag name is " + ((Element) elements.item(i)).getTagName());
                     System.out.println("value is " + elements.item(i).getFirstChild().getNodeValue());
                 }
              }
              This will work fine (other than the problem I mentioned earlier)
              However, if I replace getElementsByTagName("*") with getChildNodes(), the first print line throws an exception because of the cast, and the second one throws a NullPointerException, probably because of the getFirstChild() call. If I replace the second print line with:
              System.out.println("value is " + elements.item(i).getNodeValue());
              It just doesn't print anything.
              • 4. Re: Parsing XML: getElementsByTagName trouble
                DrClap
                Well, the first child of your <Expression> element is a whitespace text node. So yes, you can't cast that to an Element, and yes, when you print it you won't see anything.
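To see those whitespace text nodes directly, the following sketch lists what getChildNodes() returns for a small indented document. It assumes the org.w3c.dom API. The class `ChildNodeTypes` and its methods are illustrative names, not library code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodeTypes {

    // Hypothetical helper: a short label for one DOM node.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "element <" + node.getNodeName() + ">";
            case Node.TEXT_NODE:
                // Text nodes carry their content in getNodeValue().
                return node.getNodeValue().trim().isEmpty()
                        ? "whitespace text"
                        : "text \"" + node.getNodeValue() + "\"";
            default:
                return "other (" + node.getNodeName() + ")";
        }
    }

    // Hypothetical helper: labels for every direct child, in document order.
    static List<String> describeChildren(Element parent) {
        List<String> labels = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            labels.add(describe(children.item(i)));
        }
        return labels;
    }

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse(
                "<Expression>\n"
              + "   <Operator>times</Operator>\n"
              + "   <Operand>2</Operand>\n"
              + "</Expression>");
        for (String label : describeChildren(root)) {
            System.out.println(label);
        }
    }
}
```

For this input the list alternates between "whitespace text" and the two elements, starting and ending with whitespace. That is why the cast to Element fails on item 0, and why getFirstChild() on a text node returns null.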
                • 5. Re: Parsing XML: getElementsByTagName trouble
                  807580
                  Ah, yes, that would be the problem. Thank you.

                  On a side note, why is there a whitespace text node there?

                  Edited by: PhilipeB on Mar 22, 2010 1:01 PM
                  • 6. Re: Parsing XML: getElementsByTagName trouble
                    DrClap
                    PhilipeB wrote:
                    Ah, yes, that would be the problem. Thank you.

                    On a side note, why is there a whitespace text node there?
                    You would have to ask the person who created the document why they put that whitespace there. Typically people do that so that the document is readable by humans.
                    • 7. Re: Parsing XML: getElementsByTagName trouble
                      807580
                      I meant... why doesn't the parser ignore that whitespace?
                      • 8. Re: Parsing XML: getElementsByTagName trouble
                        JoachimSauer
                        PhilipeB wrote:
                        I meant... why doesn't the parser ignore that whitespace?
                        Why should it?
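One reason the parser keeps it: without a DTD or schema it cannot tell layout whitespace from whitespace that is part of the content. As far as I know, DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) only has an effect when the parser is validating against a DTD. If the whitespace really is just indentation, it can be removed after parsing. The sketch below is one way to do that, assuming the org.w3c.dom API. `StripWhitespace` and its methods are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class StripWhitespace {

    // Hypothetical helper: recursively removes text nodes that contain only whitespace.
    // Caution: this also removes meaningful whitespace in mixed content,
    // for example the space between <b>a</b> and <i>b</i> in "<p><b>a</b> <i>b</i></p>".
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first, because removal detaches the child.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse(
                "<Expression>\n"
              + "   <Operator>times</Operator>\n"
              + "   <Operand>2</Operand>\n"
              + "</Expression>");
        System.out.println(root.getChildNodes().getLength()); // prints 5
        stripWhitespace(root);
        System.out.println(root.getChildNodes().getLength()); // prints 2
    }
}
```

After stripping, every child of Expression is an Element, so the cast in the original loop no longer fails. Text inside the leaf elements, such as "times" and "2", is left untouched.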