This discussion is archived
12 Replies. Latest reply: Sep 10, 2007 10:48 AM by 807605

recursive xml parsing solution - help needed

807605 Newbie
FYI, I'm not the best with words, so here it goes.

I have an XML document whose nodes I am iterating through. When I discover a node of a specific type (e.g. <box>), I begin to iterate through its child nodes.
The problem is that a <box> element can contain multiple <box> elements as its children. So a possible structure is:

<box>
    <node1>
    <box>
        <nestedBoxNode>
    </box>
    <node2>
    <box>
        <nestedBoxNode2>
    </box>
</box>

In the method that handles the <box> element, if it discovers another <box> element as a child, it calls itself.
However, I need to do some processing on each <box> element separately.

My question is: what are some possible methods of determining whether a parent <box> element contains other <box> elements as children?

How can I determine when to call my processing functions for each child <box> node?

Forgive me for the vagueness of my questions; this is the most detail that I am able to give.

Any help whatsoever is greatly appreciated...
Thanks.
  • 1. Re: recursive xml parsing solution - help needed
    796125 Newbie
    This all depends on what technology you are using to work through your xml.

    Are you using SAX? Or are you using DOM? Or are you just using regexes or some other plain-text solution to identify tags?

    - Adam
  • 2. Re: recursive xml parsing solution - help needed
    807605 Newbie
    I am using JAXB to iterate through each element.
    The elements have been unmarshalled, and I am in the process of marshalling them back into new source code. I am converting one reporting tool's source code into another type of source using JAXB, so a <box> that contains a <box> means that the child <box> needs to be processed first.

    Am I making any sense?
  • 3. Re: recursive xml parsing solution - help needed
    807605 Newbie
    BTW, Xerces will do all of this very nicely for you. Is there a reason you don't want to use it?
  • 4. Re: recursive xml parsing solution - help needed
    807605 Newbie
    Not specifically, I have just taken over where another developer has left off. For what we are doing, JAXB seems to be working fine. I just don't have the experience/knowledge that I need.

    Thanks though.
  • 5. Re: recursive xml parsing solution - help needed
    796125 Newbie
    > I am using jaxb to iterate through each element.
    > The elements have been unmarshalled, and I am in the
    > process of marshalling them back into new source
    > code. I am converting one reporting tool's source
    > code into another type of source using jaxb. so a
    > <box> that contains a <box> means that the child
    > <box> needs to be processed first.
    >
    > am i making any sense?
    You lost me at "the child <box> needs to be processed first". I'm not really sure why you think this is true. I would think that processing the child box would be done as part of processing the parent box. Or are you somehow stripping the hierarchy structure out of the file before you start to process it?

    - Adam
  • 6. Re: recursive xml parsing solution - help needed
    807605 Newbie
    Ok, hypothetically a <box> element has these attributes:
    x position
    y position
    height
    width

    If I am to convert a <box> element into the new source equivalent, I need to know whether, and how many, child <box> elements exist. That way I can process their x, y, height, and width before processing the parent element's attributes.

    All of my processing takes place in a loop, so:

    // Some pseudocode involved below.
    // Note: this is inside the method doBoxProcessing(args) { }
    while (xmlIterator.hasNext()) {
        Object element = xmlIterator.next();  // call next() once per pass; every call advances the iterator
        if (element.equals(<box>)) {
            doBoxProcessing(/* args */);
        } else if (element.equals(someOtherElement)) {
            doSomeOtherElementProcessing(/* args */);
        }
    }
    Does this help?

  • 7. Re: recursive xml parsing solution - help needed
    796125 Newbie
    Okay, so what type of iterator is xmlIterator? Where is it coming from? What kind of objects is it returning? Knowing what type of object you have may help figure out how to find information about its children.

    - Adam
  • 8. Re: recursive xml parsing solution - help needed
    807605 Newbie
    Does your API support XPath? Maybe you can query for all child nodes with the <box> tag?
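
To illustrate the XPath suggestion above, here is a minimal sketch. It assumes the document is available as a DOM tree (the original poster works with JAXB objects, so this only applies if a DOM is also at hand). The class name ChildBoxQuery and the helpers childBoxes and parse are made up for the example; the javax.xml.xpath and javax.xml.parsers calls are from the standard JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildBoxQuery {

    /** Returns the <box> elements that are direct children of the given element. */
    static NodeList childBoxes(Element parent) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "box" is a relative path, so it matches only direct children named box.
        return (NodeList) xpath.evaluate("box", parent, XPathConstants.NODESET);
    }

    /** Hypothetical helper: parses an XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<box><node1/><box><nestedBoxNode/></box>"
                   + "<node2/><box><nestedBoxNode2/></box></box>";
        Element root = parse(xml).getDocumentElement();
        NodeList children = childBoxes(root);
        System.out.println("child boxes: " + children.getLength()); // child boxes: 2
    }
}
```

A non-empty result means the parent contains nested boxes, and the list length gives the count. Using ".//box" instead of "box" would match boxes at any depth below the parent.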
  • 9. Re: recursive xml parsing solution - help needed
    807605 Newbie
    Yes, I understand this.

    xmlIterator is a ListIterator over objects of type (**original source). I need to gather info from the original source code's child elements to build the generated source code.
    The child elements are all contained within the parent's attribute values. In other words, a child <box> element's x and y values must be within the parent's x and y values.

    <box>
        <x = 10>
        <y = 20>
        <height = 100>
        <width = 200>
        <box>
            <x = 5>
            <y = 10>
            <height = 50>
            <width = 100>
        </box>
        <box>
            <x = 2>
            <y = 10>
            <height = 20>
            <width = 80>
        </box>
    </box>

    I know that I am not making this easy by any means, but there is reason behind the madness.

    Before we go any further, I want to say thanks. Also, copy and paste, this forum doesn't reflect the indentation of my examples...
    -----------------------------

    **I cannot reveal this info :~ (
  • 10. Re: recursive xml parsing solution - help needed
    796125 Newbie
    > **I cannot reveal this info :~ (
    Fair enough, but this makes it hard to help you out.

    So your problem is that you don't know how to find out if a parent contains a child <box> element?

    What about the reverse: can you find out if a child lives inside a parent box element?

    For example, you have structure like this:
    <box id="parent">
      <![CDATA[ some sort of data here ]]>
      <box id="child"/>
      <![CDATA[ and maybe some more data here ]]>
    </box>
    So anyway, your problem seems to be identifying whether "parent" has any children that are boxes. How about this: with the system you have, can you identify if "child" is inside a <box> tag -- can you find out from "child" that it is a child of "parent"?

    - Adam
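
The child-to-parent check asked about above can be sketched as follows, assuming a DOM tree, where every node knows its parent. JAXB-generated objects do not carry a parent reference by default, so this is only an illustration of the idea. ParentBoxCheck and isNestedBox are hypothetical names; getParentNode, getNodeType and getNodeName are standard org.w3c.dom methods.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ParentBoxCheck {

    /** True when the element's immediate parent is itself a <box> element. */
    static boolean isNestedBox(Element box) {
        Node parent = box.getParentNode();
        return parent != null
                && parent.getNodeType() == Node.ELEMENT_NODE
                && "box".equals(parent.getNodeName());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<box id=\"parent\"><box id=\"child\"/></box>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element parent = doc.getDocumentElement();
        // getElementsByTagName searches descendants only, so item(0) is the child box.
        Element child = (Element) parent.getElementsByTagName("box").item(0);
        System.out.println(isNestedBox(parent)); // false: its parent is the document node
        System.out.println(isNestedBox(child));  // true
    }
}
```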
  • 11. Re: recursive xml parsing solution - help needed
    807605 Newbie
    This is possible, but only on a one-by-one basis. Each <box> element has a "name" attribute (in reality). However, the naming isn't consistent. Therefore, I could hard-code for a specific source, but that defeats the entire purpose of our tool. We want to be able to take the source code of any existing report on our system, run it through our utility, and see the same output in another reporting tool. (It doesn't have to be exact, but the majority needs to be similar.)

    The fact that we only have one function to do the processing on all of the <box> elements really limits us in this.

    I had originally used a boolean value to flag when a child is found within the parent element, but the declaration gets re-initialized each time a new child is found (because of the recursion).

    I have also tried incrementing a counter each time a new child element is found, but once again the recursion catches me every time.

    I am sure that there is a simpler solution, I will keep investigating, thanks again for your input. You've given me a better perspective of what I need to focus on.

    Thanks!
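
One way around the flag and counter being reset by the recursion is to keep the counter as a local variable and have the method return it, so each call owns its own copy. The sketch below is an assumption about how that could look, not the poster's actual code: Box is a hypothetical stand-in for the unmarshalled type, and the processed list exists only to show the order of work.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxRecursion {

    /** Hypothetical stand-in for the unmarshalled <box> type; not the real generated class. */
    static class Box {
        final String name;
        final List<Object> children = new ArrayList<>();
        Box(String name) { this.name = name; }
    }

    /** Records the order in which boxes are finished, for illustration only. */
    static final List<String> processed = new ArrayList<>();

    /**
     * Processes nested boxes first, then the box itself.
     * The counter is a local variable, so every recursive call has its own copy.
     */
    static int doBoxProcessing(Box box) {
        int childBoxes = 0;
        for (Object child : box.children) {
            if (child instanceof Box) {
                childBoxes++;
                doBoxProcessing((Box) child); // the nested call cannot touch our counter
            }
            // other element types would be handled here
        }
        // All children are finished at this point, so the parent can rely on childBoxes.
        processed.add(box.name + ":" + childBoxes);
        return childBoxes;
    }

    public static void main(String[] args) {
        Box parent = new Box("parent");
        Box first = new Box("first");
        Box second = new Box("second");
        first.children.add("nestedBoxNode");
        parent.children.add("node1");
        parent.children.add(first);
        parent.children.add("node2");
        parent.children.add(second);

        doBoxProcessing(parent);
        System.out.println(processed); // [first:0, second:0, parent:2]
    }
}
```

Because the parent's own work happens after the loop, the children are always handled first, which matches the requirement stated earlier in the thread.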
  • 12. Re: recursive xml parsing solution - help needed
    796125 Newbie
    Why not run a two-pass algorithm?

    First, just iterate over the nodes, and when you find out that one box is the child of another box, add that to a HashMap (or something similar), using the PARENT as the key (not the child).

    Then, iterate over the document again, and when you get to a box element, retrieve the child information from the map -- if it's not null, you will know that it has children. Then, as you process those children, add them to a black-list, which tells you not to reprocess them as they come up in the iterator (actually, they technically already exist in map.values(), which could be used as your black-list).

    - Adam
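
The two-pass idea described above might be sketched like this. Several things here are assumptions: Box is a hypothetical stand-in for the real element type, the iterator is modelled as a flat list that yields every box in document order, an IdentityHashMap is swapped in for the suggested HashMap so the sketch does not depend on how the real class implements equals and hashCode, and a separate "done" set plays the role of the black-list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TwoPassBoxes {

    /** Hypothetical stand-in for the unmarshalled <box> type. */
    static class Box {
        final String name;
        final List<Object> children = new ArrayList<>();
        Box(String name) { this.name = name; }
    }

    /** Pass 1: map each box that has nested boxes to those boxes (the parent is the key). */
    static Map<Box, List<Box>> indexChildren(List<Box> allBoxes) {
        Map<Box, List<Box>> byParent = new IdentityHashMap<>();
        for (Box box : allBoxes) {
            for (Object child : box.children) {
                if (child instanceof Box) {
                    byParent.computeIfAbsent(box, k -> new ArrayList<>()).add((Box) child);
                }
            }
        }
        return byParent;
    }

    /** Pass 2: walk the boxes again, handling children before their parent. */
    static List<String> processAll(List<Box> allBoxes) {
        Map<Box, List<Box>> byParent = indexChildren(allBoxes);
        Set<Box> done = Collections.newSetFromMap(new IdentityHashMap<Box, Boolean>());
        List<String> order = new ArrayList<>();
        for (Box box : allBoxes) {
            process(box, byParent, done, order);
        }
        return order;
    }

    private static void process(Box box, Map<Box, List<Box>> byParent,
                                Set<Box> done, List<String> order) {
        if (!done.add(box)) {
            return; // black-listed: already handled as some parent's child
        }
        List<Box> kids = byParent.get(box);
        if (kids != null) { // non-null means this box has nested boxes
            for (Box kid : kids) {
                process(kid, byParent, done, order);
            }
        }
        order.add(box.name); // the real per-box work would go here
    }

    public static void main(String[] args) {
        Box parent = new Box("parent");
        Box first = new Box("first");
        Box second = new Box("second");
        parent.children.add(first);
        parent.children.add(second);
        // The flat list mimics an iterator that yields every box in document order.
        System.out.println(processAll(Arrays.asList(parent, first, second)));
        // [first, second, parent]
    }
}
```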