6 Replies. Latest reply: Mar 10, 2012 9:52 AM by 922980

    Simplest possible use of JAXB  unmarshal convenience method

    922980
      The JAXB API describes the arguments and return type in such a way that I can't devise a simple concrete call that will unmarshal my file.
      So I'm looking for a one line answer with actual arguments and types.
      Here's the API and my tiny XML example file:

      Arguments for:
      unmarshal

      public static <T> T unmarshal(File xml, Class<T> type)

      Reads in a Java object tree from the given XML input.
      Parameters:
      xml - Reads the entire file as XML.

      My XML file:

      <?xml version="1.0" encoding="UTF-8"?>
      <note>
      <to>Flopsy</to>
      <from>Mopsy</from>
      <heading>Reminder</heading>
      <body>Get your Easter Bunny outfit ready.</body>
      </note>

      I know some might suggest that I study XML, JAXB, Generics, and the like until I understand them all.
      But life is short and I need to get on with my work in natural language processing.
      I'm using approx. 100,000 articles in XML format from BioMed Central. So XML is important.
      This unmarshal issue is a tiny roadblock on the way to continuing my research.

      Thanks!
        • 1. Re: Simplest possible use of JAXB  unmarshal convenience method
          EJP
          You can only use JAXB to unmarshal an XML file if the XML file was produced in accordance with how JAXB would have written it.

          Your file doesn't appear to do that.
          • 2. Re: Simplest possible use of JAXB  unmarshal convenience method
            922980
            Well that's that.

            Thanks for your straightforward answer.

            I'll return to my tried and true methods and forget about JAXB.
            • 3. Re: Simplest possible use of JAXB  unmarshal convenience method
              r035198x
              I don't see why you think that you can't use JAXB here.

              Your one liner is simply
              Note note = JAXB.unmarshal(new File("D:\\note.xml"), Note.class);
              All you need to do is create/generate the Note.java class according to the XML structure you have and JAXB will marshal/unmarshal it.
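
For the small note.xml in the original question, a hand-written Note class could look roughly like the sketch below. The class layout, the field-access style, and the helper name JaxbNoteSketch are illustrative assumptions, not something taken from this thread. The sketch also assumes the javax.xml.bind package is on the classpath: it ships with Java 6 through 8 and has to be added as a separate dependency on Java 11 and later. A Reader is used in place of a File only so that the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.bind.JAXB;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical binding class for the <note> document shown in the question.
// With FIELD access, each field is bound to the child element of the same name.
@XmlRootElement(name = "note")
@XmlAccessorType(XmlAccessType.FIELD)
class Note {
    String to;
    String from;
    String heading;
    String body;
}

// Illustrative helper (name is an assumption) wrapping the convenience method.
class JaxbNoteSketch {

    // Same convenience call as the one-liner above, reading from a String
    // through a Reader instead of from a File on disk.
    static Note parse(String xml) {
        return JAXB.unmarshal(new StringReader(xml), Note.class);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<note><to>Flopsy</to><from>Mopsy</from>"
                + "<heading>Reminder</heading>"
                + "<body>Get your Easter Bunny outfit ready.</body></note>";
        Note note = parse(xml);
        System.out.println(note.to + " / " + note.from + " / " + note.heading);
        System.out.println(note.body);
    }
}
```

To read from disk instead, the call would be the one-liner quoted above, with the path pointing at the real file.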
              • 4. Re: Simplest possible use of JAXB  unmarshal convenience method
                922980
                For the documents I work with, I would have to create a Doc.java class corresponding to an 800+ line XML schema.
                [http://www.biomedcentral.com/about/xml]

                Here are a few lines from it:
                     <xs:element name="fig">
                          <xs:complexType>
                               <xs:sequence>
                                    <xs:element ref="title"/>
                                    <xs:element ref="caption" minOccurs="0"/>
                                    <xs:element ref="text" minOccurs="0"/>
                                    <xs:element ref="graphic"/>
                               </xs:sequence>
                               <xs:attribute name="id" type="xs:ID" use="required"/>
                          </xs:complexType>
                     </xs:element>

                Looks daunting. But if there's a tool for this, I'd love to hear about it.

                Currently, my methods for dealing with XMLs consist of many normalization steps that reduce the XML to plaintext, no markup, so that we can apply standard natural language processing operations to it (part-of-speech tagging, parsing). The plaintext preserves the basic semantic content of the original. That's what's needed for knowledge extraction. Our techniques are specialized ones focused on normalization. They lie outside the realm of the standard XML world. We have to draw outside the lines.
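
As a very rough illustration of what "reducing the XML to plaintext" can mean at its simplest, the sketch below strips all markup with the standard DOM parser that ships with the JDK and collapses whitespace. It is not the normalization pipeline described in this post, only a minimal baseline; the class and method names are hypothetical. One known limitation: a space is inserted after every text node, so inline markup inside a word would split that word.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical baseline: drop every tag, keep only the character data.
class PlainTextSketch {

    // Parses the XML string and returns its text content with all runs of
    // whitespace collapsed to single spaces.
    static String toPlainText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        collectText(doc.getDocumentElement(), out);
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Depth-first walk that appends each text or CDATA node followed by a
    // space, so text from adjacent elements does not run together.
    private static void collectText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note><to>Flopsy</to><from>Mopsy</from>"
                + "<body>Get your Easter Bunny outfit ready.</body></note>";
        System.out.println(toPlainText(xml));
    }
}
```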

                We did try JAXB in some 2002 work we published, and earlier we developed our own techniques for marked-up text (SGML), published in 1991. But it looks like JAXB and today's many other XML tools can't magically replace the insights and techniques that have taken my students and me a couple of decades to develop.

                See [http://www.ccs.neu.edu/home/futrelle/research37/naturalLanguage.html] and the PDF linked to there, though it describes our thinking three or four years ago. We've moved on.
                • 5. Re: Simplest possible use of JAXB  unmarshal convenience method
                  jtahlborn
                  919977 wrote:
                  Looks daunting. But if there's a tool for this, I'd love to hear about it.
                  seriously, did you do any research? i'm sure even a basic jaxb tutorial would mention how to use xjc to generate java classes from XML Schema. (I understand that the subjects you mentioned are all potentially very large, but doing some intro reading on each one of them would have answered your questions far faster than posting to a forum and waiting for all the answers to come back to you).
                  • 6. Re: Simplest possible use of JAXB  unmarshal convenience method
                    922980
                    I've done a lot of research, 60 or so publications, including the SGML work of 20+ years ago, but none of it seriously devoted to XML.

                    XML is a system for hierarchical, in-line markup. Heavily marked-up natural language is nearly impossible to read and understand. One word, with full syntactic and semantic markup, can occupy an entire line.

                    So I've gone back to basics: natural language text is a transcription of the ultimate origin of language, speech, which is a linear phenomenon. I now treat text as linear, with whatever structure is needed added by stand-off markup (in a way which I think is far superior to another major stand-off markup scheme, the one in the ANC, the American National Corpus). My tools are normalization, inverted indexes, and metadata.

                    Chomsky brought about a half-century of work viewing language as hierarchical. Chomsky got many of his earliest ideas from the formal language research in the RLE at MIT in the 1950s. He merely hypothesized (insisted) that natural language could be treated in the same way. Thousands fell under his hypnotic spell. Contemporary research in cognitive science and the rise of Construction Grammar have steadily driven out many of Chomsky's notions, if not flatly disproved them.

                    This thread, and even this entire forum, is not the place to discuss issues of human language and cognition. Sorry if I've intruded. I've learned some things and probably annoyed many of you. Let's drop it, or do it offline (via gmail).

                    Nevertheless, thanks for the xjc pointer [sic]. I'll probably give it a look.