This discussion is archived
23 Replies. Latest reply: May 26, 2010 9:22 AM by thomas.behr

Javadoc Comments Begin and End Offset, Line number etc...

843810 Newbie
Hello Everyone,

I was wondering whether there is a method that returns the begin and end offsets and the line number of a source code comment. I know it is possible to get the line number for Doc, MethodDoc, ConstructorDoc, etc. with the position() method, but I was unable to get the locations of comments corresponding to the actual source code. Also, I found that when asking for the position of constructors, the value returned was that of the class. Is this a known bug?

Thanks in advance,

Nick
  • 1. Re: Javadoc Comments Begin and End Offset, Line number etc...
    843810 Newbie
    Can anyone please help with this?

    Thanks in advance.
  • 2. Re: Javadoc Comments Begin and End Offset, Line number etc...
    thomas.behr Newbie
    nkhamis wrote:
    I was unable to get the locations of comments corresponding to the actual source code.
    What is a comment "corresponding to the actual source code"?

    For what purpose do you need offset and line number?
  • 3. Re: Javadoc Comments Begin and End Offset, Line number etc...
    843810 Newbie
    Hello Thomas,

    I apologize for the confusion; I was referring to Javadoc comments. I would like the line number for the Javadoc comments, just as we have line numbers for all the identifiers (class, method, field, etc.) using the position() method. I need to know where in the Java file the comment is (line number) and, if possible, the begin and end offsets of the comments.

    Thanks in advance,

    Ninus
  • 4. Re: Javadoc Comments Begin and End Offset, Line number etc...
    843810 Newbie
    Ninus,

    As you are already familiar with the Doclet API, I have to say that what you see in it is essentially all you can do with your Java code using Javadoc.

    The Doclet API does provide some information about the Java code, but that information is far from complete. Ultimately, you need to parse the Java code. Javadoc does use the standard Java parser, but the Doclet API is not an API to that parser!

    (You may wonder why they didn't provide an API to their parser. There may be numerous organizational, commercial and marketing reasons for that. When you run a software enterprise, your goal is not to fulfil every imaginable request of every imaginable user. Your primary goal is to keep that enterprise running and make ends meet.)

    See also my response:
    [Javadoc Tool - Call Hierarchy Documentation|http://forums.sun.com/thread.jspa?threadID=5433793]

    Regards,

    Leonid Rudy
    http://www.docflex.com
  • 5. Re: Javadoc Comments Begin and End Offset, Line number etc...
    thomas.behr Newbie
    nkhamis wrote:
    I apologize for the confusion; I was referring to Javadoc comments.
    Aha, ok.
    nkhamis wrote:
    I would like the line number for the Javadoc comments, just as we have line numbers for all the identifiers (class, method, field, etc.) using the position() method. I need to know where in the Java file the comment is (line number) and, if possible, the begin and end offsets of the comments.
    As already explained, the Doclet API does not provide that information. However, you still have not answered the question why you need it. If you tell us, maybe we can help you find an alternative way to achieve your desired goal.
  • 6. Re: Javadoc Comments Begin and End Offset, Line number etc...
    843810 Newbie
    Hey Leon and Thomas,

    Thank you so much for your responses. I understand that Javadoc and the Doclet API follow a separation of concerns (i.e. they are not a complete Java code parser). Just out of curiosity, what do the folks at Sun use for Java parsers? I found that the Doclet API is really efficient and performs very well. We are actually trying to get as much as possible out of it to generate XML output that uses a specialized schema for NLP services. The JavadocMiner is an NLP service that assesses the quality of Javadoc comments using a set of heuristics. We are currently creating the client application (an Eclipse plugin) that will return the results generated by the JavadocMiner to the user. Overall architecture:

    Javadoc Generated Corpus
    |
    V
    NLP Service
    |
    V
    Eclipse Plugin

    The whole project is open source. The doclet site is up and can be accessed at http://www.semanticsoftware.info/owlexporter, and the paper on the JavadocMiner can be accessed at http://www.rene-witte.net/publications.

    Part of our Javadoc assessment does include the guidelines set by Sun Microsystems in "How to Write Javadoc Comments". The JavadocMiner and Eclipse plugin site will be up very shortly.

    Which brings me to Thomas' question. We need the line number(s) and possibly the start and end locations of the comments in order to map the comments in the source code file to the comments analyzed by the JavadocMiner, for the eclipse plugin.
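
For readers with the same mapping problem, one possible workaround outside the Doclet API is to scan the source text directly. The sketch below is only an illustration under stated assumptions: `JavadocCommentLocator` and its `Span` type are hypothetical names, the scan is purely textual (a `/**` inside a string literal or a line comment would be misreported), and only `\n` is counted as a line break. Offsets are 0-based with an exclusive end, and line numbers are 1-based.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds Javadoc comments in raw source text. */
final class JavadocCommentLocator {

    /** Location of one comment; endOffset is exclusive, lines are 1-based. */
    static final class Span {
        final int beginOffset;
        final int endOffset;
        final int beginLine;
        final int endLine;

        Span(int beginOffset, int endOffset, int beginLine, int endLine) {
            this.beginOffset = beginOffset;
            this.endOffset = endOffset;
            this.beginLine = beginLine;
            this.endLine = endLine;
        }
    }

    static List<Span> locate(String source) {
        List<Span> spans = new ArrayList<>();
        int line = 1;
        int i = 0;
        while (i < source.length()) {
            // "/**" opens a Javadoc comment, but "/**/" is just an empty block comment.
            if (source.startsWith("/**", i) && !source.startsWith("/**/", i)) {
                int close = source.indexOf("*/", i + 3);
                if (close < 0) {
                    break; // unterminated comment: stop scanning
                }
                int end = close + 2;
                int beginLine = line;
                // Count the line breaks inside the comment to find its last line.
                for (int k = i; k < end; k++) {
                    if (source.charAt(k) == '\n') {
                        line++;
                    }
                }
                spans.add(new Span(i, end, beginLine, line));
                i = end;
            } else {
                if (source.charAt(i) == '\n') {
                    line++;
                }
                i++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String source = "/** Doc. */\nclass A {}\n";
        for (Span s : locate(source)) {
            System.out.println("offsets " + s.beginOffset + ".." + s.endOffset
                    + ", lines " + s.beginLine + "-" + s.endLine);
        }
    }
}
```

Each span could then be paired with the declaration that follows it. A real tool would more likely rely on a proper parser than on text scanning.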

    Thanks in advance,

    Ninus.

    Edited by: nkhamis on Jun 1, 2010 7:12 AM

  • 7. Re: Javadoc Comments Begin and End Offset, Line number etc...
    thomas.behr Newbie
    Well, if you are developing an Eclipse plugin, have you considered working with Eclipse's Abstract Syntax Tree (basically an object model for Java code) instead of the Doclet API? Although the AST is considerably more heavyweight than the Doclet API, it does provide access to line numbers (and, of course, it does include documentation comments).
  • 8. Re: Javadoc Comments Begin and End Offset, Line number etc...
    thomas.behr Newbie
    Alternatively, you could try to establish the mapping you need using the name (and signature) of the documented member instead of using line numbers.
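
A minimal sketch of that idea, assuming both sides (the doclet output and the plugin) can produce the qualified class name, the member name and the parameter type names. The class `MemberKeyIndex` and the key format `Class#member(types)` are hypothetical choices made for illustration, not part of any existing API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index that maps a member signature to an analysis result. */
final class MemberKeyIndex {

    private final Map<String, String> findingsByKey = new HashMap<>();

    /** Builds a key such as "com.example.Foo#bar(int,java.lang.String)". */
    static String key(String qualifiedClass, String member, List<String> parameterTypes) {
        return qualifiedClass + "#" + member + "(" + String.join(",", parameterTypes) + ")";
    }

    /** Stores a finding produced on the analysis side. */
    void record(String key, String finding) {
        findingsByKey.put(key, finding);
    }

    /** Returns the finding for a member, or null if none was recorded. */
    String lookup(String key) {
        return findingsByKey.get(key);
    }

    public static void main(String[] args) {
        MemberKeyIndex index = new MemberKeyIndex();
        // Analysis side: record a finding for Foo.bar(int, String).
        index.record(key("com.example.Foo", "bar", List.of("int", "java.lang.String")),
                "comment does not describe the parameters");
        // Editor side: rebuild the same key from the member under the cursor.
        System.out.println(index.lookup(
                key("com.example.Foo", "bar", List.of("int", "java.lang.String"))));
    }
}
```

Including the parameter types in the key keeps overloaded methods apart, which a name-only key would not.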
  • 9. Re: Javadoc Comments Begin and End Offset, Line number etc...
    843810 Newbie
    nkhamis wrote:
    Just out of curiosity, what do the folks at Sun use for Java parsers?
    They didn't need to go too far. A Java parser must be part of any Java compiler! Therefore, it is part of any JDK. Obviously, they have some internal API for their Java parser and, I guess, Javadoc uses it to build the Java code model provided via the Doclet API. But Sun has never published the full API to their Java parser (at least, I don't know of one), although it would be very useful indeed...

    FYI, a parser for a high-level programming language (like Java) is a software module that inputs a program written (by humans) as text in that language and converts it into some object representation (model) "understandable" to a computer. That object representation must be a sort of tree representing all the language constructs found in the program. Typically, that tree is accessible via a certain API, which I call here a Parser API (although other terminology may be used).

    Other software modules may use that object representation of the program for various purposes, for instance:

    - To generate executable machine code (or bytecode) from it
    - To transform the initial program in order to optimize it (this is used by compilers) or to do other things (e.g. refactoring)
    - To generate documentation (that's what Javadoc does)

    Besides generating the program object representation, another job of a parser is to provide diagnostics about syntax errors found in the program source (as well as possibly some semantic warnings).
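
To make the description above concrete, here is a toy sketch of a parser in that sense: it turns text into a tree and reports a syntax error with its offset. It handles only integer arithmetic with `+ - * /` and parentheses, and the class `TinyTreeParser` and its `Node` type are made-up names for illustration. It says nothing about how javac or Javadoc are actually implemented.

```java
/** Hypothetical recursive-descent parser that builds a small expression tree. */
final class TinyTreeParser {

    /** A number leaf (left and right are null) or a binary operator node. */
    static final class Node {
        final String label;
        final Node left;
        final Node right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left == null ? label : "(" + label + " " + left + " " + right + ")";
        }
    }

    private final String text;
    private int pos;

    private TinyTreeParser(String text) {
        this.text = text;
    }

    /** Parses the whole input or throws IllegalArgumentException with an offset. */
    static Node parse(String text) {
        TinyTreeParser p = new TinyTreeParser(text);
        Node root = p.expr();
        if (p.peek() != '\0') {
            throw p.error("unexpected '" + p.peek() + "'");
        }
        return root;
    }

    // expr := term (('+' | '-') term)*
    private Node expr() {
        Node left = term();
        while (peek() == '+' || peek() == '-') {
            String op = String.valueOf(text.charAt(pos++));
            left = new Node(op, left, term());
        }
        return left;
    }

    // term := factor (('*' | '/') factor)*
    private Node term() {
        Node left = factor();
        while (peek() == '*' || peek() == '/') {
            String op = String.valueOf(text.charAt(pos++));
            left = new Node(op, left, factor());
        }
        return left;
    }

    // factor := NUMBER | '(' expr ')'
    private Node factor() {
        if (peek() == '(') {
            pos++;
            Node inner = expr();
            if (peek() != ')') {
                throw error("expected ')'");
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw error("expected a number");
        }
        return new Node(text.substring(start, pos), null, null);
    }

    /** Skips spaces and returns the next character, or '\0' at end of input. */
    private char peek() {
        while (pos < text.length() && text.charAt(pos) == ' ') {
            pos++;
        }
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at offset " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3")); // prints (+ 1 (* 2 3))
    }
}
```

Other modules (a code generator, an optimizer, a documentation tool) would then walk the `Node` tree rather than the original text.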

    A parser is a rather sophisticated piece of code. To develop one properly, you first need to learn some theory, e.g. the formal grammar that describes the given programming language and how to deal with it. I know, there have been attempts to create some universal parsers that can be adjusted to parse any programming language (within a certain range) by specifying to them some formal grammar of that language. That topic was particularly beloved by university guys, so there must be many scientific articles about this.
    thomas.behr wrote:
    Well, if you are developing an Eclipse plugin, have you considered working with Eclipse's Abstract Syntax Tree (basically an object model for Java code) instead of the Doclet API?
    Interesting stuff... Looks like this might be that very Java parser API I am talking about here.

    Leonid Rudy
    http://www.docflex.com
  • 10. Re: Javadoc Comments Begin and End Offset, Line number etc...
    EJP Guru
    Most of that is very off topic, but:
    FYI, a parser for a high-level programming language (like Java) is a software module that inputs a program written (by humans) as text in that language and converts it into some object representation (model) "understandable" to a computer.
    Not really. A parser is just the recognizer part. Many parsers don't build any representation at all other than what's implied by the state of the call stack.
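
As a hedged illustration of a recognizer in this sense, the sketch below accepts or rejects strings of nested brackets without building any tree. Its only state is the input position and the recursion on the call stack. `BracketRecognizer` is a hypothetical name and the grammar is a toy one chosen for brevity.

```java
/** Hypothetical recognizer: answers yes or no, builds no object model. */
final class BracketRecognizer {

    private final String text;
    private int pos;

    private BracketRecognizer(String text) {
        this.text = text;
    }

    /** Grammar: sequence := ( '(' sequence ')' | '[' sequence ']' )* */
    static boolean accepts(String text) {
        BracketRecognizer r = new BracketRecognizer(text);
        // The input is accepted only if a sequence matches and nothing is left over.
        return r.sequence() && r.pos == text.length();
    }

    private boolean sequence() {
        while (pos < text.length()) {
            char open = text.charAt(pos);
            if (open != '(' && open != '[') {
                return true; // not ours: let the caller decide what must follow
            }
            pos++;
            if (!sequence()) {
                return false;
            }
            char expected = open == '(' ? ')' : ']';
            if (pos >= text.length() || text.charAt(pos) != expected) {
                return false;
            }
            pos++;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(accepts("([])[]"));
        System.out.println(accepts("([)]"));
    }
}
```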
    I know, there have been attempts to create some universal parsers that can be adjusted to parse any programming language (within a certain range) by specifying to them some formal grammar of that language.
    'Attempts'? This has been going on since the Floyd Production Language (1961), and full-scale since Don Knuth's LR(1) paper (1965). What would you call yacc (1979)? Bison? Antlr? JavaCC? 'Attempts'?
  • 11. Re: Javadoc Comments Begin and End Offset, Line number etc...
    843810 Newbie
    ejp wrote:
    Not really. A parser is just the recognizer part. Many parsers don't build any representation at all other than what's implied by the state of the call stack.
    Yeah, right. That's like 'org.xml.sax' and 'org.w3c.dom' for XML. One extends the other, but to me the whole thing is just an XML parser.

    Have to say, I'm not exactly professional in the parser stuff. But for those who are, I guess, there are lots of subtleties here with corresponding terminology.

    I only wish there were a "DOM" for Java code, produced by some Java "parser". That would greatly help to solve many problems discussed here...
    ejp wrote:
    'Attempts'? This has been going on since the Floyd Production Language (1961), and full-scale since Don Knuth's LR(1) paper (1965). What would you call yacc (1979)? Bison? Antlr? JavaCC? 'Attempts'?
    Sure. That's what I meant! These are not attempts in the sense they existed only in scientific papers. But anyway, I guess, this is rather fringe stuff few people know about (let alone, use it).
  • 12. Re: Javadoc Comments Begin and End Offset, Line number etc...
    EJP Guru
    That's like 'org.xml.sax' and 'org.w3c.dom' for XML. One extends the other, but to me the whole thing is just an XML parser.
    And only one of them creates an object model. QED.
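
The difference can be sketched with the JDK's own XML APIs. In the example below (the class `SaxVersusDom` and its method names are hypothetical), the SAX path only receives callbacks and keeps a counter, while the DOM path gets back a `Document` tree that can be navigated after parsing has finished.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical comparison of event-based and tree-based XML parsing. */
final class SaxVersusDom {

    /** SAX: the parser reports events; nothing is kept except this counter. */
    static int countElementsWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }

    /** DOM: the parser returns an object model that outlives the parse. */
    static String rootNameWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b/><c>text</c></a>";
        System.out.println(countElementsWithSax(xml));
        System.out.println(rootNameWithDom(xml));
    }
}
```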
    Have to say, I'm not exactly professional in the parser stuff. But for those who are, I guess, there are lots of subtleties here with corresponding terminology.
    You introduced the terminology here. It was off topic, and it was also incorrect. You have to expect to be corrected about that. There are plenty of people in these forums who can provide correct advice about parsing and compiler technology, when it is needed. It wasn't needed here.
    I only wish there were a "DOM" for Java code, produced by some Java "parser".
    There is. See JavaCC.
    That's what I meant! These are not attempts in the sense they existed only in scientific papers.
    But it's not what you said. You said 'attempts'. These are not 'attempts' at all. These are all mature software products, very widely used.
    But anyway, I guess, this is rather fringe stuff few people know about (let alone, use it).
    It is fringe to this discussion. There are plenty of compiler writers in the world and even more other people who use this technology without actually being compiler writers. Perhaps you personally don't know much about it. That's not the same thing.
  • 13. Re: Javadoc Comments Begin and End Offset, Line number etc...
    843810 Newbie
    ejp,

    You provided us with good information, and that's fine. But I think you are being a little bit aggressive about all this.

    I think this forum is not the same as a scientific symposium or a scientific article. If I wrote one, I would spend additional time verifying every term I used, precisely to fend off attacks from "terminology warriors" like you. (In fact, I have seen enough such guys who are able to suppress and eventually kill any good discussion, idea or initiative only to prove they are more intelligent or wiser than others.)
    You introduced the terminology here.
    I did not introduce any terminology here. At least, it wasn't my goal. I just wanted to help the guy who wrote:
    Just out of curiosity, what do the folks at Sun use for Java parsers?
    That was a response to my post, and to me it sounded like he was not sure what a Java parser actually is. I just tried to explain to him what I know about this.
    There are plenty of people in these forums who can provide correct advice about parsing and compiler technology, when it is needed.
    Maybe there are plenty of them, but I don't see many comments coming from those people.

    Instead, what I typically see is a squabble like the one in "[javadoc bug or missing an option for class extending AbstractTableModel|http://forums.sun.com/thread.jspa?threadID=5439519]", coming from guys like you.
  • 14. Re: Javadoc Comments Begin and End Offset, Line number etc...
    EJP Guru
    Oh come off it. I'm one of those experts myself. You posted some misinformation and I corrected it. OK?