Forum Stats

  • 3,855,570 Users
  • 2,264,523 Discussions
  • 7,906,069 Comments

Discussions

Context-free LR(k) parsing of generics?

843793
843793 Member Posts: 41,732 Green Ribbon
edited Apr 29, 2003 2:17AM in Generics
Has anyone been able to develop a syntax-free LR(k) (for finite k) grammar for parsing generics? The proposed grammar in spec8.pdf is hideous, as it has several ambiguities and even some typos. The two biggest problems I have in a context-free parser is deciding whether >> should be one or two tokens, and whether a<b starts a parameterized type name or is a relational expression. I know these two problems can be solved by writing a context-sensitive parser, but I would prefer not to do that if possible. I assume (without looking at the sources, because I'm trying to do a clean-room implementation) that the prototype compiler is using a context-sensitive parser.

I also am interested if any of the JSR folks would be willing to let the public know where the current syntax of the JSR or Java 1.5 differs from the ideas in the public draft. Remember the last minute change of the assert syntax (from "assert(a, b)" to "assert a:b"), 11 months after the public draft closed, made in order to acheive LR(1) parsing?

I've come close to writing an LR(2) grammar that gets everything except for casts. And I may even be able to trim that to LR(1) without too much headache. The problem with casts, however, is shown below:

Suppose an LR(2) parser is at "a" on this sequence:
class A { void m() { Object o = ( a < b

We could reduce "a" to ClassOrInterface, and ultimately have a cast expression, as in:
class A { void m() { Object o = ( a < b > ) null; } }

CastExpression : ( ReferenceType ) UnaryExpressionNotPlusMinus
ReferencType : ClassOrInterfaceType
ClassOrIterfaceType : ClassOrInterface TypeArguments
ClassOrInterface : Identifier(a)
TypeArguments : < ReferenceTypeList >
ReferenceTypeList : ReferenceType
ReferenceType : ClassOrInterfaceType
ClassOrInterfaceType : ClassOrInterface
ClassOrInterface : Identifier(b)

Or we could reduce "a" to Name, and ultimately have some expression, as in:
class A { void m() { Object o = ( a < b ) + ""; } }

AdditiveExpression : AdditiveExpression + MultiplicativeExpression
AdditiveExpression : MultiplicativeExpression
MultiplicativeExpression : UnaryExpression
UnaryExpression : UnaryExpressionNotPlusMinus
UnaryExpressionNotPlusMinus : PostfixExpression
PostfixExpression : Primary
Primary : ( Expression )
Expression : AssignmentExpression
AssignmentExpression : ConditionalExpression
ConditionalExpression : ConditionalOrExpression
ConditionalOrExpression : ConditionalAndExpression
ConditionalAndExpression : InclusiveOrExpression
InclusiveOrExpression : ExclusiveOrExpression
ExclusiveOrExpression : AndExpression
AndExpression : EqualityExpression
EqualityExpression : RelationalExpression
RelationalExpression : RelationalExpression < ShiftExpression
RelationalExpression : ShiftExpression
ShiftExpression : AdditiveExpression
AdditiveExpression : MultiplicativeExpression
MultiplicativeExpression : UnaryExpression
UnaryExpression : UnaryExpressionNotPlusMinus
UnaryExpressionNotPlusMinus : PostfixExpression
PostfixExpression : Name
Name : Identifier(a)
ShiftExpression : AdditiveExpression
AdditiveExpression : MultiplicativeExpression
MultiplicativeExpression : UnaryExpression
UnaryExpression : UnaryExpressionNotPlusMinus
UnaryExpressionNotPlusMinus : PostfixExpression
PostfixExpression : Name
Name : Identifier(b)

And since an infinite number of tokens can appear in both ReferenceTypeList and ShiftExpression, there is no finite way to determine which reduction to choose.

Comments

  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    Has anyone been able to develop a syntax-free LR(k)
    (for finite k) grammar for parsing generics? The
    proposed grammar in spec8.pdf is hideous, as it has
    several ambiguities and even some typos. The two
    biggest problems I have in a context-free parser is
    deciding whether >> should be one or two tokens,
    This question is explicitly handled in the spec. It is always
    one token.

    See http://www.cs.princeton.edu/~appel/modern/java/CUP/
    for how Scott Ananian handled things.
  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    I am not interested in a parser generator targetting Java (I'm working with the Jikes compiler, where the parser generator targets C++). However, the ideas in CUP might be useful, if I ever look at it.

    I concede that the grammar in the public draft, WHEN CORRECTED, will correctly parse a<b<c>> when the tokenizer always treats >> as one token (it makes the grammar messy, but it worked). Below, I list the LALR(1) grammar that I have so far, with comments on the rules I had to fix so far. This grammar still does not allow (a<b>)o (as I stated earlier, this is highly ambiguous), nor does it allow a<b>.class (but the public draft did not mention this as being legal, and I think it would return the erased type anyway, so the user should have just used a.class).

    If anyone can produce an LR(k) way to allow parameterized type casts, I'd love to see it!

    By the way, there was a BugParade entry I once saw that suggested an alternate syntax for casts. Of course, the original syntax would have to be preserved for casts to raw types, so that the new compiler can still parse old code. But since parameterized types are new syntax, casting to a parameterized type could safely require new syntax. The proposal was that casts be allowed with this syntax: myvar.(MyClass). The semantics are that the compile type of expression myvar is cast to type MyClass. Notice that you can then write CastExpression as:

    CastExpression : '(' PrimitiveType Dimsopt ')' UnaryExpression
    | '(' Expression ')' UnaryExpressionNotPlusMinus
    | '(' Name Dims ')' UnaryExpressionNotPlusMinus
    | UnaryExpressionNotPlusMinus '.' '(' ReferenceType ')'
    | UnaryExpression '.' '(' PrimitiveType ')'

    Now parameterized type cases would have to use the new form, but you no longer have grammar ambiguity. And since casts to parameterized types should be rare (after all, the point of generics is to reduce the number of explicit casts), it is not too much to ask for this new syntax.

    ----------------------
    This grammar started from the LALR(1) version in JLS1 chapter 18. It
    has been tweaked to correctly parse everything of JLS2, and to add
    generics (JSR 14). Everything added or modified for generics is
    preceded by the comment GENERIC. Some rules of JLS1 were simplified in the interest of a simpler parse tree with code reuse, which occaisionally requires more work during semantic processing.

    Goal : CompilationUnit
    Literal : 'IntLiteral'
    | 'LongLiteral'
    | 'FloatLiteral'
    | 'DoubleLiteral'
    | BooleanLiteral
    | 'CharLiteral'
    | 'StringLiteral'
    | 'null'
    BooleanLiteral : 'true'
    | 'false'
    Type : PrimitiveType
    | ReferenceType
    PrimitiveType : NumericType
    | 'boolean'
    NumericType : IntegralType
    | FloatingPointType
    IntegralType : 'byte'
    | 'short'
    | 'int'
    | 'long'
    | 'char'
    FloatingPointType : 'float'
    | 'double'
    ReferenceType : ClassOrInterfaceType
    | ArrayType

    // I used ClassOrInterface instead of Name, to allow things like A<B>.C
    // GENERIC
    ClassOrInterfaceType : ClassOrInterface
    | ClassOrInterface '<' ReferenceTypeList1

    // I used 'Identifier' instead of Name, to have a distinction between
    // expression names and type names (since only type names can have type
    // arguments).
    // GENERIC
    ClassOrInterface : 'Identifier'
    | ClassOrInterfaceType '.' 'Identifier'

    // The ArrayType rule makes more sense like this.
    ArrayType : PrimitiveType Dims
    | ClassOrInterfaceType Dims

    Name : 'Identifier'
    | Name '.' 'Identifier'
    CompilationUnit : PackageDeclarationopt ImportDeclarationsopt
    TypeDeclarationsopt
    ImportDeclarations : ImportDeclaration
    | ImportDeclarations ImportDeclaration
    TypeDeclarations : TypeDeclaration
    | TypeDeclarations TypeDeclaration
    PackageDeclaration : 'package' Name ';'
    ImportDeclaration : SingleTypeImportDeclaration
    | TypeImportOnDemandDeclaration
    SingleTypeImportDeclaration : 'import' Name ';'
    TypeImportOnDemandDeclaration : 'import' Name '.' '*' ';'
    TypeDeclaration : ClassDeclaration
    | InterfaceDeclaration
    | ';'
    Modifiers : Modifier
    | Modifiers Modifier
    Modifier : 'public'
    | 'protected'
    | 'private'
    | 'static'
    | 'abstract'
    | 'final'
    | 'native'
    | 'strictfp'
    | 'synchronized'
    | 'transient'
    | 'volatile'

    // GENERIC
    ClassDeclaration : Modifiersopt 'class' 'Identifier' TypeParametersopt
    Superopt Interfacesopt ClassBody

    // I deleted ClassType and InterfaceType for grammar simplicity.
    Super : 'extends' ClassOrInterfaceType
    Interfaces : 'implements' TypeList

    // TypeList is the combination of InterfaceTypeList and ClassTypeList
    TypeList : ClassOrInterfaceType
    | TypeList ',' ClassOrInterfaceType
    ClassBody : '{' ClassBodyDeclarationsopt '}'
    ClassBodyDeclarations : ClassBodyDeclaration
    | ClassBodyDeclarations ClassBodyDeclaration

    // Since initializers are similar, I lumped them together into
    // InitializerDeclaration. Likewise, since interface and class members
    // are similar, I lumped them into MemberDeclaration. Of course, this
    // requires more work on the semantic side of things, but makes
    // parsing easier.
    ClassBodyDeclaration : MemberDeclaration
    | ConstructorDeclaration
    | InitializerDeclaration
    // Note that type declaration includes ';', the empty member declaration
    MemberDeclaration : FieldDeclaration
    | MethodDeclaration
    | TypeDeclaration
    FieldDeclaration : Modifiersopt Type VariableDeclarators ';'
    VariableDeclarators : VariableDeclarator
    | VariableDeclarators ',' VariableDeclarator
    VariableDeclarator : VariableDeclaratorId
    | VariableDeclaratorId '=' VariableInitializer
    VariableDeclaratorId : 'Identifier' Dimsopt
    VariableInitializer : Expression
    | ArrayInitializer

    // Similar to JLS2 chapter 18, I treat ExplicitConstructorInvocations
    // as statements, then perform semantic filtering to make sure they
    // only appear in the correct places. Then MethodBody can be shared in
    // more places.
    MethodDeclaration : MethodHeader MethodBody
    | MethodHeader ';'
    // To avoid ambiguity with FieldDeclarations, TypeParametersopt and
    // ResultType must be expanded inline.
    // GENERIC
    MethodHeader : Modifiersopt TypeParameters Type MethodDeclarator Throwsopt
    | Modifiersopt Type MethodDeclarator Throwsopt
    | Modifiersopt TypeParameters 'void' MethodDeclarator Throwsopt
    | Modifiersopt 'void' MethodDeclarator Throwsopt
    MethodDeclarator : 'Identifier' '(' FormalParameterListopt ')' Dimsopt
    FormalParameterList : FormalParameter
    | FormalParameterList ',' FormalParameter
    FormalParameter : 'final'opt Type VariableDeclaratorId
    Throws : 'throws' TypeList
    // See comment for MethodDeclaration
    MethodBody : Block
    InitializerDeclaration : 'static'opt MethodBody

    // The public draft is not clear whether constructors can be
    // parameterized, but it makes sense to me. TypeParametersopt needs to
    // be expanded inline to avoid ambiguities with field declarations.
    // GENERIC
    ConstructorDeclaration : Modifiersopt ConstructorDeclarator Throwsopt
    MethodBody
    | Modifiersopt TypeParameters ConstructorDeclarator
    Throwsopt MethodBody
    ConstructorDeclarator : 'Identifier' '(' FormalParameterListopt ')'
    ExplicitConstructorInvocation : 'this' '(' ArgumentListopt ')' ';'
    | 'super' '(' ArgumentListopt ')' ';'
    | Primary '.' 'super' '(' ArgumentListopt ')' ';'
    | Name '.' 'super' '(' ArgumentListopt
    ')' ';'
    // GENERIC
    InterfaceDeclaration : Modifiersopt 'interface' 'Identifier' TypeParametersopt
    ExtendsInterfacesopt InterfaceBody
    ExtendsInterfaces : 'extends' TypeList
    InterfaceBody : '{' InterfaceMemberDeclarationsopt '}'
    // See comment on MemberDeclaration.
    InterfaceMemberDeclarations : MemberDeclaration
    | InterfaceMemberDeclarations MemberDeclaration
    ArrayInitializer : '{' ','opt '}'
    | '{' VariableInitializers ','opt '}'
    VariableInitializers : VariableInitializer
    | VariableInitializers ',' VariableInitializer
    Block : '{' BlockStatementsopt '}'
    BlockStatements : BlockStatement
    | BlockStatements BlockStatement
    // I added ExplicitConstructorInvocation here to make MethodBody more
    // portable - this requires extra semantic work for a simpler grammar.
    BlockStatement : LocalVariableDeclarationStatement
    | Statement
    | ClassDeclaration
    | ExplicitConstructorInvocation
    LocalVariableDeclarationStatement : LocalVariableDeclaration ';'
    // To avoid ambiguities between Type and Name, I expanded this
    // inline. Quite tricky to get right, especially for things like
    // "a<b>.c<d> e;"
    // GENERIC
    LocalVariableDeclaration : PrimitiveType Dimsopt VariableDeclarators
    | Name VariableDeclarators
    | Name Dims VariableDeclarators
    | Name TypeArguments Dimsopt VariableDeclarators
    | Name TypeArguments '.' ClassOrInterfaceType
    Dimsopt VariableDeclarators
    | 'final' Type VariableDeclarators
    Statement : StatementWithoutTrailingSubstatement
    | LabeledStatement
    | IfThenStatement
    | IfThenElseStatement
    | WhileStatement
    | ForStatement
    StatementNoShortIf : StatementWithoutTrailingSubstatement
    | LabeledStatementNoShortIf
    | IfThenElseStatementNoShortIf
    | WhileStatementNoShortIf
    | ForStatementNoShortIf
    StatementWithoutTrailingSubstatement : Block
    | EmptyStatement
    | ExpressionStatement
    | SwitchStatement
    | DoStatement
    | BreakStatement
    | ContinueStatement
    | ReturnStatement
    | SynchronizedStatement
    | ThrowStatement
    | TryStatement
    | AssertStatement
    EmptyStatement : ';'
    LabeledStatement : 'Identifier' ':' Statement
    LabeledStatementNoShortIf : 'Identifier' ':' StatementNoShortIf
    ExpressionStatement : StatementExpression ';'
    StatementExpression : Assignment
    | PreIncrementExpression
    | PreDecrementExpression
    | PostIncrementExpression
    | PostDecrementExpression
    | MethodInvocation
    | ClassInstanceCreationExpression
    IfThenStatement : 'if' '(' Expression ')' Statement
    IfThenElseStatement : 'if' '(' Expression ')' StatementNoShortIf
    'else' Statement
    IfThenElseStatementNoShortIf : 'if' '(' Expression ')' StatementNoShortIf
    'else' StatementNoShortIf
    SwitchStatement : 'switch' '(' Expression ')' SwitchBlock
    SwitchBlock : '{' SwitchBlockStatements SwitchLabelsopt '}'
    | '{' SwitchLabelsopt '}'
    SwitchBlockStatements : SwitchBlockStatement
    | SwitchBlockStatements SwitchBlockStatement
    SwitchBlockStatement : SwitchLabels BlockStatements
    SwitchLabels : SwitchLabel
    | SwitchLabels SwitchLabel
    SwitchLabel : 'case' ConstantExpression ':'
    | 'default' ':'
    WhileStatement : 'while' '(' Expression ')' Statement
    WhileStatementNoShortIf : 'while' '(' Expression ')' StatementNoShortIf
    DoStatement : 'do' Statement 'while' '(' Expression ')' ';'
    ForStatement : 'for' '(' ForInitopt ';' Expressionopt ';' ForUpdateopt ')'
    Statement
    ForStatementNoShortIf : 'for' '(' ForInitopt ';' Expressionopt ';'
    ForUpdateopt ')' StatementNoShortIf
    ForInit : StatementExpressionList
    | LocalVariableDeclaration
    ForUpdate : StatementExpressionList
    StatementExpressionList : StatementExpression
    | StatementExpressionList ',' StatementExpression
    AssertStatement : 'assert' Expression ';'
    | 'assert' Expression ':' Expression ';'
    BreakStatement : 'break' 'Identifier'opt ';'
    ContinueStatement : 'continue' 'Identifier'opt ';'
    ReturnStatement : 'return' Expressionopt ';'
    ThrowStatement : 'throw' Expression ';'
    SynchronizedStatement : 'synchronized' '(' Expression ')' Block
    TryStatement : 'try' Block Catches
    | 'try' Block Catchesopt Finally
    Catches : CatchClause
    | Catches CatchClause
    CatchClause : 'catch' '(' FormalParameter ')' Block
    Finally : 'finally' Block

    // Because "new int[]{0}[0]" is legal syntax, I split
    // ArrayCreationExpression in two.
    Primary : PrimaryNoNewArray
    | ArrayCreationUninitialized
    | ArrayCreationInitialized
    PrimaryNoNewArray : Literal
    | 'this'
    | '(' Expression ')'
    | ClassInstanceCreationExpression
    | FieldAccess
    | Name '.' 'this'
    | PrimitiveType Dimsopt '.' 'class'
    | Name Dims '.' 'class'
    | Name '.' 'class'
    | 'void' '.' 'class'
    | MethodInvocation
    | ArrayAccess
    // GENERIC
    ClassInstanceCreationExpression : 'new' ClassOrInterfaceType '('
    ArgumentListopt ')' ClassBodyopt
    | Primary '.' 'new' 'Identifier'
    TypeArgumentsopt '(' ArgumentListopt
    ')' ClassBodyopt
    | Name '.' 'new' 'Identifier'
    TypeArgumentsopt '(' ArgumentListopt
    ')' ClassBodyopt
    ArgumentList : Expression
    | ArgumentList ',' Expression
    ArrayCreationUninitialized : 'new' PrimitiveType DimExprs Dimsopt
    | 'new' ClassOrInterfaceType DimExprs Dimsopt
    ArrayCreationInitialized : 'new' PrimitiveType Dims ArrayInitializer
    | 'new' ClassOrInterfaceType Dims ArrayInitializer
    DimExprs : DimExpr
    | DimExprs DimExpr
    DimExpr : '[' Expression ']'
    Dims : '[' ']'
    | Dims '[' ']'
    FieldAccess : Primary '.' 'Identifier'
    | 'super' '.' 'Identifier'
    | Name '.' 'super' '.' 'Identifier'
    MethodInvocation : Name '(' ArgumentListopt ')'
    | Primary '.' 'Identifier' '(' ArgumentListopt ')'
    | 'super' '.' 'Identifier' '(' ArgumentListopt ')'
    | Name '.' 'super' '.' 'Identifier' '(' ArgumentListopt ')'
    ArrayAccess : Name '[' Expression ']'
    | PrimaryNoNewArray '[' Expression ']'
    | ArrayCreationInitialized '[' Expression ']'
    PostfixExpression : Primary
    | Name
    | PostIncrementExpression
    | PostDecrementExpression
    PostIncrementExpression : PostfixExpression '++'
    PostDecrementExpression : PostfixExpression '--'
    UnaryExpression : PreIncrementExpression
    | PreDecrementExpression
    | '+' UnaryExpression
    | '-' UnaryExpression
    | UnaryExpressionNotPlusMinus
    PreIncrementExpression : '++' UnaryExpression
    PreDecrementExpression : '--' UnaryExpression
    UnaryExpressionNotPlusMinus : PostfixExpression
    | '~' UnaryExpression
    | '!' UnaryExpression
    | CastExpression
    // GENERIC : I don't know how to cast to a parameterized type - this
    // part of the grammar is too ambiguous to easily make LR(k).
    CastExpression : '(' PrimitiveType Dimsopt ')' UnaryExpression
    | '(' Expression ')' UnaryExpressionNotPlusMinus
    | '(' Name Dims ')' UnaryExpressionNotPlusMinus
    MultiplicativeExpression : UnaryExpression
    | MultiplicativeExpression '*' UnaryExpression
    | MultiplicativeExpression '/' UnaryExpression
    | MultiplicativeExpression '%' UnaryExpression
    AdditiveExpression : MultiplicativeExpression
    | AdditiveExpression '+' MultiplicativeExpression
    | AdditiveExpression '-' MultiplicativeExpression
    ShiftExpression : AdditiveExpression
    | ShiftExpression '<<' AdditiveExpression
    | ShiftExpression '>>' AdditiveExpression
    | ShiftExpression '>>>' AdditiveExpression
    // Note that since < never operates on booleans, but always produces a
    // boolean result, we can rewrite this rule. If we don't rewrite it, then
    // the following would be ambiguous: "a instanceof b<c>"
    // GENERIC
    RelationalExpression : ShiftExpression
    | ShiftExpression '<' ShiftExpression
    | RelationalExpression '>' ShiftExpression
    | RelationalExpression '<=' ShiftExpression
    | RelationalExpression '>=' ShiftExpression
    | RelationalExpression 'instanceof' ReferenceType
    EqualityExpression : RelationalExpression
    | EqualityExpression '==' RelationalExpression
    | EqualityExpression '!=' RelationalExpression
    AndExpression : EqualityExpression
    | AndExpression '&' EqualityExpression
    ExclusiveOrExpression : AndExpression
    | ExclusiveOrExpression '^' AndExpression
    InclusiveOrExpression : ExclusiveOrExpression
    | InclusiveOrExpression '|' ExclusiveOrExpression
    ConditionalAndExpression : InclusiveOrExpression
    | ConditionalAndExpression '&&'
    InclusiveOrExpression
    ConditionalOrExpression : ConditionalAndExpression
    | ConditionalOrExpression '||'
    ConditionalAndExpression
    ConditionalExpression : ConditionalOrExpression
    | ConditionalOrExpression '?' Expression ':'
    ConditionalExpression
    AssignmentExpression : ConditionalExpression
    | Assignment
    // Because "(i) = 1" is legal, I rewrote this to allow all variables
    // on the left hand side. This requires some semantic checks that it is
    // actually a variable being assigned.
    Assignment : PostfixExpression AssignmentOperator AssignmentExpression
    AssignmentOperator : '='
    | '*='
    | '/='
    | '%='
    | '+='
    | '-='
    | '<<='
    | '>>='
    | '>>>='
    | '&='
    | '^='
    | '|='
    Expression : AssignmentExpression
    ConstantExpression : Expression

    // All rules below deal with generics.
    // GENERIC
    TypeArguments : '<' ReferenceTypeList '>'
    ReferenceTypeList : ReferenceType
    | ReferenceTypeList ',' ReferenceType
    ReferenceTypeList1 : ReferenceType1
    | ReferenceTypeList ',' ReferenceType1
    // Note that I used ClassOrInterface instead of Name, to allow "a<b>.c<d>"
    ReferenceType1 : ReferenceType '>'
    | ClassOrInterface '<' ReferenceTypeList2
    ReferenceTypeList2 : ReferenceType2
    | ReferenceTypeList ',' ReferenceType2
    ReferenceType2 : ReferenceType '>>'
    | ClassOrInterface '<' ReferenceTypeList3
    ReferenceTypeList3 : ReferenceType3
    | ReferenceTypeList ',' ReferenceType3
    ReferenceType3 : ReferenceType '>>>'
    TypeParameters : '<' TypeParameterList1
    TypeParameterList : TypeParameter
    | TypeParameterList ',' TypeParameter
    TypeParameterList1 : TypeParameter1
    | TypeParameterList ',' TypeParameter1
    TypeParameter : 'Identifier' TypeBoundopt
    // I had to fix TypeParameter1, including adding new rules.
    TypeParameter1 : 'Identifier' '>'
    | 'Identifier' TypeBound1
    TypeBound : 'extends' ClassOrInterfaceType AdditionalBoundListopt
    TypeBound1 : 'extends' ReferenceType1
    | 'extends' ClassOrInterfaceType AdditionalBoundList1
    AdditionalBoundList : AdditionalBound
    | AdditionalBoundList AdditionalBound
    AdditionalBoundList1 : AdditionalBound1
    | AdditionalBoundList AdditionalBound1
    AdditionalBound : '&' ClassOrInterfaceType
    AdditionalBound1 : '&' ReferenceType1

  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    Correction. This rule needs to change too:

    TypeArguments : '<' ReferenceTypeList1
  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    I am not interested in a parser generator targetting Java (I'm working
    with the Jikes compiler, where the parser generator targets C++).
    I think you're missing the main point. CUP is an LALR(1) parser, and there's a CUP grammar for the Java language (including generics support) on that site. That means there's an LALR(1) grammar for generics there.
  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    You missed my point. I looked at CUP, and it is a CONTEXT-SENSITIVE LALR grammar. I am interested in a CONTEXT-FREE LALR grammar.

    In CUP, Scott invented the terminal PLT, a LT with lookahead for differentiating whether "(a<b" starts an expression (as in "(a<b)") or a cast to parameterized type (as in "(a<b>)o"). Yes, I could do that with my parser (and in the end I will probably have to, since no one else has come up with a better solution), but it is context sensitive - it requires the lexer to do lookahead!
  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    Also, Scott's sample grammar does not (yet) parse the following, while the prototype compiler does.
    class A<T> {
      class B<S> {}
      void m() {
        int i;
        (i) = new int[]{0}[0];
        A<Integer>.B<Integer> c;
      }
    }
  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    I have FINALLY managed to write an LALR(1) grammar for generics, with NO LOOKAHEAD in the lexer. The solution for casts to parameterized types is rather ugly, but it works. If you are interested in seeing the grammar, it is in the CVS repository of jikes, under the generics-branch tag, in the file src/java.g: http://www-124.ibm.com/developerworks/oss/cvs/jikes/~checkout~/jikes/src/java.g?rev=1.31.2.17&content-type=text/plain

    I hope to translate that grammar to CUP, and inform Scott Ananian of my success, in the next few days.
  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    Just a quick note to this forum that I've updated my CUP grammar to fix the deficiencies Eric noted. It is still available from
    http://www.cs.princeton.edu/~appel/modern/java/CUP/
    and linked to from
    http://cag.lcs.mit.edu/~cananian/Projects/GJ/

    Thanks for the bug reports, Eric!

    My grammar still uses the "smart lexer" look-ahead feature, but I've looked over Eric's work and it appears quite clever.
  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    the rules I had to fix so far. This grammar still
    does not allow (a<b>)o (as I stated earlier, this is
    highly ambiguous), nor does it allow a<b>.class (but
    the public draft did not mention this as being legal,
    and I think it would return the erased type anyway, so
    the user should have just used a.class).
    Neither of those forms are supposed to be legal.

  • 843793
    843793 Member Posts: 41,732 Green Ribbon
    I spoke too soon. The cast form is legal. The class literal
    form, I believe, is not.
This discussion has been closed.