Attribute Grammars

Attribute grammars are formal specifications that can be used to define the semantics, a meaning of some sort, to a program, expression, or phrase, in a language. Attribute grammars define semantics by specifying that certain semantic attributes (values) will be associated with certain nodes in the syntax tree and providing equations that specify how the values of these attributes are to be computed. Don Knuth’s 1968 paper “Semantics of Context Free Languages” [3] introduced attribute grammars and is essential reading.

Attributes may be associated with nonterminals in the grammar. Attributes that are used to propagate semantic information up the syntax tree are called synthesized attributes and attributes that propagate information down the tree are called inherited attributes. In an AG defining the semantics of a programming language, an example of a synthesized attribute may be an errors attribute that collects semantic errors and propagates them to the root of the tree so that they can be reported to the programmer. An inherited attribute may be a symbol-table or environment that associates variable names with their types. This information is propagated down the trees representing statements and expression so that it can be used to determine the types of variable uses and then be used in type checking.

Synthesized attributes

In Silver, attribute declarations have the following form:

( inherited | synthesized ) attribute <attr-name> :: <attr-type> <optional-attr-clauses> ;

For example, consider the declaration of a string attribute that “unparses” the tree, generating a string representation of it named .

synthesized attribute pp :: String ;

Since this attribute is synthesized, its value on a node is computed, typically, from values of this attribute on the node’s children. Values of attributes on nodes are written using the name of the node as indicated in the signature of the production and the name of the attribute separated by a dot ("."). Figure 2 fills in the definitions of the pp attribute that were left out of Figure 1. The attribute pp is declared as a String attribute and it decorates all four nonterminals. On the root production, the value of pp on the root node r is the same as the value of the pp attribute on its single child e. On the production for addition the string concatenation operator ++ is used to compute the value of pp on the node named sum from the value of pp on its children and string constants. Sting constants in Silver are represented as they are in most other programming languages. On the production integerConstant_c we see that the value of pp is copied from the lexeme of the integer literal terminal symbol named i in this production. Terminals are also decorated with Integer-valued attributes line and column indicating where in the input file they were matched.

Silver supports attributes of various types. Primitive types supported by Silver include Integer, Float, String, and Boolean. The expected arithmetic (+, *, etc) and relational (<, !=, etc.) operators are supported on numeric values. Logical (&&, ||, !) operators are supported on boolean values. The string concatenation operator ++ and numerous library functions on strings are also supported. The reader is referred to the Silver Reference Manual for a full accounting of these.

Silver also supports attributes whose values are syntax trees. These are called higher-order attributes [7] but are not as complicated as this name might indicate. Productions are essentially tree building functions that take trees or other values that become the child nodes of the root node in the constructed tree. The parser generated by Copper in essence uses concrete productions as tree building functions to construct the concrete syntax tree.

One important use of higher-order attributes is in creating an abstract syntax tree from the concrete syntax tree. An abstract syntax tree is typically based on a simpler grammar that can omit punctuation and operator symbols that are only needed in recognizing the structure of the program from the linear sequence of tokens produced by the scanner. This grammar is also usually simpler because it does not need to conform to any parser-imposed restrictions. Abstract productions are not used by Copper in generating the parser. The abstract syntax of dc can be seen in Figure 3. Abstract productions are indicated by the modifier abstract. These productions define values of pp and an integer valued value attribute that computes the value of the expression.

Figure 4 shows some declarations of the higher-order attributes ast_Expr and ast_Root, some of the concrete productions with their definitions of these attributes. For example, on add_c we see that the abstract syntax tree from addition is built using the abstract production add and the abstract syntax trees of the two expression children; the operator symbol is not present in the abstract syntax since it is no longer need. On the production exprTerm_c the abstract syntax tree of the child t is used as the abstract syntax tree for e without modification; there is no corresponding exprTerm production in the abstract syntax since this production is used in the concrete syntax only to encode the operator precedence and associativity rules. Similarly, there are not abstract versions of the concrete nonterminals Term_c and Factor_c. There is only one abstract expression nonterminal, Expr, and an abstract root nonterminal Root.

Figure 2: Definition of the pp attribute on the concrete syntax productions.

nonterminal Root_c ;
synthesized attribute pp :: String ;
attribute pp occurs on Root_c ;

concrete production root_c
r::Root_c ::= e::Expr_c
{ r.pp = e.pp;
}

nonterminal Expr_c with pp ;
nonterminal Term_c with pp ;
nonterminal Factor_c with pp ;

concrete production add_c
sum::Expr_c ::= e::Expr_c '+' t::Term_c
{ sum.pp = e.pp ++ " + " ++ t.pp ; }

concrete production sub_c
dff::Expr_c ::= e::Expr_c '-' t::Term_c
{ dff.pp = e.pp ++ " - " ++ t.pp ; }

concrete production exprTerm_c
e::Expr_c ::= t::Term_c
{ e.pp = t.pp ; }

concrete production mul_c
prd::Term_c ::= t::Term_c '*' f::Factor_c
{ prd.pp = t.pp ++ " * " ++ f.pp ; }

concrete production div_c
d::Term_c ::= t::Term_c '/' f::Factor_c
{ d.pp = t.pp ++ " / " ++ f.pp ; }

concrete production termFactor_c 
t::Term_c ::= f::Factor_c
{ t.pp = f.pp ; }

concrete production nested_c
e::Factor_c ::= '(' inner::Expr_c ')'
{ e.pp = "(" ++ inner.pp ++ ")" ; }

concrete production integerConstant_c
ic::Factor_c ::= i::IntLit_t
{ ic.pp = i.lexeme ; }

Figure 3: Some of the abstract grammar for tutorial:dc in file AbstractSyntax.sv.

nonterminal Root with pp;

synthesized attribute value :: Integer occurs on Root;

abstract production root
r::Root ::= e::Expr
{ r.pp = e.pp ;
  r.value = e.value ;
}

nonterminal Expr with pp, value;

abstract production add
sum::Expr ::= l::Expr r::Expr
{ sum.pp = "(" ++ l.pp ++ " + " ++ r.pp ++ ")";
  sum.value = l.value + r.value ;
}

abstract production mul
prd::Expr ::= l::Expr r::Expr
{ prd.pp = "(" ++ l.pp ++ " * " ++ r.pp ++ ")";
  prd.value = l.value * r.value ;
}

abstract production integerConstant
e::Expr ::= i::IntLit_t
{ e.pp = i.lexeme ;
  e.value = toInt(i.lexeme);
}

Figure 4: Concrete syntax specification of dc with higher order attribute generating the abstract syntax tree in file ConcreteSyntax.sv.

nonterminal Root_c ;
synthesized attribute pp :: String ;
synthesized attribute ast_Root :: Root;
attribute pp, ast_Root occurs on Root_c ;

concrete production root_c
r::Root_c ::= e::Expr_c
{ r.pp = e.pp;
  r.ast_Root = root(e.ast_Expr);
}

synthesized attribute ast_Expr :: Expr ;
nonterminal Expr_c with pp, ast_Expr;
nonterminal Term_c with pp, ast_Expr;
nonterminal Factor_c with pp, ast_Expr;

concrete production add_c
sum::Expr_c ::= e::Expr_c ’+’ t::Term_c
{ sum.pp = e.pp ++ " + " ++ t.pp ;
  sum.ast_Expr = add(e.ast_Expr, t.ast_Expr );
}

concrete production exprTerm_c
e::Expr_c ::= t::Term_c
{ e.pp = t.pp ;
  e.ast_Expr = t.ast_Expr ;
}

concrete production mul_c
prd::Term_c ::= t::Term_c ’*’ f::Factor_c
{ prd.pp = t.pp ++ " * " ++ f.pp ;
  prd.ast_Expr = mul(t.ast_Expr, f.ast_Expr);
}

concrete production termFactor_c
t::Term_c ::= f::Factor_c
{ t.pp = f.pp ;
  t.ast_Expr = f.ast_Expr ;
}

concrete production integerConstant_c
ic::Factor_c ::= i::IntLit_t
{ ic.pp = i.lexeme ;
  ic.ast_Expr = integerConstant(i);
}

Naming conventions

Note that concrete productions and nonterminals often have the suffix “_c” to distinguish them from their counterparts in the abstract syntax. This is only a naming convention and not part of the Silver specification language.

Inherited attributes

Inherited attributes are those that propagate information down the syntax tree. These attributes are declared using the inherited modifier instead of the synthesized one, but are otherwise the same.

The file BetterPP.sv in the tutorial grammar dc defines a synthesized attribute bpp that unparses the abstract syntax tree without the use of unnecessary parenthesis. For example, for an expression such as 1 + 2 * 3, the value of the pp attribute on the root of the abstract syntax tree is (1 + (2 * 3)) while the value of bpp on the same node is 1 + 2 * 3. To compute bpp, a couple of inherited attributes are defined. One is enclosingOpPrecedence which indicates the operator precedence of the operator on the enclosing parent of the node. Another, leftOrRight, indicates if an expression appears as the left or right child of the enclosing binary operator node. These values are used to determine if the bpp of the node should be wrapped in parenthesis or not.

While definitions of synthesized attributes compute the value of an attribute for the nonterminal node on the left hand side of a production, as in

e.pp = l.pp ++ " + " ++ r.pp ;

definitions of inherited attributes compute the value of an attribute for a nonterminal on the right hand side of a production. Four inherited attribute definitions appear in the following aspect of the abstract add production in BetterPP.sv:

aspect production add
sum::Expr ::= l::Expr r::Expr
{
 sum.bpp = if wrapInParens ( sum.enclosingOpPrecedence, 1,
                             sum.leftOrRight, "left" )
          then "(" ++ l.bpp ++ " + " ++ r.bpp ++ ")"
          else l.bpp ++ " + " ++ r.bpp ;
 l.enclosingOpPrecedence = 1 ;
 r.enclosingOpPrecedence = 1 ;
 l.leftOrRight = "left" ;
 r.leftOrRight = "right" ;
}

An aspect production simply provides a mechanism for defining additional attributes on an already existing production. The production add is defined in the file AbstractSyntax.sv and the aspect production above has the same effect as if the five attribute definitions were written on the add production in AbstractSyntax.sv. These inherited attribute definitions on add assign values for the inherited attributes of the children of the production.

Code generation

This phase of most language translators has no real analog in our simple expression language dc. A simple way to generate code is by using a string synthesized attribute to compute the translation. As an exercise the reader could define a synthesized attribute than generates the postfix version of the expression. In most programming languages other attributes that decorate the tree, such as the type of expressions, are used in performing the translation.

Code optimization is an important phase in compilation and can be performed using attribute grammars but is beyond the scope of this tutorial.

Next Section: Running Silver