Attribute Grammars
Attribute grammars are formal specifications that can be used to define the semantics, a meaning of some sort, to a program, expression, or phrase, in a language. Attribute grammars define semantics by specifying that certain semantic attributes (values) will be associated with certain nodes in the syntax tree and providing equations that specify how the values of these attributes are to be computed. Don Knuth’s 1968 paper “Semantics of Context Free Languages” [3] introduced attribute grammars and is essential reading.
Attributes may be associated with nonterminals in the grammar. Attributes that are used to propagate semantic information up the syntax tree are called synthesized attributes and attributes that propagate information down the tree are called inherited attributes. In an AG defining the semantics of a programming language, an example of a synthesized attribute may be an errors attribute that collects semantic errors and propagates them to the root of the tree so that they can be reported to the programmer. An inherited attribute may be a symbol-table or environment that associates variable names with their types. This information is propagated down the trees representing statements and expression so that it can be used to determine the types of variable uses and then be used in type checking.
In Silver, attribute declarations have the following form:
( inherited | synthesized ) attribute <attr-name> :: <attr-type> <optional-attr-clauses> ;
For example, consider the declaration of a string attribute that “unparses” the tree, generating a
string representation of it named
synthesized attribute pp :: String ;
Since this attribute is synthesized, its value on a node is computed, typically, from values of
this attribute on the node’s children. Values of attributes on nodes are written using the name of
the node as indicated in the signature of the production and the name of the attribute separated
by a dot ("."). Figure 2 fills in the definitions of the pp
attribute that were left out of Figure 1.
The attribute pp
is declared as a String
attribute and it decorates all four nonterminals. On the
root production, the value of pp
on the root node r
is the same as the value of the pp
attribute
on its single child e
. On the production for addition the string concatenation operator ++
is used
to compute the value of pp
on the node named sum
from the value of pp
on its children and
string constants. Sting constants in Silver are represented as they are in most other programming
languages. On the production integerConstant_c
we see that the value of pp
is copied from
the lexeme of the integer literal terminal symbol named i
in this production. Terminals are also
decorated with Integer
-valued attributes line
and column
indicating where in the input file they
were matched.
Silver supports attributes of various types. Primitive types supported by Silver include Integer
,
Float
, String
, and Boolean
. The expected arithmetic (+
, *
, etc) and relational (<
, !=
, etc.)
operators are supported on numeric values. Logical (&&
, ||
, !
) operators are supported on boolean
values. The string concatenation operator ++
and numerous library functions on strings are also
supported. The reader is referred to the Silver Reference Manual for a full accounting of these.
Silver also supports attributes whose values are syntax trees. These are called higher-order attributes [7] but are not as complicated as this name might indicate. Productions are essentially tree building functions that take trees or other values that become the child nodes of the root node in the constructed tree. The parser generated by Copper in essence uses concrete productions as tree building functions to construct the concrete syntax tree.
One important use of higher-order attributes is in creating an abstract syntax tree from the
concrete syntax tree. An abstract syntax tree is typically based on a simpler grammar that can omit
punctuation and operator symbols that are only needed in recognizing the structure of the program
from the linear sequence of tokens produced by the scanner. This grammar is also usually simpler
because it does not need to conform to any parser-imposed restrictions. Abstract productions are
not used by Copper in generating the parser. The abstract syntax of dc
can be seen in Figure 3.
Abstract productions are indicated by the modifier abstract
. These productions define values of
pp
and an integer valued value
attribute that computes the value of the expression.
Figure 4 shows some declarations of the higher-order attributes ast_Expr
and ast_Root
,
some of the concrete productions with their definitions of these attributes. For example, on add_c
we see that the abstract syntax tree from addition is built using the abstract production add
and
the abstract syntax trees of the two expression children; the operator symbol is not present in the
abstract syntax since it is no longer need. On the production exprTerm_c
the abstract syntax tree
of the child t
is used as the abstract syntax tree for e
without modification; there is no corresponding
exprTerm
production in the abstract syntax since this production is used in the concrete syntax
only to encode the operator precedence and associativity rules. Similarly, there are not abstract
versions of the concrete nonterminals Term_c
and Factor_c
. There is only one abstract expression
nonterminal, Expr
, and an abstract root nonterminal Root
.
Figure 2: Definition of the pp
attribute on the concrete syntax productions.
nonterminal Root_c ;
synthesized attribute pp :: String ;
attribute pp occurs on Root_c ;
concrete production root_c
r::Root_c ::= e::Expr_c
{ r.pp = e.pp;
}
nonterminal Expr_c with pp ;
nonterminal Term_c with pp ;
nonterminal Factor_c with pp ;
concrete production add_c
sum::Expr_c ::= e::Expr_c '+' t::Term_c
{ sum.pp = e.pp ++ " + " ++ t.pp ; }
concrete production sub_c
dff::Expr_c ::= e::Expr_c '-' t::Term_c
{ dff.pp = e.pp ++ " - " ++ t.pp ; }
concrete production exprTerm_c
e::Expr_c ::= t::Term_c
{ e.pp = t.pp ; }
concrete production mul_c
prd::Term_c ::= t::Term_c '*' f::Factor_c
{ prd.pp = t.pp ++ " * " ++ f.pp ; }
concrete production div_c
d::Term_c ::= t::Term_c '/' f::Factor_c
{ d.pp = t.pp ++ " / " ++ f.pp ; }
concrete production termFactor_c
t::Term_c ::= f::Factor_c
{ t.pp = f.pp ; }
concrete production nested_c
e::Factor_c ::= '(' inner::Expr_c ')'
{ e.pp = "(" ++ inner.pp ++ ")" ; }
concrete production integerConstant_c
ic::Factor_c ::= i::IntLit_t
{ ic.pp = i.lexeme ; }
Figure 3: Some of the abstract grammar for tutorial:dc
in file AbstractSyntax.sv
.
nonterminal Root with pp;
synthesized attribute value :: Integer occurs on Root;
abstract production root
r::Root ::= e::Expr
{ r.pp = e.pp ;
r.value = e.value ;
}
nonterminal Expr with pp, value;
abstract production add
sum::Expr ::= l::Expr r::Expr
{ sum.pp = "(" ++ l.pp ++ " + " ++ r.pp ++ ")";
sum.value = l.value + r.value ;
}
abstract production mul
prd::Expr ::= l::Expr r::Expr
{ prd.pp = "(" ++ l.pp ++ " * " ++ r.pp ++ ")";
prd.value = l.value * r.value ;
}
abstract production integerConstant
e::Expr ::= i::IntLit_t
{ e.pp = i.lexeme ;
e.value = toInt(i.lexeme);
}
Figure 4: Concrete syntax specification of dc
with higher order attribute generating the abstract
syntax tree in file ConcreteSyntax.sv
.
nonterminal Root_c ;
synthesized attribute pp :: String ;
synthesized attribute ast_Root :: Root;
attribute pp, ast_Root occurs on Root_c ;
concrete production root_c
r::Root_c ::= e::Expr_c
{ r.pp = e.pp;
r.ast_Root = root(e.ast_Expr);
}
synthesized attribute ast_Expr :: Expr ;
nonterminal Expr_c with pp, ast_Expr;
nonterminal Term_c with pp, ast_Expr;
nonterminal Factor_c with pp, ast_Expr;
concrete production add_c
sum::Expr_c ::= e::Expr_c ’+’ t::Term_c
{ sum.pp = e.pp ++ " + " ++ t.pp ;
sum.ast_Expr = add(e.ast_Expr, t.ast_Expr );
}
concrete production exprTerm_c
e::Expr_c ::= t::Term_c
{ e.pp = t.pp ;
e.ast_Expr = t.ast_Expr ;
}
concrete production mul_c
prd::Term_c ::= t::Term_c ’*’ f::Factor_c
{ prd.pp = t.pp ++ " * " ++ f.pp ;
prd.ast_Expr = mul(t.ast_Expr, f.ast_Expr);
}
concrete production termFactor_c
t::Term_c ::= f::Factor_c
{ t.pp = f.pp ;
t.ast_Expr = f.ast_Expr ;
}
concrete production integerConstant_c
ic::Factor_c ::= i::IntLit_t
{ ic.pp = i.lexeme ;
ic.ast_Expr = integerConstant(i);
}
Note that concrete productions and nonterminals often have the suffix “_c” to distinguish them from their counterparts in the abstract syntax. This is only a naming convention and not part of the Silver specification language.
Inherited attributes are those that propagate information down the syntax tree. These attributes
are declared using the inherited
modifier instead of the synthesized
one, but are otherwise the
same.
The file BetterPP.sv
in the tutorial grammar dc
defines a synthesized attribute bpp
that
unparses the abstract syntax tree without the use of unnecessary parenthesis. For example, for an
expression such as 1 + 2 * 3
, the value of the pp
attribute on the root of the abstract syntax tree
is (1 + (2 * 3))
while the value of bpp
on the same node is 1 + 2 * 3
. To compute bpp
, a couple
of inherited attributes are defined. One is enclosingOpPrecedence
which indicates the operator
precedence of the operator on the enclosing parent of the node. Another, leftOrRight
, indicates if
an expression appears as the left or right child of the enclosing binary operator node. These values
are used to determine if the bpp
of the node should be wrapped in parenthesis or not.
While definitions of synthesized attributes compute the value of an attribute for the nonterminal node on the left hand side of a production, as in
e.pp = l.pp ++ " + " ++ r.pp ;
definitions of inherited attributes compute the value of an attribute for a nonterminal on the right
hand side of a production. Four inherited attribute definitions appear in the following aspect of
the abstract add
production in BetterPP.sv
:
aspect production add
sum::Expr ::= l::Expr r::Expr
{
sum.bpp = if wrapInParens ( sum.enclosingOpPrecedence, 1,
sum.leftOrRight, "left" )
then "(" ++ l.bpp ++ " + " ++ r.bpp ++ ")"
else l.bpp ++ " + " ++ r.bpp ;
l.enclosingOpPrecedence = 1 ;
r.enclosingOpPrecedence = 1 ;
l.leftOrRight = "left" ;
r.leftOrRight = "right" ;
}
An aspect production simply provides a mechanism for defining additional attributes on an already
existing production. The production add
is defined in the file AbstractSyntax.sv
and the aspect
production above has the same effect as if the five attribute definitions were written on the add
production in AbstractSyntax.sv
. These inherited attribute definitions on add
assign values for
the inherited attributes of the children of the production.
This phase of most language translators has no real analog in our simple
expression language dc
. A simple way to generate code is by using a string synthesized attribute
to compute the translation. As an exercise the reader could define a synthesized attribute than
generates the postfix version of the expression. In most programming languages other attributes
that decorate the tree, such as the type of expressions, are used in performing the translation.
Code optimization is an important phase in compilation and can be performed using attribute grammars but is beyond the scope of this tutorial.
Next Section: Running Silver