Concepts
Algebraic SemanticsAn algebraic definition of a language is a definition of an algebra. An algebra consists of a domain of values and a set of operations (functions) defined on the domain.
Algebra = < set of values; operations >
Example N.1: Algebraic Definition of Peano Arithmetic;
Domains:Bool = {true, false} (Boolean values)
N in Nat (the natural numbers)
N ::= 0 | S(N)
Functions:= : (Nat, Nat) -> Bool
+ : (Nat, Nat) -> Nat
× : (Nat, Nat) -> Nat
Axioms and equations:not S(N) = 0
if S(M) = S(N) then M = N
( n + 0 ) = n
( m + S(n) ) = S( m + n )
( n × 0 ) = 0
( m × S(n)) = (( m × n) + m)where m,n in Nat
The equations define equivalences between syntactic elements; they specify the transformations that are used to translate from one syntactic form to another. The domain is often called a sort and the domain and the function sections constitute the signature of the algebra. Functions with zero, one, and two operands are referred to as nullary, unary, and binary operations.Because there is more than one domain, the algebra is called a many sorted algebra. As in this example, abstract data types may require values from several different sorts. The signature of the algebra is a set of sorts and a set of functions taking arguments and returning values of different sorts. A stack of natural numbers may be modeled as a many-sorted algebra with three sorts (natural numbers, stacks and booleans) and five operations (newStack, push, pop, top, empty).
Example N.2: Algebraic definition of an Integer Stack ADT;
Domains:
Nat (the natural numbers)
Stack (of natural numbers)
Bool (boolean values)
Functions:newStack: () -> Stack
push : (Nat, Stack) -> Stack
pop: Stack -> Stack
top: Stack -> Nat
empty : Stack -> Bool
Axioms:pop(push(N,S)) = S
top(push(N,S)) = N
empty(push(N,S)) = false
empty(newStack()) = true
Errors:pop(newStack())
top(newStack()) where N in Nat and S in Stack.
Defining Equations:newStack() = []In Example N.1, the structure of the numbers is described. In Example N.2 the structure of a stack is not defined. This means that we cannot use equations to describe syntactic transformations. Instead, we use axioms that describe the relationships between the operations. The axioms are more abstract than equations because the results of the operations are not described. To be more specific would require decisions to be made concerning the implementation of the stack data structure. Decisions which would tend to obscure the algebraic properties of stacks. The axioms impose constraints on the stack operations that are sound in the sense that they are consistent with the actual behavior of stacks reguardless of the implementation. Finding axioms that are complete, in the sense that they completely specify the behavior of the operations of an ADT, is more difficult.
push(N,S) = [N|S]
pop([N|S]) = S
top([N|S]) = NThe goal of algebraic semantics is to capture the semantics of behavior by a set of axioms with purely syntactic properties. Algebraic definitions (semantic algebras) are the favored method for defining the properties of abstract data types.
The axiomatic semantics of a programming language are the assertions about relationships that remain the same each time the program executes. Axiomatic semantics are defined for each control structure and command. The axiomatic semantics of a programming language define a mathematical theory of programs written in the language.
A mathematical theory has three components:
- Syntactic rules: These determine the structure of formulas which are the statements of interest;
- Axioms: These describe the basic properties of the system;
- Inference rules: These are the mechanisms for deducing new theorems from axioms and other theorems.
The semantic formulas are triples of the form {P} c {Q}, where c is a command or control structure in the programming language, P and Q are assertions or statements concerning the properties of program objects (often program variables) which may be true or false. P is called a pre-condition and Q is called a post-condition. The pre- and post-conditions are formulas in some arbitrary logic and summarize the progress of the computation.The meaning of {P} c {Q} is that if c is executed in a state in which assertion P is satisfied and c terminates, then c terminates in a state in which assertion Q is satisfied. We illustrate axiomatic semantics with a program to compute the sum of the elements of an array (see Example N.3).
Example N.3: Program to compute S = sumi=1nA[i] ;
S,I := 0,0
while I < n do
S,I := S+A[I+1],I+1
endThe assignment statements are simultaneous assignment statements. The expressions on the righthand side are evaluated simultaneously and assigned to the variables on the lefthand side in the order they appear. Example N.4 illustrates the use of axiomatic semantics to verify the program of Example N.3.
Example N.4: Verification of S = sumi=1nA[i] ;
Pre/Post-conditions Code
1. { 0 = Sumi=10A[i], 0 < |A| = n }
2. S,I := 0,0
3. {S = Sumi=1IA[i], I <= n }
4. while I < n do
5. {S = Sumi=1IA[i], I < n }
6. {S+A[I+1] = Sumi=1I+1A[i], I+1 <= n }
7. S,I := S+A[I+1],I+1
8. { S = Sumi=1IA[i], I <= n }
9. end
10. {S = Sumi=1IA[i], I <= n, I >= n }
11. {S = Sumi=1nA[i] }
The program sums the values stored in an array and the program is decorated with the assertions which help to verify the correctness of the code. The pre-condition in line 1 and the post-condition in line 11 are the pre- and post-conditions respectively for the program. The pre-condition asserts that the array contains at least one element zero and that the sum of the first zero elements of an array is zero. The post-condition asserts that S is sum of the values stored in the array. After the first assignment we know that the partial sum is the sum of the first I elements of the array and that I is less than or equal to the number of elements in the array.
The only way into the body of the while command is if the number of elements summed is less than the number of elements in the array. When this is the case, The sum of the first I+1 elements of the array is equal to the sum of the first I elements plus the I+1st element and I+1 is less than or equal to n. After the assignment in the body of the loop, the loop entry assertion holds once more. Upon termination of the loop, the loop index is equal to n. To show that the program is correct, we must show that the assertions satisfy some verification scheme. To verify the assignment commands, we use the Assignment Axiom:
Assignment Axiom
{P[x:E]} x:= E {P}This axiom asserts that:
"If after the execution of the assignment command the environment satisfies the condition P, then the environment prior to the execution of the assignment command also satisfies the condition P but with E substituted for x (In this and the following axioms we assume that the evaluation of expressions does not produce side effects.)."
An examination of the respective pre- and post-conditions for the asssignment statements shows that the axiom is satisfied.To verify the while command of lines 4. 7 and 9, we use the Loop Axiom:
Loop Axiom:
{I /\ B /\ V > 0 } C {I /\ V > V' >= 0}
{I} while B do C end {I /\ ¬ B}
The assertion above the bar is the condition that must be met before the axiom (below the bar) can hold. In this rule, {I} is called the loop invariant. This axiom asserts that:
"To verify a loop, there must be a loop invariant I which is part of both the pre- and post-conditions of the body of the loop and the conditional expression of the loop must be true to execute the body of the loop and false upon exit from the loop."
The invariant for the loop is: S = sumi=1IA[i], I <= n. Lines 6, 7, and 8 satisfy the condition for the application of the Loop Axiom. To prove termination requires the existence of a loop variant. The loop variant is an expression whose value is a natural number and whose value is decreased on each iteration of the loop. The loop variant provides an upper bound on the number of iterations of the loop.A variant for a loop is a natural number valued expression V whose run-time values satisfy the following two conditions:
- The value of V greater than zero prior to each execution of the body of the loop;
- The execution of the body of the loop decreases the value of V by at least one.
The loop variant for this example is the expression n - I. That it is non-negative is guaranteed by the loop continuation condition and its value is decreased by one in the assignment command found on line 7. More general loop variants may be used; loop variants may be expressions in any well-founded set (every decreasing sequence is finite). However, there is no loss in generality in requiring the variant expression to be an integer. Recursion is handled much like loops in that there must be an invariant and a variant. The correctness requirement for loops is stated in the following:
Loop Correctness Principle: Each loop must have both an invariant and a variant.
Lines 5 and 6 and lines 10 and 11 are justified by the Rule of Consequence.Rule of Consequence:
P -> Q, {Q} C {R}, R -> S
{P} C {S}
The justification for the composition the assignment command in line 2 and the while command requires the following the Sequential Composition Axiom.
Sequential Composition Axiom:
{P} C0 {Q}, {Q} C1 {R}
{P} C0; C1 {R}
This axiom is read as follows:
"The sequential composition of two commands is permitted when the post-condition of the first command is the pre-condition of the second command."
The following rules are required to complete the deductive system.Selection Axiom:
{P /\ B} C0 {Q}, {P /\ ¬ B } C1 {Q}
{P} if B then C0 else C1 fi {Q}
Conjunction Axiom:
{P} C {Q}, {P'} C {Q'}
{P /\ P' } C {Q /\ Q'}
Disjunction Axiom:
{P} C {Q}, {P'} C {Q'}
{P \/ P' } C {Q \/ Q'}
The axiomatic method is the most abstract of the semantic methods and yet, from the programmer's point of view, the most practical method. It is most abstract in that it does not try to determine the meaning of a program, but only what may be proved about the program. This makes it the most practical since the programmer is concerned with things like, whether the program will terminate and what kind of values will be computed.
Axiomatics semantics are appropiate for program verification and program derivation.
A denotational definition of a language consists of three parts: the abstract syntax of the language, a semantic algebra defining a computational model, and valuation functions. The valuation functions map the syntactic constructs of the language to the semantic algebra. Recursion and iteration are defined using the notion of a limit. the programming language constructs are in the syntactic domain while the mathematical entity is in the semantic domain and the mapping between the various domains is provided by valuation functions. Denotational semantics relies on defining an object in terms of its constituent parts. The Example N. 5 is an example of a denotational definition.
Example N.5: Denotational definition of Peano Arithmetic Abstract Syntax;
N in Nat (the Natural Numbers)
N ::= 0 | S(N) | (N + N) | (N × N)
Semantic Algebra:Nat (the natural numbers (0, 1, ...)
+ : (Nat, Nat) -> Nat
Valuation Function:D : Nat -> NatD[( n + 0 )] = D[n]
D[( m + S(n) )] = D[(m+n)] + 1
D[( n × 0 )] = 0
D[( m × S(n))] = D[ (( m × n) + m) ]where m,n in Nat
It is is a denotational definition of a fragment of Peano arithmetic. Notice the subtle distinction between the syntactic and semantic domains. The syntactic expressions are mapped into an algebra of the natural numbers by the valuation function. The denotational definition almost seems to be unnecessary. Since the syntax so closely resembles that of the semantic algebra.Programming languages are not as close to their computational model. Example N.6 is a denotational definition of the small imperative programming language Simple .
Example N.6: Denotational semantics for Simple;
Abstract Syntax:
C in Command
E in Expression
O in Operator
N in Numeral
V in VariableC ::= V := E | if E then C1 else C2 end |
while E do C3 end | C1;C2 | skip
E ::= V | N | E1 O E2 | (E)
O ::= + | - | * | / | = | < | > | <>
Semantic Algebra:Domains:
tau in T = {true, false}; the boolean values
zeta in Z = {...-1,0,1,...}; the integers
+ : (Z, Z) -> Z
...
= : (Z, Z) -> T
...
sigma in S = Variable -> Numeral; the state
Valuation Functions:C in C -> (S -> S)
E in E -> E -> (N union T)C[ skip ] sigma = sigma
C[ V := E ] sigma = sigma [ V:E[ E ] sigma
C[ C1; C2 ] = C[ C2 ] · C[ C1]
C[ if E then C1 else C2 end ] sigma
= C[ C1 ] sigma if E[ E ]sigma = true
= C[ C2 ] sigma if E[ E ]sigma = false
C[ while E do C end}]sigma
= limn -> infty C[ (if E then C else skip end)n ] sigma
E[ V ] sigma = sigma(V)
E[ N ] = zeta
E[ E1+E2 ] = E[ E ] sigma + E[ E ] sigma
...
E[ E1=E2 ] sigma = E[ E ] sigma = E[ E ] sigma
Denotational definitions are favored for theoretical and comparative programming language studies. Denotational definitions have been used for the automatic construction of compilers for the programming language. Denotations other than mathematical objects are possible. For example, a compiler writer would prefer that the object denoted would be appropriate object code. Systems have been developed for the automatic construction of compilers from the denotation specification of a programming language.
An operational definition of a language consists of two parts: an abstract syntax and an interpreter. An interpreter defines how to perform a computation. When the interpreter evaluates a program, it generates a sequence of machine configurations that define the program's operational semantics. The interpreter is an evaluation relation that is defined by rewriting rules. The interpreter may be an abstract machine or recursive functions.
Example N.7: Operational semantics for Peano arithmetic;
Abstract Syntax:
N in Nat (the natural numbers)
N ::= 0 | S(N) | (N + N) | (N × N)
Interpreter:I: N -> NI[ ( n + 0 ) ] ==> n
I[ ( m + S(n) ) ] ==> S( I[ (m+n ) ] )
I[ ( n × 0 ) ] ==> 0
I[ ( m × S(n)) ] ==> I[ (( m × n) + m) ]where m,n in Nat
The interpreter is used to rewrite natural number expressions to a standard form (a form involving only S and 0 ) and the rewriting rules show how move the + and × operators inward toward the base cases. Operational definitions are favored by language implementors for the construction of compilers and by language tutorials because operational definitions describe how the actions take place.
Example N.8: Operational semantics for Simple;
Interpreter:
I: C × Sigma -> Sigma
{nu} in E × Sigma} -> T union Z
Semantic Equations:
I(skip,sigma) = sigma
I(V := E,sigma) = sigma[V:nu(E,sigma)]
I(C1 ;C2,sigma) = E(C2,E(C1,sigma))
I(if E then C1 else C2 end,sigma) = I(C1,sigma)&if nu(E,sigma) = true}
I(C2,sigma)&if nu(E,sigma) = false}
while E do C end = if E then (C;while E do C end) else skip
nu(V,sigma) = sigma(V)
nu(N,sigma) = N
nu(E1+E2,sigma) = nu(E1,sigma) + nu(E2,sigma)
...
nu(E1=E2,sigma) = true if nu(E,sigma) = nu(E,sigma)}
false if nu(E,sigma) != nu(E,sigma)}
otherwise
...
The operational semantics are defined by using two semantic functions, I which interprets commands and nu which evaluates expressions. The interpreter is more complex since there is an environment associated with the program with does not appear as a syntactic element and the environment is the result of the computation. The environment (variously called the store or referencing environment}) is an association between variables and the values to which they are assigned. Initially the environment is empty since no variable has been assigned to a value. During program execution each assignment updates the environment. The interpreter has an auxiliary function which is used to evaluate expressions. The while command is given a recursive definition but may be defined using the interpreter instead. Operational semantics are particularly useful in constructing an implementation of a programming language.
The COBOL of formal semantics. Developed by Peter Mosses, with collaboration from David Watt, Action Semantics relies heavily on the foundation that was established by the work in Denotational Semantics. However, Action Semantics uses notation that can be understood at many different levels. Some of the pros and cons of Action Semantics are:
Pros
Cons
- Easy to read;
- Understandable at many levels;
- Easily extendable.
- A little wordy;
- Not good for representing all programming languages (but it is good for most anyway).
Action Semantics describe actions. Actions manipulate data in three forms: transients, bindings, and storage. Here is an action:Example N.9: A calculator example;
||allocate [integer]cell
|then
||bind "M" to the given cell
hence
|||||give the 6
||||and then
||||||give the 5
|||||then
|||||||store the given integer in the cell bound to "M"
||||||and
|||||||regive
|||then
||||give the sum (given integer#1, given integer#2)
||and then
|||give the integer stored in the cell bound to "M"
|then
||give the product (given integer#1, given integer#2)
Values in Action Semantics are categorized by sorts not by types. A sort is like a type. However, when using sorts we don't make any distinction between an element of a sort and a sort. Every element of a sort is a sort itself. For example:
Proper sorts (sorts with more than one member) of data include integer, truth-value, cell, lists, maps, sets, and tuples. Other sorts may be defined as needed. Data is itself a sort, called data. It is a tuple of datum. The sort datum has as its members all the singleton sorts (sorts with only one member, like 6).
- 6 is a subsort of integer, written 6 < integer;
- true < truth-value.
Transients, Bindings, and Storage are all subsorts of data. Briefly, transients are a tuple, bindings are a map of identifiers to bindables (bindables are defined by the language designer) and storage is a set of cells.
A yielder is a funtion that yields data, possibly depending on the transients, bindings, or storage. Yielders take syntactic objects and yield semantic objects. For instance:
Actions manipulate data in three forms: transients, bindings, and storage. Each action has input transients, bindings, and storage and output transients, bindings, and storage. Each of these data structures are described by a facet of Action Semantics. The three facets are the:
- 6 is a yielder that yields the integer 6;
- given integer is a yielder that depends on the current transients.
The functional facet is used to produce output transients. For example, the action: give 6 produces an output transient tuple (6). Here 6 is a yielder the evaluates to the semantic object 6. Another example is: give the given integer. Assuming that the input transient is (6) then the output transient is (6). This action is an identity if the input transient contains one integer. Another action is: give the given data. This action is the identity action for any input transient. It can be abbreviated as: regive. Assuming that the input transient is (5,6) the output transient of the following action would be (11): give sum(given integer#1,given integer#2).
- Functional facet - describes the input and output transients;
- Declarative facet - describes the input and output bindings;
- Imperative facet - describes the input and output storage.
As mentioned above, there are yielders that produce transients depending on the current (input) transients. For instance, in the action: give the given integer#1 the yielder the given integer#1 assumes that the first datum in the input transient is a subsort of integer. In contrast, the yielder the given integer assumes that the input tuple contains one datum, an integer.
One of the goals of Action Semantics is to be readable. To this end, prepositions (a, an, of, the) are defined as identity functions in Action Semantics. So, the example: give sum(given integer#1,given integer#2) can be rewritten as: give the sum of (the given integer#1,the given integer#2) without changing its meaning.
The declarative facet is used to generate bindings. Bindings are a map between identifiers (tokens) and a sort called bindable. The bindable sort is user definable, but is usually the sort join of cells and truth-values and integers. Sort join is the union of sorts.
For instance: bind "M" to the given cell is an action that expects an input transient containing one cell and generates an output binding of "M" to that cell.
As with transients, there are yielders that yield data depending on the current bindings as well. For instance, the action: give the cell bound to "M" expects there to be a cell bound to the identifier "M" in the current bindings.
The imperative facet is used to alter storage (the contents of cells). For instance, the action:
store the given integer in the cell bound to "M" alters the contents of the cell bound to "M" to contain the integer given in the current transients.As with transients and bindings, there are yielder for storage, too. The action:
give the integer stored in the cell bound to "M" yields an integer if an integer is stored in this cell in storage.Actions have outcomes. The two most common outcomes are completing and failing. Generally, actions complete. But, it is possible for an action to fail.
For instance, consider the following action: give the given integer. This action expects the current transients to contain one datum, an integer. If this expectiation is not met, then the action will fail. Otherwise, it will complete.
Actions can be combined into more complex actions. For instance, the action: give 6 and give 5 is an action made up of two actions. The actions were combined using a combinator called and. Combinators have three aspects: how they propagate transients, bindings, and storage. The combined action above consists of the and combinator and two subactions. The combinator chosen (in this case and) effects how the input transients are passed to the two subactions. It also effects how the output transients from the two subactions are combined to make the output transients of the combined action. Combinators also effect how bindings are propogated through an action and how storage is accessed and altered. Other combinators include then, and then, hence, and a number of others not mentioned here.