Introduction
Compiled Program Area
The Bytecode Format
Rool Stack Area
Object Representation
Data Types
Instruction Set
Interpreter Code
ROOL INTERPRETER

Adolfo Duran, Ana Cavalcanti and Augusto Sampaio
Federal University of Pernambuco
Centre of Informathics
Pernambuco - Brazil


The purpose here is to present an interpreter for bytecodes written using ROOL. ROOL is an object-oriented language similar to Java, but with a copy rather than a reference semantics. The bytecode stream that serves as an input for the interpreter corresponds to the target code resulting from the compilation process of an arbitrary source program written in ROOL.








Introduction

The ROOL interpreter models a cyclic mechanism which executes one instruction at a time. Each command in the source language is converted to the target code as a bytecode stream. The bytecodes corresponding to the main command of a ROOL source program serves as the starting point for the application's thread.

Only one thread will be running at a given time. Unlike the JVM, the initial thread cannot fire off other threads. The initial thread is always executing a a very simple loop which first fetches the next instruction to be executed and then invokes the corresponding method associated in order to simulate the effect of such instruction on the internal data structure of the interpreter. All instructions available for the interpreter are implemented as methods defined in the class RoolVirtualMachine class.

Target Code

The code generation phase is under development. The compilation of the source code is carried out by a series of refinement steps that entails a series of semantic-preserving transformations. Although the algebraic transformation rules are not completely specified yet, it is assumed that the ROOL Interpreter receives the target code stored in the global variable called RVM.

The global variable RVM

The global variable RVM (an instance of RoolVirtualMachine class) class) basically contains two data areas (Figure 1): the list of compiled classes (Compiled Program Area) resulting from the compilation process and the list of frames (Rool Stack Area) that will keep track of the course of the execution flow.

 

back to top of page

Compiled Program Area

The Compiled Program Area is formed by two instance variables, named FstClass and Initial. The FstClass variable is an instance of ClassList. Instances of ClassList form a list containing all compiled classes (Figure 2). Initial is an instance of ClassInfo (Figure 3) and indicates the starting point for the interpreter. To start the execution, the ROOL interpreter must retrieve and process the first bytecode in the only method found in that class. This method comprises the bytecodes corresponding to the command c that follows the class declarations in a ROOL program (cds· c).

 

 

 

Each instance of ClassInfo (Figure 3) contains the following fields:

  • Name – the class name;

  • CP – a reference to a list which holds the constant pool;

  • Super – its superclass

  • FstMethod – a reference to the list of methods declared in the class.

  • FstField – a reference to the list of variables defined in the class.

 

 

 

 

The Constant Pool

The constant pool provides much of the essential information needed by a class. A constant pool contains entries for the name of classes, methods and fields, to integer constants. For each instance of ClassInfo referenced in the global variable RVM, there must exist an associated constant pool.

A constant pool is an heterogeneous list of references used by the Class, including literals and symbolic references to types, fields, and methods. Entries in the constant pool are instances of CPList(Figure 4) and are referenced by index, much like the elements of an array.

 

A class entry holds an instance of CpEntryClass which contains the following information:

  • Class name

  • Pointer to the class type

A field entry holds an instance of CpEntryFieldref which contains the following information:

  • Pointer to the instance of class (ClassInfo) where the field is defined

  • Field name

  • Descriptor (int or <class name>)

  • The modifiers ( private or protected)

A method entry holds an instance of CpEntryMethodref which contains the following information:

  • Pointer to the instance of class (ClassInfo) where the method is defined

  • Method name

  • Descriptor (list of arguments descriptors and for each argument, the form they are being passed (value, result or value-result)

A constant entry holds an instance of DataInfoInt which contains an integer value.

 

 

Method Information

Instances of MethodList are used to hold the list of methods declared in the class (Figure 5).

As with fields, the order in which the methods are declared by the class is also recorded as well as the data. For each method declared in the class (Figure 6), the following information must be stored in the corresponding instance of MethodInfo.

  • The bytecodes. A doubly linked list is needed to store the bytecodes due to the negative branches that might occur when executing control flow instructions.

  • A reference to its entry in the constant pool

 

Field Information

Instances of FieldList are used to hold the list of fields declared in the class (Figure 7).

The order in which the fields are declared by the class is also recorded. For each field declared in the class (Figure 8), the following information must be stored in the corresponding instance of FieldInfo:

  • A reference to its entry in the constant pool

back to top of page

The Bytecode Format

Bytecodes are the machine language of the ROOL Virtual Machine. When a virtual machine executes a program, it gets one stream of bytecodes for each method in the class. The bytecode streams are stored as a list (Figure 5) in the method area of the global variable RVM. The bytecodes for a method are executed when that method is invoked during the course of running the program.

A method's bytecode stream is a list of instructions (Figure 6) for the virtual machine. Each instruction consists of a one-byte opcode followed by zero or more operands. The opcode indicates the action to take. If more information is required before the virtual machine can take the action, that information is encoded into one or more operands that immediately follow the opcode. Each item of a list of bytecode holds an integer which represents an opcode or an operand.

Each type of opcode has a mnemonic. In the typical assembly language style, streams of ROOL bytecodes can be represented by their mnemonics followed by any operand values.

All computation in the virtual machine is centred on the operand stack. Because the virtual machine has no registers for storing arbitrary values, everything must be pushed onto the stack before it can be used in a calculation. Bytecode instructions therefore operate primarily on the operand stack.

 

back to top of page

ROOL Stack Area

The ROOL Stack Area is composed by instances of FrameInfo. An instance of FrameInfo contains the state of one ROOL method invocation. When a method is invoked, the virtual machine creates a new frame onto that ROOL’s FrameList (Figure 9). When the method completes, the virtual machine discards the frame for that method.

A list of instances of the class FrameList records the order in which the pendent methods were invoked (Figure 9). The current frame is indicated by the variable TopFrame. The TopFrame is the item of a list being used by the currently executing method.

 

The state of a ROOL method invocation includes its own pc register (which points to the method’s bytecode list and indicates the next instruction to be executed), its local variables, its operand stack, the parameters with which it was invoked, their return values (if any), and intermediate calculations which are stored at the operand stack. When the ROOL interpreter starts running, the pc value in the initial frame points to the beginning of the bytecode stream which represents the command c in terms of virtual machine instructions. The execution ends when the pc reaches a null bytecode value. During the execution of the code in the initial frame, the control flow may pass to other frames, according to the method call invocations, as already explained.

Each instance of FrameInfo (Figure 10) consists of three sections - the operand stack, the execution environment, and the local variables. Local variables and the operand stack are also stored as lists.

Pushing a value onto the operand stack of the current FrameInfo is the same as creating a new instance of the operand item (OpStackInfo), set a value in its data field, and link this new item to the top (TopOpStack) of the operand stack.

All instances of LocalList are created when the method is invoked. The local variables values can be updated, but no new local variable can be created. during the execution of the code in the current frame.

 

Invoking Methods

After the invoke opcode there is an index of an entry in the constant pool of the class whose method is currently executing. This entry is an instance of CpentryMethodref and contains the name (MtdName) of the method to be executed and its descriptor.

Operand Stack:
 
 

before after
objectref result1
argN result2
... ...
arg2 resultn
arg1 ...
...  

 

Figure 11: invoking a Method

The actual method run depends on the runtime type of the object the instruction invoke is associated with. Objectref is at the top of the operand stack, and represents a reference to the object whose method is being called. The invoke instruction retrieves the ROOL class for objectref, and searches the list of methods defined by that class and then its superclasses, looking for a method with the same name.

Once a method has been found, a new frame is established on the stack frame. Then the arguments for the method, which are popped from the current operand stack, are placed in local variables of the new frame structure. Objectref is stored in the first element of the local variable list, followed by the arg1, arg2 and so on. Finally, execution continues at the first instruction in the bytecode of the new method.

When the method called by the invoke returns, the results (if any) are placed on the operand stack of the current method and execution continues at the instruction that follows the invoke instruction in the bytecode (Figure 11).

Pushing/Poping local variables onto/from the operand stack

Pushing a local variable onto the stack actually involves moving a value from the local variables section of the stack frame to the operand section. All local variables are instances of the class Object, so they can hold any object reference, including an object that holds integer values (DataInfoInt).

Although ROOL opcodes generally indicate the type of their operands, instead of having several opcodes that push a local variable onto the stack, the virtual machine has just one, the opcode load, no matter if they store references to objects or values of primitive types. Analogously, to pop a value of any type from the top of the stack to a local variable, the virtual machine uses the the opcode store.

Each local variable of a method has a unique index. The local variable section of a method's stack frame can be thought of as list of instances of LocalList. Each element in such list is addressable by the index that immediately follows the opcode load or store.

 

back to top of page

Object Representation

An instance of ObjectInfo holds an object created during the execution. Such instance is referenced from the stack frame, more precisely from the stack operand or the local variable list. The memory allocated for an object usually includes two pointers:

  • A pointer to the class type

  • A pointer to the list of instance variables declared in the object's class and all its superclasses.

Given an object reference, the virtual machine must be able to locate the instance data for the object. In addition, there must be some way to access an object's class data given a reference to the object.

This approach to object representation is implemented using instances of ObjectInfo as shown graphically in Figure 12.

The virtual machine needs to get from an object reference to the object class data for several reasons. For example, when a running program attempts to peforms an instanceof operation, the virtual machine must check whether the type is the actual class of the referenced object or one of its superclasses. In this case, the virtual machine must look into the class data of the referenced object. When a program invokes a method, the virtual machine must perform dynamic binding: it must choose the method to invoke based not on the type of the reference but on the class of the object. To do this, it must once again have access to the class data given only a reference to the object. The field Cl, defined in ObjectInfo, yields the actual class of the referenced object.

Each instance of ObjFieldInfo holds information about the corresponding field of an object, such as Data, Name and Visibility (Figure 14).

back to top of page

Data Types

In this first version, there are less primitive types of the virtual machine than primitive types of the ROOL Language. Although boolean qualifies as a primitive type of the virtual machine, the instruction set has no support for it. When a compiler translates ROOL source code into bytecodes, it uses ints to represent booleans. In the virtual machine, false is represented by integer zero and true by any non-zero integer. Operations involving boolean values use ints.

 

The reference type of the virtual machine is named reference. Values of type reference has the class type. This type has values that are references to dynamically created objects. These values are references to class instances. A special reference value is the null value, which indicates that the reference variable does not refer to any object.

back to top of page

Instruction Set

The virtual machine instruction consists of one instance of bytecode (opcode) specifying the operation to be performed followed by zero or more instances of bytecodes supplying arguments or data that are used by the operation. The number and the type of the additional operands are determined by the opcode.

Many instructions have no operands and consist only of an opcode.

Arithmetic Instructions.

The arithmetic instructions compute a result that is typically a function of two values on the operand stack, pushing the value back on the stack. There is direct suport only for integer arithmetic.

The arithmetic instructions are as follows:

Opcode Operand(s) Description
iaad value1, value2 Add ints
isub value1, value2 Subtract ints
imul value1, value2 Multiply ints
idiv value1, value2 Divide ints
ineg value1 Negate int

Branch Instructions.

The virtual machine contains a number of branch instructions including a set of conditional and non-conditional branch instruction.

The instructions if<cond> branchs if int comparison with zero succeeds. The value must be of type int. It is popped from the operand stack and compared against zero. The results of the comparisons are as follows:

Opcode Operand(s) Description
ifeq offset Branch if int value =  0
ifne offset Branch if int value <>  0
iflt offset Branch if int value <  0
ifle offset Branch if int value <=  0
ifgt offset Branch if int value >  0
ifge offset Branch if int value >=  0

The instructions ificmp<cond> branchs if int comparison succeeds. Two int values are popped from the operand stack and compared. The results of the comparisons are as follows:

Opcode Operand(s) Description
ificmpeq offset Branch if  value1 =  value2
ificmpne offset Branch if  value1 <>  value2
ificmplt offset Branch if  value1 <  value2
ificmple offset Branch if  value1 <=  value2
ificmpgt offset Branch if  value1 >  value2
ificmpge offset Branch if  value1 >=  value2

The goto instruction branchs always.

Opcode Operand(s) Description
goto offset Branch to the instruction indicated by the offset

The offset, which can be a positive or negative integer value, is used to reach the next instruction to be executed. The new pc value is adjusted by moving through the doubly linked list of bytecodes according to the value indicated by the offset. A positive value indicates a forward shift, a negative value indicates backward shift

 

Pushing constants onto the stack


Some opcodes can push constants onto the stack. Opcodes indicate the constant value to push in three different ways. The constant value is either implicit in the opcode itself or is taken from the constant pool.

Some opcodes by themselves indicate a type and constant value to push. For example, the iconst_2 opcode tells the virtual machine to push integer value two. They increase the efficiency of bytecode execution and reduce the size of bytecode streams. The opcodes that push ints are iconst_0, iconst_1, iconst_2, iconst_3, iconst_4 and iconst_5, which pushes the value which appears as suffix to their names, and iconst_m1, which pushes the constant –1 into the stack. None of these opcodes has operands.

Opcode Operand(s) Description
iconst_m1  (none) pushes int -1 onto the stack
iconst_0 (none) pushes int 0 onto the stack
iconst_1 (none) pushes int 1 onto the stack
iconst_2 (none) pushes int 2 onto the stack
iconst_3 (none) pushes int 3 onto the stack
iconst_4 (none) pushes int 4 onto the stack
iconst_5 (none) pushes int 5 onto the stack

Another opcode pushes an implicit constant value onto the stack. The aconst_null opcode, which has no operands, pushes a null object reference onto the stack. The aconst_null opcode is used in the process of assigning null to an object reference variable.

Opcode Operand(s) Description
aconst_null (none) pushes a null object reference onto the stack

Another opcode pushes constants from the constant pool. Opcodes that push constants from the constant pool have operands that indicate which constant to push by specifying a constant pool index. The virtual machine will look up the constant given the index and push it onto the stack.

The constant pool index is a value that immediately follows the opcode in the bytecode stream. Opcode lcd (load constant) has one operand and loads an int constant onto the stack as shown in the following table.

Opcode Operand(s) Description
ldc indexbyte pushes a constant_pool entry specified by index onto the stack

Object Creation and Manipulation

To create a new class instance:

Opcode Operand(s) Description
new Index to constant pool Create new object

To access field of class instances

Opcode Operand(s) Description
getfield Index to constant pool Fetch field from object
putfield Index to constant pool Set field in object

Method Invocation and return Instructions

One instruction invoke methods:

Opcode Operand(s) Description
invoke Index to constant pool Invoke a method

The method return instruction is:

Opcode Operand(s) Description
return Index to constant pool Fetch field from object

Operand Stack Management Instructions

To pop the top item off the operand stack and discard it:

Opcode Operand(s) Description
pop none discard  top item on operand stack

To pop the top value from the operand stack and push that value twice:

Opcode Operand(s) Description
dup none Duplicate the top item on the operand stack

Miscellaneous

To test whether an object reference belongs to a given class:

Opcode Operand(s) Description
instanceof Index  Test if object belongs to a given class

To do nothing:

Opcode Operand(s) Description
nop none Do nothing

Type conversions

The virtual machine does not have any opcode that convert from one primitive type to another.

back to top of page








Interpreter Code

Main Program

Classes

Instructions