Compiled Program Area
The Bytecode Format
Rool Stack Area
Federal University of Pernambuco
Centre of Informathics
Pernambuco - Brazil
The purpose here is to present an interpreter for bytecodes written using ROOL. ROOL is an object-oriented language similar to Java, but with a copy rather than a reference semantics. The bytecode stream that serves as an input for the interpreter corresponds to the target code resulting from the compilation process of an arbitrary source program written in ROOL.
The ROOL interpreter models a cyclic mechanism which executes one instruction at a time. Each command in the source language is converted to the target code as a bytecode stream. The bytecodes corresponding to the main command of a ROOL source program serves as the starting point for the application's thread.
Only one thread will be running at a given time. Unlike the JVM, the initial thread cannot fire off other threads. The initial thread is always executing a a very simple loop which first fetches the next instruction to be executed and then invokes the corresponding method associated in order to simulate the effect of such instruction on the internal data structure of the interpreter. All instructions available for the interpreter are implemented as methods defined in the class RoolVirtualMachine class.
The code generation phase is under development. The compilation of the source code is carried out by a series of refinement steps that entails a series of semantic-preserving transformations. Although the algebraic transformation rules are not completely specified yet, it is assumed that the ROOL Interpreter receives the target code stored in the global variable called RVM.
The global variable RVM
The global variable RVM (an instance of RoolVirtualMachine class) class) basically contains two data areas (Figure 1): the list of compiled classes (Compiled Program Area) resulting from the compilation process and the list of frames (Rool Stack Area) that will keep track of the course of the execution flow.
back to top of page
The Compiled Program Area is formed by two instance variables, named FstClass and Initial. The FstClass variable is an instance of ClassList. Instances of ClassList form a list containing all compiled classes (Figure 2). Initial is an instance of ClassInfo (Figure 3) and indicates the starting point for the interpreter. To start the execution, the ROOL interpreter must retrieve and process the first bytecode in the only method found in that class. This method comprises the bytecodes corresponding to the command c that follows the class declarations in a ROOL program (cds· c).
Each instance of ClassInfo (Figure 3) contains the following fields:
The Constant Pool
The constant pool provides much of the essential information needed by a class. A constant pool contains entries for the name of classes, methods and fields, to integer constants. For each instance of ClassInfo referenced in the global variable RVM, there must exist an associated constant pool.
A constant pool is an heterogeneous list of references used by the Class, including literals and symbolic references to types, fields, and methods. Entries in the constant pool are instances of CPList(Figure 4) and are referenced by index, much like the elements of an array.
A class entry holds an instance of CpEntryClass which contains the following information:
A field entry holds an instance of CpEntryFieldref which contains the following information:
A method entry holds an instance of CpEntryMethodref which contains the following information:
A constant entry holds an instance of DataInfoInt which contains an integer value.
Instances of MethodList are used to hold the list of methods declared in the class (Figure 5).
As with fields, the order in which the methods are declared by the class is also recorded as well as the data. For each method declared in the class (Figure 6), the following information must be stored in the corresponding instance of MethodInfo.
Instances of FieldList are used to hold the list of fields declared in the class (Figure 7).
The order in which the fields are declared by the class is also recorded. For each field declared in the class (Figure 8), the following information must be stored in the corresponding instance of FieldInfo:
back to top of page
Bytecodes are the machine language of the ROOL Virtual Machine. When a virtual machine executes a program, it gets one stream of bytecodes for each method in the class. The bytecode streams are stored as a list (Figure 5) in the method area of the global variable RVM. The bytecodes for a method are executed when that method is invoked during the course of running the program.
A method's bytecode stream is a list of instructions (Figure 6) for the virtual machine. Each instruction consists of a one-byte opcode followed by zero or more operands. The opcode indicates the action to take. If more information is required before the virtual machine can take the action, that information is encoded into one or more operands that immediately follow the opcode. Each item of a list of bytecode holds an integer which represents an opcode or an operand.
Each type of opcode has a mnemonic. In the typical assembly language style, streams of ROOL bytecodes can be represented by their mnemonics followed by any operand values.
All computation in the virtual machine is centred on the operand stack. Because the virtual machine has no registers for storing arbitrary values, everything must be pushed onto the stack before it can be used in a calculation. Bytecode instructions therefore operate primarily on the operand stack.
back to top of page
The ROOL Stack Area is composed by instances of FrameInfo. An instance of FrameInfo contains the state of one ROOL method invocation. When a method is invoked, the virtual machine creates a new frame onto that ROOLs FrameList (Figure 9). When the method completes, the virtual machine discards the frame for that method.
A list of instances of the class FrameList records the order in which the pendent methods were invoked (Figure 9). The current frame is indicated by the variable TopFrame. The TopFrame is the item of a list being used by the currently executing method.
The state of a ROOL method invocation includes its own pc register (which points to the methods bytecode list and indicates the next instruction to be executed), its local variables, its operand stack, the parameters with which it was invoked, their return values (if any), and intermediate calculations which are stored at the operand stack. When the ROOL interpreter starts running, the pc value in the initial frame points to the beginning of the bytecode stream which represents the command c in terms of virtual machine instructions. The execution ends when the pc reaches a null bytecode value. During the execution of the code in the initial frame, the control flow may pass to other frames, according to the method call invocations, as already explained.
Each instance of FrameInfo (Figure 10) consists of three sections - the operand stack, the execution environment, and the local variables. Local variables and the operand stack are also stored as lists.
Pushing a value onto the operand stack of the current FrameInfo is the same as creating a new instance of the operand item (OpStackInfo), set a value in its data field, and link this new item to the top (TopOpStack) of the operand stack.
All instances of LocalList are created when the method is invoked. The local variables values can be updated, but no new local variable can be created. during the execution of the code in the current frame.
After the invoke opcode there is an index of an entry in the constant pool of the class whose method is currently executing. This entry is an instance of CpentryMethodref and contains the name (MtdName) of the method to be executed and its descriptor.
Figure 11: invoking a Method
The actual method run depends on the runtime type of the object the instruction invoke is associated with. Objectref is at the top of the operand stack, and represents a reference to the object whose method is being called. The invoke instruction retrieves the ROOL class for objectref, and searches the list of methods defined by that class and then its superclasses, looking for a method with the same name.
Once a method has been found, a new frame is established on the stack frame. Then the arguments for the method, which are popped from the current operand stack, are placed in local variables of the new frame structure. Objectref is stored in the first element of the local variable list, followed by the arg1, arg2 and so on. Finally, execution continues at the first instruction in the bytecode of the new method.
When the method called by the invoke returns, the results (if any) are placed on the operand stack of the current method and execution continues at the instruction that follows the invoke instruction in the bytecode (Figure 11).
Pushing/Poping local variables onto/from the operand stack
Pushing a local variable onto the stack actually involves moving a value from the local variables section of the stack frame to the operand section. All local variables are instances of the class Object, so they can hold any object reference, including an object that holds integer values (DataInfoInt).
Although ROOL opcodes generally indicate the type of their operands, instead of having several opcodes that push a local variable onto the stack, the virtual machine has just one, the opcode load, no matter if they store references to objects or values of primitive types. Analogously, to pop a value of any type from the top of the stack to a local variable, the virtual machine uses the the opcode store.
Each local variable of a method has a unique index. The local variable section of a method's stack frame can be thought of as list of instances of LocalList. Each element in such list is addressable by the index that immediately follows the opcode load or store.
back to top of page
An instance of ObjectInfo holds an object created during the execution. Such instance is referenced from the stack frame, more precisely from the stack operand or the local variable list. The memory allocated for an object usually includes two pointers:
Given an object reference, the virtual machine must be able to locate the instance data for the object. In addition, there must be some way to access an object's class data given a reference to the object.
This approach to object representation is implemented using instances of ObjectInfo as shown graphically in Figure 12.
The virtual machine needs to get from an object reference to the object class data for several reasons. For example, when a running program attempts to peforms an instanceof operation, the virtual machine must check whether the type is the actual class of the referenced object or one of its superclasses. In this case, the virtual machine must look into the class data of the referenced object. When a program invokes a method, the virtual machine must perform dynamic binding: it must choose the method to invoke based not on the type of the reference but on the class of the object. To do this, it must once again have access to the class data given only a reference to the object. The field Cl, defined in ObjectInfo, yields the actual class of the referenced object.
Each instance of ObjFieldInfo holds information about the corresponding field of an object, such as Data, Name and Visibility (Figure 14).
back to top of page
In this first version, there are less primitive types of the virtual machine than primitive types of the ROOL Language. Although boolean qualifies as a primitive type of the virtual machine, the instruction set has no support for it. When a compiler translates ROOL source code into bytecodes, it uses ints to represent booleans. In the virtual machine, false is represented by integer zero and true by any non-zero integer. Operations involving boolean values use ints.
The reference type of the virtual machine is named reference. Values of type reference has the class type. This type has values that are references to dynamically created objects. These values are references to class instances. A special reference value is the null value, which indicates that the reference variable does not refer to any object.back to top of page
The virtual machine instruction consists of one instance of bytecode (opcode) specifying the operation to be performed followed by zero or more instances of bytecodes supplying arguments or data that are used by the operation. The number and the type of the additional operands are determined by the opcode.
Many instructions have no operands and consist only of an opcode.
The arithmetic instructions compute a result that is typically a function of two values on the operand stack, pushing the value back on the stack. There is direct suport only for integer arithmetic.
The arithmetic instructions are as follows:
The virtual machine contains a number of branch instructions including a set of conditional and non-conditional branch instruction.
The instructions if<cond> branchs if int comparison with zero succeeds. The value must be of type int. It is popped from the operand stack and compared against zero. The results of the comparisons are as follows:
The instructions ificmp<cond> branchs if int comparison succeeds. Two int values are popped from the operand stack and compared. The results of the comparisons are as follows:
The goto instruction branchs always.
The offset, which can be a positive or negative integer value, is used to reach the next instruction to be executed. The new pc value is adjusted by moving through the doubly linked list of bytecodes according to the value indicated by the offset. A positive value indicates a forward shift, a negative value indicates backward shift
Pushing constants onto the stack
Some opcodes by themselves indicate a type and constant value to push. For example, the iconst_2 opcode tells the virtual machine to push integer value two. They increase the efficiency of bytecode execution and reduce the size of bytecode streams. The opcodes that push ints are iconst_0, iconst_1, iconst_2, iconst_3, iconst_4 and iconst_5, which pushes the value which appears as suffix to their names, and iconst_m1, which pushes the constant 1 into the stack. None of these opcodes has operands.
Another opcode pushes an implicit constant value onto the stack. The aconst_null opcode, which has no operands, pushes a null object reference onto the stack. The aconst_null opcode is used in the process of assigning null to an object reference variable.
Another opcode pushes constants from the constant pool. Opcodes that push constants from the constant pool have operands that indicate which constant to push by specifying a constant pool index. The virtual machine will look up the constant given the index and push it onto the stack.
The constant pool index is a value that immediately follows the opcode in the bytecode stream. Opcode lcd (load constant) has one operand and loads an int constant onto the stack as shown in the following table.
Object Creation and Manipulation
To create a new class instance:
To access field of class instances
Method Invocation and return Instructions
One instruction invoke methods:
The method return instruction is:
Operand Stack Management Instructions
To pop the top item off the operand stack and discard it:
To pop the top value from the operand stack and push that value twice:
To test whether an object reference belongs to a given class:
To do nothing:
The virtual machine does not have any opcode that convert from one primitive type to another.back to top of page