tinySelfEE 2021-07; let's throw away the Symbolic eval code
This will be a weird blogpost, here is some context:
I've been working on tinySelfEE for some time, but I haven't touched it for more than six months now. I feel like I don't really know how to start, or what I was even working on before. It's that weird feeling when you find ancient artifacts left by a strange civilization, which was yourself in the past.
This means it's exploration time. I am writing this blogpost to capture my thought process and to make myself work on tinySelfEE again. It should give me some context about where I am and what I am doing.
The cloc utility tells me that there are 3779 lines of Java. I've somehow persuaded IDEA to give me this diagram of the classes used in the project:
From what I remember, the tokenizer is working fine. The parser is mostly done; there is still stuff that could be improved, but at the moment, it works.
I have the project set up so that when I run it, it parses the input file called simple_send.tself:
(|
    slot: parameter = (| var = 1. |
        (parameter + 1) printString.
        parameter + 1.
    ).
|) slot: 1
It defines an object, which has one keyword slot poetically called slot:, which takes one parameter and defines one unused local variable var. When called at the last line, it should print parameter + 1 (⇒ 2) and then also return the resulting value 2.
When I run it, it prints the AST:
Send{obj=Obj{parents=null, slots={slot:=Obj{parents=null, slots={var=NumberInt{value=1}}, arguments=[parameter], code=[Send{obj=Send{obj=Send{obj=Self{}, message=MessageUnary{message_name='parameter'}}, message=MessageBinary{message_name='+', parameter=NumberInt{value=1}}}, message=MessageUnary{message_name='printString'}}, Send{obj=Send{obj=Self{}, message=MessageUnary{message_name='parameter'}}, message=MessageBinary{message_name='+', parameter=NumberInt{value=1}}}]}}, arguments=null, code=null}, message=MessageKeyword{message_name='slot:', parameter=[NumberInt{value=1}]}}
The AST is basically a tree of Send nodes, each of which takes the definition of the object as its first parameter and the definition of a message as its second parameter.
Man, I really wish I had implemented printing to the PlantUML syntax, because this is pretty unreadable. Let's put it on the TODO list.
That seems to be working. Then there is the result of compiling it to the symbolic structure:
SymbolicSend(
    SymbolicObject(
        id: 0,
        version: 1,
        slots = [
            "slot:" = SymbolicObject(
                id: 1,
                version: 1,
                arguments = [
                    "parameter",
                ],
                slots = [
                    "var" = 1,
                ],
                code = [
                    SymbolicSend(
                        SymbolicSend(
                            SymbolicSend(
                                (default) self,
                                SymbolicMessage(
                                    message: "parameter",
                                ),
                            ),
                            SymbolicMessage(
                                message: "+",
                                arguments = [
                                    1,
                                ],
                            ),
                        ),
                        SymbolicMessage(
                            message: "printString",
                        ),
                    ),
                    SymbolicSend(
                        SymbolicSend(
                            (default) self,
                            SymbolicMessage(
                                message: "parameter",
                            ),
                        ),
                        SymbolicMessage(
                            message: "+",
                            arguments = [
                                1,
                            ],
                        ),
                    ),
                ],
            ),
        ],
    )
    SymbolicMessage(
        message: "slot:",
        arguments = [
            1,
        ],
    ),
)
Hm. Lovely. I think I wanted to "compile" the AST into a simpler structure of lists of SymbolicMessage-s and nested SymbolicSend-s.
Which seems to be working, at least for this short example. I remember that I almost lost myself in the process because it required me to implement compile methods for all AST items, and it got weird with inheritance and interfaces.
I had to create a parallel structure for the compiled AST tree, consisting of Symbolic* classes:
- SymbolicObject
- SymbolicBlock
- SymbolicSend
- SymbolicResend
- SymbolicMessage
- SymbolicReturn
Each of them has an .accept() method for SymbolicVisitor, which is used by the debug printers and also by the SymbolicCompiler. They also have an .evaluate() method, which is not yet implemented and should be used for symbolic evaluation. Then there are classes working with these:
- SymbolicCompiler, which takes the AST and outputs a more linearized structure consisting of symbolic classes.
- SymbolicFrame, which should be used for evaluation.
- SymbolicPrinter, for printing the debug output you could see above.
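To make the .accept() / SymbolicVisitor relationship concrete, here is a minimal sketch of how a debug printer can be driven by a visitor. Only the names SymbolicObject, SymbolicSend, SymbolicMessage and SymbolicVisitor come from the project; the fields, method signatures, the SymbolicNode interface and the Printer class are assumptions made for illustration, not the actual tinySelfEE code.

```java
final class VisitorSketch {
    // Hypothetical visitor interface: one method per symbolic node type.
    interface SymbolicVisitor {
        void visit(SymbolicObject obj);
        void visit(SymbolicMessage msg);
        void visit(SymbolicSend send);
    }

    // Every node only knows how to hand itself to a visitor.
    interface SymbolicNode {
        void accept(SymbolicVisitor visitor);
    }

    static final class SymbolicObject implements SymbolicNode {
        final int id;
        SymbolicObject(int id) { this.id = id; }
        @Override public void accept(SymbolicVisitor visitor) { visitor.visit(this); }
    }

    static final class SymbolicMessage implements SymbolicNode {
        final String name;
        SymbolicMessage(String name) { this.name = name; }
        @Override public void accept(SymbolicVisitor visitor) { visitor.visit(this); }
    }

    static final class SymbolicSend implements SymbolicNode {
        final SymbolicNode receiver;
        final SymbolicMessage message;
        SymbolicSend(SymbolicNode receiver, SymbolicMessage message) {
            this.receiver = receiver;
            this.message = message;
        }
        @Override public void accept(SymbolicVisitor visitor) { visitor.visit(this); }
    }

    // A tiny debug printer; all traversal and indentation logic lives here.
    static final class Printer implements SymbolicVisitor {
        private final StringBuilder out = new StringBuilder();
        private int depth = 0;

        private void line(String text) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(text).append('\n');
        }

        @Override public void visit(SymbolicObject obj) {
            line("SymbolicObject(id: " + obj.id + ")");
        }

        @Override public void visit(SymbolicMessage msg) {
            line("SymbolicMessage(message: \"" + msg.name + "\")");
        }

        @Override public void visit(SymbolicSend send) {
            line("SymbolicSend(");
            depth++;
            send.receiver.accept(this);
            send.message.accept(this);
            depth--;
            line(")");
        }

        String result() { return out.toString(); }
    }

    static String render(SymbolicNode root) {
        Printer printer = new Printer();
        root.accept(printer);
        return printer.result();
    }

    public static void main(String[] args) {
        SymbolicNode code = new SymbolicSend(new SymbolicObject(0), new SymbolicMessage("slot:"));
        System.out.print(render(code));
    }
}
```

The point of the pattern is that a second visitor (a compiler pass, a PlantUML exporter) could be added without touching the node classes again.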
I vaguely remember that I've implemented most of the SymbolicObject, so that the slot lookup and parent lookup worked.
Let's work on it
Ok. So, what is missing? From what I understand now, the evaluation should be just a matter of calling .evaluate() on the root of the output from SymbolicCompiler. Let's try that.
My Main class calls this method:
private static void runFile(String file_path) throws IOException {
    byte[] bytes = Files.readAllBytes(Paths.get(file_path));
    ArrayList<ASTItem> ast = parseSourceAndPrintErrors(new String(bytes, StandardCharsets.UTF_8));
    printRawAst(ast);
    SymbolicCompiler compiler = new SymbolicCompiler();
    compiler.compile(ast);
    printSymbolicRepresentation(compiler.getCode());
}
Let's add evaluation:
System.out.println("---");
System.out.println("Symbolic evaluation time:\n");
SymbolicObject namespace = new SymbolicObject();
SymbolicFrame frame = new SymbolicFrame();
for (SymbolicEvalProtocol item : symbolic_code) {
    item.evaluate(namespace, frame);
}
I've added an empty namespace object, which will in time hold the default objects required for the interpreter to be able to do anything useful.
And I've added a top-level frame. I call it and it does .. nothing. I can't say I expected more. Let's look at the implementation of the SymbolicSend's evaluate method:
public void evaluate(SymbolicObject namespace, SymbolicFrame frame) {
}
Unsurprisingly, it's empty. Ok.
Symbolic evaluation
Hmm, how should it work? I can see that the SymbolicObject has this implementation of the .evaluate() method:
@Override
public void evaluate(SymbolicObject namespace, SymbolicFrame frame) {
    frame.push(this);
}
When called, it adds itself on top of the stack in the frame. It seems that it would be a good idea to call .evaluate() on the first object (receiver) in the symbolic send. This will also work for the case when the receiver is another SymbolicSend. But it won't work for the cases where the receiver is self.
Because in that case we want to push self on the top of the stack, so we have to know what self is. Hm.
When I look at the code, I can see that the Self AST node is not compiled into the symbolic representation; instead, the SymbolicSend records that the message is meant for self. This means that I just have to have that self stored somewhere.
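As a rough illustration of what "recognized in the SymbolicSend" can mean, here is a minimal sketch of a compile step that never emits a node for Self and only sets a flag on the send. The names Self, Send and SymbolicSend come from the printouts above; the simplified fields, the sendToSelf flag and the compile() helper are assumptions, not the real compiler.

```java
final class SelfCompileSketch {
    // Simplified stand-ins for the AST classes seen in the printout above.
    interface ASTItem {}

    static final class Self implements ASTItem {}

    static final class Send implements ASTItem {
        final ASTItem obj;
        final String messageName;
        Send(ASTItem obj, String messageName) {
            this.obj = obj;
            this.messageName = messageName;
        }
    }

    // Hypothetical compiled form: Self never becomes a node of its own,
    // it only sets a flag on the send.
    static final class SymbolicSend {
        final boolean sendToSelf;
        final Object receiver; // null when sendToSelf is true
        final String messageName;
        SymbolicSend(boolean sendToSelf, Object receiver, String messageName) {
            this.sendToSelf = sendToSelf;
            this.receiver = receiver;
            this.messageName = messageName;
        }
    }

    static Object compile(ASTItem item) {
        if (item instanceof Send) {
            Send send = (Send) item;
            if (send.obj instanceof Self) {
                return new SymbolicSend(true, null, send.messageName);
            }
            return new SymbolicSend(false, compile(send.obj), send.messageName);
        }
        return item; // anything else passes through unchanged in this sketch
    }

    public static void main(String[] args) {
        // "parameter + 1" from the example, without the argument:
        // Send(Send(Self, "parameter"), "+")
        Send inner = new Send(new Self(), "parameter");
        SymbolicSend outer = (SymbolicSend) compile(new Send(inner, "+"));
        SymbolicSend compiledInner = (SymbolicSend) outer.receiver;
        System.out.println(outer.sendToSelf);         // false
        System.out.println(compiledInner.sendToSelf); // true
    }
}
```

With this shape the evaluator has no receiver object to evaluate for self sends, which is exactly why self has to be remembered somewhere else.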
Okay, think about it. When a message is sent to an object, it should be looked up in the object itself, and if not found, in all the parents. But if it is not found in the parent tree, the lookup should continue in the namespace. Which means:
- Object should somehow have context.
  - In tinySelf, I've injected this by the IntermediateParamsObject. I think. Maybe.
  - Or I could inject the namespace as the frame at the top of the execution stack. Hm.
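The lookup order described above (the object itself, then its parents, then the namespace) can be sketched in a few lines. This is a minimal illustration under assumptions: the Obj class, its fields and both lookup methods are hypothetical, and "first match wins" is a simplification of real Self lookup rules.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LookupSketch {
    // Hypothetical object representation: named slots plus a list of parents.
    static final class Obj {
        final Map<String, Object> slots = new LinkedHashMap<>();
        final List<Obj> parents = new ArrayList<>();

        // Looks in this object first, then depth-first through the parents.
        Object lookupLocal(String name, Set<Obj> visited) {
            if (!visited.add(this)) return null; // guard against parent cycles
            if (slots.containsKey(name)) return slots.get(name);
            for (Obj parent : parents) {
                Object found = parent.lookupLocal(name, visited);
                if (found != null) return found;
            }
            return null;
        }
    }

    // Full lookup: receiver and its parent tree first, namespace as a last resort.
    static Object lookup(Obj receiver, Obj namespace, String name) {
        Object found = receiver.lookupLocal(name, new HashSet<Obj>());
        if (found != null) return found;
        return namespace.lookupLocal(name, new HashSet<Obj>());
    }

    public static void main(String[] args) {
        Obj namespace = new Obj();
        namespace.slots.put("nil", "the nil object");

        Obj parent = new Obj();
        parent.slots.put("inherited", 2);

        Obj receiver = new Obj();
        receiver.slots.put("own", 1);
        receiver.parents.add(parent);

        System.out.println(lookup(receiver, namespace, "own"));       // 1
        System.out.println(lookup(receiver, namespace, "inherited")); // 2
        System.out.println(lookup(receiver, namespace, "nil"));       // the nil object
        System.out.println(lookup(receiver, namespace, "missing"));   // null
    }
}
```

Passing the namespace as an explicit argument is only one of the options; storing it on the object as a scope parent, as tinySelf does, moves the same fallback into the object itself.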
Bleh. I am starting to be anxious, and I feel a great urge to go procrastinate. I solved and invented so much stuff in tinySelf that it feels really, really bad to reinvent it in tinySelfEE. Let's look at the ._do_send() implementation in tinySelf:
obj = self.process.frame.pop()
self._set_scope_parent_if_not_already_set(obj, code)
and:
def _set_scope_parent_if_not_already_set(self, obj, code):
    if obj.scope_parent is None:
        obj.scope_parent = self.universe
Aha. The object to which the message is sent is taken from the top of the stack frame. And the global namespace is inserted into the .scope_parent property if it is not already set. That makes sense.
And there is the _do_push_self() instruction, which takes the self from the frame. So the frame knows what self is. Ok.
def _do_push_self(self, bc_index, code_obj):
    self.process.frame.push(self.process.frame.self)
    return ONE_BYTECODE_LONG
And since it is a bytecode interpreter and objects are literals, when pushing a new object to the frame, it stores the self:
elif literal_type == LITERAL_TYPE_OBJ:
    assert isinstance(boxed_literal, ObjBox)
    obj = boxed_literal.value.clone()
    if self.process.frame.self is None:
        self.process.frame.self = obj
Ok, this is an ugly solution. For one, I don't understand why I didn't solve this in the frame itself. I've updated the push() method of the frame to this:
ObjectRepr self;
boolean has_self = false;

public void push(ObjectRepr obj) {
    obj_stack.add(obj);
    pointer++;
    if (!has_self) {
        self = obj;
        has_self = true;
    }
}

public void pushSelf() {
    push(self);
}
What next? Update .evaluate() of the SymbolicSend to call the object's .evaluate() to push it on top of the frame and make it self. Ok. It now looks like this:
public void evaluate(SymbolicObject namespace, SymbolicFrame frame) {
    if (send_to_self) {
        frame.pushSelf();
    } else {
        receiver.evaluate(namespace, frame);
    }
}
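Putting the pieces together, the following is a self-contained sketch of the frame and the two evaluate() methods, small enough to run on its own. The class and method names follow the snippets above, but the constructors, the label field, the Evaluable interface and the guard in pushSelf() are assumptions added for the example; message dispatch is still missing, just as in the post.

```java
import java.util.ArrayList;
import java.util.List;

final class FrameSketch {
    // Hypothetical common interface for everything that can be evaluated.
    interface Evaluable {
        void evaluate(SymbolicObject namespace, SymbolicFrame frame);
    }

    static final class SymbolicFrame {
        private final List<SymbolicObject> stack = new ArrayList<>();
        private SymbolicObject self;
        private boolean hasSelf = false;

        // The first object pushed into the frame becomes its self.
        void push(SymbolicObject obj) {
            stack.add(obj);
            if (!hasSelf) {
                self = obj;
                hasSelf = true;
            }
        }

        // Added guard (not in the post): fail loudly instead of pushing null.
        void pushSelf() {
            if (!hasSelf) throw new IllegalStateException("no self in this frame yet");
            push(self);
        }

        SymbolicObject self() { return self; }
        SymbolicObject top() { return stack.get(stack.size() - 1); }
        int depth() { return stack.size(); }
    }

    static final class SymbolicObject implements Evaluable {
        final String label;
        SymbolicObject(String label) { this.label = label; }

        @Override public void evaluate(SymbolicObject namespace, SymbolicFrame frame) {
            frame.push(this);
        }
    }

    static final class SymbolicSend implements Evaluable {
        final boolean sendToSelf;
        final Evaluable receiver; // null when sendToSelf is true
        final String messageName;

        SymbolicSend(boolean sendToSelf, Evaluable receiver, String messageName) {
            this.sendToSelf = sendToSelf;
            this.receiver = receiver;
            this.messageName = messageName;
        }

        // Only the receiver is resolved here; the message itself is not sent yet.
        @Override public void evaluate(SymbolicObject namespace, SymbolicFrame frame) {
            if (sendToSelf) {
                frame.pushSelf();
            } else {
                receiver.evaluate(namespace, frame);
            }
        }
    }

    public static void main(String[] args) {
        SymbolicObject namespace = new SymbolicObject("namespace");
        SymbolicObject root = new SymbolicObject("root");
        SymbolicFrame frame = new SymbolicFrame();

        // A real interpreter would run the self send in the method's own frame;
        // both sends share one frame here only to exercise pushSelf().
        new SymbolicSend(false, root, "slot:").evaluate(namespace, frame);
        new SymbolicSend(true, null, "parameter").evaluate(namespace, frame);

        System.out.println(frame.self().label + " " + frame.depth()); // root 2
    }
}
```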
Now it would be a good idea to see if it really works and what the frame looks like when this is executed.
*After some bugfixes*
Frames:
SymbolicFrame(
    depth: 0,
    self: Object,
    Object,
)
Revelation
Now I look at the result and wonder why I even need a stack in the symbolic evaluation. Did I just use it for symbolic evaluation because I had it in tinySelf? When crunching bytecodes, it makes perfect sense, but in the symbolic evaluation, I think I can work without it.
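To illustrate what working without a stack could look like, here is a minimal sketch of a tree-walking evaluator where evaluate() simply returns its result and the Java call stack does the bookkeeping. Everything in it is an assumption made for the example: self is reduced to a map of integer slots, only the "+" message exists, and none of these classes are taken from tinySelfEE.

```java
import java.util.HashMap;
import java.util.Map;

final class StacklessSketch {
    // Each node returns its value directly instead of pushing it onto a frame.
    interface Node {
        int evaluate(Map<String, Integer> self);
    }

    static final class Literal implements Node {
        final int value;
        Literal(int value) { this.value = value; }
        @Override public int evaluate(Map<String, Integer> self) { return value; }
    }

    // A unary message sent to self, reduced to a slot read.
    static final class SelfSend implements Node {
        final String slot;
        SelfSend(String slot) { this.slot = slot; }
        @Override public int evaluate(Map<String, Integer> self) {
            Integer value = self.get(slot);
            if (value == null) throw new IllegalArgumentException("slot not found: " + slot);
            return value;
        }
    }

    // A binary message; receiver and argument are evaluated by plain recursion.
    static final class BinarySend implements Node {
        final Node receiver;
        final String op;
        final Node argument;
        BinarySend(Node receiver, String op, Node argument) {
            this.receiver = receiver;
            this.op = op;
            this.argument = argument;
        }
        @Override public int evaluate(Map<String, Integer> self) {
            int left = receiver.evaluate(self);
            int right = argument.evaluate(self);
            if (op.equals("+")) return left + right;
            throw new UnsupportedOperationException("unknown message: " + op);
        }
    }

    // Evaluates "parameter + 1" with parameter bound to 1, as in simple_send.tself.
    static int run() {
        Map<String, Integer> self = new HashMap<>();
        self.put("parameter", 1);
        Node code = new BinarySend(new SelfSend("parameter"), "+", new Literal(1));
        return code.evaluate(self);
    }

    public static void main(String[] args) {
        System.out.println(run()); // 2
    }
}
```

The trade-off is that evaluation state lives in the host language's call stack, which is convenient for prototyping but is not what a bytecode interpreter will need later.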
Hm. And I study the code, and I try to implement the symbolic eval, when I realize .. why am I even doing this?
I mean the whole symbolic execution. I know that I want to have a working bytecode interpreter, like I had in tinySelf. And I think that at some point I assumed making symbolic execution work would be easy, and that it would allow me to prototype some things before I get to the harder part.
Now I see that I'll have to do a bunch of nonsense, create a whole parallel set of classes just for easier symbolic evaluation, and I'll still need to write all the hard stuff anyway. And then when I get to the bytecode interpreter, I'll have this stuff all over the place, just getting in the way and not being useful at all.
So let's just .. throw it away.
And I did that. I've committed everything, created a new tag symbolic_execution and then went to delete files.
TODO
- ☐ Implement plantuml debug printers because the debug printouts are basically unreadable.
- ☐ Suffix for the tinySelfEE scripts: tse? I've used tself now, but I don't like it anymore.