Reasons why I decided to abandon RPython in tinySelf

@2021/10/29

My toy programming language tinySelf, inspired by the Series about Self, was written using the RPython toolchain, made by the creators of PyPy (an alternative and pretty fast Python interpreter). This toolchain translates Python source code (a restricted subset of it) into C, and also gives you some benefits, like a garbage collector and a JIT compiler for the language you are implementing in it.

So, what are my reasons to abandon it?

RPython

Long story short, the biggest reason why I decided to abandon it is definitely RPython. I've written about some of my unhappiness with it before (Newsletter 2020-09-12; Waves of productivity and Lifelog 2020-05-25; Work in progress everywhere), but here is a recapitulation:

Long compilation times

The average compilation time of tinySelf on my system (Intel Core i5-4590 with 32 GB of RAM) is something like one minute. The version with JIT takes five minutes or more.

It's not that my project is that big. Frankly, it is called tinySelf for a reason: it has only ~6300 source lines of code (measured with cloc). It's because the RPython translation is horribly slow.

Non-existent documentation

There is official documentation on Read the Docs, and that's pretty much it. There are some articles, mostly about people's experience with it. The official documentation is anything but comprehensive. It touches on pretty much everything, but most topics are mentioned only once, in one context, and not really explained.

tinySelf was started @2017/04/09. I hoped that with time there would be more. But in the three and a half years since, there has been really no change.

I've read some topics many times, and I still have no idea what to think about them. For example, the part about the JIT and JIT debugging was so incomprehensible that I just copied a piece of code from other projects, and to this day I have no idea why some of the things in it are there.

Horrible debugging experience

Let me tell you, I've never seen anything with error messages as weird and cryptic as those of RPython's compiler. There is a great article about it, The Magic of RPython. I've worked for a telco company and at the National Library, which used a system that was part COBOL, part Java. I've seen some shit you wouldn't believe. But not this.

This is an older example. There is nothing special about it; this is really what the errors look like. I saved it some time ago just to be able to talk about it:

[translation:info] 2.7.10 (5.1.2+dfsg-1~16.04, Jun 16 2016, 17:37:42)
[PyPy 5.1.2 with GCC 5.3.1 20160413]
[platform:msg] Set platform with 'host' cc=None, using cc='gcc', version='Unknown'
[translation:info] Translating target as defined by src/tinySelf/target
[translation] translate.py configuration:
[translation] [translate]
    targetspec = src/tinySelf/target
[translation] translation configuration:
[translation] [translation]
    gc = incminimark
    gctransformer = framework
    list_comprehension_operations = True
    withsmallfuncsets = 5
[translation:info] Annotating&simplifying...
[33] {translation-task
starting annotate
[translation:info] with policy: rpython.annotator.policy.AnnotatorPolicy
[f5] translation-task}
[Timer] Timings:
[Timer] annotate                       --- 6.1 s
[Timer] ========================================
[Timer] Total:                         --- 6.1 s
[translation:info] Error:
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/goal/translate.py", line 318, in main
    drv.proceed(goals)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/driver.py", line 551, in proceed
    result = self._execute(goals, task_skip = self._maybe_skip())
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/tool/taskengine.py", line 114, in _execute
    res = self._do(goal, taskcallable, *args, **kwds)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/driver.py", line 278, in _do
    res = func()
   File "/home/bystrousak/Plocha/tests/pypy/rpython/translator/driver.py", line 315, in task_annotate
    s = annotator.build_types(self.entry_point, self.inputtypes)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 92, in build_types
    return self.build_graph_types(flowgraph, inputs_s, complete_now=complete_now)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 140, in build_graph_types
    self.complete()
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 229, in complete
    self.complete_pending_blocks()
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 224, in complete_pending_blocks
    self.processblock(graph, block)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 398, in processblock
    self.flowin(graph, block)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 501, in flowin
    self.consider_op(op)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/annrpython.py", line 653, in consider_op
    resultcell = op.consider(self)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/flowspace/operation.py", line 104, in consider
    return spec(annotator, *self.args)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/unaryop.py", line 118, in simple_call_SomeObject
    return s_func.call(argspec)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/unaryop.py", line 978, in call
    return bookkeeper.pbc_call(self, args)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/bookkeeper.py", line 535, in pbc_call
    s_result = unionof(*results)
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/model.py", line 771, in unionof
    s1 = pair(s1, s2).union()
   File "/home/bystrousak/Plocha/tests/pypy/rpython/annotator/binaryop.py", line 93, in union
    raise UnionError(obj1, obj2)
[translation:ERROR] UnionError:

Offending annotations:
  SomeInstance(can_be_None=True, classdef=rply.token.BaseBox)
  SomeTuple(items=(SomeString(const='$end', no_nul=True), SomeList(listdef=<[SomeString(const='$end', no_nul=True)]>)))


Occurred processing the following simple_call:
  function at_the_top_of_the_root_is_just_expression <src/tinySelf/parser.py, line 65> returning

  function multiple_expressions_make_code <src/tinySelf/parser.py, line 70> returning

  function self_parser <src/tinySelf/parser.py, line 82> returning

  function expression_number <src/tinySelf/parser.py, line 88> returning

  function expression_string <src/tinySelf/parser.py, line 94> returning

  function expression_strings_numbers <src/tinySelf/parser.py, line 106> returning

  function unary_message <src/tinySelf/parser.py, line 113> returning

  function unary_message_to_expression <src/tinySelf/parser.py, line 118> returning

  function binary_message_to_expression <src/tinySelf/parser.py, line 124> returning

  function keyword_message <src/tinySelf/parser.py, line 133> returning

  function keyword_message_to_obj <src/tinySelf/parser.py, line 138> returning

  function keyword <src/tinySelf/parser.py, line 143> returning

  function keyword_multiple <src/tinySelf/parser.py, line 148> returning

  function keyword_message_with_parameters <src/tinySelf/parser.py, line 158> returning

  function keyword_message_to_self_with_parameters <src/tinySelf/parser.py, line 185> returning

  function keyword_message_to_obj_with_parameters <src/tinySelf/parser.py, line 212> returning

  function all_kinds_of_messages_are_message <src/tinySelf/parser.py, line 239> returning

  function expression_is_message <src/tinySelf/parser.py, line 245> returning

  function cascade <src/tinySelf/parser.py, line 267> returning

  function cascades <src/tinySelf/parser.py, line 281> returning

  function expression_cascade <src/tinySelf/parser.py, line 297> returning

  function slot_names <src/tinySelf/parser.py, line 316> returning

  function nil_slot_definition <src/tinySelf/parser.py, line 332> returning

  function slot_definition <src/tinySelf/parser.py, line 339> returning

  function slot_definition_rw <src/tinySelf/parser.py, line 362> returning

  function nil_argument_definition <src/tinySelf/parser.py, line 370> returning

  function slot_name_kwd_one <src/tinySelf/parser.py, line 378> returning

  function slot_name_kwd_multiple <src/tinySelf/parser.py, line 383> returning

  function slot_name_kwd <src/tinySelf/parser.py, line 393> returning

    value_0 = simple_call(v0, targ_0)

In <FunctionGraph of (rply.parser:67)LRParser._reduce_production at 0x6f6e1e0>:
Happened at file /home/bystrousak/.local/lib/pypy2.7/site-packages/rply/parser.py line 80

==>             value = p.func(targ)

Known variable annotations:
 v0 = SomePBC(can_be_None=True, descriptions={...29...}, knowntype=function, subset_of=None)
 targ_0 = SomeList(listdef=<[SomeInstance(can_be_None=False, classdef=rply.token.Token)]mr>)

Processing block:
 block@164[targ_0...] is a <class 'rpython.flowspace.flowcontext.SpamBlock'>
 in (rply.parser:67)LRParser._reduce_production
 containing the following operations:
       v0 = getattr(p_0, ('func'))
       value_0 = simple_call(v0, targ_0)
 --end--
[translation] start debugger...
> /home/bystrousak/Plocha/tests/pypy/rpython/annotator/binaryop.py(93)union()
-> raise UnionError(obj1, obj2)

Can you spot the offending piece of code? No? Let me select it for you:

Offending annotations:
  SomeInstance(can_be_None=True, classdef=rply.token.BaseBox)
  SomeTuple(items=(SomeString(const='$end', no_nul=True), SomeList(listdef=<[SomeString(const='$end', no_nul=True)]>)))

"Offending annotations", what a great error. And where is the error happening? Somewhere in my code, where I am probably putting a string and a list into a tuple. Sometimes it will tell you where, most of the time it won't.

The best way to stay sane is to run the compilation often, so you find out about a mistake right after you make it. Did I mention that one compilation run takes something like a minute?
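
To make the two annotations a bit more concrete, here is a minimal sketch of the kind of code that can lead to such a UnionError. It is an assumption, not the actual tinySelf parser: NumberBox, SlotsBox, the production_* functions, reduce_production and return_types are hypothetical names made up for illustration. The idea is that all production functions are called through one shared call site, so the annotator has to merge their return types into a single type. Plain Python runs both variants happily, which is why the mistake only shows up during translation.

```python
class BaseBox(object):
    """Common base class for everything a production function returns."""


class NumberBox(BaseBox):
    def __init__(self, value):
        self.value = value


class SlotsBox(BaseBox):
    """Box wrapping a (name, list of names) pair."""
    def __init__(self, name, names):
        self.name = name
        self.names = names


def production_number(tokens):
    return NumberBox(int(tokens[0]))


def production_slots_bad(tokens):
    # Returns a bare tuple: fine in plain Python, but a tuple type cannot
    # be merged with the BaseBox instances the other productions return.
    return (tokens[0], [tokens[0]])


def production_slots_good(tokens):
    # The same data, wrapped in a BaseBox subclass.
    return SlotsBox(tokens[0], [tokens[0]])


def reduce_production(func, tokens):
    # One call site shared by all productions, like `value = p.func(targ)`.
    return func(tokens)


def return_types(productions, tokens):
    """Sorted names of the types that come out of the shared call site."""
    names = set(type(reduce_production(f, tokens)).__name__ for f in productions)
    return sorted(names)
```

With the "bad" production, two unrelated types (an instance and a tuple) flow out of the same call, which matches the shape of the two annotations above. With the "good" one, everything is a BaseBox.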

Magical blackboxes everywhere

Python is a dynamic high-level language, with garbage collection and stuff. RPython compiles it into C. And your interpreter / the code you are writing is somewhere in the middle.

I was often annoyed that I didn't understand what the compiler does with my code. When I create a Python class and it is translated to C via RPython, what do I really get there?

The RPython documentation offers some answers, but not all. It is also anything but easy to understand. RPython's philosophy can be summed up as "hey, don't worry, RPython is cool and it will take care of the low-level stuff for you". Which sounds great, and it mostly works, but when it doesn't, you've got a big problem.

I got to the point where my interpreter worked as expected and I wanted to optimize it for speed, but I kept running into not knowing what was really happening. When I create a class whose last field is declared as a list, and then put instances of this class into another list, what will it actually look like in memory? What will it do? The documentation mentions that there is a way to allocate a contiguous block of memory this way, but I couldn't really make sense of it.

JIT is not the magic I thought it would be

I mean, the fact that you get a JIT basically for free is great, but not as great as I thought. To make it effective, you still have to do a lot of fine-tuning, and that's where I hit a wall.

I got some help from cfbolz (one of the people behind PyPy, a really great programmer), and I still didn't really understand what I was doing. You have to capture the log from the JIT run and then visualize it using a really weird script in pygame, and it is all mixed with RPython's annotation system, and there is not much explanation of what you are really looking at.

WTF is this code? I can make no sense of it, and I know the code of my interpreter line by line, to the point that I sometimes program in my head in the shower and then just go and type it into the computer.

I don't want to say that there is something wrong with RPython's JIT. It is definitely powerful, if you know what you are doing. But I don't, and not because I didn't read the docs. I've read them multiple times.

Python 2.7

What seems like a minor annoyance is that RPython only supports Python 2.7. This version reached its end of life in 2020, and it also lacks some of my favorite features, like dicts that are ordered by default. And also...

Typed Python is annoying without type hints

And there are no type hints in Python 2.7. RPython uses type hints in the form of assert isinstance() calls, which are actually not asserts, but expressions parsed by RPython's annotator. The sad thing is that even pretty good IDEs like PyCharm don't really get this.

I've tried to supply most of the types in docstrings, but this is really annoying, and it doesn't work well for nested types, like a list of some objects.
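
A minimal sketch of this pattern, with hypothetical classes (Obj, IntObj, StrObj and add_one are made up for illustration and are not taken from tinySelf). In plain Python the assert is just a runtime check. Under RPython, as described above, it also works as a hint for the annotator that the value is of the narrower type from that line on.

```python
class Obj(object):
    """Hypothetical base class of all interpreter objects."""


class IntObj(Obj):
    def __init__(self, value):
        self.value = value


class StrObj(Obj):
    def __init__(self, value):
        self.value = value


def add_one(obj):
    # The parameter is only known to be some Obj. The assert narrows it
    # to IntObj, so `obj.value` below is treated as an integer.
    assert isinstance(obj, IntObj)
    return IntObj(obj.value + 1)
```

Calling add_one(StrObj("x")) in plain Python fails with an AssertionError at run time, which is at least a readable error.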

So.. why stick with RPython?

When you think about it, you don't really write Python; you write some strongly typed class-based language in Python's syntax. Everything even slightly dynamic, like accessing a range of a list, is a pain in the ass, and you have to use all kinds of patterns known from statically typed OOP languages like Java and C++. Many times I had to create container, or "box", classes that I would never need in Python.

What is left is theoretical compatibility with Python. But RPython usually requires a lot of type checks and changes in the code, so you can't really use much of Python's gigantic software pool. It works only the other way around: your RPython code is still valid Python, which may be useful for debugging, but not much more.

At least there is one good library for writing parsers (rply), but you won't find support for any kind of GUI library. There is only the C FFI, which is usually not what you want.
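
The remark about ranges in lists can be illustrated with slicing, assuming that is what is meant here. As far as I understand the RPython restrictions, negative slice indexes are mostly disallowed and the translator wants to be able to prove that slice bounds are non-negative, so the usual Python idiom has to be spelled out by hand. The function below is a hypothetical sketch, not code from tinySelf.

```python
def last_items(items, count):
    """Return the last `count` items of a list."""
    # Plain Python would simply write items[-count:] (for count > 0).
    start = len(items) - count
    if start < 0:
        start = 0

    # Hint that the slice bound can never be negative at this point.
    assert start >= 0
    return items[start:]
```

It still runs as ordinary Python, for example last_items([1, 2, 3, 4], 2) gives [3, 4], but it reads more like Java or C than like Python.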

Just to recap:

Pros:

It's Python. I like Python. I have a lot of experience with Python.

Cons:

Long compilation times.

Horrible debugging experience.

Very little documentation.

Bad IDE support for inferring type information (I've used Sublime, VS Code and PyCharm).

This is not some newbie feeling; I've arrived at this after three years of occasional hobby programming and several hundred commits (704 at the moment).

GraalVM

I went to talks about GraalVM at a local university here in Prague, and I must say that I was really impressed by it, specifically by the speed and by the language interop. The interop allows you to transparently interface with other languages and, for example, call JavaScript generators from C and so on.

If I implemented tinySelf on top of that, I could call code in other languages as if it were code in my own interpreted language, and vice versa.

Java

The last straw came when I was reading Crafting Interpreters (finally also available as a printed book!), which has examples in Java. I tried to do some of the examples myself, and even though I don't like Java and hadn't touched it for almost ten years, I could do it easily.

What shocked me most was how painless and streamlined the whole experience was in comparison with RPython. I realized that I had been writing something like Java hidden in Python syntax all the time, because RPython forced me to. Only everything was really underdeveloped and user-unfriendly. So much so that writing Java now seems like fun.

And the tools. I can have an interactive debugger? Wow. I could have it when I ran the code in pure Python, but it really wasn't the same. And performance measuring, specifically, was a nightmare. And the compilation time is now 800 milliseconds? And I can compile it to native binaries and have language interop?

I think you can see where I am going with this. RPython is an extremely cool project, and I have huge respect for the people involved with it. But I don't have a beard long enough to shield me from all the frustrations that come with it. So, the decision is simple.

Future of tinySelf's development

At the moment, I am recreating tinySelf in Java. And it is really fun for me, after a long time when it wasn't.

I quickly wrote a handmade tokenizer and then a handmade parser, guided partly by the Crafting Interpreters book and partly by what I thought would work best. At the moment, the parser can parse everything the original tinySelf parser could, with some improvements in look-ahead parsing. It can now parse my alternative object syntax, where it is possible to omit some of the object separators. For example, it is possible to write objects like (slots|), (|code) and (slots | code), instead of the original syntax, which permits only (|slots|), (|| code) and (| slots | code).

At the moment, I am working on parser error handling, which was nonexistent in the original tinySelf, so you usually just got something like "Parser error" and no information about where it happened. Now you get not only a nice report with the line and exact position, with a visual highlight of where the error happened, but also error recovery. In theory, this should be usable in IDEs and editors to highlight multiple errors and so on.
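
To show how the six forms relate to each other, here is a deliberately naive sketch that splits an object literal into its slots and code parts. It is not how the new parser works (that one operates on tokens with look-ahead and is written in Java); split_object is a hypothetical helper that just counts the | separators and ignores nested objects, strings and any other use of | inside the code.

```python
def split_object(source):
    """Split '(...)' object source into a (slots, code) pair of strings."""
    source = source.strip()
    if not (source.startswith("(") and source.endswith(")")):
        raise ValueError("object must be wrapped in parentheses")

    parts = source[1:-1].split("|")
    if len(parts) == 2:
        # Alternative syntax: (slots|), (|code), (slots | code).
        slots, code = parts
    elif len(parts) == 3 and parts[0].strip() == "":
        # Original syntax: (|slots|), (|| code), (| slots | code).
        slots, code = parts[1], parts[2]
    else:
        raise ValueError("unsupported object syntax: %r" % source)

    return slots.strip(), code.strip()
```

Under these assumptions, (slots | code) and (| slots | code) both come out as the same ("slots", "code") pair, which is the point of the shorter syntax.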
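
The kind of report described here, with the line, the exact position and a visual highlight, can be sketched in a few lines. This is only an illustration under my own assumptions: format_error, its parameters and the report layout are hypothetical, not the format the new parser prints, and error recovery is not shown at all.

```python
def format_error(source, offset, message):
    """Build a report with line, column and a caret under the error."""
    # Boundaries of the line that contains the character at `offset`.
    line_start = source.rfind("\n", 0, offset) + 1
    line_end = source.find("\n", offset)
    if line_end == -1:
        line_end = len(source)

    line_no = source.count("\n", 0, offset) + 1  # 1-based line number
    column = offset - line_start + 1             # 1-based column
    line = source[line_start:line_end]

    return "%s at line %d, column %d:\n%s\n%s^" % (
        message, line_no, column, line, " " * (column - 1))
```

Because the function only needs the source text and a character offset, an editor could call it once per recovered error to highlight all of them.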
