
Spearheading your way into making a programming language without prior knowledge is hell


I am in pain.

This is not a guide explaining how to make even a component of a programming language, nor explaining the internals of RCaron. This is a rant or something.

The Beginning

So, in May of 2022 I just started making RCaron (ř). My motivation was to have a nice long-running project that gives me some bragging rights. Also, I knew nothing about making a programming language, only the basic stuff you kind of pick up after programming for a while. The beginning was simple. The first commit to the Git repo was 727 additions (including whitespace, comments, and 93 lines of some benchmark). I am pretty sure that I already thought I was creating horrors, but I still didn't have as much of a defeatist attitude as I do now. One of the first things that gave me hell was doing math with parentheses. I procrastinated a little on implementing other things, not because they seemed difficult but because they were just a lot of work. Stuff went well. I implemented functions, function arguments, and even modules quite early (we are already in June). Some days I would think "I'll go make some progress", start up my IDE, turn on some music and stare at the code for a bit, not knowing where to start.

By borrowing some code and inspiration from PowerShell I was able to bind RCaron code to a .NET event. It made stuff feel real. Then summer came and the only thing I did was learn of the existence of LINQ expression trees and of dadhi/FastExpressionCompiler, which I then forgot about.

In September I restarted work on RCaron. I decided that I should do some performance stuff, like no longer allocating strings just so I could use them in a switch, because C# 11 supports matching a Span against string constants. My IDE (JetBrains Rider) did not support it at the time, though that was a couple of months before the release of C# 11 and .NET 7, so it's fair, I guess. By not supporting I mean the code would be underlined as an error in the editor even though it compiled, so I decided to wait on this change. One thing I did do was stop creating tokens for comments and whitespace only to ignore them afterwards.
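The Span matching mentioned above is the C# 11 feature that lets a ReadOnlySpan<char> be matched against constant strings in a switch, so no string has to be allocated for each word. A minimal sketch, assuming a hypothetical KeywordSketch.Classify helper and a made-up keyword list (neither comes from RCaron's source):

```csharp
using System;

static class KeywordSketch
{
    // Hypothetical helper: classifies a slice of source text without
    // allocating a string for it. Needs C# 11 or newer.
    public static string Classify(ReadOnlySpan<char> word) => word switch
    {
        "if" => "keyword",
        "else" => "keyword",
        "for" => "keyword",
        _ => "identifier",
    };
}
```

Calling KeywordSketch.Classify("if ($h == 2)".AsSpan(0, 2)) inspects only the first two characters of the line and never creates a substring.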

Then I restarted working on new stuff and started with classes, switch statements, and fooling around with JSON. I also started work on an extension for VSCode, which I previously thought was impossible because the OmniSharp docs for using their language server protocol library were, uh, non-existent? When I showed off a screenshot to someone, the question of "discord client when" came up and I realized it should be technically possible. And it was.

screenshot of RCaron in VSCode with a Discord bot running

Note that the extension had only TextMate syntax highlighting and no language server yet. I also reworked strings at this point in time. Implemented else if and if else, quite a bit later than I would've originally expected. I then also proceeded to redo comparison and logical operations (e.g. $h == 2 || $h == 3) and tried to have them be evaluated in the correct order. Went to sleep that day with >30% of the unit tests failing. And then the next day you're working on it and they start passing before you expect it.
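To show what "the correct order" means for something like $h == 2 || $h == 3, here is a small precedence-climbing sketch. This is a generic textbook technique, not necessarily what RCaron does. The PrecedenceSketch class, its precedence table and the flat string-token input are all assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;

static class PrecedenceSketch
{
    // Higher number binds tighter. Illustrative table, not RCaron's.
    static readonly Dictionary<string, int> Precedence = new()
    {
        ["||"] = 1, ["&&"] = 2, ["=="] = 3, ["!="] = 3,
    };

    // Turns a flat token list into a fully parenthesized string,
    // which makes the grouping chosen by the parser visible.
    public static string Group(IReadOnlyList<string> tokens)
    {
        int pos = 0;
        return Parse(tokens, ref pos, 0);
    }

    static string Parse(IReadOnlyList<string> tokens, ref int pos, int minPrec)
    {
        string left = tokens[pos++]; // an operand
        while (pos < tokens.Count
               && Precedence.TryGetValue(tokens[pos], out int prec)
               && prec >= minPrec)
        {
            string op = tokens[pos++];
            // prec + 1 makes operators of equal precedence left-associative
            string right = Parse(tokens, ref pos, prec + 1);
            left = $"({left} {op} {right})";
        }
        return left;
    }
}
```

For the tokens $h, ==, 2, ||, $h, ==, 3 the Group method returns (($h == 2) || ($h == 3)), so both comparisons are grouped before the logical or.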

Also decided to try using a year-old project to make a Discord music bot with RCaron (though only with a play command). I ran into issues with Discord.Net running events on its gateway thread, meaning that while one event was being handled, other events wouldn't fire. I also had to experiment with running an RCaron motor from RCaron and having it execute a specific code block.

Decided to start writing documentation (not making the docs website) for RCaron, and so I did. I also figured out that I had implemented the break statement wrongly. Another day I experimented a little with compiling RCaron to an expression tree (all it could do was println 1 + 1 and maybe variables). Then I continued with writing documentation and found out that I forgot to make return work with nothing to return.
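For anyone who hasn't seen expression trees, this is roughly what "println 1 + 1" amounts to when built by hand with System.Linq.Expressions. It is only a sketch: the ExpressionTreeSketch class and its method names are made up here, and RCaron's actual compiler is not shown anywhere in this post.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

static class ExpressionTreeSketch
{
    // Builds the tree for `1 + 1` and compiles it into a delegate.
    public static Func<int> BuildOnePlusOne()
    {
        BinaryExpression sum = Expression.Add(Expression.Constant(1), Expression.Constant(1));
        return Expression.Lambda<Func<int>>(sum).Compile();
    }

    // Same tree, wrapped in a call to Console.WriteLine(int).
    public static Action BuildPrintOnePlusOne()
    {
        MethodInfo writeLine = typeof(Console).GetMethod(nameof(Console.WriteLine), new[] { typeof(int) })!;
        BinaryExpression sum = Expression.Add(Expression.Constant(1), Expression.Constant(1));
        return Expression.Lambda<Action>(Expression.Call(writeLine, sum)).Compile();
    }
}
```

ExpressionTreeSketch.BuildOnePlusOne()() evaluates to 2, and invoking the delegate from BuildPrintOnePlusOne writes that result to the console.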

And now November. Implemented implicit numeric conversions when calling .NET methods because it was starting to annoy me and also learned of the magic of ArrayPool. Made an icon for RCaron, which resulted in these amazing screenshots:

VSCode with the ř icon looking funny 1 VSCode with the ř icon looking funny 2

I feel like I would've kept it like this if it didn't do that in the sidebar (2nd screenshot). The extension already had symbol highlighting at this point, and a language server which only did symbol lookup. So I proceeded to make that language server do something useful: formatting. It was less hell than I anticipated. I also made the RCaron documentation website. It didn't cover everything that existed yet, but at least I had the setup done. I also had to make prismjs highlighting for RCaron for the website. Then I continued with implementing multi-file support and somehow making stack traces.

DLRing

And then I read this long article and felt like I understood the minimum amount required to start reading further sources on compiling to an expression tree. And so, I did that. Real fun when the documentation is a PDF from 2014. Oh wait, that's the copyright date, it was written in June 2010, even better. So, I made ClassInstance implement IDynamicMetaObjectProvider with GetMember and SetMember in the DynamicMetaObject, and then used that from C#. I also set up a separate unit testing project that just uses the exact same files as the original RCaron.Tests project via <Compile Include="..\RCaron.Tests\*.cs"/>. And I started making the actual JITter. Started counting down the failing tests in my diary:

  • 27. 11. - 60(out of ~75) failing
  • 30. 11. - 56
  • 8. 12. - 48
  • 1. 1. - 37
  • 9. 1. - 35
  • 14. 1. - 32(57% passing)
  • 15. 1. - 31
  • 22. 1. - 25
  • 26. 1. - 23
  • 27. 1. - 21
  • 29. 1. - 19
  • 30. 1. - 12
  • 1. 2. - 10

And on the 4th of February, after 2+ months of work, I had all tests passing, well, except for 2 which I just made xUnit skip:

  • GateDumb, which had the goto_line statement that I used to make a loop at the very beginning
  • GetLineNumber, because I thought that you would use the interpreter when you were debugging

Because I have been a good boy for once and worked on a separate branch, I was able to make a pull request to the master branch to squash all 36 commits, which weren't following Conventional Commits, into one. The pull request was 2 666 additions and 405 deletions. Before we continue, I would just like to quote someone commenting on me experiencing the debugger ignoring my breakpoints because I had a stackalloc in my code:

bros code is so broken the debugger shlt itself
- skybird23333, 13th of January 2023

I of course ran some benchmarks so I can enjoy looking at the fruit of my unpaid labor. This was one of the 2 benchmarks running:

```rcaron
// it literally runs only 2 cycles of Fibonacci
$a = 0; $b = 1; $c = 0;
for ($i = 0; $i < 2; $i++) {
    $c = $a + $b;
    $a = $b;
    $b = $c;
}
```

This sped up almost 15 times, from 2 771.6ns to 185.4ns, and the allocations went down from 1248B to 264B. These numbers are of course for preparsed and precompiled code (wet run). The benchmark that measures the whole parsing and compilation time (dry run) went from 12 564.2ns to 863 414.6ns, so roughly 69 times slower. Amazing. As you have probably realized by now, a single person making 2 "runtimes" for a programming language they are making in their free time does not sound fun. And I remembered that you can have the expression tree be interpreted by already included APIs. So, I benchmarked that. On a dry run, the expression tree interpreter was around 4 times slower than the original RCaron interpreter, and of course with more allocations. 4 times slower was enough to make me feel the need to keep the original interpreter. And now I was stuck in the dilemma of slow, or not so slow with a painful amount of work.

The storm after the storm

And then I had the idea of sharing binders between the expression tree RCaron and the original interpreter RCaron, which would make it an acceptable amount of work. At first, it went well. I could now abandon the Horrors class of autogenerated code that just did math, and not even for every possible combination of numeric types. And then the InvokeMember binder hit. And my brain just kind of imploded. How do I call an RCaron function with the original interpreter from the binder, which is supposed to be shared between the JITter and the original interpreter? An abstract class with abstract methods, which must be implemented by the specific "runtime". Good enough. How do I do the arguments for executing said function? Well, uh, just pass the argumentTokens and callToken parameters to the already-existing FunctionCall method on the Motor. Hell. And then, how do I make the callsite for this? I can't just pass in a Func<object[], object> as the type parameter, because it takes that object[] as a single argument and not as a list of them. And dynamically generating a delegate type to be what I want just doesn't seem possible. I could use the binder without the callsite, but that leaves me with an Expression and it seems like it would be hell to handle this.
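On the delegate-type part of that problem, one API that might be worth a look is Expression.GetDelegateType, which builds a Func or Action type (or a custom delegate type if none is big enough) from a list of types, with the return type last. The sketch below only constructs such a type. The CallSiteDelegateSketch class and its ForArgCount method are hypothetical, and whether the result fits RCaron's binders is untested here.

```csharp
using System;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;

static class CallSiteDelegateSketch
{
    // Hypothetical helper: builds the delegate type
    // (CallSite, object * argCount) -> object at runtime.
    public static Type ForArgCount(int argCount)
    {
        var types = new Type[argCount + 2];
        types[0] = typeof(CallSite);      // call-site delegates take the site first
        for (int i = 1; i <= argCount; i++)
            types[i] = typeof(object);    // one slot per argument
        types[^1] = typeof(object);       // the last entry is the return type
        return Expression.GetDelegateType(types);
    }
}
```

For two arguments this yields Func<CallSite, object, object, object>. The non-generic CallSite.Create(Type, CallSiteBinder) overload accepts a delegate type built this way, though invoking the resulting site still needs reflection or a generated expression.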

And now, a bit of religious background on me growing up in the capital of the Czech Republic, Prague. Only 9.3% of Czech people declared themselves Catholic in the 2021 census, and 2.4% other Christian. Which is quite a fall from the 39.1% Catholic in 1991, after communism fell. But even when growing up irreligious you would be told that Christmas presents are given by baby Jesus (Ježíšek), that if you don't behave devils will go after you and you'll end up in hell, and that if you are good you go to heaven. And other fun stuff. You don't really believe in Christianity, but there's got to be something, right? I am now convinced hell is real. And I am creating it. Or I am just going insane and creating insanity.

Conclusion 1

Enough of that. So another day I ran the benchmarks with the expression tree interpreter against the original interpreter again, because I didn't store the results of the first run. Reassured myself that dry runs were 4 times slower, but also noticed something I hadn't before: the wet runs were faster and allocated less, going from 2 950.4ns to 887.3ns and from 1248B to 336B. So the expression tree interpreter could be viable for medium-workload scripts, while the original interpreter is best for run-once scripts. Also, the dry run for the expression tree interpreter is only 48µs, but that would still be worrying once it grows for larger scripts.

And thus, I am presented with the following choices:

  • Throw away the original RCaron interpreter and cope with the 4x slowdown
  • Do the binder sharing only partially
  • Create hell on earth and do the binder sharing fully somehow

And also, another choice for the last 2: should I keep the original interpreter as the default RCaron experience?

Conclusion 2

I will probably take a break from RCaron. And maybe check for optimizations I can do on the JITter. This experience gave me an excuse to make a blog post after 2 years. Also, if you plan to make your own programming language, just prepare yourself and don't expect too much, I guess.

Notes

  1. "expression tree interpreter"(Microsoft.Scripting.Generation.CompilerHelpers.LightCompile) does not interpret the expression tree, it actually compiles the expression tree to IL and then interprets the IL

Sources

  1. 2021 Czech Republic census data for religion