Thorsten Ball is the author of the book Writing an Interpreter in Go. I recently found out about his book and reached out to him to ask a few questions about his ideas on creating programming languages, his experience writing a book, and his approach.
Let’s meet Thorsten
Hi Thorsten, can you start by telling us a little bit about yourself?
I have seen that you have been working with Ruby for some time. In the Ruby community there is a strong interest in internal DSLs: did that play a role in getting you interested in languages?
Not really, no. I didn’t get interested in programming languages because I wanted to create my own or a DSL, but because I wanted to better understand how existing languages work and how they’re implemented. I’ll also always fall for a bit of mysticism in programming and the reputation and aura compilers and virtual machines have certainly got me hooked. Steve Yegge’s “Rich Programmer Food” did the rest in convincing me that on my personal journey to being a great programmer I can’t cheat my way around learning about parsing, lexing, interpreters and compilers.
I read you had your “lisp period”. I can relate to that: I went through a Clojure period and a Haskell one myself. Is there any other language that caught your attention lately?
There’s actually quite a few. The first one is Elixir/Erlang. As a Rails developer by day I’m intrigued by the Phoenix framework and the Erlang ecosystem and tooling. The functional aspect, the VM, the deployment story, the stability: that all sounds too good to be true. I already spent some time working through the Elixir book, but I haven’t done any “serious” programming with it. Then there’s Rust, which has been on my list for at least a year now and it’s high time I finally learn it for good. I also think I want to learn one language from the ML family, just to broaden my horizons. I’m currently flirting a bit with OCaml by working through the excellent “Real World OCaml” book. And then there’s Forth, which I want to finally grok, Clojure, which I tried to learn last year but still want to use for a “real” project, and a bunch of frontend languages that sound really interesting: Elm, PureScript and TypeScript.
Let’s ignore for a moment the constraints of your daily job: what is the programming language you would use all the time if you could? Would you like to use more Go or Lisp or something else?
Phew! That’s a good question and I don’t think I have a clear answer. Ruby certainly feels like home to me, just because I’ve been using it for so long and can practically write it in my sleep. Go is not a perfect language and certainly has a few downsides, but it also has great ergonomics: really, really great tooling, easy deployment and it’s fast. And then there’s Lisp, which is certainly a temptation, just because I love it so much and it has a special place in my heart, but I’m not all too experienced with using it on a daily basis. In the end I have to say that my perfect language would need to have first-class functions, great tooling, a REPL, a type system that can help you, a testing culture, a thriving, welcoming community and enough libraries/frameworks to be useful… I haven’t found it yet. If you have, please tell me. Until I do, I’ll use anything that comes closest to that.
Federico: it seems you have just described Kotlin…
About building languages
Which languages do you enjoy writing more? Small DSLs for developers? Or something more tailored to end users?
I haven’t built a language for end users and production use yet, even though some of the Ruby metaprogramming I did would certainly count as a DSL. I rather build to better understand how something works. But I think in the future the skills I’ve acquired by writing “Writing An Interpreter In Go” and building interpreters along the way will certainly come in handy. That could be when parsing configuration files, building a custom query language or just modifying something existing.
What is your opinion on tool support: do you think Notepad is enough or should every language come with a full-blown IDE? I see that in your book you discuss building a REPL: do you think it is a critical component for supporting a language?
Oh boy, I’ve certainly stepped on some toes in the past when talking about IDEs, so here’s my attempt to approach this topic with the necessary tactfulness. First of all: I’m not a huge fan of IDEs. I think they hide a lot of complexity, not by abstracting it away, but just by stuffing it into some third-level context menu. And as soon as something goes wrong, you’re sitting there, wondering “what the heck just happened?”. In the worst case, they give you some buttons to push to make some other buttons change their color, without you knowing what’s happening. But in the best case, combined with the right language, they can offer tremendous leverage and power. Compared to that a Vim user (which I am!) would look like a caveman rubbing sticks together.
That being said, I don’t think a full-blown IDE is necessary. But great tooling certainly is. I’m thinking of CLI tools here, primarily. It’s what can make or break a language. Just look at Go. On paper it’s the most boring language, but in practice, people are ecstatic about the great tooling it provides and how much easier it makes their daily work.
A REPL can be one of those great tools. It’s not a silver bullet, but it can make a language much more approachable and usable. I personally love working with REPLs, since they allow me to get feedback really fast and give me a place to sketch out ideas real quick. If I don’t have a REPL I need something else that can give me that.
About Writing an Interpreter in Go
You chose to write your interpreters using Go. Go is a language I am not familiar with, can you help us understand which characteristics of the language made you choose it?
Go is a really small language with a great standard library. It’s easy to understand and really easy to read. If you’ve used another “curly braces language” before, you can read Go code. Combined with the fact that Go code tends to be rather simple and not use many advanced language constructs (because there aren’t any in Go! Besides channels…), this makes Go the perfect choice as a teaching language.
My goal with the book was to make the topic of interpreters as approachable as possible and Go helps with that. You can understand the code in the book and translate it to your favorite language of choice, even if you’re not an advanced Go user.
Do you think writing an interpreter with another language, like Ruby, would be very different?
Besides performance and tooling, I don’t think writing an interpreter in Ruby would be vastly different from using Go. But using Ruby as the language in the book would’ve certainly made things much more interesting. It would be fewer lines of code, sure, but languages with metaprogramming capabilities like Ruby also make it easy to write code that no one else can understand, let alone translate to their favorite language.
You chose the approach of writing your own lexer and parser, instead of using a parser generator like ANTLR. Why? Was it more to satisfy your curiosity or do you think there are benefits in writing your own lexer and parser?
That was a conscious choice I made. Before writing the book I was really frustrated by other resources on the topic taking the “let’s just use a third party tool” approach. I’m here to learn, damn it! Show me the code, show me the parser! So I chose to implement a full Pratt parser, step by step, in the book. That way the reader can always use another tool later on, but at least they’ll know which problems it solves, since they solved them by hand before. It’s easy to say “oh, parsing is a solved problem, just use yacc/ANTLR/etc”, but if your goal is to learn, why should you skip learning about parsers?
You wrote a book specifically about interpreters: can you tell us when you think it is better to build an interpreter instead of a compiler?
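For readers who have not seen the technique, here is a minimal sketch of Pratt-style (top-down operator precedence) parsing in Go. It is not the parser from the book: the toy lexer, the `precedences` table and the `Parse` function are assumptions made for illustration, there is no error handling, and the "AST" is just a fully parenthesized string so the grouping is visible.

```go
package main

import (
	"fmt"
	"unicode"
)

// tokenize is a toy lexer: integer literals and single-character tokens.
func tokenize(input string) []string {
	var tokens []string
	runes := []rune(input)
	for i := 0; i < len(runes); {
		switch r := runes[i]; {
		case unicode.IsSpace(r):
			i++
		case unicode.IsDigit(r):
			start := i
			for i < len(runes) && unicode.IsDigit(runes[i]) {
				i++
			}
			tokens = append(tokens, string(runes[start:i]))
		default:
			tokens = append(tokens, string(r))
			i++
		}
	}
	return tokens
}

// precedences gives each infix operator its binding power. Tokens that are
// not in the map (")" or end of input) get 0 and therefore stop the loop.
var precedences = map[string]int{"+": 1, "-": 1, "*": 2, "/": 2}

// prefixPrecedence binds tighter than any infix operator.
const prefixPrecedence = 3

type parser struct {
	tokens []string
	pos    int
}

func (p *parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

func (p *parser) next() string {
	tok := p.peek()
	p.pos++
	return tok
}

// parseExpression parses a prefix position first, then keeps absorbing
// infix operators for as long as they bind tighter than the precedence
// of the caller.
func (p *parser) parseExpression(precedence int) string {
	var left string
	switch tok := p.next(); tok {
	case "-": // prefix minus
		left = "(-" + p.parseExpression(prefixPrecedence) + ")"
	case "(": // grouped expression
		left = p.parseExpression(0)
		p.next() // consume ")"
	default: // integer literal
		left = tok
	}
	for precedences[p.peek()] > precedence {
		op := p.next()
		right := p.parseExpression(precedences[op])
		left = "(" + left + " " + op + " " + right + ")"
	}
	return left
}

// Parse returns the input with explicit parentheses around every operation.
func Parse(input string) string {
	p := &parser{tokens: tokenize(input)}
	return p.parseExpression(0)
}

func main() {
	fmt.Println(Parse("1 + 2 * 3 - 4")) // ((1 + (2 * 3)) - 4)
	fmt.Println(Parse("-(1 + 2) * 3"))  // ((-(1 + 2)) * 3)
}
```

The whole precedence logic lives in one loop condition, which is a large part of why the technique is pleasant to write by hand.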
If you’re out to learn about programming languages, interpreters and compilers, I think you should start by writing an interpreter. They’re generally easier to understand and easier to write. Later on, you can always extend the interpreter and even turn its tree-walking mechanism into a compilation step. Then you can compile to bytecode or to native machine code. Or you can build lazy-evaluation into the interpreter, or you can try your hand at JITing. And then the line between interpreters and compilers starts to blur anyway. And if you’re starting out, I think it’s important to get results quickly – to keep the motivation up – and that’s easier to do with an interpreter than, say, a compiler compiling C to x86 machine code.
If someone was about to write several languages and several interpreters, how much code do you think could be reused? How much of the code presented in your book is general and reusable?
There’s a separation of concerns in most programming language implementations that feels almost natural: there’s a parsing part, one or more independent internal representations, and components that either translate between these representations or evaluate them. Of course, the details differ between implementations, but it’s safe to say that most follow this separation. That makes it easy to reuse certain parts. As an example, I took the Monkey interpreter presented in the book and turned it into a bytecode compiler and virtual machine, without touching the lexer, the parser, the AST or the tokens at all. All I did was switch out the evaluation process with a compiler that emits bytecode and added a virtual machine that interprets this new internal representation, the bytecode. So in general, there are a lot of possibilities to reuse that type of code. That’s why projects like LLVM exist. Of course, the final decision – whether to reuse something or start from scratch – has to be made on an individual basis.
The code presented in the book is not meant for production use and is not even the most elegant/performant/reusable code. That wasn’t the goal. The goal was to make the code as easy to understand and readable as possible. I won’t guarantee what happens when you try to use it in production 🙂
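The reuse described in this answer, one AST shared by two back ends, can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the Monkey implementation: the `Num` and `BinOp` node types, the `Instr` bytecode format and the functions `Eval`, `Compile` and `Run` are all hypothetical names invented for this example.

```go
package main

import "fmt"

// Node is the shared internal representation: a tiny AST.
type Node interface{}

// Num is an integer literal.
type Num struct{ Value int }

// BinOp applies Op ('+', '-' or '*') to two sub-expressions.
type BinOp struct {
	Op          byte
	Left, Right Node
}

// apply performs one arithmetic operation; both back ends share it.
func apply(op byte, l, r int) int {
	switch op {
	case '+':
		return l + r
	case '-':
		return l - r
	case '*':
		return l * r
	}
	panic("unknown operator")
}

// Eval is back end 1: a tree-walking evaluator.
func Eval(n Node) int {
	switch n := n.(type) {
	case Num:
		return n.Value
	case BinOp:
		return apply(n.Op, Eval(n.Left), Eval(n.Right))
	}
	panic("unknown node")
}

// Instr is one instruction for a stack machine: Op 'p' pushes Arg,
// any other Op pops two values and pushes the result.
type Instr struct {
	Op  byte
	Arg int
}

// Compile is back end 2, part one: it flattens the same AST into bytecode.
func Compile(n Node) []Instr {
	switch n := n.(type) {
	case Num:
		return []Instr{{Op: 'p', Arg: n.Value}}
	case BinOp:
		code := append(Compile(n.Left), Compile(n.Right)...)
		return append(code, Instr{Op: n.Op})
	}
	panic("unknown node")
}

// Run is back end 2, part two: a virtual machine that executes the bytecode.
func Run(code []Instr) int {
	var stack []int
	for _, in := range code {
		if in.Op == 'p' {
			stack = append(stack, in.Arg)
			continue
		}
		r, l := stack[len(stack)-1], stack[len(stack)-2]
		stack = append(stack[:len(stack)-2], apply(in.Op, l, r))
	}
	return stack[len(stack)-1]
}

func main() {
	// 1 + 2 * 3, built by hand; a lexer and parser would normally produce it.
	ast := BinOp{'+', Num{1}, BinOp{'*', Num{2}, Num{3}}}
	fmt.Println(Eval(ast), Run(Compile(ast))) // 7 7
}
```

Neither `Eval` nor `Compile` knows how the tree was produced, so the front end can stay untouched when the back end is swapped.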
It seems that “Structure And Interpretation Of Computer Programs” had quite an influence on you. Who should read this book? Any other book you want to recommend?
Yes, it’s a fantastic book. If you’re a self-taught developer and want to get a solid computer science foundation on your own, or if you want to know why someone would call Lisp/Scheme beautiful: read this book! You can’t go wrong doing it. It asks a lot from you, the exercises are incredibly tough and the pace is high, but you also get a lot back.
I can talk about and recommend books all day, but if you want to dig deep and build a foundation for your understanding of computers, here’s my top three: the Code book by Charles Petzold, Programming From The Ground Up and The Elements of Computing Systems (From Nand to Tetris). And then there’s the fantastic The Soul Of A New Machine as a companion.
About writing a book
How was your experience writing the book?
Fun, despair, weeks of productivity and motivation, weeks of frustration: all of it. It took me nearly a year to write the book, to go from “I want to do this” to “Wow, it’s out there now!”, and of course I went through ups and downs. But all in all, it was a very rewarding experience. I learned a ton (about the topic at hand, of course, but also a lot about myself) and I’m incredibly proud of what I did.
I read with interest your blog post “What I didn’t do to write a book”. It seems a very human approach to writing a book. Do you have any other advice you would like to share with someone trying to write a book for the first time?
Everyone’s path is different. You need to be completely honest with yourself and find out what works for you. But if there’s one general piece of advice I would give, it’s the one I got from a friend of mine: don’t start yak shaving, start writing. Don’t obsess over the file format, over whether you use gitbooks or leanpub or emacs or vim, just start to write, try to create something. You can always reformat it or switch providers later on.
Many authors of technical books struggle with promoting their book. What are you doing to let people know about your book? What worked for you and what did not?
I haven’t done any “classic” advertisement yet: I didn’t spend any money on ads. All I did was to set up a mailing list on which people can sign up to get notified when the book is released. And then I just tried to write blog posts around the topic, to get people interested. But I don’t want to write clickbait articles. I want real content that can convince a reader that I know what I’m talking about and that I can write. And if they are so inclined, they can buy my book. I shared these blog posts on Reddit, Hacker News and Twitter and, well, the response has been quite good. I think being honest and not trying to trick people into buying something they don’t want works best.
What is the best way to follow you?
Probably @thorstenball on Twitter. Then there’s my blog at thorstenball.com, where you can subscribe to the feed and also subscribe to my mailing list on which I send out occasional updates about the book or new blog posts.
Do you plan to give talks in the near future?
Nothing planned yet, but I’m sure I’ll talk at one or two user groups. I’m currently working on a print version of the book and want to extend it some more.
Federico: It was really nice for me to talk with someone who shares the passion for languages. I very much liked his curiosity and his interest in understanding the mechanisms behind parsers, interpreters and compilers. I also got some nice ideas about what to read next. He has also convinced me to include a REPL in the next languages I am going to build. I hope you had fun too. If you like interviews, there are a few you could take a look at:
- Interview to Vaclav Pech on Jetbrains MPS: the community and the future
- Interview with Jan Köhnlein on TypeFox, DSLs and Xtext
- Interview to Erik Dietrich on Static Analysis and a data driven approach to refactoring