1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
|
---
title: A Frontend Programmer's Guide to Languages
route: /programming-languages
date: 2019-04-13
description: Today, we're going to make a programming language. No, it won't take a PhD, or years of your time (or even days for that matter). You and I, together, are going to learn what a programming language is and actually build one that can evaluate programs.
---
Today, we're going to make a programming language.
No, it won't take a PhD, or years of your time (or even days for that matter). You and I, together, are going to learn what a programming language _is_ and actually build one that can evaluate programs.
By the end of this post we'll have written a JavaScript function, `evaluate`, which interprets a small programming language with strings and variables. You won't be going off and rewriting your stack in it (though you're certainly welcome to try!), but I hope it's a fun exercise.
I encourage you to copy the code examples in your editor of choice and actually run them. You should be able to make it through this post in a single listen of [Lights - Little Machines](https://open.spotify.com/album/1u8OmwItT46Y1gD2xKAK9D?si=CcY4GgT4TGyOa7eNcnLqTA).
## Hello, world!
To kick things off, we'll write an interpreter for a new HelloWorld language, and use it write a [Hello, world! program](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program).
Here's our goal:
```js
const program = { type: "HelloWorld" }
console.log(evaluate(program))
// => Hello, world!
```
I'm sure you have questions. _Real quick_, let's introduce some terms that we'll more clearly define as we go.
- Our "**program**" is represented by the JavaScript object `{ type: "HelloWorld" }`. Throughout this post, we'll be adding more and more things to this object. For now, there's not much to it.
- Our "**evaluator**" (or "interpreter" - it's your call) is a single JavaScript function that accepts a "**program**."
- We receive a "**value**" from our "**evaluator**," which we send to the wonderful `console.log` function you're all familiar with.
Now let's build our **evaluator**.
```js
function evaluate(node) {
switch (node.type) {
case "HelloWorld":
return "Hello, world!"
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
```
This function is not terribly interesting, but I'd like to call out a few details.
- Our function accepts a parameter that we call `node` as in, a **node** of a tree. (_dramatic foreshadowing_)
- The crux of this function is a single `switch` statement that operates on the `type` field of our node. Our language is very simple, so we only have a single "node type" (namely, "HelloWorld").
- For the "HelloWorld" **expression** - we return the string "Hello, world!"
- If we see something we don't recognize - we throw an error. The programmer messed up!
At this point we have 8 lines of code that evaluate a simple HelloWorld language. I'll emphasize the _simple_ here, there are no variables, no loops, no modules, not even numbers - but it's a language. Our language has a very small **grammar**, but it's a language.
Let's make things more interesting.
## Strings
In this section we'd like to make a new `HelloStrings` language (marketing wants us to be able to print more than just "Hello, world!") that can produce strings and run two operations on them.
Here's our goal for this section:
```js
console.log(
evaluate({
type: "String",
content: "Apple",
})
)
// => Apple
console.log(
evaluate({
type: "Excite",
expression: {
type: "String",
content: "Banana",
},
})
)
// => Banana!
console.log(
evaluate({
type: "Append",
first: {
type: "String",
content: "Apple",
},
second: {
type: "Excite",
expression: {
type: "String",
content: "Banana",
},
},
})
)
// => AppleBanana!
```
Some initial observations:
- We ditch our "HelloWorld" **expression** in favor of "String," which has a "content" field in addition to the "type" field we used in the previous section.
- We introduce "Excite" which adds exclamation points to expressions.
- We introduce "Append," which also contains expressions in its definition - two of 'em in fact!
Let's start off by creating an **evaluator** to operate on "String" expressions.
```js
function evaluate(node) {
switch (node.type) {
case "String":
return node.content
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
```
Great, so we replace "HelloWorld" with "String" as the node's type - and return its "content" value instead of the string "Hello, world!"
**_You just wrote a programming language with strings, let's celebrate!_**
But if we continue with our desired goal above, we see the following:
```
Apple
evaluate -- unknown node type Excite
```
We can evaluate `{ type: "String", content: "Apple" }` no problem, but we don't know what to do with this "Excite" thing just yet.
So how might we evaluate "Excite" expressions? Based on how it produces "Banana!" in our desired output, we may be inclined to say that "Excite" _takes a string and adds an exclamation point to it_. Simple enough, right? But let's take a closer look.
```js
{
type: "Excite",
expression: {
type: "String",
content: "Banana"
}
}
```
The "expression" field of our Excite expression node isn't a string per se, but an expression in the HelloStrings language! **Our expression contains an expression inside of it!** (Are you beginning to see why our evaluator accepts a `node` as in "nodes of a tree"?)
To illustrate this further, allow me to propose the following, _valid_ program for the HelloStrings language.
```js
console.log(
evaluate({
type: "Excite",
expression: {
type: "Excite",
expression: {
type: "String",
content: "Banana",
},
},
})
)
```
In the above example, we'll evaluate the string "Banana" - excite it to get "Banana!" - than excite _that_ to get "Banana!!" (how very exciting!)
Let's begin adding Excite expressions to our language.
```js
function evaluate(node) {
switch (node.type) {
case "String":
return node.content;
case "Excite":
// We have this node.expression thing, but what do
// we do with it?
???
default:
throw `evaluate -- unknown node type ${node.type}`;
}
}
```
Rather than just returning a "value" like we did for String expressions (`return node.content`), we'll need to use `node.expression` somehow. As the name suggests, `node.expression` is an expression, and what do we do with expressions?
We evaluate them!
```js
case "Excite":
return evaluate(node.expression) + "!";
```
`evaluate` is now a recursive function which operates on String and Excite expressions. Here's the function in all its glory and an invitation to implement `Append` yourself.
```js
function evaluate(node) {
switch (node.type) {
case "String":
return node.content
case "Excite":
return evaluate(node.expression) + "!"
case "Append":
// Now it's your turn, how might we evaluate
// Append expressions?
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
```
## Variables
At this point we have a small language which can produce strings with "Excite" and "Append" statements. Nice job if you've made it this far!
Let's use this momentum and expand our language to support a very popular language construct - **variables**.
By the end of this section, we'll have a language which can declare and retrieve the value of variables in HelloStrings expressions.
```js
console.log(
evaluate(
{
type: "Let",
name: "x",
value: {
type: "String",
content: "Hello, world",
},
expression: {
type: "Excite",
expression: {
type: "Variable",
name: "x",
},
},
},
{}
)
)
// => Hello, world!
```
There's a lot there, so let's break it down.
- `evaluate` now takes a second argument (_dramatic foreshadowing_)
- We introduce a new "Variable" expression, which will **lookup** a variable by its name.
- We introduce a "Let" expression, which makes a new variable with a "name" and a "value" (the value being an expression in our language!), and uses that when evaluating an inner **expression**.
In our case, we're setting "x" to the evaluation of a "Hello, world" String, then retrieving the value of "x" to Excite it.
Let's start by adding "Variable" expressions to `evaluate`.
```js
function evaluate(node) {
switch (node.type) {
case "Variable":
// We have `node.name`, but what to do with it?
...
}
}
```
Consider the following code:
```js
evaluate({ type: "Variable", name: "x" })
```
Here we want to evaluate the variable `x`, but there doesn't seem to be much to work with. In a typical programming language, what we might we do with a variable?
We look it up!
```js
function evaluate(node) {
switch (node.type) {
case "Variable":
return lookup(node.name);
...
}
}
```
In order for `lookup` to do anything meaningful, we need to provide it with some sort of **environment** - a collection of variable names and values. We'll call this `env`.
```js
function evaluate(node) {
switch (node.type) {
case "Variable":
return lookup(env, node.name);
...
}
}
```
And where does `env` come from? Rather unfortunately we can't make it out of thin air (believe me, I tried for many hours), but we _can_ **pass it in to our evaluator.**
```js
function evaluate(node, env) {
switch (node.type) {
case "Variable":
return lookup(env, node.name)
case "String":
return node.content
case "Excite":
return evaluate(node.expression, env) + "!"
case "Append":
// Left as an exercise to the reader
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
```
_Remember to add `env` to our recursive calls in Excite and Append (you did implement Append, right?)_
Let's define `lookup`. For simplicity's sake, we'll say an environment is a JavaScript object with `name` keys and `value` values.
```js
function lookup(env, name) {
return env[name]
}
```
Simple, right?
```js
console.log(evaluate({ type: "Variable", name: "x" }, { x: "Hello, world!" }))
// => Hello, world!
```
**Congratulations you just added variables to your language!**
Last but not least, we'll want an easy way to create new variables. Without this, we'll be evaluating Excite's and Append's with nothing more than a bunch of global variables - and the marketing department _definitely_ doesn't want that.
Let's start by listing what a "Let" expression actually contains:
```js
{
type: "Let",
name: "x",
value: {
type: "String",
content: "Hello, world"
},
expression: {
type: "Excite",
expression: {
type: "Variable",
name: "x"
}
}
}
```
- A "Let" type
- A "name" field for naming our new variable
- A "value" field for giving our new variable a value
- An "expression" field to use our new variable in!
Excellent, now let's get started on implementing the thing.
```js
function evaluate(node, env) {
switch (node.type) {
case "Variable":
return lookup(env, node.name);
case "Let":
let newEnv = ???
return evaluate(node.expression, newEnv);
...
}
}
```
We'll be **adding a new variable to our environment**, and using that to evaluate our expression. For simplicity's sake, let's give ourselves an `extendEnv` function to do this.
```js
function evaluate(node, env) {
switch (node.type) {
case "Variable":
return lookup(env, node.name);
case "Let":
let name = ???
let value = ???
let newEnv = extendEnv(env, name, value)
return evaluate(node.expression, newEnv);
...
}
}
```
`name` is simple, that's just `node.name`. For `value`, however, we'll be given a _HelloStrings expression_ to evaluate.
```js
{
type: "Let",
name: "x",
value: {
type: "String",
content: "Hello, world"
},
...
}
```
Still, it's not much code :)
```js
function evaluate(node, env) {
switch (node.type) {
case "Variable":
return lookup(env, node.name);
case "Let":
let name = node.name;
let value = evaluate(node.value, env);
let newEnv = extendEnv(env, name, value);
return evaluate(node.expression, newEnv);
...
}
}
```
Finally, `extendEnv`.
Our `env` is a simple JavaScript object that maps `names` to `values`. We'll need to extend an `env` with a _new_ name and value pair. We can do so with `Object.assign`:
```js
function extendEnv(env, name, value) {
let envAddition = {}
envAddition[name] = value
return Object.assign({}, env, envAddition)
}
```
Or, using some of the latest and greatest JavaScript features:
```js
function extendEnv(env, name, value) {
return {
...env,
[name]: value,
}
}
```
Putting it all together, the combination of `lookup`, `extendEnv`, and `evaluate` completes the language.
```js
function lookup(env, name) {
return env[name]
}
function extendEnv(env, name, value) {
return {
...env,
[name]: value,
}
}
function evaluate(node, env) {
switch (node.type) {
case "String":
return node.content
case "Excite":
return evaluate(node.expression, env) + "!"
case "Append":
// Left as an exercise to the reader
case "Variable":
return lookup(env, node.name)
case "Let":
let inner = node.expression
let value = evaluate(node.value, env)
let newEnv = extendEnv(env, node.name, value)
return evaluate(node.expression, newEnv)
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
```
## Closing notes and things to ponder
If you made it to the end of this post, congratulations and thank you :)
We developed a language and an evaluator that processes [Abstract Syntax Trees](https://en.wikipedia.org/wiki/Abstract_syntax_tree) (or ASTs for short, but I just called 'em nodes for simplicity). Our language has [constants](https://en.wikipedia.org/wiki/Constant_(computer_programming), some small expressions to operate on strings, and [variables](https://en.wikipedia.org/wiki/Variable_(computer_science).
But this is only the beginning! You are ending this article with an `evaluate` function that can grow to support all sorts of common language features (numbers, comments, functions), the only limit is your imagination.
I'd like to leave you with a few concrete exercises, should you wish to revisit your new language on a rainy day:
- How might we add Numbers to our language? How about operations to Add and Multiply them? What about _rational_ numbers (fractions with a numerator and a denominator)?
- As we saw earlier, we can kickstart our `env` by populating it with global values. Could we add functions to this environment? How might we call them from our language?
- Can we take our abstract syntax tree and use it to generate JavaScript code? How about Ruby code?
- Writing our programs in as JavaScript objects doesn't seem to scale well. How might we go about creating a **syntax** for our language and converting that to an AST?
I'll also go ahead and plug some of my favorite resources on Programming Languages.
- Dan Grossman's [Programming Languages on Coursera](https://www.coursera.org/learn/programming-languages) is an entirely free, entirely amazing three-part course for learning the ins and outs of different types of programming languages. I can't recommend it enough.
- This article is inspired in no short part by [@jamiebuilds's super tiny compiler](https://github.com/jamiebuilds/the-super-tiny-compiler). I encourage you to check out the code to see some of the concepts introduced in this article in more detail.
- [Crafting Interpreters](https://craftinginterpreters.com/) is an effort to build "a handbook for making programming languages." An ambitious goal, but it's completely free and chock full of different interpreter concepts.
That's all for now. If you liked this article feel free to share it and follow me on Twitter while you're at it: [@jdan](https://twitter.com/jdan).
You should also come work with me at [Stripe](https://stripe.com/jobs). We're hiring all over the place.
Thanks again for reading,<br />
Jordan
|