Chapter 10: The AST and Macros

Where magic turns into science.


I've talked about many macros in this book.

There's a macro for this and a macro for that.

You use macros for defining stuff, for making types and functions and lists, for doing pattern-matching, and for control-flow.

There's a macro for everything.
Yet I still haven't shown you how a macro gets defined.

Quiet your mind, young grasshopper. You're about to be enlightened.

But first, you need to learn a few things.

The AST

AST stands for Abstract Syntax Tree.

An AST is a tree-shaped representation of the syntax of a program. Compilers use it to analyze the source-code (for example, by type-checking it) and then to generate the binary/byte-code output.

You might think that's none of your business.
Only compiler writers have to worry about that stuff, right?

Oh, you have much to learn, young grasshopper.

You see, the power of macros lies in the fact that (to some extent) users of the language can play the role of language designers and implementers.

Macros allow you to implement your own features in the language and to have them look and feel just like native features.

I mean, beyond the native syntax for writing numbers, text, tuples, variants and records, every single thing you have written so far has been macros.

Module statements? Yep, macros.

Definition statements? Yep, macros.

Function expressions? Yep, macros.

And you'd have never suspected those weren't native Lux features had I not told you they were macros.

Now, just imagine making your own!

But macros work with the Lux AST, so that's the first thing you need to master.

Check it out:

(type: (Meta m v)
  {#meta m
   #datum v})

(type: Cursor
  {#module Text
   #line Int
   #column Int})

(type: (AST' w)
  (#BoolS Bool)
  (#IntS Int)
  (#RealS Real)
  (#CharS Char)
  (#TextS Text)
  (#SymbolS Ident)
  (#TagS Ident)
  (#FormS (List (w (AST' w))))
  (#TupleS (List (w (AST' w))))
  (#RecordS (List [(w (AST' w)) (w (AST' w))])))

(type: AST
  (Meta Cursor (AST' (Meta Cursor))))

These types are all in the lux module.

The AST type is the one you'll be interacting with, but all it does is wrap (recursively) the incomplete AST' type, giving it some meta-data to know where each AST node comes from in your source-code.

The real magic is in the AST' type, where you can see all the alternative syntactic elements.
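
If the nesting of Meta, Cursor and AST' feels abstract, a small model may help. The sketch below is written in Python rather than Lux, and every name in it (Cursor, Node, strip_cursors, the "example" module) is an illustrative assumption, not part of the Lux standard library. It only shows the shape of the idea: every node carries a cursor plus one of the AST' alternatives, and the compound alternatives hold more nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cursor:
    """Where a node came from: module name, line and column."""
    module: str
    line: int
    column: int


@dataclass(frozen=True)
class Node:
    """One AST node: the meta-data plus a tagged payload (the AST' part)."""
    cursor: Cursor
    tag: str       # e.g. "IntS", "SymbolS", "FormS", "TupleS", "RecordS"
    value: object  # a literal, a (prefix, name) pair, or nested nodes


def strip_cursors(node):
    """Drop the meta-data, leaving a nested (tag, value) structure."""
    if node.tag in ("FormS", "TupleS"):
        return (node.tag, [strip_cursors(child) for child in node.value])
    if node.tag == "RecordS":
        return (node.tag, [(strip_cursors(k), strip_cursors(v))
                           for k, v in node.value])
    return (node.tag, node.value)


# A hand-built model of the form: (map inc [1 2])
here = Cursor("example", 1, 0)  # hypothetical source location
form = Node(here, "FormS", [
    Node(here, "SymbolS", ("", "map")),
    Node(here, "SymbolS", ("", "inc")),
    Node(here, "TupleS", [Node(here, "IntS", 1), Node(here, "IntS", 2)]),
])
shape = strip_cursors(form)
```

Building even this tiny form by hand takes a cursor for every node, which is exactly the tedium that the tools later in this chapter remove.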

Most of it is self-explanatory, but you may not recognize #SymbolS.
A symbol is lisp-speak for what is called an identifier in most other programming languages.
map is a symbol, as is lux/data/struct/list;reverse.
They are the things we use to refer to variables, types, definitions and modules.

The Ident type (from the lux module) is just a [Text Text] type.
The first part holds the module/prefix of the symbol/tag, and the second part holds the name itself. So lux/data/struct/list;reverse becomes ["lux/data/struct/list" "reverse"], and map becomes ["" "map"].

list;reverse would become ["lux/data/struct/list" "reverse"] anyway, because aliases get resolved prior to analysis and macro expansion.
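
To make the prefix/name split concrete, here is a rough Python model of it. The function parse_ident and the alias table are hypothetical helpers for illustration only. The real compiler does this work for you, and its rules may cover more cases than this sketch does.

```python
def parse_ident(text, aliases=None):
    """Split 'prefix;name' into a (prefix, name) pair, resolving module aliases.

    `aliases` maps a short alias to the full module name (an assumed,
    simplified stand-in for what the compiler tracks per module).
    """
    aliases = aliases or {}
    prefix, separator, name = text.rpartition(";")
    if not separator:
        return ("", text)  # unprefixed symbol: empty prefix
    return (aliases.get(prefix, prefix), name)


aliases = {"list": "lux/data/struct/list"}
full = parse_ident("lux/data/struct/list;reverse")
short = parse_ident("list;reverse", aliases)
bare = parse_ident("map")
```

Both the fully-qualified and the aliased spelling end up as the same pair, which is why a macro never needs to care how the user wrote the name.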

Forms are (syntactic structures delimited by parentheses), and tuples are [syntactic structures delimited by brackets].
Records {#have lists #of pairs} of ASTs instead of single ASTs, because everything must come in key-value pairs.

Quotations

We know everything we need to extract information from the AST type, but how do we build AST values?

Do we have to build it with our bare hands using variants and tuples?

That sounds... exhausting.

Well, we don't have to. There are actually many nice tools for making our lives easier.

One nice resource within our reach is the lux/macro/ast module, which contains a variety of functions for building AST values, so we don't have to worry about cursors and variants and all that stuff.

But, even with that, things would get tedious.
Imagine having to generate an entire function definition (or something even larger) by calling a bunch of functions for every small thing you want.

Well, don't fret. The Lux Standard Library already comes with a powerful mechanism for easily generating any code you want, and you don't even need to import it (it's in the lux module).

## Quotation as a macro.
(' "YOLO")

Quotation is a mechanism that allows you to write the code you want to generate, and then builds the corresponding AST value.

The ' macro is the simplest version, which does exactly what I just described.

This would turn the text "YOLO" into [{#;module "" #;line -1 #;column -1} (#;TextS "YOLO")].
If you want to know what that would look like with the tools at lux/macro/ast, it would be: (text "YOLO").

The beautiful thing is that (' (you can use the #"'" #macro [to generate {arbitrary ASTs} without] worrying (about the "complexity"))).
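
If you're curious what a quoting tool has to do on your behalf, here is a toy model in Python (not Lux). It assumes a made-up encoding where Python strings stand for text, a hypothetical Sym class marks symbols, tuples stand for forms and lists stand for tuples. Every node gets the same dummy cursor seen in the "YOLO" result above.

```python
DUMMY_CURSOR = ("", -1, -1)  # module, line, column


class Sym(str):
    """Marks a string as a symbol rather than a text literal."""


def quote(code):
    """Turn a nested Python value into a (cursor, (tag, payload)) node."""
    if isinstance(code, bool):  # must be checked before int
        return (DUMMY_CURSOR, ("BoolS", code))
    if isinstance(code, int):
        return (DUMMY_CURSOR, ("IntS", code))
    if isinstance(code, float):
        return (DUMMY_CURSOR, ("RealS", code))
    if isinstance(code, Sym):  # must be checked before str
        return (DUMMY_CURSOR, ("SymbolS", ("", str(code))))
    if isinstance(code, str):
        return (DUMMY_CURSOR, ("TextS", code))
    if isinstance(code, tuple):
        return (DUMMY_CURSOR, ("FormS", [quote(item) for item in code]))
    if isinstance(code, list):
        return (DUMMY_CURSOR, ("TupleS", [quote(item) for item in code]))
    raise TypeError(f"cannot quote {code!r}")


yolo = quote("YOLO")
call = quote((Sym("f"), [1, True]))  # models the form: (f [1 true])
```

The point of the sketch is only that quotation walks the code you wrote and wraps each piece in the right variant, so you never spell out cursors or tags yourself.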

## Hygienic quasi-quotation as a macro. Unquote (~) and unquote-splice (~@) must also be used as forms.
## All unprefixed macros will receive their parent module's prefix if imported; otherwise will receive the prefix of the module on which the quasi-quote is being used.
(` (def: (~ name)
     (lambda [(~@ args)]
       (~ body))))

This is a variation on the ' macro that allows you to do templating with the code you want to generate.

Everything you write will be generated as is, except those forms which begin with ~ or ~@.

~ means: evaluate this expression and use its AST value.

~@ means: the value of this expression is a list of ASTs, and I want to splice all of them in the surrounding AST node.
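
The difference between the two is easiest to see side by side. Below is a simplified Python model (not Lux) in which forms and tuples are both collapsed into plain lists, and the hypothetical markers Unquote and Splice play the roles of ~ and ~@. The names fill, name, args and body are assumptions made for the sketch.

```python
class Unquote:
    """Stands in for (~ expr): insert one already-built node."""
    def __init__(self, value):
        self.value = value


class Splice:
    """Stands in for (~@ expr): insert every node of a list, in place."""
    def __init__(self, values):
        self.values = values


def fill(template):
    """Expand a nested-list template, honouring Unquote and Splice markers."""
    if isinstance(template, Splice):
        raise ValueError("a splice needs a surrounding node to splice into")
    if isinstance(template, Unquote):
        return template.value
    if isinstance(template, list):
        out = []
        for item in template:
            if isinstance(item, Splice):
                out.extend(item.values)  # many nodes, flattened into the parent
            else:
                out.append(fill(item))   # exactly one node
        return out
    return template  # everything else is generated as is


name = "inc-all"
args = ["xs", "ys"]
body = ["zip", "xs", "ys"]
template = ["def:", Unquote(name), ["lambda", [Splice(args)], Unquote(body)]]
code = fill(template)
```

Notice that body arrives as a single nested node, while the two entries of args land directly inside the surrounding brackets.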

With these tools, you can introduce a lot of complexity and customization into your code generation, which would be a major hassle if you had to build the AST nodes yourself.

You may be wondering what "hygienic" means in this context.
It just means that if you use any symbol in your template which may refer to an in-scope definition or local variable, the symbol will be resolved to it.

Any symbol that does not correspond to any known in-scope definition or variable will trigger a compile-time error.

This ensures that if you make a mistake writing your template code, it will be easy to spot during development.
Also, it will be harder to collide (by mistake) with user code if you, for instance, write the code for making a local variable named foo, and then the person using your macro uses a different foo somewhere in their code.
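
Here is one way to picture that check, as a Python sketch rather than actual Lux behavior. It assumes a hypothetical table that maps each known name to the module defining it. The function name, the table and the module "my/macros" are all invented for illustration.

```python
def resolve_template_symbols(template, definitions):
    """Replace each bare name in a nested-list template by a (module, name) pair.

    `definitions` is an assumed lookup table: name -> defining module.
    A name that is missing from it is treated as a template mistake.
    """
    if isinstance(template, list):
        return [resolve_template_symbols(item, definitions) for item in template]
    if template not in definitions:
        raise LookupError(f"unknown symbol in template: {template}")
    return (definitions[template], template)


definitions = {"foo": "my/macros", "helper": "my/macros"}  # hypothetical
resolved = resolve_template_symbols(["helper", ["foo"]], definitions)
users_foo = ("", "foo")  # an unprefixed foo written by the macro's user
```

Because the template's foo now carries the "my/macros" prefix, it is a different identifier from the user's unprefixed foo, and a misspelled name fails loudly instead of slipping into the generated code.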

## Unhygienic quasi-quotation as a macro. Unquote (~) and unquote-splice (~@) must also be used as forms.
(`' (def: (~ name)
      (lambda [(~@ args)]
        (~ body))))

Finally, there is this variation, which removes the hygiene check.

Out of the 3 variations, the one you'll most likely use is the 2nd one, since it provides both safety and power.

Macros

Now that you know how to generate code like a pro, it's time to see how macros get made.

First, let's check the type of macros:

(type: Macro
  (-> (List AST) (Lux (List AST))))

From the lux module.

You might remember from the previous chapter that you can only access the Compiler state inside of macros.
Now, you can see how everything connects.

You define macros by using the macro: macro (so meta...):

(macro: #export (ident-for tokens)
  (case tokens
    (^template [<tag>]
     (^ (list [_ (<tag> [prefix name])]))
     (:: Monad<Lux> wrap (list (` [(~ (ast;text prefix)) (~ (ast;text name))]))))
    ([#;SymbolS] [#;TagS])

    _
    (compiler;fail "Wrong syntax for ident-for")))

Here's another example:

(macro: #export (default tokens)
  (case tokens
    (^ (list else maybe))
    (do Monad<Lux>
      [g!temp (compiler;gensym "")]
      (wrap (list (` (case (~ maybe)
                       (#;Some (~ g!temp))
                       (~ g!temp)

                       #;None
                       (~ else))))))

    _
    (compiler;fail "Wrong syntax for default")))

You may want to read Appendix C to learn about the pattern-matching macros used in these examples.

As you can see, I'm using both quotation and the functions from the lux/macro/ast module to generate code here.

I'm also using the gensym function from lux/compiler, which generates unique symbols for usage within code templates in order to avoid collision with any code provided by the user of the macro.
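
To see why fresh symbols matter, consider this Python model of the default macro (not Lux code). The counter-based naming scheme is purely an assumption for the sketch, since how Lux actually names its generated symbols isn't covered here. make_gensym and expand_default are hypothetical names, and forms are modeled as plain lists.

```python
import itertools


def make_gensym():
    """Return a gensym function backed by its own private counter."""
    counter = itertools.count()

    def gensym(prefix=""):
        # Assumed naming scheme: the counter makes every result distinct.
        return ("", f"{prefix}__gensym__{next(counter)}")

    return gensym


def expand_default(else_node, maybe_node, gensym):
    """Model of the expansion built by the `default` macro."""
    temp = gensym("temp")
    return ["case", maybe_node,
            ["#;Some", temp], temp,
            "#;None", else_node]


gensym = make_gensym()
first = expand_default("0", "lookup-a", gensym)
second = expand_default("0", "lookup-b", gensym)
```

Each expansion binds its own temporary, so nesting one use of the macro inside another (or inside user code that happens to use a similar name) cannot capture the wrong variable.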

The macro receives the raw list of AST tokens and must process them manually to extract any information it needs for code generation.
After that, a new list of AST tokens must be generated.

If there are any macros in the output, they will be expanded further until only primitive/native syntax remains that the Lux compiler can then analyze and compile.
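
That "keep expanding until nothing is left to expand" loop can be sketched in a few lines of Python. This is a model, not the Lux compiler: macros here are plain functions from a list of tokens to a list of nodes (the compiler state carried by the Lux type is left out), forms are plain lists, and the two macros unless and not are invented for the example.

```python
def expand(node, macros):
    """Fully expand `node`, returning the list of nodes that replace it."""
    if isinstance(node, list) and node and isinstance(node[0], str) and node[0] in macros:
        results = macros[node[0]](node[1:])  # run the macro on its tokens
        # The output may itself contain macro calls, so expand it again.
        return [final for result in results for final in expand(result, macros)]
    if isinstance(node, list):
        # Not a macro call: expand the children and rebuild the node.
        return [[final for child in node for final in expand(child, macros)]]
    return [node]  # atoms are already primitive


# Hypothetical macros, each mapping a token list to a list of nodes.
macros = {
    "unless": lambda tokens: [["if", ["not", tokens[0]], tokens[1], "nil"]],
    "not": lambda tokens: [["if", tokens[0], "false", "true"]],
}
expanded = expand(["unless", "ready?", "wait"], macros)
```

The unless macro emits a call to not, which is a macro too, so the expander goes around once more until only if remains.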


You have learned how to use one of the greatest superpowers that Lux has to offer.

But, if you're like me, you might be getting the nagging feeling that something is not right here.

I mean, if I have to pattern-match against the code I receive, what happens when my macros have complex inputs?

Clearly, analyzing the input code is far more difficult than generating it with the quoting macros.

Don't worry about it.
Because in the next chapter, you will meet a more sophisticated method of macro definition that will make writing complex macros a breeze.

See you in the next chapter!
