Code Writing Code: An Introduction to the Theory and Practice of Modern Metaprogramming


Whenever I think about the best way to explain macros, I remember a Python program I wrote when I first started programming. I couldn’t organize it the way I wanted to. I had to call a number of slightly different functions, and the code became cumbersome. What I was searching for—though I didn’t know it then—was metaprogramming.

metaprogramming (noun)

Any technique by which a program can treat code as data.

We can construct an example that demonstrates the same problems I faced with my Python project by imagining we’re building the back end of an app for pet owners. Using the tools in a library, pet_sdk, we write Python to help the pet owners purchase cat food:

import pet_sdk

cats = pet_sdk.get_cats()
print(f"Found {len(cats)} cats!")
for cat in cats:
    pet_sdk.order_cat_food(cat, amount=cat.food_needed)
Snippet 1: Order Cat Food

After confirming that the code works, we move on to implement the same logic for two more kinds of pets (birds and dogs). We also add a feature to book vet appointments:

# An SDK that can give us information about pets - unfortunately, the functions are slightly different for each pet
import pet_sdk

# Get all of the birds, cats, and dogs in the system, respectively
birds = pet_sdk.get_birds()
cats = pet_sdk.get_cats()
dogs = pet_sdk.get_dogs()

for cat in cats:
    print(f"Checking information for cat {cat.name}")

    if cat.hungry():
        pet_sdk.order_cat_food(cat, amount=cat.food_needed)
    
    cat.clean_litterbox()

    if cat.sick():
        available_vets = pet_sdk.find_vets(animal="cat")
        if len(available_vets) > 0:
            vet = available_vets[0]
            vet.book_cat_appointment(cat)

for dog in dogs:
    print(f"Checking information for dog {dog.name}")

    if dog.hungry():
        pet_sdk.order_dog_food(dog, amount=dog.food_needed)
    
    dog.walk()

    if dog.sick():
        available_vets = pet_sdk.find_vets(animal="dog")
        if len(available_vets) > 0:
            vet = available_vets[0]
            vet.book_dog_appointment(dog)

for bird in birds:
    print(f"Checking information for bird {bird.name}")

    if bird.hungry():
        pet_sdk.order_bird_food(bird, amount=bird.food_needed)
    
    bird.clean_cage()

    if bird.sick():
        available_vets = pet_sdk.find_vets(animal="bird")
        if len(available_vets) > 0:
            vet = available_vets[0]
            vet.book_bird_appointment(bird)
Snippet 2: Order Cat, Dog, and Bird Food; Book Vet Appointment

It would be good to condense Snippet 2’s repetitive logic into a loop, so we set out to rewrite the code. We quickly realize that, because each function is named differently, we can’t determine which one (e.g., book_bird_appointment, book_cat_appointment) to call in our loop:

import pet_sdk

all_animals = pet_sdk.get_birds() + pet_sdk.get_cats() + pet_sdk.get_dogs()

for animal in all_animals:
    # What now?
Snippet 3: What Now?

Let’s imagine a turbocharged version of Python in which we can write programs that automatically generate the final code we want—one in which we can flexibly, easily, and fluidly manipulate our program as though it were a list, data in a file, or any other common data type or program input:

import pet_sdk

for animal in ["cat", "dog", "bird"]:
    animals = pet_sdk.get_{animal}s() # When animal is "cat", this
                                      # would be pet_sdk.get_cats()

    for pet in animals:
        pet_sdk.order_{animal}_food(pet, amount=pet.food_needed)
        # When animal is "dog", this would be
        # pet_sdk.order_dog_food(pet, amount=pet.food_needed)
Snippet 4: TurboPython: An Imaginary Program

This is an example of a macro, available in languages such as Rust, Julia, or C, to name a few—but not Python.

This scenario is a great example of how it could be useful to write a program that’s able to modify and manipulate its own code. This is precisely the draw of macros, and it’s one of many answers to a bigger question: How can we get a program to introspect its own code, treating it as data, and then act on that introspection?

Broadly, all techniques that can accomplish such introspection fall under the blanket term “metaprogramming.” Metaprogramming is a rich subfield in programming language design, and it can be traced back to one important concept: code as data.

Reflection: In Defense of Python

You might point out that, although Python may not provide macro support, it offers plenty of other ways to write this code. For example, here we use the built-in isinstance() function to identify the class our animal variable is an instance of and call the appropriate function:

# An SDK that can give us information about pets - unfortunately, the functions
# are slightly different

import pet_sdk

def process_animal(animal):
    if isinstance(animal, pet_sdk.Cat):
        animal_name_type = "cat"
        order_food_fn = pet_sdk.order_cat_food
        care_fn = animal.clean_litterbox 
    elif isinstance(animal, pet_sdk.Dog):
        animal_name_type = "dog"
        order_food_fn = pet_sdk.order_dog_food
        care_fn = animal.walk
    elif isinstance(animal, pet_sdk.Bird):
        animal_name_type = "bird"
        order_food_fn = pet_sdk.order_bird_food
        care_fn = animal.clean_cage
    else:
        raise TypeError("Unrecognized animal!")
    
    print(f"Checking information for {animal_name_type} {animal.name}")
    if animal.hungry():
        order_food_fn(animal, amount=animal.food_needed)
    
    care_fn()

    if animal.sick():
        available_vets = pet_sdk.find_vets(animal=animal_name_type)
        if len(available_vets) > 0:
            vet = available_vets[0]
            # We still have to check again what type of animal it is
            if isinstance(animal, pet_sdk.Cat):
                vet.book_cat_appointment(animal)
            elif isinstance(animal, pet_sdk.Dog):
                vet.book_dog_appointment(animal)
            else:
                vet.book_bird_appointment(animal)


all_animals = pet_sdk.get_birds() + pet_sdk.get_cats() + pet_sdk.get_dogs()
for animal in all_animals:
    process_animal(animal)
Snippet 5: An Idiomatic Example

We call this type of metaprogramming reflection, and we’ll come back to it later. Snippet 5’s code is still a little cumbersome but easier for a programmer to write than Snippet 2’s, in which we repeated the logic for each listed animal.

Challenge

Using the built-in getattr function, modify the preceding code to call the appropriate order_*_food and book_*_appointment functions dynamically. This arguably makes the code less readable, but if you know Python well, it’s worth thinking about how you might use getattr instead of isinstance to simplify the code.
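One possible answer to the challenge, as a rough sketch rather than a definitive solution. The real pet_sdk is not available here, so everything named Pet, Cat, Dog, Vet, log, and the pet_sdk namespace below is a hypothetical stand-in, and the sketch assumes the animal's type name can be derived from its class name. Birds are left out to keep it short.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for pet_sdk so the sketch runs on its own.
log = []

class Pet:
    def __init__(self, name, hungry=False, sick=False, food_needed=1):
        self.name = name
        self._hungry = hungry
        self._sick = sick
        self.food_needed = food_needed

    def hungry(self):
        return self._hungry

    def sick(self):
        return self._sick

class Cat(Pet):
    def clean_litterbox(self):
        log.append(f"cleaned litterbox for {self.name}")

class Dog(Pet):
    def walk(self):
        log.append(f"walked {self.name}")

class Vet:
    def book_cat_appointment(self, cat):
        log.append(f"booked cat appointment for {cat.name}")

    def book_dog_appointment(self, dog):
        log.append(f"booked dog appointment for {dog.name}")

def _make_order_fn(kind):
    def order(pet, amount):
        log.append(f"ordered {amount} {kind} food for {pet.name}")
    return order

pet_sdk = SimpleNamespace(
    order_cat_food=_make_order_fn("cat"),
    order_dog_food=_make_order_fn("dog"),
    find_vets=lambda animal: [Vet()],
)

# The care methods do not share a naming pattern, so keep a small lookup table.
CARE_METHODS = {"cat": "clean_litterbox", "dog": "walk"}

def process_animal(animal):
    kind = type(animal).__name__.lower()  # "cat" or "dog"
    if animal.hungry():
        # Look up pet_sdk.order_<kind>_food by name instead of branching.
        getattr(pet_sdk, f"order_{kind}_food")(animal, amount=animal.food_needed)
    getattr(animal, CARE_METHODS[kind])()
    if animal.sick():
        vets = pet_sdk.find_vets(animal=kind)
        if vets:
            getattr(vets[0], f"book_{kind}_appointment")(animal)

for pet in [Cat("Tom", hungry=True, food_needed=2), Dog("Rex", sick=True)]:
    process_animal(pet)
```

The second isinstance check from Snippet 5 disappears, at the cost of function names that no longer appear literally in the source, which makes them harder to find with a text search.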


Homoiconicity: The Importance of Lisp

Some programming languages, like Lisp, take the concept of metaprogramming to another level via homoiconicity.

homoiconicity (noun)

The property of a programming language whereby there is no distinction between code and the data on which a program is operating.

Lisp, created in 1958, is the oldest homoiconic language and the second-oldest high-level programming language still in widespread use. Getting its name from “LISt Processor,” Lisp was a revolution in computing that deeply shaped how computers are used and programmed. It’s hard to overstate how fundamentally and distinctively Lisp influenced programming.

“Emacs is written in Lisp, which is the only computer language that is beautiful.” (Neal Stephenson)

Lisp was created only one year after FORTRAN, in the era of punch cards and military computers that filled a room. Yet programmers still use Lisp today to write new, modern applications. Lisp’s primary creator, John McCarthy, was a pioneer in the field of AI. For many years, Lisp was the language of AI, with researchers prizing the ability to dynamically rewrite their own code. Today’s AI research is centered around neural networks and complex statistical models, rather than that type of logic generation code. However, the research done on AI using Lisp—especially the research performed in the ’60s and ’70s at MIT and Stanford—created the field as we know it, and its massive influence continues.

Lisp’s advent exposed early programmers to the practical computational possibilities of things like recursion, higher-order functions, and linked lists for the first time. It also demonstrated the power of a programming language built on the ideas of lambda calculus.

These notions sparked an explosion in the design of programming languages and, as Edsger Dijkstra, one of the greatest names in computer science, put it, “[…] assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.”

This example shows a simple Lisp program (and its equivalent in more familiar Python syntax) that defines a function “factorial” that recursively calculates the factorial of its input and calls that function with the input “7”:

Lisp:

(defun factorial (n)
  (if (= n 1)
      1
      (* n (factorial (- n 1)))))

(print (factorial 7))

Python:

def factorial(n):
    if n == 1:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(7))

Code as Data

Despite being one of Lisp’s most impactful and consequential innovations, homoiconicity, unlike recursion and many other concepts Lisp pioneered, did not make it into most of today’s programming languages.

The following table compares homoiconic functions that return code in both Julia and Lisp. Julia is a homoiconic language that, in many ways, resembles the high-level languages you may be familiar with (e.g., Python, Ruby).

The key piece of syntax in each example is its quoting character. Julia uses a : (colon) to quote, while Lisp uses a ' (single quote):

Julia:

function function_that_returns_code()
    return :(x + 1)
end

Lisp:

(defun function_that_returns_code ()
    '(+ x 1))

In both examples, the quote beside the main expression ((x + 1) or (+ x 1)) transforms it from code that would have been evaluated directly into an abstract expression that we can manipulate. The function returns code, not a string or an ordinary value. If we were to call our function and write print(function_that_returns_code()), Julia would print the code stringified as x + 1 (and the equivalent is true of Lisp). Conversely, without the : (or ' in Lisp), calling the function would try to evaluate the expression immediately, and we would get an error that x was not defined.

Let’s return to our Julia example and extend it:

function function_that_returns_code(n)
    return :(x + $n)
end

my_code = function_that_returns_code(3)
println(my_code) # Prints out x + 3

x = 1
println(eval(my_code)) # Prints out 4
x = 3
println(eval(my_code)) # Prints out 6
Snippet 6: Julia Example Extended

The eval function can be used to run the code that we generate from elsewhere in the program. Note that the value printed out is based on the definition of the x variable. If we tried to eval our generated code in a context where x wasn’t defined, we’d get an error.
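Python is not homoiconic, but its standard ast module allows a rough imitation of Snippet 6, which may help readers who do not know Julia. The sketch below is only an analogy: the function name is borrowed from the Julia example, and the syntax tree has to be built by hand because Python has no quoting syntax.

```python
import ast

def function_that_returns_code(n):
    """Build the expression `x + n` as a syntax tree rather than as a string."""
    tree = ast.Expression(
        body=ast.BinOp(
            left=ast.Name(id="x", ctx=ast.Load()),
            op=ast.Add(),
            right=ast.Constant(value=n),
        )
    )
    # Fill in the line and column fields that compile() requires.
    return ast.fix_missing_locations(tree)

my_code = function_that_returns_code(3)
source_text = ast.unparse(my_code.body)  # ast.unparse needs Python 3.9+
compiled = compile(my_code, filename="<generated>", mode="eval")

# The same generated code gives different results depending on x.
result_a = eval(compiled, {"x": 1})
result_b = eval(compiled, {"x": 3})

# Evaluating where x is not defined raises NameError.
try:
    eval(compiled, {})
    undefined_x_failed = False
except NameError:
    undefined_x_failed = True

print(source_text, result_a, result_b, undefined_x_failed)
```

Compared with Julia's one-character quote, the amount of ceremony here gives a sense of what homoiconicity buys.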

Homoiconicity is a powerful kind of metaprogramming, able to unlock novel and complex programming paradigms in which programs can adapt on the fly, generating code to fit domain-specific problems or new data formats encountered.

Take the case of WolframAlpha, where the homoiconic Wolfram Language can generate code to adapt to an incredible range of problems. You can ask WolframAlpha, “What’s the GDP of New York City divided by the population of Andorra?” and, remarkably, receive a logical response.

It seems unlikely that anyone would ever think to include this obscure and pointless calculation in a database, but Wolfram uses metaprogramming and an ontological knowledge graph to write on-the-fly code to answer this question.

It’s important to understand the flexibility and power that Lisp and other homoiconic languages provide. Before we dive further, let’s consider some of the metaprogramming options at your disposal:

Homoiconicity
  Definition: A language characteristic in which code is “first-class” data. Since there is no separation between code and data, the two can be used interchangeably.
  Examples:
    • Lisp
    • Prolog
    • Julia
    • Rebol/Red
    • Wolfram Language
  Notes: Here, Lisp includes other languages in the Lisp family, like Scheme, Racket, and Clojure.

Macros
  Definition: A statement, function, or expression that takes code as input and returns code as output.
  Examples:
    • Rust’s macro_rules!, Derive, and procedural macros
    • Julia’s @macro invocations
    • Lisp’s defmacro
    • C’s #define
  Notes: (See the next note about C’s macros.)

Preprocessor Directives (or Precompiler)
  Definition: A system that takes a program as input and, based on statements included in the code, returns a changed version of the program as output.
  Examples:
    • C’s macros
    • C++’s # preprocessor system
  Notes: C’s macros are implemented using C’s preprocessor system, but the two are separate concepts. The key conceptual difference between C’s macros (in which we use the #define preprocessor directive) and other forms of C preprocessor directives (e.g., #if and #ifndef) is that we use the macros to generate code while using other non-#define preprocessor directives to conditionally compile other code. The two are closely related in C and in some other languages, but they’re different types of metaprogramming.

Reflection
  Definition: A program’s ability to examine, modify, and introspect its own code.
  Examples:
    • Python’s isinstance and getattr functions
    • JavaScript’s Reflect and typeof
    • Java’s getDeclaredMethods
    • .NET’s System.Type class hierarchy
  Notes: Reflection can occur at compile time or at run time.

Generics
  Definition: The ability to write code that’s valid for a number of different types or that can be used in multiple contexts but stored in one place. We can define the contexts in which the code is valid either explicitly or implicitly.
  Examples:
    • Template-style generics
    • Parametric polymorphism
  Notes: Generic programming is a broader topic than generic metaprogramming, and the line between the two isn’t well defined. In this author’s view, a parametric type system only counts as metaprogramming if it’s in a statically typed language.

A Reference for Metaprogramming

Let’s look at some hands-on examples of homoiconicity, macros, preprocessor directives, reflection, and generics written in various programming languages:

# Prints out "Hello, Will" and "Hello, Alice" by dynamically creating the lines of code
say_hi = :(println("Hello, ", name))

name = "Will"
eval(say_hi)

name = "Alice"
eval(say_hi)
Snippet 7: Homoiconicity in Julia

#include <stdio.h>

int main() {
#ifdef _WIN32
    printf("This section will only be compiled for and run on windows!\n");
    windows_only_function();
#elif __unix__
    printf("This section will only be compiled for and run on unix!\n");
    unix_only_function();
#endif
    printf("This line runs regardless of platform!\n");
    return 0;
}
Snippet 8: Preprocessor Directives in C

from pet_sdk import Cat, Dog, get_pet

pet = get_pet()

if isinstance(pet, Cat):
    pet.clean_litterbox()
elif isinstance(pet, Dog):
    pet.walk()
else:
    print(f"Don't know how to help a pet of type {type(pet)}")
Snippet 9: Reflection in Python

import com.example.coordinates.*;

interface Vehicle {
    public String getName();
    public void move(double xCoord, double yCoord);
}

public class VehicleDriver<T extends Vehicle> {
    // This class is valid for any other class T which implements
    // the Vehicle interface
    private final T vehicle;

    public VehicleDriver(T vehicle) {
        System.out.println("VehicleDriver: " + vehicle.getName());
        this.vehicle = vehicle;
    }

    public void goHome() {
        this.vehicle.move(HOME_X, HOME_Y);
    }

    public void goToStore() {
        this.vehicle.move(STORE_X, STORE_Y);
    }
    
}
Snippet 10: Generics in Java

macro_rules! print_and_return_if_true {
    ($val_to_check: ident, $val_to_return: expr) => {
        if $val_to_check {
            println!("Val was true, returning {}", $val_to_return);
            return $val_to_return;
        }
    }
}

// The following is the same as if for each of x, z, and y,
// we wrote if x { println!...}
fn example(x: bool, y: bool, z: bool) -> i32 {
    print_and_return_if_true!(x, 1);
    print_and_return_if_true!(z, 2);
    print_and_return_if_true!(y, 3);
    0 // Fallback value when none of the flags is true
}
Snippet 11: Macros in Rust

Macros (like the one in Snippet 11) are becoming popular again in a new generation of programming languages. To successfully develop these, we must consider a key topic: hygiene.

Hygienic and Unhygienic Macros

What does it mean for code to be “hygienic” or “unhygienic”? To clarify, let’s look at a Rust macro defined with macro_rules!, which is itself a built-in macro rather than a function. As the name implies, macro_rules! generates code based on rules we define. In this case, we’ve named our macro my_macro, and the rule is “Create the line of code let x = $n”, where n is our input:

macro_rules! my_macro {
    ($n:expr) => {
        let x = $n;
    }
}

fn main() {
    let x = 5;
    my_macro!(3);
    println!("{}", x);
}
Snippet 12: Hygiene in Rust

When we expand our macro (running a macro to replace its invocation with the code it generates), we would expect to get the following:

fn main() {
    let x = 5;
    let x = 3; // This is what my_macro!(3) expanded into
    println!("{}", x);
}
Snippet 13: Our Example, Expanded

Seemingly, our macro has redefined variable x to equal 3, so we may reasonably expect the program to print 3. In fact, it prints 5! Surprised? In Rust, macro_rules! is hygienic with respect to identifiers, so it does not “capture” identifiers outside of its scope. In this case, the identifier was x: the x created inside the macro is treated as a different variable from the x in main. Had the macro captured main’s x, the program would have printed 3.

hygiene (noun)

A property guaranteeing that a macro’s expansion will not capture identifiers or other states from beyond the macro’s scope. Macros and macro systems that do not provide this property are called unhygienic.

Hygiene in macros is a somewhat controversial topic among developers. Proponents insist that without hygiene, it is all too easy to subtly modify your code’s behavior by accident. Imagine a macro that is significantly more complex than the one in Snippet 12, used in complex code with many variables and other identifiers. What if that macro used one of the same variables as your code, and you didn’t notice?
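To make the risk concrete, here is a small Python emulation. Python has no macros, so the sketch imitates them with functions that return source text, which is then run with exec. The names gensym, unhygienic_swap, hygienic_swap, and run are all made up for illustration, and real hygienic macro systems track scopes in the syntax tree rather than renaming strings.

```python
import itertools

_counter = itertools.count()

def gensym(base):
    """Return a fresh name that hand-written code is very unlikely to use."""
    return f"__{base}_{next(_counter)}"

def unhygienic_swap(a, b):
    # Expands to code that uses a temporary called `tmp`,
    # whether or not the caller already has a variable with that name.
    return f"tmp = {a}\n{a} = {b}\n{b} = tmp\n"

def hygienic_swap(a, b):
    # Same expansion, but the temporary gets a freshly generated name.
    tmp = gensym("tmp")
    return f"{tmp} = {a}\n{a} = {b}\n{b} = {tmp}\n"

def run(expand):
    # The "caller" happens to have its own variable named tmp.
    namespace = {"tmp": "important data", "left": 1, "right": 2}
    exec(expand("left", "right"), namespace)
    return namespace["tmp"], namespace["left"], namespace["right"]

unhygienic_result = run(unhygienic_swap)  # the caller's tmp is overwritten
hygienic_result = run(hygienic_swap)      # the caller's tmp is untouched

print(unhygienic_result)
print(hygienic_result)
```

Both versions swap left and right correctly. Only the unhygienic one silently replaces the caller's tmp, which is exactly the kind of change that is easy to miss in a larger program.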

It’s not unusual for a developer to use a macro from an external library without having read the source code. This is especially common in newer languages that offer macro support (e.g., Rust and Julia). The following C example shows how badly that can go with an unhygienic macro:

#define EVIL_MACRO website="

int main() {
    char *website = "
    EVIL_MACRO
    send_all_my_bank_data_to(website);
    return 0;
}
Snippet 14: An Evil C Macro

This unhygienic macro in C captures the identifier website and changes its value. Of course, identifier capture isn’t usually malicious. More often, it’s an accidental consequence of using macros.

So, hygienic macros are good, and unhygienic macros are bad, right? Unfortunately, it’s not that simple. There’s a strong case to be made that hygienic macros limit us. Sometimes, identifier capture is useful. Let’s revisit Snippet 2, where we use pet_sdk to provide services for three kinds of pets. Our original code started out like this:

birds = pet_sdk.get_birds()
cats = pet_sdk.get_cats()
dogs = pet_sdk.get_dogs()

for cat in cats:
    # Cat specific code
for dog in dogs:
    # Dog specific code
# etc…
Snippet 15: Back to the Vet (Recalling pet_sdk)

You will recall that Snippet 3 was an attempt to condense Snippet 2’s repetitive logic into an all-inclusive loop. But what if our code depends on the identifiers cats and dogs, and we wanted to write something like the following:

{animal}s = pet_sdk.get_{animal}s()
for {animal} in {animal}s:
    # {animal} specific code
Snippet 16: Useful Identifier Capture (in Imaginary “TurboPython”)

Snippet 16 is a bit simple, of course, but imagine a case where we would want a macro to write 100% of a given portion of code. Hygienic macros might be limiting in such a case.
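Real Python can approximate Snippet 16 by generating source text from a template and running it with exec. This is only a sketch: the pet_sdk object below is a hypothetical stand-in that returns lists of names, and exec on generated strings should never be used with untrusted input.

```python
from types import SimpleNamespace

# Hypothetical stand-in for pet_sdk so the sketch runs on its own.
pet_sdk = SimpleNamespace(
    get_cats=lambda: ["Tom", "Felix"],
    get_dogs=lambda: ["Rex"],
    get_birds=lambda: ["Tweety"],
)

# One template plays the role of the imaginary TurboPython line.
TEMPLATE = "{animal}s = pet_sdk.get_{animal}s()\n"

namespace = {"pet_sdk": pet_sdk}
for animal in ["cat", "dog", "bird"]:
    generated = TEMPLATE.format(animal=animal)
    # Deliberately "unhygienic": the generated code defines the names
    # cats, dogs, and birds directly in the namespace we hand to it.
    exec(generated, namespace)

cats = namespace["cats"]
dogs = namespace["dogs"]
birds = namespace["birds"]
print(cats, dogs, birds)
```

Here the generated code introducing names that the surrounding program then relies on is the whole point, which is the behavior a strictly hygienic system would prevent.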

While the hygienic versus unhygienic macro debate can be complex, the good news is that it’s not one in which you have to take a stance. The language you’re using determines whether your macros will be hygienic or unhygienic, so bear that in mind when using macros.

Modern Macros

Macros are having a bit of a moment now. For a long time, the focus of modern imperative programming languages shifted away from macros as a core part of their functionality, eschewing them in favor of other types of metaprogramming.

The languages that new programmers were being taught in schools (e.g., Python and Java) told them that all they needed was reflection and generics.

Over time, as those modern languages became popular, macros became associated with intimidating C and C++ preprocessor syntax—if programmers were even aware of them at all.

With the advent of Rust and Julia, however, the trend has shifted back to macros. Rust and Julia are two modern, accessible, and widely used languages that have redefined and popularized the concept of macros with some new and innovative ideas. This is especially exciting in Julia, which looks poised to take the place of Python and R as an easy-to-use, “batteries included” versatile language.

When we first looked at pet_sdk through our “TurboPython” glasses, what we really wanted was something like Julia. Let’s rewrite Snippet 2 in Julia, using its homoiconicity and some of the other metaprogramming tools that it offers:

using pet_sdk

for (pet, care_fn) = (("cat", :clean_litterbox), ("dog", :walk), ("bird", :clean_cage))
    get_pets_fn = Meta.parse("pet_sdk.get_$(pet)s")
    @eval begin
        local animals = $get_pets_fn() # pet_sdk.get_cats(), pet_sdk.get_dogs(), etc.
        for animal in animals
            animal.$care_fn() # animal.clean_litterbox(), animal.walk(), etc.
        end
        end
    end
end
Snippet 17: The Power of Julia’s Macros—Making pet_sdk Work for Us

Let’s break down Snippet 17:

  1. We iterate through three tuples. The first of these is ("cat", :clean_litterbox), so the variable pet is assigned to "cat", and the variable care_fn is assigned to the quoted symbol :clean_litterbox.
  2. We use the Meta.parse function to convert a string into an Expression, so we can evaluate it as code. In this case, we want to use the power of string interpolation, where we can put one string into another, to define what function to call.
  3. We use the eval function to run the code that we’re generating. @eval begin … end is shorthand for calling eval on the quoted block (that is, eval(quote … end)), which saves us from writing the quote ourselves. Inside the @eval block is code that we’re generating dynamically and running.

Julia’s metaprogramming system truly frees us to express what we want the way we want it. We could have used several other approaches, including reflection (like Python in Snippet 5). We also could have written a macro function that explicitly generates the code for a specific animal, or we could have generated the entire code as a string and used Meta.parse or any combination of those methods.

Julia is perhaps one of the most interesting and compelling examples of a modern macro system but it’s not, by any means, the only one. Rust, as well, has been instrumental in bringing macros in front of programmers once again.

In Rust, macros feature much more centrally than in Julia, though we won’t explore that fully here. For a bevy of reasons, you cannot write idiomatic Rust without using macros. In Julia, however, you could choose to completely ignore the homoiconicity and macro system.

As a direct consequence of that centrality, the Rust ecosystem has really embraced macros. Members of the community have built some incredibly cool libraries, proofs of concept, and features with macros, including tools that can serialize and deserialize data, automatically generate SQL, or even convert annotations left in code to another programming language, all generated in code at compile time.
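For readers who have not used Rust, the flavor of derive-style code generation can be loosely imitated in Python with a class decorator. This is an analogy only: derive_to_dict is a made-up name, the code runs when the class is defined rather than at compile time, and it is not how any Rust library is implemented.

```python
def derive_to_dict(cls):
    """Class decorator that writes a to_dict method from the class annotations."""
    fields = list(cls.__annotations__)
    items = ", ".join(f"{name!r}: self.{name}" for name in fields)
    source = f"def to_dict(self):\n    return {{{items}}}\n"
    scope = {}
    exec(source, scope)             # turn the generated text into a function
    cls.to_dict = scope["to_dict"]
    cls.generated_source = source   # kept only so the output can be inspected
    return cls

@derive_to_dict
class Pet:
    name: str
    age: int

    def __init__(self, name, age):
        self.name = name
        self.age = age

pet_as_dict = Pet("Tom", 3).to_dict()
print(Pet.generated_source)
print(pet_as_dict)
```

The class author writes the field list once, and the serialization code is produced from it.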

While Julia’s metaprogramming might be more expressive and free, Rust is probably the best example of a modern language that elevates metaprogramming, as it’s featured heavily throughout the language.

An Eye to the Future

Now is an incredible time to be interested in programming languages. Today, I can write an application in C++ and run it in a web browser or write an application in JavaScript to run on a desktop or phone. Barriers to entry have never been lower, and new programmers have information at their fingertips like never before.

In this world of programmer choice and freedom, we increasingly have the privilege to use rich, modern languages, which cherry-pick features and concepts from the history of computer science and earlier programming languages. It’s exciting to see macros picked up and dusted off in this wave of development. I can’t wait to see what a new generation’s developers will do as Rust and Julia introduce them to macros. Remember, “code as data” is more than just a catchphrase. It’s a core ideology to keep in mind when discussing metaprogramming in any online community or academic setting.

Metaprogramming’s 64-year history has been integral to the development of programming as we know it today. While the innovations and history we explored are just a corner of the metaprogramming saga, they illustrate the robust power and utility of modern metaprogramming.



Source

Matthew Newman

Matthew has over 15 years of experience in database management and software development, with a strong focus on full-stack web applications. He specializes in Django and Vue.js, with expertise deploying to both server and serverless environments on AWS. He also works with relational databases and large datasets.