Constructing quine loops with QCC


[This is a follow-up on a previous blog post on QCC, the Quining C Compiler which is designed to transform any single-file C program into a valid quine.]

Recall that a quine is a program that can print its own source code, or more generally, perform some meaningful computation that includes its own source code as an input. One of the logical questions once we have an application for automatically constructing quines is asking whether it can be generalized to multiple programs. The objective is creating programs A and B such that:

  • A can print the source code for B
  • B functions as the mirror image and can print the source code for A

This is true in a trivial way when A and B are the same program, so there is an implied assumption of A≠B to make it worthwhile. Of course, the criteria “unequal” itself is subjective: if the two programs differ in a trivial way, the problem can reduce to the case of the single-quine. We will not try to formalize this definition of A and B being “meaningfully different” but the following examples will clarify the intent.

1. Simple quine-loop with multiple languages

Here we construct a simplified version of the well-known quine Ouroboros. limiting our cycle to two languages for simplicity: C and Python. The objective is to come up with two program A and B such that:

  • A is a valid C program that prints out the source code of B
  • B is a valid Python program that prints out the source code of A

The use of different languages here serves as the separation between A and B. 1

The construction proceeds along the same lines of using QCC in general: first we write a “prequine”— an almost-valid C program relying on a nonexistent get_self() function, assumed to return its own source code. QCC converts that bogus C program into a proper quine by applying source-level transformations and supplying an implementation of the mystery function consistent with the modified source.

Prerequisite: writing programs that output strings

There is one more building block necessary for this construction. It happens to be quite straightforward, involving no recursive self-reference or strange loops: we need to be able to convert a string into a Python program that prints that string when invoked. That is, we want a C program that takes as input a string and outputs a valid Python program whose only function is to print that string. This may sound complicated, but notice the end product is a minor variation on the canonical Hello World application for Python.

Main difference: instead of printing a greeting, our target Python program is hard-wired to print some other constant string determined at construction time. About the only tricky aspect involves escaping special characters from that input. The text we are supposed to output may contain arbitrary ASCII or even Unicode symbols. So we cannot simply enclose it in single quotes as in the hello world example and call it a day. If there was a single quote present in the included string, it would terminate the Python string prematurely and cause parsing errors on the remainder. Instead, we process the input one character at a time and special-case some characters consistent with the way Python expects string escaping to work.

With a little help from Claude 4.8, the result is a small C program characterized by this behavior:

  • Expects to receive one command line argument as string
  • Writes a Python3 program to stdout
  • Where this Python code is constructed to print that exact string to stdout when invoked by a Python interpreter

From prequine to quine

Stepping back, we have created a pair of programs A and B in two different languages:

A: C program that outputs a Python program (call it “B”) based on external input S
B: Python program that outputs this hard-coded string S

This is starting to resemble the A-B quine loop defined in the objectives. The missing piece holding back the closure of the loop is that free-floating, external string S provided as input. If we could replace S by the source code of A, the cycle will be complete. Looked another way, we need to modify A such that it ignores the command line arguments and runs the same Python-generation logic on its own source code.

This is exactly what QCC solves for. First, we create the prequine, by rewiring A to use its own source code get_self() as the input. This is also an opportunity to clean up the now redundant niceties around checking command line arguments and printing helpful error messages. Next, we run QCC on the prequine to generate the final quine. Compiling that and running through the steps proves the output from the final Python program is identical to the original C code:

Expanding the cycle

It is straightforward to expand this cycle to include multiple languages, by modifying only the starting C program. Recall that the driving force behind A is a function that writes Python programs. What if we add a second function that writes Rust in an analogous manner? That is we define a function that:

  • Receives as input an arbitrary string S
  • Returns a valid Rust program R such that when compiled and executed, R will print S

This function will be similar to the Python example, but instead use the hello-world template for Rust and apply Rust-specific string escaping rules.

Armed with this additional capability, we can extend the cycle by taking the output of the Python-writer and feeding it as input to the Rust-writer. Drawing out the sequence of transformations, the original chain looked like:

Self source-code ⟶ String-to-Python ⟶ Final output

Adding one more link to the chain:

Self source-code ⟶ String-to-Python ⟶ String-to-Rust ⟶ Final output

Instead of being a valid Python program, the output from the initial invocation is now a valid Rust program. When compiled and executed, it prints the Python program— its predecessor in the sequence— which in turn will print the original C code if invoked.

In principle there is no limit to the number of additional links that can be added here. In practice, one may need to watch for line limits of certain languages, as each transformation adds space overhead to the string being passed as input to the “printf” equivalent. At some point it may be necessary to break up the string into multiple lines.

2. Quine loops with useful programs

One limitation with the above example is that all programs in the cycle after A perform no “useful” work: their behavior is strictly predetermined to print something to stdout and exit. Meanwhile A as the starting point of the cycle can include arbitrary functionality, since QCC makes no assumptions about what its input program is doing. That raises a logical question: is there a way to apply QCC transformation to a pair of arbitrary programs A and B?

More precisely, suppose we have two programs A and B with arbitrary function— maybe A retrieves weather forecasts while B reports on World Cup scores. The objective is to create a quine loop out of these such that:

  • A and B retain their existing functionality
  • Both are augmented with the ability to print the source-code of the other app (That behavior would become conditional on some external input, such as a specific command line argument. This is similar to how the quine-version of QCC selects between doing its usual job of converting C programs to quines or printing its own code.)

This is straightforward with QCC alone provided both programs are written in C. The core idea is to create the prequine by “splicing” the two programs, interleaved with preprocessor macros to control which one actually gets compiled. Conceptually the merged version looks like:

#ifdef COMPILE_A
/* source code for A... */
#elifdef COMPILE_B
/* source code for B... */
#endif

Absent any other macro definitions, this code is equivalent to an empty file. But by prefixing the contents by a single line of code containing a #define directive, it can be cajoled to act as A or B. Both A and B are then developed under the assumption that get_self() will return this single, merged version regardless of whether it is called from A or B. By prepending one of the above macro definitions to the returned string, each app can choose to print its own code or that of its peer.

The Github repository for QCC has an example quine-loop constructed with this pattern.

  • Invoked without any arguments, program A prints the standard greeting, along its own identity “Alfa” (This is the intrinsic, non-quine related functionality.)
  • If the first argument is non-zero, A prints its own source code
  • Otherwise it prints the source code for B

Program B is the mirror image: it identifies itself as “Bravo” in the greeting and features conditional logic to print out either A or B based on the supplied command line argument.

There is a helper script in the same repository to merge the files, producing the combined A/B chameleon referenced above. That combination becomes the input to QCC, which supplies the necessary get_self() implementation. Finally the actual A and B quines are constructed from the QCC output by prepending a single line of code that defines the appropriate macro for selecting either A or B.

Putting it all together:

Two observation about this construction:

  1. Quines A and B differ by a single line— in fact a single letter in that line— even if their behavior at runtime can have arbitrary differences. This is an artifact of a shortcut taken here: recall that each program is “carved out” of a single block of code returned by get_self() using preprocessor macros and conditional compilation. As expedient as that approach is, it results in each program containing a lot of dead-code that is never compiled. With a little bit more work, we can avoid this by moving the carving-out process to runtime. Instead of relying on preprocess macros to select which block of code is visible to the compiler, we can have A and B perform basic text processing: locate the starting #ifdef line and corresponding #endif directive, then capture the section between those lines as a substring. (Note we still have to include the trailer following the conditional blocks, as the quine machinery is located there.) This will remove redundant. uncompiled code and make the underlying differences between A and B more prominent.
  2. The trick also extends naturally to more than two programs— in fact, the combination script is designed to accept any number of source files as input, automatically assigning successive letters of the alphabet to each program when crafting preprocessor macros. Once that merged file is run through QCC and individual quines created by prepending the appropriate macro definition, we end up with something more than a cycle: the collection resembles a complete graph. Every program in the collection can output the source-code for any other program in that collection. Contrast that with the standard Ouroboros cycle, where movement goes strictly in one direction. Here one can navigate the graph freely by asking any program for the source code of any other program.

CP

1 While it is possible to create C/Python polyglots to satisfy this trivially, this example will not be using that escape hatch.

Leave a comment