Domain Specific Languages

I want to make the argument that what makes domain-specific languages useful is not the parts you see (visually, like special built-in functions) but rather the parts you do not see: specifically scopes, or the environment that functions execute in. Consider this program:

#include <stdio.h>

int i = 1000;

void printi(void) {
    printf("i == %d\n", i);
}

int main(int argc, char** argv) {
    int i = 20;
    for (int i = 0; i < 5; i++) {
        printf("i == %d\n", i);
    }
    printf("i == %d\n", i);
    printi();
    return 0;
}

$ ./test
i == 0
i == 1
i == 2
i == 3
i == 4
i == 20
i == 1000

You see how "i" refers to different things at different times? The machinery that decides which "i" is which is invisible to the programmer. The programmer must simply learn these rules; they are not spelled out in the code. Another example comes from the bizarre "var" in JavaScript (copied from here):

bla = 2;
var bla;
console.log(bla); // 2

This is valid code even though "bla" is not declared until "after" it is set (going by the order of the lines), because the JavaScript interpreter does some hidden processing before "executing" (or compiling) the code: "var" declarations are treated as if they sat at the top of their scope, which is usually called hoisting. If you have a simple model in your head that lines are executed one after the other, and nested expressions are executed inside out, things like "var" would seem weird and confusing. But you learn these rules and internalize them.
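One way to picture that hidden step is to write out by hand roughly what the interpreter does. The sketch below wraps the snippet in functions so the two forms can be compared side by side; the function names and the "early" variable are made up for illustration.

```javascript
// Hypothetical function names; a sketch of how `var` hoisting behaves.

// As written: the assignment appears before the declaration.
function asWritten() {
    bla = 2;
    var bla;
    return bla;
}

// Roughly how the interpreter treats it: the declaration is handled first,
// and the assignment stays where it was.
function asInterpreted() {
    var bla;
    bla = 2;
    return bla;
}

// Only the declaration is hoisted, not the assigned value, so reading
// the variable early gives undefined instead of throwing.
function readBeforeAssign() {
    var seen = early;
    var early = 5;
    return seen;
}

console.log(asWritten());        // 2
console.log(asInterpreted());    // 2
console.log(readBeforeAssign()); // undefined
```

None of this is visible in the source text of the first function. You only get the right answer by knowing the rule.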

Hidden stuff is good

It might be tempting to think that hidden things are bad and everything should be explicit. However, I believe that hidden things can be very good. In human languages we have idioms or "turns of phrase" (I think "turn of phrase" might be a turn of phrase itself...). Good ol' Wikipedia:

An idiom is a common word or phrase with a figurative, non-literal meaning
that is understood culturally and differs from what its composite words'
denotations would suggest

Using them can communicate a fairly large amount of subtle information, provided both parties understand them. That information is not encoded explicitly in the idiom so it must be communicated separately some other way and memorised by both parties. This is clunky and frustrating, but once done it allows a significant depth and speed of communication. It creates fluency.

What does this have to do with DSLs?

Consider some hypothetical 2D drawing DSL:

function mySquare() {
    square(10, 10);
}

translate(20, 40) {
    mySquare();
}

You can imagine that it would "output" a 10x10 square at position (20, 40). But this raises a few questions:

  • What is the output of square(10, 10) exactly? Where does it go?
  • Is it returned by the mySquare function? Or is it more like a print function that is IO to some channel?
  • What does the translate function modify exactly? Is it a hidden context?
  • Is that context used to modify what square outputs (and anything else inside mySquare, if there were more in there)?

The programmer would need the answers to all of these to understand the hidden idioms of the DSL. Once those idioms are learned and internalized, the programmer would be able to use the DSL fluently to express shapes with a small amount of code.
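To make the questions concrete, here is a minimal sketch of one possible set of answers, written in plain JavaScript. Everything in it is an assumption made for illustration: the "context" stack, the "output" list, and passing the body of translate as a callback (since JavaScript has no block syntax like the DSL above). A real DSL could just as well answer the questions differently.

```javascript
// A sketch only: context, output, and the callback form of translate are
// assumptions for illustration, not part of any real library.

// The hidden state: a stack of offsets and a list of emitted shapes.
const context = [{ x: 0, y: 0 }];
const output = [];

// Runs `body` with the current offset shifted by (dx, dy), then restores it.
function translate(dx, dy, body) {
    const top = context[context.length - 1];
    context.push({ x: top.x + dx, y: top.y + dy });
    try {
        body();
    } finally {
        context.pop();
    }
}

// Emits a square at whatever offset is current at the time of the call.
function square(width, height) {
    const top = context[context.length - 1];
    output.push({ shape: "square", x: top.x, y: top.y, width, height });
}

function mySquare() {
    square(10, 10);
}

translate(20, 40, () => {
    mySquare();
});
mySquare(); // outside any translate, so it lands at the origin

console.log(output);
```

In this version square behaves like a print function that writes to a hidden channel, and translate changes a hidden context for everything called inside it. Note that mySquare never mentions the context at all, yet the first call ends up at (20, 40) and the second at (0, 0). That is the invisible scope doing the work.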

But a good idiom also hints at its meaning. If someone says "But he was barking up the wrong tree" in a conversation about a person who was not able to accomplish something, you can make a good guess as to the meaning of the idiom. Likewise, with the 2D code above you can make a good guess as to what the code does without knowing the hidden details.