Creating Your Own Languages

in Scala

Titus Stone

Part 1:

About Languages & Parsing

There are very few professions where the skills of the profession are the same skills to building tools for that profession

  • Machinist
  • Programming
...the main reason we take the trouble to develop high-level languages is to get leverage, so that we can say (and more importantly, think) in 10 lines of a high-level language what would require 1000 lines of machine language.

-- Paul Graham, Succinctness is Power

We're often driven to think up our own languages in pursuit of something that allows us to reason about code in an easier way

But how?

"I'll just use regex"

Good Luck With That

{{for r in row}}
  <tr class="{{}}">
    {{for c in column}}
        {{if c.isNumeric}}
          <td class="numeric">{{c}}</td>

How do we know which {{end}} closes which {{for}}?

Designing a Language to Think In:

SMACSS as a Language ==


in SMACSS...

Modules are re-usable UI components

/* the "bar-button" module */
.m-bar-button {

If we make modules a first-class citizen, we create a language that lets us think in terms of modular components instead of an upchuck of CSS smeared all over various files.

Language Rules:

  • Styles are defined within the scope of a module
  • Style are written in standard CSS
  • Modules can express a lineage that will be reflected in their rendered class names.
  • Any property values within a module can be referenced by it's location in the tree relative to the module.
  • Selectors within a module can only be a single expression (compound selectors are a code smell indicating that you probably need a child module).

Smack Lang Sample

@module title >> heading {
  this {
    font-size: 12px;

  h1 {
    font-size: @this.font-size * 2;

  h2 {
    font-size: @this.h1.font-size - 4;


.heading-title {
  font-size: 12px;

.heading-title h1 {
  font-size: 24px;

.heading-title {
  font-size: 20px;

What is a Scalable Approach?

  1. Parse -- Turn the blob of text into a tree structure
  2. Transform -- Turn the tree nodes into something meaningful
  3. Act -- Do something with the meaningful representation

Option #1:

Formal Language Specification: Backus–Naur Form (BNF) or Extended Backus–Naur Form (EBNF)

Language descriptions can be fed to a parser generator.

The parser generator generates code for a given language, giving you a parser. The parser will generate a syntax tree which can be transformed and acted on.

EBNF Sampler (Battleship)

bomb location = alphabetic character, number ;
number = "0" | "1" ... | "10" ;
alphabetic character = "A" | "B" | "C" ... | "J" ;

, is a sequence -- a follows b

| is alternation -- a or b could be here



  • Language agnostic / portable
  • Most performant
  • Allows others to study and reason about the language
  • Copy/paste BNF from other languages that inspired yours


  • Long and tedious language specs (ie. bad for just trying things out)
  • More difficult transformation step*

Option #2:

Write a parser from scratch using your own parser combinators functions

From Scratch


  • Fun and interesting on a geeky computer science level
  • Most creative, allows really outside-the-box choices
  • Better opportunity for humane error messages
  • Complete control over everything


  • Likely the worst performance
  • Potentially large investment in boilerplate/setup

Real World Example: Scala compiler

Option #3:

Write a parser using a parser combinator library



  • Still rather fun and interesting
  • Enjoyable DSL
  • Extremely fast to build and use
  • Easier transformation step*
  • Decent performance
  • Easy to make changes


  • Language specific / not portable
  • Some control given up for convinience depending on the library

Part 2:

Live Coding a CSS Preprocessor

Using Scala's Parser Combinator Library

Let's start simple...

@module breadcrumb {
  div {
    font-size: 12px;

  ul {
    list-style-type: none;

Pattern Matching:
Exact Value

"foo" match {
  case "foo" => "bar"
  case _     => "nada"

Roughly same as...

switch("foo") {
  case "foo":
    return "bar";

    return "nada";

Pattern Matching:

val x: Any = 1      // x is actually an Int

x match {
  case s: String => s
  case a         => a.toString

Variables declared in the case are scoped within the block that's executed for that case.

Pattern Matching:

val age = 16

age match {
  case a if a > 65          => relax(p)
  case 65                   => retire(p)
  case a if a <= 2          => sleepAndPoop(p)
  case a                    => work(p)

Pattern Matching:

Types can be represented as case class'es

case class Person(name: String, age: Int)

Case classes can be pattern matched against their constructor

Person("Titus", 31) match {
  case p: Person("Titus", _)   => titusSlideshow(
  case Person(name: String, _) => genericSlideshow(name)

Note that this is possible,
but it's ugly and verbose

Person("Titus", 31) match {
  // meh
  case p: Person if == "Titus" => titusSlideshow(

  // preferred
  case p: Person("Titus", _)          => titusSlideshow(

Thinking in Scala: Think patterns, not equality

Understanding the Tilde:

Imagine a class named ~

case class Sequence(left: Parser, right: Parser)
case class ~(left: Parser, right: Parser)

It could be pattern matched just like any other class

val tilde = new ~(x, y)
tilde match {
  case ~(a, b)        => ...
  case Sequence(a, b) => ...

What exactly is this:
"foo" ~ "bar"?

Like Ruby, in Scala operators are simply methods on the left most object written in infix notation

// equivalent
"foo" ~ "bar"

...but String doesn't have a ~ method

And even if it did how would it know to return Parser[String]?

Within the parser library is something that looks like this:

implicit def stringToRichString(s: String) = new RichString(s)

class RichString(left: String) {
  def ~(right: String) =
    new Parser[String](left) + new Parser[String](right)
  1. When Scala compiles* "foo" ~ "bar", it first looks to see if String has a ~ method.
  2. When that fails it looks in the current scope for any ~ method
  3. It finds one on RichString but "foo" isn't of that type.
  4. So it checks for an implicit method that will convert String => RichString and finds one.
  5. It runs "foo" through stringToRichString, making it a RichString then calls the ~ function, resutling in a new Parser[String]

* = this whole process would be horribly unperformant in an interpreted language like Python or Ruby