Lucas Sifoni

Hosting a small language (Ovo2) from scratch in Elixir, pt 1
gathering requirements from a previous experiment.

elixir parsing explorations


full code hosted on github

For a while, I have wanted to build a (well, another) mini-language with a very small footprint, allowing nothing more than small transformations of data, for use in a visual programming environment. My language had to fill these needs:

And my language should not have:

These goals would allow me to create a simple visual editor operating on a simplified graphical representation of the language’s AST. Ovo2 worked well enough for a proof of concept, but was hosted in TypeScript. I want to steer this project towards an end-to-end Elixir environment, and its status as a side exploration makes this possible.

A data labelling UI, from which you define data accessors in Ovo2

The language also had a few quirks in its parsing strategy: as an exercise, I wanted to “re-discover” parser combinators without documentation, so expression nesting wasn’t fully supported. Having since read about many parsing and lexing strategies, I should now be able to complete that task without resorting to top-level assignments.

An Ovo2 data accessor being defined.

Syntax samples

Here is a sample of the language's syntax:

add1 = \w -> w + 1 end

add3 = \w ->
    z = w + 1
    z + 2
end

map([1,2], add3)

fn = data°first_name
ln = data°last_name

join2 = \strs, spacer ->
 reduce(strs, \out, s ->
    out <> spacer <> s
 end)
end

join2([fn, ln], ` `)

This should look quite familiar, and really is the current state of a small language I called Ovo2 and implemented in TypeScript for my no-code design automation environment, a side project presented here (in French).

What can we learn about Ovo2 from these excerpts?

add1 = \w -> w + 1 end

We have statements, in the form of an assignment of the result of an expression (here a lambda) to a symbol.

add3 = \w ->
    z = w + 1
    z + 2
end

Lambdas return the result of their last evaluated expression. In fact, the result of a whole Ovo2 program is its last evaluated expression.
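The "last expression wins" rule can be sketched in a few lines of TypeScript. This is only an illustration under assumptions, not the actual Ovo2 evaluator: the `Expr` and `Scope` types and the `evalExpr` and `evalBody` functions are hypothetical names made up for the example, and values are restricted to numbers.

```typescript
// Hypothetical mini-AST, just large enough to show the rule.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "sym"; name: string }
  | { kind: "add"; left: Expr; right: Expr }
  | { kind: "assign"; name: string; value: Expr };

type Scope = Readonly<Record<string, number>>;

// Evaluates one node and returns the (possibly extended) scope together
// with the value that node produced.
function evalExpr(node: Expr, scope: Scope): [Scope, number] {
  switch (node.kind) {
    case "num":
      return [scope, node.value];
    case "sym": {
      const value = scope[node.name];
      if (value === undefined) throw new Error(`unbound symbol: ${node.name}`);
      return [scope, value];
    }
    case "add": {
      const [, left] = evalExpr(node.left, scope);
      const [, right] = evalExpr(node.right, scope);
      return [scope, left + right];
    }
    case "assign": {
      const [, value] = evalExpr(node.value, scope);
      return [{ ...scope, [node.name]: value }, value];
    }
  }
}

// A lambda body (or a whole program) is a list of nodes; its value is the
// value of the last node evaluated.
function evalBody(body: Expr[], scope: Scope): number {
  if (body.length === 0) throw new Error("empty body");
  let current = scope;
  let last = 0;
  for (const node of body) {
    [current, last] = evalExpr(node, current);
  }
  return last;
}

// Body of add3, called with w = 1:   z = w + 1   then   z + 2
const add3Body: Expr[] = [
  {
    kind: "assign",
    name: "z",
    value: { kind: "add", left: { kind: "sym", name: "w" }, right: { kind: "num", value: 1 } },
  },
  { kind: "add", left: { kind: "sym", name: "z" }, right: { kind: "num", value: 2 } },
];

const add3Result = evalBody(add3Body, { w: 1 }); // 4
```

Threading the scope through each node is what lets `z` be visible to the second line of the body while leaving the caller's scope untouched.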

map([1,2], add3)

We have function calls, written as a symbol followed by a parenthesized, comma-separated list of expressions.

fn = data°first_name

An oddity: ° is the access operator. It works on maps, so Ovo2 has a kind of dictionary or map type. The symbol data is pre-filled with the program's input data and has a special status.

Not shown in the above snippets is the ability to shadow a symbol.
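As a rough sketch of what the access operator might do at evaluation time, assuming tagged values similar to the nodes of the TypeScript implementation: the `Val` type and the `access` function below are hypothetical, and raising an error on a missing key is an assumption, since the behaviour of Ovo2 in that case is not described here.

```typescript
// Hypothetical tagged values; the real Ovo2 node shapes may differ.
type Val =
  | { kind: "string"; value: string }
  | { kind: "map"; fields: Record<string, Val> };

// Evaluates `target°key`: only maps can be accessed.
function access(target: Val, key: string): Val {
  if (target.kind !== "map") {
    throw new TypeError(`° expects a map on its left side, got ${target.kind}`);
  }
  const found = target.fields[key];
  // Assumption: a missing key is an error rather than a silent default.
  if (found === undefined) throw new Error(`no key named ${key}`);
  return found;
}

// `data` stands for the pre-filled program input.
const data: Val = {
  kind: "map",
  fields: {
    first_name: { kind: "string", value: "Ada" },
    last_name: { kind: "string", value: "Lovelace" },
  },
};

const firstName = access(data, "first_name"); // data°first_name
```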

foo = 5
addFoo = \n -> n + foo end
foo = 6
addFoo(2)

This returns 7, not 8, since the lambda captured the value that foo had in its environment when the lambda was defined.
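One way to get this behaviour is to treat the environment as an immutable value: each assignment produces a new environment, and a lambda keeps a reference to the one it was created in. The sketch below illustrates the idea under that assumption. It is not the actual Ovo2 implementation, and `Bindings`, `bind` and `makeAddFoo` are hypothetical names.

```typescript
type Bindings = Readonly<Record<string, number>>;

// Returns a NEW set of bindings; the previous one is left untouched.
function bind(bindings: Bindings, name: string, value: number): Bindings {
  return { ...bindings, [name]: value };
}

// Builds the equivalent of `\n -> n + foo end`, closing over `captured`.
function makeAddFoo(captured: Bindings): (n: number) => number {
  return (n) => {
    const foo = captured["foo"];
    if (foo === undefined) throw new Error("unbound symbol: foo");
    return n + foo;
  };
}

const before = bind({}, "foo", 5);     // foo = 5
const addFoo = makeAddFoo(before);     // addFoo = \n -> n + foo end
const after = bind(before, "foo", 6);  // foo = 6 shadows, it does not mutate

const shadowingResult = addFoo(2);      // 7: the lambda still sees foo = 5
const fooAfterShadowing = after["foo"]; // 6: later code sees the new binding
```

With a mutable environment shared between the lambda and the top level, the same program would return 8 instead.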

join2 = \strs, spacer ->
 reduce(strs, \out, s ->
    out <> spacer <> s
 end)
end

join2([fn, ln], ` `)

We can see here that we also have syntax for lists, which can include primitive values or symbols, and for strings, delimited by `. String concatenation uses the <> operator. I wanted to make clear that Ovo2 is strongly typed: there are no implicit conversions between numbers and strings.

Ovo2 also has booleans, written T and F, and conditionals, which are expressions. Since an expression must produce a value, you cannot have single-branch ifs and will always have to provide an alternative result.
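To make the "no implicit conversions" point concrete, here is a minimal sketch of how a concatenation operator could check its operands. The `Typed` type and the `concat` function are hypothetical, as is the error message wording.

```typescript
// Hypothetical tagged values; the real Ovo2 node shapes may differ.
type Typed =
  | { kind: "number"; value: number }
  | { kind: "string"; value: string };

// `<>` only accepts two strings and never converts a number for you.
function concat(left: Typed, right: Typed): Typed {
  if (left.kind !== "string" || right.kind !== "string") {
    throw new TypeError(`<> expects two strings, got ${left.kind} and ${right.kind}`);
  }
  return { kind: "string", value: left.value + right.value };
}

const joined = concat({ kind: "string", value: "foo" }, { kind: "string", value: "bar" });

// Mixing a string and a number is rejected instead of producing "foo1".
let concatError = "";
try {
  concat({ kind: "string", value: "foo" }, { kind: "number", value: 1 });
} catch (e) {
  concatError = e instanceof Error ? e.message : String(e);
}
```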

if (w == 1) do
    w + 1
else do
    w + 2
end

The example also shows infix operators, most of which are for arithmetic or comparison. All comparisons are done by value; complex data is compared by its entire contents.
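Comparison by value on nested data amounts to a recursive structural check. The sketch below shows one possible shape for it. The `Data` type and the `equalByValue` function are hypothetical, and treating values of different kinds as unequal (rather than as a type error) is an assumption.

```typescript
// Hypothetical value shapes covering primitives, lists and maps.
type Data =
  | { kind: "number"; value: number }
  | { kind: "string"; value: string }
  | { kind: "bool"; value: boolean }
  | { kind: "list"; items: Data[] }
  | { kind: "map"; fields: Record<string, Data> };

// Two values are equal when they have the same kind and the same contents,
// recursively. Object identity never matters.
function equalByValue(a: Data, b: Data): boolean {
  if (a.kind === "list" || b.kind === "list") {
    if (a.kind !== "list" || b.kind !== "list") return false;
    const xs = a.items;
    const ys = b.items;
    return xs.length === ys.length && xs.every((x, i) => equalByValue(x, ys[i]));
  }
  if (a.kind === "map" || b.kind === "map") {
    if (a.kind !== "map" || b.kind !== "map") return false;
    const left = a.fields;
    const right = b.fields;
    const keys = Object.keys(left);
    return keys.length === Object.keys(right).length &&
      keys.every((k) => k in right && equalByValue(left[k], right[k]));
  }
  return a.kind === b.kind && a.value === b.value;
}

const listA: Data = { kind: "list", items: [{ kind: "number", value: 1 }, { kind: "string", value: "x" }] };
const listB: Data = { kind: "list", items: [{ kind: "number", value: 1 }, { kind: "string", value: "x" }] };

const sameContents = equalByValue(listA, listB); // true, although they are distinct objects
const numberVsString = equalByValue({ kind: "number", value: 1 }, { kind: "string", value: "1" }); // false
```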

That’s already quite a lot of requirements, but it worked well in practice. Another (funky) feature is the ability to switch the language in which the program is written, through a second special symbol called lang, to which you can assign one of three supported values.

As such, the built-in environment for Ovo2 in the TypeScript implementation had translations in French, English, and “lengthy French” for various built-in functions:

    join: {
        fr: 'coller',
        en: 'join',
        long_fr: 'rassembler',
        _fn: (env: Ovo2Env, list: Ovo2Node, glue: Ovo2Node) : [Ovo2Env, Ovo2Node] => {
            const checked_list = argToList(list, env); // evaluation to a list, throws type errors if needed.
            const checked_glue = argToString(glue, env); // evaluation to a string, throws type errors if needed.
            // Evaluate each element of the list, then join the results with the glue string.
            const reduced_string = checked_list.children
                .map(node => do_evalOvo2([node], env).value)
                .join(checked_glue.value);

            return [
                env,
                {
                    kind: 'string',
                    value: reduced_string,
                }
            ];
        }
    }

So, if you had set lang = long_fr, you could write rassembler([`foo`, `bar`], `glue`) instead of join([`foo`, `bar`], `glue`). Why not? :^)

We’ll leave this behind though! In the next part, we’ll see how we can formalize this syntax and draft a parsing strategy.
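Resolving a written name against such a table could look like the sketch below. It reuses the `fr`, `en` and `long_fr` keys from the excerpt above, but `builtinNames` and `resolveBuiltin` are hypothetical, and rejecting a name that belongs to another language is an assumption about the intended behaviour.

```typescript
type Lang = "fr" | "en" | "long_fr";

// Hypothetical subset of the builtin table, keeping only the names.
const builtinNames: Record<string, Record<Lang, string>> = {
  join: { fr: "coller", en: "join", long_fr: "rassembler" },
};

// Finds which builtin a written name refers to under the active language.
function resolveBuiltin(written: string, lang: Lang): string | undefined {
  for (const canonical of Object.keys(builtinNames)) {
    if (builtinNames[canonical][lang] === written) return canonical;
  }
  return undefined;
}

const viaLongFrench = resolveBuiltin("rassembler", "long_fr"); // "join"
const wrongLanguage = resolveBuiltin("rassembler", "en");      // undefined
```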

