Lucas Sifoni

Variations on the "leverage language from elixir" pattern.

programmingelixir


It seems that a pattern emerged in my recent months or years of Elixir, and it looks like I am now knee-deep in this pattern : leveraging other languages from Elixir.

This leverage takes, it seems, three forms :

And it all depends on managing <language>‘s runtimes from Elixir, which takes two forms :

Note that I won’t talk of Rustler which is already covered a bit on the blog and now well-known. It is a fantastic tool but does not offer the “code-in-code” approach described here. Nevertheless, a Rustler NIF in a monorepo still achieves the benefits I’m talking about here.

First Form : Writing Elixir code that generates and runs <language> code.

I was looking for a library that would allow read & write access to the three main Ms Office document types, meaning Word, Powerpoint, and Excel. From a previous life, I had used PHPOffice and it provided the required functionality.

But I would not like to truly add PHP to my stack. Chatting with another developer, I mocked what kind of reduction would happen if we leveraged PHPOffice from elixir (yes, there’s an error on line 12).

So I set out to build a POC for just that. It would not be used because I finally chose to leverage Java and the Apache POI family of libraries instead of PHPOffice (more on that later).

You first need to be able to run PHP code and ideally somehow get its output :

# Check if PHP and composer are accessible :
iex> Xhpoffice.Prerequisites.clear?()
true

# Note : this is the second pattern, inlining code in Elixir !
iex> script = ~PHP"""
$foo = 5;
elixir_return(1 + 2 + $foo);
"""
iex> result = Xhpoffice.Php.run!(script)
8

After having that, I tried to reproduce the PHPOffice basic example :

import Xhpoffice.Word
output_path = Path.join(File.cwd!(), "output.docx")
doc = make_document()
{doc, section} = add_section(doc)
{doc, _section, _object} = add_text(doc, section, "How convenient !")
doc |> write(output_path) |> run!

This rather idiomatic API goes around the mutability requirements of the original library by making the calls take a document or section explicitly, and returning them.

In the background, the document accumulates a tree of calls specific to PHPWord, generating variable IDS and keeping track of them :

[
  [
    {:assign, "941995C2B5135245424CA0615E6CBDE7",
     {:static, "\\PhpOffice\\PhpWord\\IOFactory", "createWriter",
      [{:var, "674317FBD969EA74B1FF85C880B2CB92"}, "Word2007"]}},
    {:method, {:var, "941995C2B5135245424CA0615E6CBDE7"}, "save",
     ["<cwd>/output.docx"]}
  ],
  [
    {:assign, "536A810991F2B5C6E5C2CDE3EC99C921",
     {:method, {:var, "739CFE7AD01797F26E5C8B305A674886"}, "addText",
      ["How convenient !"]}}
  ],
  [
    {:assign, "739CFE7AD01797F26E5C8B305A674886",
     {:method, {:var, "674317FBD969EA74B1FF85C880B2CB92"}, "addSection", []}}
  ],
  {:assign, "674317FBD969EA74B1FF85C880B2CB92",
   {:new, "\\PhpOffice\\PhpWord\\PhpWord", []}}
]

It is then compiled to PHP :

<?php
$varF4F605C1FFBCF3CDC59374EBDF8B0DB8 = new \PhpOffice\PhpWord\PhpWord();
$var83F1F620C6BD472063B55A504626B8C8 = $varF4F605C1FFBCF3CDC59374EBDF8B0DB8->addSection();
$varF8440D2829E71AAC14AAFF423F0C15F9 = $var83F1F620C6BD472063B55A504626B8C8->addText("How convenient !");
$var65ADDF5337DA12DC1D21C4C819200176 = \PhpOffice\PhpWord\IOFactory::createWriter($varF4F605C1FFBCF3CDC59374EBDF8B0DB8,"Word2007");
$var65ADDF5337DA12DC1D21C4C819200176->save("<cwd>/xhpoffice/output.docx");
?>

The whole compiler is here : it does not support object values, but for this POC, did not need to. We can see that the pattern does not consist to be able to cover the full PHP language, but only the subset useful to imperatively build documents with PHPWord.


  def do_compile(calls) do
    spells = calls |> Enum.reverse() |> List.flatten()

    output =
      for spell <- spells do
        eval(spell)
      end

    joined = Enum.join(output, "\n")

    joined
  end

  def map_args(args), do: Enum.map_join(args, ",", &eval/1)
  def dollar(id), do: "$var#{id}"
  def eval({:assign, id, expr}), do: "#{dollar(id)} = #{eval(expr)};"
  def eval({:new, class, args}), do: "new #{class}(#{map_args(args)});"
  def eval({:var, id}), do: dollar(id)
  def eval({:method, subject, name, args}), do: "#{eval(subject)}->#{name}(#{map_args(args)});"
  def eval({:static, class, name, args}), do: "#{class}::#{name}(#{map_args(args)});"
  def eval(value) when is_binary(value), do: Jason.encode!(value)

There’s a bit of additionnal housekeeping related to how the PHP script is ran, its dependencies injected, but irrelevant here. The idea is to only cover the needed bases for your library. You’re not going to leverage the whole of <language>‘s ecosystem anyway.

Second Form : Embedding (à la Zigler) <language> code in Elixir and running it.

I’ve shown it above with the ~PHP sigil and this is what Zigler provides, so this section will be quite short.

I love multi-letter sigils (elixir > 1.15) as they provide two convenient features :

If you never used custom sigils, here is the general outline :

~CUSTOM"""
custom content here
"""
# is equivalent to
sigil_CUSTOM(contents, opts)

You are free to define the sigil_CUSTOM function and it does not need to return a binary.

As an example, the ~PHP sigil above returns a temporary file path. In DtsBuddy, a runtime overlay loader library I’m slowly working on for my hobby projects, the ~DTS sigil returns an {:ok, "file", "name"} 3-tuple.

In Alzo, my ~JS sigil only returns its content, but it is used to hook a custom formatter to the sigil. That way, editor format-on-save or mix format properly format the sigil’s content.

defmodule Alzo.SigilJS do
  @behaviour Mix.Tasks.Format

  def features(_opts), do: [sigils: [:JS], extensions: []]

  def format(contents, opts) do
    c = """
    <<'EOF'
    #{contents}
    EOF
    """
    command = "node #{@prettier_path} --parser acorn #{c}"
    port = Port.open({:spawn, command}, [:binary])
    receive do
      {^port, {:data, d}} -> d
      _ -> contents
    after
      1_000 -> contents
    end
  end

  def sigil_JS(str, _opts) do
    str
  end
end

We can see that prettier is called in a Port. In case it fails, the content is left un-formatted. Note that I am not passing user-generated content but my own source code only.

In sheerlox’s Nodelix, I’m working to add the ~NODE and ~PACKAGE sigils to add one-off JS script execution and dependencies installation to the library.

A few caveats if you’re willing to provide multi-letter sigils :

Note that you can circumvent the interpolation problem with a simple macro : for example, I can provide a nodescript macro in which a regular heredoc block can be used :

defmacro node_script(do_block) do
  {:sigil_NODE, [delimiter: "\"\"\"", context: Elixir.Nodelix.SigilNode, imports: []],
   [
     {:<<>>, [indentation: 0],
      [
        Keyword.get(do_block, :do)
      ]},
     []
   ]}
end

target = "world"
node_script do
"""
console.log("Hello #{target}")
"""

# equivalent to
sigil_NODE(
  """
  console.log("Hello world !")
  """,
  []
)

# equivalent to
~NODE"""
  console.log("Hello world !")
"""

My point with this section is that this pattern is very useful to me. If you are considering it, please make it opt-in (provide another way to achieve the outcome) and explicit (require manual handling of the final side effects).

Bonus : Syntax highlighting in the sigils

Syntax highlighting was on my plans, but I did not get to it yet. This discussion on ElixirForum made me lookup how to create a VSCode / Zed plugin to highlight foreign languages in sigils.

https://elixirforum.com/t/script-tags-in-heex-and-app-js/

The general, non-editor-specific way is to create a tree-sitter grammar that will match the opening and closing forms of the sigils, then delegate to an existing language. This is easier to do than, for example, a complete grammar for HEEX, that looks like HTML but is not HTML. If you like to write lexers and parsers you should feel right at home :-).

I’ve put up a repo with the scaffolded VSCode extension and still have to do the same for Zed which is my daily driver.

https://github.com/alzo-archi/vscode-sigil-js-highlighting-sample

Frank Dugan also shows in this thread that this is doable with a lot less ceremony in NeoVim : https://elixirforum.com/t/script-tags-in-heex-and-app-js/67992/19

Third Form : writing Elixir code that generates instructions for <language>

I particularly like this one. It is similar to the first one (writing elixir code that generates <language> code) but shifts the responsibility of the instructions interpreter to the target language.

This is what I was doing in those two posts : Canvas from Elixir and What if Liveview had DOM Access ?.

In the canvas example, I composed higher level functions from canvas primitives to render a game screen :

def render(%Game{} = game, options \\ [width: 800, height: 800]) do
  w = options[:width]
  h = options[:height]

  operations = [
    clear_screen(w, h),
    draw_players(game, w, h),
    draw_ball(game, w, h)
  ]

  commands = render_commands(operations, :html5)
end

```elixir
defp clear_screen(w, h) do
    [
      {:fill_style, "black"},
      {:fill_rect, [0, 0, w, h]}
    ]
  end

  defp draw_players(%Game{} = game, w, h) do
    for player <- game.players do
      [
        :begin_path,
        {:line_width, 20},
        {:stroke_style, "red"},
        {:arc, [w / 2, h / 2, Enum.min([w, h]) / 2 * 0.7, player.start_pos, player.end_pos]},
        :stroke,
        :close_path
      ]
    end
  end

The JS side received a list of commands and their arguments :

    const commands = [4, "black", 7, 0, 0, 800, 800, 0, 5, 20, 6, "red", 8, 400.0, 400.0, 280.0, 0, 0.6283185307179586, false, 2, 3, 0, 5, 20, 6, "red", 8, 400.0, 400.0, 280.0, 2.0943951023931957, 2.7227136331111543, false, 2, 3, 0, 5, 20, 6, "red", 8, 400.0, 400.0, 280.0, 4.188790204786391, 4.81710873550435, false, 2, 3, 0, 4, "white", 8, 400.0, 400.0, 10, 0, 6.283185307179586, false, 1, 3];

Those commands were interpreted by a small interpreter mapping an integer to a command and consuming the correct number of arguments :

const run = (context, commands) => {
  let len = commands.length;
  while (len > 0) {
      const item = commands.shift();
      len--;
      switch (item) {
      case 8:
          args = commands.splice(0, 6);
          context.arc(...args);
          len -= 6;
          break;

Since this is tedious to write, the interpreter code itself was generated from Elixir based on a simple mapping :

@call_map %{
    begin_path: {0, :call, 0},
    fill: {1, :call, 0},
    stroke: {2, :call, 0},
    close_path: {3, :call, 0},
    fill_style: {4, :prop},
    line_width: {5, :prop},
    stroke_style: {6, :prop},
    fill_rect: {7, :call, 4},
    arc: {8, :call, 6}
  }

This is the pattern 1 finding its way in pattern 3 !

In the liveview example, I provided an elixir runtime and a JS handler to handle commands over time

(Note : this has been cleaned up and is actually used !) The idea was to be able to query or set DOM properties from Elixir, leveraging sequential execution and batching of calls.

def handle_event("sequential_example", _unsigned_params, socket) do
    alias TestWeb.DOMStuff.Ops
    box_1 = "#some_box"
    box_2 = "#some_other_box"
    {:noreply,
     TestWeb.DOMStuff.seq_exec(
       socket,
       [
          Ops.ignore(box_2),
          Ops.get_bounding_client_rect(box_1, fn s, v -> update(s, :box_dimensions, fn _ -> v end) end),
          Ops.set_height(box_2, fn s -> "#{2 * s.assigns.box_dimensions["width"]}px" end),
          Ops.get_bounding_client_rect(box_2, fn s, v -> update(s, :blue_box_dimensions, fn _ -> v end) end),
          Ops.un_ignore(box_2),
       ],
       fn s -> update(s, :got_everything, fn _ -> true end) end
     )}
  end

Here, the Ops module provides functions over manual writing of common operation tuples but is totally optional :

{:get_bounding_client_rect, [], "#some_other_box",
       fn s, v -> update(s, :blue_box_dimensions, fn _ -> v end) end}

Again, it’s only tuples and lists all the way down, building a tree of calls, until the actual execution. That also means that this pattern is super easy to test since it’s only made of small pure functions building more and more complex expressions.

Managing language runtimes from Elixir

This seems a good idea to me from a developer’s perspective. In Xava(work in progress) I’m making progress on installing a JDK and Maven on-demand from Elixir. I will also take advantage of the jinterface-build repo I opened to provide prebuilt OtpErlang.jar for the last three versions of OTP.

(JInterface is a Java library to make a Java app behave as an Erlang node. It is now considered legacy but stable and should be fine for my Elixir-driven office documents handling instead of doing the same ceremony over HTTP or another protocol.)

I’m emulating sheerlox’s Nodelix for Xava. There’s a bit of plumbing and verification to be sure that a runtime-downloaded language runtime is safe to use, but nothing terrible.

The opposite, which is commonly done in Nerves helper libraries, is to provide a few runtime checks or diagnostic modules that will provide you feedback on how additional components must be installed. This makes a lot of sense for embedded systems.

What’s the point ?

For dynamic systems that behave like modular monoliths, I think the runtime-install of another language runtime is a great solution. It enables lighter app nodes and heavier worker nodes with a single codebase.

But the biggest upside to me is the same you get by switching from, for example, a React SPA + Phoenix API to a LiveView app : defragmentation.

Elixir is amazing at safe orchestration of processes, so leveraging that capacity to bring third-party language solutions in a single coherent codebase is useful. I do not have to switch from an Elixir project to a Node project to a Java project to leverage those languages, and do not have to abandon other ecosystem’s strengths either.

It takes a bit of work, with the downside of having an abstraction layer built and tested, but it is possible to provide either a functional API over a third party language+library, or just an execution layer to embed a third party language in your Elixir code.

What am I doing anyway ?

The “write an interpreter and runtime-everything” pattern seems to appear before my eyes a few times a year. I think that’s an Elixir-caused behavior that I would not have developed if I had stayed in another ecosystem. But fantastic resources exist outside of our Elixir community and it would be a shame to be unable to reach for them.


Previous post : 2024 amateur optics / astronomy updates