Multi Domain OCaml on Raspberry Pi Pico 2 Microcontroller
Mark Elvers

Running OCaml 5 with multicore support on bare-metal Raspberry Pi Pico 2 W (RP2350, ARM Cortex-M33).

The OCaml Arm32 backend, which I updated to support OCaml 5 domains, generates ARMv7-A code (Application profile), but the Pico 2’s Cortex-M33 is ARMv8-M (Microcontroller profile). The generated instructions are compatible, since both profiles use Thumb-2, but the object file metadata differs, and the linker will not mix “A” and “M” profiles.

error: hello.o: conflicting architecture profiles A/M

Initially, I worked with the existing Arm32 support: compiling OCaml to assembly, patching the output with sed, and reassembling it with arm-none-eabi-as to get a Cortex-M compatible object file.

sed -e 's/\.arch[[:space:]]*armv7-a/.arch armv8-m.main/' \
    -e 's/\.fpu[[:space:]]*softvfp/.fpu fpv5-sp-d16/' \
    hello.s.orig > hello.s

After a while, I decided to add a new architecture to the ARM backend to avoid the external processing.

The Cortex-M33 has a single-precision-only FPU. OCaml’s float type is double-precision (64-bit), so the hardware FPU cannot accelerate OCaml floats, and float arithmetic falls back to software routines. The default Pico SDK linker script copies some code to RAM for faster execution, including the software floating-point routines. I used a custom linker script to keep everything in flash, which maximises the RAM available for the OCaml heap.

Creating a minimal runtime was relatively simple. To call a C function, OCaml’s calling convention puts the function pointer in r7 and calls caml_c_call. My implementation of caml_c_call executes blx r7 to invoke the actual C function. OCaml expects r8, r10 and r11 to hold runtime state, so these are initialised with minimal structures.

  • r8 - trap_ptr (exception handler)
  • r10 - alloc_ptr (allocation pointer)
  • r11 - domain_state_ptr (runtime state)

With this in place, a simple program written in OCaml would now run. Recursive functions, such as one to calculate a factorial, also worked; however, there was no garbage collector, no exception handling, no standard library and no multicore/domain support.

external pico_print : string -> unit = "pico_print"

let () = pico_print "Hello from OCaml!"
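The recursive factorial mentioned above can be sketched as follows. This is an illustrative example, not the code from the project: the names factorial and result are assumptions, and the call to pico_print is left out because converting the integer to a string would need library support that did not exist at this stage.

```ocaml
(* Illustrative sketch: plain recursion on immediate integers.
   Nothing here allocates on the heap, so it does not need a GC. *)
let rec factorial n =
  if n <= 1 then 1 else n * factorial (n - 1)

(* On a 32-bit target OCaml ints are 31 bits wide, so 12! is the
   largest factorial that fits; 10! is comfortably in range. *)
let result = factorial 10
```

Here result is 3628800. Printing it on the Pico would still have to go through a C helper such as pico_print.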

This limited success, though, was enough to inspire me to push on to the second phase. I added per-core thread-local storage and provided a mapping between pthread and Pico SDK primitives. The Pico SDK does not provide condition variables, so I implemented a simple polling solution.

OCaml’s Domain.spawn calls pthread_create(), which now calls multicore_launch_core1_with_stack() from the Pico SDK. OCaml creates a backup thread which handles stop-the-world (STW) GC synchronisation when a domain’s main thread is blocked. On the Pico, I fake the creation of the backup thread by only creating a thread on every other call to pthread_create(). Since there is no backup thread, I poll the status of the STW interrupt flag inside pthread_cond_wait(), pthread_mutex_lock() and even _write() to simulate what the backup thread would do on a real OS.
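The polling idea can be modelled in a few lines. The real shim is C code inside the pthread layer; the sketch below is only an OCaml model of the control flow, and every name in it (stw_requested, stw_handled, service_stw, wait_polling) is hypothetical.

```ocaml
(* Hypothetical model of the polling wait, not the project's C shim. *)

(* Set by another core when it wants a stop-the-world section. *)
let stw_requested = Atomic.make false

(* Counts how many STW requests this waiter has serviced. *)
let stw_handled = Atomic.make 0

(* Stand-in for the work the backup thread would do on a real OS. *)
let service_stw () =
  if Atomic.compare_and_set stw_requested true false then
    Atomic.incr stw_handled

(* Spin until [ready] is set, servicing STW requests on every pass
   so that a blocked core never holds up the other one. *)
let rec wait_polling ready =
  service_stw ();
  if not (Atomic.get ready) then begin
    Domain.cpu_relax ();
    wait_polling ready
  end
```

The point of the design is that every blocking primitive becomes a loop that checks for pending STW requests, so a core that is waiting can still take part in GC synchronisation without a separate thread.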

All of Stdlib compiles, but I only initialise the 25 modules that do not have extensive OS dependencies.

  • CamlinternalFormatBasics, Stdlib, Either, Sys, Obj, Type
  • Atomic, CamlinternalLazy, Lazy, Seq, Option, Pair, Result
  • Bool, Char, Uchar, List, Int, Array, Bytes, String, Unit
  • Mutex, Condition, Domain

The curry functions are generated at link time by the OCaml linker. Because I link with the Pico SDK linker, arm-none-eabi-ld, the curry functions are not generated automatically. The workaround was to create a dummy OCaml file that uses enough partial applications to force the generation of caml_curry2 through caml_curry8, then extract them to an assembly file, curry.s, and add that to libstdlib_pico.a for linking.
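A dummy file of this kind might look like the sketch below. The function names, the [@inline never] attributes and the use of Sys.opaque_identity are assumptions made for illustration; the actual file in the project may be written differently.

```ocaml
(* Hypothetical dummy file: one function per arity from 2 to 8.
   [@inline never] asks the compiler to keep each one as a real closure. *)
let[@inline never] f2 a b = a + b
let[@inline never] f3 a b c = a + b + c
let[@inline never] f4 a b c d = a + b + c + d
let[@inline never] f5 a b c d e = a + b + c + d + e
let[@inline never] f6 a b c d e f = a + b + c + d + e + f
let[@inline never] f7 a b c d e f g = a + b + c + d + e + f + g
let[@inline never] f8 a b c d e f g h = a + b + c + d + e + f + g + h

(* Partially apply each function to one argument and hide the
   resulting closure from the optimiser. *)
let p2 = Sys.opaque_identity (f2 1)
let p3 = Sys.opaque_identity (f3 1)
let p4 = Sys.opaque_identity (f4 1)
let p5 = Sys.opaque_identity (f5 1)
let p6 = Sys.opaque_identity (f6 1)
let p7 = Sys.opaque_identity (f7 1)
let p8 = Sys.opaque_identity (f8 1)

(* Finish every application so that all the closures are really used. *)
let total =
  p2 1 + p3 1 1 + p4 1 1 1 + p5 1 1 1 1 + p6 1 1 1 1 1
  + p7 1 1 1 1 1 1 + p8 1 1 1 1 1 1 1
```

Each pN sums N ones, so total is 35. The value itself is irrelevant; the file only exists so that the toolchain has a reason to emit the curry helpers for every arity it covers.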

As a test, I reused the prime number benchmark from the original Arm32 work, which counts the prime numbers less than 1 million, and compared the single-core and dual-core performance.

Test         Time       Primes
Single-core  21,166 ms  78,498
Dual-core    12,350 ms  78,498
Speedup      1.71x
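The benchmark source is not reproduced in this post. As a rough idea of its shape, a dual-domain prime count can be sketched as below. Trial division, the split of the range into two halves, and the names is_prime, count_primes and count_dual are all assumptions for illustration, not the project's code.

```ocaml
(* Hypothetical stand-in for the benchmark: trial division up to sqrt n. *)
let is_prime n =
  if n < 2 then false
  else begin
    let rec check d = d * d > n || (n mod d <> 0 && check (d + 1)) in
    check 2
  end

(* Count the primes in the half-open range [lo, hi). *)
let count_primes lo hi =
  let count = ref 0 in
  for n = lo to hi - 1 do
    if is_prime n then incr count
  done;
  !count

(* Single-core version: one domain scans the whole range. *)
let count_single limit = count_primes 0 limit

(* Dual-core version: the spawned domain takes the upper half
   while the calling domain counts the lower half. *)
let count_dual limit =
  let mid = limit / 2 in
  let upper = Domain.spawn (fun () -> count_primes mid limit) in
  let lower = count_primes 0 mid in
  lower + Domain.join upper
```

Both count_single 1_000_000 and count_dual 1_000_000 should return 78,498, matching the table. Splitting the range in half is the simplest choice but not a balanced one, since larger candidates need more trial divisions; a real benchmark may divide the work differently.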

The code for this project is available in mtelvers/pico_ocaml and mtelvers/ocaml.