Raw access to C structures from OCaml

I read on Paul Graham’s site an article in which Lisp programmer Carl de Marcken of ITA says:

Because we have about 2 gigs of static data we need rapid access to, we use C++ code to memory-map huge files containing pointerless C structs (of flights, fares, etc), and then access these from Common Lisp using foreign data accesses. A struct field access compiles into two or three instructions, so there’s not really any performance penalty for accessing C rather than Lisp objects.

This has an application to some features I want to add to OCI*ML, so I whipped up a quick proof-of-concept in OCaml and C (with a few suggestions on SO, none of which I followed in the end!). First the C code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <caml/mlvalues.h>
#include <caml/alloc.h>
#include <caml/memory.h>
#include <caml/custom.h>

typedef struct {
  void* ptr;
} c_heap_t;

#define C_heap_val(v) (*((c_heap_t*) Data_custom_val(v)))

/* finalizer called by the OCaml GC to free the C memory; finalizers
   must not allocate on the OCaml heap or use CAMLparam/CAMLreturn */
void c_free_heap_t(value ch) {
  free(C_heap_val(ch).ptr);
}

/* associate callback with datatype */
static struct custom_operations c_heap_t_custom_ops = {
  "c_heap_t_custom_ops", &c_free_heap_t, NULL, NULL, NULL, NULL};  

/* test struct */
typedef struct {
  int x;
  int y;
  int z;
} triple_t;

/* allocate a block of b bytes and wrap it in a c_heap_t struct */
value c_alloc_heap_memory(value bytes) {
  CAMLparam1(bytes);
  CAMLlocal1(v);  /* register the result with the GC */
  int b = Int_val(bytes);

  c_heap_t c = {NULL};
  c.ptr = malloc(b);

  v = caml_alloc_custom(&c_heap_t_custom_ops, sizeof(c_heap_t), 0, 1);
  C_heap_val(v) = c;
  
  CAMLreturn(v);
}

/* get the native size of an int */
value c_get_size_of_int(value unit) {
  CAMLparam1(unit);

  CAMLreturn(Val_int(sizeof(int)));
}

/* write an int at offset bytes from cht.ptr */
value c_write_at_offset(value cht, value offset, value intdata) {
  CAMLparam3(cht, offset, intdata);
  c_heap_t c = C_heap_val(cht);
  int o = Int_val(offset);
  int i = Int_val(intdata);

  /* cast to char* so the offset is counted in bytes
     (arithmetic on void* is a GNU extension) */
  memcpy((char*) c.ptr + o, &i, sizeof(int));

  CAMLreturn(Val_unit);
}

/* read an int at offset bytes from cht.ptr */
value c_read_at_offset(value cht, value offset) {
  CAMLparam2(cht, offset);
  c_heap_t c = C_heap_val(cht);
  int o = Int_val(offset);
  int i;

  /* copy the bytes into a local int, then convert it to an OCaml int */
  memcpy(&i, (char*) c.ptr + o, sizeof(int));
  CAMLreturn(Val_int(i));
}

value c_dump_test_struct(value cht) {
  CAMLparam1(cht);
  c_heap_t c = C_heap_val(cht);
  
  triple_t t;
  memcpy(&t, c.ptr, sizeof(triple_t));
  printf("C:\tx=%d, y=%d, z=%d\n", t.x, t.y, t.z);

  CAMLreturn(Val_unit);
}

Pay attention to the test struct triple_t. My strategy is to write “helper” routines in C to get and set data at an offset from the start of a heap-allocated chunk of memory, to which I hold a pointer on the OCaml side (in practice, this would probably be shared memory). That structure is the only thing that I should need to “know” to write a new application, so with the helpers in place, any new code can be pure OCaml.
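One caveat with offsets: multiples of sizeof(int) only work because every field of the test struct is an int. For a struct with mixed field types the compiler may insert padding, so it is probably safer to ask C for each field's offset with the standard offsetof macro. Here is a minimal standalone sketch of that idea in plain C, without the OCaml headers. The names write_int_at, read_int_at and build_triple are hypothetical and not part of the stubs above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* same layout as the test struct in the C stubs */
typedef struct {
  int x;
  int y;
  int z;
} triple_t;

/* hypothetical helper: copy an int into the block at a byte offset;
   memcpy keeps the store valid even if the offset is not int-aligned */
void write_int_at(void *base, size_t size, size_t offset, int i) {
  assert(offset + sizeof(int) <= size);  /* bounds check in debug builds */
  memcpy((char *) base + offset, &i, sizeof(int));
}

/* hypothetical helper: read an int back from a byte offset */
int read_int_at(const void *base, size_t size, size_t offset) {
  int i;
  assert(offset + sizeof(int) <= size);
  memcpy(&i, (const char *) base + offset, sizeof(int));
  return i;
}

/* fill a raw block field by field using compiler-computed offsets,
   then copy it out as a triple_t */
triple_t build_triple(int x, int y, int z) {
  triple_t t;
  size_t size = sizeof(triple_t);
  void *block = calloc(1, size);
  assert(block != NULL);

  write_int_at(block, size, offsetof(triple_t, x), x);
  write_int_at(block, size, offsetof(triple_t, y), y);
  write_int_at(block, size, offsetof(triple_t, z), z);

  memcpy(&t, block, size);
  free(block);
  return t;
}
```

In the real stubs, one option would be to export each offsetof value to OCaml the same way c_get_size_of_int exports sizeof(int), so the OCaml side never has to guess at padding.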

open Printf

type c_ptr

(* allocate n bytes of memory and return a pointer *)
external alloc_heap_memory: int -> c_ptr = "c_alloc_heap_memory"

(* find out what C thinks the size of an int is *)
external get_size_of_int: unit -> int = "c_get_size_of_int"

(* write an int into a preallocated block of memory *)
external write_at_offset: c_ptr -> int -> int -> unit = "c_write_at_offset"

(* test that the values arrive as expected *)
external dump_test_struct: c_ptr -> unit = "c_dump_test_struct"

(* dismantle the struct again *)
external read_at_offset: c_ptr -> int -> int = "c_read_at_offset"
  
(* write values 3, 5 and 7 to C *)
let () = 
  let soi = get_size_of_int () in
  let cht = alloc_heap_memory (3 * soi) in
    write_at_offset cht (0 * soi) 3;
    write_at_offset cht (1 * soi) 5;
    write_at_offset cht (2 * soi) 7;
    dump_test_struct cht;

    (* now read them back again *)
    let x = read_at_offset cht (0 * soi) in
    let y = read_at_offset cht (1 * soi) in
    let z = read_at_offset cht (2 * soi) in
    printf "OCaml:\tx=%d, y=%d, z=%d\n" x y z

So knowing only that I have a structure of three int values in C, I construct and dissect it by finding the size of each one in bytes and then stepping through it in OCaml:

$ ocamlc -c c_side.c
$ ocamlc -custom -o test ml_side.ml c_side.o
$ ./test 
C:	    x=3, y=5, z=7
OCaml:	x=3, y=5, z=7

Trivial perhaps, but it validates the approach, and will serve as a foundation to build on with more interesting data structures.
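The ITA setup quoted at the top memory-maps files of pointerless structs rather than using malloc. As a rough sketch of what that could look like on the C side, assuming a POSIX system and a file that is just an array of triple_t records, something like the following might do. The function names write_triples and sum_z_mapped are made up for illustration, and the file layout is an assumption of mine, not anything ITA has described.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
  int x;
  int y;
  int z;
} triple_t;

/* hypothetical helper: dump n records to a file as raw bytes;
   returns 0 on success, -1 on error */
int write_triples(const char *path, const triple_t *recs, size_t n) {
  /* sanity check: this sketch assumes the struct has no padding */
  assert(sizeof(triple_t) == 3 * sizeof(int));

  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return -1;

  size_t bytes = n * sizeof(triple_t);
  ssize_t written = write(fd, recs, bytes);
  close(fd);
  return (written == (ssize_t) bytes) ? 0 : -1;
}

/* hypothetical helper: map the file read-only and sum the z fields
   in place, without copying the records; returns 0 on success */
int sum_z_mapped(const char *path, long *out) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;

  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) {
    close(fd);
    return -1;
  }

  size_t len = (size_t) st.st_size;
  void *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  /* the mapping stays valid after the descriptor is closed */
  if (base == MAP_FAILED) return -1;

  /* mmap returns page-aligned memory, so indexing it as triple_t is safe */
  const triple_t *recs = base;
  size_t n = len / sizeof(triple_t);
  long sum = 0;
  for (size_t i = 0; i < n; i++) sum += recs[i].z;

  munmap(base, len);
  *out = sum;
  return 0;
}
```

To use this from OCaml, the mapped base pointer would go into a c_heap_t, with a finalizer that calls munmap instead of free.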

5 Responses to Raw access to C structures from OCaml

Matías Giovannini says:

May 18, 2011 at 01:37

Essentially you’re writing your marshaling functions in C, except that there they are simply unaligned reads and writes. I personally wouldn’t be OK with this loss of portability (I am a long-time developer on 68k and PowerPC architectures), but if you are aware of the trade-off, it is a good solution, I think.

- Gaius says:
  
  May 18, 2011 at 05:46
  
  Hmm, I have tried to be fairly clean; I get the native size of an int on the fly, then rely on Val_int() and Int_val() to do the actual conversion. I don’t have access to any PPC hardware unfortunately, if you have a moment would you mind running it and letting me know what it says? Does the PPC pad to 8-byte boundaries perhaps? Thanks!
  
  - Matías Giovannini says:
    
    May 18, 2011 at 11:40
    
    No need to test, really, since *_at_offset can take arbitrary offsets. On an architecture without unaligned reads this will give a bus error inside OCaml. As I said, if you’re only going to run this on x86, there’s no need to worry.
    
    - Gaius says:
      
      May 18, 2011 at 13:25
      
      Cool. I’ll see if I can find a SPARC to try it on too, but we’ve sadly abandoned Solaris wholesale here to go with Red Hat. Cheers!
      
Yappan Mojita says:

July 26, 2017 at 17:41

I have a C++ struct and a pointer to it. Can you tell me whether I can pass that pointer to an OCaml module so that the OCaml module can read the data written into the struct by C++?