Dictionary format strings

There is quite a nice feature of Python that allows dictionaries to be applied to format strings, rather than just relying on the position of each conversion specification. To give a trivial example:

>>> print "Hello %s my name is %s." % ("Gaius", "Python")
Hello Gaius my name is Python.


>>> print "Hello %(name)s my name is %(lang)s." % {"name":"Gaius", "lang":"Python"}
Hello Gaius my name is Python.

I use this technique all the time for building complex strings from templates, e.g. generating configuration files, or composing email from a program, where it would be unmanageable to rely on just position, especially when the body is very likely to change. I couldn’t see this facility in OCaml, so I quickly knocked something up (based on a couple of functions from earlier):

(* Dict printf - a la Python. Only Str for now. *)

module type DFPRINTF =
sig
	val dsprintf: string -> (string * string) list -> string
	val dfprintf: out_channel -> string -> (string * string) list -> unit
	val dprintf : string -> (string * string) list -> unit
end

(* Find all occurrences of the % character in a string and return their indexes *)
let findq s = 
	let rec findq' s offset acc =
		try
			let pos = String.index_from s offset '%' in
				findq' s (pos+1) (pos::acc)
		with
			Not_found -> List.rev acc in
	findq' s 0 []

(* replace characters from position x to position y in string a with string b *)
let splice a x y b =
	let before = String.sub a 0 x in
	let after = String.sub a y ((String.length a) - y) in
		before ^ b ^ after

(* substitute from dict h into string s *)
let dsprintf s h =
	let q = findq s in
	let s' = ref s in (* strings are immutable, so no copy is needed *)
		(* work right to left so that earlier offsets stay valid after each splice *)
		for i = (List.length q) - 1 downto 0 do
			let ob = (List.nth q i) + 1 in (* the open bracket immediately following the % *)
			try
				match s.[ob] with
				| '(' ->
					let cb = String.index_from s ob ')' in 
					let k = String.sub s (ob + 1) (cb - ob - 1) in (* dict key between ( and ) *)
					if s.[cb + 1] = 's' then
						s' := splice !s' (ob - 1) (cb + 2) (List.assoc k h) (* seems to be no way to use a format string at runtime *)
					else ()
				| _ -> ()
			with
				Not_found -> () (* not really a format/key not found *) 
				| Invalid_argument _ -> () (* string ends "%(...)" *)
		done;
		!s'

let dfprintf c s h = output_string c (dsprintf s h)
let dprintf s h = dfprintf stdout s h

(* End of file *)
# dsprintf "Hello %(name)s my name is %(lang)s." [("name", "Gaius"); ("lang", "OCaml")];;
- : string = "Hello Gaius my name is OCaml."

The most obvious problem with this is that it only handles strings and, even then, does not permit any of the flags for formatting those strings, e.g. justification. That seems to be a non-trivial problem. In Python again:

>>> x="%s"
>>> print x % "hello"
hello

However back in OCaml:

# open Printf;;
# printf "%s" "hello";;
hello- : unit = ()
# let x = "%s";;
val x : string = "%s"
# printf x "hello";;
Error: This expression has type string but an expression was expected of type
         ('a -> 'b, out_channel, unit) format =
           ('a -> 'b, out_channel, unit, unit, unit, unit) format6
# printf (format_of_string x) "hello";;
Error: This expression has type string but an expression was expected of type
         ('a, 'b, 'c, 'd, 'e, 'f) format6

As far as I can tell, there is no way to get the functions in the Printf module to take a format string from a value; they only accept a string literal that is known at compile time. Otherwise I could take the characters between the ) and the s (or d or whatever, not that a mixed-type assoc list is allowed) and just pass them to sprintf at the point where dsprintf splices in the value. Hmm. Annoying, but not a showstopper: it just means I need to convert everything I want to the appropriate string (using literals with sprintf if necessary) when building the association list. So my immediate string-constructing need is met, but it feels like a bit of a hack.
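As a rough sketch of that workaround, each value can be formatted up front with a literal format string, so the association list only ever holds strings. The keys and values here (name, load, count) are made up for illustration:

```ocaml
(* Hypothetical example data: every value is formatted with a literal
   format before it goes into the association list. *)
let fields : (string * string) list =
  [ ("name", Printf.sprintf "%-10s" "Gaius");  (* left-justified, width 10 *)
    ("load", Printf.sprintf "%5.1f" 3.14159);  (* width 5, one decimal place *)
    ("count", string_of_int 42) ]

(* Show each formatted value between brackets so the padding is visible. *)
let () =
  List.iter (fun (k, v) -> Printf.printf "%s=[%s]\n" k v) fields
```

A list built this way can then be passed as the dictionary argument, with the padding and precision already baked into each value.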


About Gaius

Jus' a good ol' boy, never meanin' no harm

15 Responses to Dictionary format strings

  1. Benedikt Grundmann says:

    The standard library has this function in the Buffer module, which you could use straightforwardly:

    val add_substitute : t -> (string -> string) -> string -> unit

    add_substitute b f s appends the string pattern s at the end of the buffer b with substitution. The substitution process looks for variables into the pattern and substitutes each variable name by its value, as obtained by applying the mapping f to the variable name. Inside the string pattern, a variable name immediately follows a non-escaped $ character and is one of the following:

    * a non empty sequence of alphanumeric or _ characters,
    * an arbitrary sequence of characters enclosed by a pair of matching parentheses or curly brackets.

    An escaped $ character is a $ that immediately follows a backslash character; it then stands for a plain $. Raise Not_found if the closing character of a parenthesized variable cannot be found.

    • Gaius says:

      Very interesting – I hadn’t spotted that, but I don’t think it does what I want either – there doesn’t seem to be a way to get format characters into it, e.g.:

      # sprintf "Hello %15s" "Gaius";;
      - : string = "Hello Gaius"

      But it is definitely more concise!

      # open Buffer;;
      # let b = create 10;;
      val b : Buffer.t = <abstr>
      # add_substitute b (fun x -> List.assoc x [("name", "Gaius")]) "Hello $(name)";;
      - : unit = ()
      # contents b;;
      - : string = "Hello Gaius"

      Thanks for the tip! It seemed too obvious a feature to actually be missing :-)

    • Gaius says:

      That hasn’t formatted correctly, it should look like:

      # sprintf "Hello %15s" "Gaius";;
      - : string = "Hello           Gaius"
  2. Yoric says:

    You should take a look at Batteries’ extended printf. It features extensible tags, so you might be able to pull out exactly what you want.

    • Gaius says:

      Hmm, that’s odd. I have OCaml 3.11.2, FindLib and oUnit, just installed Camomile and Batteries isn’t happy:

      root@debian:~/batteries-1.2.2# make
      test ! -e src/batteries_config.ml || rm src/batteries_config.ml
      ocamlbuild syntax.otarget byte.otarget src/batteries_help.cmo META shared.otarget
      Finished, 1 target (0 cached) in 00:00:00.
      + ocamlfind ocamldep -package camomile,num,str -package camlp4.lib -pp camlp4of -pp camlp4of -modules libs/estring/pa_estring.mli > libs/estring/pa_estring.mli.depends
      sh: camlp4of: command not found
      Preprocessing error on file libs/estring/pa_estring.mli
      Command exited with code 2.
      Compilation unsuccessful after building 1 target (0 cached) in 00:00:00.
      make: *** [all] Error 10

      I’ll see if apt-get upgrade ocaml sorts it…

      Thanks for the tip :-)

      • There is a package, ocaml-batteries-included, in Debian Squeeze/Sid. It is better to use that than to recompile.

        To fix your specific problem (if you still want to compile), install camlp4-extra.

  3. To add to the former comment by Benedikt

    # let buf = Buffer.create 13;;
    val buf : Buffer.t = <abstr>
    # Buffer.add_substitute buf
    (fun str -> List.assoc str ["name", "Gaius"; "lang", "Python"])
    "Hello $(name) my name is $(lang).";;
    - : unit = ()
    # Buffer.contents buf;;
    - : string = "Hello Gaius my name is Python."

    • Gaius says:

      Except this time my name is OCaml :-)

      Hey, I was just about to code my own wrapper for df -Ph then I noticed you’d already written statvfs in ExtUnix. Merci beaucoup!
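The Buffer.add_substitute calls shown in the comments above can be wrapped into a function with the same shape as dsprintf. This is only a minimal sketch: the name substitute is hypothetical, and the template uses the $(key) syntax rather than %(key)s.

```ocaml
(* Hypothetical helper: replace $(key) or $key in [template] with the
   matching value from the association list [dict]. *)
let substitute (template : string) (dict : (string * string) list) : string =
  let buf = Buffer.create (String.length template) in
  Buffer.add_substitute buf (fun key -> List.assoc key dict) template;
  Buffer.contents buf

let () =
  print_endline
    (substitute "Hello $(name) my name is $(lang)."
       [ ("name", "Gaius"); ("lang", "OCaml") ])
```

Because the lookup is List.assoc, a key that is missing from the list raises Not_found rather than being left in place, which differs from the dsprintf behaviour in the post.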

  4. asmanur says:

    To read a printf format from a string, you can use the Scanf module with the %{ fmt %} tag.
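A related route in the same module is Scanf.format_from_string, which checks a runtime string against a literal format of the expected type and returns it as a format. The sketch below uses that function rather than the %{ fmt %} conversion itself; the helper name sprintf_dyn is hypothetical, and it assumes the template takes exactly one string argument.

```ocaml
(* Hypothetical helper: apply a format held in a runtime string to a
   single string argument. The literal "%s" tells the type checker what
   shape of format to expect; flags and widths in [template] are kept. *)
let sprintf_dyn (template : string) (value : string) : string =
  let fmt = Scanf.format_from_string template "%s" in
  Printf.sprintf fmt value

let () = print_endline (sprintf_dyn "Hello %15s" "Gaius")
```

If the template does not have the same type as the literal (for example "%d" where "%s" is expected), format_from_string is documented to raise Scanf.Scan_failure, so the check moves from compile time to run time.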

  5. rich says:

    Another way to do it is with camlp4 macros. They can be type-safe too.

    You could have a look at how we did SQL-safe type-safe string interpolation in PG’OCaml. There is also a camlp4 extension which does something similar to what you want, although the name of it escapes me right now …

  6. ChriS says:

    # let x : (_,_,_,_) format4 = "%s";;
    val x : (string -> 'a, 'b, 'c, 'a) format4 = <abstr>
    # printf x "hello";;
    hello- : unit = ()
