Scope in database code (2)

Following on from a previous post, here is some simple database code in hopefully idiomatic Haskell. My strategy is

Use the Reader monad to pass the database connection around the functions that use it. The description, computations which read values from a shared environment seems ideal, and could be extended to carry a record containing any configuration data only known at runtime. Using this approach for a database connection tho’ has a subtle and interesting property: even tho’ the variable that represents the database connection is read-only, the connection itself is not! So a function can perform INSERT on the database, but it can’t (for example) point the connection at a different database and “fool” the next function. Tho’ I can’t offhand think of a case in which I might want to do that.
Use the Writer monad to return the result set to the calling function. This is described as computations which produce a stream of data in addition to the computed values. This is effectively an append-only list; a function can neither access nor modify results returned from earlier functions (i.e. in SQL terms a UNION). Again I can’t think of a case in which I’ve necessarily needed to do that, but it’s a nice property to guarantee. If I did want to do either of those things I could of course just use the State monad (computations which maintain state) to carry around a record containing everything, but there is separation of concerns to consider.

Here is the code. We start as ever with the imports. Haskell allows forward declarations (unlike OCaml) so the main function is here too.

module Main where

import Control.Monad.Reader
import Control.Monad.Writer
import Database.HDBC
import Database.HDBC.Sqlite3

main:: IO ()
main = handleSqlError $ do
  lda <- connectSqlite3 "test.db"     
  rs <- runDbApp lda $ 
           do deleteAllRows
              getRows -- returns nothing
              doInsert ["Huw", "Pugh", "Barney McGrew"]
              getDbTime
              getRows
              
  mapM_ putStrLn rs

The main function is almost spookily clean; all the plumbing of passing the database connection created on line 10 and getting back the result set on line 11 has been completed hidden inside the monads. The do block executes all the functions within in sequence. Desugaring that would let us write the main body of the application on a single line just as cleanly:

rs <- runDbApp lda $ deleteAllRows >> getRows >> doInsert ["Huw", "Pugh", "Barney McGrew"] >> getDbTime >> getRows

There is definitely something to be said for this style, at the top level of a program anyway, it allows very, very simple composition of “building blocks” to make an application that executes as a pipeline, building up a final result as it goes. Next is the function that sets up the nested Reader and Writer.

-- set up a monadic context for database-using functions
runDbApp::(Monad m) => r -> ReaderT r (WriterT w m) a -> m w
runDbApp lda f = 
  do (a, rs) <- runWriterT (runReaderT f lda)
     return rs

This function uses monad transformers, nesting the Reader within the Writer to create a new combined monad with properties of both in which function f will execute (f being the entire do block above, composed) . The database connection lda is the cargo of the Reader, but it could have easily been a complex record containing, for example, parsed command line arguments. The Writer actually returns a pair, the final result of a computation, and its “log” of the events that occurred. As it is this log that we are interested in, a is discarded.

                                 
getDbTime::MonadIO m => ReaderT Connection (WriterT [String] m) ()
getDbTime = do
  lda <- ask -- get the database handle from the Reader
  rs <- liftIO $ quickQuery lda "select datetime ('now')" []

  let rs' = map rowToString rs
  tell rs' -- store the results in the Writer (one row of [Char])

This is where things start to get a bit more complicated. To get the time, a one row result set, from the database, the connection is requested from the Reader monad with ask, and as the query involves I/O this is “lifted” into the context of the IO monad to run, then the result extracted back into the current context with ←. Finally tell invokes the Writer monad. It can be seen that this is more complex than say the OCaml equivalent, lda is not simply a locally bound variable passed in as an argument to the function.

getRows::ReaderT Connection (WriterT [String] IO) ()
getRows = do
  rs <- (\lda -> liftIO $ do sth <- prepare lda "select c1 from t1"
                             execute sth []
                             fetchAllRows sth) =<< ask

  tell (map rowToString rs)

This function returns multiple rows. Here I am trying a slighly different structure in an attempt to get some of the elegance of main into the actual functions that do the work. This is complicated by the fact that I don’t think HDBC was designed to be used monadically; functions that are typically have their argument that would be a monadic value last, e.g. randomR was obviously meant to be used with the State monad, as its state is the last argument it is very easy to curry it and get the monad to apply it in pointfree style.

 
doInsert xs = do
  ask >>=
    (\lda -> liftIO $ prepare lda "insert into t1 values (?)" >>=
             (flip executeMany $ map stringToRow xs) >> commit lda)

Here is another attempt. prepare expects the stateful thing, the connection, as its first parameter of two and returns a statement handle. Similarly executeMany expects that statement handle as its first parameter, and commit wants the output of the step prior to the step immediately before it. Some monkeying around with a λ function and flip gets everything into the right order for >>= (take the result of one function and pipe it to the next, like the Unix shell’s |) and preserves lda by name to be used twice.

dbfunc f b c = (\a -> liftIO $ f a b c)
deleteAllRows = do
  ask >>= dbfunc run "delete from t1" []

This is as elegant as I can get it, abstracting out the reordering and lifting to a reusable helper function. But – I happen to know because I wrote this program that I have a COMMIT in the near future. In real code, if this was meant to be transactional in and of itself, I would have to keep hold of my connection somehow so I can use it for commit lda, either with ask, or by making the commit part of dbfunc, which would limit me to single-statement transactions (hardly better than setting AUTOCOMMIT ON). Finally a couple of helper functions to handle casts between datatypes:

  
-- cast a single-column result set from [SqlValue] to String
rowToString [x] = (fromSql x)::String
-- cast a single value into a [SqlValue] (a row)
stringToRow x = [toSql x]

-- end of file

Running it:

gaius@debian:~/Projects/MonadDb$ ls
MonadDb.hs
gaius@debian:~/Projects/MonadDb$ echo "create table t1 (c1 varchar2(32));"|sqlite3 test.db
gaius@debian:~/Projects/MonadDb$ ghc --make MonadDb.hs -hide-package monads-fd
[1 of 1] Compiling Main             ( MonadDb.hs, MonadDb.o )
Linking MonadDb ...
gaius@debian:~/Projects/MonadDb$ ./MonadDb 
2010-09-19 10:30:09
Huw
Pugh
Barney McGrew
gaius@debian:~/Projects/MonadDb$

Conclusion: this would have been a lot more elegant had the HDBC API been designed with monads in mind up-front, but I suppose it was a tradeoff between that and making it radically dissimilar in convention to every other database API (e.g. the convention of lda or sth being the first parameter is the same in OCaml’s Occi module, and in Oratcl and that has made it easier for me to learn).

The monadic style could be said to simply shift complexity away from the main function and into the worker functions. That’s not necessarily a bad thing; it depends who your audience is. If main will usually be written by people who would otherwise write a “one-off” shell script and the function library is maintained as a project in its own right, then that’s perfectly OK, an excellent use of this technology, even. If the same person/people will be writing an entire application this way then the benefits are less clear – perhaps they will become clearer at vast scale? A “large” program for me, for the kind of work I do, would be a thousand new lines of Python + various libraries (some of which I have written, and sometimes discover unexpected behaviors when “reusing”). The more can be put into thoroughly engineered libraries the more leverage Haskell gives in this scenario.

Thanks to Daniel Fischer from the Haskell list for help with this

7 Responses to Scope in database code (2)

Gaius says:

September 19, 2010 at 15:38

Thinking about it, I should use a record in the Reader monad regardless if the Connection is the only thing in it, and use the accessor function on the value I ← from ask – that makes the code much easier to reuse.

Also I wonder how this can be applied to interface, e.g. curses. Updating an interface is conceptually the same as updating a database, it’s just an external thing with an existing state to send deltas to.

Lakshmi Narasimhan says:

September 21, 2010 at 04:39

“as the query involves I/O this is “lifted” into the context of the IO monad to run, then the result extracted back into the current context with ←.”

Shouldn’t it be the case that liftIO lifts the action into the current monad context?
Brent Yorgey says
“I think of liftIO as lifting its argument *from* the IO monad (at the
bottom of the stack) into a monad higher up the stack.

liftIO :: (MonadIO m) => IO a -> m a”

- Gaius says:
  
  September 21, 2010 at 06:20
  
  In this case – paraphrasing Daniel Fischer, the IO context is MonadIO.
  
Lakshmi Narasimhan says:

September 21, 2010 at 04:40

The observation from Brent was in reply to your post in haskell beginners list

- Gaius says:
  
  September 22, 2010 at 06:38
  
  Ah, sorry, only just saw that!
  
Pingback: LDAP (1) / Scope in database code (3) | So I decided to take my work back underground
Pingback: How to “debug” Haskell with printfs? – tuatphukien.com