How Haskell helped me write a correct algorithm I didn’t understand

Here’s a toy problem with a goofy setup: at a certain company, everyone tries to avoid responsibility, so whenever anyone gets any kind of request, they just pass it along.  Each person has a fixed list of people to whom they delegate all their requests; each request goes to everyone on the list.  Of course, some people are so junior that they have no delegates, so they have to do all the work.  The question is whether, given a particular such setup, there are some requests that simply fall into an endless cycle.  That is, whether there is a person A who forwards to (among others) B, who forwards to C, …, and eventually back to A.

This problem has all sorts of serious applications, but those don’t matter here.  What matters is an algorithm to solve it.  My purpose in this post is to present such an algorithm in Haskell which is correct, and for the right reasons, but for which I didn’t know what those reasons were at first.  Nonetheless, by using standard Haskell typeclasses, I was led to write code that expressed those reasons.

I’ll begin by laying out the types that express the problem.

import Data.Map (Map)

type Delegate = String

data DelegateError = DelegateMissing Delegate | DelegationCycle Delegate

checkDelegations :: Map Delegate [Delegate] -> Either DelegateError ()

This is simple enough.  Now, the natural language version of the algorithm I want to use is to “slurp” the delegations map like a straw, lifting the lists of delegations back along the chains of delegation, checking as I go that no list passes anyone on that list.  To make this a technical reality, I observe that at any time in this process, a particular delegate can be in any one of three states:

  • In possession of a nonempty list of other delegates (the normal state)
  • In an error state: either known to be part of a cycle or known to have requested a delegate who for some reason isn’t in the map
  • In possession of an empty list (that is, done)

Although the first and third states need not be separate, and indeed in the definition below they overlap, I will point out soon why it’s important. Here are the types that encapsulate this overview:

{-# LANGUAGE DeriveFunctor #-}
data SlurpState a = Normal a | Fail DelegationError | Done deriving (Functor)

slurpOnce :: Map Delegate (SlurpState [Delegate]) -> Map Delegate (SlurpState [Delegate])

Why do I parametrize SlurpState and make it a Functor?  I’ll explain that also.

To use this in checkDelegations, I’ll need a way of deciding the overall state of a Map Delegate (SlurpState a); that is, I want a function of type Map Delegate (SlurpState a) -> SlurpState a.  That’s obviously a fold of some kind, so I need a combining operation SlurpState a -> SlurpState a -> SlurpState a.  This suggests that I ought to make SlurpState a a monoid.  Well, it suggests a semigroup, for which this operation is the only requirement, but making it an instance of Monoid means that I can use the foldMap function from Data.Foldable, since Map itself is an instance of Foldable.  Here, then, is the instance I want:

import Data.Monoid

instance (Monoid a) => Monoid (SlurpState a) where
  mappend x@(Fail _) _ = x
  mappend _ x@(Fail _) = x
  mappend (Normal x) (Normal y) = Normal (x <> y)
  mappend Done x = x
  mappend x Done = x
  mempty = Done

Now, why is this the instance I want?  Because you can never undo an error, and you’re not done unless everyone is done, and otherwise combining two lists of delegates is just putting them together.  I’m making a bit of a space/time trade here by not taking the nub of this concatenation, so there may be duplicates, but that’s fine: as long as both x and y are subsets it’s correct.

I can now provide a definition for checkDelegations:

import Data.Foldable
import qualified Data.Map as Map

checkDelegations dMap =
  case foldMap (const [] <$>) dMap' of
    Done -> Right ()
    Normal _ -> checkDelegations dMap'
    Fail e -> Left e
  where dMap' = slurpOnce dMap''
        dMap'' = map Normal dMap

As you can see, I have employed the Functor instance of SlurpState.  The reason for this is that when folding, I really don’t care about the list of delegates (as you can see from the pattern matching); I only care about the high-level state. Since I don’t care about it, it is inefficient to compute it, so I just zero it out, leaving only the constructor.  This is why I need the Done state: it could be replaced by Normal [] as far as Monoid is concerned, but then I wouldn’t be able to distinguish it from any other normal state in this code, leaving me only the inefficient option.  I suppose the alternative would be to map to some other monoid with only two values, but there isn’t an obvious one available.

Perhaps a more Haskellish reason is simply that using Normal [] to mean completion is not well-typed, and my type should be more finely granular to reflect what’s going on in the algorithm.

Here, at last, is the real work, the definition of slurpOnce:

slurpOnce dMap = Map.mapWithKey shiftState dMap
  where
     shiftState delegate (Normal delegates) = checkCycles $ foldMap id $ map getDelegate delegates
       where
         getDelegate d = Map.findWithDefault (Fail $ DelegateMissing d) d dMap
         checkCycles x@(Normal delegates) =
           if elem delegate delegates
           then Fail $ DelegationCycle delegate
           else x
        checkCycles x = x
     shiftState _ x = x

Aside from the parts that actually concern the validity of the delegates map, the interesting feature here is another, perhaps unexpected appearance of foldMap.  It is necessary because dMap here contains not [Delegate] but SlurpState [Delegate], which is only sometimes a [Delegate].  However, its monoidal append does exactly what we want; in fact, it is here that we use its case that actually appends lists.

Here is where I realized I didn’t know why the algorithm terminates.  I mean, I check for Done in checkDelegations, but I don’t write Done anywhere in slurpState, and the map is initialized to all Normal, so where does it come from?  I’ll leave you a moment to think about this.

The answer is that Done == foldMap id [],  so whenever one of the delegates has nothing, its state moves to Done.  This is, of course, exactly what Done means, and it agrees with my earlier comment that we could replace Done with Normal [] if we weren’t deriving Functor.  This is in hindsight, of course; I didn’t design the type this way.  In fact, the only reason this works at all is that Monoid must have an mempty, and that mempty is the monoidal sum of the empty list.  That mempty = Done is natural from the semantics of the equivalence between Done and Normal []; that is, even though I wasn’t thinking of this equivalence, I was thinking of the meaning of mappend, which reflects it.

So that’s my story of how Haskell had my back.  Even though the algorithm is inherently procedural (it doesn’t use laziness, nor real recursion, nor does it pass around anything but data types), because Haskell so nicely abstracts procedural operations like loops as folds with various typeclass constraints, I was led to ask what typeclass described what I wanted to do.  That class required a neutral value that the fold handled properly, so simply the fact that I’d framed the problem the right way meant that I couldn’t help but write the right code.

Advertisements
This entry was posted in Haskell, Programming. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s