Monads (a Computer Programming paradigm)
The concept of monad, used in Haskell, has acquired a reputation for being a bit hard to understand. I think it's pretty simple, if you view it this way:
A "monad" is a single-argument function that returns a function as its result. It is called a "monad" because it is created and used in a way that lets a chain of two or more such functions produce the same result as a single "monolithic" multiple-argument function (a "dyadic" or "triadic", etc. function).
We'll see how and why in the following discussion.
Contents
The Premise
The Wrong Way: Function of a Tuple
A Simpler Sub-Problem
Function Returning a Function
Stop Cheating on the Arithmetic
An Epiphany of Simplicity
So What? Or in Other Words, Why?!?
NaN (Not-A-Number) Handling: Code Made Ugly by Constant Detours
Chained Method-Calling in Object-Oriented Languages
Footnotes
The Premise
Let's imagine that for some reason (which we'll get to later) it is desirable to define a computer language that has functions, but that each function is only allowed to take one argument. Apart from that "restriction", the language will be fairly familiar in many ways:
- its data types include numbers and lists (or tuples) of numbers,
- it includes functions (both pre-defined and user-definable), and functions themselves are a data type,
- therefore a function may take any of these data types as a parameter, and may return any of them as its value. In particular, a function may take a function as an argument and return a function as a result.
All of these features should be familiar to programmers of many conventional "imperative" languages such as C/C++. The only unusual feature of our language is that functions may take only one argument. So these functions would be syntactically legal:
f(x) := x^2 # Square of x
g(f()) := [ f(1.0), f(-1.0) ] # Image of [1.0,-1.0] under f()
r(x) := x^(1/3) # Cube root of positive number
s(x) := -((-x)^(1/3)) # Cube root of negative number
t(x) := # Returns one of the cube root functions
r() if x >= 0
s() if x < 0
but these would be illegal in the language:
f(a,b) := √(a^2+b^2) # Pythagorean distance
g(r, θ) := [ θcos(r) , θsin(r) ] # polar (radius θ, angle r) to rectangular conversion
h(r, θ) := θcos(r) # like g(,), but gives only the first coordinate
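As it happens, Haskell is a language in which every function really does take exactly one argument, so the legal examples above translate almost directly. Here is a sketch, with ** standing in for Haskell's real-valued power operator and the names matching the examples:

```haskell
-- The legal examples f, g, r, s, t from above, written in Haskell.
f :: Double -> Double
f x = x ** 2                          -- square of x

g :: (Double -> Double) -> [Double]
g fn = [fn 1.0, fn (-1.0)]            -- image of [1.0, -1.0] under fn

r :: Double -> Double
r x = x ** (1/3)                      -- cube root of a positive number

s :: Double -> Double
s x = negate ((negate x) ** (1/3))    -- cube root of a negative number

t :: Double -> (Double -> Double)
t x = if x >= 0 then r else s         -- returns one of the cube-root functions
```

Note that t takes a number and hands back a whole function, exactly as in the pseudocode.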
The question that will lead us to monads is:
How would we implement something like h(r, θ) using function(s) that take only one argument?
The Wrong Way: Function of a Tuple
Given the examples above, it seems clear we could just pass r and θ to the function as a list or tuple:
h'(t) := t₁cos(t₀) # t₀ = r, t₁ = θ
Given this h', we could then get our answer by passing a tuple to h' like so:
h'([r, θ]) # will return θ cos(r)
This is not the intended solution. In particular, observe that the tuple [r, θ] is just a two-argument function in disguise:
tuple(a, b) := [a, b] # Make a and b into a tuple
So the h'([r, θ]) solution is not a solution at all, because the tuple syntax itself breaks the rules.³ I mention it here only to make the point that we do not intend to achieve single-argument functions simply by packing multiple pieces of information into a vector or some other multi-element data structure. Although that is legal, it is not the path to the innovation of monads. In particular it does not remove any complexity: we still have functions taking different types of parameters, which is just about as bad as having functions that take different numbers of parameters.
We're going to assume that all functions must take a single real number as their argument. How do we implement the coordinate function h(r, θ) now?
This seems impossible: there are two dimensions of information, because r and θ enumerate points in a 2-dimensional plane.¹
A Simpler Sub-Problem
So let's try to solve a simpler problem first. Consider these functions h1, h2, and h3:
h1(θ) := θcos(1)
h2(θ) := θcos(2)
h3(θ) := θcos(3)
These functions are specific instances of the desired h(r, θ), with fixed values of r.
Function Returning a Function
Suppose we had a way of returning a function as the result of a function. Consider a "magic" function hr() defined like so:
hr(r) :=
h1( ), if r = 1,
h2( ), if r = 2,
h3( ), if r = 3, (etc.)
This function is "magic", because it can return a function hᵢ() for any i that you might happen to want to give it. For now don't worry about how this is possible.² If we had this "magic" function hr, we would be able to get our result h(r, θ) like so:
h(r, θ) == (hr(r))(θ)
What we're doing here is calling hr(r), which returns a function; this function is one of the class of functions exemplified by h1(), h2(), etc.: it takes an argument θ and returns the value of θcos(i) for a specific i, namely i=r. So we can take this function, which was the return value of hr(), and pass θ to it, and voilà! we get the desired answer θcos(r).
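In Haskell, where every function is already single-argument, the "magic" hr needs no case-by-case table at all: it can simply return a closure over r. A minimal sketch, using the names hr and h from the text:

```haskell
-- hr takes r and returns a function of θ. No per-i compilation is needed,
-- because the returned function just closes over the value of r.
hr :: Double -> (Double -> Double)
hr r = \theta -> theta * cos r

-- h(r, θ) is recovered by calling the returned function on θ
h :: Double -> Double -> Double
h r theta = (hr r) theta
```

This is the whole trick: hr 1, hr 2, hr 3, ... behave exactly like the h1(), h2(), h3() of the previous section.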
Stop Cheating on the Arithmetic
Right at the start I said this language has no two-argument functions, but we can't get very far without adding 2 + 2. So of course, these are implemented by chained one-argument functions too.
The well-known functional language Haskell does this to every symbol in a program. It is difficult for beginners to realize this because Haskell also has the most over-engineered lexical analysis and syntax handling of all time. As proof of this claim, I merely need to show five different and equally valid ways to calculate 23/5 in native Haskell without any extensions:
Prelude> ((/)(((^)(2))(3)))(5)   -- explicit cars and cdrs
1.6
Prelude> ((/ 5) ((^ 3) 2))       -- monadic functions
1.6
Prelude> ((/) ((^) 2 3) 5)       -- normal Lisp syntax
1.6
Prelude> 2 ^ 3 / 5               -- colloquial non-bracketed infix
1.6
Prelude> 2^3/5                   -- punctuation and whitespace optional
1.6

Except for the first one, "((/)(((^)(2))(3)))(5)", all of the above are a lie. In order for Haskell to actually handle a calculation of 2^3/5, it must first transform it into a chain of single-argument functions that return other single-argument functions as their value. Step by step, we have:
(^)(2) : Call an exponentiation monad on the first argument 2.
((^)(2))(3) : Apply the result of this (which is a monad serving the purpose of a function f(x)=2^x) to the second argument 3. This does not return 8, it returns a constant-valued function whose value is always 8.
In fact, the original "(2)" and "(3)" are also monads embodying constant-valued functions.
(/)(((^)(2))(3)) : Call the division monad giving it the 8-valued "((^)(2))(3)" as its argument. This returns a monad that supplies the function 8/x.
((/)(((^)(2))(3)))(5) : The aforementioned function 8/x is chained to a constant-valued 5() function giving the result, which is still a function, but harbors the answer 1.6 within.
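Setting aside the constant-valued-function subtlety, the chain can be checked step by step in real Haskell (the step names are mine, not Haskell's):

```haskell
step1 :: Int -> Double
step1 = (^) (2 :: Double)   -- partial application: \x -> 2 ^ x
step2 :: Double
step2 = step1 3             -- 2 ^ 3 = 8
step3 :: Double -> Double
step3 = (/) step2           -- partial application again: \x -> 8 / x
result :: Double
result = step3 5            -- 8 / 5 = 1.6
```

Each intermediate value is either a number or a single-argument function; at no point does any function receive two arguments.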
Okay, I'm being a bit sarcastic here, but not much.
An Epiphany of Simplicity
What I've shown above is that if we are allowed to write functions of one variable that return a function of one variable as their value, then we can use multiple functions of this type to "emulate" functions that take two or more arguments. Thus, we don't need multiple-argument functions at all!
Now the question on most readers' minds may well be:
So What? Or in Other Words, Why?!?
Here is an ordinary arithmetic expression:
2×(3+4)-6^2/5
In order to compute it, a lot of zigs and zags must be performed. It starts out simply enough: we have a 2. We know what that is. But the next symbol, ×, cannot be handled until we get something else to combine with the 2. It would be nice if we could get rid of one or the other, and keep track of only one thing, but we can't. "2×" is incomplete. It gets worse as we go along. Things get temporarily better when we've finished "2×(3+4)" and have a single 14, but this tranquility does not last long.
remember 2 somewhere
remember '×' somewhere
remember 3 somewhere
remember '+' somewhere
apply the '+' to 3 and 4
apply the '×' to 2 and 7, and remember that somewhere
remember '-' somewhere
remember 6 somewhere
remember '^' (exponent) somewhere
apply the '^' to 6 and 2
remember '/' somewhere
apply the '/' to 36 and 5
apply the '-' to 14 and 7.2
we're done!

Here is another math example, using single-argument functions:
sin(ln(1/(sqrt(e^7))))
Here we have five functions: e^x, square root, reciprocal, natural logarithm, and sine. If we read from left-to-right the situation here is just about as bad as the first example, but if we read right-to-left, it becomes very easy: Start with 7, then raise e to that power, then take the square root, ... and so on. At any point there is only one thing to handle.
apply 'e^x' to 7
apply 'sqrt' to that
apply '1/x' to that
apply 'ln' to that
apply 'sin' to that
we're done!

Compare to the above "computer instructions" and note that we didn't need to "remember" anything (except for the transient act of remembering something from one step to the next).
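This right-to-left reading is exactly what Haskell's function composition operator (.) expresses. A one-line sketch:

```haskell
-- sin(ln(1/(sqrt(e^7)))), read right to left: exp, sqrt, recip, log, sin.
-- Since ln(1/sqrt(e^7)) simplifies to -7/2, this equals sin(-3.5).
value :: Double
value = (sin . log . recip . sqrt . exp) 7
```

Each function in the chain takes one argument and hands one result to the next; no "remembering" is required.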
In most languages, the above calculations are handled easily by the built-in features of the language. We can put "2×(3+4)-6^2/5" in our programs without needing to worry about the fact that the computer is actually saving things many times during the process of working out that calculation. Considering this, Haskell's act of turning "2×(3+4)-6^2/5" into "((-)(((*)(2))(((+)(3))(4))))(((/)(((^)(6))(2)))(5))" seems like just another way of saying the same thing. Certainly we can ignore what's happening under the surface, because what we get is "2×(3+4)-6^2/5" and that works fine.
However, this beautiful world of things-nicely-being-done-for-us ends very very soon. If you want to do anything beyond basic arithmetic, and a few other paradigms that are built into ordinary programming languages, you'll soon have to start adding various kinds of "petty bookkeeping" for yourself.
NaN (Not-A-Number) Handling: Code Made Ugly by Constant Detours
An early example of this necessary-but-unwanted descent into complexity must have been dealing with math errors, such as division by zero or taking the square root of a negative number. In serious applications many other errors need to be avoided, like overflow, loss of precision from subtracting two nearly-equal values, and so on. Beginning programmers think they can get away with "result = sqrt(q)+(a-b)×d/f", but in practice this is needed:
if (a nearly equal to b) then error
c = a - b
if (c and/or d too large) then error
e = c * d
if (f is zero) then error
t = e / f
if (t is too large) then error
if (q is negative) then error
result = sqrt(q) + t
# oops, we forgot to check if t is nearly equal to -sqrt(q)
...

The chains-of-monads technique allows us to tackle this complexity: we get to define every function! If we decide that addition needs to check for loss of precision, then we can tell the (+) monad (the function that takes as its argument the first of two things to be added) to return a function that knows that it needs to check for the situation "a almost equal to -b" when it is called. Then, if the error condition happens, instead of returning a constant-valued function as its result, it can return a special "this-function-has-no-result" function. This function in turn simply returns itself regardless of what gets passed to it, and the non-result goes through the whole chain of functions and comes out at the end.
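In real Haskell, this "non-result that propagates itself" pattern is what the Maybe type and its monad instance provide. Here is a minimal sketch for the expression sqrt(q) + (a-b)×d/f; the names safeDiv and safeSqrt are my own, and only two of the error checks are modeled:

```haskell
-- Division that refuses to divide by zero.
safeDiv :: Double -> Double -> Maybe Double
safeDiv _ 0 = Nothing
safeDiv a b = Just (a / b)

-- Square root that refuses negative inputs.
safeSqrt :: Double -> Maybe Double
safeSqrt x
  | x < 0     = Nothing
  | otherwise = Just (sqrt x)

-- result = sqrt(q) + (a-b)*d/f, with the checks chained automatically:
-- if any step yields Nothing, the whole chain yields Nothing.
calc :: Double -> Double -> Double -> Double -> Double -> Maybe Double
calc q a b d f = do
  t <- safeDiv ((a - b) * d) f
  s <- safeSqrt q
  return (s + t)
```

The "petty bookkeeping" of checking each intermediate result lives inside the monad's chaining machinery, not in the formula.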
IEEE Arithmetic does in fact have something like this, and its implementation of the arithmetic functions is used by many languages. Therefore, these languages get this automatic-all-the-time-error-checking "for free". Inside the arithmetic functions, it is handled by setting aside a few binary values to represent "Not-a-Number" results (NaNs). The arithmetic functions all conspire with one another to look for these special values, and if any such value is seen, they immediately return another NaN value instead of trying to do the requested arithmetic.
Chained Method-Calling in Object-Oriented Languages
IEEE floating-point's NaN values are sort of like hidden data fields in a data structure. You could imagine that numbers are stored in memory cells that have some space set aside for the number itself, and some more space set aside for error flags. It's almost like that, although IEEE floating point does not reserve any bits exclusively for error flags.
But object-oriented languages make use of this sort of thing all the time. Here is an example in a hypothetical object-oriented language:
mypoly := polygon new
    addVertex: #(100 100)
    addVertex: #(150 100)
    addVertex: #(150 150)
    addVertex: #(100 150)
    close
    color: 'red'
    draw.

When we get to the 'draw' operation, a lot of information has been built up: a list of four 2-dimensional points, knowledge that it is "closed" (making a square rather than just 3 lines), and that it should be drawn in red. We did not specify the thickness of the lines to draw or the color (if any) to fill the interior of the square, so these will take on default values chosen by the new method. The result of this code, if written in a non-object-oriented language without data structures, might well look something like this:
DrawPolygon({(100,100), (150,100), (150,150), (100,150)}, closed=true, color=red, thickness=1);
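The same build-up-then-act chain can be sketched without objects at all: each step is a single-argument function from polygon to polygon, and the steps compose. The Poly type and its field names below are invented for illustration:

```haskell
-- A hypothetical polygon record standing in for the object's hidden state.
data Poly = Poly { verts :: [(Int, Int)], isClosed :: Bool, color :: String }
  deriving (Show, Eq)

newPoly :: Poly
newPoly = Poly [] False "black"   -- defaults chosen by "new"

addVertex :: (Int, Int) -> Poly -> Poly
addVertex v p = p { verts = verts p ++ [v] }

close :: Poly -> Poly
close p = p { isClosed = True }

setColor :: String -> Poly -> Poly
setColor c p = p { color = c }

-- The chained method calls become a composition of single-argument
-- functions, applied right to left:
mypoly :: Poly
mypoly = ( setColor "red" . close
         . addVertex (100,150) . addVertex (150,150)
         . addVertex (150,100) . addVertex (100,100) ) newPoly
```

Each step receives the accumulated state and returns it augmented, just as each method call in the object-oriented version returns the object for the next message.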
Footnotes
1 : We are not going to use the fact proven by Cantor that the number of points in the plane is equal to the number of points on a line, because using this in practice is computationally inefficient, and it turns out to just be another sneaky way to pass two arguments to the function.
2 : Clearly, there are not infinitely many functions hᵢ() that have been compiled and linked into your program. As we will see, the computer does not even need to compile and link a new hᵢ() each time a new value is passed to hr(), but you may think of it this way if you wish.
3 : Yes, I said that our language has tuples, but this does not make it inconsistent. We will recover the tuple notation later, when we see how it can be defined in the form of chained single-argument functions!
This page was written in the "embarrassingly readable" markup language RHTF, and was last updated on 2013 Feb 20.
