I’m Derek. This is my blog.

Guaranteed code

June 3rd, 2010 · No Comments

I was thinking lately about the different philosophies programming languages expose when it comes to guaranteeing code semantics. It’s interesting to compare two of my favorite languages - Ruby and Clojure - and a third I like, but don’t know well enough to love: Haskell, since each lies at a different point along the imperative to purely functional spectrum, and each can give you a different level of confidence in your code’s execution.

In particular, I’m referring to the level of confidence a language can impart that when you call some method (or evaluate some function) in a particular language, your code will work the same afterward. This is directly related to side effects, and how much you have to “know” about the function’s implementation in order to use it. I’ll colloquially refer to this as “trustworthiness”.

At one extreme, a language might not especially concern itself with ensuring trustworthiness. If whoever wrote the method you’re calling Did The Right Thing, the method is trustworthy. Ruby is an example of a language like this: fully dynamic (”duck typing”) and extensible at runtime (”monkey patching”). I can do normal mutable stuff and touch storage locations that other code might care about in Ruby, but I can also write a method which affects the semantics of other methods; for example, when you’re inside Rails, the String class is enhanced with a bunch of utility functions. Here is an extremely contrived example:

def foo(a)
  a.respond_to?(:bar) ? a.bar : nil
end

def baz(a)
  a.class.instance_eval do
    define_method(:bar) { "barbarbar" }
  end
  nil
end

foo calls its argument’s bar method, if such a method exists; baz takes an object and extends its class with a bar method. Then, this series of calls evaluates like:

foo("abc") # => nil
baz("abc") # => nil
foo("abc") # => "barbarbar"

As you can see, the method foo isn’t idempotent. It’s not possible to write a Ruby method that you can guarantee to be idempotent unless it evaluates to a literal, because everything else is a method call on an object, and the behavior of existing classes and object instances can be modified at runtime. So, for example, even:

1 + 2

Isn’t guaranteed to be 3, because + is a method call, equivalent to 1.+(2). Of course, no one ever worries about this in real life, because you would be evil if your Ruby library changed the definition of the + method on Fixnum. But there is nothing to prevent it. So, Ruby is very versatile in its flexibility; as a trade-off, it is difficult or impossible to guarantee that your code will exhibit some set of particular semantics at runtime.

At the other extreme is something like Haskell, which is purely functional and strongly typed. I’m new at Haskell (though have some experience with other pure FP, typed languages like standard ML), so this next section might have some syntax errors. Anyway. Values are immutable, so you cannot change state, only write code that produces new values. Every expression has a type. Side effects are hidden in monads, which can be made one-way by the type system. In Haskell, producing a side effect like writing output goes to the IO monad, which “sucks” code into a world where the possibility of side effects happening is made clear by the type system. For example, putStrLn looks like this:

putStrLn :: String -> IO ()

It takes a String and returns an IO (), which you can think of like a queued IO action. When the action is eventually performed, it prints the given string out followed by a newline.
You don’t get information about types inside the IO (), so you can’t pull information out of it. The only way to “evaluate” the effect and have something actually print is to use the monadic binding function, whose type looks like this for the IO () values putStrLn produces:

String -> (() -> IO b) -> IO b

So you provide the string to write, but then you must provide a function whose type is () -> IO b. In other words, to perform the side effect, you have to produce code that returns a value wrapped “inside” the IO monad type. For example, check out this function, times2, which takes one argument (an Integer for this example, although without the type signature Haskell will derive this to a typeclass function on anything under Num a), and produces a new integer, the argument times 2.

times2 :: Integer -> Integer
times2 i = let x = (putStrLn "a line")
               y = (putStrLn "another line")
               z = 2 in
           i * z

Does calling this function cause those strings to be printed? Nope. Haskell’s evaluation is lazy, and the things we are binding x and y to in the let block are expressions. That means they will only be evaluated if they are actually needed. We return i * z, and so x and y don’t make it out of the function. Nothing can ever get to them, so they can never be evaluated and no side effects will happen. In fact, because Haskell is strongly typed, any values that could get out would have to be a part of the function’s type signature, so without even looking at the implementation, you could guarantee that no side effect would happen:

times2 :: Integer -> Integer

See? There are no IO’s in there. So no values that could possibly produce IO side effects can come out of this function. Additionally, there is no function in the type system that lets you “get out” of the IO monad; in other words, any function that takes a value of type IO must ultimately produce another value of type IO. It’s like a roach motel for side effecting execution: code goes in, but it doesn’t get out. So, if there were any deep, hidden code in our times2 function that made the IO happen, IO would have to percolate out into the type signature of times2.

That means that I can use any arbitrary Haskell function, and if IO isn’t somewhere in the type signature, I am guaranteed no IO effect can happen. There is no well-typed statement that would allow it. Code that doesn’t involve the IO type must willingly pass a “side effects ahoy” line to a domain that allows side effects, by “getting into” the IO monad and picking up the type. So code that doesn’t involve it can have guaranteed semantics, no side effects allowed, no changing of state. The tradeoff is that you have to adhere to strict rules.

Clojure is somewhere closer to the middle, at least in spirit, if not in strictness. The core Clojure data structures are immutable (as in their values are in Java final variables), so if you stick within the domain of those data structures and the functions that operate on them, you can have code that’s idempotent. Evaluation is eager, but much of the standard library supports lazy evaluation by producing sequences, which are like cons cells whose cars are thunked. There is no strict type system to force you to be visible about side effects or warn callers about things you might do. So most Clojure code is written in such a way that the side effecting parts are hidden in private namespace functions, and the public interface is functional. There are well-defined ways to have statefulness in Clojure, so you can write code that must be explicit about its side effect-ness:

(let [a (atom 0)]
  (defn post-inc! [n]
    (let [curr @a]
      (do
         (swap! a #(+ % n))
         curr))))

(post-inc! 1) ;; => 0
(post-inc! 1) ;; => 1
(post-inc! 1) ;; => 2

Here, post-inc! is obviously not idempotent, but we had to explicitly create a reference type and use a subset of functions in our code restricted to that type. So, if we stick within Clojure’s core types and semantics, we can have areas of guarantee. For Clojure code operating with other classes on the JVM, say a Java API, though, all bets are off, since we can’t guarantee anything about the side-effecting nature of external JVM code.

As far as laziness, you can opt-in to it, and code generally shouldn’t have to care if it operates at the sequence level of abstraction.

(defn int-range [a b]
   (when (< a b) (cons a (int-range (inc a) b))))

(defn lazy-int-range [a b]
   (when (< a b) (lazy-seq (cons a (lazy-int-range (inc a) b)))))

(int-range 0 5) ;; => (0 1 2 3 4), realized in memory
(lazy-int-range 0 5) ;; => a lazy sequence of  (0 1 2 3 4), realized when consumed

Which allows you to write expressions like (lazy-int-range 0 100000) with causing a stack overflow, but doesn’t require sequence consuming code to know whether the sequence is lazy or not to do what it needs to logically. So you are able to opt-in to many interesting language features like laziness when solving a problem; the tradeoffs are that there are more rules than Ruby in the optional parts, and that because the strict parts are optional, you don’t get the semantic guarantees of Haskell.

In a future post, I hope to get into the “related behavior” bit, discussing the ways that the 3 languages approach it: Ruby with its object-orientation, where data and behavior are packed up in classes and instances; Haskell’s typeclasses and instance derivation, where behavior is specified in a named unit, a typeclass, and concrete implementations give types membership in that typeclass), and Clojure’s new protocols/deftype feature (behavior is specified in a named unit, the protocol, and concrete “types” that implement it are made through deftype‘d factories), which seems spiritually very similar to Haskell’s typeclasses. As always, thanks for reading!

Tags: Programming

0 responses so far ↓

  • There are no comments yet...Kick things off by filling out the form below.

Leave a Comment