I’m Derek. This is my blog.

Clojure, with a Project Euler example

May 12th, 2010 · No Comments

Most of this post will be about Clojure, a LISP dialect that is implemented on top of the JVM. The Clojure community is quickly growing; every day you’ll find new blog posts describing people’s experiences with it. The Clojure mailing list is filled with helpful posters and great advice; the Clojure IRC channel is, similarly, a great resource. It’s no surprise that this is the case, since Clojure as a language comes with some very compelling features. In no particular order, I’ll enumerate some of the things I’m really enjoying about Clojure. Then, I’ll discuss a solution to Euler problem #24 in Clojure, since examples are always fun.

Java is not my favorite language to write. I find it to be verbose to the point of lowering productivity; I feel as if I write more metainformation to the compiler about what I am doing (specific types, access and visibility modifiers everywhere, checked exception declarations) than actually implementing the logic to solve a problem. I understand the point of view that Java’s explicitness and compile-time type checking produce better code, I just think it is wrong. There are many legitimate reasons to dislike the Java language’s design choices. I won’t detail them here; you’ve either heard them all before, or you can find many examples with a simple Google search.

I am, however, very impressed with the JVM as a platform. It’s stable, well supported, and well understood; whoever is in charge of deploying your software is probably comfortable with deploying and administering a JVM instance. Optimization techniques in the JVM continue to get better and better, particularly with some vendor implementations (such as HotSpot with Sun). You can get heavily optimized code that is very fast without a large amount of effort. As a digression, watching recent developments in optimizing bytecode is very cool (see clang/llvm, the jvm/hotspot, the .net clr, etc) and I’m excited to see how things progress in the future with those projects.

The takeaway here is that the JVM is a great platform. And now that there are several languages written to run on the JVM that are not Java, you have some great choices as a developer who maybe doesn’t so much enjoy writing Java. You can choose a JVM bytecode-compiled language that best fits your problem space (Groovy, Scala, Clojure, etc.), and immediately be able to leverage the huge number of libraries out there written for the JVM, including the Java standard library. And get nice, fast code while you’re at it. Great! All we need now is a specific bytecode instruction for dynamic method dispatch on the JVM, and life is sweet - so it’s a good thing that such an instruction is planned for the Java 7 specification.

  1. Clojure is a LISP…
    It displays homoiconicity, which means (roughly) that code and data are represented in the same way in Clojure (as s-expressions). This has deeper implications that I can cover in a blog post, but, for example, if you have written code to do a particular kind of list manipulation, it can act equally on actual lists of data, or a list that represents an s-expr to be evaluated. Clojure fully supports macros, which allow you to modify the s-expr tree as it passes through the reader. It has a minimal syntax that elegantly models computation, especially if you view computation through the lens of the lambda calculus. In other words, it’s a LISP!
  2. …minus some historical baggage

    Clojure sports a selection of core data structures that offer particular semantics and performance guarantees; the core structures are implemented efficiently and their design is well-considered. When applicable, functions exist to act on various abstractions as a structure (for example, there are functions that operate on anything that looks “sequential”, and most of the core data structures provide a way they can be viewed sequentially). From my perspective, Clojure very much wants you to “use the right tool for the job”, and provides several high quality tools in its toolbox to enable you to do so.
  3. It’s functional, with immutable data types.

    What a pleasure it is to be able to view problems through a few different lenses, and how tiring it is to always view a problem through the OO lens. There are many arguments for (and against) functional programming; if, like me, you are a fan, you will be happy to learn that Clojure data structures are immutable (let’s omit discussion of the new “transients” feature for now). You do not (and in fact, if you are using one of the built-in Clojure datatypes, cannot) modify an existing value. “Changing a value” means “creating a new value with my changes”, implemented efficiently through structural sharing. This means a lot of good things, one of which is that…
  4. Clojure does concurrency the right way.

    Writing concurrent software is hard, especially if you model your concurrency around controlling access to a central mutable storage location. Unfortunately a lot of concurrent software is modelled this way. Most of the difficulty in this method comes in controlling access to the central mutable storage. Code written in this style is usually about acquiring locks, doing stuff, and releasing locks, and constantly balancing lock acquisition with its tradeoffs (acquiring a lock is usually an expensive operation, so you can’t spray lock acquisition everywhere; if you hold a lock, you are solely responsible for preventing deadlocks and making sure you release it at the proper time, etc). By contrast, having immutable data types remove a lot of this complexity: threads can’t modify the same value, because modifying values is verboten. I’ll just give you a value I want you to do some work with, and I never have to worry about you modifying some state that might affect my execution, because you can’t modify any values in my thread: they are immutable! See Erlang’s message-passing concurrency model for an example of this type of approach. Like datatypes, concurrency in Clojure is exposed by a handful of discrete concurrency operators offering particular semantics and performance, each is well-supported, and each is well thought out. Check out the ref documentation on the Clojure page for one example.
  5. It does many other things right.

    Lazy evaluation, robust interaction with the host platform, and a comprehensive core library are just some examples of this point. You may not like some of Clojure’s design choices, but if you watch some of Rich Hickey’s screencasts, you can’t argue that the design isn’t a carefully considered one.

Enough talking; let’s write some code. Euler problem #24’s problem statement basically says:

Given the digits 0 through N, and all permutations of those digits, arrange the permutations lexicographically, and return the nth permutation, for N=9 and n=1000000.

Like many Euler problems, there are many ways to solve this one, but I’m feeling lazy - literally. My solution will be to construct a lazy data structure that enumerates all the permutations lexicographically, and then write code that realizes the nth permutation. Easy, right?

So let’s write a method that produces this lazy data structure. Let’s suppose we’re given a set of numbers that are ordered lexicographically (0, 1, 2… N). Here’s an inductive method we could use to enumerate every permutation of those elements in lexicographic order:

  1. Take the first element of the set; call it H. The remaining elements of the set (still in order) are T.
  2. Recursively generate all permutations (in lexicographic order) of T.
  3. Take each permutation of T and prepend H to it: the resultant list is the ordered permutations of the input set that begin with H.
  4. Repeat with the second element of the set as the next H and T to be the set with the second element removed, then with H as the third element, and so on until the set is exhausted.
  5. The sequence of lists produced in this way represent every permutation of the original set in lexicographic order.
  6. As a base case, the empty list has only one permutation: itself (the identity permutation).

That’s a lot of pseudocode, but easy to express in a couple lines of Clojure:

(defn permute [coll]
   (when-let [s (seq coll)]
         (map (fn [x] (cons x (permute (remove #(= x %) s)))) s)))

This works for my solution because map and remove produce lazy sequences in Clojure. Laziness is pervasive in most Clojure sequence operators; operators that realize sequences generally tell you explicitly that they will do so. In other words, the result of applying permute to a collection will be a bunch of queued function application, and the evaluation will only happen when you actually reference a particular item (and then only as much as needed to realize that item).

The method above will return a sequence of two-element (cons) cells; the head (car) of each cell is an item in the permutation, and the tail (cdr) of each cell is another (ordered) sequence of all the permutations that start with the head element. It’s essentially a tree modeled as cons cells. Finally, the when-let handles our base case: when the sequence is empty, when-let will evaluate to nil.

Let’s assume there’s a factorial method in clojure. It’s trivial to write one of your own. We’ll use this to count how many permutations are in a particular subsequence, since the number of permutations of N elements is N!. Since each cell in the sequence that permute returns is the head of all subsequences that start with that element, as we move through the sequence and skip over a cell, we’ll want to know how many permutations we passed over. This will let us navigate to a particular permutation, like permutation number 1000000.

Next we’ll write the method to seek to the particular permutation we want in this data structure; let’s call it seek. This is a helper function, so we’ll declare it with defn-, so it doesn’t pollute our namespace (if we compiled this code to a class, it would become a private method). Here it is:

(defn- seek
  ([s parts i acc]
   (if (empty? s) acc
     (let [partsize (factorial (dec parts))
           permutation (nth s (quot i partsize))
           data (first permutation)
           remainder (rest permutation)]
       (recur remainder (dec parts) (mod i partsize) (conj acc data)))))
  ([s parts i] (seek s parts i [])))

Most of the logic of the problem is here. There are a few things to note. One is using arity overloading to make the default for acc the empty vector ([]), if one isn’t passed in; this seems to be idiomatic Clojure, as far as I can tell. You may also notice that the seek function is written in fake tail-recursive style (passing an accumulator and using recur), since the JVM doesn’t natively support tail-call optimization, this is syntactic sugar Clojure gives you to simulate it, by making iteration look less iterate-y.

So, this function takes a sequence of permutations in the format returned by permute, the number of elements the permutations represent (called parts), the element you want to seek to (i), and an accumulator; the permutation found at position i will be appended to the accumulator.

If the sequence is exhausted, we can’t iterate any further, so we return the accumulator. If not, the method finds which element of the top-level sequence the ith permutation lives under, remembering that every element we skip over really skips over parts! permutations that lie underneath it. So, we identify how many top-level elements of the sequence we want to skip over and do so: (nth s (quot i partsize)). We identify, within this subsequence, the index of the permutation we want to find, which is (mod i partsize), in other words, if you have 100 elements divided into 5 buckets of 20 items each, and you want element 43, you want element 3 (43 mod 20) of bucket 2 (43 quot 20). Then we recur with the subsequence to find that element.

Finally, we tie it all together to answer the problem:

(defn ith-permutation [n i]
  (let [coll (range (inc n))]
    (seek (permute coll) (count coll) (dec i))))

We generate the numbers from 0 to 9, inclusive, create all of the permutations of those numbers with permute, then seek to the ith element with seek. It’s not even that slow for the parameters of the euler problem (microbenchmarking caveats ahoy).

user=> (time (ith-permutation 9 1000000))
"Elapsed time: 4.103 msecs"
-answer omitted-

Not bad, especially since this method was our first pass and we haven’t done anything to make it more optimal. This blog post became a lot of text, but think about what we did once we started coding:

  • We expressed the problem in a simple way: finding the right element in a structure of all possible elements.
  • We then naively generated that data structure in code. We didn’t eat a bunch of CPU and memory because the data structure was generated lazily.
  • We expressed searching through this data structure inductively, in terms of finding the right sub-structure to look in, then recursively looking in it.
  • Our code solves the problem and is reasonably fast.

There are certainly more succinct ways to solve this problem, and like many Euler problems, doing a little math on paper can help you discover methods that are faster; we essentially brute-forced it, but we didn’t have to incur too much pain to do the brute-forcing, thanks to Clojure’s lazy sequences.

Whew! That was a long blog post! If you’re still here, thanks for reading! You can find the code for this post at a gist here. In addition, if you were intrigued by Clojure, you’ll find good links and excellent documentation at the home page. I also really enjoyed watching some of the slide presentations at the blip.tv clojure site; in particular I recommend “Clojure for LISP programmers” (or “Clojure for Java Programmers” if that’s your thing).

I hope to blog more about Clojure soon.

Tags: Programming

0 responses so far ↓

  • There are no comments yet...Kick things off by filling out the form below.

Leave a Comment