Learning Clojure: November 2009

Saturday, November 28, 2009

classpath

These expressions keep coming in handy, for finding out what clojure thinks the classpath is today:


(println (seq (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))

(pprint (map (memfn getPath) (seq (.getURLs (java.lang.ClassLoader/getSystemClassLoader)))))

(clojure.contrib.pprint/pprint (sort (map (memfn getPath) (seq (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))))

Thanks Pablo! ( http://pupeno.com/blog/printing-the-class-path-in-clojure/ )

Namespaces and Unit Tests

OK, if I'm going to be doing this for more than toy programs I need to learn how to use the namespace and unit testing facilities

Here I'm typing at the repl, or more accurately I have this file open in emacs and I'm passing the expressions one by one to the repl.



(ns namespaces-and-tests
  (:use clojure.test))

here's a factorial, with some tests included



(with-test

    (defn factorial [n]
      (if (< n 3) n
          (* n (factorial (dec n)))))

  (is ( = (factorial 1)     1) "base case")
  (is ( = (factorial 4)     (* 1 2 3 4)) "first recursion")
  (is ( = (factorial 10)    (apply * (range 1 10))) "general case" ))

I can run my tests like so:


(run-tests)

or like


(run-tests 'namespaces-and-tests)

I can also define a separate factorial testing function, perhaps even in a different namespace



(ns separate-tests
  (:use clojure.test))

(use 'namespaces-and-tests);;however this breaks, whining something about classpaths and non-existent files
(ns-publics 'namespaces-and-tests) ;;even though we can still see the namespace and its function

(namespaces-and-tests/factorial 4) ;;and indeed it still works!

It appears that require defines libraries by filename, whilst refer talks about namespaces. use is a combination of the two.



(refer 'namespaces-and-tests) ;;try this instead
(factorial 4)      ;;aha!

So here is a more complex test for the factorial


(deftest factorial-tests
  (testing "factorial"
    (testing "base cases"
      (is (= (factorial 1) 1))
      (is (= (factorial 2) 2))
      (is (= (factorial 0) 0)))
    (testing "general case"
      (is ( = (factorial 10) (apply * (range 1 11))))
      (is ( = (factorial 10000) (apply * (range 1 10001))) "tail recursion"))
    (testing "bad arguments"
      (is (thrown? java.lang.StackOverflowError (factorial -1)))
      (is (thrown? java.lang.StackOverflowError (factorial 0.5))))))

which we can run like this:


(run-tests 'separate-tests)

or like this


(run-tests)

we can run both at once with


(run-tests 'namespaces-and-tests 'separate-tests)

or we can use


(run-all-tests)

which claims to be testing everything in the clojure libraries, and in swank, which I'm using to connect clojure to emacs, but doesn't actually seem to do any testing on them. Bug perhaps?

Tuesday, November 24, 2009

Mutable State in the Sudoku Solver

One never really understands poetry until one has attempted to translate it.

Dr. Norvig's beautiful Sudoku Solver is very much in that line.

I got myself into all sorts of knots translating the Sudoku Solver, basically because of the newbie mistake of not realising that Clojure's for is lazy.

Mutability and laziness never go well together.

I'm also scared to play with it because it's actually easy to break it in such a way that it occasionally crashes, or takes very sub-optimal routes which nevertheless find correct solutions after long times.

Those bugs are easy to spot, but I wonder if there might be subtle differences between it and Dr Norvig's version, and I'm not quite sure how to find out, even though many approaches occur.

I could replace the occasional uses of for and any? with explicitly coded loop/recur constructions, which will allow me to make things like the order of evaluation and early return on failure explicit.

But it's still quite hard to reason about the algorithm. Every function call can set off a cascade of new calls, and even with everything explicit it's difficult to work out whether a certain elimination will have been done at the point that a certain check is called.

I have a different plan, which is to explicitly keep a list of pending tasks, so that a call to assign might return both a updated grid and a list of eliminations to be done on it, or an elimination might return a list of more eliminations and checks, or a check might return new assignations to be done.

This of course will mean that we don't actually need to mutate the grid, we can just return a new copy from every function, which will bring in clojure's purely functional data structures.

I expect that to bring in a constant time slow-down of about 4, but maybe I'll be pleasantly surprised.

More importantly, it will allow me to watch the algorithm as it runs.

It already occurs to me that it is silly to run a check when it might be affected by a later elimination that affects the thing checked.

Perhaps it will be possible to reorder the list of pending tasks so that eliminations are always done as soon as they can be, and checks postponed until all eliminations have been done.

I don't know whether that will be more efficient or not, and the thing is so fast anyway that it hardly matters, but it looks like it will be fun to try.

I will report back if there's anything interesting down this road.

Sudoku Solver

;;Sudoku solver

;;As direct a translation as I could make of Peter Norvig's famous python solver
;;Which is explained in detail at:
;;http://norvig.com/sudoku.html

;;Algorithm is constraint propagation coupled with depth-first search

;;Constraint propagation is performed by mutually recursive functions modifying state
;;So in clojure we need to put our strings in atoms.


;;I split the eliminate function into two (eliminate! and check!) to make it easier to read.

(defn cross [A, B]
  (for [a A b B] (str a b)))

(def rows "ABCDEFGHI")
(def cols "123456789")
(def digits "123456789")

;;the grid is divided into subsquares
(def subsquaresize 3)
(def rowgroups (partition subsquaresize rows))
(def colgroups (partition subsquaresize cols))


;;When we encode the grids as strings we may use any of these characters to encode blank squares
(def separators "0.-")

;;Squares are indexed by strings A1 -> I9
(def squares (cross rows cols))


;;units are the groups into which squares are grouped: rows, columns and subsquares
(def unitlist (map set  (concat 
                         (for [c cols] (cross rows [c]))
                         (for [r rows] (cross [r] cols))
                         (for [rs rowgroups
                               cs colgroups] (cross rs cs)))))


;;helper functions for making maps and sets
(defn dict [x] (apply sorted-map (apply concat x)))
(defn set-union [x] (apply sorted-set (apply concat x)))

;;use clojure's every? like python's all
(defn all? [coll] (every? identity coll))


;;which units are associated with a given square?
(def units (dict (for [s squares]  
                   [s (for [u unitlist :when (u s)] u)] )))


;;which other squares are linked to a given square through its units?
(def peers (dict (for [s squares]  
                   [s (disj (set-union (units s)) s)])))


;;three mutually recursive functions to propagate constraints. All of them return false 
;;if the constraints can not be satisfied.
(declare assign! eliminate! check!)


;;filter only the significant characters from an input string
(defn strip-grid [grid] (filter (set (concat digits separators)) grid))

;;make a grid where every square can contain every digit
(defn make-grid [] (dict (for [s squares] [s,(atom digits)])))

;;turn a string representing a grid into a dictionary of possible values for each square
(defn parse_grid [grid]
  (let [grid (strip-grid grid)
        values (make-grid)]
    (if (all? (for [[square digit] (zipmap squares grid) :when ((set digits) digit)]
                  (assign! values square digit)))
      values
      false)))


;;assign a definite value to a square by eliminating all other values.    
(defn assign! [values square digit]
  (if (all? (for [d @(values square) :when (not (= d digit))] 
              (eliminate! values square d)))
    values
    false))


;;remove a potential choice from a square. If that leaves no values, then that's a fail
;;if it leaves only one value then we can also eliminate that value from its peers.
;;either way, perform checks to see whether we've left the eliminated value with only one place to go.           
(defn eliminate! [values s d]
  (if (not ((set @(values s)) d)) values ;;if it's already not there nothing to do

      (do
        (swap! (values s) #(. % replace (str d) "")) ;;remove it
        (if (= 0 (count @(values s))) ;;no possibilities left

          false                       ;;fail
          (if (= 1 (count @(values s))) ;; one possibility left
            (let [d2 (first @(values s))]
              (if (not (all? (for [s2 (peers s)] (eliminate! values s2 d2))))
                false
                (check! values s d)))
            (check! values s d))))))


;;check whether the elimination of a value from a square has caused contradiction or further assignment
;;possibilities
(defn check! [values s d]
  (loop [u (units s)] ;;for each row, column, and block associated with square s
    (let [dplaces (for [s (first u) :when ((set @(values s)) d)] s)] ;;how many possible placings of d 

      (if (= (count dplaces) 0) ;;if none then we've failed
        false
        (if (= (count dplaces) 1) ;;if only one, then that has to be the answer

          (if (not (assign! values (first dplaces) d)) ;;so we can assign it.
            false
            (if (not (empty? (rest u))) (recur (rest u)) values))
          (if (not (empty? (rest u))) (recur (rest u)) values))))))


;;the function to print out the board is the hardest thing to translate from python to clojure!
(defn centre[s width]
  (let [pad (- width (count s))
        lpad (int (/ pad 2))
        rpad (- pad lpad)]
  (str (apply str (repeat lpad " ")) s (apply str (repeat  rpad " ")))))

(defn join [char seq]
  (apply str (interpose char seq)))

(defmacro forjoin [sep [var seq] body]
  `(join ~sep (for [~var ~seq] ~body)))

(defn board [values]
  (if (= values false)
    "no solution"

    (let [ width (+ 2 (apply max (for [s squares] (count @(values s)))))
          line (str \newline 
                    (join \+ (repeat subsquaresize 
                                     (join \- (repeat subsquaresize 
                                                      (apply str (repeat width "-"))))))
                    \newline)]
      (forjoin line [rg rowgroups]
               (forjoin "\n" [r rg]
                        (forjoin "|" [cg colgroups]
                                 (forjoin " " [c cg] 
                                          (centre @(values (str r c)) width))))))))

(defn print_board [values] (println (board values)))


;;We can't use Dr Norvig's trick of avoiding a deep copy by using strings. We have to copy the table
;;by recreating the atoms and copying their contents
(defn deepcopy [values] (dict (for [k (keys values)] [k (atom @(values k))])))

;;I've added a frill here where the search function keeps track of the search branches that it's following.

;;This means that we can print the branches out when debugging.
(defn search 
  ([values] (search values ""))
  ([values, recurse] 
     (println "recursion: " recurse)
     (if values
       (if (all? (for [s squares] (= 1 (count @(values s))))) ;;if all squares determined

         values                                               ;;triumph!
         (let [ pivot 
               (second (first (sort     ;;which square has fewest choices?
                               (for [s squares :when (>(count @(values s)) 1)] 
                                 [(count @(values s)),s]))))] 
           (let [results (for [d @(values pivot)] ;;try all choices

                           (do ;(print_board values)
                               (search (assign! (deepcopy values) pivot d) (str recurse d))))] ;(format "%s->%s;" pivot d)
                (some identity results)))) ;;and if any of them come back solved, return solution

         
       false)))


;;here's a demo:
(def hardestsudokuinworld "
850002400
720000009
004000000
000107002
305000900
040000000
000080070
017000000
000036040
")

(defn solve [grid]
     (do

       (println "\nproblem:")
       (println (join \newline (map #(apply str %) (partition 9 (filter (set (concat digits separators)) grid)))))
       (println "\nsolution:")
       (print_board (search (parse_grid grid)))))

(solve hardestsudokuinworld)


;;Dr Norvig provides a couple of files of easy and difficult sudokus for demonstration purposes.
;;Here is some code to read them in and solve them

(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.duck-streams)

(def easy-sudokus (re-split #"\s*Grid\s.*\s*" (slurp "sudoku.txt")))
(def hard-sudokus (read-lines "sudoku_hard.txt"))

(defn show-off []
  (solve hardestsudokuinworld)
  (doall (map solve easy-sudokus))
  (doall (map solve hard-sudokus)))


;; Lessons learned during translation process

;; Lazy evaluation and mutation really don't work together very well.

;; Solver appeared to work but seemed to take infinite time on 3rd sudoku
;; Actually it took several hundred thousand iterations, but got the right answer
;; run next to python program showed that python code was getting there in a couple of hundred
;; Realised that constraints were not being propagated properly
;; Added doalls to every for
;; Now program crashes because last values have been eliminated without returning false

;; Actually we need loops with early return, otherwise we keep eliminating things from already false branches
;; Now notice that the doalls are actually making things slower because any? would have short-circuited once anything was false. Get rid of them and get a 2x speedup.
;; now running at half the speed of python

Wednesday, November 18, 2009

The Sieve of Eratosthenes

We start with no primes, and a list of candidate integers.


#{} #{2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20}

The lowest of the candidates is a prime. Add it to the list of primes and remove all multiples of it from the candidates


#{2} #{3 5 7 9 11 13 15 17 19}

and repeat


#{2 3} #{5 7 11 13 17 19}

and repeat.


#{2 3 5} #{7 11 13 17 19}

Nothing interesting happens after here, because 5*5 is larger than 19, so we'd already have eliminated anything that we're going to eliminate from now on, but let's not get too clever, and follow the recursion to its natural end.


#{2 3 5 7} #{11 13 17 19}
#{2 3 5 7 11} #{13 17 19}
#{2 3 5 7 11 13} #{17 19}
#{2 3 5 7 11 13 17} #{19}
#{2 3 5 7 11 13 17 19} #{}

And we're done. No more candidates to sieve


#{2 3 5 7 11 13 17 19} #{}
#{2 3 5 7 11 13 17 19} #{}

so nothing happens at all from now on.

How to model this recursion with a function? We can use the clojure set library to do the striking off.


(use 'clojure.set)

Here's a function which takes one row from above, and produces the next:


(defn sievefn [[primes, candidates]]
  (if (empty? candidates) [primes, candidates]
      (let [prime (first candidates)             ;;the first candidate is always a prime
            end (inc (apply max candidates))     ;;we want to strike out all multiples
            multiples (range prime end prime)    ;;up to max(candidates)
            newprimes (conj primes prime)
            newcandidates (clojure.set/difference candidates multiples)]
        [ newprimes, newcandidates])))

Let's try it:


(def to20 [(sorted-set) #{2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20}])
(sievefn to20)
(sievefn (sievefn to20))

Here are the first ten iterations:


(take 10 (iterate sievefn to20))

Of course, this is all a bit long-winded. Let us instead define sieve like this, mixing the iteration in with the transforming function:


(defn sieverecur [primes candset]
  (if (empty? candset) primes
      (let [prime (first candset)
            end (inc (apply max candset))]
        (recur (conj primes prime), (clojure.set/difference candset (range prime end prime))))))

(sieverecur (sorted-set) (apply sorted-set (range 2 100)))

We shouldn't have to recalculate the end value every time, though, so we can pull that bit out:


(defn sieverecur1 
  ([primes candset]
     (sieverecur1 primes candset (inc (apply max candset))))
  ([primes candset end]
     (if (empty? candset) primes
         (let [prime (first candset)]
           (recur (conj primes prime)
                  (clojure.set/difference candset (range prime end prime))
                  end)))))

(sieverecur1 (sorted-set) (apply sorted-set (range 2 100)))

Finally, we can incorporate our initial conditions, and also the optimization that we noticed above, that nothing happens once the largest prime is above the square root of the range.


(defn sieve
  ([n]
     (sieve (sorted-set) (apply sorted-set (range 2 (inc n))) (+ n 2)))
  ([primes candset end]
     (let [prime (first candset)]
       (if ( > (* prime prime) end)
         (clojure.set/union primes candset)
         (recur (conj primes prime)
                (clojure.set/difference candset (range prime end prime))
                end)))))

That's about it for the Sieve of Eratosthenes over a finite list.

What about producing an infinite list of primes?

A similar algorithm on paper might look like:

Start with a counter set to 2. Two is prime, so we can add it to a list of primes The first multiple of 2 is 2, so we write [[multiple, prime]] counter


[[2,2]] 2

Now up the counter to 3.


[[2,2]] 3

The multiple of 2 is lower than 3, so add 2 to get the next multiple of 2, 4


[[4,2]] 3

Now our multiple is higher, so we know 3 is also a prime, and we add it to the list The first multiple of 3 is 3


[[3,3][4,2]] 3

Three is already on the list, so up the counter to 4.


[[3,3][4,2]] 4

4 is larger than our lowest prime multiple, so increase the multiple, by adding its prime.


[[4,2][6,3]] 4

We will keep the list of primes and multiples sorted in order of the multiples, so that we always know which one to look at next. We now have the primes 2 and 3, and their multiples 4 and 6 4 is equal to the lowest multiple. So discard it and increase the counter to 5


[[4,2][6,3]] 5

The lowest multiple is 4, less than 5, so we add its prime 2+4=6


[[6,2][6,3]] 5

5 is lower than any multiple, so it is a prime too. And so on........ the iteration goes:


[[5,5][6,2][6,3]] 5
[[5,5][6,2][6,3]] 6
[[6,2][6,3][10,5]] 6
[[6,2][6,3][10,5]] 7
[[6,3][8,2][10,5]] 7
[[8,2][9,3][10,5]] 7
[[7,7][8,2][9,3][10,5]] 7
[[7,7][8,2][9,3][10,5]] 8
[[8,2][9,3][10,5][14,7]] 8
[[8,2][9,3][10,5][14,7]] 9
[[9,3][10,2][10,5][14,7]] 9
[[9,3][10,2][10,5][14,7]] 10

and as the counter increases, the primes accumulate on the left hand side.

We can construct a function which performs this iteration, too:


(defn infinite-sieve-fn [[testset int]]
  (let [pair (first testset)
        [multiple prime] pair]
    (cond (= int multiple) (list testset (inc int))
          (> int multiple) (list (conj (disj testset pair) [(+ multiple prime) prime]) int)
          (< int multiple) (list (conj testset [int int]) int))))

and iterate it infinitely:


(def sieves (iterate infinite-sieve-fn [(sorted-set [2,2]) 3]))

note the use of a sorted set, so that when we add in new pairs, the lowest will be the first element

Here are the first five iterations:


(take 5 sieves)

How to extract the primes? Consider the 20th iteration


(nth sieves 20)


'(#{[10 2] [10 5] [12 3] [14 7]} 11)

So we'd like to extract the second elements of the first element


(map second (first (nth sieves 200)))

and it might be better if we sort them


(sort (map second (first (nth sieves 200))))

again we can construct an infinite list, derived from the first one


(def sieveprimes (map (fn[x] (sort (map second (first x)))) sieves))

What primes have we got after 1000 iterations?


(nth sieveprimes 1000)

after 10000 iterations, what are the last ten primes we found?


(take 10 (reverse (nth sieveprimes 10000)))

If all we want is a list of primes, it's silly to construct an entire list of iterations. Just like above, we can fold the iteration into the function, but this time we need to decide when to stop iterating.


(defn infinite-sieve-recur [testset int stop]
  (if (> int stop) testset
      (let [pair (first testset)
            [multiple prime] pair]
        (cond (= int multiple) (recur testset (inc int) stop)
              (> int multiple) (recur (conj (disj testset pair) [(+ multiple prime) prime]) int stop)
              (< int multiple) (recur (conj testset [int int]) int stop)))))

here's the test set when the counter has got to ten.


(infinite-sieve-recur (sorted-set [2,2]) 2 10)

now we can ask for all the primes up to 100.


(map second (infinite-sieve-recur (sorted-set [2,2]) 2 100))

we've lost memoization by abandoning the infinite sequence, but we also don't need to keep all that intermediate data in memory for ever. This alone has given us a speed up of a factor of 100.

Here are the last ten primes before 10000


(take 10 (reverse (sort (map second (infinite-sieve-recur (sorted-set [2,2]) 2 10000)))))

Again, there's no point in testing numbers for factors over their square root, so we can optimise that by setting the first test multiple of a prime to be its square. Note that we now need to remember to up the counter at the same time!



[[4,2]] 2
[[4,2]] 3
[[4,2][9,3]] 4
[[4,2][9,3]] 5
[[6,2][9,3][25,5]] 5
[[6,2][9,3][25,5]] 6
[[8,2][9,3][25,5]] 7
[[8,2][9,3][25,5][49,7]] 8
[[8,2][9,3][25,5][49,7]] 9
[[9,3][10,2][25,5][49,7]] 9
[[9,3][10,2][25,5][49,7]] 10
[[9,3][10,2][25,5][49,7]] 11

And again, we may as well fold in our initial conditions to make a tidy function


(defn infinite-sieve 
  ([n] (sort (map second (infinite-sieve (sorted-set [2,2]) 2 n))))
  ([testset int stop]
  (if (> int stop) testset
      (let [pair (first testset)
            [multiple prime] pair]
        (cond (= int multiple) (recur testset (inc int) stop)
              (> int multiple) (recur (conj (disj testset pair) [(+ multiple prime) prime]) int stop)
              (< int multiple) (recur (conj testset [(* int int) int]) (inc int) stop))))))

(infinite-sieve 100)
(take 10 (reverse (infinite-sieve 10000)))

Now all we have to do is figure out why the thing is so slow! Which I think will be a blog post for another day.

Installing Clojure on Ubuntu 9.10 (Karmic Koala)

I have a nice fresh Ubuntu installation, on which I want to install Clojure, Emacs, and Slime. And I thought I'd document how it went:

There are packages for some things, but for this stuff I'd rather install bleeding edge versions by hand.

Instructions are here
http://riddell.us/tutorial/slime_swank/slime_swank.html

clojure syntax highlighting code

This post contains magic javascript that finds all the clojure code and syntax highlights it.


(defn foo [] 1) ;here's an example

Learning Clojure

About Me

Blog Archive

Search This Blog