Search This Blog

Friday, January 29, 2010

How the Clojure dojo actually worked in practice.

We decided that I'd try out (the first half of) my Clojure dojo on the Chemistry Department.

The audience was the non-lispers in our group, so there were about eight very bright Java guys, one of whom had done some ML before, and one of whom had been through this very exercise with me in a pub some months before as his first contact with LISP since he met McCarthy in the sixties(!).

The dojo format worked very well, and I mainly sat back and let them get on with it.

I didn't say a great deal after posing the problem in the terms outlined at the start. When they completed a subproblem I gave them a new problem and maybe a hint or two. I was surprised how much they managed to infer from limited information.

At first everything went absolutely swimmingly, and the syntax didn't faze them at all. Questions like 'What's the sum of the squares of 3 and 4' were answered without breaking stride. They understood Heron's method immediately. When I suggested that they turned it into a function to improve guesses, they spontaneously put in a target variable instead of hard-coding 10, and started talking about recursion.

They couldn't figure out how to do it, so I suggested trying to write factorial. This they managed to work out for themselves.

There was then much excitement trying to turn that into a solver, but they couldn't do it.

I suggested trying to write good-enough?

They realised the need for an absolute value function, and to my utter amazement, guessed the syntax for calling Math/abs, and then for Math/sqrt!

They declared the problem solved.

So I asked them to generalize the solution to make a general root-finder, which got them back to trying to get Heron's method to work.

This was the last time I said anything helpful. Much more thinking, and they got on the right track. They defined an okguess function which took the tolerance as a parameter, and were well on the way to solving the problem, when the ML guy and the guy who'd done it before took the chairs.

I said they were only allowed a couple of minutes, but they had the thing done almost immediately.

I still wanted to go back and show them iterate and map, so I asked them to make a function that returned a function. Can we make a function that takes a number and returns a function which multiplies things by it?

I hinted that defn was a combination of def and fn, and that was enough

That solved, I asked them to make a function which took a target and gave back the relevant improver.

Then I told them to try (doc iterate), and see if they could figure out what it was for. And then I told them to look up map and take. They were playing happily with infinite sequences when our hour and a half ran out.

Tuesday, December 8, 2009

Understanding the REPL

A REPL is a subtle thing.

Consider what happens when you type in 23

user>23
23

The program reads in the sequence of characters 2, 3, <return>, and prints the sequence of characters 2, 3, <return>

It looks very much as if nothing has happened at all!

But in fact, a great deal has happened.

A REPL is a read-eval-print loop.

Any interaction with it involves reading in a stream of characters, which are passed to the function read. The output from read is passed to the evaluator, eval The output from eval is passed to the print function, and turned into a sequence of characters, which are then rendered on the screen by a mysterious process known as the operating system.

We can forget about the process which reads the keyboard and interprets physical movements as a stream of characters We can also forget about the process which prints the characters on the screen.

Clojure would work in just the same way if it lived on a computer half-a-mile away, and you sent and recieved characters to it by Morse code.

So let us concentrate only on the three processes which take the character stream coming in and return the character stream going out.

Read, Eval, and Print.

One way to understand what is going on is to write your own read, eval, and print functions.

One way to get started on that project is to borrow the read and print functions from an existing lisp, and then use that lisp to write your own eval function, which is both the most interesting bit of the system, and the easiest to construct.

This is a traditional rite of passage for lispers, and I refer you to chapter 4 of the Structure and Interpretation of Computer Programs if you wish to pursue it.

Now most lisps will give us the power to call their read, eval, and print functions, and very useful they are too.

But Clojure is a lisp written in java, and it also gives us the power to create objects in the underlying java system.

And we can use this power to conduct experiments on the three functions.

Let's look a little at the reader first.

The reader reads characters from any stream which is of type java.io.PushbackReader

We can create one of these streams from our REPL!

Remember that clojure is just a java program. We could have created these streams in a java program, but we will use clojure's REPL to manipulate them. We must remember that whenever anything is typed in, it is the reader and the evaluator that interpret them, and the printer that displays the result, but if we are careful we can pick the process apart.

So here's our input stream with the characters 2,3, in it.



(java.io.PushbackReader. (java.io.StringReader. "23\n"))
#<PushbackReader java.io.PushbackReader@1f9f0f2>

Remember, we typed in (...). The reader interpreted that stream of characters, and passed it to the evaluator.

The evaluator, as a result created a java.lang.String, wrapped that in a java.io.StringReader, wrapped that in a java.io.PushbackReader, and then gave that object to the printer, which decided that an appropriate printed representation would be the sequence of characters #,<,P,u,s,...........2,>,

Let's instead assign it to a value



(def inputstream (java.io.PushbackReader. (java.io.StringReader. "23\n")))
#'user/inputstream

Now the evaluator has done much the same thing, but then rather than handing the PushbackReader to the print function, it has attached it to the var #'user/inputstream, noted that the symbol foo should be associated in the user namespace with the var #'user/inputstream, and then passed the var to the printer, which has decided that #,',u,s,....,m is an appropriate sequence of characters with which to represent the var to the user.

We can check that the symbol->var mapping is now in the user namespace:



(ns-publics 'user)
{inputstream #'user/inputstream}

This is a side-effect of evaluating a def form. In fact the return value of the expression is not particularly interesting. It is the side-effect that we are after.

Since we've only just started our REPL, the user namespace is empty apart from the association that the evaluator has just made.

Now the question in which we are interested is: What does the reader produce when it is given a PushbackReader which yields the characters 2,3, ?

We can ask the REPL



(read inputstream)
23

And it tells us that it is a thing which the printer chooses to represent as 2,3,

PushbackReaders are mutable objects, and this one is now exhausted. It has no more characters to give.

Let us create another similar stream, pass it again to the reader, and this time save the value that the reader gives us.



(def read23 (read (java.io.PushbackReader. (java.io.StringReader. "23\n"))))
#'user/read23


Another var has been created by the def, and the printer has turned it into a stream of characters for us.

But we can examine the object to which read23 refers:



(type read23)
java.lang.Integer

Aha! the reader returned, not the string of characters 2,3,, but an object of type java.lang.Integer I wonder what else we can learn about it?

Remember that it is a java class, and that we can use clojure to call java methods on it.



(.getMethods (class read23))
#

(map str (.getMethods (class read23)))

java.lang.Integer is a class with many methods and we will have an easier time understanding it if we cheat.



(use 'clojure.contrib.repl-utils)
(javadoc read23)

let's try a method. doubleValue returns the value as a string.


(.doubleValue read23)
23.0

Unless the printer is being incredibly perverse, we are dealing with a java.lang.Integer whose value is 23.

But notice that this is a very different thing from a character stream made from 2,3,

Let's forget all about the details of creating streams, and instead use the read-string function, which can be fed strings directly.

What does the reader return when we give it the characters (,+,,2,,2,)?



(def twoplustwo (read-string "(+ 2 2)\n"))
#'user/twoplustwo

(type twoplustwo)
clojure.lang.PersistentList

That's a new one.



(.count twoplustwo)
3

A list with three things in it.



(map type twoplustwo)
(clojure.lang.Symbol java.lang.Integer java.lang.Integer)

A list of one symbol, and two integers.



(javadoc clojure.lang.Symbol)

(.getName (first twoplustwo))
"+"

(.doubleValue (first (rest twoplustwo)))
2.0

So the brackets have gone. The reader's interpretation of the string "(+ 2 2)" is a list of a clojure.lang.Symbol whose name is the (java.lang.)String "+", and two java.lang.Integers whose value as doubles is 2.0.

I reckon we have this reader thing pretty well sorted out. But look at this:



(def line-noise (read-string "'foo"))
#'user/line-noise

(type line-noise)
clojure.lang.Cons

(.first line-noise)
quote

(type (.next line-noise))
clojure.lang.PersistentList

(.count (.next line-noise))
1

(type (.first (.next line-noise)))
clojure.lang.Symbol

(.getName (.first (.next line-noise)))
"foo"

So it appears that when the reader's presented with ',f,o,o It produces a Cons, whose first element is the Symbol whose name is "quote", and whose next element is a list, whose sole element is the Symbol "foo".

The printer represents this little tree as (quote foo).

But if we wrote our own printer we might well call it (cons (symbol "quote") (list (symbol "foo")))

indeed this is a clojure expression:



(cons (symbol "quote") (list (symbol "foo")))

which we can type in at the REPL, which will return the same string as if we typed in (read-string "'foo")

It appears that there are some subtleties to the reader.

Tuesday, December 1, 2009

xxdiff and hg view with mercurial

Not really clojure, but now I'm using it properly on a project where we're using mercurial for version control

Enabling xxdiff


You can link xxdiff into mercurial, but you have to modify the hgrc file


add to  $HOME/.hgrc

[extensions]
hgext.extdiff =

and then we can say

hg extdiff -p program -o options

e.g.

$ hg extdiff -p xxdiff -o -r

which causes xxdiff -r to run
or
 
$ hg extdiff -p kdiff3




We can be even more subtle by adding an explicit command for xxdiff -r

[extdiff]
cmd.xxdiff = xxdiff
opts.xxdiff = -r

which allows us to use


$ hg xxdiff

to call xxdiff directly in its recursive mode.


Enabling hg view

While we're messing about with .hgrc, add

[extensions]
hgext.extdiff=
hgk=

which will enable the hg view command.

Saturday, November 28, 2009

classpath

These expressions keep coming in handy, for finding out what clojure thinks the classpath is today:


(println (seq (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))

(pprint (map (memfn getPath) (seq (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))) 
 
(clojure.contrib.pprint/pprint (sort (map (memfn getPath) (seq (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))))

Thanks Pablo! ( http://pupeno.com/blog/printing-the-class-path-in-clojure/ )

Namespaces and Unit Tests

OK, if I'm going to be doing this for more than toy programs I need to learn how to use the namespace and unit testing facilities

Here I'm typing at the repl, or more accurately I have this file open in emacs and I'm passing the expressions one by one to the repl.



(ns namespaces-and-tests
  (:use clojure.test))

here's a factorial, with some tests included



(with-test

    (defn factorial [n]
      (if (< n 3) n
          (* n (factorial (dec n)))))

  (is ( = (factorial 1)     1) "base case")
  (is ( = (factorial 4)     (* 1 2 3 4)) "first recursion")
  (is ( = (factorial 10)    (apply * (range 1 10))) "general case" ))



I can run my tests like so:


(run-tests)

or like


(run-tests 'namespaces-and-tests)


I can also define a separate factorial testing function, perhaps even in a different namespace



(ns separate-tests
  (:use clojure.test))

(use 'namespaces-and-tests);;however this breaks, whining something about classpaths and non-existent files
(ns-publics 'namespaces-and-tests) ;;even though we can still see the namespace and its function

(namespaces-and-tests/factorial 4) ;;and indeed it still works!

It appears that require defines libraries by filename, whilst refer talks about namespaces. use is a combination of the two.



(refer 'namespaces-and-tests) ;;try this instead
(factorial 4)      ;;aha!


So here is a more complex test for the factorial


(deftest factorial-tests
  (testing "factorial"
    (testing "base cases"
      (is (= (factorial 1) 1))
      (is (= (factorial 2) 2))
      (is (= (factorial 0) 0)))
    (testing "general case"
      (is ( = (factorial 10) (apply * (range 1 11))))
      (is ( = (factorial 10000) (apply * (range 1 10001))) "tail recursion"))
    (testing "bad arguments"
      (is (thrown? java.lang.StackOverflowError (factorial -1)))
      (is (thrown? java.lang.StackOverflowError (factorial 0.5))))))

which we can run like this:


(run-tests 'separate-tests)

or like this


(run-tests)

we can run both at once with


(run-tests 'namespaces-and-tests 'separate-tests)

or we can use


(run-all-tests)

which claims to be testing everything in the clojure libraries, and in swank, which I'm using to connect clojure to emacs, but doesn't actually seem to do any testing on them. Bug perhaps?

Tuesday, November 24, 2009

Mutable State in the Sudoku Solver

One never really understands poetry until one has attempted to translate it.

Dr. Norvig's beautiful Sudoku Solver is very much in that line.

I got myself into all sorts of knots translating the Sudoku Solver, basically because of the newbie mistake of not realising that Clojure's for is lazy.

Mutability and laziness never go well together.

I'm also scared to play with it because it's actually easy to break it in such a way that it occasionally crashes, or takes very sub-optimal routes which nevertheless find correct solutions after long times.

Those bugs are easy to spot, but I wonder if there might be subtle differences between it and Dr Norvig's version, and I'm not quite sure how to find out, even though many approaches occur.

I could replace the occasional uses of for and any? with explicitly coded loop/recur constructions, which will allow me to make things like the order of evaluation and early return on failure explicit.

But it's still quite hard to reason about the algorithm. Every function call can set off a cascade of new calls, and even with everything explicit it's difficult to work out whether a certain elimination will have been done at the point that a certain check is called.

I have a different plan, which is to explicitly keep a list of pending tasks, so that a call to assign might return both a updated grid and a list of eliminations to be done on it, or an elimination might return a list of more eliminations and checks, or a check might return new assignations to be done.

This of course will mean that we don't actually need to mutate the grid, we can just return a new copy from every function, which will bring in clojure's purely functional data structures.

I expect that to bring in a constant time slow-down of about 4, but maybe I'll be pleasantly surprised.

More importantly, it will allow me to watch the algorithm as it runs.

It already occurs to me that it is silly to run a check when it might be affected by a later elimination that affects the thing checked.

Perhaps it will be possible to reorder the list of pending tasks so that eliminations are always done as soon as they can be, and checks postponed until all eliminations have been done.

I don't know whether that will be more efficient or not, and the thing is so fast anyway that it hardly matters, but it looks like it will be fun to try.

I will report back if there's anything interesting down this road.


Sudoku Solver

;;Sudoku solver

;;As direct a translation as I could make of Peter Norvig's famous python solver
;;Which is explained in detail at:
;;http://norvig.com/sudoku.html

;;Algorithm is constraint propagation coupled with depth-first search

;;Constraint propagation is performed by mutually recursive functions modifying state
;;So in clojure we need to put our strings in atoms.


;;I split the eliminate function into two (eliminate! and check!) to make it easier to read.

(defn cross [A, B]
  (for [a A b B] (str a b)))

(def rows "ABCDEFGHI")
(def cols "123456789")
(def digits "123456789")

;;the grid is divided into subsquares
(def subsquaresize 3)
(def rowgroups (partition subsquaresize rows))
(def colgroups (partition subsquaresize cols))


;;When we encode the grids as strings we may use any of these characters to encode blank squares
(def separators "0.-")

;;Squares are indexed by strings A1 -> I9
(def squares (cross rows cols))


;;units are the groups into which squares are grouped: rows, columns and subsquares
(def unitlist (map set  (concat 
                         (for [c cols] (cross rows [c]))
                         (for [r rows] (cross [r] cols))
                         (for [rs rowgroups
                               cs colgroups] (cross rs cs)))))


;;helper functions for making maps and sets
(defn dict [x] (apply sorted-map (apply concat x)))
(defn set-union [x] (apply sorted-set (apply concat x)))

;;use clojure's every? like python's all
(defn all? [coll] (every? identity coll))


;;which units are associated with a given square?
(def units (dict (for [s squares]  
                   [s (for [u unitlist :when (u s)] u)] )))


;;which other squares are linked to a given square through its units?
(def peers (dict (for [s squares]  
                   [s (disj (set-union (units s)) s)])))


;;three mutually recursive functions to propagate constraints. All of them return false 
;;if the constraints can not be satisfied.
(declare assign! eliminate! check!)


;;filter only the significant characters from an input string
(defn strip-grid [grid] (filter (set (concat digits separators)) grid))

;;make a grid where every square can contain every digit
(defn make-grid [] (dict (for [s squares] [s,(atom digits)])))

;;turn a string representing a grid into a dictionary of possible values for each square
(defn parse_grid [grid]
  (let [grid (strip-grid grid)
        values (make-grid)]
    (if (all? (for [[square digit] (zipmap squares grid) :when ((set digits) digit)]
                  (assign! values square digit)))
      values
      false)))


;;assign a definite value to a square by eliminating all other values.    
(defn assign! [values square digit]
  (if (all? (for [d @(values square) :when (not (= d digit))] 
              (eliminate! values square d)))
    values
    false))


;;remove a potential choice from a square. If that leaves no values, then that's a fail
;;if it leaves only one value then we can also eliminate that value from its peers.
;;either way, perform checks to see whether we've left the eliminated value with only one place to go.           
(defn eliminate! [values s d]
  (if (not ((set @(values s)) d)) values ;;if it's already not there nothing to do

      (do
        (swap! (values s) #(. % replace (str d) "")) ;;remove it
        (if (= 0 (count @(values s))) ;;no possibilities left

          false                       ;;fail
          (if (= 1 (count @(values s))) ;; one possibility left
            (let [d2 (first @(values s))]
              (if (not (all? (for [s2 (peers s)] (eliminate! values s2 d2))))
                false
                (check! values s d)))
            (check! values s d))))))


;;check whether the elimination of a value from a square has caused contradiction or further assignment
;;possibilities
(defn check! [values s d]
  (loop [u (units s)] ;;for each row, column, and block associated with square s
    (let [dplaces (for [s (first u) :when ((set @(values s)) d)] s)] ;;how many possible placings of d 

      (if (= (count dplaces) 0) ;;if none then we've failed
        false
        (if (= (count dplaces) 1) ;;if only one, then that has to be the answer

          (if (not (assign! values (first dplaces) d)) ;;so we can assign it.
            false
            (if (not (empty? (rest u))) (recur (rest u)) values))
          (if (not (empty? (rest u))) (recur (rest u)) values))))))


;;the function to print out the board is the hardest thing to translate from python to clojure!
(defn centre[s width]
  (let [pad (- width (count s))
        lpad (int (/ pad 2))
        rpad (- pad lpad)]
  (str (apply str (repeat lpad " ")) s (apply str (repeat  rpad " ")))))

(defn join [char seq]
  (apply str (interpose char seq)))

(defmacro forjoin [sep [var seq] body]
  `(join ~sep (for [~var ~seq] ~body)))

(defn board [values]
  (if (= values false)
    "no solution"

    (let [ width (+ 2 (apply max (for [s squares] (count @(values s)))))
          line (str \newline 
                    (join \+ (repeat subsquaresize 
                                     (join \- (repeat subsquaresize 
                                                      (apply str (repeat width "-"))))))
                    \newline)]
      (forjoin line [rg rowgroups]
               (forjoin "\n" [r rg]
                        (forjoin "|" [cg colgroups]
                                 (forjoin " " [c cg] 
                                          (centre @(values (str r c)) width))))))))

(defn print_board [values] (println (board values)))


;;We can't use Dr Norvig's trick of avoiding a deep copy by using strings. We have to copy the table
;;by recreating the atoms and copying their contents
(defn deepcopy [values] (dict (for [k (keys values)] [k (atom @(values k))])))

;;I've added a frill here where the search function keeps track of the search branches that it's following.

;;This means that we can print the branches out when debugging.
(defn search 
  ([values] (search values ""))
  ([values, recurse] 
     (println "recursion: " recurse)
     (if values
       (if (all? (for [s squares] (= 1 (count @(values s))))) ;;if all squares determined

         values                                               ;;triumph!
         (let [ pivot 
               (second (first (sort     ;;which square has fewest choices?
                               (for [s squares :when (>(count @(values s)) 1)] 
                                 [(count @(values s)),s]))))] 
           (let [results (for [d @(values pivot)] ;;try all choices

                           (do ;(print_board values)
                               (search (assign! (deepcopy values) pivot d) (str recurse d))))] ;(format "%s->%s;" pivot d)
                (some identity results)))) ;;and if any of them come back solved, return solution

         
       false)))


;;here's a demo:
(def hardestsudokuinworld "
850002400
720000009
004000000
000107002
305000900
040000000
000080070
017000000
000036040
")

(defn solve [grid]
     (do

       (println "\nproblem:")
       (println (join \newline (map #(apply str %) (partition 9 (filter (set (concat digits separators)) grid)))))
       (println "\nsolution:")
       (print_board (search (parse_grid grid)))))

(solve hardestsudokuinworld)


;;Dr Norvig provides a couple of files of easy and difficult sudokus for demonstration purposes.
;;Here is some code to read them in and solve them

(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.duck-streams)

(def easy-sudokus (re-split #"\s*Grid\s.*\s*" (slurp "sudoku.txt")))
(def hard-sudokus (read-lines "sudoku_hard.txt"))

(defn show-off []
  (solve hardestsudokuinworld)
  (doall (map solve easy-sudokus))
  (doall (map solve hard-sudokus)))


;; Lessons learned during translation process

;; Lazy evaluation and mutation really don't work together very well.

;; Solver appeared to work but seemed to take infinite time on 3rd sudoku
;; Actually it took several hundred thousand iterations, but got the right answer
;; run next to python program showed that python code was getting there in a couple of hundred
;; Realised that constraints were not being propagated properly
;; Added doalls to every for
;; Now program crashes because last values have been eliminated without returning false

;; Actually we need loops with early return, otherwise we keep eliminating things from already false branches
;; Now notice that the doalls are actually making things slower because any? would have short-circuited once anything was false. Get rid of them and get a 2x speedup.
;; now running at half the speed of python

Followers