Learning Clojure

Thursday, July 11, 2013

Randomized Quicksort

;; Randomized Quicksort

;; Quicksort has very unfortunate performance characteristics. It's
;; very fast on randomized data, and uses very little extra memory,
;; but if there's too much order in its input, it becomes a quadratic
;; algorithm which blows the stack.

;; Such inputs are very rare, but very easy to find (there's glory for you!)

;; Luckily there's an easy fix, which is why Quicksort's a very popular sorting algorithm.

;; Here's Quicksort as we left it:

(defn swap [v i1 i2] 
   (assoc v i2 (v i1) i1 (v i2)))

(defn ^:dynamic partition-by-pivot [v start end]
  (loop [v v i (inc start) j (inc start)] ;; this extra loop just sets up the indices i and j
    (if (> j end)
      [(dec i) (swap v start (dec i))]
      (if (< (v start) (v j))
        (recur v i (inc j))
        (recur (swap v i j) (inc i) (inc j))))))

(defn ^:dynamic qsort
  ([v start end]
     (if (> (+ 1 start) end) v  ;; short array, nothing to do
         (let [[index newv] (partition-by-pivot v start end) ;; otherwise partition
               leftsorted (qsort newv start (dec index))] ;; and sort the left half
           (qsort leftsorted (inc index) end)))))


(defn quicksort [v] (let [vec (into [] v)] (qsort vec 0 (dec (count vec)))))

;; Potentially a very fast sorting algorithm, which apart from its recursion stack
;; needs no extra memory.

;; But although it reliably looks good on random input:
(time (first (quicksort (shuffle (range 2048))))) "Elapsed time: 435.670347 msecs"

;; If you feed it an array which is already in order:
(time (first (quicksort (range 2048)))) "Elapsed time: 2413.521649 msecs"
;; performance drops off horribly
(time (first (quicksort (range 4096))))
;; and the stack blows.

;; The underlying problem is that when the data has a structure which makes the pivots
;; far from the medians of the arrays to be partitioned, the recursion tree unbalances, and
;; rather than doing roughly log 2048=11 recursions, quicksort does 2048 recursions.

;; If only we could get the usual performance characteristics of quicksort on random data
;; on the sort of almost-sorted data that we often encounter in practice.

;; One way to do this, of course, would be to shuffle the data before sorting it.
;; The chances of hitting a bad case with properly shuffled data are astronomically small.
;; It would work, but isn't that an odd thing to do? And shuffling is just as difficult as sorting. 

;; Turns out we don't have to. If we pick the pivot at random, then we get the same performance.
;; There are still bad cases, but if the random number generator is uncorrelated with the data to 
;; be sorted, we'll never hit one in a million years.

;; How shall we do that? Turns out it's dead easy to frig
;; partition-by-pivot. We just swap the first element of the array
;; with a randomly chosen one before we pick the pivot:

(defn ^:dynamic partition-by-pivot [v start end]
  (let [randomel (+ start (rand-int (- end start)))
        v (swap v start randomel)] ;; swap first element with randomly chosen one
    (loop [v v i (inc start) j (inc start)] 
      (if (> j end)
        [(dec i) (swap v start (dec i))]
        (if (< (v start) (v j))
          (recur v i (inc j))
          (recur (swap v i j) (inc i) (inc j)))))))

(time (first (quicksort (range 2048)))) "Elapsed time: 666.069407 msecs"

;; The bad cases are still around. But they're very hard to find now.

;; It occurs to me that one way to blow up randomized quicksort might
;; be to use a pseudo-random number generator and a well chosen
;; algorithm to shuffle the data, and then use the same PRNG to
;; quicksort it. So if you've got a clever adversary who knows what
;; you mean by random he should be able to make your quicksorts
;; explode. 

;; We're something like a factor of twenty away from clojure's built in sort.
;; That's really not bad for an unoptimized algorithm.
(time (first (sort (shuffle (range 2048))))) "Elapsed time: 33.509778 msecs"

;; It should be possible to speed our quicksort up quite a bit (especially if
;; we restrict ourselves to arrays of integers).

;; I leave this as an exercise for the reader.

Quicksort

;; Quicksort

(shuffle (range 20)) ;; [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19]

(def original [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19])

;; The fundamental notion in quicksort is partition-by-pivot

;; The first thing we need to do is to choose the 'pivot'.
;; The obvious choice is the first element of our array, 7

;; So our fundamental operation will be to reorder the array so that everything less than 
;; 7 comes before it, and everything greater than 7 comes after it.

;; It's a fair bet that the ability to swap elements will come in handy:
(defn swap [v i1 i2] 
   (assoc v i2 (v i1) i1 (v i2)))

(swap [1 2 3] 0 1)   ;-> [2 1 3]
(swap original 1 19) ;-> [7 19 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 13]

;; Here's a function to partition-by-pivot, taking the first element of the array as
;; the pivot.

;; To understand what it's doing, notice that it leaves the pivot
;; alone at the beginning of the array, and maintains this pattern:
;; [ pivot | less than pivot | more than pivot | unscanned ]
;; in the array as it scans through:

;; Once the scan is ended we can put the pivot in its rightful place with a single swap.

(defn ^:dynamic partition-by-pivot [v i j]
  (if (= j (count v))  ;; if the scan is ended
    (swap v 0 (dec i)) ;; put the pivot in the right place
    (if (< (v 0) (v j)) ;; otherwise compare elements
      (partition-by-pivot v i (inc j)) ;; and move the boundaries/swap elements as appropriate
      (partition-by-pivot (swap v i j) (inc i) (inc j))))) ;; to preserve the pattern

;; i and j are the indices of the two moving boundaries in the vector. 
;; We don't need an index for the pivot since it stays at the front.

;; Actions having a way of speaking louder than words, so:  
(require 'clojure.tools.trace)
(clojure.tools.trace/dotrace [partition-by-pivot] (partition-by-pivot original 1 1))

;; TRACE t1304: (partition-by-pivot [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 1 1)
;; TRACE t1305: | (partition-by-pivot [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 1 2)
;; TRACE t1306: | | (partition-by-pivot [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 1 3)
;; TRACE t1307: | | | (partition-by-pivot [7 2 12 13 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 2 4)
;; TRACE t1308: | | | | (partition-by-pivot [7 2 12 13 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 2 5)
;; TRACE t1309: | | | | | (partition-by-pivot [7 2 12 13 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 2 6)
;; TRACE t1310: | | | | | | (partition-by-pivot [7 2 12 13 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 2 7)
;; TRACE t1311: | | | | | | | (partition-by-pivot [7 2 1 13 15 14 16 12 3 0 9 18 17 10 11 8 4 6 5 19] 3 8)
;; TRACE t1312: | | | | | | | | (partition-by-pivot [7 2 1 3 15 14 16 12 13 0 9 18 17 10 11 8 4 6 5 19] 4 9)
;; TRACE t1313: | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 10)
;; TRACE t1314: | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 11)
;; TRACE t1315: | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 12)
;; TRACE t1316: | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 13)
;; TRACE t1317: | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 14)
;; TRACE t1318: | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 15)
;; TRACE t1319: | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 16)
;; TRACE t1320: | | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 4 16 12 13 15 9 18 17 10 11 8 14 6 5 19] 6 17)
;; TRACE t1321: | | | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 4 6 12 13 15 9 18 17 10 11 8 14 16 5 19] 7 18)
;; TRACE t1322: | | | | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 4 6 5 13 15 9 18 17 10 11 8 14 16 12 19] 8 19)
;; TRACE t1323: | | | | | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 4 6 5 13 15 9 18 17 10 11 8 14 16 12 19] 8 20)
;; TRACE t1323: | | | | | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1322: | | | | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1321: | | | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1320: | | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1319: | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1318: | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1317: | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1316: | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1315: | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1314: | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1313: | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1312: | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1311: | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1310: | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1309: | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1308: | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1307: | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1306: | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1305: | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; TRACE t1304: => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]



;; Notice that in the result:
;; [ 5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]
;; (which is much better ordered than the original)
;; the pivot element 7 is in its correct place (element 7) already. 
;; It need never move again.

;; We'd now like to repeat the partition on the two parts of the array on either side of 7 
;; And to do this, we need to know the index where the pivot ended up.

;; Let's firstly modify our function so that it returns the final index of the pivot
(defn ^:dynamic partition-by-pivot [v i j]
  (if (= j (count v))
    [(dec i) (swap v 0 (dec i))] ;; just returning the index as well
    (if (< (v 0) (v j))
      (partition-by-pivot v i (inc j))
      (partition-by-pivot (swap v i j) (inc i) (inc j)))))


(partition-by-pivot original 1 1) ;-> [7 [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]]

;; Now we want to partition-by-pivot only the array from 0 to 6, so
;; let's further modify our routine so that it can partition just a
;; part of the array

(defn ^:dynamic partition-by-pivot [v start end]
  (loop [v v i (inc start) j (inc start)] ;; this extra loop just sets up the indices i and j
    (if (> j end)
      [(dec i) (swap v start (dec i))]
      (if (< (v start) (v j))
        (recur v i (inc j))
        (recur (swap v i j) (inc i) (inc j))))))


;; So now we can partition subarrays with respect to their first elements:
(partition-by-pivot [7 13 12 2 15 14 16 1 3 0 19 18 17 10 11 8 4 6 5 9]  0 19)
;-> [7 [5 2 1 3 0 4 6 7 13 15 19 18 17 10 11 8 14 16 12 9]]
(partition-by-pivot ['_ 13 12 2 15 14 16 1 3 0 19 18 17 10 11 8 4 6 5 '_]  1 18)
;-> [12 [_ 5 12 2 1 3 0 10 11 8 4 6 13 15 14 16 19 18 17 _]]
(partition-by-pivot ['_ '_ '_ '_ '_ 14 16 1 3 0 19 '_ '_ '_ '_ '_ '_ '_ '_ '_]  5 10) 
;-> [8 [_ _ _ _ _ 0 1 3 14 16 19 _ _ _ _ _ _ _ _ _]]

;; Once we've got this function, it's obvious how to write quicksort:

;; To quicksort, partition by a pivot and then quicksort both halves
(defn ^:dynamic qsort
  ([v start end]
     (if (> (+ 1 start) end) v  ;; short array, nothing to do
         (let [[index newv] (partition-by-pivot v start end) ;; otherwise partition
               leftsorted (qsort newv start (dec index))] ;; and sort the left half
           (qsort leftsorted (inc index) end))))) ;; and then the right half


;; This makes the interface a bit more user-friendly
(defn quicksort [v] (let [vec (into [] v)] (qsort vec 0 (dec (count vec)))))


(quicksort original) ;-> [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 16 18 19]

;; ta-daaah...

(clojure.tools.trace/dotrace [qsort partition-by-pivot] (quicksort original))

;; So how fast is quicksort? 

;; Well...

(time (first (quicksort (shuffle (range 64))))) "Elapsed time: 4.112325 msecs"
(time (first (quicksort (shuffle (range 128))))) "Elapsed time: 9.163108 msecs"
(time (first (quicksort (shuffle (range 256))))) "Elapsed time: 32.184819 msecs"
(time (first (quicksort (shuffle (range 1024))))) "Elapsed time: 135.90394 msecs"
(time (first (quicksort (shuffle (range 2048))))) "Elapsed time: 298.712637 msecs"
(time (first (quicksort (shuffle (range 4096))))) "Elapsed time: 733.392983 msecs"
(time (first (quicksort (shuffle (range 8192))))) "Elapsed time: 1461.434799 msecs"
(time (first (quicksort (shuffle (range 16384))))) "Elapsed time: 3530.573197 msecs"

;; But....

(time (first (quicksort (range 64))))   "Elapsed time: 4.902854 msecs"
(time (first (quicksort (range 128)))) "Elapsed time: 14.469927 msecs"
(time (first (quicksort (range 256)))) "Elapsed time: 36.480616 msecs"
(time (first (quicksort (range 512)))) "Elapsed time: 146.244439 msecs"
(time (first (quicksort (range 1024)))) "Elapsed time: 526.623253 msecs"
(time (first (quicksort (range 2048)))) "Elapsed time: 2131.757814 msecs"
(time (first (quicksort (range 4096)))) ;; stack overflow!!
(time (first (quicksort (range 8192))))

;; And also:

(time (first (quicksort (reverse (range 64))))) "Elapsed time: 15.704443 msecs"
(time (first (quicksort (reverse (range 128))))) "Elapsed time: 57.875302 msecs"
(time (first (quicksort (reverse (range 256))))) "Elapsed time: 170.641883 msecs"
(time (first (quicksort (reverse (range 512))))) "Elapsed time: 658.467259 msecs"
(time (first (quicksort (reverse (range 1024))))) "Elapsed time: 2569.858778 msecs"
(time (first (quicksort (reverse (range 2048))))) "Elapsed time: 11171.441353 msecs"
(time (first (quicksort (reverse (range 4096))))) ;; stack overflow!!
(time (first (quicksort (reverse (range 8192)))))

;; It looks as though it's linear-ish for random input, but quadratic (with a vast recursion depth) for at least some cases.
;; And those cases are easy to find!

Sunday, April 7, 2013

Destructuring

;; Destructuring 

;; One of Clojure's great strengths is the ability to create maps ad-hoc and to deal with them easily.

;; You can create an ad-hoc structure with named fields:

{:a 3 :b 1}

;; And unpack it, providing defaults for its values if they are not present, as easily as:

(defn add-maps [{:keys [a b] :or {a 0 b 0} }]
  (+ a b))

(add-maps {:a 3 :b 1}) ;-> 4
(add-maps {:a 3}) ;-> 3
(add-maps {}) ;-> 0
(add-maps {:c "hello"}) ;-> 0

;; I love this and use it all the time. It's well worth learning the syntax.

;; Try playing with this expression:
(take 5 (iterate (fn [{:keys[a b] :or {b 55} :as m}] {:a a :b b :m m}) {:a 0 :b 0}))

Friday, April 5, 2013

It occured to me that I was never actually going to get round to finishing off the simple plotter like I said I would. So I just pushed it to clojars and github as is. I find it pretty useful, and I've never been motivated to change it much from its current form.

http://github.com/johnlawrenceaspden/simple-plotter

[simple-plotter "0.1.1-SNAPSHOT"]

;; Barnsley's Famous Fern
;; generated by an iterated function system.

(use 'simple-plotter.core)

(defn transform [[xx xy yx yy dx dy]]
  (fn [[x y]] [(+ (* xx x) (* xy y) dx)
               (+ (* yx x) (* yy y) dy)]))

(def barnsleys-fern '((2  [  0    0     0     0.16 0 0    ])
                      (6  [  0.2 -0.26  0.23  0.22 0 0    ])
                      (7  [ -0.15 0.28  0.26  0.24 0 0.44 ])
                      (85 [  0.85 0.04 -0.004 0.85 0 1.6  ])))

(defn choose [lst] (let [n (count lst)] (nth lst (rand n))))

(defn iteration [transforms]
  (let [transform-list (mapcat (fn [[n t]] (repeat n (transform t))) transforms)]
    (fn [x] ((choose transform-list) x))))

(def barnsley-points (iterate (iteration barnsleys-fern) [0 1]))

(create-window "Barnsley's Fern" 350 450)

(ink green)
(scaled-scatter-plot (take 10000 barnsley-points) 50 300 50 400 100)

Wednesday, March 27, 2013

partition-by-equivalence

;; I keep finding that I need a function which will partition a sequence into runs of things.

;; For instance, you might want

'(1 1 1 2 3 3 4 4 4 3 3 3 5 5)

;; to go to:

'((1 1 1) (2) (3 3) (4 4 4) (3 3 3) (5 5))

;; Which is what partition-by does:

(partition-by identity '(1 1 1 2 3 3 4 4 4 3 3 3 5 5)) ;-> ((1 1 1) (2) (3 3) (4 4 4) (3 3 3) (5 5))

;; But partition-by isn't quite what I want.

;; I'd like to be able to turn

'( 1 2 3 4 1 2 3 2 3 4 6 9 10)

;; Into

'((1 2 3 4) (1 2 3) (2 3 4) (6) (9 10))

;; By defining a comparator like:

(defn ^:dynamic sameish [a b] ( = (inc a) b))

;; And then saying:
;; (partition-by-equivalence sameish '( 1 2 3 4 1 2 3 2 3 4 6 9 10))

;; Doing this by hand:

;; () () '( 1 2 3 4 1 2 3 2 3 4 6 9 10)
;; ->
;; () (1) (2 3 4 1 2 3 2 3 4 6 9 10)
;; ->
;; () (1 2) (3 4 1 2 3 2 3 4 6 9 10)
;; ->
;; () (1 2 3) (4 1 2 3 2 3 4 6 9 10)
;; ->
;; () (1 2 3 4) (1 2 3 2 3 4 6 9 10)
;; -> test is false, start a new list
;; ((1 2 3 4)) (1) (2 3 2 3 4 6 9 10)
;; ->
;; ((1 2 3 4)) (1 2) (3 2 3 4 6 9 10)
;; ->
;; ((1 2 3 4)) (1 2 3) (2 3 4 6 9 10)
;; -> test is false, start a new list
;; ((1 2 3 4) (1 2 3))  (2 3 4 6 9 10)
;; ->
;; ((1 2 3 4) (1 2 3) (2))  (3 4 6 9 10)
;; -> etc

;; makes me think that this looks like a tail recursion with two accumulators

(defn ^:dynamic recaccacc [ f acc1 acc2 coll]
  (if (empty? coll) (cons acc2 acc1)
      (if (empty? acc2) (recaccacc f acc1 (cons (first coll) acc2) (rest coll))
          (if (f (first acc2) (first coll))
            (recaccacc f acc1 (cons (first coll) acc2) (rest coll))
            (recaccacc f (cons acc2 acc1) '() coll)))))



;; Unfortunately, this comes out backwards
(use 'clojure.tools.trace)

(dotrace [recaccacc] (recaccacc  sameish '() '() '(1 2 3 4 1 2 3 2 3 4 6 9 10)))

;; TRACE t1169: (recaccacc #<user$sameish user$sameish@11b99c4> () () (1 2 3 4 1 2 3 2 3 4 6 9 10))
;; TRACE t1170: | (recaccacc #<user$sameish user$sameish@11b99c4> () (1) (2 3 4 1 2 3 2 3 4 6 9 10))
;; TRACE t1171: | | (recaccacc #<user$sameish user$sameish@11b99c4> () (2 1) (3 4 1 2 3 2 3 4 6 9 10))
;; TRACE t1172: | | | (recaccacc #<user$sameish user$sameish@11b99c4> () (3 2 1) (4 1 2 3 2 3 4 6 9 10))
;; TRACE t1173: | | | | (recaccacc #<user$sameish user$sameish@11b99c4> () (4 3 2 1) (1 2 3 2 3 4 6 9 10))
;; TRACE t1174: | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((4 3 2 1)) () (1 2 3 2 3 4 6 9 10))
;; TRACE t1175: | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((4 3 2 1)) (1) (2 3 2 3 4 6 9 10))
;; TRACE t1176: | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((4 3 2 1)) (2 1) (3 2 3 4 6 9 10))
;; TRACE t1177: | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((4 3 2 1)) (3 2 1) (2 3 4 6 9 10))
;; TRACE t1178: | | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((3 2 1) (4 3 2 1)) () (2 3 4 6 9 10))
;; TRACE t1179: | | | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((3 2 1) (4 3 2 1)) (2) (3 4 6 9 10))
;; TRACE t1180: | | | | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((3 2 1) (4 3 2 1)) (3 2) (4 6 9 10))
;; TRACE t1181: | | | | | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((3 2 1) (4 3 2 1)) (4 3 2) (6 9 10))
;; TRACE t1182: | | | | | | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((4 3 2) (3 2 1) (4 3 2 1)) () (6 9 10))
;; TRACE t1183: | | | | | | | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((4 3 2) (3 2 1) (4 3 2 1)) (6) (9 10))
;; TRACE t1184: | | | | | | | | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((6) (4 3 2) (3 2 1) (4 3 2 1)) () (9 10))
;; TRACE t1185: | | | | | | | | | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((6) (4 3 2) (3 2 1) (4 3 2 1)) (9) (10))
;; TRACE t1186: | | | | | | | | | | | | | | | | | (recaccacc #<user$sameish user$sameish@11b99c4> ((6) (4 3 2) (3 2 1) (4 3 2 1)) (10 9) ())
;; TRACE t1186: | | | | | | | | | | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1185: | | | | | | | | | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1184: | | | | | | | | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1183: | | | | | | | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1182: | | | | | | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1181: | | | | | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1180: | | | | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1179: | | | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1178: | | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1177: | | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1176: | | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1175: | | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1174: | | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1173: | | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1172: | | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1171: | | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1170: | => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))
;; TRACE t1169: => ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))



;-> ((10 9) (6) (4 3 2) (3 2 1) (4 3 2 1))

;; Which we can fix:

(reverse (map reverse (recaccacc sameish '() '() '(1 2 3 4 1 2 3 2 3 4 6 9 10))))
;-> ((1 2 3 4) (1 2 3) (2 3 4) (6) (9 10))

;; Hooray!

;; So our first definition is

(defn partition-by-equivalence [f coll]
  (let [recaccacc (fn [f acc1 acc2 coll]
                    (if (empty? coll) (reverse (cons (reverse acc2) acc1))
                        (if (empty? acc2) (recur f acc1 (cons (first coll) acc2) (rest coll))
                            (if (f (first acc2) (first coll))
                              (recur f acc1 (cons (first coll) acc2) (rest coll))
                              (recur f (cons (reverse acc2) acc1) '() coll)))))]
    (recaccacc f '() '() coll)))



(partition-by-equivalence sameish '(1 2 3 4 1 2 3 2 3 4 6 9 10)) ;-> ((1 2 3 4) (1 2 3) (2 3 4) (6) (9 10))

(partition-by-equivalence sameish '()) ;-> (())
(partition-by-equivalence sameish '(1)) ;-> ((1))
(partition-by-equivalence sameish '(1 1)) ;-> ((1) (1))
(partition-by-equivalence sameish '(1 2)) ;-> ((1 2))
(partition-by-equivalence sameish '(1 2 1)) ;-> ((1 2) (1))
(partition-by-equivalence sameish '(1 2 1 1)) ;-> ((1 2) (1) (1))
(partition-by-equivalence sameish '(1 2 1 1 2 2)) ;-> ((1 2) (1) (1 2) (2))

;; Here's some incomprehensible maths-stuff about numbers of digits and logarithms and so on.
(map count (partition-by (fn[a] (int (Math/log a))) (range 1 10000))) ; (2 5 13 34 94 255 693 1884 5123 1896)
(partition-by-equivalence (fn [a b] (= (int (Math/log a)) (int (Math/log b)))) (range 1 100)) ;-> ((1 2) (3 4 5 6 7) (8 9 10 11 12 13 14 15 16 17 18 19 20) (21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54) (55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99))
(partition-by-equivalence (fn [a b] (= (int (Math/log10 a)) (int (Math/log10 b)))) (range 1 100)) ;-> ((1 2 3 4 5 6 7 8 9) (10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99))

;; ascending subsequences
(partition-by-equivalence <= '(1 2 3 3 4 5 7 8 9 1 2 5 6 1 7 8))
;-> ((1 2 3 3 4 5 7 8 9) (1 2 5 6) (1 7 8))

;; strictly ascending subsequences
(partition-by-equivalence < '(1 2 3 3 4 5 7 8 9 1 2 5 6 1 7 8))
;-> ((1 2 3) (3 4 5 7 8 9) (1 2 5 6) (1 7 8))


;; lengths of increasing runs
(map count (partition-by-equivalence <= '(1 2 3 3 4 5 7 8 9 1 2 5 6 1 7 8)) ) ;-> (9 4 3)
;; lengths of decreasing ones
(map count (partition-by-equivalence >= '(1 2 3 3 4 5 7 8 9 1 2 5 6 1 7 8)) ) ;-> (1 1 2 1 1 1 1 2 1 1 2 1 1)

;; and finally, a simplified version of the latest problem I actually needed this for, pulling a sequence of lists of scores out of a log file
;; so that each full score list only appears once, and all its ancestors are discarded.
(map last (partition-by-equivalence (fn[a b] (= a (drop 1 b))) '( () (1) (2 1) (3 2 1) () (9) (7 9)))) ;-> ((3 2 1) (7 9))




;; It's a strict generalization of partition-by
(defn my-partition-by [f coll]
  (partition-by-equivalence (fn[a b] (= (f a) (f b))) coll))

(map #(/ (Math/log %) (Math/log 2)) (range 1 100))
(my-partition-by #(int(/ (Math/log %) (Math/log 2))) (range 1 100))


;; And I think it's a really nice function, which is helpful in all sort of situations.

;; It should be possible to make it completely lazy, so that it can take infinite inputs without wolfing the lot.

Rerum Cognoscere Causas IV : How to Deal with Larger Samples

;; Rerum Cognoscere Causas IV : How to Deal with Larger Samples.

;; So far we have:

;; Our data, generated at random from a secret algorithm:

(def ^:dynamic *randomizer* (java.util.Random. 0))

(defn rand-int [n] (.nextInt *randomizer* n))

(defn D6 [] (inc (rand-int 6)))

(defn three-D6 [] (reduce + (repeatedly 3 D6)))

(defn three-from-four-D6 [] (reduce + (drop 1 (sort (repeatedly 4 D6)))))

(defn mixed []
  (if (zero? (rand-int 10))
    (three-from-four-D6)
    (three-D6)))

(defn first-edition [] {:str (three-D6) :int (three-D6)})

(defn second-edition [] {:str (mixed) :int (mixed)})

(defn third-edition []
  (if (zero? (rand-int 10))
    {:str (three-from-four-D6) :int (three-from-four-D6)}
    {:str (three-D6) :int (three-D6)}))

(def village
  (binding [*randomizer* (java.util.Random. 0)]
    (doall (repeatedly 100 (case (rand-int 3)
                    0 first-edition
                    1 second-edition
                    2 third-edition)))))


;; And the calculations of our sages, who have determined with prodigious effort:
;; the sides on a six sided die
(def r16 (range 1 7))

;; the probabilities of each result for each suggested method of generating a characteristic
(def threed6f (frequencies (for [i r16 j r16 k r16] (reduce + [i j k]))))

(def fourd6drop1f (frequencies (for [i r16 j r16 k r16 l r16]
                     (reduce + (drop 1 (sort [i j k l]))))))

(defn p3d6 [char] (/ (threed6f char) (reduce + (vals threed6f))))

(defn p4d6drop1 [char] (/ (fourd6drop1f char) (reduce + (vals fourd6drop1f))))

(defn pmixed [char] (+ (* 9/10 (p3d6 char)) (* 1/10 (p4d6drop1 char))))

;; And thus the probabilities of a villager with particular characteristics coming into being under their scheme

(def ptrad (memoize (fn [{:keys [str int]}]  (* (p3d6 int) (p3d6 str)))))

(def pindep(memoize (fn [{:keys [str int]}] (* (pmixed int) (pmixed str)))))

(def pcommon (memoize (fn [{:keys [str int]}]
  (+
   (* 9/10 (p3d6 int) (p3d6 str))
   (* 1/10 (p4d6drop1 int) (p4d6drop1 str))))))


;; using these calculations, we can take beliefs
(def prior [1 1 1])

;; and update them when we find new evidence.
(defn update [beliefs villager] 
  (map * ((juxt ptrad pindep pcommon) villager) beliefs))

;; since ratios are a bit unreadable, this function will allow us to render them as (approximate) percentages.
(defn approx-odds [[a b c]]
  (let [m (/ (+ a b c) 100)]
    (mapv int [(/ a m) (/ b m) (/ c m)])))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Earlier we found that there's not really enough information in our village to convince anyone one way or the other.
(approx-odds (reduce update prior village)) ;-> [15 31 52]

;; It points half-heartedly towards a common cause model, but we wouldn't be surprised if that was the wrong answer.

;; As it happens, our village is just one of ten in the district
(def district
  (binding [*randomizer* (java.util.Random. 0)]
    (doall (repeatedly 1000 (case (rand-int 3)
                    0 first-edition
                    1 second-edition
                    2 third-edition)))))

(def villages (partition 100 district))

;; paranoid check
(= village (first villages)) ;-> true

;; So let's see what conclusions we can draw from each individual village

(for [v villages] 
  (approx-odds (reduce update prior v)))
;-> ([15 31 52] [53 29 17] [29 38 32] [63 21 14] [34 37 27] [36 27 36] [39 38 22] [49 30 19] [48 24 27] [20 37 42])

;; Some villages are pointing one way, and some the other. 

;; We might want to consider the district as a whole:
;; (approx-odds (reduce update prior district))

;; But unfortunately this expression takes a while to evaluate. Can you see why?

;; Here's a clue:
(reduce update [1.0 1.0 1.0] (take 100 district)) ;-> (1.0019121633199549E-224 2.0261656590064221E-224 3.398128024413593E-224)
;; And rather more worryingly
(reduce update [1.0 1.0 1.0] district) ;-> (0.0 0.0 0.0)

;; It looks as though mathematics itself is failing us!

;; We need to find a new way to multiply numbers.

;; Fortunately we can add their logarithms instead:

(+ (Math/log 6) (Math/log 6)) ;-> 3.58351893845611
(Math/log (* 6 6)) ;-> 3.58351893845611

;; So let's make a logarithmic version of our update function
(defn log-update [beliefs villager] 
  (map + (map #(Math/log %) ((juxt ptrad pindep pcommon) villager)) beliefs))

;; And of our prior:
(def log-prior (map #(Math/log %) prior))

;; And try that on the village
(reduce log-update log-prior village)
;; (-515.7771504932035 -515.0729156616442 -514.5558361317242)

;; If we know the logs, we can get the numbers we actually want:
(map #(Math/exp %) (reduce log-update log-prior village)) 
;-> (1.0019121633198053E-224 2.0261656590065323E-224 3.398128024414225E-224)
;; Notice that these numbers are not exactly the same as the numbers calculated above. 

;; Exercise for the reader: which of the two answers we can calculate is closer to the answer we would like to calculate but can't?

;; We can then turn those numbers into percentages
(approx-odds (map #(Math/exp %) (reduce log-update log-prior village))) 
;-> [15 31 52]

;; So we can do our calculation on the district as a whole:

(reduce log-update log-prior district)
;;-> (-4967.368738149676 -4968.862029975447 -4970.195021140233)

;; Hooray, now we have the logarithms of the numbers we really want!

;; That's done our probability calculation for us, but unfortunately we can't actually recover
;; the number that -4967.36... is the logarithm of:
(Math/exp -4967)
;; Because it's too small to fit into floating point arithmetic.

;; So we need to pull another rabbit out of our hat:

;; All we're interested in is the ratios of the three likelihoods.

;; That's not affected by multiplying through by a constant
(approx-odds [2 1 1]) ; [50 25 25] 
(approx-odds [6 3 3]) ; [50 25 25] 
(approx-odds [1332 666 666]) ; [50 25 25]

;; Similarly, if the likelihoods are expressed as logarithms, the odds ratio
;; isn't affected by adding a constant.
(approx-odds (map #(Math/exp %) [2 1 1])) ; [57 21 21]
(approx-odds (map #(Math/exp %) [3 2 2])) ; [57 21 21]
(approx-odds (map #(Math/exp %) [4 3 3])) ; [57 21 21]

;; What we'd like to be able to calculate is what this would be:
;; (approx-odds (map #(Math/exp %) '(-4967.368738149676 -4968.862029975447 -4970.195021140233)))

;; But we can't, since the Math/exp function is broken on numbers this low.


;; So we'll calculate instead (adding 4967 to each log-likelihood):
(approx-odds (map #(Math/exp %)
                  '(-0.368738149676 -1.862029975447 -3.195021140233))) 

;-> [77 17 4]


;; So after looking at a full thousand people, with perfect models and
;; a perfect prior, looking for what you would have thought was a
;; pretty obvious effect, the existence of gifted superbeings, we're
;; still in a bit of a dubious position.

;; If we decide 'that's good enough', and declare that we live in a
;; first e'dition world, then we've still got a fair chance of being
;; wrong.

;; We can generalize our procedure thus:

(defn log-update [beliefs villager] 
  (doall (map + (map #(Math/log %) ((juxt ptrad pindep pcommon) villager)) beliefs))) ; the doall is papering over a bug in clojure's lazy sequences

(defn log-posterior [log-prior data]
  (reduce log-update log-prior data))

(defn log-prior [& s]
  (map #(Math/log %) s))

(defn percentages-from-log-beliefs [beliefs]
  (let [bmax (apply max beliefs)
        odds (for [b beliefs] (Math/exp (- b bmax)))]
    (approx-odds odds)))

;; paranoid checking again:
(percentages-from-log-beliefs (log-posterior (log-prior 1 1 1) village)) ;-> [15 31 52]
(percentages-from-log-beliefs (log-posterior (log-prior 1 1 1) district)) ;-> [77 17 4]


;; And look at an even larger sample:
(def country
  (binding [*randomizer* (java.util.Random. 0)]
    (doall (repeatedly 10000 (case (rand-int 3)
                               0 first-edition
                               1 second-edition
                               2 third-edition)))))

(percentages-from-log-beliefs (log-posterior (log-prior 1 1 1) (take 100 country))) ; [15 31 52]
(percentages-from-log-beliefs (log-posterior (log-prior 1 1 1) (take 300 country))) ; [27 39 32]
(percentages-from-log-beliefs (log-posterior (log-prior 1 1 1) (take 1000 country))) ; [77 17 4]
(percentages-from-log-beliefs (log-posterior (log-prior 1 1 1) (take 1500 country))) ; [96 3 0]

(percentages-from-log-beliefs (log-posterior (log-prior 1 1 1) (take 3000 country))) ; [99 0 0]

;; Finally, this looks decisive!
;; And everyone agrees. (Their prior beliefs are *overwhelmed* by the evidence)
(percentages-from-log-beliefs (log-posterior (log-prior 10 1 1) (take 3000 country))) ; [99 0 0]
(percentages-from-log-beliefs (log-posterior (log-prior 1 10 1) (take 3000 country))) ; [99 0 0]
(percentages-from-log-beliefs (log-posterior (log-prior 1 1 10) (take 3000 country))) ; [99 0 0]

;; If we're completely paranoid:
(percentages-from-log-beliefs (log-posterior (log-prior 1 1 1) (take 10000 country))) ; [100 0 0]

;; That's close enough for government work.





;; So, in summary, we're looking at a situation where we understand perfectly what's going on, but can't directly see which of the three alternatives

;; (a) everyone's on a bell curve
;; (b) it's not quite a bell curve, it's a bit biased towards good scores
;; (c) there is a sub-race of gifted superbeings

;; was chosen by our Dungeon Master. We just have to look at characteristic scores.

;; And all our measurements are perfect, there is no noise in the data at all, and no systematic bias in our sampling.

;; And we are using, as far as I know, platonically ideal statistical
;; methods perfectly suited to the problem in hand, and guaranteed to
;; extract absolutely all the significance from our data that there
;; is.

;; And still, we need one thousand data points to get a feel for which way the wind is blowing (one hundred was actively misleading!)
;; And rather more than that to make people change their minds.

;; I would be most interested to know if anyone thinks I've done this analysis wrongly. 
;; Things I am not quite certain of are the randomness of the random number generator and the behaviour of the floating point numbers.
;; I'm reasonably confident of the basic method of comparing the three models, and so that means I'll be even more interested and grateful
;; if someone can show me that I'm talking rubbish.

;; If it's true, it makes me wonder how it is possible to know anything at all about anything interesting.

;; Next time I see a study claiming to show something, I might see if I can make this sort of analysis work on it.

Rerum Cognoscere Causas III : What can we tell from our small sample?

;; Rerum Cognoscere Causas III : What can we tell from our small sample?

;; So far we have:

;; Our data, generated at random from a secret algorithm:

(def ^:dynamic *randomizer* (java.util.Random. 0))

(defn rand-int [n] (.nextInt *randomizer* n))

(defn D6 [] (inc (rand-int 6)))

(defn three-D6 [] (reduce + (repeatedly 3 D6)))

(defn three-from-four-D6 [] (reduce + (drop 1 (sort (repeatedly 4 D6)))))

(defn mixed []
  (if (zero? (rand-int 10))
    (three-from-four-D6)
    (three-D6)))

(defn first-edition [] {:str (three-D6) :int (three-D6)})

(defn second-edition [] {:str (mixed) :int (mixed)})

(defn third-edition []
  (if (zero? (rand-int 10))
    {:str (three-from-four-D6) :int (three-from-four-D6)}
    {:str (three-D6) :int (three-D6)}))

(def village
  (binding [*randomizer* (java.util.Random. 0)]
    (doall (repeatedly 100 (case (rand-int 3)
                    0 first-edition
                    1 second-edition
                    2 third-edition)))))


village ;-> ({:str 13, :int 18} {:str 11, :int 18} {:str 14, :int 15} {:str 6, :int 12} {:str 14, :int 13} {:str 18, :int 10} {:str 15, :int 11} {:str 12, :int 15} {:str 7, :int 8} {:str 16, :int 12} {:str 8, :int 7} {:str 9, :int 14} {:str 10, :int 9} {:str 11, :int 10} {:str 5, :int 10} {:str 7, :int 9} {:str 9, :int 13} {:str 12, :int 9} {:str 13, :int 9} {:str 5, :int 9} {:str 8, :int 13} {:str 9, :int 11} {:str 13, :int 14} {:str 12, :int 14} {:str 12, :int 17} {:str 14, :int 9} {:str 10, :int 11} {:str 18, :int 17} {:str 11, :int 9} {:str 8, :int 9} {:str 15, :int 13} {:str 8, :int 5} {:str 11, :int 9} {:str 10, :int 8} {:str 9, :int 12} {:str 5, :int 11} {:str 10, :int 7} {:str 9, :int 14} {:str 11, :int 9} {:str 11, :int 12} {:str 12, :int 13} {:str 15, :int 9} {:str 12, :int 12} {:str 6, :int 13} {:str 5, :int 4} {:str 12, :int 13} {:str 15, :int 10} {:str 14, :int 14} {:str 11, :int 4} {:str 12, :int 9} {:str 10, :int 12} {:str 7, :int 12} {:str 8, :int 11} {:str 10, :int 10} {:str 9, :int 8} {:str 8, :int 12} {:str 7, :int 9} {:str 13, :int 3} {:str 14, :int 9} {:str 8, :int 9} {:str 10, :int 11} {:str 15, :int 4} {:str 10, :int 11} {:str 8, :int 10} {:str 15, :int 10} {:str 8, :int 13} {:str 12, :int 5} {:str 8, :int 16} {:str 4, :int 8} {:str 10, :int 18} {:str 12, :int 12} {:str 11, :int 10} {:str 12, :int 8} {:str 12, :int 13} {:str 8, :int 12} {:str 9, :int 12} {:str 12, :int 10} {:str 15, :int 10} {:str 8, :int 11} {:str 7, :int 11} {:str 4, :int 8} {:str 12, :int 11} {:str 13, :int 9} {:str 14, :int 13} {:str 5, :int 9} {:str 17, :int 10} {:str 8, :int 13} {:str 9, :int 10} {:str 5, :int 14} {:str 15, :int 12} {:str 13, :int 13} {:str 11, :int 8} {:str 8, :int 6} {:str 12, :int 8} {:str 10, :int 3} {:str 14, :int 9} {:str 15, :int 12} {:str 15, :int 14} {:str 6, :int 10} {:str 16, :int 13})

;; And the calculations of our sages, who have determined with prodigious effort:
;; the sides on a six sided die
(def r16 (range 1 7))

;; the probabilities of each result for each suggested method of generating a characteristic
(def threed6f (frequencies (for [i r16 j r16 k r16] (reduce + [i j k]))))

(def fourd6drop1f (frequencies (for [i r16 j r16 k r16 l r16]
                     (reduce + (drop 1 (sort [i j k l]))))))

(defn p3d6 [char] (/ (threed6f char) (reduce + (vals threed6f))))

(defn p4d6drop1 [char] (/ (fourd6drop1f char) (reduce + (vals fourd6drop1f))))

(defn pmixed [char] (+ (* 9/10 (p3d6 char)) (* 1/10 (p4d6drop1 char))))

;; And thus the probabilities of a villager with particular characteristics coming into being under their scheme

(def ptrad (memoize (fn [{:keys [str int]}]  (* (p3d6 int) (p3d6 str)))))

(def pindep(memoize (fn [{:keys [str int]}] (* (pmixed int) (pmixed str)))))

(def pcommon (memoize (fn [{:keys [str int]}]
  (+
   (* 9/10 (p3d6 int) (p3d6 str))
   (* 1/10 (p4d6drop1 int) (p4d6drop1 str))))))


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Very strong, very clever villagers are more likely under the common
;; cause model than under the independent mixed model and even less
;; likely under the traditional model:

(apply < ((juxt ptrad pindep pcommon) {:str 18 :int 18})) ;-> true

;; Let us imagine a neutral observer, employed to fairly determine the correctness of the three schools.

;; He starts off looking at each school's argument, and can see no way
;; to decide between the two, so he assigns them odds of 1:1:1 ,
;; meaning that he thinks each one is equally likely, or equivalently,
;; that he will take or place bets on any of the three schools at odds
;; of 2:1 against ( 1 gold piece gets you 2 if you picked right ),
;; (plus a small commission for bookkeeping and accepting risk.)

(def prior [1 1 1])

;; He now considers the first villager:

(def Olaf (first village))

;; Who is strong, and extremely clever:

Olaf ;-> {:str 13, :int 18}

;; And he considers the probability that Olaf would exist under each of the models.

(ptrad Olaf)   ;-> 7/15552
(pindep Olaf)  ;-> 653/1119744
(pcommon Olaf) ;-> 217/349920

;; Having observed some data, he now considers that he should re-weight his beliefs in the three models accordingly:

(map * ((juxt ptrad pindep pcommon) Olaf) prior) ;-> (7/15552 653/1119744 217/349920)

;; A ratio of odds can always be rescaled. 10:5 is the same as 2:1.

(map #(float (* 15552/7 %)) (map * ((juxt ptrad pindep pcommon) Olaf) prior))
;-> (1.0 1.2956349 1.3777778)

;; So here's a function to take any odds ratio and turn it into (approximate) percentages
(defn approx-odds [[a b c]]
  (let [m (/ (+ a b c) 100)]
    (mapv int [(/ a m) (/ b m) (/ c m)])))

(approx-odds (map * ((juxt ptrad pindep pcommon) Olaf) prior)) ;-> [27 35 37]

;; And we can now see that the arbitrator's judgement has shifted a little away from the traditionalists, and towards the second and third e'ditions.

;; This is reasonable, given that he has just observed a man who is both cleverer and stronger than would be expected by the traditionalists

;; Suppose he had seen Magnus instead:
(def Magnus (second village))

Magnus ;-> {:str 11, :int 18}

;; Magnus is average in strength, but very clever. This is probably more likely under the independent rules than it is under the common cause rules.

(approx-odds (map * ((juxt ptrad pindep pcommon) Magnus) prior)) ;-> [28 35 35]

;; Seeing Magnus should change a neutral person's beliefs towards the second e'dition, mostly at the expense of the first.

;; But of course, our observer has seen both:

(approx-odds
 (map * ((juxt ptrad pindep pcommon) Magnus)
      (map * ((juxt ptrad pindep pcommon) Olaf)
           prior))) ;-> [23 37 39]

;; Implying that the combined effect of both men is to discredit the traditional school while slightly favouring the common-cause hypothesis.

;; We could generate the series representing how the assessors beliefs should change as he 
;; considers each villager like this:

(reductions 
 (fn [beliefs villager] (map * ((juxt ptrad pindep pcommon) villager) beliefs))
 prior
 (list Magnus Olaf))
;-> ([1 1 1] (1/1728 803/1119744 247/349920) (7/26873856 524359/1253826625536 53599/122444006400))

;; More readably, we can separate the function which updates our beliefs given a datum.
(defn update [beliefs villager] 
  (map * ((juxt ptrad pindep pcommon) villager) beliefs))

(map approx-odds
 (reductions update prior (list Magnus Olaf)))
;-> ([33 33 33] [28 35 35] [23 37 39])

;; What if we look at the first ten villagers?
(map approx-odds
 (reductions update prior (take 10 village)))
;-> ([33 33 33] [27 35 37] [23 37 39] [19 37 42] [20 37 41] [18 38 43] [16 40 43] [14 41 44] [13 41 45] [14 40 45] [12 40 46])

;; The first twenty:
(approx-odds
 (reduce update prior (take 20 village)))
;-> [19 36 44]

;; And at the whole village?
(approx-odds
 (reduce update prior (take 100 village)))
;;-> [15 31 52]


;; So if we look at our whole village, it looks as though we'd be
;; slightly more confident that we lived in a third e'dition world.

;; But we're really not terribly confident about that.
;; We know that the models and priors are spot on, since
;; we've seen the source code for the world.

;; But even then, if we declared on the basis of this one village that
;; we lived in a third edition world, we'd literally expect to be
;; wrong half the time.  

;; That's only a slight improvement on being wrong two thirds of the
;; time, which we'd expect if we hadn't bothered to look at any data at
;; all. Most of our opinion is coming from our prior beliefs and there
;; is really very little evidence in the data that we have.

;; Another way to look at this is to ask what happens to the beliefs of the three competing philosophical schools.

;; Suppose each of the three is initially so convinced of its own
;; correctness that it will place or lay bets at 5:1 on its pet
;; hypothesis. But that each school will adjust its beliefs as it sees evidence.

;; So the first e'dition guys will suffer a serious loss of confidence
(approx-odds
 (reduce update [10 1 1] (take 100 village))) ; [64 13 22]
;; But not actually change their minds, just the strength of their convictions.

;; The second e'dition guys barely notice.
(approx-odds
 (reduce update [1 10 1] (take 100 village))) ; [4 82 13]

;; While the third e'dition find their beliefs slightly strengthened
;; after looking at the evidence.
(approx-odds
 (reduce update [1 1 10] (take 100 village))) ; [2 5 91]

;; Clearly more research is needed!

Learning Clojure

About Me

Blog Archive

Search This Blog